# Homework: Network Models

**DUE**: Wednesday 10/2/2019, 11:59pm

**Posted**: 9/16/2019

**Last Update**: 9/30/2019

In our previous homework you generated the degree distribution of a genetic interaction network:

Note: you can obtain a csv file with the adjacency matrix for the genetic interaction dataset here: https://www.hcbravo.org/networks-across-scales/ data/genetic_interaction.csv.gz.

**Question 1**: Is this degree distribution consistent with a network generated by the Erdös-Renyi network model? Why?

## Fitting the degree distribution

Use a network analysis library to estimate the degree exponent under a power-law model for the genetic interaction network.

**Question 2**:

Include a plot equivalent to Image 4.24b of the Barabasi book

What is the estimated degree exponent of the genetic interaction network?

### Implementation (*Qual*)

Implement and apply the procedures of Section 4.13 in the Barabasi book to estimate the degree exponent of this network.

*Extra challenge*: Implement the goodness of fit method using the synthetic `D`

statistic approach. How well does the power law with your estimated degree exponent fit the observed degree distribution of the genetic interaction network model?

*Extra-extra challenge*: Extend your estimation procedure to include parameters `k_{sat}`

and `k_{cut}`

. What is the estimated degree exponent? Do you observe better goodness of fit?

## Network evolution model

In this section you will explore random network generation models to posit hypotheses of the emergence of the empirically observed structure of the genetic interaction network. A few papers that perform this type of analysis for protein-protein interaction networks are listed below.

- Generate a network of the same size as the genetic interaction network using the Barabasi-Albert model. Comment how you chose parameters of the BA model (e.g., seed graph structure and size
`m_0`

, number of links added per time step`m`

, etc.) Choose a network structure measure and compare the BA generated model and the observed genetic interaction network based on your selected measure. Possible choices are:

a. degree distribution, b. correlation coefficient distribution, c. betweenness distribution, d. module organization, e. other, see papers below for ideas.Perform your comparison both qualitatively (i.e., graphically) and quantitatively. For the latter, you need to decide how to do so, e.g., if your network structure measure is based on distribution, you can use, e.g., the KS statistic to compare distributions as described in the method used to fit the degree exponent for power-law distributions

Generate a network of the same size of the genetic interaction network using a gene duplication model as described below. Comment how you chose parameters of the duplication model (e.g., seed graph structure and size, edge deletion probability

`p`

and rewiring rate`r`

). Compare the resulting network to the observed genetic interaction network using the same measure as question 2.Choose what you think is the most appropriate network evolution model for the genetic interaction network from the taxonomy of network evolution models (Image 6.15), or other model you deem appropriate. Explain why you chose it and compare to the observed genetic interaction network as before.

`igraph`

and other libraries have methods to generate models according to BA-like network emergence models. Feel free to use those.

*Qual*: implement the network generation model yourself

### A simple gene duplication model

The following procedure generates a network of size `n`

based on a seed graph `G_0`

and parameters `p`

(edge deletion) and `r`

(node rewiring). It follows the description in [1]

- Start from a seed graph
`G_0`

of size`n_0`

- For
`t=1,...,(n-n_0)`

:

a. Select vertex`v_t`

uniformly at random from graph`G_{t-1}`

and duplicate it by adding new node`v`

with the same links as`v_t`

. This creates graph`G_t`

.

b. For each edge connecting new vertex`v`

, delete it with probability`p`

c. For each vertex`u`

*not*connected to new vertex`v`

, connect`u`

and`v`

with probability`r/|V(t)|`

where`V(t)`

is the set of vertices in graph`G_t`

.

## Papers for inspiration

[1] Hormozidari et al. (2007). “Not all scale-free networks are born equal: the role of the seed graph in PPI network evolution”. PLOS Computational Biology. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0030118

[2] de Silva and Stumpf (2005). “Complex networks and simple models in biology”. Journal of the Royal Society Interface. http://rsif.royalsocietypublishing.org/content/2/5/419.short?casa_token=YkYfAECQUlUAAAAA:ios3YlYfBo6ddDmnUJrWPVQdX2PkmipKgsde6sQQJjwxHgXegDlD3L6T5nFcl2yp4kVFwqLt-L4H0TEw

[3] Pastor-Satorras, Smith and Sole (2003). “Evolving protein interaction networks through gene duplication”. Journal of Theoretical Biology. https://www.sciencedirect.com/science/article/pii/S0022519303000286

[4] Zhao and Mooney (2012). “Functional organization and its implication in evolution of the human protein-protein interaction network”. BMC Genomics. https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-150

[5] Hallinan (2004). “Gene duplication and hierarchical modularity in intracellular interaction networks”. Biosystems. https://www.sciencedirect.com/science/article/pii/S0303264704000309?via%3Dihub

[6] Almaas et al. (2007). “Scale-free networks in biology”. https://www.worldscientific.com/doi/abs/10.1142/9789812772367_0001

## What to turn in

Turn in a single `pdf`

containing plots and answers to each exercise. Make sure to comment on how you processed data to get an adjacency matrix.
If you used the method described above state so, if you did something else, please describe what you did. Include all code used to answer this in your pdf.
If using R, I recommend you use Rmarkdown to do this project. If using Python, I recommend you use a Jupyter notebook.

You can work in self-organizing groups of at most 3 of your classmates. On ELMS, please have every member of the group submit the same pdf.