The Existence and the Socio-Economic Implications of Genetic Networks: A Meta-Analysis
Pasquale Lucio Scandizzo and Alessandra Imperiali
University of Rome Tor Vergata
Genetic networks are recent paradigms for the inheritance of traits from parents. They can be defined as relational structures composed of genes, some of which carry genetic information, and linkages with structural or regulating properties. Previous studies have found that biological networks are characterized by scale-free or power-law algorithms, with the development of highly influential genes with many—but sparse—connections to other genes, and with key regulatory roles in phenotypical expression. The highly influential genes are “hotspots” of regulatory functions and represent the top of the network. Seven basic typologies of genetic networks have been discovered; in these networks, genes play different roles. In this article, we address the issue of the characteristics of genetic networks and their social and economic implications by reviewing recent literature on the subject and developing a meta-analysis of a sample of recent studies. We explore the implications of a model where the desirable traits depend not only on the properties of the individual genes, but also on their connections and the architecture of the network. The model suggests that under reasonable hypotheses and interpretations of past research, several important consequences follow for the interpretation of the roles of genes in everyday life, their interaction with the environment, and their socio-economic role. A major implication for agricultural research on biotechnology is that a strategy aimed to select varieties on the basis of topological properties of the underlying genetic network, and their regulatory role, may be more successful than one depending only on focusing on the direct association between specific genes and desirable traits.
Key words: network analysis, co-expression networks, genetic networks.
Introduction
In the recent past, the scientific community experienced a renewed interest in the study of complex networks. Numerous scientists from different fields were interested in studying the topological features and the interactions among the components of complex networks. Intense research activity was directed towards biological networks, where network theory finds its natural application and genes are considered nodes with links as the interconnections among them.
These studies have produced remarkable progress, not only in understanding the topological and chemical structures of the genes (which, today, can be described and determined precisely), but also in improving agricultural crops. Thanks to the deep genetic knowledge acquired, genes can be modified and recombined into the cells of living organisms. For this reason, scientists started to use the rDNA technique to improve crop productivity or to make crops more resistant to stress, diseases, and chemical treatments.
To unleash the potential of cultivated crops, agriculturists can also use the rDNA technique, which is improving crop breeding, in an indirect way. In this case, scientists do not introduce novel genes but rely on marker technologies, which allow them to identify and map the parts of a chromosome that contain genes of agronomic interest.
These studies have been developed for most crop species and have led to important discoveries. In studying the individual loci that control a quantitatively inherited trait, scientists thought at first that complex traits were determined by a large number of genes, but they more recently noticed that the effects of these loci are very uneven and that in many cases only a few loci affect complex traits. Following this discovery, plant biologists performed different experimental studies focusing on the genomes of different crops.
In order to define gene targeting for functional identification and for investigations of regulatory mechanisms, plant biologists constructed models where traits are the result of the cooperative expression (co-expression) of genes, organized according to the topology of networks. A co-expression network is constructed by determining the tendency of m transcripts to exhibit similar expression patterns across a set of n microarrays (Ficklin & Feltus, 2011). The information contained in the co-expression network is the key to understanding the biological systems at a molecular level as a series of relationships among co-expression modules in addition to gene-to-gene relationships (Obayashi & Kinoshita, 2010). On the other hand, to obtain reliable estimates of co-expressed gene relationships, biologists need a large amount of data from DNA microarray experiments. Today, these data are available in large public data repository sources (containing information about a huge variety of crops) that are widely used.
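The construction described above can be illustrated with a minimal sketch. It assumes that similarity between expression patterns is measured by the Pearson correlation and that two transcripts are linked when the correlation reaches a fixed cutoff; the cutoff of 0.9, the gene names, and the expression values are all hypothetical and are not taken from the studies cited.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def coexpression_edges(profiles, threshold=0.9):
    """Link two transcripts when their correlation across arrays meets the cutoff."""
    names = sorted(profiles)
    edges = []
    for i, g in enumerate(names):
        for h in names[i + 1:]:
            if pearson(profiles[g], profiles[h]) >= threshold:
                edges.append((g, h))
    return edges

# Hypothetical data: m = 4 transcripts measured on n = 5 microarrays.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises together with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 3.0],  # pattern unrelated to geneA
}
edges = coexpression_edges(profiles)
print(edges)
```

With these values only geneA and geneB are linked. Whether strongly anti-correlated pairs such as geneA and geneC should also count as co-expressed is a modeling choice; the sketch uses the signed correlation and therefore leaves them unlinked.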
Genetic networks have important implications for further research strategies. They also imply socio-economic consequences, in that both conceptual representations of evolution and genetic causality are likely to be substantially changed by a new structuralist paradigm, as an innovative basis for policies and practices both in the private and public domain. More specifically, the chain of causation implied by the emerging network paradigm suggests that gene expression is a complex phenomenon, where the structure of the genes and their relations may dominate the individual nature of the genes involved. At the same time, and perhaps paradoxically, selected individual genes may be more important because of their roles as hubs of complex networks, where their position as crossroads of connections and engines of co-expression may be key to understanding how traits are inherited and expressed in practice.
While the traditional view of population genetics sees evolution as a process involving the change in frequency of distinct gene variants (alleles) differing in fitness over time, the molecular basis of phenotypic variance seems to depend very little on gene and protein sequences, especially for the characters that appear to confer adaptive benefit to the bearers. Moreover, comparison of homologous DNA sequences of various species shows that gene expressions seem to drive evolution more forcefully than the genes themselves. Gene expression in turn may depend in part on the fact that genes are linked to each other in functional networks whose products exhibit interrelated expression profiles. This has led some authors to propose a new theory of the “selfish gene network” (Boldogkoi, 2004), based on four main propositions: (1) Instead of individual genes, gene networks (GNs) are responsible for the determination of traits and behaviors. (2) The primary source of microevolution is the intraspecific polymorphism in GNs and not the allelic variation in either the coding or the regulatory sequences of individual genes. (3) GN polymorphism is generated by the variation in the regulatory regions of the component genes and not by the variance in their coding sequences. (4) Evolution proceeds through continuous restructuring of the composition of GNs rather than fixing of specific alleles or GN variants.
Genetic networks also relate importantly to climate change and green growth. For example, a growing literature explores the way landscape patterns are related to the spread and establishment of plant pathogens and their genetic variations across their geographical distribution. The dispersal of organisms on a geographic scale (Garrett, Dendy, Frank, Rouse, & Travers, 2006; Stukenbrock, Banke, & McDonald, 2006), resilience, and adaptation to climate change may be the consequence of how genetic dispersion depends on the underlying network structure and on the presence of dominant genes in a scale-free structure. This relates in turn to biodiversity, plant disease epidemiology, and human and animal pathologies (Burdon, Thrall, & Ericson, 2006; Gilligan, Brenner, & Venkatesh, 2002; Jaeger et al., 2004). Landscape pathology and ecology also appear to be related to intrinsic network properties of gene flow (Geils, 1992; Holdenrieder, Stieber, & Pawel, 2004; Lundquist & Hamelin, 2005; Pautasso, Holdenrieder, & Stenlid, 2005).
A second, most important, field where research on GNs may have significant socioeconomic consequences is the management of gene conservation and crop genetic diversity. Traditional practices, consisting of mere crop or character diversification, may fail because they are approximate and largely based on expressed characters. On the other hand, information on the underlying genetic structure—such as the role played by key nodal genes in the co-expression networks—may be crucial to improve the management of crop diversity both off and on farm.
Finally, economic theory and research may themselves be influenced by the development of the new level of inquiry into network connections and dominant genes due to its attention to evolutionary patterns as underlying determinants of socioeconomic systems. Evolutionary economics and some of its more daring hypotheses, in particular, may be linked to the interpretive model provided by gene networks and co-expression clusters. For example, in an important field known as genetic programming, originally developed by Holland (1975), further extended by Koza (1992), and based on biological analogy, artificial adaptive agents (genes or genetic algorithms) with the capacity for autonomous discovery (autonomous programming ability) are introduced in economic modeling. The study of the behavior of a network of such agents in a market-like environment suggests close resemblance with the ways in which genetic network theories are taking shape (see, for example, Andrews & Praeger, 1994; Chen & Yeh, 1996, 1997, 1999, 2000a, 2000b; Lensberg, 1999).
In this study we focus our attention on three crops: maize, rice, and Arabidopsis thaliana to perform a meta-analysis of recent studies about co-expression GNs. Data from these studies provide detailed information about GN constituents, such as the number of genes, the number of edges, the number of co-expressed genes, the number of clusters, and the number of modules, as well as on the related co-expression traits. These data allowed us to analyze the relationship that links the nodes and the edges of the GNs constructed by the studies examined, as well as the other network variables and the co-expression traits.
Our objective in the study is to uncover evidence of self-organization that may be relevant for socioeconomic analysis and future research strategies from the three points of view illustrated above: the intrinsic network structure of gene expression and its importance for management of the environment, climate change adaptation, and green growth; the implications for gene conservation and crop diversification; and the potential relevance of the new paradigm for evolutionary economics.
In line with the current literature on complex networks (Menezes & Barabasi, 2008), we find that a measure of traffic dispersion (in our case the number of linkages) is linked to a measure of information flow (that we assume proportional to the number of nodes) by a simple, scale-free function, whereby the number of edges tends to increase percentage-wise more than proportionally with respect to the number of nodes, with an important negative effect on such a positive relationship being exercised by co-expression.
In what follows, we introduce the recombinant DNA technique and its application to biotechnology and then present the network theories. We then present the studies and the model, followed by our conclusions.
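A scale-free relationship of this kind can be sketched as a power function E = a N^β, where E is the number of edges, N the number of nodes, and β > 1 corresponds to edges growing more than proportionally with nodes. The functional form, the symbols a and β, and the synthetic values below are illustrative assumptions and do not reproduce the estimates of this study; the effect of co-expression is not modeled.

```python
import math

def fit_power_law(nodes, edges):
    """Least-squares fit of log(E) = log(a) + beta * log(N); returns (a, beta)."""
    xs = [math.log(n) for n in nodes]
    ys = [math.log(e) for e in edges]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    beta = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
            / sum((x - mean_x) ** 2 for x in xs))
    a = math.exp(mean_y - beta * mean_x)
    return a, beta

# Synthetic networks generated from E = 0.5 * N**1.3 (hypothetical, not study data).
nodes = [100, 400, 1600, 6400]
edges = [round(0.5 * n ** 1.3) for n in nodes]

a, beta = fit_power_law(nodes, edges)
print(f"a = {a:.3f}, beta = {beta:.3f}")
```

Because the fit is linear in logarithms, β can be read directly as an elasticity: a one percent increase in the number of nodes is associated with a β percent increase in the number of edges.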
From Multiplicity of Characters to Co-expression
Over the past 100 years, the knowledge of genetic engineers has grown exponentially. Since the 1970s, the classical genetic approaches used for improving organisms were replaced by a more advanced and efficient approach, known as recombinant DNA (rDNA) technique. This new approach has allowed scientists to carry out procedures using genes and DNA that are extremely innovative and powerful. As a consequence, remarkable progress has been made, especially in understanding genes’ chemical structures, divisions, and risk assessment.
The most profound consequence of rDNA technology is our increased knowledge of fundamental life processes. Today, genes are no longer black boxes, but can be described in precise chemical terms, manipulated, and reintroduced into the cells of living organisms, with enormous potential for further innovation and progress. Intensive research activity has stimulated the increasing production of hormones, vaccines, therapeutic agents, and diagnostic tools. In turn, this has given great impulse to the biotechnology industry, with the creation of a whole range of new products through the development of a close relationship between universities and industries. Recently, the public debate on rDNA techniques has enhanced public interest in molecular engineering research, even though these discussions are often exacerbated by interests in receiving grants and in promoting commercialization.
While in the 1970s the principal concern was the effects of rDNA on public health and safety, today the focus is on ethical, legal, and environmental issues and the use of genetically modified plants and animals. The concern is that gene therapy can alter human germ-line genes, modifying the somatic cells.
The scientific community is concerned by the evidence that this technique could modify the germ-line genes with uncontrolled consequences. The environmental and ethical neutrality of modified genes is questionable, and there is no absolute assurance of complete safety. Furthermore, the consequences of the creation of new germ lines are still unknown, so the widespread concern, even in the scientific community, appears justified over the uncertainties surrounding the replication of these new forms of life, which could unleash energies that cannot be controlled.
rDNA consists of DNA sequences resulting from laboratory methods that bring together genetic material from multiple sources. In this way, it is possible to create sequences that would not have otherwise been found in nature and that constitute a new unit, representing something different and more than the sum of the single original elements.
One of the most important applications of rDNA technologies is the improvement of agricultural crops. Research on maps of genetic linkages has made it possible to study the chromosomal locations of genes for improving crops and other complex traits playing an important role in agriculture (Tanksley & McCouch, 1997).
Scientists succeeded in isolating genes responsible for main adaptive and improvement traits and were able to determine their chemical structure, together with their functions. This knowledge was then used to develop the potential of our wild and cultivated germplasm resources for improving agricultural crops. This means that scientists use DNA to investigate biological information and encoded genetic instructions for the development and the functioning of all known living organisms. Figure 1 illustrates the structure of part of the DNA.
Figure 1. A section of DNA.
Source: Wikipedia page (http://en.wikipedia.org/wiki/DNA)
Because of growing population pressure and lagging production, in recent years higher agricultural productivity has been pursued through various strategies that combine the use of greater farming inputs, such as pesticides, fertilizers, and water, with innovative agronomic practices. Agriculturists have also pursued genetic crop improvements by crossing genetically related modern varieties. These crops, which can be defined as cross-genetic transgenic and are genetically less productive, have been improved thanks to the modification of only a few genes, which allow them to be more resistant to cold temperatures and impede early germination if they are planted in cold countries.
Although the above techniques and the ensuing construction of seed banks are still important, they are not a guarantee of success for future productivity because scientists must gain deeper knowledge on how to use the genetic material developed. Crop improvement, in fact, requires the ability to introduce into crops genes from a wide variety of sources and is the result of an accurate selection originated from the gene banks.
Furthermore, this procedure works well when it is required only to improve resistance to diseases and insects because this typically requires the introduction of one single dominant gene, but fails for the most important traits in agriculture, which are conditioned not by a single gene, but by several different ones. Scientists noticed that gene banks contain a huge variety of genes, and there is the possibility that, among them, some favorable ones are not yet discovered. In the past few years, biological researchers focused their attention on the genome of different crops. The aim of these analyses was to improve understanding of the domestication and the agricultural improvements of these crops and to set the stage for further investigations. One of the major objectives was the discovery and decoding of genomes of plants, which has been followed by the genome-wide identification of genes. Within crops, understanding complex interactions underlying agronomic traits is of great importance to improve plant breeding.
Plant biologists have performed many different experimental studies to define gene targeting for functional identification, investigation of regulatory mechanisms, or to find potential partners in protein-protein interactions. One increasingly important method used to identify interacting gene sets is represented by the construction of gene co-expression networks. While gene expression is the process by which information from a gene is used in the synthesis of a functional gene product, co-expression networks include genes involved in related biological pathways, which are expressed cooperatively for their functions. Co-expression networks thus constitute a strategy for storing information on the discovery of non-random gene-gene expression dependencies.
To build reliable estimates of co-expressed gene relationships, biologists need a large amount of data from DNA microarray experiments. In recent years, and especially since the decoding of the Arabidopsis genome, all the information relating to the genome sequences has been stored in different databases. Thanks to the various experiments performed by plant biologists, many microarray datasets have been assembled, covering conditions such as different tissues and chemical treatments, which can be used to predict co-expressed genes and to provide biological information to investigate gene functions (Ogata, Suzuki, Sakurai, & Shibata, 2010).
Co-regulation vs. Co-expression
Since the dawn of microarray technology, scientists have been able to investigate and obtain information about gene-to-gene functional relationships. A vast amount of expression data, stored in public databases, has been produced from various species and evaluated on the basis of the similarity of expression patterns. Gene co-expression data can be used not only to classify genes, but also to create gene maps. Furthermore, these data enable scientists to identify new genes that are functionally related to a phenomenon under investigation. Accumulating evidence of this sort indicates that gene order is not completely random and that genes with similar expression levels tend to be clustered within the same genomic neighborhoods (Michalak, 2008). One important goal of the analysis of gene expression data is to qualify the differences between co-regulated and co-expressed genes. Transcription of a gene is determined by the interaction of regulatory proteins (that is, transcription factors) with DNA sequences in the gene’s promoter region (Yeung, Medvedovic, & Bumgarner, 2004). Co-regulated genes are those regulated by at least one common known transcription factor, while co-expressed genes share similar expression patterns, as discovered by cluster analysis, or have highly correlated expression levels (Yeung et al., 2004).
In order to investigate the relations between co-regulation and co-expression, Zhang, Zha, Wang, and Chu (2004) retrieved regulator-regulon pairs from the Yeast Promoter Database and examined the expression profiles of the regulons with the same regulator. The expression data were obtained by Cho et al. (1998), and the authors used them to generate the plots in Figures 2 and 3.
Figure 2 shows a partial co-expression relationship between genes. This means that genes can be co-regulated only in some particular phase of the cell cycle or of cell development. However, genes can also be regulated by multiple regulators, as can be seen in Figure 3. Co-regulation does not guarantee a global similarity in gene profiles (Zhang et al., 2004), so new clustering algorithms that also consider the internal connections are needed. To date, despite the many studies that have tried to investigate the driving force behind the formation of co-expression clusters, the scientific community still lacks a comprehensive understanding of how a mosaic of co-expression clusters is created. The different studies performed on a wide variety of species have noted large discrepancies regarding the sizes and locations of these clusters in the same species, with no clear-cut boundaries.
Figure 2. Expression profiles of co-regulated genes.
Source: Zhang et al. (2004)
Figure 3. Expression profiles of co-regulated gene groups. Each curve represents the expression profile of a gene. Each sub-plot represents gene expression profiles of a co-regulated gene group. The time range is from 0 min to 160 min.
Source: Zhang et al. (2004)
The Importance of Networks in Biotechnology
After spending decades disassembling nature, and having provided a wealth of knowledge about the individual components and their functions, biological scientists turned their attention to a holistic alternative paradigm of investigation, according to which nothing happens in isolation in nature and most of the characteristics of living beings derive from the interactions among their constituents. Scientists thus developed a theory of complexity as a compelling architecture where everything depends on everything else, with networks being the dominant topology. In the words of E.O. Wilson (Strogatz, 2001, p. 1):
Understanding and unraveling the interactions between the elements of a cell constitutes a major goal for biologists of the genome era. The structure of the interaction network appears relevant to the functioning of the cell, and the network approaches are used to integrate various types of genomics data in order to increase the reliability of predicted interactions. In particular, one can envision that the topology of intracellular networks may provide constraints for the manipulation and the design of cells (Noort, Snel, & Huynen, 2004). An example is represented by the yeast protein interaction network, which is illustrated in Figure 4.
Figure 4. Yeast protein interaction network.
Source: Barabasi and Oltvai (2004)
Intracellular networks can be reconstructed using genomes as sources of data. Examples are represented by the protein interaction networks, genomic association networks, and the evolutionarily conserved co-expression networks. It is possible also to translate gene co-expression networks into discrete networks, because, although these networks are continuously observable, their underlying principles are discrete—the sharing of regulatory elements (Noort et al., 2004).
The studies performed on these networks indicate that the intricate interwoven relationships that govern cellular functions follow a universal law, which is shared by the majority of the complex networks present in nature. In other words, these networks share the same architectural features that characterize other complex networks because they are scale-free, modular, hierarchical, and small world types, characterized by short paths between any two nodes with highly clustered connections.
An example is the gene regulation system or co-expression network, where the genes are the nodes, which are connected to each other when co-expressed. Moreover, graph theory can represent not only the physical interactions between molecules, but also more complicated functional interactions (Barabasi & Oltvai, 2004). Gene regulation is a general name for a number of sequential processes, the most well known and understood being transcription and translation, which control the level of a gene’s expression and ultimately result in a specific quantity of a target protein. More specifically, a gene regulation system consists of genes, cis-elements, and regulators. The regulators are most often proteins, called transcription factors, as well as small molecules, such as RNAs and metabolites. The interactions and binding of regulators to cis-elements in the cis-region of genes are responsible for the mode and the level of gene expression during transcription. The genes, regulators, and the regulatory connections between them, together with an interpretation scheme, form gene networks.
Gene networks thus appear to conform to the evolution of complex systems, which first originate from an orchestrated activity of many interacting components that can be represented as a series of nodes connected with each other through links or edges. A link connects two nodes (or vertices), and the ensemble constituted by nodes and links generates a graph. A simple, undirected graph with three vertices and three edges is represented in Figure 5.
Figure 5. A simple undirected graph with three vertices and three edges. Each vertex has degree two, so this is also a regular graph.
Source: Wikipedia page (http://en.wikipedia.org/wiki/Graph_(mathematics))
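The graph in Figure 5 can be written down explicitly as a minimal sketch, here using an adjacency-set representation in which each vertex is mapped to the set of its neighbors. The vertex labels "a", "b", and "c" are assumed for illustration.

```python
# Adjacency sets for the triangle of Figure 5 (vertex labels are hypothetical).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b"},
}

def degrees(g):
    """Degree of each vertex: the number of edges incident with it."""
    return {v: len(neighbors) for v, neighbors in g.items()}

def edge_count(g):
    """Each undirected edge appears in two adjacency sets, so halve the total."""
    return sum(len(neighbors) for neighbors in g.values()) // 2

def is_regular(g):
    """A graph is regular when every vertex has the same degree."""
    return len(set(degrees(g).values())) == 1

print(degrees(graph), edge_count(graph), is_regular(graph))
```

The same representation carries over to gene networks, with genes as keys and the sets of co-expressed partners as values.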
Erdos and Renyi (1959) proposed a random growth network theory, based on the hypothesis that a fixed number of nodes are connected randomly to each other. In random networks, the node degree (defined as the number of nodes to which a node is connected) follows a Poisson distribution, which shows that most nodes have approximately the same number of links, and nodes that have significantly more or fewer links than the average degree are very rare (Barabasi & Oltvai, 2004).
The random graph model was used for decades, until in 1999 Barabasi and Albert proposed an alternative model based on the observation that most real networks are open systems. These systems grow by the continuous addition of new nodes and, in contrast to the Poisson degree distribution of random networks, the fraction of nodes having k edges decays, in most cases, according to a power law. The Barabasi-Albert model (1999) was inspired by the topological structure of the World Wide Web, a network in continuous evolution where the number of sites increases dynamically. By exploring several large databases describing the topology of large networks, Barabasi (2003) found that, for most large networks, the degree distribution deviates from the Poisson law and, in most cases, follows a power law for large k.
$$P(k) \sim k^{-\gamma} \qquad (1)$$
In Equation 1, k stands for the node degree, which represents the number of edges incident with the node, while P stands for the probability that a node chosen uniformly at random has degree k. The value of the exponent γ typically varies between 2 and 3.
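The concentration of degrees around the average in a random network can be checked with a small simulation. The sketch below uses the closely related G(n, p) variant of the random graph, in which each pair of nodes is linked independently with probability p; the values n = 1000 and p = 0.01 and the random seed are arbitrary choices for illustration.

```python
import random
from collections import Counter

def random_graph_degrees(n, p, seed=0):
    """Degrees of a G(n, p) random graph: each pair is linked with probability p."""
    rng = random.Random(seed)
    degree = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return degree

degree = random_graph_degrees(n=1000, p=0.01)
mean_degree = sum(degree) / len(degree)
histogram = Counter(degree)  # number of nodes observed at each degree

print(f"mean degree = {mean_degree:.2f}, max degree = {max(degree)}")
for k in sorted(histogram):
    print(k, histogram[k])
```

The expected degree is p(n - 1), about 10 here, and the printed histogram should show that no node comes close to the degree of a hub.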
The topological characteristic of a scale-free network is determined by two mechanisms that interact inside the network and are absent in the classical random network model: growth and preferential attachment. Growth and preferential attachment have a common origin in protein networks, where the scale-free topology traces back to a biological mechanism called gene duplication. Duplicated genes produce identical proteins, which interact with the same protein partners (Barabasi & Oltvai, 2004).
In this way, highly connected proteins have an advantage because they are more likely to gain new links when a protein is duplicated than the weakly connected ones. However, even though this feature is able to lead to a scale-free topology, there is no certain proof that this mechanism is the only one which is responsible for the creation of power laws in cellular networks.
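The combined effect of growth and preferential attachment can be sketched with the generic Barabasi-Albert rule: each new node attaches to m existing nodes chosen with probability proportional to their current degree. This is a sketch of the abstract rule and not of the gene duplication mechanism itself; the values n = 2000 and m = 2 and the seed are arbitrary.

```python
import random
from collections import Counter

def preferential_attachment(n, m, seed=0):
    """Grow a network to n nodes; each new node links to m degree-biased targets."""
    rng = random.Random(seed)
    # Growth starts from a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `endpoints` once per link it holds, so a uniform
    # draw from this list picks a node with probability proportional to degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = preferential_attachment(n=2000, m=2)
degree = Counter(v for edge in edges for v in edge)
mean_degree = 2 * len(edges) / len(degree)

print(f"mean degree = {mean_degree:.2f}, largest hub degree = {max(degree.values())}")
```

Although the average degree stays close to 2m, the earliest nodes accumulate far more links than the average, which is the hub structure described in the text.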
Recently, an important development in our understanding of the cellular network architecture was the finding that different cellular networks exhibit scale-free topology, at least approximately. The main example is represented by metabolic networks, which have been analyzed with respect to 43 different organisms, with the results indicating a scale-free topology. An additional scale-free network is represented by the protein-protein interactions in different eukaryotic species, where most of the proteins participate in only a few interactions, while a few participate in dozens (Barabasi & Oltvai, 2004). This feature is typical of scale-free networks and may have great relevance to explain dynamic phenomena, such as the capacity of ecosystems to rebound under severe exogenous stress, and, more generally, the very uneven correspondence between phenotypical expression and gene presence.
The analysis of intracellular network topology allows us to include genetic regulatory networks among the typical scale-free organizations, even though not all networks within the cell are characterized by a scale-free distribution. The signature of scale-free networks is the power-law distribution that predicts the number of different genes that interact with a transcription factor. The power law synthetically expresses the fact that many nodes have only a few connections, while a small but still significant number of nodes have many connections. Cellular networks, in fact, are characterized by a small number of highly connected nodes (hubs), which play a fundamental role in determining the network behavior.
Gene co-expression networks have been observed and translated into discrete networks, taking into consideration the sharing of regulatory elements. This analysis was performed in 2004 by Noort et al. In these networks, genes are the nodes that are connected to each other when they are co-expressed. With respect to other kinds of intracellular networks, this model covers a more inclusive array of functional relations between gene products (Noort et al., 2004). The results of Noort et al.’s analysis show that the distribution of the number of links per node is scale-free and that, although the average number of connections is 32, most genes are connected to only one specific gene, demonstrating the presence of hubs inside the network.
Furthermore, scale-free networks are robust to failures, for the following reason.
Random networks appear fragile because they tend to disintegrate in response to the removal of a critical fraction of the nodes. On the contrary, scale-free networks are characterized by a different topology, which allows them to be very robust to accidental failures. Scale-free networks do not exhibit a critical threshold for disintegration, because they are held together by hubs with a large number of links. Random failures mainly affect the numerous low-degree nodes, whose absence does not disrupt network integrity. Despite their robustness, however, scale-free networks have an Achilles’ heel: they are extremely vulnerable to attacks in which several of the largest hubs are removed simultaneously. When random nodes are dismantled, network integrity is not in danger, and the accidental removal of a single hub will not be fatal either; but if the nodes are no longer selected randomly, an attack on the hubs results in a major disruption. Today, the understanding of cellular network robustness is still far from complete, even though recent results support the hypothesis that these types of networks are robust to many varied perturbations (Barabasi & Oltvai, 2004).
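The contrast between random failures and targeted attacks can be illustrated by removing the same number of nodes in two ways and comparing the size of the largest connected component that survives. The sketch assumes a synthetic scale-free network grown by preferential attachment (1000 nodes, two links per new node) and a removal of 10% of the nodes; these settings are arbitrary and do not describe any cellular network.

```python
import random
from collections import defaultdict

def scale_free_graph(n, m=2, seed=0):
    """Adjacency sets of a network grown by degree-proportional attachment."""
    rng = random.Random(seed)
    adj = defaultdict(set)
    endpoints = []  # each node listed once per link, for degree-biased draws

    def link(u, v):
        adj[u].add(v)
        adj[v].add(u)
        endpoints.extend((u, v))

    for i in range(m + 1):  # seed network: complete graph on m + 1 nodes
        for j in range(i):
            link(i, j)
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in sorted(targets):
            link(new, t)
    return adj

def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

adj = scale_free_graph(1000)
k = len(adj) // 10  # remove 10% of the nodes in both scenarios

random_removed = set(random.Random(1).sample(sorted(adj), k))
hub_removed = set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])

after_failure = largest_component(adj, random_removed)
after_attack = largest_component(adj, hub_removed)
print(f"largest component: random failure = {after_failure}, hub attack = {after_attack}")
```

Removing the highest-degree nodes should leave a markedly smaller connected core than removing the same number of randomly chosen nodes.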
In summary, among the inherent properties of networks, robustness and adaptation appear important, with vulnerability arising when hubs are attacked and disrupted. Several studies have also shown a link between robustness and modularity, because the ability of a module to evolve plays a key role in developing or limiting robustness (Barabasi & Oltvai, 2004).
Dorogovtsev, Goltsev, and Mendes (2002) found that in deterministic scale-free networks, the clustering coefficient $C(k)$ of a node with $k$ links follows the scaling law $C(k) \sim k^{-1}$.
This indicates that nodes with few links have a high clustering coefficient and belong to small, highly interconnected modules, while highly connected hubs have a low clustering coefficient, since their neighbors belong to modules that are otherwise isolated from one another.
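The inverse relation between degree and clustering can be checked on a small deterministic example. The sketch below builds a pseudofractal graph of the kind studied by Dorogovtsev, Goltsev, and Mendes (2002), assuming the usual construction (start from a triangle and, at every step, attach a new node to both ends of each existing edge), and then computes the local clustering coefficient of every node. Function names and the number of generations are illustrative choices.

```python
from itertools import combinations


def pseudofractal_web(generations):
    """Start from a triangle; at each step every existing edge spawns
    a new node connected to both of its endpoints."""
    edges = [(0, 1), (1, 2), (0, 2)]
    n = 3
    for _ in range(generations):
        new_edges = []
        for u, v in edges:
            new_edges += [(u, n), (v, n)]
            n += 1
        edges += new_edges
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


adj = pseudofractal_web(4)
by_degree = {}
for node in adj:
    by_degree[len(adj[node])] = clustering(adj, node)

for k in sorted(by_degree):
    print(f"degree {k:3d}: clustering {by_degree[k]:.4f}   2/k = {2 / k:.4f}")
```

In this construction a node of degree k has exactly k - 1 links among its neighbours, so its clustering coefficient equals 2/k: low-degree nodes sit in tightly knit triangles, while the hubs have sparsely connected neighbourhoods.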
The existence of a scaling law in the degree of clustering and the scale-free property were both used by Ravasz and Barabasi (2003) to identify the existence of hierarchical organization in complex networks. These authors noted that both properties have attracted considerable attention and that a scale-free topology and a high degree of clustering coexist in a large number of real networks.
In gene networks, on the other hand, the strength and the temporal aspects of the interactions must be considered. Despite the significant advances in the past few years, scientists know little about the temporal aspects of the interactions, while they have gained more information about the intensity of connections in the genetic-regulatory networks.
Studies on the degree to which each pair of genes is co-expressed indicate that GNs contain several “hot” links, which have significant correlation coefficients and are embedded in a web of less active interactions (Barabasi & Oltvai, 2004). Highly correlated pairs appear to correspond to direct regulatory and protein interactions, as the correlations are higher between proteins that are in the same interactive cluster than between proteins that do not interact directly.
The Socio-economic Implications
Gene networks that are dominated by a power law have important implications for the economic organization of society and its capacity to evolve and adapt. Randomly dispersed genes would cause evolution to occur as the alteration of the frequency of distinct gene variants (alleles). GNs, on the other hand, based as they are on dominant genes, determine traits and behaviors (through intra-specific polymorphism) mainly by the variation in the regulatory regions of their major component genes and through continuous restructuring of the components of the network.
The first implication of this changing paradigm is a new concept of robustness and fragility of the environment. The power law distribution of genes within networks, in fact, implies robustness toward random attacks, since dominant genes are protected by their low number and the plurality of their connections within the network. At the same time, GNs are vulnerable to attacks directed at connections rather than genes, since these attacks end up compromising the “hubs” and ultimately threaten the collapse of the whole network.
What does this mean in practice? Under the old paradigm, threats to ecotypes from lack of diversification and extinction of selected species could be countered by limiting environmental damage and by storing DNA information in gene banks. The new paradigm, instead, implies a different strategy: protection of the environment should be highly selective, preserve critical genes, and control contaminants according to their capacity to attack not only genes directly, but also gene connections.
The selective robustness of the ecotypes also provides for different strategies against external shocks, since the network architecture implies a much higher capacity to rebound after seemingly destructive violent shocks, but potentially irreversible destruction under more continuous damage directed at the key elements of the network. Thus, for example, oil spills and other one-shot effects of climatic disasters, such as tornadoes, hurricanes, and inundations, may be less harmful than persistent contamination of the environment, because the former are unselective attacks that reach the “hub genes” with only low probability, while the latter causes pervasive damage that attacks the connections rather than the genes and thus may compromise the functions of those genes that are connected in more ways with the other genes.
More generally, the results on the widespread existence and regularity of GNs point to more profound potential implications for social life. The genetic-evolutionary paradigm, in fact, has long been based on the notion of bio-organisms as stable entities, inheriting the basic building blocks of their potential—DNA—from their parents and with DNA remaining largely unchanged over lifetimes. This perspective has been assumed true for all biological organisms, including humans, and years of scientific inquiry have been framed in terms of “genes by environment” and “nature versus nurture.” The very basis of the theory of evolution, in this respect, has been that genes are physiologically autonomous from the external environment. In the case of human civilization, this has meant that individuals have been considered fundamentally separate from their ever-changing social environment, despite their obvious dependence on it, on the basis of the notion that genes influence behavior, but not the other way around.
GNs, on the other hand, propose a different model of functioning whereby genes are polymorphous entities, whose expression and frequent co-expression depend on multiple influences of the environment on the other genes and the linkages involved in complex networks. According to this model, genes can be turned on and off by environmental conditions and, because of the power function regulating the distribution of their links in the network, a few crucial genes may be more important than the majority of the other genes involved in determining phenotypical expressions ensuing from network reaction to different environmental conditions. Consistent with these hypotheses, a growing body of evidence (Slavich & Cole, 2013) shows that specific molecular mechanisms mediate the effects of external social conditions on gene expression and that these dynamics can cause social experiences to become biologically embedded.
The discussion of the rDNA technique and its application to agricultural crops directed our attention to the regulatory mechanisms in protein-to-protein or gene-to-gene interactions. In the course of the past decade, plant biologists were able to decode the genomes of different plants, and the information stored has been used and combined in a systems genetics approach to analyze the molecular subsystems underlying complex traits.
The most important crops that were subjected to these studies are those included in the Poaceae family and specifically rice (Oryza sativa), maize (Zea mays), wheat (Triticum spp.), and sugarcane (Saccharum officinarum). Within these crops, understanding the complex interactions underlying agronomic traits is of great significance for improving plant breeding along the difficult road of genetic engineering, especially as regards the trade-off that the new varieties seem to offer between desirable and non-desirable traits.
In recent years, the complete Arabidopsis genome was decoded and all the relevant information relating to the genome sequences has been stored in different databases. This has allowed plant biologists to perform various experiments in which they could assemble many microarray datasets, covering for example different tissues and chemical treatments, that could be used to predict co-expressed genes and to provide biological information in order to facilitate investigation of gene functions (Ogata et al., 2010).
The storage of information in different databases has not only facilitated the diffusion of knowledge of the individual genes, but has also allowed the cross-species comparison of relevant co-expressed gene groups. This kind of comparative analysis can be performed by associating co-expressed gene modules with biological information, such as gene ontology and metabolic pathways, and then comparing these modules across plant species. The results showed that genes that are highly connected all participate in similar biological processes.
For this reason, in recent years, plant gene co-expression networks have become increasingly popular in different fields of research. Several research groups have identified subsets of highly correlated genes within large gene co-expression networks in Arabidopsis thaliana, rice (Oryza sativa), and maize (Zea mays).
Maize represents an important model organism for fundamental research into the inheritance and functions of genes, the linkages of genes to chromosomes, recombination, and transposition (Schnable, 2009). Rice plays an important role as a staple food for more than one-half of the world’s population, and it is also one of the most studied grasses. Rice has several attractive properties: its genome size is compact, its genome sequence is known, and high-density genetic maps are available. Because of its close relationship with other cereals, furthermore, rice can provide a rich source of genetic hypotheses associated with complex traits, which could easily be translated to other grasses with poor genetic resources.
Arabidopsis thaliana, even though it is not an agricultural species, plays an important role in plant biology because of its experimental characteristics (germination time, observability, measurability of relevant traits, etc.), which became especially attractive when the sequencing of its genome was completed. Since then, thousands of experiments have been performed, and the array data obtained were stored in public databases.
In this article, we report our analysis of the GN experiments performed on these three species (maize, rice, and Arabidopsis thaliana) by focusing on the outcomes of different research projects aimed at identifying gene co-expression networks by examining the co-expression patterns of genes over a large number of experimental conditions. Although gene co-expression networks may model and estimate gene interdependencies for a broad range of plants, they suffer from limitations because they cannot capture all the possible interactions arising among genes in different environmental or temporal conditions. Furthermore, co-expression can be measured only if genes are consistently co-expressed, in the sense that correlations among them can be observed over a sufficiently broad range of outcomes. Despite these limitations, co-expression networks appear to constitute a valuable instrument that can be used to provide glimpses into complex gene-gene interactions.
The research reports examined show that numerous analytical tools have been used to extract gene relationships and functions from microarray data. Most of these tools are based on weighted gene correlation network analysis (WGCNA), the Pearson correlation coefficient, and Fisher’s test, with most scientists using methods of cluster analysis to group genes that show similar expression patterns under well-designed experimental conditions (Lee et al., 2009).
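To make the correlation-based approach concrete, the following sketch shows one simple way a co-expression network can be derived from expression profiles: compute the Pearson correlation coefficient for every pair of genes and keep an edge when its absolute value exceeds a cut-off. The gene names, the expression values, and the 0.9 threshold are invented for illustration and are not taken from any of the studies reviewed; published pipelines such as WGCNA use more elaborate weighting schemes.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpression_edges(expression, threshold=0.9):
    """Return (gene1, gene2, r) for every pair whose |r| reaches the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression levels of three genes across five conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],  # rises and falls with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # no clear relation to the others
}

edges = coexpression_edges(expression)
for g1, g2, r in edges:
    print(f"{g1} -- {g2}  (r = {r:.3f})")
```

In this toy dataset only the pair geneA and geneB is linked; the number of nodes and the number of retained edges are the two quantities that enter the regressions discussed below.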
The aim of the clustering methods is to calculate pair-wise relations and similarity measures among genes and gene clusters. Other methods used, for which we refer the interested reader to the specialized literature, include the K-means clustering algorithms, the random matrix theory (RMT), the enrichment analysis, the regularized graphical Gaussian method (GGM), the clique finder, and the robust multi-array average method. We first analyzed 28 studies and collected the data presented on their results; we then extended our analysis to a total of 57 studies. Data from these studies include the number of genes (nodes), the number of edges, the number of co-expressed genes, the number of clusters, modules, and the number of genes that can be assigned to a cluster. In order to develop comparisons across species, we also collected the information available about the traits associated with co-expressed gene modules and the 41 and 101 networks analyzed.
Our approach is based on the idea that edges and nodes can be considered proxies for the information circulating in a network. This information is exchanged by the nodes and contributes to determine the traits expressed in the plant. We conjecture that topological complexity is at the root of the difficulties that bio-scientists encounter in identifying the correspondence between the underlying genes and the observed traits. While co-expression further increases this complexity, it can also provide an important avenue to develop networks or clusters of genes that can simultaneously enhance more traits. Given these premises, we separately estimated the following two equations by means of ordinary least squares (OLS).
In Equation 3, L indicates the natural logarithm of the number of edges connecting pairs of genes of the network studied, N indicates the natural logarithm of the number of genes in the network, and X′ denotes a vector of dummy variables (see Table 1). Finally, e_i denotes the error terms.
Table 1. List of dummy variables used in our estimations.
In Equation 4, C denotes the logarithm of the number of co-expressed genes, while N stands for the logarithm of the number of genes inside the network and X′ denotes the same vector of dummy variables as in Equation 3. Finally, v_i denotes the error terms, which are assumed to be normally distributed with zero mean and constant variance.
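The two estimating equations are not reproduced in this text. Based solely on the variable definitions given for Equations 3 and 4, they would plausibly take a log-linear form such as the one sketched below. The coefficient symbols (α₀, α₁, γ, β₀, β₁, δ) and the observation index i are notational assumptions introduced here, not the authors' own notation.

```latex
\begin{align}
L_i &= \alpha_0 + \alpha_1 N_i + X_i'\gamma + e_i \tag{3} \\
C_i &= \beta_0 + \beta_1 N_i + X_i'\delta + v_i \tag{4}
\end{align}
```

Because both sides are in logarithms, the slope coefficients on N (here α₁ and β₁) can be read as elasticities: the percentage change in the number of edges, or of co-expressed genes, associated with a 1% change in the number of genes.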
Tables 2, 3, 4, and 5 present the estimation results. We performed two different kinds of estimations of the first model, which can be divided into two groups of analysis. The first contains only 19 observations, while the second contains 41 observations. In the first set of regressions, we considered only the studies that contained the same information, while in the second set we tried to consider all the variables by attributing estimated values (see footnote 1) to the missing observations. In general, results appear to be robust, and significant coefficients show the same signs across all regressions.
Our findings in Table 2 point to a scale-free relationship between the number of edges and the number of genes, with a significant percentage increase in edges in response to a percentage increase in nodes inside the network. The robustness of this outcome is confirmed by the fact that this relationship is significant and quantitatively stable in all model specifications. The estimated elasticity is close to (and not significantly different from) unity. Among the variables representing the traits, biosynthesis capacity, stress response, and seed development exhibit a negative impact, while photosynthesis capacity appears to be positively associated with the number of edges.
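The mechanics of a log-log estimate of this kind can be sketched in a few lines. The example below fits a simple bivariate OLS regression of the logarithm of one count on the logarithm of another and reads the slope as an elasticity. The data are made up so that the answer is known in advance, and the trait dummy variables used in the actual estimations are left out for brevity; the function name is hypothetical.

```python
import math


def ols_loglog(x, y):
    """OLS fit of ln(y) = intercept + slope * ln(x); returns (intercept, slope)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    slope = sxy / sxx
    return my - slope * mx, slope


# Made-up network sizes (number of genes) for four hypothetical studies.
nodes = [200, 1000, 5000, 20000]

# Case 1: edges exactly proportional to nodes, so the elasticity is 1.
edges = [3 * n for n in nodes]
_, edge_elasticity = ols_loglog(nodes, edges)

# Case 2: a less-than-proportional relation with exponent 0.7.
coexpressed = [n ** 0.7 for n in nodes]
_, coexp_elasticity = ols_loglog(nodes, coexpressed)

print(f"elasticity of edges w.r.t. nodes: {edge_elasticity:.3f}")
print(f"elasticity of co-expressed genes w.r.t. nodes: {coexp_elasticity:.3f}")
```

A slope of one means that the counts grow in proportion; a slope below one means that the dependent count grows more slowly than the network itself.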
Table 2. OLS estimate of the (log-log) relationship between the number of edges and the number of nodes (genes) in a sample of genetic networks.
Results in Table 3, on the other hand, show that a significant scale-free relation can also be estimated between the genes co-expressed in a cluster and the total number of genes of an organism. This relationship, however, is strictly less than proportional, with the number of co-expressed genes exhibiting an elasticity between 0.6 and 0.79 with respect to the total number of genes.
Table 3. OLS estimate of the (log-log) relationship between the number of co-expressed genes and the number of genes in a sample of genetic networks.
The results in Table 4 show that also for the average degree (number of edges per node), a significant scale-free relationship can be measured both with respect to the number of edges and the number of co-expressed genes, with similar elasticities (around 0.5), but of opposite signs. This implies that the number of edges per node tends to vary positively with the square root of the number of genes and negatively with the square root of the number of co-expressed genes. In other words, a 1% increase in the number of genes is met with a roughly 0.5% increase in the number of edges per gene (the “degree” of the gene), while a 1% increase in the number of co-expressed genes is met with a roughly 0.5% reduction in the same degree.
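The square-root reading of these elasticities can be made explicit with a short worked calculation. The symbols below are notational assumptions introduced for illustration: the average degree is written as k̄, the number of genes as n_g, and the number of co-expressed genes as n_c, with elasticities rounded to +0.5 and -0.5.

```latex
\begin{align*}
\bar{k} &\propto n_g^{\,0.5}\, n_c^{\,-0.5} \\
n_g \to 1.01\, n_g &: \quad \bar{k} \to 1.01^{0.5}\,\bar{k} \approx 1.005\,\bar{k}
   \quad (\text{about } +0.5\%) \\
n_c \to 1.01\, n_c &: \quad \bar{k} \to 1.01^{-0.5}\,\bar{k} \approx 0.995\,\bar{k}
   \quad (\text{about } -0.5\%) \\
n_g \to 2\, n_g &: \quad \bar{k} \to 2^{0.5}\,\bar{k} \approx 1.41\,\bar{k}
\end{align*}
```

Under these rounded values, doubling the number of genes would raise the average degree by about 41% rather than 100%, other things being equal.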
Table 4. OLS estimates of the (log-log) relationship between the number of the edges/nodes and the number of genes in a sample of genetic networks.
Table 5 shows that the ratio between co-expressed genes and edges is also a scale-free function of both the number of nodes and of edges, and that this relationship is less than proportional for the nodes and more than proportional for the edges.
Table 5. OLS estimates of the (log-log) relationship between the number of co-expressed genes/number of edges and the number of genes in a sample of genetic networks.
Tables 6-8 summarize the effects of the various traits considered in the studies on connectivity. With few exceptions, these effects are negative, suggesting that the presence of the traits tends to reduce the number of nodes, edges, co-expressed genes, and their ratios.
Table 6. Effect of traits on network connectivity.
Table 7. Effect of traits on network connectivity.
Table 8. Effect of traits on network connectivity.
In sum, connectivity, defined as the number of edges that link a given number of genes, appears to be a network characteristic that is associated through a scale-free relationship with the size of the network (as measured by the number of genes). This relationship can be interpreted as the result of information exchanges, i.e., as a relationship between the information contained and the information exchanged by the genes. Co-expression appears to be a strategy to achieve the same level of information content with lower connection costs and, interestingly, this strategy appears to be echoed by the presence of several co-expressed traits, but not by photosynthesis.
The scale-free relationship between the probability that a node is of degree K and the degree itself is also known in statistics as the Pareto law and can be expressed mathematically as $f(K \mid A, V) = A V^{A} / K^{A+1}$. In order to test the hypothesis that the average degree of a genetic node follows this law, we first fitted the Pareto distribution to the distribution of the ratios between edges and nodes in our sample. As Table 9 and Figure 6 show, the Pareto distribution fits the data well, and the values of the test allow us to reject the null hypothesis (that the underlying distribution is not a Pareto). Subsequently, we generated the probability values corresponding to our observations and regressed the logarithms of these values against the logarithms of the ratios (i.e., the average degree of each observed node) and the other variables representing the structure of the network (co-expressed genes, clusters, and groups). The results (Table 10) confirm the relationship estimated by Barabasi and Albert (1999), namely $P(K) \sim K^{-\gamma}$, with $\gamma = 1.21$. Finally, Table 11 reports the empirical distribution test for K = number of edges/nodes.
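As an illustration of what fitting a Pareto law involves, the sketch below estimates the two parameters of a Pareto distribution by maximum likelihood from a synthetic sample. It is not the estimation procedure used for Table 9, whose details are not given here: the sample is drawn from a Pareto distribution with an arbitrary shape of 1.5 and scale of 1, and the function names are hypothetical. The standard maximum-likelihood estimators are used, namely the sample minimum for the scale V and n divided by the sum of ln(x/V) for the shape A.

```python
import math
import random


def fit_pareto(sample):
    """Maximum-likelihood estimates (shape A, scale V) of a Pareto law."""
    v = min(sample)  # MLE of the scale parameter
    a = len(sample) / sum(math.log(x / v) for x in sample)
    return a, v


def pareto_pdf(k, a, v):
    """Pareto density f(k | a, v) = a * v**a / k**(a + 1) for k >= v."""
    if k < v:
        return 0.0
    return a * v ** a / k ** (a + 1)


# Synthetic "average degree" values; shape and sample size are arbitrary.
rng = random.Random(0)
true_shape = 1.5
sample = [rng.paretovariate(true_shape) for _ in range(5000)]

shape_hat, scale_hat = fit_pareto(sample)
print(f"estimated shape A: {shape_hat:.3f}, estimated scale V: {scale_hat:.4f}")

# On a log-log plot the fitted density is a straight line with slope -(A + 1).
slope = (math.log(pareto_pdf(20.0, shape_hat, scale_hat))
         - math.log(pareto_pdf(2.0, shape_hat, scale_hat))) / (math.log(20.0) - math.log(2.0))
print(f"log-log slope of fitted density: {slope:.3f}")
```

The last two lines show why a regression of log probabilities on log degrees, as in Table 10, recovers a power-law exponent: for a Pareto density the relation is exactly linear in logarithms.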
Table 9. Effect of traits on network connectivity.
Figure 6. Empirical distribution test for ratio.
Table 10. OLS estimates of the scale free (log-log) relationship between the probability that an average node is of degree K, the degree K, and other network characteristics.
Table 11. Empirical distribution test for K=number of edges/nodes.
Our analysis confirms the existence of several scale-free relations between the components of bionetworks. The novelty of our approach, which focuses on GNs and in particular on maize, rice, and Arabidopsis thaliana, consists in the investigation of the role of genes, edges, and co-expressed genes across a wide variety of experiments, based on the hypothesis that gene expression follows the network topology. We found that a number of scale-free relations fit the experimental data well and suggest that as the number of genes inside the network increases, the number of edges increases proportionally, while the number of co-expressed genes increases less than proportionally. We also found that the number of edges per node increases more than proportionally with the number of nodes and less than proportionally both with respect to an increase in the number of edges and a decrease in the number of co-expressed genes. Finally, we found that the probability that a gene has a given number of edges (the “degree”) also follows a scale-free relation with the number of edges of the type found by Barabasi and Albert (1999), and a negative relation with other connective properties of the networks, such as co-expression, clustering, and grouping. These findings appear robust and suggest several conclusions.
First, the hypothesis of GNs appears well supported by quantitative analysis, which confirms both the ubiquity and the consistency of the gene-to-gene relations evidenced by numerous experimental studies. Second, the results of these studies also consistently support the scale-free hypothesis on the topology of the networks and the existence of dominant hub genes that render the networks particularly robust to non-targeted exogenous shocks. Third, in spite of the ecological resilience that can be inferred from the scale-free topology, the regulatory function of the networks implies an interactive relationship between the genes and the environment, which can make them more sensitive than was formerly believed. Somewhat paradoxically, and perhaps even more importantly for sustainable economic development, one can thus expect ecotypes to be more robust to external shocks and, at the same time, gene co-expression to be more sensitive to the conditions of the environment. Finally, a strategy of biotechnological research aimed at identifying relevant clusters of co-expression may be more successful than one aimed at identifying single traits or groups of traits and the corresponding gene determinants. The fact that the presence of desirable traits is mostly associated with a reduction of the number of genes, edges, and other network connections seems to reinforce this conjecture.
1 For the missing data, we used the predicted values of the dependent variable (based on the equation estimated with the smaller number of observations) and the average values for the independent variable.
Andrews, M., & Praeger, R. (1994). Genetic programming for the acquisition of double auction market strategies. In K. Kinnear (Ed.), Advances in genetic programming. Cambridge, MA: MIT Press.
Angelovici, R., Fait, A., Zhu, X., Szymanski, J., Feldmesser, E., Fernie, A.R., & Galili, G. (2009). Deciphering transcriptional and metabolic networks associated with lysine metabolism during Arabidopsis seed development. Plant Physiology, 151(4), 2058-2070.
Aoki, K., Ogata, Y., & Shibata, D. (2007). Approaches for extracting practical information from gene co-expression networks in plant biology. Plant & Cell Physiology, 48(3), 381-390.
Barabasi, A.L. (2003). Linked. New York: Plume.
Barabasi, A.L. (2009). Scale-free networks: A decade and beyond. Science, 325(5939), 412-413.
Barabasi, A.L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509.
Barabasi, A.L., & Oltvai, Z.N. (2004). Network biology: Understanding the cell’s functional organization. Nature Reviews Genetics, 5, 101-113.
Bassel, G.W., Glaab, E., Marquez, J., Holdsworth, M.J., & Bacardit, J. (2011). Functional network construction in Arabidopsis using rule-based machine learning on large-scale data sets. The Plant Cell, 23(9), 3101-3116.
Boldogkoi, Z. (2004). Gene network polymorphism is the raw material of natural selection: The selfish gene network hypothesis. Journal of Molecular Evolution, 59(3), 340-357.
Burdon, J.J., Thrall, P.H., & Ericson, L. (2006). The current and future dynamics of disease in plant communities. Annual Review of Phytopathology, 44, 19-39.
Carrera, J., Rodrigo, G., Jaramillo, A., & Elena, S. (2009). Reverse engineering the Arabidopsis thaliana transcriptional network under changing environmental conditions. Genome Biology, 10(9), R96.
Chen, S.H., & Yeh, C.H. (1996). Genetic programming learning and the cobweb model. In P. Angeline (Ed.), Advances in genetic programming (pp. 443-466). Cambridge, MA: MIT Press.
Chen, S.H., & Yeh, C.H. (1997). Towards a computable approach to the efficient market hypothesis: An application of genetic programming. Journal of Economic Dynamics and Control, 21, 1043-1063.
Chen, S.H., & Yeh, C.H. (1999). Modeling the expectations of inflation in the OLG model with genetic programming. Soft Computing, 3(2), 53-62.
Chen, S.H., & Yeh, C.H. (2000a). Simulating economic transition processes by genetic programming. Annals of Operations Research, 97, 265-286.
Chen, S.H., & Yeh, C.H. (2000b). Evolving traders and the business school with genetic programming: A new architecture of agent-based artificial stock market. Journal of Economic Dynamics and Control, 25, 363-393.
Childs, K.L., Davidson, R.M., & Buell, C.R. (2011). Gene co-expression network analysis as a source of functional annotation for rice genes. PLOS ONE, 6(7), e22196.
Cho, R.J., Campbell, M.J., Winzeler, E.A., Steinmetz, L., Conway, A., Wodicka, L., et al. (1998). A genome-wide transcriptional analysis of the mitotic cell cycle. Molecular Cell, 2(1), 65-73.
Davidson, R.M., Hansey, C.N., Gowda, M., Childs, L.K., Lin, H., Vaillancourt, B., et al. (2011). Utility of RNA sequencing for analysis of maize reproductive transcriptomes. The Plant Genome, 4(3), 191-203.
Derbyshire, P., Drea, S., Shaw, P.J., Doonan, J.H., & Dolan, L. (2008). Proximal-distal patterns of transcription factor gene expression during Arabidopsis root development. Journal of Experimental Botany, 59(2), 235-245.
Dorogovtsev, S.N., Goltsev, A.V., & Mendes, J.F.F. (2002). Pseudofractal scale-free web. Physical Review E, 65, 066122.
Erdos, P., & Renyi, A. (1959). On random graphs. Publicationes Mathematicae (Debrecen), 6, 290-297.
Ficklin, S.P., & Feltus, A. (2011). Gene co-expression network alignment and conservation of gene modules between two grass species: Maize and rice. Plant Physiology, 156(3), 1244-1256.
Ficklin, S.P., Luo, F., & Feltus, A. (2010). The association of multiple interacting genes with specific phenotypes in rice using gene co-expression networks. Plant Physiology, 154(1), 13-24.
Fu, F.F., & Xue, H.W. (2010). Co-expression analysis identifies rice starch regulator: A rice AP2/EREBP family transcription factor as a novel rice starch biosynthesis regulator. Plant Physiology, 154(2), 927-938.
Fukushima, A., Kanaya, S., & Arita, M. (2009). Characterizing gene co-expression modules in Oryza sativa based on a graph-clustering approach. Plant Biotechnology, 26(5), 485-494.
Garrett, K.K., Dendy, S.P., Frank, E.E., Rouse, M.N., & Travers, S.E. (2006). Climate change effects on plant disease: Genomes to ecosystems. Annual Review of Phytopathology, 44, 489-509.
Geils, B.W. (1992). Analyzing landscape patterns caused by forest pathogens: A review of the literature. In S. Frankle (Ed.), Proceedings of the 40th Western International Forest Disease Work Conference. Durango, CO: USDA, Forest Service, Pacific Southwest Region, 21-32.
Gifford, M.L., Dean, A., Gutiérrez, R.A., Coruzzi, G.M., & Birnbaum, K.D. (2008). Cell-specific nitrogen responses mediate developmental plasticity. Proceedings of the National Academy of Sciences, 105(2), 803-808.
Gilligan, P., Brenner, S., & Venkatesh, B. (2002). Fugu and human sequence comparison identifies novel human genes and conserved non-coding sequences. Elsevier Science, 294(1-2), 35-44.
Gutiérrez, R.A., Lejay, L.V., Dean, A., Chiaromonte, F., Shasha, D.E., & Coruzzi, G.M. (2007). Qualitative network models and genome-wide expression data define carbon/nitrogen-responsive molecular machines in Arabidopsis. Genome Biology, 8(1), R7.
Gutiérrez, R.A., Gifford, M.L., Poultney, C., Wang, R., Shasha, D.E., Coruzzi, G.M., & Crawford, N.M. (2007). Insights into the genomic nitrate response using genetics and the Sungear Software System. Journal of Experimental Botany, 58(9), 2359-2367.
Gutiérrez, R.A., Stokes, T.L., Thum, K., Xu, X., Obertello, M., Katari, M.S., et al. (2008). Systems approach identifies an organic nitrogen-responsive gene network that is regulated by the master clock control gene CCA1. Proceedings of the National Academy of Sciences, 105(12), 4939-4944.
Hamada, K., Hongo, K., Suwabe, K., Shimizu, A., Nagayama, T., Abe, R., et al. (2011). OryzaExpress: An integrated database of gene expression networks and omics annotations in rice. Plant & Cell Physiology, 52(2), 220-229.
Hirai, M.Y., Yano, M., Goodenowe, D.B., Kanaya, S., Kimura, T., Awazuhara, M., et al. (2004). Integration of transcriptomics and metabolomics for understanding of global responses to nutritional stresses in Arabidopsis thaliana. Proceedings of the National Academy of Sciences, 101(27), 10205-10210.
Holdenrieder, S., Stieber, P., & Pawel, J. (2004). Circulating nucleosomes predict the response to chemotherapy in patients with advanced non-small cell lung cancer. Clinical Cancer Research, 10, 5981-5987.
Holland, J.L. (1975). Dilemmas and remedies. Personnel and Guidance Journal, 53, 517-519.
Honys, D., & Twell, D. (2004). Transcriptome analysis of haploid male gametophyte development in Arabidopsis. Genome Biology, 5(11), R85.
Horan, K., Jang, C., Bailey-Serres, J., Mittler, R., Shelton, C., Harper, J.F., et al. (2008). Annotating genes of known and unknown function by large scale co-expression analysis. Plant Physiology, 147(1), 41-57.
Jaeger, J., Blagov, M., Kosman, D., Kozlov, K.N., Manu, Myasnikova, E., et al. (2004). Dynamical analysis of regulatory interactions in the gap gene system of Drosophila melanogaster. Genetics, 167(4), 1721-1737.
Jen, C.H., Manfield, I.W., Michalopoulos, I., Pinney, J.W., Willats, W.G., Gilmartin, P.M., & Westhead, D.R. (2006). The Arabidopsis co-expression tool (ACT): A WWW-based tool and database for microarray-based gene expression analysis. Plant Journal, 46(2), 336-348.
Koichiro, A., Go, S., Keita, S., Tokunori, H., Hirokazu, T., Katsuhiro, S., et al. (2011). Comprehensive network analysis of anther-expressed genes in rice by the combination of 33 laser microdissection and 143 spatiotemporal microarrays. PLOS ONE, 6(10), e26162.
Koza, J.R. (1992). Genetic programming: On the programming of computers by means of natural selection. Cambridge, MA: MIT Press.
Lee, T.H., Kim, Y.K., Pham, T.T., Song, S.I., Kim, J.K., Kang, K.Y., et al. (2009). RiceArrayNet: A database for correlating gene expression from transcriptome profiling and its application to the analysis of co-expressed genes in rice. Plant Physiology, 151(1), 16-33.
Lee, I., Seo, Y.S., Coltrane, D., Hwang, S., Oh, T., Marcotte, E.M., & Ronald, P.C. (2011). Genetic dissection of the biotic stress response using a genome-scale gene network for rice. Proceedings of the National Academy of Sciences, 108(45), 18548-18553.
Lensberg, T. (1999). Investment behavior under Knightian uncertainty: An evolutionary approach. Journal of Economic Dynamics and Control, 23, 1587-1604.
Liu, X., Fu, J., Gu, D., Liu, W., Liu, T., Peng, Y., et al. (2008). Genome-wide analysis of gene expression profiles during the kernel development of maize (Zea mays L.). Genomics, 91(4), 378-387.
Lian, X., Wang, S., Zhang, J., Feng, Q., Zhang, L., Fan, D., et al. (2006). Expression profiles of 10,422 genes at early stage of low nitrogen stress in rice assayed using a cDNA microarray. Plant Molecular Biology, 60, 617-631.
Lundquist, J.E., & Hamelin, R.C. (2005). Forest pathology: From genes to landscapes. American Phytopathological Society, 175.
Ma, S., Gong, Q., & Bohnert, H.J. (2007). An Arabidopsis gene network based on the graphical Gaussian model. Genome Research, 17(11), 1614-1625.
Mao, L., Hemert, J., Das, S., & Dickerson, J.A. (2009). Arabidopsis gene co-expression network and its functional modules. BMC Bioinformatics, 10, 346.
Meier, S., Gehring, C., Ross, C., MacPherson, Kaur, M., Maqungo, M., et al. (2008). The promoter signatures in rice LEA genes can be used to build a co-expressing LEA gene network. Rice, 1(2), 177-187.
Menezes, M.A., & Barabasi, A.L. (2004). Fluctuations in network dynamics. Physical Review Letters, 92, 028701.
Michalak, P. (2008). Coexpression, coregulation and cofunctionality of neighboring genes in eukaryotic genomes. Genomics, 91(3), 243-248.
Movahedi, S., Van De Peer, Y., & Vandepoele, K. (2011). Comparative network analysis reveals that tissue specificity and gene function are important factors influencing the mode of expression evolution in Arabidopsis and rice. Plant Physiology, 156, 1316-1330.
Nikiforova, V., Freitag, J., Kempa, S., Adamik, M., Hesse, H., & Hoefgen, R. (2003). Transcriptome analysis of sulfur depletion in Arabidopsis thaliana: Interlacing of biosynthetic pathways provides response specificity. The Plant Journal: For Cell and Molecular Biology, 33(4), 633-650.
Nikiforova, V.J., Daub, C.O., Hesse, H., Willmitzer, L., & Hoefgen, R. (2005). Integrative gene-metabolite network with implemented causality deciphers informational fluxes of sulphur stress response. Journal of Experimental Botany, 56(417), 1887-1896.
Noort, V., Snel, B., & Huynen, M.A. (2004). The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO Reports, 5(3), 280-284.
Obayashi, T., Kinoshita, K., Nakai, K., Shibaoka, M., Hayashi, S., Saeki, M., et al. (2007). ATTED-II: A database of co-expressed genes and cis-regulatory elements for identifying co-regulated gene groups in Arabidopsis. Nucleic Acids Research, 35, D863-869.
Obayashi, T., & Kinoshita, K. (2010). COXPRESdb: A database to compare gene co-expression in seven model animals. Nucleic Acids Research, 39, D1016-1022.
Ogata, Y., Suzuki, H., Sakurai, N., & Shibata, D. (2010). CoP: A database for characterizing co-expressed gene modules with biological information in plants. Bioinformatics, 26(9), 1267-1268.
Pautasso, M., Holdenrieder, O., & Stenlid, J. (2005). Susceptibility to fungal pathogens of forests differing in tree diversity. In M. Scherer-Lorenzen, C. Koerner, & D. Schulze (Eds.), Forest diversity and function, 263-289.
Persson, S., Wei, H., Milne, J., Page, G.P., & Somerville, C.R. (2005). Identification of genes required for cellulose synthesis by regression analysis of public microarray data sets. Proceedings of the National Academy of Sciences, 102(24), 8633-8638.
Qiu, D., Xiao, J., Xie, W., Hongbo, L., Xianghua, L., Xiong, L., & Wang, S. (2008). Rice gene network inferred from expression profiling of plants overexpressing OsWRKY13, a positive regulator of disease resistance. Molecular Plant, 1(3), 538-551.
Ravasz, E., & Barabasi, A.L. (2003). Hierarchical organization in complex networks. Physical Review E, 67, 026112.
Ruan, J., Perez, J., Hernandez, B., Lei, C., Sunter, G., & Sponsel, M.V. (2011). Systematic identification of functional modules and cis-regulatory elements in Arabidopsis thaliana. BMC Bioinformatics, 12, S2.
Schnable, P.S., Ware, D., Fulton, R.S., Stein, J.C., Wei, F., Pasternak, S., et al. (2009). The B73 maize genome: Complexity, diversity and dynamics. Science, 326(5956), 1112-1115.
Shinozaki, K., & Yamaguchi-Shinozaki, K. (2006). Gene networks involved in drought stress response and tolerance. Journal of Experimental Botany, 58(2), 221-227.
Slavich, G.M., & Cole, S.W. (2013). The emerging field of human social genomics. Clinical Psychological Science, 1(3), 331-348.
Stein, R.J., & Waters, B.M. (2012). Use of natural variation reveals core genes in the transcriptome of iron-deficient Arabidopsis thaliana roots. Journal of Experimental Botany, 63(2), 1039-1055.
Strogatz, S.H. (2001). Exploring complex networks. Nature, 410, 268-276.
Stukenbrock, E.H., Banke, S., & McDonald, B.A. (2006). Global migration patterns in the fungal wheat pathogen Phaeosphaeria nodorum. Molecular Ecology, 15, 2895-2904.
Swarbreck, S.M., Defoin-Platel, M., Hindle, M., Saqi, M., & Habash, D.Z. (2011). New perspectives on glutamine synthetase in grasses. Journal of Experimental Botany, 62(4), 1511-1522.
Tanksley, S.D., & McCouch, S.R. (1997). Seed banks and molecular maps: Unlocking genetic potential from the wild. Science, 277(5329), 1063-1066.
Vandepoele, K., Quimbaya, M., Casneuf, T., Veylder, L., & Van De Peer, Y. (2009). Unraveling transcriptional control in Arabidopsis using cis-regulatory elements and co-expression networks. Plant Physiology, 535-549.
Wang, R., Guegler, K., LaBrie, S.T., & Crawford, N.M. (2000). Genomic analysis of a nutrient response in Arabidopsis reveals diverse expression patterns and novel metabolic and potential regulatory genes induced by nitrate. The Plant Cell, 12(8), 1491-1509.
Wei, H., Persson, S., Mehta, T., Srinivasainagendra, V., Chen, L., Page, G.P., et al. (2006). Transcriptional coordination of the metabolic network in Arabidopsis. Plant Physiology, 142(6), 762-774.
Wille, A., Zimmermann, P., Vranová, E., Fürholz, A., Laule, O., Bleuler, S., et al. (2004). Sparse graphical Gaussian modeling of the isoprenoid gene network in Arabidopsis thaliana. Genome Biology, 5, R92.
Wilson, E.O. (1998). Consilience. New York: Knopf.
Xue, L.J., Zhang, J.-J., & Xue, H. (2012). Genome-wide analysis of complex transcriptional networks of rice developing seeds. PLOS ONE, 7(2), e31081.
Yang, T.J.W., Lin, W.-D., & Schmidt, W. (2010). Transcriptional profiling of the Arabidopsis iron deficiency response reveals conserved transition metal homeostasis networks. Plant Physiology, 152(4), 2130-2141.
Yeung, K.Y., Medvedovic, M., & Bumgarner, R.E. (2004). From coexpression to coregulation: How many microarray experiments do we need? Genome Biology, 5(7), R48.
Zhang, Y., Zha, H., Wang, J.Z., & Chu, C.H. (2004). Gene co-regulation vs. co-expression. University Park, PA: The Pennsylvania State University.
Zheng, X., Liu, T., Yang, Z., & Wang, J. (2011). Large clique in Arabidopsis gene co-expression network and motif discovery. Plant Physiology, 168(6), 611-618.
Tables A1 and A2 summarize the studies considered in our analysis.
Table A1. Summary of the studies considered in our analysis.
Table A2. Summary of the studies examined for our analysis.
Suggested citation: Scandizzo, P.L., & Imperiali, A. (2014). The existence and the socio-economic implications of genetic networks: A meta-analysis. AgBioForum, 17(1), 44-69. Available on the World Wide Web: http://www.agbioforum.org.