- Research
- Open Access
- Open Peer Review
Microarray enriched gene rank
- Eugene Demidenko^{1}Email author
https://doi.org/10.1186/s13040-014-0033-1
© Demidenko; licensee BioMed Central. 2015
- Received: 21 June 2014
- Accepted: 4 December 2014
- Published: 17 January 2015
Abstract
Background
We develop a new concept that reflects how genes are connected based on microarray data using the coefficient of determination (the squared Pearson correlation coefficient). Our gene rank combines a priori knowledge about gene connectivity, say, from the Gene Ontology (GO) database, and the microarray expression data at hand, called the microarray enriched gene rank, or simply gene rank (GR). GR, similarly to Google PageRank, is defined in a recursive fashion and is computed as the left maximum eigenvector of a stochastic matrix derived from microarray expression data. An efficient algorithm is devised that allows computation of GR for 50 thousand genes with 500 samples within minutes on a personal computer using the public domain statistical package R.
Results
Computation of GR is illustrated with several microarray data sets. In particular, we apply GR (1) to answer whether bad genes are more connected than good genes in relation with cancer patient survival, (2) to associate gene connectivity with cluster/subtypes in ovarian cancer tumors, and to determine whether gene connectivity changes (3) from organ to organ within the same organism and (4) between organisms.
Conclusions
We have shown by examples that findings based on GR confirm biological expectations. GR may be used for hypothesis generation on gene pathways. It may be used for a homogeneous sample or for comparison of gene connectivity among cases and controls, or in longitudinal setting.
Keywords
- Ovarian Cancer
- Gene Ontology
- Microarray Data
- Gene Rank
- Stochastic Matrix
Introduction
The key element of Google’s success is the PageRank that determines the order of webpages displayed for a keyword search. The original algorithm by Page et al. [1] received considerable attention. The idea to use the PageRank to rank genes with respect to their connection to other genes seems obvious and indeed is not new: GeneRank developed by Morrison et al. [2] is a straightforward generalization of PageRank of Google. Two versions of GeneRank are suggested: (1) using GO annotation or (2) Pearson correlation coefficient, r. In both cases, the entries of the connectivity matrix are binary: 0 genes are not connected and 1 if genes are connected. In the correlation version (2), the genes defined connected if r>0.5. Both versions suffer from series limitations: (1) The GO annotation version is difficult to realize in practice because, e.g. with n=50,000 number of genes one has to fill in the connectivity matrix with 2.5×10^{9} elements. Moreover, since the set of genes varies from study to study and from organ to organ, the researcher faces a daunting problem with every new set of microarray experiments. (2) The correlation version brushes off negative correlations which may be very important, and the threshold 0.5 has no justification. The gene rank we suggest is free of these shortcomings: (1) it combines GO library with correlation so that only the connectivity of a small proportion of genes can specified, (2) no thresholding is applied since we use the squared correlation coefficient (coefficient of determination).
The goal of this work is to equip the researcher with a new concept, microarray enriched gene rank (GR), that combines a priori knowledge about gene connectivity with researcher-derived microarray data, and can be computed on his/her own personal computer with as many as 50,000 genes and 500 samples within few minutes.
Traditional gene ranking methods using microarray data are based on ordering of t-statistics (or respective p-values) when the means between cases and controls are compared, or in a more general case, when the microarray sample is correlated with a phenotype, see Winter et al. [3], Zuber et al. [4]. Much of the literature studies issues related to minimization of the false discovery rate or correlation between genes; see Opgen-Rhein et al. [5], Nitsch et al. [6], Masoudi-Nejad et al. [7]. We, however, consider the gene connectivity problem regardless of the phenotype. The assumption of the present work is that gene connectivity can be adequately expressed in terms of the gene pairwise squared correlation matrix, therefore the phenotype is not required. However, the association with phenotype or disease status can be examined further, such as through comparison of GRs between controls and cases.
GR reflects the complexity of genetic organization and is illustrated with several existing microarray data sets. This new genetic quantity gives rise to new biological insights, such as connectivity within clusters/subtypes, between organs within the same organism, and between organisms. GR can be used to discover gene pathways and track those under different experimental conditions or time wise.
Methods
We assume that the gene expression data are presented as an n×m matrix, where rows are genes (the number of rows equal n) and columns are samples (the number of samples/conditions equals m). Hereafter, we use boldface to denote vectors and matrices, and the subscript i to indicate the ith gene and j to indicate the jth sample. It is assumed that, for each gene i, the sample {x_{ ij },j=1,2,…,m} consists of independent identically distributed (iid) observations (microarray measurements); moreover, n samples belong to a multivariate normal distribution. In practice, we may apply a nonlinear transformation, such as the log-transformation, to avoid skewness, when it is appropriate—say, when observations are positive. Under these assumptions, the n×n pairwise Pearson correlation matrix computed from the matrix data, R={r_{ ij }}, is an appropriate measure of genes connectivity; the negative values indicate negative relationships and the positive values indicate positive connectivity. Our concern is the gene connectivity regardless of the negative or positive relationship, so the squared correlation coefficient, or the coefficient of determination, should be used. In fact, the squared correlation coefficient is more interpretable in the statistical sense than the traditional correlation coefficient. Namely, \(r_{\textit {ij}}^{2}\) indicates the proportion of the variance of the ith gene explained by the jth gene and vice verse \(\left (r_{\textit {ij}}^{2}=r_{\textit {ji}}^{2}\right)\). If \(r_{\textit {ij}}^{2}\) is close to zero, genes i and j are almost independent; if \(r_{\textit {ij}}^{2}\) is close to 1 genes i and j are almost linearly related.
The n×n matrix with these entries will be referred to as the normalized squared correlation matrix\(\mathbf {R}_{\ast }^{2}\). Matrix \(\mathbf {R}_{\ast }^{2}\) belongs to the family of stochastic matrices: All elements are nonnegative and the sum of elements in each row is one.
Now let p_{ j } represent the rank of gene j. Another way to compute the gene connectivity is to use the weighted sum of squared correlations with weights p_{ i }: \(\sum _{i=1}^{n}p_{i}r_{\ast ij}^{2}\). This means that the squared correlations are weighted with respect to the connectivity and as in the PageRank, leads to a recursive definition of p. Let the nonnegative and less or equal to one a_{ j } be the a priori gene j connectivity measure, 0≤a_{ j }≤1. If a_{ j }=0, gene expression data adds nothing to a priori connectivity; if a_{ j }=1, then the connectivity is solely derived from the expression data. Measure a_{ j } may represent our biological knowledge about gene connectivity, or it may be a noninformative a priori distribution frequently used in the Bayesian approach, Gelman et al. [9].
The first term on the right-hand side, (1−a_{ j })/n, represents our a priori knowledge about the connectivity of gene j. The second term, \( a_{j}\sum p_{i}r_{\ast ij}^{2}\), represents the connectivity derived from the gene expression data at hand. Since p_{ j } combines a priori knowledge about gene connectivity with microarray data experiments, we call it the microarray enriched gene rank, or simply gene rank (GR).
where 1 is the n×1 vector of ones, a is the n×1 vector of {a_{ j },j=1,..,n}, and A is the n×n diagonal matrix with a on the diagonal, A=diag(a).
Definition1.
It is proven in the online methods that matrix H is a stochastic matrix, and by the Perron-Frobenius theorem (Berman et al. [10]) there exists an eigenvector p with nonnegative elements such that p^{′}=p^{′}H, or equivalently p=H^{′}p. This eigenvector has unit length and corresponds to the unit eigenvalue of matrix H^{′}. Because matrix H has all positive elements in our case, this eigenvalue is maximum and other eigenvalues are positive but smaller than one. Hereafter, we refer to GR as the left maximum eigenvector of matrix H. In a special case when all gene connectivities are a priori set to zero, we have a_{ j }=1 and the GR of gene j is proportional to the sum of squared correlations in row j. In our computations below, we assume the noninformative prior connectivity distribution, a_{ j }=0.9=const (Google uses a_{ j }=0.85). Our results reported below are fairly robust to the choice of this constant due to large n.
When n is of the order of a few thousands, standard methods of eigenvector computation may be used. Several authors suggest efficient algorithms for computation of the maximum left eigenvector of a stochastic matrix (Golub and Greif [11], Wu et al. [12]). However, when n is of the order of tens of thousands, new algorithms are required because even storing the squared correlation matrix is problematic. An efficient method for computation of GR using a public domain statistical package R does not require storing R^{2} and is outlined in the online methods with the R [13] script provided. For example, computation of GR for a microarray data with 50,000 genes and 500 samples takes only few minutes on a personal computer.
Gene rank and cluster analysis
This formula hints to a close relationship between cluster analysis and our GR. As follows from this formula, it is plausible to expect that genes from the same cluster have high rank because they are close to each other. Similarly, if the density of GR is a mixture of several components, these components may be associated with gene clusters, so the number of components would be equal to the number of clusters. We illustrate this association with several data examples below.
Connection of a specific gene to other genes
All c_{ ij } are nonnegative and the sum of c_{ ij } over j=1,2,..,n is 100%. We illustrate this decomposition with several examples below.
Results
In this section, we illustrate the computation of the gene rank as the left maximum eigenvector and show how this measure generates insights in our understanding of genes’ connection across organisms and across organs within the same organism.
Gene rank for studying the survival of diffuse large-B-cell lymphoma patients
The paper by Rosenwald et al. [14] is a pioneering work in which 7,399 gene-expression profiles derived from biopsies of 240 diffuse larger-B-cell lymphoma patients are used to predict cancer survival after chemotherapy. Following this phenotype-based ranking, genes have been prioritized with respect to the p-value of the coefficient at the gene expression variable in the Cox proportional hazard model. In other words, a gene had a high rank if it was a good predictor of the survival. The goal of this section is to understand how this phenotype-based ranking is related to our microarray enriched gene rank reflecting the gene connectivity. Specifically, we want to know whether these two gene characteristics are positively (or negatively) correlated.
The solid lines depict linear regression between two measures for bad and good genes with approximate equal slope of about 7 but different intercepts (p-values <10^{−16}). Good genes are more connected than bad ones, but what is remarkable is that an increase by one order of the p-value leads to a 7% increase in the gene connectivity in both groups. One plausible explanation is that good genes are more connected since they represent unaffected genes of normal tissue. Bad genes, which underwent mutation, do not have enough connection with other genes and therefore their rank is smaller on average. The fact that the regression lines are parallel in both groups has yet to be explained.
Ovarian carcinoma microarray data
Classification of ovarian carcinoma could significantly improve treatment outcomes by identifying the subtype of the ovarian cancer [15]. Considerable effort has been devoted by The Cancer Genome Atlas (TCGA) Research Network researchers to carrying out gene expression experiments to identify clusters of genes that could better classify the disease and predict the treatment outcome [16]. We use the TCGA microarray data set with 11,864 gene microarray expressions obtained from 489 ovary biopsies recently analyzed using cluster algorithm techniques, Verhaak et al. [17]. At least four subtypes of genes have been identified with substantial overlap. Below, we suggest gene connectivity analysis based on the GR concept. We hypothesize that genes within the cluster have stronger connectivity and GR density can be used to identify groups of genes.
Several the least connected genes are mentioned in the literature in association with ovarian cancer: The most prominent is TP3 (P53) which mutations lead to various types of cancer as reported in Schuijer and Berns [20]. It was documented by Gates et al. [21] that gene GSTT1 produces enzymes which catalyze carcinogens and increase the chance of the ovarian cancer; gene KL is associated with several types of cancer such as breast and lung, besides ovarian cancer; HTR1F is named among genes associated with ovarian cancer, see Cody et al. [22].
where \(\phi \left (p;\mu _{i},{\sigma _{i}^{2}}\right)\) denotes the Gaussian density with mean μ_{ i } and variance \({\sigma _{i}^{2}}\) and π_{ i } represents the proportion of genes in the ith component, \(\sum \pi _{i}=1\). Since the distribution of GR is skewed and positive, the mixture distribution is plotted on the log scale and then transformed back to original values. The result of parameter estimation using the Expectation-Maximization (EM) algorithm (Murphy [25]) is presented graphically in Figure 4. Four clusters/subtypes correspond to four density components depicted with different color. The majority of the genes (the first subtype probability, π_{1}= 46%, red) have the lowest connectivity with mode = 0.0065; the second and third subtypes (green and blue) have close connectivity with probability close to 25%. Genes from the fourth subtype are closely connected and they constitute the minority (probability, π_{4}= 3%).
Gene rank transformation during rice growth
The paper by Fujita et al. [26] examines how microarray gene expressions change during rice reproduction starting from pollination–fertilization and ending with zygote formation. The rice gene expression data comprise n=57,381 probes/genes and m=99 samples; the data were obtained through the NCBI GEO DataSets website: http://0-www.ncbi.nlm.nih.gov.brum.beds.ac.uk/gds/. Rice, as well as other plants, goes through four stages of developmental progression: (1) anther development, (2) pollination–fertilization, (3) embryogenesis: the bottom quarter, and (4) embryogenesis: the top three quarters. We use GR to answer the following questions. Does the complexity of the gene network characterized by GR reflect the four transformations? If so, can one quantify the proportion of the gene network built at each stage? The assumption is that the correlation/connectivity between genes changes with the phenotype (in this case stage of the rice development progression). For validation purpose, we act in the reverse fashion: compute correlations using all data and see whether the distribution of GR reflects the four groups (the same approach is used for the Drosophila data in the next section). A similar approach has been recently used by Langfelder et al. [27].
Gaussian mixture GR distribution of rice growth
Developmental stage | Mean | SD | Prop |
---|---|---|---|
μ | σ | π | |
Anther development | 15.9 | 4.97 | 0.203 |
Pollination–fertilization | 27.5 | 6.32 | 0.252 |
Embryogenesis: the bottom quarter | 50.6 | 11.8 | 0.311 |
Embryogenesis: the top three quarters | 71.1 | 9.34 | 0.234 |
It seems obvious that later rice genome development should increase the complexity of the gene network. Can this statement be supported by our GR distribution at each stage? We use the cumulative distribution function (cdf) to answer this question.
Definition2.
Let X and Y be two random variables. We say that X is smaller than Y in stochastic sense if Pr(X≤z)≥ Pr(Y≤z), or in terms of the cumulative distribution function F_{ X }(z)≥F_{ Y }(z), for every real z.
Comparison of gene rank across Drosophila organs
Comparison of the gene rank across organisms
Conclusions
We have introduced a new concept that reflects gene connectivity, the gene rank (GR). GR augments our a priori knowledge about gene connection using microarray experiment data. We have shown by examples that the new measure reflects biological expectations and may lead to insights of gene network development in time, across organs of the same organism and across organisms. GR may be used for hypothesis generation in establishment of gene pathways. For example, the fact that many least connected genes are among genes directly associated with ovarian cancer is illuminating.
Traditional gene ranking based on the t-test or, in general, based on correlation with a phenotype and our GR look at the microarray data matrix orthogonally. The phenotype approach uses the horizontal correlation of genes with phenotype. In contrast, GR reflects the vertical correlation across genes, so the phenotype information is not used. Both angles of microarray data analysis are important and valid. However, we believe that GR is more fundamental from the biological point of view, although the t-test may be more clinically important.
We have shown that GR can be used for identification of gene hubs (Choi and Kendziorski [29], Wu et al. [30]) as an alternative to cluster analysis. This cross-gene measure can be used to modify the t-test and the respective threshold for the p-value to reduce the false discovery rate in genome-wide association studies; see Zuber and Strimmer [4], Yassouridis [31].
In this work, a simple constant a priori assumption (constant damping factor) was used for gene connectivity. More work must be done to enrich the GR computation with available gene network connection information such as GO or IMP (Wong et al. [32]). For example, if A_{ ij } is a binary connectivity matrix build on the basis of GO annotations (A_{ ij }=1 if genes i and j are connected and 0 otherwise, assuming that A_{ ii }=1) we may set \(a_{j}=1- \sum _{i=1}^{n}A_{\textit {ij}}/\sum _{i=1}^{n} \sum _{j=1}^{n}A_{\textit {ij}}\). Use of a priori biological information may leverage the possibility of false correlation especially important in the case of a small sample size, m.
We have presented just a small amount of empirical evidence that GR can be used to characterize the complexity of genes’ connections in organs and organisms. More studies are required to understand its usefulness as a characteristic of genetic complexity.
Appendix
Proof that matrix H is a stochastic matrix
matrix H is a stochastic matrix and, therefore, the Perron-Frobenius theorem applies.
Computation of the left maximum eigenvector
Below we describe three methods for computation of GR under three assumptions regarding the size of the microarray matrix. In all computations the noninformative prior on gene connectivity is used with constant a=0.9. In case of an informative prior, vector a should specify connectivity between 0 and 1.
1. Using the built-in function eigen in R for smalln. When the number of genes is fairly small, say, n<5000 computation of the left maximum eigenvector can be done using a built-in function in various software packages. For example, in R this function is eigen.
Note that if p is an eigenvector −p is an eigenvector. too. Therefore, we take the absolute value of the eigenvector of H^{′} in the last line.
This method is referred to as the power method [33]. The following R code realizes this method starting from p_{0}=R^{2}1/∥R^{2}1∥. As in the previous method we set a=0.9;eps and maxit specify the convergence tolerance and the maximum number of iterations.
Typically, this algorithm requires less than 10 iterations to converge. Moreover, for fairly large n, say, n>1000 it may be even faster and more precise then using the eigen function because eigen computes all eigenvalues and eigenvectors, but the power method computes only the maximum eigenvector.
3. Computation of the gene rank for largen, say, n>10000. As in many datasets, even computation of an 50000×50000 correlation matrix using the built-in function cor becomes troublesome (R reports an error due to lack of memory). Below, we suggest an algorithm based on the power method without computation of the correlation matrix. This algorithm is especially effective when n is large and m is relatively small, say, m<500. For example, it took 3 minutes to compute GR for the rice data usint R version 3.0.1 on my Acer laptop (Windows 7, 64 bit, 4 GB RAM, processor Intel(R) CPU @ 1.60 GHz).
so that the rows of the n×m matrix Z are \(\mathbf {z} _{i}^{\prime }\). Importantly, computation of the normalized matrix, Z may be accomplished in a vectorized fashion in R with no loop and therefore very efficiently, as shown below.
Then, obviously, R=ZZ^{′}. In the theorem below, we express R^{2}1 and more generally \((\mathbf {R}_{\ast }^{2})^{\prime }\mathbf {p}\) without computation of R^{2}. The formulas require a double loop over m, but for small m, it is more efficient than computation of a large R^{2}.
Theorem1.
Proof.
Below, we show the R code that implements this algorithm.
Declarations
Acknowledgments
I thank my colleagues at Dartmouth, Drs. Jennifer Doherty and Scott Williams, who read a draft of the manuscript and made useful comments and suggestions, and Jason Moore for his support. Also I am thankful to Brett McKinney, the reviewer for his comments and constructive criticism. This work was supported by the NIH grants P20RR024475 and P20GM103534.
Authors’ Affiliations
References
- Page L, Brin S, Motwani R, Winograd T. The PageRank citation ranking: bringing order to the web, Technical report, Stanford Digital Library Technologies Project 1998. http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf.
- Morrison JL, Breitling R, Higham DJ, Gilbert DR. GeneRank: Using search-engine technology for the analysis of microarray experiments. BMC Bioinformatics. 2005; 6:233–46.View ArticlePubMedPubMed CentralGoogle Scholar
- Winter C, Kristiansen G, Kersting S, Roy J, AustD, Knösel T, Rümmele P, et al. Google goes cancer: Improving outcome prediction for cancer patients by network-based ranking of marker genes. PLOS Comput Biol. 2012; 8:e1002511.View ArticlePubMedPubMed CentralGoogle Scholar
- Zuber V, Strimmer K. Gene ranking and biomarker discovery under correlation. Bioinformatics. 2009; 25:2700–7.View ArticlePubMedGoogle Scholar
- Opgen-Rhein R, Strimmer K. Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach. Stat Appl Genet Mol Biol. 2007; 6:9.Google Scholar
- Nitsch D, Goncalves JP, Ojeda F, de Moor B, Moreau Y. Candidate gene prioritization by network analysis of differential expression using machine learning approaches. BMC Bioinformatics. 2010; 11:460.View ArticlePubMedPubMed CentralGoogle Scholar
- Masoudi-Nejad A, Meshkin A, Haji-Eghrari B, Bidkhori G. Candidate gene prioritization. Mol Genet Genomics. 2012; 287:679–98.View ArticlePubMedGoogle Scholar
- Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008; 9:559.View ArticlePubMedPubMed CentralGoogle Scholar
- Gelman A, Carlin JB, Stern HS, Dunson DB. Bayesian Data Analysis, 3rd ed. Boca Raton: Chapman & Hall; 2013.Google Scholar
- Berman A, Plemmons RG. Nonnegative matrices in the mathematical sciences. PA: SIAM Philadelphia; 1994.View ArticleGoogle Scholar
- Golub GH, Greif C. An Arnoldi-type algorithm for computing page rank. BIT Numer Math. 2006; 46:759–71.View ArticleGoogle Scholar
- Wu G, Xu W, Wei Y. A preconditioned conjugate gradient algorithm for GeneRank with application to microarray data mining. Data Mining Knowl Discov. 2013; 26:27–56.View ArticleGoogle Scholar
- R Development Core Team. A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing; 2012. ISBN 3-900051-07-0, URL http://www.r-project.org/.Google Scholar
- Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI, Gascoyne RD, Muller-Hermelink HK, Smeland EB, Staudt LM. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med. 2002; 346:1937–1947.View ArticlePubMedGoogle Scholar
- The Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature. 2011; 474:609–15.View ArticlePubMed CentralGoogle Scholar
- The cancer genome atlas completes detailed ovarian cancer analysis. 2013. http://www.cancer.gov/newscenter/newsfromnci/2011/TCGAovarianNature.
- Verhaak RGW, Tamayo P, Yang J-Y, Hubbard D, Zhang H, Creighton CJ, et al. Prognostically relevant gene signatures of high-grade serous ovarian carcinoma. J Clin Invest. 2013; 123:517–25.PubMedGoogle Scholar
- NBR, 2 neighbor of BRCA1 gene 2 (non-protein coding) [Homo sapiens (human). http://0-www.ncbi.nlm.nih.gov.brum.beds.ac.uk/gene/10230.
- Suen TC, Tang MS, Goss PE. Model of transcriptional regulation of the BRCA1-NBR2 bidirectional transcriptional unit. Biochim Biophys Acta. 2005; 1728:126–34.View ArticlePubMedGoogle Scholar
- Schuijer M, Berns EMJ. TP53 and ovarian cancer. Hum Mutat. 2003; 21:285–91.View ArticlePubMedGoogle Scholar
- Gates MA, Tworoger SS, Terry KL, Titus-Ernstoff L, Rosner B, De Vivo I, et al. Talc use, variants of the GSTM1, GSTT1, and NAT2 genes, and risk of epithelial ovarian cancer. Cancer Epidemiol Biomarkers Prev. 2008; 17:2436–44.View ArticlePubMedPubMed CentralGoogle Scholar
- Cody NAL, Shen Z, Ripeau JS, Provencher DM, Mes-Masson AM, Chevrette M, et al. Characterization of the 3p12.3-pcen region associated with tumor suppression in a novel ovarian cancer cell line model genetically modified by chromosome 3 fragment transfer. Mol Carcinog. 2009; 48:1077–1092.View ArticlePubMedGoogle Scholar
- Hansen PC. Analysis of the discrete ill-posed problems by means of the L-curve. SIAM Rev. 1992; 34:561–580.View ArticleGoogle Scholar
- Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning, 2nd ed. New York: Springer; 2009.View ArticleGoogle Scholar
- Murphy KP. Machine learning. a probabilistic perspective. Cambridge, MA: The MIT Press; 2012.Google Scholar
- Fujita M, Horiuchi Y, Ueda Y, Mizuta Y, Kubo T, Yano K, et al. Rice expression atlas in reproductive development. Plants Cell Physiol. 2010; 51:2060–81.View ArticleGoogle Scholar
- Langfelder P, Mischel P, Horvath S. When is hub gene selection better than standard meta-analysis. Plos One. 2013; e61505:8.Google Scholar
- Chintapalli VR, Wang J, Dow JAT. Using FlyAtlas to identify better. Drosophila melanogaster models of human disease. Nat Genet. 2007; 39:715–20.View ArticlePubMedGoogle Scholar
- Choi Y, Kendziorski C.Statistical methods for gene set co-expression analysis. Bioinformatics. 2009; 25:2780–86.View ArticlePubMedPubMed CentralGoogle Scholar
- Wu XG, Huang H, Sonachalam M, Reinhard S, Shen J, Pandey R, et al. Reordering based integrative expression profiling for microarray classification. BMC Bioinformatics. 2012; 13(Suppl. 2):article S1.View ArticlePubMedGoogle Scholar
- Yassouridis A, Ludwig T, Steiger A, Leisch F. A new way of identifying biomarkers in biomedical basic-research studies. PLOS ONE. 2012; e35741:7.Google Scholar
- Wong AK, Park CY, Greene CS, Bongo LA, Guan YF, Troyanskaya OG. IMP: A multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res. 2012; 40:W484–90.View ArticlePubMedPubMed CentralGoogle Scholar
- Golub GH, Van Loan CF. Matrix Computations, 3d ed. Baltimore: Johns Hopkins University Press; 1996.Google Scholar
- Schott JR. Matrix Analysis for Statistics, 2nd ed. New York: Wiley; 2005.Google Scholar
Copyright
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.