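While learning WGCNA recently, I ran into a few terms I did not fully understand. I searched online and found an answer, but the page cannot be reached with a normal internet connection, so I have copied it here for anyone who needs it.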
First answer
I am not sure in what context you are referring to the terms 'eigengene' and 'gene module', but my best guess is that you are talking about them in the context of WGCNA (Weighted Gene Co-expression Network Analysis).
If you want to run WGCNA on a gene expression dataset, the general principle is this. First, you build a correlation network between the genes based on their co-expression: each gene is a node, and you put an edge between two genes if their co-expression strength passes a set threshold. (Strictly speaking, WGCNA builds a weighted network: rather than applying a hard cutoff, it typically raises the absolute correlation to a soft-thresholding power, so every pair of genes keeps an edge whose weight reflects how strongly the two are co-expressed.) Sometimes people build a Topological Overlap Matrix (TOM)[1] on top of the correlation network, but you do not need to worry about that at the moment. After you get a network, you do hierarchical clustering[2] on the most connected genes. This is an unsupervised learning method in which a tree is built from the bottom up by repeatedly joining the two nearest genes (or clusters), according to a distance that you choose. Once the tree is built, tightly connected genes sit together on the same branches.
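To make the network-building step concrete, here is a minimal sketch in plain Python. It is not the WGCNA R package: the toy expression values, the function name `coexpression_network`, the hard cutoff of 0.8 and the power `beta=6` are all assumptions chosen for illustration. It shows both views of the same data: an unweighted edge list from a hard cutoff, and a weighted adjacency in which the absolute correlation is raised to a power.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_network(expr, hard_cutoff=0.8, beta=6):
    """Return (edges, weights) for every pair of genes.

    edges   : pairs whose |correlation| passes the hard cutoff
    weights : |correlation| ** beta for every pair (soft threshold)
    Both parameter values are illustrative assumptions.
    """
    edges, weights = [], {}
    for g, h in combinations(expr, 2):
        r = abs(pearson(expr[g], expr[h]))
        if r >= hard_cutoff:
            edges.append((g, h))
        weights[(g, h)] = r ** beta
    return edges, weights

# Toy data (made up): one list of sample values per gene.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8],  # rises together with geneA
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],  # unrelated pattern
}
edges, weights = coexpression_network(expr)
```

With this toy data only the geneA/geneB pair survives the hard cutoff, while the weighted view keeps all three pairs and pushes the weak ones towards zero.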
After getting the tree, you cut it at a certain distance; why and how you do that is beautifully explained in reference [2]. Cutting the tree gives you a number of modules in which the genes are highly connected and which may provide biological insight. These modules are called "gene modules".
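The clustering and tree-cutting steps can be sketched as follows. This is a simplified stand-in, assuming average-linkage clustering on the distance 1 - |correlation| and a fixed cut height of 0.5; the function name `cluster_modules`, the toy data and the cut height are hypothetical. Merging stops as soon as the closest pair of clusters is further apart than the cut height, which for average linkage amounts to cutting the finished tree at that height.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_modules(expr, cut_height=0.5):
    """Bottom-up average-linkage clustering, stopped at cut_height."""
    genes = list(expr)
    # Pairwise distance: 1 - |correlation| (an assumed, unsigned choice).
    dist = {
        frozenset((g, h)): 1 - abs(pearson(expr[g], expr[h]))
        for i, g in enumerate(genes) for h in genes[i + 1:]
    }
    clusters = [[g] for g in genes]  # start with one cluster per gene

    def linkage(a, b):
        # Average distance over all cross-cluster gene pairs.
        total = sum(dist[frozenset((g, h))] for g in a for h in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the two closest clusters.
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cut_height:
            break  # any further merge would cross the cut line
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy data (made up): g1/g2 move together, g3/g4 move together.
expr = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [2.1, 3.9, 6.2, 8.1, 9.8],
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],
    "g4": [10.0, 2.5, 8.0, 4.0, 6.1],
}
modules = cluster_modules(expr)
```

On this toy data the result is two modules, one holding g1 and g2 and the other holding g3 and g4. Raising the cut height merges modules; lowering it splits them.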
When you want to compare one gene module against another, it can be advantageous to use a single representative of the module rather than all of its genes. That is where Principal Component Analysis[3] comes in: it reduces your data meaningfully, and you take the first principal component as a summary of the module. This first principal component is called the "eigengene" in this context.
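As a rough illustration of what "first principal component of a module" means, the sketch below standardizes each gene across samples and then finds the dominant direction by power iteration. It assumes a tiny made-up module and uses plain Python instead of a linear algebra library; the function names are hypothetical. Starting the iteration from the summed standardized profile keeps the sign of the result aligned with the module's average expression, and the sketch assumes that sum is not all zeros.

```python
import math

def standardize(profile):
    """Centre a gene's profile and scale it to unit standard deviation."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in profile) / (n - 1))
    return [(v - mean) / sd for v in profile]

def eigengene(module_expr, iterations=200):
    """First principal component of a module: one value per sample."""
    z = [standardize(p) for p in module_expr]  # genes x samples
    n = len(z[0])
    # Sample-by-sample cross-product matrix Z^T Z.
    c = [[sum(row[a] * row[b] for row in z) for b in range(n)]
         for a in range(n)]
    # Start from the summed standardized profile of the module.
    v = [sum(row[a] for row in z) for a in range(n)]
    for _ in range(iterations):
        # Power iteration: multiply by the matrix, then renormalize.
        w = [sum(c[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Toy module (made up): three genes measured in five samples.
module = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.1, 3.9, 6.2, 8.1, 9.8],
    [0.9, 2.2, 2.8, 4.1, 5.2],
]
me = eigengene(module)
```

The result `me` has one value per sample and unit length. It follows the shared rising trend of the three genes without being identical to any one of them.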
You can find all of the necessary WGCNA terminology here[4]. A brilliant tutorial covering every step of a WGCNA analysis can be found here[5]. It is written by the authors of the WGCNA R package.
Second answer
Genomic data such as gene expression data and variant data have very high dimensionality, i.e. there are too many variables, and few data points. When you have a gene expression dataset, you may be interested in identifying groups of genes which show similar expression patterns.
One way to do this is WGCNA, or weighted gene co-expression network analysis. In simple terms, what you're trying to do is identify genes that show similar expression patterns across samples or conditions. These gene groups are called modules. WGCNA summarizes each module using a type of principal component analysis (PCA). Each module is represented by a single expression profile, the module 'eigengene', which is derived from the PCA. None of the actual genes in the module needs to have exactly this expression profile.
Since each eigengene represents a module, the distance of a gene from the eigengene, and therefore from the centre of the module, can be calculated. This tells us which module each gene lies in.
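One simple way to picture "distance from the eigengene" is the correlation between a gene's profile and each module eigengene, with the gene placed in the module it correlates with most strongly. The sketch below assumes exactly that. The eigengene values, the module labels "blue" and "brown" and the function names are made up for illustration and are not taken from a real analysis.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def membership(gene_profile, eigengenes):
    """Correlation of one gene with every module eigengene."""
    return {name: pearson(gene_profile, me)
            for name, me in eigengenes.items()}

def closest_module(gene_profile, eigengenes):
    """Module whose eigengene the gene tracks most strongly."""
    scores = membership(gene_profile, eigengenes)
    return max(scores, key=lambda name: abs(scores[name]))

# Hypothetical eigengenes for two modules, one value per sample.
eigengenes = {
    "blue": [-0.63, -0.32, 0.0, 0.32, 0.63],
    "brown": [0.63, -0.63, 0.32, -0.32, 0.0],
}

gene_x = [2.1, 3.9, 6.2, 8.1, 9.8]   # rises steadily across samples
gene_y = [10.0, 2.5, 8.0, 4.0, 6.1]  # zig-zags across samples

module_of_x = closest_module(gene_x, eigengenes)
module_of_y = closest_module(gene_y, eigengenes)
```

Here `gene_x` lands in the "blue" module and `gene_y` in the "brown" module. A correlation close to 1 means the gene sits near the centre of its module; values near 0 mean it is far from it.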