GSEA富集分析

作者: MJades | 来源:发表于2020-05-05 17:15 被阅读0次

专题：富集分析
GASE相关知识（Linux）
GSEA基因集富集分析
富集分析
单细胞之富集分析-5：一张图展示多组富集分析结果
单细胞之富集分析-6：PROGENy
单细胞之富集分析-7：Pseudobulk differenti
2022-04-17
GSEA笔记
GSEA富集分析流程及解释（R代码）【待完善】

GSEA is designed to analyze ranked lists of all available genes and does not require a threshold.

Y叔的clusterProfiler做GSEA分析超级方便～～

参数解读：

GSEA searches for pathways whose genes are enriched at the top or bottom of the ranked gene list, more so than expected by chance alone.

To calculate an enrichment score (ES) for a pathway, GSEA progressively examines genes from the top to the bottom of the ranked list, increasing the ES if a gene is part of the pathway and decreasing the score otherwise.

These running sum values are weighted, so that enrichment in the very top- (and bottom-) ranking genes is amplified, whereas enrichment in genes with more moderate ranks are not amplified.

The ES score is calculated as the maximum value of the running sum and normalized relative to pathway size, resulting in a normalized enrichment score (NES) that reflects the enrichment of the pathway in the list. Positive and negative NES values represent enrichment at the top and bottom of the list, respectively.

Finally, a permutation-based P value is computed and corrected for multiple testing to produce a permutation-based false-discovery rate (FDR) Q value that ranges from 0 (highly significant) to 1 (not significant)

Resulting pathways are selected using the FDR Q value threshold (e.g., Q < 0.05) and ranked using NES. In addition, the ‘leading edge’ aspect of the GSEA analysis identifies specific genes that most strongly contribute to the detected enrichment signal of a pathway.

从以上参数解读了解到，结果可通过 FDR q-value筛选，利用NES进行排序解读。

# Select gene set form msigdf
msigdf.human <-msigdf::msigdf.human 
GeneS<-msigdf.human[grepl("LYSOSOM",msigdf.human$geneset),]%>% select(geneset,   symbol) %>% as.data.frame

# gene.rank is a data.frame, which contains Gene and FC (Fold-Change)
gene.rank
gene.sort <- arrange(gene.rank, desc(logFC)) # 降序排列
geneList1<-gene.sort$FC
names(geneList1)<-gene.sort$Gene # 命名

# GSEA
gsea<- GSEA(geneList, TERM2GENE = GeneS, verbose=FALSE, pvalueCutoff = 0.5)
# Note: 1. 结果是不断变的，原因是by="fgsea"，可以将nPerm设置的大一些，默认是1000.
#       2. 原始文献中的选择：p value < 0.05， FDR < 0.25,可结合自己需求调整。

# 可视化第一个结果，gseaplot2中by函数可以选择部分结果展示，如"preranked".
gseaplot2(gsea,geneSetID=1,title=gsea$Description[1],pvalue_table=T)
# geneSetID多选择几个，即可实现多个gene set的可视化。

修图之后