美文网首页RNA-seq
一个R函数完成所有的富集分析

一个R函数完成所有的富集分析

作者: Z_bioinfo | 来源:发表于2022-07-31 17:20 被阅读0次

    仅仅一个函数,就可以完成差异表达基因的GO,GSEA,KEGG 分析与可视化,下面来看一看

    01 需要的R包

    library(AnnotationDbi)
    library(org.Hs.eg.db)  基因注释需要
    library(clusterProfiler)
    library(stringr)
    library(enrichplot)#画图需要
    library(stats)
    library(dplyr)
    library(msigdbr)
    

    02 定义R函数

    首先定义一个R函数,目的是:读入数据;进行富集分析。

    #定义R函数
    allEnrich <- function(df) {
    df = read.csv(df, stringsAsFactors = F,
                    header = T)
    names(df) = c("gene", "avg_logFC")
    #msigdbr 包提供多个物种的MSigDB数据
    b = msigdbr(species = "Homo sapiens", category = "C2") %>%
          select(gs_name, entrez_gene)
    a1 = df$avg_logFC
    a2 = easyConvert(
    species = "HUMAN",
    queryList = df$gene,
    queryType = "SYMBOL"
        ) 
    names(a1) = a2$ENTREZID
    b1 = a1[order(a1, decreasing = T)]
    GSEA分析
    gsearesults = GSEA(b1, TERM2GENE = b)
    #进行CC分析
    ego1 <- enrichGO(
    gene         = df$gene,
    OrgDb         = org.Hs.eg.db,
    keyType       = 'SYMBOL',
    ont           = "CC",
    pAdjustMethod = "BH",
    pvalueCutoff  = 0.01,
    qvalueCutoff  = 0.05
    )
    #绘图
    p1 = dotplot(ego1, showCategory = 15) 
    ggplot2::ggsave("CC.pdf", p1, width = 7, height = 8)
    #输出结果,保存到本地
    write.csv(ego1@result, "Cell_component.csv")
    =========================================
    #进行BP分析
    ego2 <- enrichGO(
    gene         = df$gene,
    OrgDb         = org.Hs.eg.db,
    keyType       = 'SYMBOL',
    ont           = "BP",
    pAdjustMethod = "BH",
    pvalueCutoff  = 0.01,
    qvalueCutoff  = 0.05
    )
    #绘图
    p2 = dotplot(ego2, showCategory = 15) 
    ggplot2::ggsave("BP.pdf", p2, width = 7, height = 8)
    #输出结果,保存到本地
    write.csv(ego2@result, "Biological_process.csv")
    =======================================
    #进行MF分析
    ego3 <- enrichGO(
    gene         = df$gene,
    OrgDb         = org.Hs.eg.db,
    keyType       = 'SYMBOL',
    ont           = "MF",
    pAdjustMethod = "BH",
    pvalueCutoff  = 0.01,
    qvalueCutoff  = 0.05
    )
    #绘图
    p3 = dotplot(ego3, showCategory = 15) 
    ggplot2::ggsave("MF.pdf", p3, width = 7, height = 8)
    #输出结果,保存到本地
    write.csv(ego3@result, "Molecular_function.csv")
    =========================================
    GSEA可视化
    p4 = gsearesults@result %>%
    mutate(Description = gsub("_", " ", Description)) %>%
    mutate(Description = reorder(Description, setSize)) %>% head(15) %>%
    ggplot(aes(
    x = enrichmentScore,
    y = Description,
    size = setSize,
    color = pvalue
    )) +
    geom_point() +
    scale_size(range = c(1, 5), name = "Size") +
    scale_color_gradient(low = "blue", high = "red") +
    xlab("GeneRatio") + ylab("") +
    theme(axis.text.y = element_text(size = 10, color = "black")) 
    
    ggplot2::ggsave("GSEA.pdf", p4, width = 7, height = 8)
    write.csv(gsearesults@result, "GSEA_Results.csv")
    ==================================
    KEGG分析
    ego4 = enrichKEGG(
    gene = names(a1),
    organism = "hsa",
    keyType = "kegg",
    pvalueCutoff = 0.05,
    pAdjustMethod = "BH",
    minGSSize = 10,
    maxGSSize = 500,
    qvalueCutoff = 0.2,
    use_internal_data = FALSE
    )
    p5 = dotplot(ego4, showCategory = 15) 
    ggplot2::ggsave("KEGG.pdf", p5, width = 7, height = 8)
    write.csv(ego4@result, "KEGG_pathway.csv")
        cat("Successfully~") 
    }
    

    03 读入数据分析

    使用定义好的R函数:allEnrich,读入数据进行分析,数据格式必须为2列:

    第一列 geneSymbol

    第二类 Log2FoldChange

    #给出包含geneSymbol,Log2FoldChange所在的csv文件
     allEnrich("significant_degs.csv")
    
     preparing geneSet collections...
    GSEA analysis...
    leading edge analysis...
    done...
    wrong orderBy parameter; set to default `orderBy = "x"`
    Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
    wrong orderBy parameter; set to default `orderBy = "x"`
    Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
    wrong orderBy parameter; set to default `orderBy = "x"`
    Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
    Reading KEGG annotation online:
    
    Reading KEGG annotation online:
    
    wrong orderBy parameter; set to default `orderBy = "x"`
    Scale for 'y' is already present. Adding another scale for 'y', which will replace the existing scale.
    Successfully~
    

    成功完成分析后,这是富集分析输出的所有结果:


    image.png

    查看结果文件:


    image.png

    相关文章

      网友评论

        本文标题:一个R函数完成所有的富集分析

        本文链接:https://www.haomeiwen.com/subject/bsikwrtx.html