美文网首页R包单细胞测序转录组
CoRegNet: 共调控网络的重建和整合分析

CoRegNet: 共调控网络的重建和整合分析

作者: Hayley笔记 | 来源:发表于2021-11-16 20:13 被阅读0次

    1. 介绍

    CoRegNet包的目的是从转录组数据推断大型转录共调控网络,整合外部数据和基因调控信息去推断和分析转录组项目。这个包中独特的网络推断算法可以学习共调控网络。通过识别不同数据类型的基因共合作调节(co-operative regulators of genes)允许不同类型数据的整合到一个共表达分析中。

    #安装
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    
    BiocManager::install("CoRegNet")
    
    library(CoRegNet)
    data(CIT_BLCA_EXP,HumanTF,CIT_BLCA_Subgroup)
    dim(CIT_BLCA_EXP)
    # [1] 1000  183
    # 1000个基因x183个sample的基因表达矩阵
    length(intersect(rownames(CIT_BLCA_EXP),HumanTF))
    # [1] 67 ##1000个基因中有67个转录因子
    head(intersect(rownames(CIT_BLCA_EXP),HumanTF))
    # [1] "EEF1A1" "INSM1"  "HOXD1"  "IFI16"  "ID2"  "GRHL1" 
    

    2. Quick user guide(CoRegNet的主要功能)

    1. 从基因表达数据中重建大规模调控网络
    grn = hLICORN(CIT_BLCA_EXP, TFlist=HumanTF) 
    #输入的是基因表达矩阵和2020个转录因子的list(包内置的)
    
    2. 推断转录因子活性
    influence = regulatorInfluence(grn,CIT_BLCA_EXP)
    
    3. Retrieve inferred co-coregulators
    coregs= coregulators(grn)
    
    4. Analyze the network of cooperative regulators using an interactive display
    display(grn,CIT_BLCA_EXP,influence,clinicalData=CIT_BLCA_Subgroup)
    

    3详细功能和用法介绍

    3.1 Construction of a large-scale co-regulatory network from gene expression data

    CoRegNet包中的推理算法是LICORN 算法的hybrid版本。
    网络的重构包含四步:
    First, the gene expression data is discretized.
    Second, all the potential sets of cooperative regulators are extracted using the apriori algorithm of frequent itemset mining.
    Third, the best combinations of co-activators and co-inhibitors are identified for each gene.
    Finally, a continuous model of regulation using a linear regression method with interaction terms is used to score the local gene regulatory networks of each gene.

    使用CoRegNet做共表达至少需要输入两个dataset:

    1. 基因表达矩阵/数据框。要求行名为基因名,列名为样本名且列名要唯一。
    2. TF的基因list。CoRegNet包中内置了人类转录因子list,并且有official gene symbol (HumanTF)和EntrezGene IDs (HumanTF_entrezgene)这两种格式。使用的时候直接用data(HumanTF)data(HumanTF_entrezgene)调用即可。
    # An example of how to infer a co-regulation network
    grn =hLICORN(CIT_BLCA_EXP, TFlist=HumanTF)
    print(grn) #grn是根据输入的表达矩阵和转录因子预测出来的转录因子,靶基因和调控互作
    # [1] "63 Transcription Factors.  867 Target Genes.  8072 Regulatory interactions."
    # [1] "No added evidences."
    head(grn@GRN)
    #   Target coact  corep         Coef.Acts          Coef.Reps Coef.coActs Coef.coReps         R2      RMSE
    # 1 DIRAS2 NR1H4  IFI16 0.242753364850399  -0.39999852548476          NA          NA 0.24528282 0.9670125
    # 2 DIRAS2 NR1H4    MYC 0.225607457464437 -0.273761935521453          NA          NA 0.11392664 1.0525775
    # 3 DIRAS2 NR1H4  SNAI2 0.158558921637845 -0.349914647715346          NA          NA 0.23964161 0.9705609
    # 4 DIRAS2 NR1H4   IRX3  0.25463526841265 -0.180678166467546          NA          NA 0.12653573 1.0419588
    # 5 DIRAS2 NR1H4 SPOCD1  0.29515905804531 -0.212927437477536          NA          NA 0.14379615 1.0316241
    # 6 DIRAS2 NR1H4   <NA> 0.308379330917705               <NA>          NA          NA 0.08489369 1.0651439
    
    # 导出所有的Target
    Target=unique(grn@GRN$Target) #可以拿出来做GO富集等
    
    63 Transcription Factors x 867 Target Gene x 8072 Regulatory interactions

    在上面hLICORN()这一步是自动对输入的矩阵进行了离散化的,也就是说,实际上用于构建调控网络的input矩阵是离散化后的矩阵。下面是对矩阵进行手动离散化的两种方法

    #Default discretization. 
    #Uses the standard deviation of the whole dataset to set a threshold.
    disc1=discretizeExpressionData(CIT_BLCA_EXP)
    table(disc1)
    # disc1
    #     -1      0      1 
    #  23482 131019  28499 
    boxplot(as.matrix(CIT_BLCA_EXP)~disc1)
    
    #Discretization with a hard threshold
    disc2=discretizeExpressionData(CIT_BLCA_EXP, threshold=1)
    table(disc2)
    # disc2
    #    -1     0     1 
    # 45956 93063 43981
    boxplot(as.matrix(CIT_BLCA_EXP)~disc2)
    
    # more examples here
    help(discretizeExpressionData)
    

    使用矩阵前200个基因以方便运算(可以开4线程,速度更快)

    # running only on the 200 first gene in the matrix for fast analysis
    # Choosing to divide in 4 threads whenever possible
    options("mc.cores"=4)
    grn =hLICORN(head(CIT_BLCA_EXP,200), TFlist=HumanTF)
    print(grn)
    # [1] "18 Transcription Factors.  161 Target Genes.  642 Regulatory interactions."
    # [1] "No added evidences."
    options("mc.cores"=2)
    grn =hLICORN(head(CIT_BLCA_EXP,200), TFlist=HumanTF)
    print(grn)
    # [1] "18 Transcription Factors.  161 Target Genes.  642 Regulatory interactions."
    # [1] "No added evidences."
    
    3.2 Refining the inferred regulatory networks

    第二步是使用external knowledge来丰富前面得到的inferred regulatory network。有两种类型的外部数据集可以使用:1. TF对基因的调控数据比如Transcription Factor Binding Sites (TFBS)ChIP数据;2. 共调控信息比如蛋白-蛋白互作,这类数据可以提供TFs的合作信息。
    ❗️对于前面两种类型的数据,分别使用addEvidencesaddCooperativeEvidences函数来添加。

    # 引入四类外部数据
    # ChIP data from the CHEA database
    data(CHEA_sub)
    #ChIP data from the ENCODE project
    data(ENCODE_sub)
    # Protein protein interactions between TF from the HIPPIE database
    data(HIPPIE_sub)
    # Protein protein interactions between TF from the STRING database
    data(STRING_sub)
    
    enrichedGRN = addEvidences(grn,CHEA_sub,ENCODE_sub)
    # CHEA_sub was integrated into the network.
    # ENCODE_sub was integrated into the network.
    enrichedGRN = addCooperativeEvidences(enrichedGRN,HIPPIE_sub,STRING_sub)
    # [1] "HIPPIE_sub"
    # [1] "No evidence from HIPPIE_sub were found in the inferred network."
    # [1] "STRING_sub"
    # [1] "STRING_sub was integrated into the network."
    

    得到的enrichedGRN就是引入external knowledge的grn

    print(enrichedGRN)
    # [1] "63 Transcription Factors.  867 Target Genes.  8072 Regulatory interactions."
    #            commonGene commonReg evidences commonEvidences       enrichment              p.value
    # CHEA_sub          567        12      1050             218 1.13521053808365   0.0701008769458297
    # ENCODE_sub        724        10      2749             382 1.18076947263415   0.0114797404982399
    # STRING_sub       <NA>        39       137              36 3.16656003302599 2.24846204249256e-06
    

    导入外部数据之后就可以对 共调控网络进行refine,可以使用默认的refine方法。

    # Default unsupervised refinement method
    refinedGRN = refine(enrichedGRN)
    print(refinedGRN)
    # [1] "57 Transcription Factors.  867 Target Genes.  1854 Regulatory interactions."
    #            commonGene commonReg evidences commonEvidences       enrichment              p.value
    # CHEA_sub          567        12      1050             176 3.55976690089225  3.1172872878133e-32
    # ENCODE_sub        724        10      2749             296 4.71677404968161 3.01483641042892e-48
    # STRING_sub       <NA>        21        83              17              Inf 4.69277379589639e-08
    

    可以看到refine之后,Transcription Factors和1854 Regulatory interactions的数目都发生了变化

    也可以使用指定的数据集来进行refine

    # Example of supervised refinement with the CHEA chip data
    refinedGRN = refine(enrichedGRN, integration="supervised",referenceEvidence="CHEA_sub")
    print(refinedGRN)
    
    3.3 Identification of active transcriptional programs

    CoRegNet包的目的是从给定样本或样本集中识别active cooperative TF的集合。Transcriptional activity的衡量方法是估计给定样本中特定转录调节因子的活化程度。这个衡量方法是比较transcriptional network中活化的和抑制的基因表达。它是基于一个样本中这两个基因sets间的divergence的测量(Welch’s t statistics)。从根本上来说,如果可被某个TF激活的基因出现高表达的同时可被抑制的基因出现低表达,这个TF就具有high influence。
    使用一个coregnet object的共表达网络和一个基因表达矩阵,不管这个矩阵是用来推断共表达网络的矩阵还是别的矩阵。output的是一个列为样本(和基因表达矩阵一致),行为TF的矩阵。每一行的TF都在转录网络中拥有足够数量的targets(至少10 activated和10 repressed)。

    CITinf =regulatorInfluence(grn,CIT_BLCA_EXP)
    CITinf[1:6,1:6]
    #             CIT100    CIT102     CIT103    CIT105    CIT106    CIT107
    # EMX2    -0.6705802 -4.134082   3.433414  8.512250 -2.129228  2.170088
    # EGR1     6.2721207 14.563272  -6.805588 -4.347788 11.962142 -3.975751
    # TGFB1I1  7.0348603 19.819534  -4.918518 -5.555589 12.702996 -3.141942
    # EGR2     4.5810891 14.504279  -8.664420 -7.408075 11.432331 -9.668110
    # PLAGL1   5.7439075 18.780449 -10.075245 -7.001515 13.497227 -7.333840
    # SOX9     6.1841193 10.627645  -4.749642 -4.485600 12.322927 -1.968706
    str(CITinf)
    #  num [1:33, 1:183] -0.671 6.272 7.035 4.581 5.744 ...
    #  - attr(*, "dimnames")=List of 2
    #   ..$ : chr [1:33] "EMX2" "EGR1" "TGFB1I1" "EGR2" ...
    #   ..$ : chr [1:183] "CIT100" "CIT102" "CIT103" "CIT105" ...
    

    这个新的transcriptional influence数据集可以被视为整个转录数据集的condensed版本。

    # Coregulators of a hLICORN  inferred network
    head(coregulators(grn))
    #    Reg1    Reg2     Support   NA nGRN    fisherTest adjustedPvalue
    # 1 PPARG   VGLL1 0.017881213 1247 1247  1.436958e-68   1.638132e-67
    # 2 GATA3   PPARG 0.016891795 1178 1178  2.861003e-87   1.087181e-85
    # 3 FOXF1 TGFB1I1 0.013536379  944  944 2.838172e-101   3.235516e-99
    # 4  EGR1    EGR2 0.012733373  888  888  4.332173e-72   6.173346e-71
    # 5  MSX2   PPARG 0.012288853  857  857  2.767895e-77   6.310800e-76
    # 6 GATA3    MSX2 0.009922854  692  692  9.068728e-80   2.584587e-78
    

    也可以引入分组数据和copy number数据等

    data(CIT_BLCA_CNV)
    data(CIT_BLCA_Subgroup)
    
    display(grn,expressionData=CIT_BLCA_EXP,TFA=CITinf)
    
    # Visualizing additional regulatory or co-regulatory evidences in the network
    display(enrichedGRN,expressionData=CIT_BLCA_EXP,TFA=CITinf)
    
    # Visualizing sample classification using a named factor
    display(grn,expressionData=CIT_BLCA_EXP,TFA=CITinf,clinicalData=CIT_BLCA_Subgroup)
    
    # Visualizing copy number alteration of regulators
    data(CIT_BLCA_CNV)
    display(grn,expressionData=CIT_BLCA_EXP,TFA=CITinf,clinicalData=CIT_BLCA_Subgroup,alterationData=CIT_BLCA_CNV)
    

    参考:
    http://www.bioconductor.org/packages/release/bioc/vignettes/CoRegNet/inst/doc/CoRegNet.html
    差异共表达网络

    相关文章

      网友评论

        本文标题:CoRegNet: 共调控网络的重建和整合分析

        本文链接:https://www.haomeiwen.com/subject/voihtrtx.html