scenic运行记录1

作者: Seurat_Satija | 来源:发表于2021-04-16 09:48 被阅读0次
R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> # https://mp.weixin.qq.com/s/pN4qWdUszuGqr8nOJstn8w
> 
> if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
> BiocManager::version()
[1] ‘3.12’
> # If your bioconductor version is previous to 4.0, see the section bellow
> 
> ## Required
> if (!requireNamespace(c("AUCell", "RcisTarget"), quietly = TRUE)) BiocManager::install(c("AUCell", "RcisTarget"),ask = F,update = F)
> if (!requireNamespace('GENIE3', quietly = TRUE)) BiocManager::install(c("GENIE3"),ask = F,update = F)  # Optional. Can be replaced by GRNBoost
> 
> ## Optional (but highly recommended):
> # To score the network on cells (i.e. run AUCell):
> if (!requireNamespace(c("zoo", "mixtools", "rbokeh"), quietly = TRUE)) BiocManager::install(c("zoo", "mixtools", "rbokeh"),ask = F,update = F) 
> # For various visualizations and perform t-SNEs:
> if (!requireNamespace(c("DT", "NMF", "ComplexHeatmap", "R2HTML", "Rtsne"), quietly = TRUE)) BiocManager::install(c("DT", "NMF", "ComplexHeatmap", "R2HTML", "Rtsne"),ask = F,update = F)
> 
> # To support paralell execution (not available in Windows):
> # To support paralell execution (not available in Windows):
> #if (!requireNamespace(c("doMC", "doRNG"), quietly = TRUE))  BiocManager::install(c("doMC", "doRNG"),ask = F,update = F)
> 
> # To export/visualize in http://scope.aertslab.org
> if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
> if (!requireNamespace("SCopeLoomR", quietly = TRUE))  devtools::install_github("aertslab/SCopeLoomR", build_vignettes = TRUE)
> 
> if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
> if (!requireNamespace("SCENIC", quietly = TRUE))  devtools::install_github("aertslab/SCENIC") 
> packageVersion("SCENIC")
[1] ‘1.2.4’
> 
> library(Rtsne)
Warning message:
程辑包‘Rtsne’是用R版本4.0.5 来建造的 
> library(doSNOW)
载入需要的程辑包：foreach
载入需要的程辑包：iterators
载入需要的程辑包：snow
Warning messages:
1: 程辑包‘doSNOW’是用R版本4.0.5 来建造的 
2: 程辑包‘foreach’是用R版本4.0.3 来建造的 
3: 程辑包‘iterators’是用R版本4.0.3 来建造的 
4: 程辑包‘snow’是用R版本4.0.3 来建造的 
> #install.packages("doParallel") 
> library(doParallel)
载入需要的程辑包：parallel

载入程辑包：‘parallel’

The following objects are masked from ‘package:snow’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport,
    clusterMap, clusterSplit, makeCluster, parApply, parCapply, parLapply,
    parRapply, parSapply, splitIndices, stopCluster

Warning message:
程辑包‘doParallel’是用R版本4.0.3 来建造的 
> library(doMPI)
载入需要的程辑包：Rmpi
Error: package or namespace load failed for ‘Rmpi’:
 loadNamespace()里算'Rmpi'时.onLoad失败了，详细内容：
  调用: inDL(x, as.logical(local), as.logical(now), ...)
  错误: unable to load shared object 'C:/Users/Nano/Documents/R/win-library/4.0/Rmpi/libs/x64/Rmpi.dll':
  LoadLibrary failure:  找不到指定的模块。

Error: 无法载入程辑包‘Rmpi’
In addition: Warning messages:
1: 程辑包‘doMPI’是用R版本4.0.5 来建造的 
2: 程辑包‘Rmpi’是用R版本4.0.4 来建造的 
> # install.packages("doSNOW")
> # install.packages("doMPI")
> rm(list = ls())
> library(Seurat)
Registered S3 method overwritten by 'spatstat':
  method     from
  print.boxx cli 
Attaching SeuratObject
Warning message:
程辑包‘Seurat’是用R版本4.0.4 来建造的 
> # devtools::install_github('satijalab/seurat-data')
> library(SeuratData)
-- Installed datasets ------------------------------------------------ SeuratData v0.2.1 --
√ ifnb         3.1.0                         √ pbmcMultiome 0.1.0
√ panc8        3.0.2                         √ stxBrain     0.1.1
√ pbmc3k       3.1.4                         

------------------------------------------- Key -------------------------------------------
√ Dataset loaded successfully
> Dataset built with a newer version of Seurat than installed
(?) Unknown version of Seurat installed

There were 12 warnings (use warnings() to see them)
> AvailableData()
                                   Dataset Version
bmcite.SeuratData                   bmcite   0.3.0
cbmc.SeuratData                       cbmc   3.1.4
celegans.embryo.SeuratData celegans.embryo   0.1.0
hcabm40k.SeuratData               hcabm40k   3.0.0
ifnb.SeuratData                       ifnb   3.1.0
panc8.SeuratData                     panc8   3.0.2
pbmc3k.SeuratData                   pbmc3k   3.1.4
pbmcMultiome.SeuratData       pbmcMultiome   0.1.0
pbmcsca.SeuratData                 pbmcsca   3.0.0
ssHippo.SeuratData                 ssHippo   3.1.4
stxBrain.SeuratData               stxBrain   0.1.1
stxKidney.SeuratData             stxKidney   0.1.0
thp1.eccite.SeuratData         thp1.eccite   3.1.5
                                                                                  Summary
bmcite.SeuratData                                                   30k Bone Marrow Cells
cbmc.SeuratData                              scRNAseq and 13-antibody sequencing of CBMCs
celegans.embryo.SeuratData         6k C. elegans embryos from Packer and Zhu et al (2019)
hcabm40k.SeuratData        40,000 Cells From the Human Cell Atlas ICA Bone Marrow Dataset
ifnb.SeuratData                                         IFNB-Stimulated and Control PBMCs
panc8.SeuratData                         Eight Pancreas Datasets Across Five Technologies
pbmc3k.SeuratData                                              3k PBMCs from 10X Genomics
pbmcMultiome.SeuratData                                10X Genomics PBMC Multiome Dataset
pbmcsca.SeuratData                   Broad Institute PBMC Systematic Comparative Analysis
ssHippo.SeuratData                              Slide-seq v2 dataset of mouse hippocampus
stxBrain.SeuratData                               10X Genomics Visium Mouse Brain Dataset
stxKidney.SeuratData                             10X Genomics Visium Mouse Kidney Dataset
thp1.eccite.SeuratData                                                   ECCITE-seq THP-1
                           seurat    species            system ncells
bmcite.SeuratData           3.2.2      human       bone marrow  30672
cbmc.SeuratData             3.1.4      human CBMC (cord blood)   8617
celegans.embryo.SeuratData   <NA> C. elegans            embryo   6188
hcabm40k.SeuratData          <NA>      human       bone marrow  40000
ifnb.SeuratData              <NA>      human              PBMC  13999
panc8.SeuratData             <NA>      human Pancreatic Islets  14892
pbmc3k.SeuratData           3.1.4      human              PBMC   2700
pbmcMultiome.SeuratData     4.0.0      human              pbmc  11909
pbmcsca.SeuratData           <NA>      human              PBMC  31021
ssHippo.SeuratData           <NA>      mouse       hippocampus     NA
stxBrain.SeuratData          <NA>      mouse             brain  12167
stxKidney.SeuratData         <NA>      mouse            kidney   1438
thp1.eccite.SeuratData       <NA>      human              <NA>     NA
                                                                                      tech
bmcite.SeuratData                                                                     <NA>
cbmc.SeuratData                                                                   CITE-seq
celegans.embryo.SeuratData                                                            <NA>
hcabm40k.SeuratData                                                                 10x v2
ifnb.SeuratData                                                                     10x v1
panc8.SeuratData                          SMARTSeq2, Fluidigm C1, CelSeq, CelSeq2, inDrops
pbmc3k.SeuratData                                                                   10x v1
pbmcMultiome.SeuratData                                                               <NA>
pbmcsca.SeuratData         10x v2, 10x v3, SMARTSeq2, Seq-Well, inDrops, Drop-seq, CelSeq2
ssHippo.SeuratData                                                             slideseq v2
stxBrain.SeuratData                                                                 visium
stxKidney.SeuratData                                                                visium
thp1.eccite.SeuratData                                                                <NA>
                           default.dataset disk.datasets
bmcite.SeuratData                     <NA>          <NA>
cbmc.SeuratData                        raw     processed
celegans.embryo.SeuratData             raw          <NA>
hcabm40k.SeuratData                    raw          <NA>
ifnb.SeuratData                        raw          <NA>
panc8.SeuratData                       raw          <NA>
pbmc3k.SeuratData                      raw          <NA>
pbmcMultiome.SeuratData                 NA          <NA>
pbmcsca.SeuratData                     raw          <NA>
ssHippo.SeuratData                     raw          <NA>
stxBrain.SeuratData                     NA          <NA>
stxKidney.SeuratData                   raw          <NA>
thp1.eccite.SeuratData                <NA>          <NA>
                                                         other.datasets
bmcite.SeuratData                                                  <NA>
cbmc.SeuratData                                                    <NA>
celegans.embryo.SeuratData                                         <NA>
hcabm40k.SeuratData                                                <NA>
ifnb.SeuratData                                               processed
panc8.SeuratData                                                   <NA>
pbmc3k.SeuratData                                          pbmc3k.final
pbmcMultiome.SeuratData                             pbmc.rna, pbmc.atac
pbmcsca.SeuratData                                                 <NA>
ssHippo.SeuratData                                                 <NA>
stxBrain.SeuratData        posterior1, posterior2, anterior1, anterior2
stxKidney.SeuratData                                               <NA>
thp1.eccite.SeuratData                                             <NA>
                                                                                               notes
bmcite.SeuratData                                                                               <NA>
cbmc.SeuratData                                                                                 <NA>
celegans.embryo.SeuratData                                                                      <NA>
hcabm40k.SeuratData                                                                             <NA>
ifnb.SeuratData                                                                                 <NA>
panc8.SeuratData                                                                                <NA>
pbmc3k.SeuratData                                                                               <NA>
pbmcMultiome.SeuratData                                 One sample with two modalities, RNA and ATAC
pbmcsca.SeuratData                                                                     HCA benchmark
ssHippo.SeuratData                                                                              <NA>
stxBrain.SeuratData        One sample split across four datasets as paired anterior/posterior slices
stxKidney.SeuratData                                                                            <NA>
thp1.eccite.SeuratData                                                                          <NA>
                           Installed InstalledVersion
bmcite.SeuratData              FALSE             <NA>
cbmc.SeuratData                FALSE             <NA>
celegans.embryo.SeuratData     FALSE             <NA>
hcabm40k.SeuratData            FALSE             <NA>
ifnb.SeuratData                 TRUE            3.1.0
panc8.SeuratData                TRUE            3.0.2
pbmc3k.SeuratData               TRUE            3.1.4
pbmcMultiome.SeuratData         TRUE            0.1.0
pbmcsca.SeuratData             FALSE             <NA>
ssHippo.SeuratData             FALSE             <NA>
stxBrain.SeuratData             TRUE            0.1.1
stxKidney.SeuratData           FALSE             <NA>
thp1.eccite.SeuratData         FALSE             <NA>
There were 12 warnings (use warnings() to see them)
> # InstallData("pbmc3k") #  (89.4 MB) 
> data("pbmc3k")
> exprMat  <-  as.matrix(pbmc3k@assays$RNA@data)
> dim(exprMat)
[1] 13714  2700
> exprMat[1:4,1:4]
              AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG
AL627309.1                 0              0              0              0
AP006222.2                 0              0              0              0
RP11-206L10.2              0              0              0              0
RP11-206L10.9              0              0              0              0
> cellInfo <-  pbmc3k@meta.data[,c(4,2,3)]
> colnames(cellInfo)=c('CellType', 'nGene' ,'nUMI')
> head(cellInfo)
                   CellType nGene nUMI
AAACATACAACCAC Memory CD4 T  2419  779
AAACATTGAGCTAC            B  4903 1352
AAACATTGATCAGC Memory CD4 T  3147 1129
AAACCGTGCTTCCG   CD14+ Mono  2639  960
AAACCGTGTATGCG           NK   980  521
AAACGCACTGGTAC Memory CD4 T  2163  781
> table(cellInfo$CellType)

 Naive CD4 T Memory CD4 T   CD14+ Mono            B        CD8 T FCGR3A+ Mono           NK 
         697          483          480          344          271          162          155 
          DC     Platelet 
          32           14 
> ### Initialize settings
> library(SCENIC)
Warning message:
程辑包‘SCENIC’是用R版本4.0.3 来建造的 
> # 保证cisTarget_databases 文件夹下面有下载好2个1G的文件
> scenicOptions <- initializeScenic(org="hgnc", 
+                                   dbDir="cisTarget_databases", nCores=1)
Motif databases selected: 
  hg19-500bp-upstream-7species.mc9nr.feather 
  hg19-tss-centered-10kb-7species.mc9nr.feather
Registered S3 methods overwritten by 'arrow':
  method             from        
  as.character.Array DelayedArray
  as.integer.Array   DelayedArray
  as.vector.Array    DelayedArray
Warning messages:
1: In RcisTarget::importRankings(dbFilePath, columns = randomCol) :
  The following columns are missing from the database: 
2: In RcisTarget::importRankings(dbFilePath, columns = randomCol) :
  The following columns are missing from the database: 
3: package ‘RcisTarget’ was built under R version 4.0.3 
4: In RcisTarget::importRankings(dbFile, columns = rnktype) :
  The following columns are missing from the database: features
5: In RcisTarget::importRankings(dbFile, columns = rnktype) :
  The following columns are missing from the database: features
> saveRDS(scenicOptions, file="int/scenicOptions.Rds")
> ### Co-expression network
> genesKept <- geneFiltering(exprMat, scenicOptions)
Maximum value in the expression matrix: 419
Ratio of detected vs non-detected: 0.066
Number of counts (in the dataset units) per gene:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
     3.0     16.0     57.0    465.7    176.8 161685.0 
Number of cells in which each gene is detected:
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    3.0    15.0    54.0   166.5   157.0  2700.0 

Number of genes left after applying the following filters (sequential):
    5749    genes with counts per gene > 81
    5748    genes detected in more than 27 cells
    5318    genes available in RcisTarget database
Gene list saved in int/1.1_genesKept.Rds
> exprMat_filtered <- exprMat[genesKept, ]
> exprMat_filtered[1:4,1:4]
         AAACATACAACCAC AAACATTGAGCTAC AAACATTGATCAGC AAACCGTGCTTCCG
NOC2L                 0              0              0              0
HES4                  0              0              0              0
ISG15                 0              0              1              9
TNFRSF18              0              2              0              0
> dim(exprMat_filtered)
[1] 5318 2700
> runCorrelation(exprMat_filtered, scenicOptions)
> exprMat_filtered_log <- log2(exprMat_filtered+1) 
> runGenie3(exprMat_filtered_log, scenicOptions)
Using 480 TFs as potential regulators...
Running GENIE3 part 1
Running GENIE3 part 10
Running GENIE3 part 2
Running GENIE3 part 3
Running GENIE3 part 4
Running GENIE3 part 5
Running GENIE3 part 6
Running GENIE3 part 7
Running GENIE3 part 8
Running GENIE3 part 9
Finished running GENIE3.
Warning message:
In runGenie3(exprMat_filtered_log, scenicOptions) :
  Only 480 (26%) of the 1839 TFs in the database were found in the dataset. Do they use the same gene IDs?

> 
> ### Build and score the GRN
> exprMat_log <- log2(exprMat+1)
> scenicOptions@settings$dbs <- scenicOptions@settings$dbs["10kb"] # Toy run settings
> scenicOptions <- runSCENIC_1_coexNetwork2modules(scenicOptions)
00:50   Creating TF modules
Number of links between TFs and targets (weight>=0.001): 1773162
             [,1]
nTFs          480
nTargets     5318
nGeneSets    3832
nLinks    2515239
> scenicOptions <- runSCENIC_2_createRegulons(scenicOptions,
+                                             coexMethod=c("top5perTarget")) # Toy run settings
00:50   Step 2. Identifying regulons
tfModulesSummary:
              [,1]
top5perTarget   61
00:50   RcisTarget: Calculating AUC
Scoring database:  [Source file: hg19-tss-centered-10kb-7species.mc9nr.feather]
00:53   RcisTarget: Adding motif annotation
Using BiocParallel...
Number of motifs in the initial enrichment: 19044
Number of motifs annotated to the matching TF: 357
00:53   RcisTarget: Pruning targets
00:55   Number of motifs that support the regulons: 357
    Preview of motif enrichment saved as: output/Step2_MotifEnrichment_preview.html
Warning messages:
1: package ‘BiocParallel’ was built under R version 4.0.3 
2: In MulticoreParam(nCores) :
  MulticoreParam() not supported on Windows, use SnowParam()
3: package ‘AUCell’ was built under R version 4.0.3 
4: In RcisTarget::importRankings(dbFilePath, columns = randomCol) :
  The following columns are missing from the database: 
5: In RcisTarget::importRankings(dbFile, columns = rnktype) :
  The following columns are missing from the database: features
6: In importRankings(rnkName, columns = allGenes) :
  The following columns are missing from the database: 
7: In importRankings(dbNames[motifDbName], columns = allGenes) :
  The following columns are missing from the database: 
8: In .addSignificantGenes(resultsTable = resultsTable, geneSets = geneSets,  :
  The rakings provided only include a subset of genes/regions included in the whole database.
9: package ‘data.table’ was built under R version 4.0.3 
> library(doParallel)
> scenicOptions <- runSCENIC_3_scoreCells(scenicOptions, exprMat_log ) 
00:56   Step 3. Analyzing the network activity in each individual cell
    Number of regulons to evaluate on cells: 21
Biggest (non-extended) regulons: 
     SPI1 (539g)
     CEBPD (73g)
     CEBPB (47g)
     SPIB (38g)
     TBX21 (28g)
     IRF7 (15g)
     IRF8 (16g)
     STAT1 (14g)
Quantiles for the number of genes detected by cell: 
(Non-detected genes are shuffled at the end of the ranking. Keep it in mind when choosing the threshold for calculating the AUC).
    min      1%      5%     10%     50%    100% 
 212.00  325.00  434.95  539.90  816.00 3400.00 
00:57   Finished running AUCell.
00:57   Plotting heatmap...
00:57   Plotting t-SNEs...
> scenicOptions <- runSCENIC_4_aucell_binarize(scenicOptions)
Binary regulon activity: 12 TF regulons x 2216 cells.
(20 regulons including 'extended' versions)
11 regulons are active in more than 1% (22.16) cells.
> tsneAUC(scenicOptions, aucType="AUC") # choose settings
[1] "int/tSNE_AUC_50pcs_30perpl.Rds"
>
scenic运行记录1

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读