美文网首页
SCENIC/pySCENIC分析补充+小鼠数据示例2022-1

SCENIC/pySCENIC分析补充+小鼠数据示例2022-1

作者: 黄甫一 | 来源:发表于2022-12-13 23:18 被阅读0次

    适用背景

    之前写了两篇博客(四步完成单细胞数据调控网络流程分析-SCENIC/pySCENIC-2022-09-06SCENIC/pySCENIC结果可视化 2022-11-08)介绍SCENIC/pySCENIC的使用,最近在使用的时候遇到一些问题,因此这篇文章作为补充,如果看不懂本篇可以查看前两篇博客。

    补充内容主要有以下几点:

    • 1、小鼠的数据库构建
    • 2、从四步流程缩减到三步
    • 3、环境构建遇到的一些errors
    • 4、SCENIC版本的bugs

    小鼠的数据库构建

    之前的博客构建的是人的数据库,但博主最近分析需要用到小鼠的,因此需要构建一下新的数据库。正如之前博客写到的,其实构建这个数据库只需要替换3个文件,转录因子(TF)列表文件,feather文件已经tbl文件,建议使用最新的v10版本,当然也可以根据网站提示挑选适合自己数据的文件。我下载的文件如下:


    从四步流程缩减到三步

    之前生成pyscenic需要的loom文件要先用R导出矩阵,再用Python生成loom文件,实在有点繁琐,那可不可以直接从Seurat 对象生成loom文件呢?答案当然是可以,使用SCopeLoomR即可。只需要使用build_loom函数,在get_count_from_seurat.R文件后面添加几行代码即可。
    get_count_from_seurat.R文件内容如下:

    library(optparse)
    op_list <- list(
    make_option(c("-i", "--inrds"), type = "character", default = NULL, action = "store", help = "The input of Seurat RDS",metavar="rds"),
    make_option(c("-d", "--ident"), type = "character", default = NULL, action = "store", help = "The sample Ident of Seurat object",metavar="idents"),
    make_option(c("-s", "--size"),  type = "integer", default = NULL, action = "store", help = "The sample size of Seurat object",metavar="size"),
    make_option(c("-l", "--label"), type = "character", default = "out", action = "store", help = "The label of output file",metavar="label"),
    make_option(c("-a", "--assay"), type = "character", default = "Spatial", action = "store", help = "The assay of input file",metavar="assay")
    )
    parser <- OptionParser(option_list = op_list)
    opt = parse_args(parser)
    
    assay <- opt$assay
    
    library(Seurat)
    obj <- readRDS(opt$inrds)
    if (!is.null(opt$ident)) {
    Idents(obj) <-  opt$ident
    size=opt$size
    if (!is.null(size)) {
    obj <- subset(x = obj, downsample = opt$size)
    }
    saveRDS(obj,"subset.rds")
    }
    if (is.null(opt$label)) {
    label1 <- 'out'
    }else{
    label1 <- opt$label
    }
    
    library(SCopeLoomR)
    outloom <- paste0(label1,".loom")
    build_loom(file.name = outloom,dgem = obj@assays[[assay]]@counts)
    write.table(obj@meta.data,'metadata_subset.xls',sep='\t',quote=F)
    

    使用方法:
    -i,输入Seurat对象的RDS文件
    -d,随机取样分组的列名,例如groups,如果不赋值则表示不随机取样,使用全部细胞
    -s,随机取样的大小,例如20,因为这里用的是pbmc_small,所以设置小一点,实际情况可能需要设置大一点
    -l,输出文件的标签,默认为out,
    -a,使用的Seurat对象assay,默认为Spatial。

    Rscript get_count_from_seurat.R -i test.rds -d groups -s 20 -l out -a Spatial
    

    需要注意的是,新脚本添加了assay这个参数,默认为Spatial,因为最近在用跑空间组数据,如果要应用到单细胞需要改为RNA。


    环境构建遇到的一些errors

    因为最近集群在改系统盘名称还有不断删文件导致我用别人的软件、数据库文件和库路径时总是报错,因为自己开始来重新构建SCENIC/pyscenic运行需要的环境,期间遇到不少问题,这里汇总一下。

    sh Miniconda3-latest-Linux-x86_64.sh
    

    创建一个环境,安装Python和R

    conda create -n pyscenic
    conda activate pyscenic
    conda install r-base=4.2.1
    conda install python=3.9.13
    pip install pyscenic -i https://mirrors.aliyun.com/pypi/simple/
    pip install pandas -i https://mirrors.aliyun.com/pypi/simple/
    pip install scanpy -i https://mirrors.aliyun.com/pypi/simple/
    pip install opencv-python -i https://mirrors.aliyun.com/pypi/simple/
    pip install loompy -i https://mirrors.aliyun.com/pypi/simple/
    

    这样就配置好基本环境了,接下来是安装R包,确保以下需要library的包就可以。

    library(Seurat)
    library(SCopeLoomR)
    library(AUCell)
    library(SCENIC)
    library(dplyr)
    library(KernSmooth)
    library(RColorBrewer)
    library(plotly)
    library(BiocParallel)
    library(optparse)
    

    可以先用BiocManager::install() 安装,如果安装不了,谷歌搜一下conda install [package name](百度搜不出来,不推荐)就能找到conda 安装这个包的官网,虽然conda很万能,但是慢啊,BiocManager::install会快很多。当然还有一些包是装不了,需要用devtools安装,例如SCopeLoomR,那哪些包要用devtools装呢?一般BiocManager::install和conda装不了的包就极大概率要用devtools装了,所以直接搜这个包的GitHub或官网,例如谷歌搜SCopeLoomR GitHub,GitHub里面说明需要这样安装:

    devtools::install_github("aertslab/SCopeLoomR")
    
    • 谈谈SCopeLoomR安装过程遇到的一些坑
      本来想偷懒直接用别人的库路径,library的时候结果报错HDF5 library version mismatched error,R还abort突然中断了,就是说我的HDF5库版本跟我这个包安装时的版本不匹配,只能重装。用devtools重装又各种报错,缺失某个包,缺失某个动态库……反正各种缺失,没办法,只能缺什么装什么了,而且装devtools的时候就很容易报错,建议先用conda安装以下包和动态库再安装SCopeLoomR:
    conda install -c anaconda git
    conda install -c anaconda hdf5
    conda install -c conda-forge libgit2
    conda install -c conda-forge r-ragg
    conda install -c conda-forge r-xml
    

    再在R里运行:

    devtools::install_github("aertslab/SCopeLoomR")
    

    SCENIC版本的bugs

    在进行可视化的时候需要用到SCENIC的R包,但是我发现调函数plotRSS的过滤参数时,不管怎么调,出来的图都是一样的。因为看一下这个函数的源代码,好家伙,发现它莫名奇妙过滤了1.5的值,而且这个值是固定在函数里无法更改。

    • 查看SCENIC版本1.2.4
    > packageVersion("SCENIC")
    [1] ‘1.2.4’
    

    函数plotRSS源代码

    > plotRSS
    function (rss, labelsToDiscard = NULL, zThreshold = 1, cluster_columns = FALSE,
        order_rows = TRUE, trh = 0.01, varName = "cellType", col.low = "grey90",
        col.mid = "darkolivegreen3", col.high = "darkgreen", revCol = FALSE)
    {
        varSize = "RSS"
        varCol = "Z"
        if (revCol) {
            varSize = "Z"
            varCol = "RSS"
        }
        rssNorm <- scale(rss)
        rssNorm[rssNorm < zThreshold] <- 0
        rssNorm <- rssNorm[, which(!colnames(rssNorm) %in% labelsToDiscard)]
        tmp <- .plotRSS_heatmap(rssNorm, trh = trh, cluster_columns = cluster_columns,
            order_rows = order_rows)
        rowOrder <- rev(tmp@row_names_param$labels)
        rss.df <- reshape2::melt(rss)
        head(rss.df)
        colnames(rss.df) <- c("Topic", varName, "RSS")
        rssNorm.df <- reshape2::melt(rssNorm)
        colnames(rssNorm.df) <- c("Topic", varName, "Z")
        rss.df <- merge(rss.df, rssNorm.df)
        rss.df <- rss.df[which(rss.df$Z >= 1.5), ]
        rss.df <- rss.df[which(!rss.df[, varName] %in% labelsToDiscard),
            ]
        rss.df[, "Topic"] <- factor(rss.df[, "Topic"], levels = rowOrder)
        p <- dotHeatmap(rss.df, var.x = varName, var.y = "Topic",
            var.size = varSize, min.size = 0.5, max.size = 5, var.col = varCol,
            col.low = col.low, col.mid = col.mid, col.high = col.high)
        invisible(list(plot = p, df = rss.df, rowOrder = rowOrder))
    }
    <bytecode: 0x55b0ea0f86b8>
    <environment: namespace:SCENIC>
    

    于是,我把这个包更新了一些再看函数代码,发现去掉了过滤1.5值的操作,果然是旧版本的一个bug啊。

    • 查看SCENIC版本1.3.1
    > packageVersion("SCENIC")
    [1] ‘1.3.1’
    

    函数plotRSS源代码

    > plotRSS
    function (rss, labelsToDiscard = NULL, zThreshold = 1, cluster_columns = FALSE,
        order_rows = TRUE, thr = 0.01, varName = "cellType", col.low = "grey90",
        col.mid = "darkolivegreen3", col.high = "darkgreen", revCol = FALSE,
        verbose = TRUE)
    {
        varSize = "RSS"
        varCol = "Z"
        if (revCol) {
            varSize = "Z"
            varCol = "RSS"
        }
        rssNorm <- scale(rss)
        rssNorm <- rssNorm[, which(!colnames(rssNorm) %in% labelsToDiscard)]
        rssNorm[rssNorm < 0] <- 0
        rssSubset <- rssNorm
        if (!is.null(zThreshold))
            rssSubset[rssSubset < zThreshold] <- 0
        tmp <- .plotRSS_heatmap(rssSubset, thr = thr, cluster_columns = cluster_columns,
            order_rows = order_rows, verbose = verbose)
        rowOrder <- rev(tmp@row_names_param$labels)
        rm(tmp)
        rss.df <- reshape2::melt(rss)
        head(rss.df)
        colnames(rss.df) <- c("Topic", varName, "RSS")
        rssNorm.df <- reshape2::melt(rssNorm)
        colnames(rssNorm.df) <- c("Topic", varName, "Z")
        rss.df <- base::merge(rss.df, rssNorm.df)
        rss.df <- rss.df[which(!rss.df[, varName] %in% labelsToDiscard),
            ]
        if (nrow(rss.df) < 2)
            stop("Insufficient rows left to plot RSS.")
        rss.df <- rss.df[which(rss.df$Topic %in% rowOrder), ]
        rss.df[, "Topic"] <- factor(rss.df[, "Topic"], levels = rowOrder)
        p <- dotHeatmap(rss.df, var.x = varName, var.y = "Topic",
            var.size = varSize, min.size = 0.5, max.size = 5, var.col = varCol,
            col.low = col.low, col.mid = col.mid, col.high = col.high)
        invisible(list(plot = p, df = rss.df, rowOrder = rowOrder))
    }
    <bytecode: 0x56295d178670>
    <environment: namespace:SCENIC>
    

    最后脚本汇总
    三步完成单细胞数据调控网络流程分析和可视化

    需要四个脚本文件:create_loom_input.R,pyscenic_from_loom.sh,calcRSS_by_scenic.R和function_pyscenic_visualize.R,我把四个脚本都放在目录/00.scripts/下,然后只需要输入Seurat对象的rds路径inrds(例如/test.rds)和细胞分组列ingroup(例如sub),
    运行示例:

    source ~/Miniconda/md/bin/activate pyscenic
    
    inscp='~/00.scripts/'
    inrds='~/test.rds'
    ingroup='sub'
    
    #Step 1
    Rscript ${inscp}/create_loom_input.R -i ${inrds} -d ${ingroup} -l out -a Spatial
    #Step 2
    sh ${inscp}/pyscenic_from_loom.sh -i out.loom -n 20
    #Step 3
    Rscript ${inscp}/calcRSS_by_scenic.R -l aucell.loom -m metadata_subset.xls -c ${ingroup}
    

    运行上述脚本即可,有关参数请查看这篇博客四步完成单细胞数据调控网络流程分析-SCENIC/pySCENIC-2022-09-06,这里不再赘述。
    请注意,这里的脚本是适用于小鼠的,如果要用人的请替换数据库三个文件。


    附上需要用到的4个文件
    • create_loom_input.R
    library(optparse)
    op_list <- list(
    make_option(c("-i", "--inrds"), type = "character", default = NULL, action = "store", help = "The input of Seurat RDS",metavar="rds"),
    make_option(c("-d", "--ident"), type = "character", default = NULL, action = "store", help = "The sample Ident of Seurat object",metavar="idents"),
    make_option(c("-s", "--size"),  type = "integer", default = NULL, action = "store", help = "The sample size of Seurat object",metavar="size"),
    make_option(c("-l", "--label"), type = "character", default = "out", action = "store", help = "The label of output file",metavar="label"),
    make_option(c("-a", "--assay"), type = "character", default = "Spatial", action = "store", help = "The assay of input file",metavar="assay")
    )
    parser <- OptionParser(option_list = op_list)
    opt = parse_args(parser)
    
    assay <- opt$assay
    
    library(Seurat)
    obj <- readRDS(opt$inrds)
    if (!is.null(opt$ident)) {
    Idents(obj) <-  opt$ident
    size=opt$size
    if (!is.null(size)) {
    obj <- subset(x = obj, downsample = opt$size)
    }
    saveRDS(obj,"subset.rds")
    }
    if (is.null(opt$label)) {
    label1 <- 'out'
    }else{
    label1 <- opt$label
    }
    
    library(SCopeLoomR)
    outloom <- paste0(label1,".loom")
    build_loom(file.name = outloom,dgem = obj@assays[[assay]]@counts)
    write.table(obj@meta.data,'metadata_subset.xls',sep='\t',quote=F)
    
    • pyscenic_from_loom.sh
    input_loom=out.loom
    n_workers=20
    #help function
    function usage() {
    echo -e "OPTIONS:\n-i|--input_loom:\t input loom file"
    echo -e "-n|--n_workers:\t working core number"
    echo -e "-h|--help:\t Usage information"
    exit 1
    }
    #get value
    while getopts :i:n:h opt
    do
        case "$opt" in
            i) input_loom="$OPTARG" ;;
            n) n_workers="$OPTARG" ;;
            h) usage ;;
            :) echo "This option -$OPTARG requires an argument."
               exit 1 ;;
            ?) echo "-$OPTARG is not an option"
               exit 2 ;;
        esac
    done
    #在这里改数据库文件
    database='~/01.mouse/'
    tfs=${database}/allTFs_mm.txt
    feather=${database}/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather
    tbl=${database}/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl.txt
    pyscenic=pyscenic
    
    # grn
    $pyscenic grn \
    --num_workers $n_workers \
    --output grn.tsv \
    --method grnboost2 \
    $input_loom  $tfs
    
    # cistarget
    $pyscenic ctx \
    grn.tsv $feather \
    --annotations_fname $tbl \
    --expression_mtx_fname $input_loom \
    --mode "dask_multiprocessing" \
    --output ctx.csv \
    --num_workers $n_workers   \
    --mask_dropouts
    
    # AUCell
    $pyscenic aucell \
    $input_loom \
    ctx.csv \
    --output aucell.loom \
    --num_workers $n_workers
    
    • calcRSS_by_scenic.R
    library(optparse)
    op_list <- list(
    make_option(c("-l", "--input_loom"), type = "character", default = NULL, action = "store", help = "The input of aucell loom file",metavar="rds"),
    make_option(c("-m", "--input_meta"), type = "character", default = NULL, action = "store", help = "The metadata of Seurat object",metavar="idents"),
    make_option(c("-a", "--assay"), type = "character", default = 'Spatial', action = "store", help = "The assay of Seurat object",metavar="assay"),
    make_option(c("-c", "--celltype"), type = "character", default = NULL, action = "store", help = "The colname of metadata to calculate RSS",metavar="label")
    )
    parser <- OptionParser(option_list = op_list)
    opt = parse_args(parser)
    
    library(Seurat)
    library(SCopeLoomR)
    library(AUCell)
    library(SCENIC)
    library(dplyr)
    library(KernSmooth)
    library(RColorBrewer)
    library(plotly)
    library(BiocParallel)
    
    celltype <- opt$celltype
    assay <- opt$assay
    loom <- open_loom(opt$input_loom)
    
    regulons_incidMat <- get_regulons(loom, column.attr.name="Regulons")
    regulons <- regulonsToGeneLists(regulons_incidMat)
    regulonAUC <- get_regulons_AUC(loom,column.attr.name='RegulonsAUC')
    regulonAucThresholds <- get_regulon_thresholds(loom)
    close_loom(loom)
    
    meta <- read.table(opt$input_meta,sep='\t',header=T,stringsAsFactor=F)
    cellinfo <- meta[,c(opt$celltype,paste0("nFeature_",assay),paste0("nCount_",assay))]
    colnames(cellinfo)=c('celltype', 'nGene' ,'nUMI')
    cellTypes <-  as.data.frame(subset(cellinfo,select = 'celltype'))
    selectedResolution <- "celltype"
    
    sub_regulonAUC <- regulonAUC
    rss <- calcRSS(AUC=getAUC(sub_regulonAUC),
                   cellAnnotation=cellTypes[colnames(sub_regulonAUC),
                                            selectedResolution])
    rss=na.omit(rss)
    try({
    rssPlot <- plotRSS(rss)
    save(regulonAUC,rssPlot,regulons,file='regulon_RSS.Rdata')
    })
    
    saveRDS(rss,paste0(celltype,"_rss.rds"))
    
    source('/hwfssz1/ST_SUPERCELLS/P22Z10200N0664/huangbaoqian/ST_PRECISION_USER_huangbaoqian/SCENIC/04.database/00.common_scripts/function_pyscenic_visualize.R')
    plot_pyscenic(inloom='aucell.loom',incolor=incolor,inrss=paste0(celltype,"_rss.rds"),inrds='subset.rds',infun='median', ct.col=celltype,inregulons=NULL,ingrn='grn.tsv',ntop1=5,ntop2=50)
    
    • function_pyscenic_visualize.R
    library(Seurat)
    library(SCopeLoomR)
    library(AUCell)
    library(SCENIC)
    library(dplyr)
    library(KernSmooth)
    library(RColorBrewer)
    library(plotly)
    library(BiocParallel)
    library(pheatmap)
    
    library(cowplot)
    library(ggpubr)
    library(ggsci)
    library(ggplot2)
    library(tidygraph)
    library(ggraph)
    library(stringr)
    
    
    
    colpalettes<-unique(c(pal_npg("nrc")(10),pal_aaas("default")(10),pal_nejm("default")(8),pal_lancet("lanonc")(9),
                          pal_jama("default")(7),pal_jco("default")(10),pal_ucscgb("default")(26),pal_d3("category10")(10),
                          pal_locuszoom("default")(7),pal_igv("default")(51),
                          pal_uchicago("default")(9),pal_startrek("uniform")(7),
                          pal_tron("legacy")(7),pal_futurama("planetexpress")(12),pal_rickandmorty("schwifty")(12),
                          pal_simpsons("springfield")(16),pal_gsea("default")(12)))
    len <- 100
    incolor<-c(brewer.pal(8, "Dark2"),brewer.pal(12, "Paired"),brewer.pal(8, "Set2"),brewer.pal(9, "Set1"),colpalettes,rainbow(len))
    
    plot_pyscenic <- function(inloom='aucell.loom',incolor=incolor,inrss='seurat_annotations_rss.rds',inrds='subset.rds',infun='median', ct.col='seurat_annotations',inregulons=NULL,ingrn='grn.tsv',ntop1=5,ntop2=50){
    ###load data
    loom <- open_loom(inloom)
    
    regulons_incidMat <- get_regulons(loom, column.attr.name="Regulons")
    regulons <- regulonsToGeneLists(regulons_incidMat)
    regulonAUC <- get_regulons_AUC(loom,column.attr.name='RegulonsAUC')
    regulonAucThresholds <- get_regulon_thresholds(loom)
    
    embeddings <- get_embeddings(loom)
    close_loom(loom)
    
    rss <- readRDS(inrss)
    sce <- readRDS(inrds)
    
    ##calculate  RSS fc
    df = do.call(rbind,
                 lapply(1:ncol(rss), function(i){
                    dat= data.frame(
                    regulon  = rownames(rss),
                    cluster =  colnames(rss)[i],
                    sd.1 = rss[,i],
                    sd.2 = apply(rss[,-i], 1, get(infun))
                   )
                 }))
    
    df$fc = df$sd.1 - df$sd.2
    
    #select top regulon
    ntopg <- df %>% group_by(cluster) %>% top_n(ntop1, fc)
    
    ntopgene <- unique(ntopg$regulon)
    write.table(ntopgene,'sd_regulon_RSS.list',sep='\t',quote=F,row.names=F,col.names=F)
    #plot rss by cluster
    
    #using plotRSS
    rssPlot <- plotRSS(rss)
    regulonsToPlot <- rssPlot$rowOrder
    rp_df <- rssPlot$df
    
    write.table(regulonsToPlot,'rss_regulon.list',sep='\t',quote=F,row.names=F,col.names=F)
    write.table(rp_df,'rssPlot_data.xls',sep='\t',quote=F)
    nlen <- length(regulonsToPlot)
    hei <- ceiling(nlen)*0.4
    blu<-colorRampPalette(brewer.pal(9,"Blues"))(100)
    lgroup <- levels(rssPlot$df$cellType)
    
    nlen2 <- length(lgroup)
    wei <- nlen2*2
    pdf(paste0('regulons_RSS_',ct.col,'_in_dotplot.pdf'),wei,hei)
    print(rssPlot$plot)
    dev.off()
    
    # sd top gene
    anrow = data.frame( group = ntopg$cluster)
    lcolor <- incolor[1:length(unique(ntopg$cluster))]
    names(lcolor) <- unique(anrow$group)
    annotation_colors <- list(group=lcolor)
    
    pn1 = rss[ntopg$regulon,]
    pn2 = rss[unique(ntopg$regulon),]
    rownames(pn1) <-  make.unique(rownames(pn1))
    rownames(anrow) <- rownames(pn1)
    scale='row'
    hei <- ceiling(length(ntopg$regulon)*0.4)
    pdf(paste0('regulon_RSS_in_sd_topgene_',ct.col,'.pdf'),wei,hei)
    print(
    pheatmap(pn1,annotation_row = anrow,scale=scale,annotation_colors=annotation_colors,show_rownames = T,main='sd top regulons')
    )
    print(
    pheatmap(pn2,scale=scale,show_rownames = T, main='sd top unique regulons')
    )
    dev.off()
    
    #plotRSS gene
    
    pn2 = rss[unique(rp_df$Topic),]
    scale='row'
    hei <- ceiling(length(unique(rp_df$Topic))*0.4)
    pdf(paste0('regulon_RSS_in_plotRSS_',ct.col,'.pdf'),wei,hei)
    print(
    pheatmap(pn2,scale=scale,show_rownames = T, main='plotRSS unique regulons')
    )
    dev.off()
    
    #all regulons
    
    hei <- ceiling(length(rownames(rss))*0.2)
    pdf(paste0('all_regulons_RSS_in_',ct.col,'.pdf'),wei,hei)
    print(
    pheatmap(rss,scale=scale,show_rownames = T,main='all regulons RSS')
    )
    dev.off()
    #plot rss by all cells
    if (is.null(inregulons)){
    inregulons <- regulonsToPlot
    }else{
    inregulons <- intersect(inregulons,rownames(rss))
    regulonsToPlot <- inregulons
    
    }
    pn3=as.matrix(regulonAUC@assays@data$AUC)
    regulon <- rownames(pn3)
    #regulon <- inregulons
    pn3 <- pn3[regulon,]
    #pn3 <- pn3[,sample(1:dim(pn3)[2],500)]
    
    sce$group1=sce@meta.data[,ct.col]
    
    meta <- sce@meta.data
    meta <- meta[order(meta$group1),]
    #meta <- meta[colnames(pn3),]
    ancol = data.frame(meta[,c('group1')])
    colnames(ancol) <- c('group1')
    rownames(ancol) <- rownames(meta)
    lcolor <- incolor[1:length(unique(ntopg$cluster))]
    names(lcolor) <- unique(ntopg$cluster)
    annotation_colors <- list(group1 =lcolor)
    
    df1 <- ancol
    df1$cell <- rownames(df1)
    df1 <- df1[order(df1$group1),]
    pn3 <- pn3[,rownames(df1)]
    torange=c(-2,2)
    pn3 <- scales::rescale(pn3,to=torange)
    pn3 <- pn3[,rownames(ancol)]
    
    scale='none'
    hei <- ceiling(length(unique(regulon))*0.2)
    pdf(paste0('all_regulon_activity_in_allcells.pdf'),10,hei)
    print(
    pheatmap(pn3,annotation_col = ancol,scale=scale,annotation_colors=annotation_colors,show_rownames = T,show_colnames = F,cluster_cols=F)
    )
    #pheatmap(pn3,scale=scale,show_rownames = T, show_colnames = F,cluster_cols=F)
    dev.off()
    
    #plot in seurat
    regulonsToPlot = inregulons
    sce$sub_celltype <- sce@meta.data[,ct.col]
    sub_regulonAUC <- regulonAUC[,match(colnames(sce),colnames(regulonAUC))]
    
    cellClusters <- data.frame(row.names = colnames(sce),
                               seurat_clusters = as.character(sce$seurat_clusters))
    cellTypes <- data.frame(row.names = colnames(sce),
                            celltype = sce$sub_celltype)
    
    sce@meta.data = cbind(sce@meta.data ,t(sub_regulonAUC@assays@data@listData$AUC[regulonsToPlot,]))
    Idents(sce) <- sce$sub_celltype
    
    nlen <- length(regulonsToPlot)
    hei <- ceiling(nlen)*0.4
    blu<-colorRampPalette(brewer.pal(9,"Blues"))(100)
    nlen2 <- length(unique(sce$sub_celltype))
    wei <- nlen2*2
    pdf('regulons_activity_in_dotplot.pdf',wei,hei)
    print(DotPlot(sce, features = unique(regulonsToPlot)) + coord_flip()+
          theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))+
          scale_color_gradientn(colours = blu)
          )
    dev.off()
    
    hei=ceiling(nlen/4)*4
    pdf('regulons_activity_in_umap.pdf',16,hei)
    print(RidgePlot(sce, features = regulonsToPlot , ncol = 4))
    print(VlnPlot(sce, features = regulonsToPlot,pt.size = 0 ))
    print(FeaturePlot(sce, features = regulonsToPlot))
    dev.off()
    
    grn <- read.table(ingrn,sep='\t',header=T,stringsAsFactors=F)
    inregulons1=gsub('[(+)]','',inregulons)
    
    c1 <- which(grn$TF %in% inregulons1)
    grn <- grn[c1,]
    #edge1 <- data.frame()
    #node1 <- data.frame()
    pdf(paste0(ntop2,'_regulon_netplot.pdf'),10,10)
    for (tf in unique(grn$TF)) {
    tmp <- subset(grn,TF==tf)
    if (dim(tmp)[1] > ntop2) {
    tmp <- tmp[order(tmp$importance,decreasing=T),]
    tmp <- tmp[1:ntop2,]
    }
    node2 <- data.frame(tmp$target)
    node2$node.size=1.5
    node2$node.colour <- 'black'
    colnames(node2) <- c('node','node.size','node.colour')
    df1 <- data.frame(node=tf,node.size=2,node.colour='#FFDA00')
    node2 <- rbind(df1,node2)
    
    
    edge2 <- tmp
    colnames(edge2) <- c('from','to','edge.width')
    edge2$edge.colour <- "#1B9E77"
    torange=c(0.1,1)
    edge2$edge.width <- scales::rescale(edge2$edge.width,to=torange)
    
    graph_data <- tidygraph::tbl_graph(nodes = node2, edges = edge2, directed = T)
    p1 <- ggraph(graph = graph_data, layout = "stress", circular = TRUE) + geom_edge_arc(aes(edge_colour = edge.colour, edge_width = edge.width)) +
      scale_edge_width_continuous(range = c(1,0.2)) +geom_node_point(aes(colour = node.colour, size = node.size))+ theme_void() +
          geom_node_label(aes(label = node,colour = node.colour),size = 3.5, repel = TRUE)
    p1 <- p1 + scale_color_manual(values=c('#FFDA00','black'))+scale_edge_color_manual(values=c("#1B9E77"))
    print(p1)
    }
    dev.off()
    #plot activity heatmap
    meta <- sce@meta.data
    celltype <- ct.col
    cellsPerGroup <- split(rownames(meta),meta[,celltype])
    sub_regulonAUC <- regulonAUC[onlyNonDuplicatedExtended(rownames(regulonAUC)),]
    # Calculate average expression:
    regulonActivity_byGroup <- sapply(cellsPerGroup,
                                      function(cells)
                                        rowMeans(getAUC(sub_regulonAUC)[,cells]))
    scale='row'
    rss <- regulonActivity_byGroup
    hei <- ceiling(length(regulonsToPlot)*0.4)
    pn1 <- rss[regulonsToPlot,]
    pdf(paste0('regulon_activity_in_',ct.col,'.pdf'),wei,hei)
    print(
    pheatmap(pn1,scale=scale,show_rownames = T, main='regulons activity')
    )
    dev.off()
    
    hei <- ceiling(length(rownames(rss))*0.2)
    pdf(paste0('all_regulons_activity_in_',ct.col,'.pdf'),wei,hei)
    print(
    pheatmap(rss,scale=scale,show_rownames = T,main='all regulons activity')
    )
    dev.off()
    
    ##calculate fc
    df = do.call(rbind,
                 lapply(1:ncol(rss), function(i){
                    dat= data.frame(
                    regulon  = rownames(rss),
                    cluster =  colnames(rss)[i],
                    sd.1 = rss[,i],
                    sd.2 = apply(rss[,-i], 1, get(infun))
                   )
                 }))
    
    df$fc = df$sd.1 - df$sd.2
    
    #select top regulon
    ntopg <- df %>% group_by(cluster) %>% top_n(ntop1, fc)
    
    ntopgene <- unique(ntopg$regulon)
    write.table(ntopgene,'sd_regulon_activity.list',sep='\t',quote=F,row.names=F,col.names=F)
    
    anrow = data.frame( group = ntopg$cluster)
    lcolor <- incolor[1:length(unique(ntopg$cluster))]
    names(lcolor) <- unique(anrow$group)
    annotation_colors <- list(group=lcolor)
    pn1 = rss[ntopg$regulon,]
    pn2 = rss[unique(ntopg$regulon),]
    rownames(pn1) <-  make.unique(rownames(pn1))
    rownames(anrow) <- rownames(pn1)
    scale='row'
    hei <- ceiling(length(ntopg$regulon)*0.4)
    pdf(paste0('regulon_activity_in_sd_topgene_',ct.col,'.pdf'),wei,hei)
    print(
    pheatmap(pn1,annotation_row = anrow,scale=scale,annotation_colors=annotation_colors,show_rownames = T,main='sd top regulons')
    )
    print(
    pheatmap(pn2,scale=scale,show_rownames = T, main='sd top unique regulons ')
    )
    dev.off()
    
    }
    
    

    相关文章

      网友评论

          本文标题:SCENIC/pySCENIC分析补充+小鼠数据示例2022-1

          本文链接:https://www.haomeiwen.com/subject/cfihqdtx.html