美文网首页
课前准备--单细胞CNV分析与CNV聚类(封装版)

课前准备--单细胞CNV分析与CNV聚类(封装版)

作者: 单细胞空间交响乐 | 来源:发表于2024-06-18 20:36 被阅读0次

    作者,Evil Genius

    计算原理仍然是inferCNV

    计算步骤

    1、Subtract the reference gene expression from all cells. Since the data is in log space, this effectively computes the log fold change. If references for multiple categories are available (i.e. multiple values are specified to reference_cat), the log fold change is “bounded”:
    • Compute the mean gene expression for each category separately.
    • Values that are within the minimum and the maximum of the mean of all references, receive a log fold change of 0, since they are not considered different from the background.
    • From values smaller than the minimum of the mean of all references, subtract that minimum.
    • From values larger than the maximum of the mean of all references, subtract that maximum.

    This procedure avoids calling false positive CNV regions due to cell-type specific expression of clustered gene regions (e.g. Immunoglobulin- or HLA genes in different immune cell types).

    2、Clip the fold changes at -lfc_cap and +lfc_cap.
    3、Smooth the gene expression by genomic position. Computes the average over a running window of length window_size. Compute only every nth window to save time & space, where n = step.
    4、Center the smoothed gene expression by cell, by subtracting the median of each cell from each cell.
    5、Perform noise filtering. Values < dynamic_theshold * STDDEV are set to 0, where STDDEV is the standard deviation of the smoothed gene expression
    6、Smooth the final result using a median filter.

    实现目标

    CNV聚类
    脚本封装版
    #! /bin/python
    ### 20240618
    ### zhaoyunfei
    ### https://infercnvpy.readthedocs.io/en/latest/notebooks/tutorial_3k.html
    
    import pandas as pd
    import numpy as np
    import sklearn
    import scanpy as sc
    import infercnvpy as cnv
    import matplotlib.pyplot as plt
    import warnings
    import argparse
    
    warnings.simplefilter("ignore")
    
    sc.settings.set_figure_params(figsize=(5, 5))
    
    parse=argparse.ArgumentParser(description='scvelo')
    parse.add_argument('--input',help='the anndata',type=str,required = True)
    parse.add_argument('--outdir',help='the analysis dir',type=str)
    parse.add_argument('--sample',help='the sample name',type=str,required = True)
    parse.add_argument('--ref',help='the ref celltype;eg T,B',type=str,required = True)
    parse.add_argument('--windiow',help='the bin size',type=int,default = 250)
    
    argv = parse.parse_args()
    adata = argv.input
    outdir = argv.outdir
    sample = argv.sample
    ref = argv.ref
    windiow = argv.windiow
    
    adata = sc.read(adata)
    
    sc.pp.normalize_total(adata)
    
    adata.var['symbol'] = adata.var.index
    
    adata.var.index = adata.var['gene_ids']
    
    gene_order = pd.read_csv('/home/samples/DB/scRNA/inferCNV/gene_order.txt',sep = '\t',index_col = 0)
    
    gene_intersect = list(set(adata.var.index) & set(gene_order.index))
    
    adata = adata[:,gene_intersect]
    
    gene_order = gene_order.loc[gene_intersect,:]
    
    adata.var.index.name = 'gene'
    
    tmp = pd.merge(adata.var,gene_order,on = 'gene_ids')
    
    tmp.index = tmp['symbol']
    
    adata.var = tmp
    
    adata.var['chromosome'] = 'chr' + adata.var['chr']
    
    adata.var['ensg'] = adata.var['gene_ids']
    
    adata.var['start'] = adata.var['Start']
    
    adata.var['end'] = adata.var['End']
    
    reference = ref.split(',')
    
    cnv.tl.infercnv(adata,reference_key="celltype",reference_cat=reference,window_size=windiow)
    
    cnv.pl.chromosome_heatmap(adata, groupby="celltype")
    
    plt.savefig(outdir + '/' + sample + '.cnv.heatmap.png',bbox_inches = 'tight')
    
    ####cnv cluster
    
    cnv.tl.pca(adata)
    
    cnv.pp.neighbors(adata)
    
    cnv.tl.leiden(adata)
    
    cnv.pl.chromosome_heatmap(adata, groupby="cnv_leiden", dendrogram=True)
    
    plt.savefig(outdir + '/' + sample + '.cnv.leiden.heatmap.png',bbox_inches = 'tight')
    
    ####UMAP plot of CNV profiles
    cnv.tl.umap(adata)
    
    cnv.tl.cnv_score(adata)
    
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(11, 11))
    ax4.axis("off")
    cnv.pl.umap(adata,color="cnv_leiden",legend_loc="on data",legend_fontoutline=2,ax=ax1,show=False,)
    
    cnv.pl.umap(adata, color="cnv_score", ax=ax2, show=False)
    
    cnv.pl.umap(adata, color="celltype", ax=ax3)
    
    plt.savefig(outdir + '/' + sample + '.cnv.celltype.umap.CNV.png',bbox_inches = 'tight')
    
    ####visualize the CNV score and clusters on the transcriptomics-based UMAP plot
    fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(12, 11), gridspec_kw={"wspace": 0.5})
    ax4.axis("off")
    
    sc.pl.umap(adata, color="cnv_leiden", ax=ax1, show=False)
    
    sc.pl.umap(adata, color="cnv_score", ax=ax2, show=False)
    
    sc.pl.umap(adata, color="celltype", ax=ax3)
    
    plt.savefig(outdir + '/' + sample + '.cnv.celltype.umap.RNA.png',bbox_inches = 'tight')
    
    ####Classifying tumor cells
    tumor = []
    
    for i in adata.obs['cnv_score']:
    
        if i > 0.2 :
            
            tumor.append('tumor')
    
        else :
    
            tumor.append('normal')
    
    adata.obs["cnv_status"] = tumor
    
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5), gridspec_kw={"wspace": 0.5})
    
    cnv.pl.umap(adata, color="cnv_status", ax=ax1, show=False)
    
    sc.pl.umap(adata, color="cnv_status", ax=ax2) 
    
    plt.savefig(outdir + '/' + sample + '.cnv.status.png',bbox_inches = 'tight')
    
    cnv.pl.chromosome_heatmap(adata[adata.obs["cnv_status"] == "tumor", :])
    
    plt.savefig(outdir + '/' + sample + 'tumor.cnv.status.png',bbox_inches = 'tight')
    
    cnv.pl.chromosome_heatmap(adata[adata.obs["cnv_status"] == "normal", :])
    
    plt.savefig(outdir + '/' + sample + 'normal.cnv.status.png',bbox_inches = 'tight')
    
    adata.write(outdir + '/' + sample + '.h5ad')
    

    生活很好,有你更好

    第一课:单细胞联合突变信息分析
    第二课:单细胞数据分析之新版 velocity
    第三课:单细胞数据分析之 CNV 聚类
    第四课:单细胞通讯分析大汇总(cellphoneDB、NicheNet、CellChat)
    第五课:单细胞联合 VDJ分析
    第六课:单细胞联合 ATAC 分析
    第七课:空间转录组整合分析与形态学判定(空间注释)
    第八课:单细胞空间联合分析(cell2location)与区域细胞富集
    第九课:空间细胞类型共定位与细胞通路同定位
    第10课:单细胞空间联合分析之 MIA
    第 11 课:空间生态位通讯与 COMMOT
    第 12 课:空间转录组之分子 niche 与细胞 niche
    第 13 课:空间网络(基因、细胞、通路网络)
    第14课:空间转录组微生物的检测运用
    第 15 课:空间 VDJ(暂定)
    第 16 课:空间 CNV burden 与 CNV 聚类
    第 17 课:空间邻域分析(banksy、CellNeighborEx),分子邻域与细胞邻域
    第 18 课:空间转录组 hotspot
    第 19 课:空间细胞“社区”(空间细胞“unit”)
    第20课:空间基因梯度
    第21课:空间转录组的活性空间域
    第 22 课:空间转录组之 velocity
    第 23 课:空间轨迹
    第24课:细胞邻域特征评分(异型细胞网络)
    第25课:空间分子生态位差异分析
    第 26 课:高精度 HD 平台数据分析框架
    第 27 课:高精度 HD 平台分析内容与 niche
    第 28 课:Xenium 分析框架
    第29 课:原位数据细胞分割导论
    第 30 课:Xenium 生态位分析与微环境分析

    相关文章

      网友评论

          本文标题:课前准备--单细胞CNV分析与CNV聚类(封装版)

          本文链接:https://www.haomeiwen.com/subject/jszfcjtx.html