美文网首页单细胞测序bioinformaticsR plot
单细胞数据整合鉴定肿瘤细胞

单细胞数据整合鉴定肿瘤细胞

作者: 11的雾 | 来源:发表于2022-01-21 10:14 被阅读0次

    公众号:生信诊断所

    本文是一则学习笔记,但有干货
    这里的数据是保密的,做了哪些处理也不能说。就单纯为了测试liger整合数据和Seurat对象直接的转换,还有scCATCH工具的细胞类型注释。

    LIGER(Linked Inference of Genomic Experimental Relationships)基因组实验关系关联推断,用一种整合非负矩阵分解算法。
    主要的步骤分为:数据预处理和归一化,联合因子分解,分位数归一化和联合聚类,以及可视化。矩阵分解的概念,非负矩阵分解 NMF算法可以参照官方文献。

    先找两个数据:一个起名

    # install.packages('rliger')
    library(rliger)
    library(dplyr)
    library(patchwork)
    library(Seurat)
    
    setwd("/home/data/analysis")
    直接指定到10x cellranger分析结果目录下的filtered_feature_bc_matrix目录
    folder1 = "../sample1/filtered_feature_bc_matrix"
    folder2 = "../sample2/filtered_feature_bc_matrix"
    

    step1: preprocessing and normalization

    matrix_list <- rliger::read10X(sample.dirs = c(folder1,folder2),
                           sample.names = c('TU_CL1','tumor_digest'),merge=FALSE)
    
    # make sure all cell names are unique
    colnames(matrix_list$TU_CL1[["Gene Expression"]]) <- paste0(colnames(matrix_list$TU_CL1[["Gene Expression"]]),'_1')
    colnames(matrix_list$tumor_digest[["Gene Expression"]]) <- paste0(colnames(matrix_list$tumor_digest[["Gene Expression"]]),'_2')
    
    # 创建liger对象
    liger_obj <- createLiger(list(TU_CL1=matrix_list$TU_CL1[["Gene Expression"]],
                                  tumor_digest = matrix_list$tumor_digest[["Gene Expression"]]))
    # Removing 14185 genes not expressing in TU_CL1.
    # Removing 10058 genes not expressing in tumor_digest.
    
    # 
    liger_obj <- rliger::normalize(liger_obj) %>%
      rliger::selectGenes() %>% # select high variable genes
      rliger::scaleNotCenter() # scale but not center by subtracting mean; not log-transform
    

    Step 2 : joint matrix factorization

    这一步做联合矩阵分解,k设置20,也就是20个factor,为20个metagene,比较耗时,多核会好一点,15min in 16cores

    # 选做:suggestK非常耗时,可以不做,默认用k=20跑
    rliger::suggestK(liger_obj, k.test = seq(20,30,2),lambda = 5, plot.log2 = F,num.cores = 72)
    #直接用k=20跑
    liger_obj <- rliger::optimizeALS(liger_obj, k=20)
    Finished in 32.68537 mins, 30 iterations.
    Max iterations set: 30.
    Final objective delta: 7.5549e-06.
    Best results with seed 1.
    

    Step 3: quantile normalization and joint clustering

    liger_obj <- rliger::quantile_norm(liger_obj)
    
    # alignment score ranges from 0 (no alignment) to 1 (perfect alignment)
    rliger::calcAlignment(liger_obj) # 评估数据整合的效果。数字越大说明两个数据集比对上的程度越高。
    [1] 0.2550047 
    这里结果是0.255看来也不怎么好,
    

    Step 4: visualization and downstream analysis

    
    liger_obj <- runUMAP(liger_obj)
    Spectral initialization failed to converge, using random initialization instead
    dim_plts <- plotByDatasetAndCluster(liger_obj,
                                        axis.labels = c("UMAP 1",'UMAP 2'),
                                        return.plots = TRUE)
    dim_plts[[1]] + dim_plts[[2]]
    

    这个函数返回的是一个包含了两个ggplot2对象的list: 这里思考一下如何在这个基础上修改图。比如pt.size,label.size等参数


    Image.png
    loading_plts <- plotGeneLoadings(liger_obj, axis.labels = c("UMAP 1","UMAP 2"), return.plots = TRUE)
    
    Image.png
    loading_plts: 每个factor中权重排名靠前的gene
    loading_plts[[3]]
    Image.png

    Step 5: differential expression analysis and marker gene selection

    # Setting compare.method to 'clusters' compares each feature's normalized counts between clusters combining all datasets.
    # Setting compare.method to 'datasets' give us the features most differentially expressed across datasets for every cluster
    
    dge_cluster <- runWilcoxon(liger_obj, compare.method = 'clusters')
    dge_dataset <- runWilcoxon(liger_obj, compare.method = 'datasets')
    dge_cluster_sig <- dge_cluster[dge_cluster$padj <0.05 & abs(dge_cluster$logFC)>1,]
    dge_dataset_sig <- dge_dataset[dge_dataset$padj <0.05 & abs(dge_dataset$logFC)>1,]
    
    plt_genes <- plotGene(liger_obj,
                          gene = dge_dataset_sig[which.max(dge_dataset_sig$logFC),'feature'],
                          keep.scale = TRUE,
                          return.plots = TRUE)
    plt_genes[[1]] + plt_genes[[2]]
    
    Image.png
    plt_genes_v <- plotGeneViolin(liger_obj,
                                  gene = dge_dataset_sig[which.max(dge_dataset_sig$logFC),'feature'],
                                  return.plots = TRUE))
    plt_genes_v[[1]] / plt_genes_v[[2]]
    
    Image.png

    step 6 转换成Seurat 对象,

    # liger to Seurat object
    library(Seurat)
    seurat_obj = rliger::ligerToSeurat(liger_obj, use.liger.genes = TRUE, by.dataset = T, renormalize = TRUE)
    DimPlot(seurat_obj, pt.size = 1, label=T, label.size = 4, repel=T)
    
    Image.png

    step 7 用scCATCH注释cluster

    library(scCATCH)
    liger_clusters = as.character(liger_obj@clusters, levels=1:length(levels(liger_obj@clusters)))
    scCATCH_obj <- scCATCH::createscCATCH(seurat_obj[["RNA"]]@data, cluster = liger_clusters)
    clu_markers <- scCATCH::findmarkergene(scCATCH_obj, species = "Human",
                                           cluster = "All", cancer = c("Normal"),
                                           marker = cellmatch,
                                           tissue = c("Brain"),
                                           use_method = "2", cell_min_pct = 0.25,
                                           logfc = 0.25, pvalue = 0.25
    )
    
    clu_ann <- scCATCH::findcelltype(clu_markers)
    scCATCH_ann_data = merge(clu_ann@meta, clu_ann@celltype,by = "cluster") %>% tibble::column_to_rownames("cell")
    seurat_obj = AddMetaData(seurat_obj, scCATCH_ann_data)
    DimPlot(seurat_obj, pt.size = 1, label = T, label.size = 4, repel = T, group.by = "cell_type")
    
    Image.png

    工具鉴定出来的不是很准确,那么来探究一下,到底这两个样本中有没有肿瘤细胞,它们在哪里呢?
    这时候就需要人工去鉴定了。
    先借助数据库找一下肿瘤marker:
    目前单细胞扩展的marker库有:CellMarker,PanglaoDB,SCSig,Immuno-cell-guide等
    收集了几个肿瘤marker,也不知道准不准,


    Image.png
    cancer_marker <- c("PARP1","EGFR","SOX9","SOX2","POU3F2","OLIG2","SALL2","NFIB")
    FeaturePlot(seurat_obj,features = cancer_marker)
    
    Image.png

    明天再用肿瘤marker自动注释工具试试。
    未完待续。关注公众号获取更多内容。

    相关文章

      网友评论

        本文标题:单细胞数据整合鉴定肿瘤细胞

        本文链接:https://www.haomeiwen.com/subject/nskehrtx.html