-
CellphoneDB是基于Sender细胞表面配体和Receiver细胞表面受体对应的基因的表达量的高低进行的分析。
-
CellhoneDB仅能对于人类基因数据进行分析,因此需要对小鼠基因进行同源基因的转换
Step1 获得表达矩阵及注释文件
-
表达矩阵的格式:
行名为基因列名为barcode
adata.to_df().T
不过得到的是gene_symbol作为行名
关于是否使用raw_counts:
Counts file. scRNA-seq count data containing gene expression values in which rows are genes presented with gene names identifiers (Ensembl IDs, gene names or hgnc_symbol annotation) and columns are cells. We recommend using normalized count data. Importantly, the user needs to specify whether the data was log-transformed when using the subsampling option.
-
注释文件格式
第一列为cell第二列为cell_type
用sc.get.obs_df
即可
Step2 小鼠同源基因转换
rm(list = ls())#一键清空
options(stringsAsFactors = F) #字符串不转化为因子
library(biomaRt)
exp <- read.table("exp_db.txt", header = TRUE, row.names=NULL)
exp[1:5,1:5]
names(exp)[1] <-"MGI.symbol"#以便于后期inner join
mouse.gene <- rownames(exp)
human.mart <- biomaRt::useMart(host="https://dec2021.archive.ensembl.org", "ENSEMBL_MART_ENSEMBL", dataset="hsapiens_gene_ensembl")
mouse.mart <- biomaRt::useMart(host="https://dec2021.archive.ensembl.org", "ENSEMBL_MART_ENSEMBL", dataset="mmusculus_gene_ensembl")
m2h.g <- getLDS(attributes = c("mgi_symbol"),filters = "mgi_symbol",
values = mouse.gene,mart = mouse.mart,
attributesL = c("hgnc_symbol","chromosome_name","start_position"),
martL = human.mart,uniqueRows = T,)
head(m2h.g)
m2h.g2 <- m2h.g[,1:2]
dim(m2h.g2)
dim(exp)
m2h.g3 <- m2h.g2[!m2h.g2$HGNC.symbol=="",]
dim(m2h.g3)
hexp <- m2h.g3 %>% inner_join(exp, by = "MGI.symbol")
hexp[1:5,1:5]
dim(hexp)
exp[exp$MGI.symbol=="mt-Nd6","AAACCTGAGGAGCGAG"]#验证一下
hexp2 <- hexp %>% distinct(HGNC.symbol, .keep_all = TRUE)
dim(hexp2)
row.names(hexp2) <- hexp2[,2]
hexp2[1:5,1:5]
hexp3<-hexp2[,-c(1,2)]
hexp3[1:5,1:5]
dim(hexp3)
注意最好不要直接保存为table,因为这样行和列都会加上双引号,可以先保存为csv,然后再改成txt(tab分割)
注意完全模仿参考文件的格式
Step3 在终端进行cellphonedb分析
cellphonedb method statistical_analysis \
CD8_annotation_db_hs.txt \
CD8_exp_db_hs.txt \
--counts-data=gene_name \
--threads=4
output:
- deconvoluted.txt: 基因在亚群中的平均表达量
- mean.txt:每对受体-配体的平均表达量
-pvalue.txt:每对受体-配体的P值 - significant_means.txt:每对受体-配体显著性结果的平均表达量值
Step4 利用Cellphonedb自带的绘图
cellphonedb plot dot_plot \
--means-path=means.txt \
--pvalues-path=pvalues.txt \
--output-path=/out
To plot only desired rows/columns (samples for rows and columns based in example data files):
cellphonedb plot dot_plot --rows in/rows.txt --columns in/columns.txt
网友评论