MCPcounter实战点点滴滴

作者: 陈宇乔 | 来源:发表于2018-12-27 17:38 被阅读101次

第一

我得到的数据是nrpm值，处理的方法如下。考虑了管家基因后的normalization。然后limma做差异分析。

image.png

我的问题：能不能使用原始read counts数据用DEseq2做normalization？（DEseq2做normilization时需要使用为整数的read counts，不然会报错）
回答：DEseq2做normalization时没有考虑到管家基因，所以不建议用此方法

第二

我自己在给我的数据nrpm上直接做了PCA图：nrpm数据格式和PCA图如下:

由于数据差异较大，PCA图不是特别好看，所以建议可以Zscore或者logratio再次进行归一化。代码参考https://www.jianshu.com/p/57f62efa0fab

b<- scale(expr)
a<- log(expr+1)
########zscore就是scale

第三

需要研究一下MCPcount是需要read counts还是可以用normalization的值
但是Xcell写的很明确，read counts或者normalization的值都可以，因为它只做ranking

MCPcounter方法学原文中使用TCGA数据库验证时，使用了normalized results。所以解释了我的疑问。
查考文章：Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression

image.png

MCP counter和cibersort最大的区别就是cibersort是计算淋巴细胞的比例，二MCPcounter是一个决定计数方法。

第四

我自己的数据表明

Primary tumor

image.png	image.png

Metastasis lesion

image.png	image.png

参考文章Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression
此文章对Lung Adenocarcinoma的数据进行了分析发现，将患者分成四组Blinegae 和Tcell 高表达的患者预后最好，我们的数据有一样的发现！！奠定了这一篇文章的基础。
后续我也会将分组信息变成四组进行分析

image.png

第五

MCPcounter进行TCGA数据挖掘的方法

只需要构建一个表达矩阵：colname是sample_name，rownames是基因symbol或者另外两个ID（不记得了）featuresType=c("HUGO_symbols")其实有三种可选

代码如下

library(curl)
library(MCPcounter)
??MCPcounter.estimate
load(file = 'TCGA.Rdata')
exprMatrix[1:10,1:10]####################ensemble整洁版ID
####library(clusterProfiler)
library(org.Hs.eg.db)
ls("package:org.Hs.eg.db")
g2s=toTable(org.Hs.egSYMBOL);head(g2s)
g2e=toTable(org.Hs.egENSEMBL);head(g2e)
tmp=merge(g2e,g2s,by='gene_id')
head(tmp)
colnames(exprMatrix)[ncol(exprMatrix)] <- c("ensembl_id")###################重命名Ensemble_ID 便于后面merge
exprMatrix<- merge(tmp,exprMatrix,by='ensembl_id')
exprMatrix<- exprMatrix[,- c(1,2)]
exprMatrix=exprMatrix[!duplicated(exprMatrix$symbol),]
row.names(exprMatrix)<- exprMatrix[,1]
exprMatrix<- exprMatrix[,-1]
probesets=read.table(curl("http://raw.githubusercontent.com/ebecht/MCPcounter/master/Signatures/probesets.txt"),sep="\t",stringsAsFactors=FALSE,colClasses="character")
genes=read.table(curl("http://raw.githubusercontent.com/ebecht/MCPcounter/master/Signatures/genes.txt"),sep="\t",stringsAsFactors=FALSE,header=TRUE,colClasses="character",check.names=FALSE)
results<- MCPcounter.estimate(exprMatrix,featuresType=c("HUGO_symbols")[1],
                    probesets=probesets,
                    genes=genes
)

第五

修改生存曲线图例legend 参考资料：
http://www.sthda.com/english/wiki/survminer-r-package-survival-data-analysis-and-visualization
待续。。。。。

MCPcounter实战点点滴滴

第一

第二

第三

第四

Primary tumor

Metastasis lesion

第五

第五

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

肿瘤免疫微环境

生信

生物信息学R语言源码

TCGA数据挖掘

免疫肿瘤