作业3
找到R包CLL内置的数据集的表达矩阵里面的TP53基因的表达量,并且绘制在 progres.-stable分组的boxplot图
答案如下:
> rm(list = ls())
> options(stringsAsFactors = F)
> suppressPackageStartupMessages(library(CLL))
> data(sCLLex)
> sCLLex
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12625 features, 22 samples
element names: exprs
protocolData: none
phenoData
sampleNames: CLL11.CEL CLL12.CEL ... CLL9.CEL (22 total)
varLabels: SampleID Disease
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu95av2
> exprSet <- exprs(sCLLex)###获得其表达矩阵
![](https://img.haomeiwen.com/i18577194/223ce34a53b62966.png)
> pd <- pData(sCLLex)###获得个样本对应的疾病类型分组
![](https://img.haomeiwen.com/i18577194/c9849fb768a600af.png)
> library(hgu95av2.db)
> ls("package:hgu95av2.db")###查看此包中的内置数据集
[1] "hgu95av2" "hgu95av2.db" "hgu95av2_dbconn"
[4] "hgu95av2_dbfile" "hgu95av2_dbInfo" "hgu95av2_dbschema"
[7] "hgu95av2ACCNUM" "hgu95av2ALIAS2PROBE" "hgu95av2CHR"
[10] "hgu95av2CHRLENGTHS" "hgu95av2CHRLOC" "hgu95av2CHRLOCEND"
[13] "hgu95av2ENSEMBL" "hgu95av2ENSEMBL2PROBE" "hgu95av2ENTREZID"
[16] "hgu95av2ENZYME" "hgu95av2ENZYME2PROBE" "hgu95av2GENENAME"
[19] "hgu95av2GO" "hgu95av2GO2ALLPROBES" "hgu95av2GO2PROBE"
[22] "hgu95av2MAP" "hgu95av2MAPCOUNTS" "hgu95av2OMIM"
[25] "hgu95av2ORGANISM" "hgu95av2ORGPKG" "hgu95av2PATH"
[28] "hgu95av2PATH2PROBE" "hgu95av2PFAM" "hgu95av2PMID"
[31] "hgu95av2PMID2PROBE" "hgu95av2PROSITE" "hgu95av2REFSEQ"
[34] "hgu95av2SYMBOL" "hgu95av2UNIGENE" "hgu95av2UNIPROT"
> ids <- toTable(hgu95av2SYMBOL)
> head(ids)
probe_id symbol
1 1000_at MAPK3
2 1001_at TIE1
3 1002_f_at CYP2C19
4 1003_s_at CXCR5
5 1004_at CXCR5
6 1005_at DUSP1
ID转换:在ids文件中,通过基因名(symbol)“TP53”查找到相应的“probe_id”
![](https://img.haomeiwen.com/i18577194/a814d9bdb27cce6b.png)
画图
> boxplot(exprSet['1939_at',] ~ pd$Disease)
> boxplot(exprSet['1974_s_at',] ~ pd$Disease)
> boxplot(exprSet['31618_at',] ~ pd$Disease)
![](https://img.haomeiwen.com/i18577194/6cbccbd5b462afdc.png)
![](https://img.haomeiwen.com/i18577194/373efb27757c6c3f.png)
![](https://img.haomeiwen.com/i18577194/22e16855bdda5bc4.png)
作业 4
找到BRCA1基因在TCGA数据库的乳腺癌数据集(Breast Invasive Carcinoma (TCGA, PanCancer Atlas))的表达情况
提示:使用http://www.cbioportal.org/index.do 定位数据集:http://www.cbioportal.org/datasets
作业5
找到TP53基因在TCGA数据库的乳腺癌数据集的表达量分组看其是否影响生存
step1 下载数据
进入网站http://www.oncolnc.org/,输入你想要的基因
![](https://img.haomeiwen.com/i18577194/9b0539de9e018fda.png)
找到你想要的癌症类型,点击进入
![](https://img.haomeiwen.com/i18577194/192473cde6a312f4.png)
输入高、低两个数值
![](https://img.haomeiwen.com/i18577194/eafac293b0834a5b.png)
点击[click here],进行下载
![](https://img.haomeiwen.com/i18577194/31d1b3eb88bbc5f8.png)
将其保存为CSV格式
step2 读入文件
> b <- read.csv(file = 'BRCA_7157_50_50.csv',header = T,sep = ',')
step3 ggbetweenstats 作图
> library(ggstatsplot)
> ?ggstatsplot
> colnames(b)
[1] "Patient" "Days" "Status" "Expression" "Group"
> ggbetweenstats(data = b,x =Group,y =Expression)
t is large; approximation invoked.
Note: Shapiro-Wilk Normality Test for Expression : p-value = < 0.001
Note: Bartlett's test for homogeneity of variances for factor Group : p-value = < 0.001
Warning messages:
1: In stats::pt(q = tvalue, df = dfvalue, ncp = x) :
full precision may not have been achieved in 'pnt{final}'
2: In stats::pt(q = tvalue, df = dfvalue, ncp = x) :
full precision may not have been achieved in 'pnt{final}'
3: In stats::pt(q = tvalue, df = dfvalue, ncp = x) :
full precision may not have been achieved in 'pnt{final}'
![](https://img.haomeiwen.com/i18577194/7c38dc26bdef55a2.png)
step4 生存曲线绘制
> library(ggplot2)
> library(survival)
> ?survival
> library(survminer)
> table(b$Status)###table函数:检测数据中各分组的个数
Alive Dead
871 135
> b$Status <- ifelse(b$Status=='Dead',1,0)###检测一个向量中符合某一要求的元素,并让其返回一个值
> ?ifelse
> table(b$Status)
0 1
871 135
> ?survfit
> sfit <- survfit(Surv(Days,Status)~Group,data=b)###survfit绘制生存曲线
> summary(sfit)
Call: survfit(formula = Surv(Days, Status) ~ Group, data = b)
Group=High
time n.risk n.event survival std.err lower 95% CI upper 95% CI
116 469 1 0.998 0.00213 0.994 1.000
174 463 1 0.996 0.00303 0.990 1.000
224 455 1 0.994 0.00373 0.986 1.000
239 452 1 0.991 0.00432 0.983 1.000
255 450 1 0.989 0.00484 0.980 0.999
302 442 1 0.987 0.00532 0.977 0.997
320 439 1 0.985 0.00576 0.973 0.996
.......
> ?summary
> ?ggsurvplot###使用ggplot2绘制生存曲线
> ggsurvplot(sfit,conf.int = F,pval = TRUE)
![](https://img.haomeiwen.com/i18577194/5c908dc96b50645b.png)
网友评论