美文网首页
2019-07-05学习总结

2019-07-05学习总结

作者: 桃浪桃浪 | 来源:发表于2019-07-05 17:49 被阅读0次
作业3

找到R包CLL内置的数据集的表达矩阵里面的TP53基因的表达量,并且绘制在 progres.-stable分组的boxplot图
答案如下:

> rm(list = ls())
> options(stringsAsFactors = F)
> suppressPackageStartupMessages(library(CLL))
> data(sCLLex)
> sCLLex
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12625 features, 22 samples 
  element names: exprs 
protocolData: none
phenoData
  sampleNames: CLL11.CEL CLL12.CEL ... CLL9.CEL (22 total)
  varLabels: SampleID Disease
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: hgu95av2 
> exprSet <- exprs(sCLLex)###获得其表达矩阵
image.png
> pd <- pData(sCLLex)###获得个样本对应的疾病类型分组
> library(hgu95av2.db)
> ls("package:hgu95av2.db")###查看此包中的内置数据集
 [1] "hgu95av2"              "hgu95av2.db"           "hgu95av2_dbconn"      
 [4] "hgu95av2_dbfile"       "hgu95av2_dbInfo"       "hgu95av2_dbschema"    
 [7] "hgu95av2ACCNUM"        "hgu95av2ALIAS2PROBE"   "hgu95av2CHR"          
[10] "hgu95av2CHRLENGTHS"    "hgu95av2CHRLOC"        "hgu95av2CHRLOCEND"    
[13] "hgu95av2ENSEMBL"       "hgu95av2ENSEMBL2PROBE" "hgu95av2ENTREZID"     
[16] "hgu95av2ENZYME"        "hgu95av2ENZYME2PROBE"  "hgu95av2GENENAME"     
[19] "hgu95av2GO"            "hgu95av2GO2ALLPROBES"  "hgu95av2GO2PROBE"     
[22] "hgu95av2MAP"           "hgu95av2MAPCOUNTS"     "hgu95av2OMIM"         
[25] "hgu95av2ORGANISM"      "hgu95av2ORGPKG"        "hgu95av2PATH"         
[28] "hgu95av2PATH2PROBE"    "hgu95av2PFAM"          "hgu95av2PMID"         
[31] "hgu95av2PMID2PROBE"    "hgu95av2PROSITE"       "hgu95av2REFSEQ"       
[34] "hgu95av2SYMBOL"        "hgu95av2UNIGENE"       "hgu95av2UNIPROT"      
> ids <- toTable(hgu95av2SYMBOL)
> head(ids)
   probe_id  symbol
1   1000_at   MAPK3
2   1001_at    TIE1
3 1002_f_at CYP2C19
4 1003_s_at   CXCR5
5   1004_at   CXCR5
6   1005_at   DUSP1

ID转换:在ids文件中,通过基因名(symbol)“TP53”查找到相应的“probe_id”


image.png

画图

> boxplot(exprSet['1939_at',] ~ pd$Disease)
> boxplot(exprSet['1974_s_at',] ~ pd$Disease)
> boxplot(exprSet['31618_at',] ~ pd$Disease)
image.png
image.png
image.png
作业 4

找到BRCA1基因在TCGA数据库的乳腺癌数据集(Breast Invasive Carcinoma (TCGA, PanCancer Atlas))的表达情况

提示:使用http://www.cbioportal.org/index.do 定位数据集:http://www.cbioportal.org/datasets

作业5

找到TP53基因在TCGA数据库的乳腺癌数据集的表达量分组看其是否影响生存

提示使用:http://www.oncolnc.org/

step1 下载数据

进入网站http://www.oncolnc.org/,输入你想要的基因

image.png

找到你想要的癌症类型,点击进入


image.png

输入高、低两个数值


image.png
点击[click here],进行下载
image.png

将其保存为CSV格式

step2 读入文件
> b <- read.csv(file = 'BRCA_7157_50_50.csv',header = T,sep = ',')
step3 ggbetweenstats 作图
> library(ggstatsplot)
> ?ggstatsplot
> colnames(b)
[1] "Patient"    "Days"       "Status"     "Expression" "Group"     
> ggbetweenstats(data = b,x =Group,y =Expression)
t is large; approximation invoked.
Note: Shapiro-Wilk Normality Test for Expression : p-value = < 0.001

Note: Bartlett's test for homogeneity of variances for factor Group : p-value = < 0.001

Warning messages:
1: In stats::pt(q = tvalue, df = dfvalue, ncp = x) :
  full precision may not have been achieved in 'pnt{final}'
2: In stats::pt(q = tvalue, df = dfvalue, ncp = x) :
  full precision may not have been achieved in 'pnt{final}'
3: In stats::pt(q = tvalue, df = dfvalue, ncp = x) :
  full precision may not have been achieved in 'pnt{final}'
image.png
step4 生存曲线绘制
> library(ggplot2)
> library(survival)
> ?survival
> library(survminer)
> table(b$Status)###table函数:检测数据中各分组的个数

Alive  Dead 
  871   135 
> b$Status <- ifelse(b$Status=='Dead',1,0)###检测一个向量中符合某一要求的元素,并让其返回一个值
> ?ifelse
> table(b$Status)

  0   1 
871 135 
> ?survfit
> sfit <- survfit(Surv(Days,Status)~Group,data=b)###survfit绘制生存曲线
> summary(sfit)
Call: survfit(formula = Surv(Days, Status) ~ Group, data = b)

                Group=High 
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  116    469       1    0.998 0.00213        0.994        1.000
  174    463       1    0.996 0.00303        0.990        1.000
  224    455       1    0.994 0.00373        0.986        1.000
  239    452       1    0.991 0.00432        0.983        1.000
  255    450       1    0.989 0.00484        0.980        0.999
  302    442       1    0.987 0.00532        0.977        0.997
  320    439       1    0.985 0.00576        0.973        0.996
.......
> ?summary
> ?ggsurvplot###使用ggplot2绘制生存曲线
> ggsurvplot(sfit,conf.int = F,pval = TRUE)
image.png

相关文章

网友评论

      本文标题:2019-07-05学习总结

      本文链接:https://www.haomeiwen.com/subject/ruiehctx.html