【包】limma介绍

作者: JamesMori | 来源:发表于2021-12-17 14:19 被阅读0次

【包】limma介绍
用R语言分析：RNAseq表达矩阵样本的差异性
limma快速差异分析工具！
limma的两个弟弟：edgeR和DESeq2
R语言初学笔记：差异表达基因
一文做会漂亮的火山图
RNASEQ分析入门笔记8-使用DESeq2进行表达差异分析
12.GEO数据集的R语言差异分析和代码解析—1.数据下载和整理
【生信技能树】R语言高级练习题
生信人的20个R语言习题-高级

该文档内容来自limma包introduction文件的学习与整理

写在前面：
芯片数据的分析流程及关键词：

microarray由spots组成，spots是由printer head上的tips单次滴下产生的一点。一个probe对应一个gene，由多个spots组成。一个spot中有foreground以及background两部分，background数据可以用来校正foreground，用intensity来表示强度。有的芯片只有一个样本、一个染色，属于one channel，有的芯片有两个样本、两个染色，是two channel；

image analysis software：图片分析软件用来分析microarray，将图片信息转化为数值信息；

读取图片分析软件输出的文档，然后给文件添加各种信息；

芯片数据质量考察；

背景矫正然后标准化处理；

根据实验设计对基因表达量进行回归（建模）后差异分析；

一、适用范围

GenePix，ImaGene 等图片处理软件导出的芯片数据
affy包读的Affymetrix芯片数据
直接读取Illumina BeadChips，或与beadarray包共用
RNA-seq的序列表达数据转化的基因表达数据

二、数据类型（学包几步：数据类型、数据处理函数、关键函数（输入、输出、参数）、绘图及调整）

EListRaw（Raw Expression list）：储存处理前的single-channel数据，一般是read函数出来的；
EList（Expression list）：储存背景校正和标准化的ElistRaw数据，一般是处理函数出来的，例normalizebetweenArrays();
RGList（Red-Green list）：储存处理前的two-channel数据；
MAList：储存处理RGList得到的M-values或A-values数据，一般是处理函数出来的，例normalizeWithinArrays()，row for spot，一个probe对应很多spot；M：log-intensity ratio；A：log-intensity average；
MArrayLM（MicroArray Linear Model）：储存gene-wise linear models for normalized intensities or log-ratios，通常由lmFit()创建；
TestResults：储存对比结果是否为零的检验结果，通常由decideTests()创建。

1-2中row是probe，3-4row是spot，col都是array
5中row是probe，col是对比或参数
$、summary、dim、length、ncol、nrow、names、dimnames、rownames、colnames、[、cbind、rbind、merge等一众R中普遍的泛类函数都被写进去了。

三、读取相关文件

Affymetrix用affy、gcrma、aroma.affymetrix读取及预处理
三个文件：output files包含probes的IDs和names以及其他注释，通常还需要target files描述样本与channel的对应关系，以及spot files描述spot types，鉴别哪些probes是control；
target files：rows for slides，cols are FileName for image-anlysis output、Cy3 and Cy5 for RNA sample types, other cols are optional, such as SlideNumber or Date, or could be created by yourself.
ImaGene 分析two channel芯片会产生两个表，相应的FileName分成2 cols；

#读取file names
targets<-readTargets()
#设置files参数或直接设置target frame读取output files
RG<-read.maimages(files=targets$FileName, source=“<image_analysis_programe_name>”, path=“”)
RG<-read.maimages(targets, source=“”)
#ImaGene data的读取需要files是2-col矩阵，而不是vector
RG<-read.maimages(files=targets[,c(“FileNameCy3”, “FileNameCy5”), source=“”])
#columns参数修改probes的estimates，annotation参数帮助读取标准格式但不在source范围内的结果文件
RG<-read.maimages(targets, columns=list(R=“”, G=“”, Rb=“”, Gb=“”), annotation=c(“Block”, ”Row”, “Column”, “ID”, “Name”))
show(RG)
summary(RG$R)
RG<-cabins(RG1,RG2)

#green.only=T参数使函数可以读取single-channel数据，intensity会被储存在Elist中E对象中，不能用columns参数
#Illumina的output文件包括多个array，和control probes是分开的文件，导出的一般是probe summary files
x<-read.ilmn(“probe.txt”, ctrlfiles=“control.txt”)
#若有多个output文件，那么需要targets文件，使用read.ilmn.targets函数

读取intensities时，可以读取quality information并计算quality weights，质量比较依赖于image analysis software, weight会被很多函数自动使用。

#添加wt.fun参数计算weights，储存在weights元素，各种programe有不同的计算方式，可以help(QualityWeights)查看，也可以自己创建函数根据output中的值计算
myfun<-function(x) as.numeric(x$Flags>-50.5)
RG<-read.maimages(files, source=“genepix”, wt.fun=myfun)

读取gal文件添加ID

#若有gal文件
RG$genes<-readGAL()
#一般情况就读文件，自己赋值

spot types file
SpotType描述spot types，还应有names及simplified regular expression等足以匹配到probes，其他列包含绘图属性。设置前要先了解programe输出的列名，以便匹配。

#如果给STF文件命名为SpotTypes.txt
spottypes<-readSpotTypes()
RG$genes$Status<-controlStatus(spottypes, RG)

三、矩阵质量检验

绘制MA plot结合各种control可以看出矩阵数据的质量。高质量矩阵的MA plot是大量彗星状非差异表达基因，以及少量差异表达基因。

spottypes<-readSpotTypes()
RG$Status<-controlStatus(spottypes,RG)
plotMD(RG)

箱线图绘制background intensities

boxplot(data.frame(log2(RG$Gb)), main=“Green background”)

3.单个芯片内部分布不均匀，可以通过imageplot作图

imageplot(log2(RG$Gb[,1], RG$printer))

四、数据预处理

background correction
使用background数据处理foreground数据，最简单的处理方法是substract，normexp较常用在差异基因分析上，是适应性地调整foreground intensities，可设置offset参数将低intensities的M value设置为0.

RG<-backgroundCorrect(RG, method=“normexp”, offset=50)

within-array normalization
normalize M values for each array separately
print-tip loess normalization是默认方式
对于无print-tip的Agilent，使用global loess normalization或robust spline normalization
print-tip loess对于少于一定spots per print-tip的不适应，即使多次数，但仍有少量基因少次数
自动使用quality weights，可以添加不同意义的control，赋予他们weights，影响nornalization，这一方法可以用control方法完成

#spikein spots are differentially expressed
#titation spots are not differentially expressed
w<-modifyWeights(RG$weights, RG$genes$Status, c(“spikein”, “titration”), c(0,2))
MA<-normalizeWhithinArrays(RG, weights=w)
#等同于
#csi是非差异表达基因
csi<-RG$genes$Status==“titration”
MA<-normalizeWhithinArrays(RG, method=“control”, controlspots=csi)

between-array normalization
focus on red and green intensities other than log-ratios, so such methods could also be called individual channel or separate channel normalization
用来做一类样本的分析
plotDensities能够绘制所有芯片red and green intensities
within-array normalization 不影响A values，使用quantile normalization作用于A values能够使芯片间分布相同，若直接作用于red or green intensities会有些噪音

MA.Aq<-normalizeBetweenArrays(MA, method=“Aquantile”)
plotDensities(MA.Aq)
MA.q<-normalizeBetweenArrays(MA, method=“quantile”)
plotDensities(MA.q)

marray包
pre-processing two-color microarray data
reading, normalization, graphical display of data
可与limma的类相互转换

五、过滤不需要的引物

对单通路及RNA-seq数据来说比较重要

六、线性模型

design matrix：RNA sample to array
contrast matrix：comparison pairs
建模要覆盖全部系统部分（包括随机变异），进而估计样本间数据变异
RNA sources and coefficients数目一致这个要求太严格了，所以可以提供contrast分析

single-channel design
分析起来和普通的线性回归或方差分析类似，对于矩阵数据，特定的对比是需要的
straightforward strategy：set up the simplest possible design and then to extract from the fit the contrasts of interest

#读取affy处理过的数据
data<-ReadAffy()
eset<-rma(data)
#根据design matrix建线性模型
design<-model.matrix(~0+factor(c(1,1,1,2,2,3,3,3)))
colnames(design)<-c(“group1”,”group2”,”group3”)
fit<-lmFit(este,design)
#给上述模型加上contrast信息
contrast.matrix<-makeContrasts(group2-group1,group3-group2,group3-group1,levels=design)
fit2<-contrasts.fit(fit, contrast.matrix)
fit2<-eBayes(fit2)
#输出差异表达基因
topTable(fit2, chef=1, adjust=“BH”)
#输出TestResults文件
results<-decideTests(fit2)
#输出Venn图
vennDiagram(results)
#线性模型是对于单个基因在矩阵中的表达建模的

common reference design
two-color array with a common reference used
set up the design matrix with targets file

targets<-readTargets(“targets.txt”)
#每个array中都有EGFP样本作为ref
design<-modelMatrix(targets, ref=“EGFP”)
fit<-lmFit(MA,design)
#一样可以添加contrast信息，前三个是查看与EGFP对比，后两两个是特定的对比
contrast.matrix<-makeContrasts(AML1,CBFb,AML1.CBFb,AML1.CBFb-AML1,AML1.CBFb-CBFb, levels=design)
contrast.matrix#查看
fit2<-contrasts.fit(fit,contrasts.matrix)
fit2<-eBayes(fit2)

3.direct two-color design
direct design：问题在于不好定two channel中的ref

design<-modelMatrix(targets, ref=“CD4”)
fit<-lmFit(MA, design)
contrast.matrix<-cbind(“CD8-CD4”=c(1,0), ”DN-CD4”=c(0,1), “CD8-DN”=c(1,-1))
rownames(contrast.matrix)<-colnames(design)
fit2<-contrasts.fit(fit,contrast.matrix)
fit2<-eBayes(fit2)

【包】limma介绍
该文档内容来自limma包introduction文件的学习与整理写在前面：芯片数据的分析流程及关键词：micr...
用R语言分析：RNAseq表达矩阵样本的差异性
我们之前介绍了limma包，limma包是对基因芯片表达矩阵的分析，不能对逆转录RNAseq表达矩阵进行分析（因为...
limma快速差异分析工具！
01—研究背景今天给大家介绍的是用 R程序包limma[1]差异分析 limma包是2015年发表在Nuclei...
limma的两个弟弟：edgeR和DESeq2
01—研究背景上一篇公众号我们为大家详细的介绍了R软件包limma筛选差异基因，limma包做差异分析要求数据满...
R语言初学笔记：差异表达基因
setwd("E:/GSE25066")#环境设置 library(limma)#加载差异分析包limma #将分...
一文做会漂亮的火山图
一、通过limma包对输入数据进行处理 1、归一化处理在利用limma包进行差异分析处理之前，要对数据进行归一化...
RNASEQ分析入门笔记8-使用DESeq2进行表达差异分析
基因的差异表达分析，通常使用R中的软件包，包括：DESeq2，edgeR，limma等，今天介绍DESeq2的分析...
12.GEO数据集的R语言差异分析和代码解析—1.数据下载和整理
一、举例介绍本节下载GSE1009数据集，使用limma包进行差异分析举例。 GSE1009 样本量：共6个样本...
【生信技能树】R语言高级练习题
安装一些R包数据包： ALL, CLL, pasilla, airway软件包：limma，DESeq2，clus...
生信人的20个R语言习题-高级
安装一些R包：数据包： ALL, CLL, pasilla, airway软件包：limma，DESeq2，clu...

【包】limma介绍

一、适用范围

二、数据类型（学包几步：数据类型、数据处理函数、关键函数（输入、输出、参数）、绘图及调整）

三、读取相关文件

三、矩阵质量检验

四、数据预处理

五、过滤不需要的引物

六、线性模型

相关文章

【包】limma介绍

用R语言分析：RNAseq表达矩阵样本的差异性

limma快速差异分析工具！

limma的两个弟弟：edgeR和DESeq2

R语言初学笔记：差异表达基因

一文做会漂亮的火山图

RNASEQ分析入门笔记8-使用DESeq2进行表达差异分析

12.GEO数据集的R语言差异分析和代码解析—1.数据下载和整理

【生信技能树】R语言高级练习题

生信人的20个R语言习题-高级

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读