ChIP-Seq: DiffBind with no control and no replicates

Author: scdzzdw | Published 2019-10-23 17:14

Others have already written detailed guides on using DiffBind that are worth consulting:

The official DiffBind vignette is here:

https://www.bioconductor.org/packages/release/bioc/vignettes/DiffBind/inst/doc/DiffBind.pdf

DiffBind first needs a SampleSheet file in CSV format. The official documentation states that "Spreadsheets in Excel® format, with a .xls or .xlsx suffix, are also accepted", but importing an Excel sheet gave me an error.
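The SampleSheet uses a fixed set of column names, including SampleID, Tissue, Factor, Condition, Treatment, Replicate, bamReads, ControlID, bamControl, Peaks and PeakCaller.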
Valid PeakCaller values are:
– “raw”: text file; peak score is in fourth column
– “bed”: .bed file; peak score is in fifth column
– “narrow”: default peak.format: narrowPeaks file
– “macs”: MACS .xls file
– “swembl”: SWEMBL .peaks file
– “bayes”: bayesPeak file
– “peakset”: peakset written out using pv.writepeakset
– “fp4”: FindPeaks v4
(See the reference manual for details: https://www.bioconductor.org/packages/release/bioc/manuals/DiffBind/man/DiffBind.pdf)
My samples have no control (input) libraries, so the ControlID and bamControl columns are left empty.
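For reference, a minimal base-R sketch that writes such a sheet with the two control columns left empty. The sample, tissue and condition values mirror the dba() printout in this post; the BAM and peak file names are hypothetical placeholders. Only base R is used, so DiffBind is not needed to create the file.

```r
# Two samples, no replicates, no input controls.
# File names below are HYPOTHETICAL placeholders; substitute your own paths.
sheet <- data.frame(
  SampleID   = c("trisomy_21", "euploid"),
  Tissue     = "fibroblasts",
  Factor     = c("trisomy_21", "euploid"),
  Condition  = c("trisomy_21", "euploid"),
  Treatment  = c("trisomy_21", "euploid"),
  Replicate  = 1,
  bamReads   = c("trisomy_21.bam", "euploid.bam"),
  ControlID  = "",   # no input control: left empty
  bamControl = "",   # no input control: left empty
  Peaks      = c("trisomy_21_peaks.narrowPeak", "euploid_peaks.narrowPeak"),
  PeakCaller = "narrow",
  stringsAsFactors = FALSE
)

# quote = FALSE writes the empty fields as nothing between commas,
# the same as a sheet saved as CSV from a spreadsheet program
write.csv(sheet, "SampleSheet.csv", row.names = FALSE, quote = FALSE)
```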

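Also, the BAM paths in the sheet stay in plain Windows form, e.g. E:\defect\DNA_protein_interaction\GSE55506\Differential_expression\T2N_H3K4me3_sorted.bam; there is no need to double the backslashes (\\) as an R string literal would require. Loading the sheet into R to test it: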

> dbObj <- dba(sampleSheet="SampleSheet.csv")
trisomy_21 fibroblasts trisomy_21 trisomy_21 trisomy_21 1 narrow
euploid fibroblasts euploid euploid euploid 1 narrow
> dbObj
2 Samples, 33153 sites in matrix (47495 total):
          ID      Tissue     Factor  Condition  Treatment Replicate Caller Intervals
1 trisomy_21 fibroblasts trisomy_21 trisomy_21 trisomy_21         1 narrow     40820
2    euploid fibroblasts    euploid    euploid    euploid         1 narrow     44391
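On the single-backslash paths: escaping with \\ is a rule for string literals typed into R code, not for text stored in a data file. A small base-R sketch with a hypothetical path; it assumes the sheet is parsed the way read.csv() parses files (no escape processing by default), which is consistent with the import above succeeding.

```r
# In R source code, a backslash inside a string literal must be written as "\\"
literal <- "E:\\data\\sample.bam"   # HYPOTHETICAL path; the string holds single backslashes
writeLines(literal)                 # prints E:\data\sample.bam

# A CSV file is data, not R source: read.csv() keeps backslashes as-is by default
# (allowEscapes = FALSE), so the file itself should contain single backslashes.
tmp <- tempfile(fileext = ".csv")
writeLines(c("bamReads", literal), tmp)   # header line, then E:\data\sample.bam
from_csv <- read.csv(tmp, stringsAsFactors = FALSE)$bamReads

identical(from_csv, literal)   # TRUE
nchar(from_csv)                # 18: each backslash is a single character
```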

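The import succeeds, which shows that running without control (input) samples is acceptable. The differential analysis step, however, fails: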

> dbObj <- dba.contrast(dbObj, categories=DBA_FACTOR,minMembers = 1)
Error in dba.contrast(dbObj, categories = DBA_FACTOR, minMembers = 1) : 
  minMembers must be at least 2. Use of replicates strongly advised.
> dbObj <- dba.contrast(dbObj, categories=DBA_FACTOR,minMembers = 2)
Warning message:
No contrasts added. Perhaps try more categories, or lower value for minMembers. 
> dbObj <- dba.analyze(dbObj)
Error in pv.DBA(DBA, method, bSubControl, bFullLibrarySize, bTagwise = bTagwise,  : 
  Unable to perform analysis: no contrasts specified.
In addition: Warning message:
No contrasts added. Perhaps try more categories, or lower value for minMembers. 
> dbObj <- dba.contrast(dbObj, categories=DBA_CONDITION)
Warning message:
No contrasts added. Perhaps try more categories, or lower value for minMembers. 
> dbObj <- dba.contrast(dbObj, categories=DBA_CONDITION, minMembers = 1)
Error in dba.contrast(dbObj, categories = DBA_CONDITION, minMembers = 1) : 
  minMembers must be at least 2. Use of replicates strongly advised.

Going by these messages, does DiffBind really require at least two replicates per group? Unresolved for now...
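The errors follow from the group sizes. In this design every Factor/Condition value has exactly one sample, while dba.contrast() refuses a minMembers below 2, so no contrast can be formed. A hypothetical pre-flight check in base R makes this visible before calling DiffBind; the helper name groups_meet_min and the replicated sample IDs are made up for illustration.

```r
# HYPOTHETICAL helper: does every group in `column` have at least `min_members` samples?
# dba.contrast() rejects minMembers < 2, so 2 is the smallest usable value.
groups_meet_min <- function(sheet, column, min_members = 2) {
  sizes <- table(sheet[[column]])   # number of samples per group
  all(sizes >= min_members)
}

# The two-sample design from this post: one sample per Condition
no_reps <- data.frame(SampleID  = c("trisomy_21", "euploid"),
                      Condition = c("trisomy_21", "euploid"),
                      Replicate = c(1, 1))
groups_meet_min(no_reps, "Condition")     # FALSE: each group has 1 sample

# A HYPOTHETICAL replicated design: two samples per Condition
with_reps <- data.frame(SampleID  = c("T21_1", "T21_2", "EU_1", "EU_2"),
                        Condition = c("trisomy_21", "trisomy_21", "euploid", "euploid"),
                        Replicate = c(1, 2, 1, 2))
groups_meet_min(with_reps, "Condition")   # TRUE
```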

=================================================================================
I asked on the forum and the official support site, and DiffBind's author replied that DiffBind requires replicated samples.
Original answer:
Yes, replicates are required to do any kind of statistical analysis. Replicates are required to estimate the variance in the data and calculate confidence statistics such as p-values/FDRs.

Without replicates, you can do some exploratory analysis of overlapping peaks (occupancy analysis). For example using dba.plotVenn(). But not knowing if your data represents an outlier, combined with the inherent noisiness of peak calling, means you will have to have another way to validate any "differential" peaks you identify.
Link: https://support.bioconductor.org/p/125809/#125840
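The point about variance can be seen in plain R, independent of DiffBind. In this toy sketch the read counts are invented, and a Welch t-test stands in for the edgeR/DESeq2 methods DiffBind actually uses: with one sample per group the sample variance is undefined and no test can be run; with two per group a p-value exists.

```r
# INVENTED normalized read counts for a single peak
trisomy <- 120          # one sample per group: no replicates
euploid <- 80
var(trisomy)            # NA: sample variance divides by n - 1 = 0

# Without replicates, a two-sample t-test cannot even be run
no_rep <- tryCatch(t.test(trisomy, euploid), error = function(e) e)
conditionMessage(no_rep)   # "not enough 'x' observations"

# With two replicates per group, within-group variance is estimable
trisomy_rep <- c(120, 130)
euploid_rep <- c(80, 90)
var(trisomy_rep)        # 50
p_rep <- t.test(trisomy_rep, euploid_rep)$p.value
p_rep                   # a finite p-value can now be computed
```

Two replicates are the bare minimum for this; with so few samples the variance estimate itself is very noisy, which is why more replicates are advised.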