Others have already written detailed guides to using DiffBind; see:
- https://mp.weixin.qq.com/s?__biz=MzAxMDkxODM1Ng==&mid=2247487545&idx=1&sn=6dea2112d5a1c14555a4263d5dcfe42c&chksm=9b485082ac3fd994df4665359d5e71feb39c4188e9aabebe3996ca0744a54cc711df368068a7&scene=21#wechat_redirect
- https://www.jianshu.com/p/f849bd55ac27
- https://zhuanlan.zhihu.com/p/52379828
The official manual is also provided:
DiffBind
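First, DiffBind needs a SampleSheet file in csv format. The official documentation says that "Spreadsheets in Excel® format, with a .xls or .xlsx suffix, are also accepted", but importing one failed for me.
The SampleSheet file contains a fixed set of columns: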
![](https://img.haomeiwen.com/i14719393/f59d8ed4bd016109.png)
The options for PeakCaller are:
- “raw”: text file; peak score is in fourth column
- “bed”: .bed file; peak score is in fifth column
- “narrow”: default peak.format: narrowPeaks file
- “macs”: MACS .xls file
- “swembl”: SWEMBL .peaks file
- “bayes”: bayesPeak file
- “peakset”: peakset written out using pv.writepeakset
- “fp4”: FindPeaks v4

(See the manual for details: https://www.bioconductor.org/packages/release/bioc/manuals/DiffBind/man/DiffBind.pdf)
My samples have no control input, so the two columns ControlID and bamControl are left empty.
![](https://img.haomeiwen.com/i14719393/87979946fe797b2a.png)
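As a rough sketch of such a sheet, the snippet below builds a two-sample table with empty control columns in base R and writes it to csv. The column names follow the DiffBind sample sheet convention; the sample metadata mirrors the two samples used in this post, while every file path is a hypothetical placeholder and not one of the real paths.

```r
# Hypothetical two-sample sheet without control inputs.
# All file paths below are placeholders (assumptions), not real files.
samples <- data.frame(
  SampleID   = c("trisomy_21", "euploid"),
  Tissue     = "fibroblasts",
  Factor     = c("trisomy_21", "euploid"),
  Condition  = c("trisomy_21", "euploid"),
  Treatment  = c("trisomy_21", "euploid"),
  Replicate  = 1,
  # Inside R source code a backslash must be doubled; the csv written
  # to disk will contain single backslashes.
  bamReads   = c("E:\\data\\sample_A_sorted.bam",
                 "E:\\data\\sample_B_sorted.bam"),
  ControlID  = "",   # no control input, so left empty
  bamControl = "",   # no control input, so left empty
  Peaks      = c("E:\\data\\sample_A_peaks.narrowPeak",
                 "E:\\data\\sample_B_peaks.narrowPeak"),
  PeakCaller = "narrow",
  stringsAsFactors = FALSE
)

# Write the sheet to a temporary location and read it back to check it.
out <- file.path(tempdir(), "SampleSheet.csv")
write.csv(samples, out, row.names = FALSE, quote = FALSE)
sheet <- read.csv(out, stringsAsFactors = FALSE)
print(colnames(sheet))
```

The resulting file can then be passed to `dba(sampleSheet = ...)`; `dba()` also accepts a data frame directly, which avoids the intermediate file.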
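Second, the bam file path in the csv can stay as
E:\defect\DNA_protein_interaction\GSE55506\Differential_expression\T2N_H3K4me3_sorted.bam
with single backslashes; there is no need to double them as \\ the way R string literals require. Importing into R as a test: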
> dbObj <- dba(sampleSheet="SampleSheet.csv")
trisomy_21 fibroblasts trisomy_21 trisomy_21 trisomy_21 1 narrow
euploid fibroblasts euploid euploid euploid 1 narrow
> dbObj
2 Samples, 33153 sites in matrix (47495 total):
ID Tissue Factor Condition Treatment Replicate Caller Intervals
1 trisomy_21 fibroblasts trisomy_21 trisomy_21 trisomy_21 1 narrow 40820
2 euploid fibroblasts euploid euploid euploid 1 narrow 44391
No problems, which shows that loading samples without a control input works. The differential analysis, however, raised errors:
> dbObj <- dba.contrast(dbObj, categories=DBA_FACTOR,minMembers = 1)
Error in dba.contrast(dbObj, categories = DBA_FACTOR, minMembers = 1) :
minMembers must be at least 2. Use of replicates strongly advised.
> dbObj <- dba.contrast(dbObj, categories=DBA_FACTOR,minMembers = 2)
Warning message:
No contrasts added. Perhaps try more categories, or lower value for minMembers.
> dbObj <- dba.analyze(dbObj)
Error in pv.DBA(DBA, method, bSubControl, bFullLibrarySize, bTagwise = bTagwise, :
Unable to perform analysis: no contrasts specified.
In addition: Warning message:
No contrasts added. Perhaps try more categories, or lower value for minMembers.
> dbObj <- dba.contrast(dbObj, categories=DBA_CONDITION)
Warning message:
No contrasts added. Perhaps try more categories, or lower value for minMembers.
> dbObj <- dba.contrast(dbObj, categories=DBA_CONDITION, minMembers = 1)
Error in dba.contrast(dbObj, categories = DBA_CONDITION, minMembers = 1) :
minMembers must be at least 2. Use of replicates strongly advised.
Judging by the messages, are at least 2 replicates really required? To be resolved...
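One quick way to check whether a design can satisfy `minMembers = 2` is to count the samples in each group. The sketch below does this in base R on a hypothetical metadata table that mirrors the two samples loaded above; the `meta` data frame and the `has_replicates` flag are illustrative names, not part of DiffBind.

```r
# Hypothetical metadata mirroring the two samples in this post.
meta <- data.frame(
  SampleID  = c("trisomy_21", "euploid"),
  Condition = c("trisomy_21", "euploid"),
  stringsAsFactors = FALSE
)

# Count how many samples fall into each condition.
per_group <- table(meta$Condition)
print(per_group)

# A contrast needs at least 2 members per group (the minimum that
# dba.contrast reports in its error message).
has_replicates <- all(per_group >= 2)
print(has_replicates)
```

With a loaded DBA object, the data frame returned by `dba.show(dbObj)` could be used in place of `meta`.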
=================================================================================
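I asked on the forum and the official site, and the author of DiffBind replied: DiffBind requires replicates among the input samples.
The original answer: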
Yes, replicates are required to do any kind of statistical analysis. Replicates are required to estimate the variance in the data and calculate confidence statistics such as p-values/FDRs.
Without replicates, you can do some exploratory analysis of overlapping peaks (occupancy analysis). For example using dba.plotVenn(). But not knowing if your data represents an outlier, combined with the inherent noisiness of peak calling, means you will have to have another way to validate any "differential" peaks you identify.
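The occupancy analysis mentioned in the answer could look roughly like the sketch below. It assumes the DiffBind package is installed and that a `SampleSheet.csv` like the one used earlier exists in the working directory; the guard simply skips the analysis when either is missing. Selecting the two peaksets with `mask = 1:2` is an assumption that matches the two-sample sheet in this post.

```r
# Path to the sample sheet (assumed to be in the working directory).
sheet_path <- "SampleSheet.csv"

# Only run when DiffBind is installed and the sheet is present.
if (requireNamespace("DiffBind", quietly = TRUE) && file.exists(sheet_path)) {
  library(DiffBind)

  # Load the peaksets; no contrast or replicate is needed for this step.
  dbObj <- dba(sampleSheet = sheet_path)

  # Venn diagram of overlapping peaks between peaksets 1 and 2.
  dba.plotVenn(dbObj, mask = 1:2)
}
```

As the answer notes, peaks that appear in only one sample here are exploratory findings and would still need independent validation.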
Link: https://support.bioconductor.org/p/125809/#125840