对于双端测序而言,事情就变得简单了。bam文件的第9列直接存储了fragment length的信息,直接提取之后就可以用来画图,提取的代码如下
从bam文件中提取插入片段长度
samtools view input.bam | \
awk -F '\t' 'function abs(x){return ((x < 0.0) ? -x : x)} {print $1"\t"abs($9)}' | \
sort | uniq | cut -f2 > fragment.length.txt
用R作图
data <- read.table("fragment.length.txt", header = F)
# 设置插入片段长度的阈值,过滤掉太长的片段
length_cutoff <- 1200
fragment <- data$V1[data$V1 <= length_cutoff]
# 利用直方图统计频数分布,设置柱子个数
breaks_num <- 500
res <- hist(fragment, breaks = breaks_num, plot = FALSE)
# 添加坐标原点
plot(x = c(0, res$breaks),
y = c(0, 0, res$counts) / 10^2,
type = "l", col = "red",
xlab = "Fragment length(bp)",
ylab = expression(Normalized ~ read ~ density ~ 10^2),
main = "Sample Fragment sizes")
参考https://blog.csdn.net/weixin_43569478/article/details/108079757
网友评论