2023-08-18 Batch alignment with bwa

Author: 麦冬花儿 | Source: published 2023-08-17 18:02


Change into the align directory.
Align the sequencing data that passed quality control.

Align the samples one at a time, producing a BAM file for each

# in the align directory
sample=SRR7696207

bwa mem -t 2 -R "@RG\tID:$sample\tSM:$sample\tLB:WGS\tPL:Illumina" ../hg38/bwa_index/gatk_hg38 ../clean/${sample}_1_val_1.fq.gz ../clean/${sample}_2_val_2.fq.gz | samtools sort -@ 2 -o ${sample}.bam -

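The command also runs without the -R option, but GATK will report an error later on because the BAM file then carries no read-group information.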
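Because the read-group string is easy to mistype, it can help to build it in one place. The following is a minimal sketch; `make_rg` is a hypothetical helper name (not part of bwa), and the LB and PL values are simply the ones used in this post.

```shell
#!/usr/bin/env bash
# make_rg is a hypothetical helper, not part of bwa.
# It prints the read-group string with literal backslash-t sequences;
# bwa mem converts these to real tabs when it writes the @RG header line.
make_rg() {
    local sample=$1
    # In a printf format string, \\ produces a single backslash.
    printf '@RG\\tID:%s\\tSM:%s\\tLB:WGS\\tPL:Illumina\n' "$sample" "$sample"
}
```

It could then be used as `bwa mem -t 2 -R "$(make_rg "$sample")" ...`, so that every sample gets the same field layout.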

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 143150 sequences (20000163 bp)...
[M::process] read 142658 sequences (20000278 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (1, 61056, 1, 1)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (135, 165, 207)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 351)
[M::mem_pestat] mean and std.dev: (174.05, 52.67)
......

2 Alternatively, batch alignment in a loop: method 1

# in the clean directory
ls *1.fastq.gz > 1
ls *2.fastq.gz > 2
paste 1 2 > config
vim config

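In the editor, add a first column holding the sample name. Keep the columns Tab-separated, as in the paste output, and do not use spaces.
If there are many samples, generate the file with a script instead; see the post on large-sample analysis for details.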
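For many samples, the three-column config can be generated rather than edited by hand. This is only a sketch under stated assumptions: `make_config` is a hypothetical helper, and it assumes that read-1 files end in `_1.fastq.gz` and that each mate has the same prefix with `_2.fastq.gz`. Adjust the suffixes to your own file names.

```shell
#!/usr/bin/env bash
# make_config is a hypothetical helper. It prints one Tab-separated line per
# sample: sample name, read-1 file name, read-2 file name.
make_config() {
    local dir=$1 fq1 base sample fq2
    for fq1 in "$dir"/*_1.fastq.gz; do
        [ -e "$fq1" ] || continue        # skip when the glob matches nothing
        base=${fq1##*/}                  # strip the directory part
        sample=${base%_1.fastq.gz}       # strip the assumed read-1 suffix
        fq2=${sample}_2.fastq.gz
        if [ -e "$dir/$fq2" ]; then
            printf '%s\t%s\t%s\n' "$sample" "$base" "$fq2"
        else
            echo "warning: no mate found for $base" >&2
        fi
    done
}
```

For example, `make_config ../clean > ../clean/config` would write the file that the loop reads; samples without a mate file are reported on stderr and left out.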
In the align directory:

INDEX=../hg38/bwa_index/gatk_hg38
# config columns: sample name, read-1 file, read-2 file
while read -r sample fq1 fq2
do
    bwa mem -t 5 -R "@RG\tID:$sample\tSM:$sample\tLB:WGS\tPL:Illumina" "$INDEX" "../clean/$fq1" "../clean/$fq2" | samtools sort -@ 2 -o "$sample.bam" -
done < ../clean/config &    # the trailing & runs the whole loop in the background

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 142876 sequences (20000122 bp)...
[M::process] read 142628 sequences (20000141 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 61992, 1, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (137, 174, 219)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 383)
[M::mem_pestat] mean and std.dev: (181.21, 59.08)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 465)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 142876 reads in 24.094 CPU sec, 11.833 real sec

Loop alignment: method 2

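# $bwa, $samtools and $reference are assumed to be set beforehand
# (paths to the two executables and to the reference FASTA)
bwa index -a bwtsw $reference         # build the FM-index for a large genome
#bwa index -a is ref.fasta            # index for a small genome: fast, but memory-hungry
# 2. Align with bwa mem
mkdir -p ../fina1_bam
while read -r line
do
    # samtools view: -b writes BAM, -@ sets the thread count, - reads SAM from stdin
    $bwa mem -t 16 -M -Y -R "@RG\tID:$line\tPL:ILLUMINA\tLB:$line\tSM:$line" $reference ../bam/${line}.fastq.gz | $samtools view -b -@ 16 - > ../fina1_bam/${line}.bam &&
    echo "** ${line} BWA MEM done **"
    # 3. Sort and index (-m is the memory limit per thread)
    $samtools sort -@ 16 -m 4G -O bam -o ../fina1_bam/${line}.sorted.bam ../fina1_bam/${line}.bam && \
    $samtools index ../fina1_bam/${line}.sorted.bam && \
    echo "** ${line} sorted raw bam file done **"
done < SRR_Acc_List.txt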

3 Inspect the BAM file

$ samtools view -H SRR8517853.bam |grep -v "SQ"
@HD     VN:1.6  SO:coordinate
@RG     ID:SRR8517853/tSM:SRR8517853    LB:WGS  PL:Illumina
@PG     ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem -t 2 -R @RG\tID:SRR8517853/tSM:SRR8517853\tLB:WGS\tPL:Illumina ../hg38/bwa_index/gatk_hg38 ../clean/SRR8517853_1_val_1.fq.gz ../clean/SRR8517853_2_val_2.fq.gz
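
In the header above, the @RG line and the recorded command line (CL) both show `/t` between ID and SM. That run was started with `/t` typed instead of `\t`, so the SM tag ended up inside the ID value rather than as a field of its own. A check like the one below can catch this before GATK does. It is a minimal sketch: `check_rg` is a hypothetical helper, and it only tests that every @RG line has Tab-separated ID and SM fields.

```shell
#!/usr/bin/env bash
# check_rg is a hypothetical helper. It reads SAM header lines on stdin and
# returns non-zero if there is no @RG line, or if any @RG line lacks a
# Tab-separated ID: or SM: field.
check_rg() {
    awk -F'\t' '
        $1 == "@RG" {
            seen = 1; id = 0; sm = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^ID:/) id = 1
                if ($i ~ /^SM:/) sm = 1
            }
            if (!id || !sm) { print "incomplete read group: " $0; bad = 1 }
        }
        END {
            if (!seen) { print "no @RG line found"; bad = 1 }
            exit bad + 0
        }
    '
}
```

Usage, with the file from this section: `samtools view -H SRR8517853.bam | check_rg`. For the header shown above it would report the @RG line as incomplete, because no field starts with `SM:`.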