美文网首页三代测序技术
hisat2的使用, samtools

hisat2的使用, samtools

作者: vicLeo | 来源:发表于2020-02-16 15:03 被阅读0次

    一、获得fastq文件
    参考:bam文件转换fastq
    PS: 我获得的公司汇报的数据是 .bam 格式的, 所以需要将bam文件转换fastq
    我的bam文件在 /home/lchen/circ/ ;
    讲bam转换为fastqc, 用 bedtools
    我的fastq文件将存放在 /home/lchen/hisat2/rna_seq/

    **语法**
    bamToFastq -i unmapped.bam -fq unmapped.fastq
    实例:
    (rna) lchen@Vostro-5460~$ bamToFastq -i  /home/lchen/circ/c1.bam -fq /home/lchen/hisat2/rna_seq/c1.fastq
    

    参考:Y大宽
    我的fastq文件在 /home/lchen/hisat2/rna_seq/
    我的index在 /home/lchen/reference/hisat2_index/hg38/genome
    比对后得到的sam文件会存放在 /home/lchen/hisat2/aligned/

    构建索引:
    建议直接下载hisat2官网的索引文件
    下载地址:https://ccb.jhu.edu/software/hisat2/index.shtml

    解压缩你会看到
    (rna) lchen@Vostro-5460:~/reference/hisat2_index$ tar -zxvf hg38.tar.gz.gz
    hg38/
    hg38/make_hg38.sh
    hg38/genome.8.ht2
    hg38/genome.5.ht2
    hg38/genome.7.ht2
    hg38/genome.6.ht2
    hg38/genome.4.ht2
    hg38/genome.3.ht2
    hg38/genome.1.ht2
    hg38/genome.2.ht2
    

    或用hisat2-build构建一个:

    hisat2-build -p 20 hg38.fa hg38
    

    -p 20 p为线程, 一般多少核去运行,这个看自己电脑的内存,
    最后,将基因组、转录组、SNP建立索引:

    **语法**
    hisat2-build -p 4 genome.fa --snp genome.snp --ss genome.ss --exon genome.exon genome_snp_tran
    **实例**
    hisat2-build -p 4 hg38.fa --snp hg38.snp --ss hg38e.ss --exon hg38.exon hg38_snp_tran
    

    单端数据比对的用法如下:
    hisat -x hg19 -p 20 -U reads.fq -S align.sam
    x是参考基因组索引文件的前缀, U为单端数据 , S为输出sam文件
    实例

    hisat2 -x /home/lchen/reference/hisat2_index/hg38/genome -p 20 -U /home/lchen/hisat2/rna_seq/c1.fastq -S /home/lchen/hisat2/aligned/alignc1.sam
    50803150 reads; of these:
      50803150 (100.00%) were unpaired; of these:
        2477095 (4.88%) aligned 0 times
        42642733 (83.94%) aligned exactly 1 time
        5683322 (11.19%) aligned >1 times
    95.12% overall alignment rate
    (rna) lchen@Vostro-5460:~$ hisat2 -x /home/lchen/reference/hisat2_index/hg38/genome -p 20 -U /home/lchen/hisat2/rna_seq/c2.fastq -S /home/lchen/hisat2/aligned/alignc2.sam
    51949025 reads; of these:
      51949025 (100.00%) were unpaired; of these:
        2615007 (5.03%) aligned 0 times
        46117658 (88.77%) aligned exactly 1 time
        3216360 (6.19%) aligned >1 times
    94.97% overall alignment rate
    hisat2 -x /home/lchen/reference/hisat2_index/hg38/genome -p 20 -U /home/lchen/hisat2/rna_seq/d1.fastq -S /home/lchen/hisat2/aligned/alignd1.sam
    49316177 reads; of these:
      49316177 (100.00%) were unpaired; of these:
        2458874 (4.99%) aligned 0 times
        42248554 (85.67%) aligned exactly 1 time
        4608749 (9.35%) aligned >1 times
    95.01% overall alignment rate
    
    

    双端数据用法如下:

    hisat -x hg19 -p 20 -1 R1.fq -2 R2.fq -S align.sam

    参考: 酷睿_1991

    接下来
    用samtools将sam格式转位bam格式

    samtools view -S /home/lchen/hisat2/aligned/alignd1.sam -b > d1.bam
    **sort完之后, bam文件至少可以缩小2~3倍**
    samtools sort -@ 5 -o /home/lchen/circ/sorted/d1_sorted.bam /home/lchen/circ/sorted/alignd1.sam
    sort已有的bam文件:
    samtools sort -@ 5 -o /home/lchen/circ/sorted/c1_sorted.bam /home/lchen/circ/c1.bam
    merge所需要的bam:
     samtools merge -@ 6 -c /home/lchen/circ/sorted/merged.bam /home/lchen/circ/sorted/*_sorted.bam
    
    再次sort:
    samtools sort -@ 5 -o /home/lchen/circ/sorted/sorted_merged.bam /home/lchen/circ/sorted/merged.bam 
    再次index一次:
    samtools index /home/lchen/circ/sorted/sorted_merged.bam
    
    remove duplicates:
    samtools rmdup -s /home/lchen/circ/sorted/sorted_merged.bam /home/lchen/circ/sorted/sorted.rmdup.bam
    
    最后一次index;
    samtools index /home/lchen/circ/sorted/sorted.rmdup.bam
    

    到了这一步基本上就处理的差不多了,可以进行call variation了,但是这里还有一步建立索引,这十分的重要!!!!
      必须对bam文件进行排序后,才能进行index,否则会报错。建立索引后将产生后缀为.bai的文件,用于快速的随机处理。很多情况下需要有bai文件的存在,特别是显示序列比对情况下。建立索引很简单。

    参考:十三而舍

     flagstat :
    samtools flagstat /home/lchen/circ/sorted/sorted_merged.bam > /home/lchen/circ/sorted/sorted.flagstat.txt
    depth:
      686  samtools depth /home/lchen/circ/sorted/sorted_merged.bam > /home/lchen/circ/sorted/sorted.depth
    mpileup:
    samtools mpileup -f /home/lchen/reference/samtools_index/hg38.fa  /home/lchen/circ/sorted/sorted_merged.bam > /home/lchen/circ/sorted/sorted.mpileup.txt
    
    

    相关文章

      网友评论

        本文标题:hisat2的使用, samtools

        本文链接:https://www.haomeiwen.com/subject/eycaxhtx.html