美文网首页大杂烩,味道好
通过简单数据熟悉Linux下生物信息学各种操作2

通过简单数据熟悉Linux下生物信息学各种操作2

作者: Y大宽 | 来源:发表于2019-07-01 08:54 被阅读0次

    原地址
    几点说明
    1.非简单翻译,所有代码均可运行,为了辅助理解,基本每步代码都有结果,需要比较的进行了整合
    2.原文中的软件都下载最新版本
    3.原文中有少量代码是错误的,这里进行了修正
    4.对于需要的一些知识背景,在这里进行了注释或链接到他人博客


    一共4部分
    通过简单数据熟悉Linux下生物信息学各种操作1
    通过简单数据熟悉Linux下生物信息学各种操作2
    通过简单数据熟悉Linux下生物信息学各种操作3
    通过简单数据熟悉Linux下生物信息学各种操作4


    11安装使用bwa(mac)

    11.1安装编译并创建到bin目录的链接

    cat src
    curl -OL http://downloads.sourceforge.net/project/bio-bwa/bwa-0.7.17.tar.bz2
    tar jxvf bwa-0.7.17.tar.bz2
    cd bwa-0.7.17/
    make
    ln -s ~/src/bwa-0.7.17/bwa ~/bin
    

    11.2 制作index

    创建文件夹放references
    获取1999爆发ebola的参考基因组

    mkdir -p ~/refs/852
    efetch -db nucleotide -id NC_002549 -format fasta > ~/refs/852/ebola-1999.fa
    bwa index ~/refs/852/ebola-1999.fa
    ls ~/refs/852
    
    ebola-1999.fa       ebola-1999.fa.ann   ebola-1999.fa.pac
    ebola-1999.fa.amb   ebola-1999.fa.bwt   ebola-1999.fa.sa
    

    通过blast也可以搜寻到

    makeblastdb -in ~/refs/852/ebola-1999.fa -dbtype nucl -parse_seqids
    ls ~/refs/852
    
    ebola-1999.fa       ebola-1999.fa.nhr   ebola-1999.fa.nsi
    ebola-1999.fa.amb   ebola-1999.fa.nin   ebola-1999.fa.nsq
    ebola-1999.fa.ann   ebola-1999.fa.nog   ebola-1999.fa.pac
    ebola-1999.fa.bwt   ebola-1999.fa.nsd   ebola-1999.fa.sa
    

    共有16个文件,这也是为什么刚才创建单独文件夹的原因
    获取ebolas 基因组的前1行作为query序列

    head -2 ~/refs/852/ebola-1999.fa > query.fa
    

    现在用bwa-mem把上面的query map到它自己的基因组上去

    cat results.sam 
    
    @SQ SN:NC_002549.1  LN:18959
    @PG ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem /Users/ucco/refs/852/ebola-1999.fa query.fa
    NC_002549.1 0   NC_002549.1 1   60  70M *   0   0   CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA  *   NM:i:0  MD:Z:70 AS:i:70 XS:i:0
    

    比较用上面同样序列的blasting结果

    blastn -db ~/refs/852/ebola-1999.fa -query query.fa > results.blastn
    
    Query= NC_002549.1 Zaire ebolavirus isolate Ebola
    virus/H.sapiens-tc/COD/1976/Yambuku-Mayinga, complete genome
    
    Length=70
                                                                          Score        E
    Sequences producing significant alignments:                          (Bits)     Value
    
    NC_002549.1 Zaire ebolavirus isolate Ebola virus/H.sapiens-tc/COD...  130        6e-34
    
    
    >NC_002549.1 Zaire ebolavirus isolate Ebola virus/H.sapiens-tc/COD/1976/Yambuku-Mayinga, 
    complete genome
    Length=18959
    
     Score = 130 bits (70),  Expect = 6e-34
     Identities = 70/70 (100%), Gaps = 0/70 (0%)
     Strand=Plus/Plus
    
    Query  1   CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGA  60
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
    Sbjct  1   CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGA  60
    
    Query  61  AGATTAATAA  70
               ||||||||||
    Sbjct  61  AGATTAATAA  70
    
    
    
    Lambda      K        H
        1.33    0.621     1.12 
    
    Gapped
    Lambda      K        H
        1.28    0.460    0.850 
    
    Effective search space used: 1079922
    
    
      Database: /Users/ucco/refs/852/ebola-1999.fa
    

    12理解SAM格式

    现在vi query.fa,增加或删除几个bases,然后进行比对
    比如在第6个碱基开始添加一串T

    cat query.fa 
    >NC_002549.1 Zaire ebolavirus isolate Ebola virus/H.sapiens-tc/COD/1976/Yambuku-Mayinga, complete genome
    CGGACTTTTTTTTTACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA
    

    再次运行比对程序,结果如下

    @SQ SN:NC_002549.1  LN:18959
    @PG ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem /Users/ucco/refs/852/ebola-1999.fa query.fa
    NC_002549.1 0   NC_002549.1 6   60  14S65M  *   0   CGGACTTTTTTTTTACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAANM:i:0   MD:Z:65 AS:i:65 XS:i:0
    #之前没有修改bases的序列比对结果
    @SQ SN:NC_002549.1  LN:18959
    @PG ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem /Users/ucco/refs/852/ebola-1999.fa query.fa
    NC_002549.1 0   NC_002549.1 1   60  70M *   0   0   CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA  *   NM:i:0  MD:Z:70 AS:i:70 XS:i:0
    

    12.1切片操作看特定列

    为了查看特定列,可以进行cut操作,要深入理解每列的意义
    query id,alignment flag and target id

    cat results.sam |cut -f 1,2,3
    
    @SQ SN:NC_002549.1  LN:18959
    @PG ID:bwa  PN:bwa
    NC_002549.1 0   NC_002549.1
    

    比对的start,mapping quality,CIGAR string

    cat results.sam |cut -f 4,5,6
    
    VN:0.7.17-r1188 CL:bwa mem /Users/ucco/refs/852/ebola-1999.fa query.fa
    1   60  70M
    

    paired end read information,name ,position and length of template

    cat results.sam |cut -f 7,8,9
    
    
    *   0   0
    

    query sequence, query quality, other attributes

    cat results.sam |cut -f 10-14
    
    
    CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA  *   NM:i:0  MD:Z:70 AS:i:70
    

    12.2 同时比对两个序列

    在这里下载示例数据

    curl -O http://www.personal.psu.edu/iua1/courses/files/2014/data-14.tar.gz
    tar xzvf data-14.tar.gz
    

    会产生两个名字为read1.fq和read2.fq的文件
    用bwa比对

    bwa mem ~/refs/852/ebola-1999.fa read1.fq read2.fq > results.sam
    

    部分结果如下

    @SQ SN:NC_002549.1  LN:18959
    @PG ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem /Users/ucco/refs/852/ebola-1999.fa read1.fq read2.fq
    gi|10313991|ref|NC_002549.1|_17166_17748_1:1:0_0:0:0_0  83  NC_002549.1 17679   60  70M =   17166   -583    AATTCAACGAAGCCCATACTGGCTAAGTCATTTAACTCAGTATGCTGACTGTGAGTTACATTTAAGTTAT  2222222222222222222222222222222222222222222222222222222222222222222222  NM:i:0  MD:Z:70 MC:Z:70M    AS:i:70 XS:i:0
    gi|10313991|ref|NC_002549.1|_17166_17748_1:1:0_0:0:0_0  163 NC_002549.1 17166   60  70M =   17679   583 CAGTCTAGAGTCAGAAATAGTATCAGGAATGACTACTCCTAGGATGCTTCTACCCGTTATGTCAAAATTC  2222222222222222222222222222222222222222222222222222222222222222222222  NM:i:2  MD:Z:4A49T15    MC:Z:70M    AS:i:6XS:i:0
    gi|10313991|ref|NC_002549.1|_4818_5334_1:0:0_3:0:0_1    99  NC_002549.1 4818    60  70M =   5265    517 GCCATCATGCTTGCTTCATACACTATCACCCAATTCGGCAAGGCAACCAATCCACTTGTCAGAGTCAATC  2222222222222222222222222222222222222222222222222222222222222222222222  NM:i:1  MD:Z:32T37  MC:Z:70M    AS:i:6XS:i:0
    gi|10313991|ref|NC_002549.1|_4818_5334_1:0:0_3:0:0_1    147 NC_002549.1 5265    60  70M =   4818    -517    GTGCCAGAAACTCTGGTCCTCAAGCTGACCGGTAAGAAGGTGACTTCTAAAATTCGACAACCAATCATCC  2222222222222222222222222222222222222222222222222222222222222222222222  NM:i:3  MD:Z:19A32A1G15 MC:Z:70M    AS:i:5XS:i:0
    gi|10313991|ref|NC_002549.1|_8927_9349_3:0:0_3:0:0_2    83  NC_002549.1 9280    60  70M =   8927    -423    GGTTCAGGGTTAAGAACATTGGTTCCTCAATCAGATAATGAGGAAGCTTCAACCAAACCGGGGAAATGCT  2222222222222222222222222222222222222222222222222222222222222222222222  NM:i:3  MD:Z:1T54C7C5   MC:Z:70M    AS:i:5XS:i:0
    

    为了简单明了表示,把上述的qurey加T后一起运行

     cp query.fa query_addT.fa
    vi query_addT.fa
     bwa mem ~/refs/852/ebola-1999.fa query.fa query_addT.fa> xresults.sam
    

    结果如下

    @SQ SN:NC_002549.1  LN:18959
    @PG ID:bwa  PN:bwa  VN:0.7.17-r1188 CL:bwa mem /Users/ucco/refs/852/ebola-1999.fa query.fa query_addT.fa
    NC_002549.1 65  NC_002549.1 1   60  70M =   6   6   CGGACACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA  *   NM:i:0  MD:Z:70 MC:Z:15S65M AS:i:70 XS:i:0
    NC_002549.1 129 NC_002549.1 6   60  15S65M  =   1   -6  CGGACTTTTTTTTTTACACAAAAAGAAAGAAGAATTTTTAGGATCTTTTGTGTGCGAATAACTATGAGGAAGATTAATAA    *   NM:i:0  MD:Z:65 MC:Z:70M    AS:i:65 XS:i:0
    

    13 BAM 文件和samtools

    安装短的read simulator:wgsim

    cd ~/src
    git clone https://github.com/lh3/wgsim.git
    cd wgsim
    gcc -g -O2 -Wall -o wgsim wgsim.c -lz -lm
    ln -s ~/src/wgsim/wgsim ~/bin/wgsim
    #Check that it works
    wgsim
    
    Program: wgsim (short read simulator)
    Version: 1.9
    Contact: Heng Li <lh3@sanger.ac.uk>
    
    Usage:   wgsim [options] <in.ref.fa> <out.read1.fq> <out.read2.fq>
    
    Options: -e FLOAT      base error rate [0.020]
             -d INT        outer distance between the two ends [500]
             -s INT        standard deviation [50]
             -N INT        number of read pairs [1000000]
             -1 INT        length of the first read [70]
             -2 INT        length of the second read [70]
             -r FLOAT      rate of mutations [0.0010]
             -R FLOAT      fraction of indels [0.15]
             -X FLOAT      probability an indel is extended [0.30]
             -S INT        seed for random generator [0, use the current time]
             -A FLOAT      discard if the fraction of ambiguous bases higher than FLOAT [0.05]
             -h            haplotype mode
    

    下载并安装samtools包,并且链接到~/bin 目录

    cd ~/src
    curl -OL http://sourceforge.net/projects/samtools/files/samtools/1.9/samtools-1.1.tar.bz2
    tar jxvf samtools-1.9.tar.bz2
    cd samtools-1.9
    make
    ln -s ~/src/samtools-1.9/samtools ~/bin/
    

    现在从参考基因组生成短reads,然后再map到这个基因组上

    mkdir ~/edu/lec15
    cd ~/edu/lec15
    
    wgsim -N 10000 ~/refs/852/ebola-1999.fa read1.fq read2.fq > mutations.txt
    
    [wgsim] seed = 1561981271
    [wgsim_core] calculating the total length of the reference sequence...
    [wgsim_core] 1 sequences, total length: 18959
    

    比对

    bwa mem ~/refs/852/ebola-1999.fa read1.fq read2.fq > results.sam
    
    [M::bwa_idx_load_from_disk] read 0 ALT contigs
    [M::process] read 20000 sequences (1400000 bp)...
    [M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 9950, 0, 0)
    [M::mem_pestat] skip orientation FF as there are not enough pairs
    [M::mem_pestat] analyzing insert size distribution for orientation FR...
    [M::mem_pestat] (25, 50, 75) percentile: (466, 500, 534)
    [M::mem_pestat] low and high boundaries for computing mean and std.dev: (330, 670)
    [M::mem_pestat] mean and std.dev: (499.84, 50.02)
    [M::mem_pestat] low and high boundaries for proper pairs: (262, 738)
    [M::mem_pestat] skip orientation RF as there are not enough pairs
    [M::mem_pestat] skip orientation RR as there are not enough pairs
    [M::mem_process_seqs] Processed 20000 reads in 0.537 CPU sec, 0.539 real sec
    [main] Version: 0.7.17-r1188
    [main] CMD: bwa mem /Users/ucco/refs/852/ebola-1999.fa read1.fq read2.fq
    [main] Real time: 1.111 sec; CPU: 0.606 sec
    

    转变为BAM文件

    samtools view -Sb results.sam > temp.bam
    # Sort the alignment.
    samtools sort temp.bam -oresults.bam
    #index the aligment
    samtools index results.bam
    

    用samtools进行过滤
    详细用法参考samtools常用命令详解
    -f match the flag:保留match flags的reads
    -F filter the flags:删除match flags的reads,保留剩余部分
    首先,保留比对到reverse链上的reads

    samtools view -f 16 results.bam
    

    然后,删除比对到forward链上的reads

    samtools view -F 16 results.bam
    

    对flag counts进行计数,下面两个命令都可以

    samtools view -F 16 results.bam |wc -l
    samtools view -c -F 16 results.bam
    
    10000
    

    关于samtools flag代表的意义请参考
    samtools flag
    根据质量进行过滤。如果一个reads可以map到多个位点,那么质量为0,否则>=1代表唯一匹配

    # uniquely mapping reads.
    samtools view -c -q 1 results.bam
    # Count the high quality alignment.
    samtools view -c -q 40 results.bam
    

    14 用ReadSeq转换数据

    14.1安装ReadSeq

    mkdir -p readseq
    cd ~/src/readseq
    curl -OL http://iubio.bio.indiana.edu/soft/molbio/readseq/java/readseq.jar
    

    我打不开这个网站,然后google下载的.
    因为readseq调用的方式和前面安装的trimmomatic相似,所以cp

    cp ~/bin/trimmomatic ~/bin/readseq
    vi ~/bin/readseq
    

    改成下图即可

    java -jar ~/src/readseq/readseq.jar $@
    
    vi_readseq

    14.2获取1999 ebola基因组的full genbank 记录

    http://www.ncbi.nlm.nih.gov/genome/4887
    进入lec文件夹,随便你以前用其他名字命名的
    获取complete data

    efetch -db nucleotide -id NC_002549.1 -format gb > NC.gb
    

    格式为GFF(Generic Feature Format)文件

    readseq -format=GFF NC.gb
    # 也可以设定输出名字
    readseq -format=GFF -o NC-all.gff NC.gb
    

    会有以下几个文件


    4files

    提取到fasta 文件

    readseq -format=FASTA -o NC.fa NC.gb
    

    先查看一下gff文件

    cat NC-all.gff |head
    ##gff-version 2
    # seqname   source  feature start   end score   strand  frame   attributes
    NC_002549   -   source  1   18959   .   +   .   organism "Zaire ebolavirus" ; mol_type "viral cRNA" ; isolate "Ebola virus/H.sapiens-tc/COD/1976/Yambuku-Mayinga" ; db_xref "taxon:186538"
    NC_002549   -   5'UTR   1   55  .   +   .   note "leader region" ; citation "[1]" ; function "regulation or initiation of RNA replication"
    NC_002549   -   gene    56  3026    .   +   .   gene "NP" ; locus_tag "ZEBOVgp1" ; db_xref "GeneID:911830"
    NC_002549   -   mRNA    56  3026    .   +   .   gene "NP" ; locus_tag "ZEBOVgp1" ; product "nucleoprotein" ; db_xref "GeneID:911830"
    NC_002549   -   regulatory  56  67  .   +   .   regulatory_class "other" ; gene "NP" ; locus_tag "ZEBOVgp1" ; note "putative; transcription start signal" ; citation "[1]"
    NC_002549   -   CDS 470 2689    .   +   .   gene "NP" ; locus_tag "ZEBOVgp1" ; function "encapsidation of genomic RNA" ; codon_start 1 ; product "nucleoprotein" ; protein_id "NP_066243.1" ; db_xref "GeneID:911830" ; translation "MDSRPQKIWMAPSLTESDMDYHKILTAGLSVQQGIVRQRVIPVYQVNNLEEICQLIIQAFEAGVDFQESADSFLLMLCLHHAYQGDYKLFLESGAVKYLEGHGFRFEVKKRDGVKRLEELLPAVSSGKNIKRTLAAMPEEETTEANAGQFLSFASLFLPKLVVGEKACLEKVQRQIQVHAEQGLIQYPTAWQSVGHMMVIFRLMRTNFLIKFLLIHQGMHMVAGHDANDAVISNSVAQARFSGLLIVKTVLDHILQKTERGVRLHPLARTAKVKNEVNSFKAALSSLAKHGEYAPFARLLNLSGVNNLEHGLFPQLSAIALGVATAHGSTLAGVNVGEQYQQLREAATEAEKQLQQYAESRELDHLGLDDQEKKILMNFHQKKNEISFQQTNAMVTLRKERLAKLTEAITAASLPKTSGHYDDDDDIPFPGPINDDDNPGHQDDDPTDSQDTTIPDVVVDPDDGSYGEYQSYSENGMNAPDDLVLFDLDEDDEDTKPVPNRSTKGGQQKNSQKGQHIEGRQTQSRPIQNVPGPHRTIHHASAPLTDNDRRNEPSGSTSPRMLTPINEEADPLDDADDETSSLPPLESDDEEQDRDGTSNRTPTVAPPAPVYRDHSEKKELPQDEQQDQDHTQEARNQDSDNTQSEHSFEEMYRHILRSQGPFDAVLYYHMMKDEPVVFSTSDGKEYTYPDSLEEEYPPWLTEKEAMNEENRFVTLDGQQFYWPVMNHKNKFMAILQHHQ"
    NC_002549   -   regulatory  3015    3026    .   +   .   regulatory_class "polyA_signal_sequence" ; gene "NP" ; locus_tag "ZEBOVgp1"
    NC_002549   -   misc_feature    3027    3031    .   +   .   note "intergenic region"
    

    只获取gff文件中的gene rows

    cat NC-all.gff |egrep '\tgene\t'
    
    NC_002549   -   gene    56  3026    .   +   .   gene "NP" ; locus_tag "ZEBOVgp1" ; db_xref "GeneID:911830"
    NC_002549   -   gene    3032    4407    .   +   .   gene "VP35" ; locus_tag "ZEBOVgp2" ; db_xref "GeneID:911827"
    NC_002549   -   gene    4390    5894    .   +   .   gene "VP40" ; locus_tag "ZEBOVgp3" ; db_xref "GeneID:911825"
    NC_002549   -   gene    5900    8305    .   +   .   gene "GP" ; locus_tag "ZEBOVgp4" ; db_xref "GeneID:911829"
    NC_002549   -   gene    8288    9740    .   +   .   gene "VP30" ; locus_tag "ZEBOVgp5" ; db_xref "GeneID:911826"
    NC_002549   -   gene    9885    11518   .   +   .   gene "VP24" ; locus_tag "ZEBOVgp6" ; note "putative" ; db_xref "GeneID:911828"
    NC_002549   -   gene    11501   18282   .   +   .   gene "L" ; locus_tag "ZEBOVgp7" ; db_xref "GeneID:911824"
    

    实际上我们还想要前两行的注释行

    cat NC-all.gff |head -2 > NC-genes.gff
    cat NC-all.gff | egrep '\tgene\t' >> NC-genes.gff
    
    cat NC-genes.gff 
    
    ##gff-version 2
    # seqname   source  feature start   end score   strand  frame   attributes
    NC_002549   -   gene    56  3026    .   +   .   gene "NP" ; locus_tag "ZEBOVgp1" ; db_xref "GeneID:911830"
    NC_002549   -   gene    3032    4407    .   +   .   gene "VP35" ; locus_tag "ZEBOVgp2" ; db_xref "GeneID:911827"
    NC_002549   -   gene    4390    5894    .   +   .   gene "VP40" ; locus_tag "ZEBOVgp3" ; db_xref "GeneID:911825"
    NC_002549   -   gene    5900    8305    .   +   .   gene "GP" ; locus_tag "ZEBOVgp4" ; db_xref "GeneID:911829"
    NC_002549   -   gene    8288    9740    .   +   .   gene "VP30" ; locus_tag "ZEBOVgp5" ; db_xref "GeneID:911826"
    NC_002549   -   gene    9885    11518   .   +   .   gene "VP24" ; locus_tag "ZEBOVgp6" ; note "putative" ; db_xref "GeneID:911828"
    NC_002549   -   gene    11501   18282   .   +   .   gene "L" ; locus_tag "ZEBOVgp7" ; db_xref "GeneID:911824"
    NC_002549   -   gene    56  3026    .   +   .   gene "NP" ; locus_tag "ZEBOVgp1" ; db_xref "GeneID:911830"
    NC_002549   -   gene    3032    4407    .   +   .   gene "VP35" ; locus_tag "ZEBOVgp2" ; db_xref "GeneID:911827"
    NC_002549   -   gene    4390    5894    .   +   .   gene "VP40" ; locus_tag "ZEBOVgp3" ; db_xref "GeneID:911825"
    NC_002549   -   gene    5900    8305    .   +   .   gene "GP" ; locus_tag "ZEBOVgp4" ; db_xref "GeneID:911829"
    NC_002549   -   gene    8288    9740    .   +   .   gene "VP30" ; locus_tag "ZEBOVgp5" ; db_xref "GeneID:911826"
    NC_002549   -   gene    9885    11518   .   +   .   gene "VP24" ; locus_tag "ZEBOVgp6" ; note "putative" ; db_xref "GeneID:911828"
    NC_002549   -   gene    11501   18282   .   +   .   gene "L" ; locus_tag "ZEBOVgp7" ; db_xref "GeneID:911824"
    

    染色体坐标不匹配,需要修复
    重新索引并且重新比对

    cp NC.fa ~/refs/852/
    bwa index ~/refs/852/NC.fa
    
    [bwa_index] Pack FASTA... 0.00 sec
    [bwa_index] Construct BWT for the packed sequence...
    [bwa_index] 0.01 seconds elapse.
    [bwa_index] Update BWT... 0.00 sec
    [bwa_index] Pack forward-only FASTA... 0.00 sec
    [bwa_index] Construct SA from BWT and Occ... 0.00 sec
    [main] Version: 0.7.17-r1188
    [main] CMD: bwa index /Users/ucco/refs/852/NC.fa
    [main] Real time: 0.113 sec; CPU: 0.015 sec
    

    再align
    注意我的命名有点乱,最好去完全按原文中的命名
    下面的align.sh我没找到
    但是aligh就是比对索引那些一起写到一个脚本里了,改天写下来。

    cp ../lec4/*.fq .
    bash align.sh read1.fq read2.fq results.bam
    

    载入IGV,看100-150bp区域的深度

    相关文章

      网友评论

        本文标题:通过简单数据熟悉Linux下生物信息学各种操作2

        本文链接:https://www.haomeiwen.com/subject/lgewcctx.html