2023-07-18 Extracting FASTA sequence IDs and counting bases

Author: 麦冬花儿 | Published 2023-07-17 21:29
    [train@MiWiFi-R3P-srv 16.scripts]$ head ~/04.genome_assembling/IDBA/illumina.fasta
    >SRR2131197.2 2 length=100
    GGCTCACACAGATATCGCAGAAAGCGCCCGGTGGTCACGTCCCATAACTTGACAAGGCCATCCGAGCCACCCGTGACCATGTAGCGGTCGTTCAGTTGC
    >SRR2131197.2 2 length=100
    CCATGTTCCAAGGCTATACGCATGTGGTTGCCCACTTGCAGCTCTTCGGCGATATGTTAGCAACGGGAAGCAGTGACGGCCGCGTGCTTGTGTATTCGCT
    >SRR2131197.4 4 length=100
    ACGAGTCACAATGCCCGTGCCACGCGGCAGAAAGTCGCGGCCGACAATGTTCTCCAGCACACTACTCTTGCCGGACGACTGGCTACCGAGCACCGTGAT
    >SRR2131197.4 4 length=100
    TAACTCTCCCCCTCCGGGGGCCTCAGAGCTTGTGAATAAGGTGCGTGCGATGTCGGCTAACAGCGCAGCTGCAGGACGCCTTCCATGACGTACGTGAGAG
    >SRR2131197.6 6 length=96
    GGTAGTCATAGTAGGAGTAGTAGTGATAGTAGGAGTCATATTGATAGTCATGGTATTAGTAATAATAATAATAGTAGTAATACTCATAGTGGAAG
    
    

    The script below prints each sequence ID and its base count, separated by a tab:

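    #!/usr/bin/perl
    use strict;
    use warnings;
    
    # Usage: perl c2.pl <input.fasta>
    open my $in, "<", $ARGV[0] or die "Cannot open the file $ARGV[0]: $!";
    my ($seq_name, $length);
    while ( my $line = <$in> ) {
        chomp $line;                 # drop the newline so it is never counted
        if ( $line =~ s/^>// ) {
            # New header: report the previous record, if there is one.
            # Without this check an empty record (a lone tab) is printed first.
            print "$seq_name\t$length\n" if defined $seq_name;
            $line =~ s/\s.*//;       # keep only the ID, up to the first whitespace
            $seq_name = $line;
            $length   = 0;
        }
        else {
            $length += length $line; # sequence line: add its bases
        }
    }
    # Report the last record.
    print "$seq_name\t$length\n" if defined $seq_name;
    close $in;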
    
    [train@MiWiFi-R3P-srv 16.scripts]$ perl c2.pl ~/04.genome_assembling/IDBA/illumina.fasta  | head
        
    SRR2131197.2    99
    SRR2131197.2    100
    SRR2131197.4    99
    SRR2131197.4    100
    SRR2131197.6    95
    SRR2131197.6    96
    SRR2131197.7    99
    SRR2131197.7    100
    SRR2131197.8    99
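
As a cross-check on the counts, the same tally can be sketched in awk from the shell. This is an alternative to the Perl script, not a replacement for it: the helper name `fasta_lengths` and the two-record demo input are made up for illustration, and the sketch assumes Unix line endings.

```shell
#!/usr/bin/env bash
# fasta_lengths is a hypothetical helper name, not part of the original script.
# It prints "ID<TAB>base count" for each record of the FASTA file given as $1.
fasta_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len   # flush the previous record
            name = substr($1, 2)                  # ID = first field of the header, minus ">"
            len = 0
            next
        }
        { len += length($0) }                     # sequence line: add its bases
        END { if (name != "") print name "\t" len }
    ' "$1"
}

# Made-up two-record input; the second sequence is wrapped over two lines.
demo=$(mktemp)
printf '>read1 1 length=8\nACGTACGT\n>read2 2 length=6\nACG\nTTT\n' > "$demo"
result=$(fasta_lengths "$demo")
printf '%s\n' "$result"
rm -f "$demo"
```

On the demo input this prints `read1` with a count of 8 and `read2` with a count of 6, tab-separated. Running `fasta_lengths ~/04.genome_assembling/IDBA/illumina.fasta | head` gives a listing that can be compared line by line with the Perl script's output.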
    
