Converting a multi-line FASTA file to a single-line FASTA file

Author: 今天没回家 | Source: published 2021-01-14 21:56

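    When processing data, you often need to turn a multi-line FASTA file into single-line form (one sequence line per record), or to concatenate many FASTA lines into one single line. Here are a few ways to do it.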

    vi test.fa
    
    >a
    AAAA
    >b
    CCCC
    GGGG
    >c
    AAAA
    GGGGG
    CCCCCC
    TTTTTTTT
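
If you would rather not type the file in an editor, a here-document creates the same test.fa non-interactively. This is only a convenience sketch; the function name make_test_fa is made up for illustration and is not part of the original post.

```shell
# make_test_fa is a hypothetical helper: it writes the example input to the given path.
make_test_fa() {
    cat > "$1" <<'EOF'
>a
AAAA
>b
CCCC
GGGG
>c
AAAA
GGGGG
CCCCCC
TTTTTTTT
EOF
}

make_test_fa test.fa
```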
    
    Target file:
    >a
    AAAA
    >b
    CCCCGGGG
    >c
    AAAAGGGGGCCCCCCTTTTTTTT
    
    Or:
    >a
    AAAACCCCGGGGAAAAGGGGGCCCCCCTTTTTTTT
    

    1. Convert multi-line FASTA to single-line, keeping all the original headers

    awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }' test.fa
    
    >a
    AAAA
    >b
    CCCCGGGG
    >c
    AAAAGGGGGCCCCCCTTTTTTTT
    
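    Note: in /^>/ { print n $0; n = "" }, the variable n is still unset (an empty string) when the first line is read, so no blank line is printed before the first header. After any sequence line n holds "\n", which ends the previous sequence before the next header is printed.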
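
To make the role of n easier to follow, here is the same one-liner spread over several lines with comments. It is a sketch of the identical logic, not a different method; the wrapper name fasta_unwrap is an assumption introduced here for illustration.

```shell
# fasta_unwrap is a hypothetical wrapper name; the awk logic is the one-liner above.
fasta_unwrap() {
    awk '
        /^>/ {                  # header line
            print n $0          # n is "" before the first record, "\n" after sequence data
            n = ""
            next
        }
        {                       # sequence line
            printf "%s", $0     # append to the current output line without a newline
            n = "\n"            # remember that this record still has to be terminated
        }
        END { printf "%s", n }  # terminate the last record
    ' "$@"
}

# Reads files given as arguments, or standard input when there are none.
printf '>a\nAAAA\n>b\nCCCC\nGGGG\n' | fasta_unwrap
```

The sample call prints >a, AAAA, >b, CCCCGGGG on four lines.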
    

    Note on printf vs. print:
    printf does not append a newline; print does.
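
A tiny demonstration of that difference, using made-up strings: two printf calls land on the same line, while each print ends its line.

```shell
# printf emits exactly what the format says; print appends ORS (a newline by default).
demo_out=$(awk 'BEGIN { printf "%s", "AAAA"; printf "%s", "CCCC"; print ""; print "GGGG" }')
printf '%s\n' "$demo_out"
# Output:
# AAAACCCC
# GGGG
```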

    2. Convert multi-line FASTA to a single line, keeping only one header

    sed '1!{/^>/d;}' test.fa | awk '!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }'
    
    >a
    AAAACCCCGGGGAAAAGGGGGCCCCCCTTTTTTTT
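
The same result can probably be had from a single awk call, without the sed step. This is an alternative sketch, not the method used above: it keeps the first header it meets (rather than whatever is on line 1) and assumes the file starts with a header. The name fasta_merge_all is invented for illustration.

```shell
# fasta_merge_all is a hypothetical helper: keep the first header, join all sequence lines.
fasta_merge_all() {
    awk '
        /^>/ { if (!seen++) print; next }  # print only the first header, skip the rest
        { printf "%s", $0 }                # concatenate every sequence line
        END { if (NR) print "" }           # end the merged line unless the input was empty
    ' "$@"
}

printf '>a\nAAAA\n>b\nCCCC\nGGGG\n' | fasta_merge_all
```

The sample call prints >a followed by AAAACCCCGGGG.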
    
    

    3. Wrapping it in a small script

    vi single_fa.sh
    
    #!/bin/sh
    # Print usage and exit with a non-zero status.
    func() {
        echo "Usage:"
        echo "single_fa.sh [-m mode (many or only)] [-n header name (only with -m only)] [-i input_file] [-o output_file]"
        echo "MODE:"
        echo "many: keep all headers"
        echo "only: keep a single header"
        exit 1
    }
    
    # awk program that joins the sequence lines of each record into one line.
    unwrap='!/^>/ { printf "%s", $0; n = "\n" } /^>/ { print n $0; n = "" } END { printf "%s", n }'
    
    while getopts :m:n:i:o: varname
    do
       case $varname in
       m)
          mode="$OPTARG" ;;
       n)
          name="$OPTARG" ;;
       i)
          input="$OPTARG" ;;
       o)
          output="$OPTARG" ;;
       *)
          # Unknown option or missing option argument.
          func ;;
       esac
    done
    
    # -i and -o are required.
    if [ -z "$input" ] || [ -z "$output" ]; then
         func
    fi
    
    if [ "$mode" = "many" ]; then
         awk "$unwrap" "$input" > "$output"
    elif [ -n "$name" ]; then
         # Keep only the first header, merge the sequences, then replace that header with $name.
         sed '1!{/^>/d;}' "$input" | awk "$unwrap" | awk -v name="$name" 'NR == 1 { print ">" name; next } { print }' > "$output"
    else
         sed '1!{/^>/d;}' "$input" | awk "$unwrap" > "$output"
    fi
    

    Example run

    sh single_fa.sh -m many -i test.fa -o output.fa
    
    cat output.fa
    >a
    AAAA
    >b
    CCCCGGGG
    >c
    AAAAGGGGGCCCCCCTTTTTTTT
    
    sh single_fa.sh -m only -i test.fa -o output.fa
    
    cat output.fa
    >a
    AAAACCCCGGGGAAAAGGGGGCCCCCCTTTTTTTT
    
    sh single_fa.sh -m only -i test.fa -o output.fa -n hahaha
    
    cat output.fa
    >hahaha
    AAAACCCCGGGGAAAAGGGGGCCCCCCTTTTTTTT
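
After a run like this, it can be handy to confirm that the output really is single-line. The check below is a rough sketch: it passes only if headers and sequence lines strictly alternate, starting with a header. The name is_single_line_fasta is invented here, and the check assumes the file contains no blank lines or comment lines.

```shell
# is_single_line_fasta is a hypothetical helper: exit status 0 if every header
# is followed by exactly one sequence line, non-zero otherwise.
is_single_line_fasta() {
    awk '
        /^>/ { if (expect_seq) bad = 1; expect_seq = 1; next }  # two headers in a row
        { if (!expect_seq) bad = 1; expect_seq = 0 }            # a second sequence line
        END { exit (bad || expect_seq || NR == 0) }             # also reject empty or truncated input
    ' "$1"
}

tmp=$(mktemp)
printf '>a\nAAAA\n>b\nCCCCGGGG\n' > "$tmp"
if is_single_line_fasta "$tmp"; then
    echo "single-line FASTA"
else
    echo "still wrapped"
fi
rm -f "$tmp"
```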
    
    

    Partly adapted from:
    https://stackoverflow.com/questions/15857088/remove-line-breaks-in-a-fasta-file
    https://unix.stackexchange.com/questions/193246/how-can-i-detect-that-not-enough-options-were-passed-with-getopts
