test8: Extracting a sub-FASTA (sub_fa) for a given protein list from a full FASTA file

Author: 夕颜00 | Published: 2020-11-27 14:05

    Input files:
    1. Full FASTA file: rename.fa

    >Gene.2::TRINITY_DN1000_c0_g1::g.2::m.2
    SKYKYAIQRPDFSADDFESFITKHTGISVKSKGSAALDSNSLIKAITGIFAVISIITFVI
    YYHWEKFHHFIYGQNYRHISVHDTIQTGYGRALLQHIIPYQLSRRKMAIAI
    >Gene.3::TRINITY_DN10027_c0_g1::g.3::m.3
    LYLHGLLSPIFPNIGATLNAVAEKPAFTLPAPLMSRLGRRPLLVGTMWLSAVFCVFGSLT
    GKMATATLVQMVCGVLGIFGMAAAFNLLLIYTAELFPTAVRNAALGCVQQAVHFGAIAAP
    MVVMTGGGVALGVFGVCGMVGGVLAVYLPETWNKPLYDTMAGMEEGEKGMVGV*
    >Gene.5::TRINITY_DN10030_c0_g1::g.5::m.5
    RLINMYAAAARSSQVIPPPPRVSAASAAARSSPAIPPPPHTTADRSSPAIPPPPPATAPS
    DSDFTSDSSDSDSPAPNSALHDSILSAYLRTSASSSPDLAKIRSFLSSSVSCCLICLVRI
    RPTDAVWSCSASCHALF
    

    2. Protein list (one ID per line, without the leading ">"): pro_list.txt

    Gene.2::TRINITY_DN1000_c0_g1::g.2::m.2
    Gene.3::TRINITY_DN10027_c0_g1::g.3::m.3
    Gene.5::TRINITY_DN10030_c0_g1::g.5::m.5
    Gene.6::TRINITY_DN1005_c0_g2::g.6::m.6
    

    3. Output file: sub.fa

    A FASTA file containing only the records of the proteins named in the list.
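To make the expected result concrete, here is a minimal sketch that applies the same kind of selection to shortened, in-memory stand-ins for the two inputs. The function name `select_records` and the truncated sequences are made up for illustration and are not part of the script below. It assumes, as in the examples above, that each header line consists of the protein ID only.

```python
def select_records(fasta_text, wanted_ids):
    """Return the FASTA records whose header line is in wanted_ids, in file order."""
    wanted = set(wanted_ids)
    kept = []
    # Every record starts with ">", so splitting on it yields one chunk per record.
    # The chunk before the first ">" is empty and is skipped.
    for chunk in fasta_text.split(">"):
        if not chunk.strip():
            continue
        header = chunk.split("\n", 1)[0].strip()
        if header in wanted:
            kept.append(">" + chunk)
    return "".join(kept)


# Shortened, hypothetical stand-ins for rename.fa and pro_list.txt
fasta_text = (
    ">Gene.2::TRINITY_DN1000_c0_g1::g.2::m.2\nSKYKYAIQRP\n"
    ">Gene.3::TRINITY_DN10027_c0_g1::g.3::m.3\nLYLHGLLSPI\n"
    ">Gene.5::TRINITY_DN10030_c0_g1::g.5::m.5\nRLINMYAAAA\n"
)
wanted_ids = [
    "Gene.3::TRINITY_DN10027_c0_g1::g.3::m.3",
    "Gene.6::TRINITY_DN1005_c0_g2::g.6::m.6",  # not present in fasta_text
]

sub_fa = select_records(fasta_text, wanted_ids)
print(sub_fa, end="")
```

In this sketch an ID that has no matching record (the `Gene.6` entry here) is skipped silently, and the records come out in the order of the FASTA input rather than the order of the list.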

    II. Code

    #!/usr/bin/env python3
    # -*- coding: utf-8 -*-
    # Usage: python3 sub_fa.py -fa rename.fa -list pro_list.txt -o sub.fa
    
    import argparse
    
    
    #if len(sys.argv) !=3 :
    #    print("enter 'python3 sub_fa.py -h' for help")
    #    sys.exit()
    
    
    parser = argparse.ArgumentParser(description='Extract the FASTA records of a given protein list from a full FASTA file')
    parser.add_argument('-fa', '--fasta', type=str, required=True, help='full FASTA file, e.g. rename.fa')
    parser.add_argument('-list', '--prolist', type=str, required=True, help='protein ID list, one ID per line, e.g. pro_list.txt')
    parser.add_argument('-o', '--output', type=str, required=True, help='path of the output FASTA file, e.g. sub.fa')
    args = parser.parse_args()
    
    faAll = args.fasta
    pro_list = args.prolist
    fa_w = args.output
    
    #faAll = "E:/Script/python/test8/rename.fa"
    #pro_list = "E:/Script/python/test8/pro_list.txt"
    #fa_w = "E:/Script/python/test8/sub.fa"
    
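    # Read the wanted protein IDs into a set; blank lines are ignored so that
    # an empty ID can never match the empty chunk before the first ">"
    with open(pro_list, "r") as pro_1:
        pro_1_list = {i.strip() for i in pro_1 if i.strip()}

    # Split the whole FASTA text into records on ">" and write out every
    # record whose header line is one of the wanted IDs
    with open(faAll, "r") as all_fa, open(fa_w, "w") as fa_1_w:
        ls = all_fa.read().split(">")
        for line in ls:
            name = line.split("\n")[0].strip()
            if name in pro_1_list:
                fa_1_w.write(">" + line)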
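The script above reads the whole FASTA file into memory and does not say which IDs from the list were never found. For larger files, a line-by-line variant may be preferable. The following is only a sketch under stated assumptions: the function name `extract_streaming` is hypothetical, and it takes the record ID to be the header text up to the first whitespace. For headers like the ones shown above, which contain no whitespace, that is the same as matching the full header line.

```python
import io
import sys


def extract_streaming(fasta_handle, wanted_ids, out_handle):
    """Copy records whose ID is in wanted_ids from fasta_handle to out_handle.

    Returns the set of wanted IDs that were never seen in the FASTA input.
    """
    wanted = set(wanted_ids)
    found = set()
    keep = False  # whether lines of the current record should be copied
    for line in fasta_handle:
        if line.startswith(">"):
            # Assumption: the ID is the header text up to the first whitespace
            fields = line[1:].split()
            record_id = fields[0] if fields else ""
            keep = record_id in wanted
            if keep:
                found.add(record_id)
        if keep:
            out_handle.write(line)
    return wanted - found


# Small in-memory demonstration; real use would pass open file objects instead
fasta = io.StringIO(">a desc\nMKV\nLLT\n>b\nGGA\n>c\nPPQ\n")
out = io.StringIO()
missing = extract_streaming(fasta, ["a", "c", "d"], out)
sys.stdout.write(out.getvalue())
print("missing:", sorted(missing), file=sys.stderr)
```

With real files, the two `io.StringIO` objects would be replaced by `open(faAll)` and `open(fa_w, "w")` inside a `with` statement, and the returned set could be printed as a warning for IDs such as those absent from the FASTA file.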
    


Article link: https://www.haomeiwen.com/subject/vgjxwktx.html