I. Input files:
1. Full FASTA file: rename.fa
>Gene.2::TRINITY_DN1000_c0_g1::g.2::m.2
SKYKYAIQRPDFSADDFESFITKHTGISVKSKGSAALDSNSLIKAITGIFAVISIITFVI
YYHWEKFHHFIYGQNYRHISVHDTIQTGYGRALLQHIIPYQLSRRKMAIAI
>Gene.3::TRINITY_DN10027_c0_g1::g.3::m.3
LYLHGLLSPIFPNIGATLNAVAEKPAFTLPAPLMSRLGRRPLLVGTMWLSAVFCVFGSLT
GKMATATLVQMVCGVLGIFGMAAAFNLLLIYTAELFPTAVRNAALGCVQQAVHFGAIAAP
MVVMTGGGVALGVFGVCGMVGGVLAVYLPETWNKPLYDTMAGMEEGEKGMVGV*
>Gene.5::TRINITY_DN10030_c0_g1::g.5::m.5
RLINMYAAAARSSQVIPPPPRVSAASAAARSSPAIPPPPHTTADRSSPAIPPPPPATAPS
DSDFTSDSSDSDSPAPNSALHDSILSAYLRTSASSSPDLAKIRSFLSSSVSCCLICLVRI
RPTDAVWSCSASCHALF
2. List of target proteins (one ID per line): pro_list.txt
Gene.2::TRINITY_DN1000_c0_g1::g.2::m.2
Gene.3::TRINITY_DN10027_c0_g1::g.3::m.3
Gene.5::TRINITY_DN10030_c0_g1::g.5::m.5
Gene.6::TRINITY_DN1005_c0_g2::g.6::m.6
3. Output file: sub.fa
A FASTA file containing only the proteins named in the list.
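Because a listed ID may have no matching record in the FASTA file, it can help to compare the output against the list afterwards. The following is only a minimal sketch of such a check; the function names `read_headers` and `missing_ids` are hypothetical and are not part of the original script. It assumes, as the script does, that the whole header line (without ">") is the protein ID.

```python
def read_headers(fasta_path):
    """Return the set of header lines (without the leading '>') in a FASTA file."""
    headers = set()
    with open(fasta_path, "r") as handle:
        for line in handle:
            if line.startswith(">"):
                headers.add(line[1:].strip())
    return headers


def missing_ids(list_path, fasta_path):
    """Return the IDs from list_path, in list order, that have no record in fasta_path."""
    found = read_headers(fasta_path)
    with open(list_path, "r") as handle:
        # Skip blank lines so a trailing newline is not reported as a missing ID
        wanted = [line.strip() for line in handle if line.strip()]
    return [name for name in wanted if name not in found]
```

Calling `missing_ids("pro_list.txt", "sub.fa")` returns an empty list when every listed protein was written to the output.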
II. Code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Usage: python3 sub_fa.py -fa rename.fa -list pro_list.txt -o sub.fa
import argparse

# All three arguments are required; run 'python3 sub_fa.py -h' for help
parser = argparse.ArgumentParser(description='Extract the FASTA records named in a protein list')
parser.add_argument('-fa', '--fasta', type=str, required=True, help='FASTA file containing all sequences')
parser.add_argument('-list', '--prolist', type=str, required=True, help='text file with one protein ID per line')
parser.add_argument('-o', '--output', type=str, required=True, help='path of the FASTA file to write')
args = parser.parse_args()
faAll = args.fasta
pro_list = args.prolist
fa_w = args.output
#faAll = "E:/Script/python/test8/rename.fa"
#pro_list = "E:/Script/python/test8/pro_list.txt"
#fa_w = "E:/Script/python/test8/sub.fa"
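# Read the wanted protein IDs into a set for fast lookup, skipping blank lines
with open(pro_list, "r") as pro_1:
    pro_1_set = {i.strip() for i in pro_1 if i.strip()}
# print(pro_1_set)
# Split the FASTA text on ">" so that each chunk is one record (header + sequence)
with open(faAll, "r") as all_fa, open(fa_w, "w") as fa_1_w:
    ls = all_fa.read().split(">")
    for line in ls:
        # The first line of each record is its header, i.e. the protein ID
        name = line.split("\n")[0].strip()
        if name in pro_1_set:
            fa_1_w.write(">" + line)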
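The script above reads the entire FASTA file into memory before splitting it. For very large files, the same filter could be written as a single line-by-line pass. This is a sketch of that alternative, not the original script's method; `extract_records` is a hypothetical name, and `wanted` is assumed to be a set of IDs that match the full header line.

```python
def extract_records(fasta_path, wanted, out_path):
    """Copy records whose header is in `wanted` from fasta_path to out_path.

    Returns the number of records written.
    """
    written = 0
    keep = False  # whether the record currently being read should be copied
    with open(fasta_path, "r") as fin, open(out_path, "w") as fout:
        for line in fin:
            if line.startswith(">"):
                # A header line starts a new record; decide once per record
                keep = line[1:].strip() in wanted
                if keep:
                    written += 1
            if keep:
                fout.write(line)
    return written
```

Records are written in the order they appear in the FASTA file, which is also what the script above does.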