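Extract mRNA records and their exons based on the feature type in column 3
Approach:
Read one line at a time; when a line has the mRNA feature type, write it out.
Then check whether the next line has the mRNA or exon feature type; if it does, write it too and recurse to check the line after that.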
import re
import sys

sys.setrecursionlimit(1000000)  # raise the maximum recursion depth


def autoNext(file, out):
    # Read the next line; keep recursing while it is an mRNA or exon feature.
    content = next(file)
    if re.search(r"\tmRNA\t", content) or re.search(r"\texon\t", content):
        out.write(content)
        return autoNext(file, out)


# Open both files in one with-statement so the output is closed even on errors.
with open("genome.gff", "r") as gff, open("mRNA.tmp.gff", "w") as outGFF:
    try:
        while True:
            line = next(gff)
            if re.search(r"\tmRNA\t", line):
                outGFF.write(line)
                autoNext(gff, outGFF)
                outGFF.flush()  # flush the write buffer promptly
    except StopIteration:  # next() raises this once the last line has been read
        pass
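The recursive helper adds one stack frame for every consecutive mRNA or exon line, which is why the recursion limit has to be raised, and a very long run could still exhaust the interpreter's stack. As a minimal sketch of an alternative, the same selection can be written as a loop with a single flag. The names extract_mrna_exon and filter_gff are hypothetical, and the sketch assumes the feature type is read from the third tab-separated column rather than matched anywhere in the line as the regex does.

```python
def extract_mrna_exon(lines):
    """Yield each mRNA line and the exon lines that directly follow it."""
    in_block = False  # True while inside a run that was started by an mRNA line
    for line in lines:
        fields = line.split("\t")
        # Column 3 holds the feature type; comment or malformed lines have fewer columns.
        feature = fields[2] if len(fields) > 2 else None
        if feature == "mRNA":
            in_block = True
            yield line
        elif feature == "exon" and in_block:
            yield line
        else:
            in_block = False  # any other line ends the current run


def filter_gff(in_path, out_path):
    """Copy the selected lines from in_path to out_path (hypothetical helper)."""
    with open(in_path) as gff, open(out_path, "w") as out:
        out.writelines(extract_mrna_exon(gff))
```

Calling filter_gff("genome.gff", "mRNA.tmp.gff") would use the same file names as the script. Like the recursive version, this ends a run at the first line that is neither mRNA nor exon, so exons that come after an interleaved CDS line are skipped.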