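Goal: split a multi-sequence FASTA file into separate files, one per sequence, each named after the sequence title.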
The code is below:
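#!/usr/bin/env python
import sys

fname = sys.argv[1]  # path of the multi-sequence FASTA file to split
wname = None  # handle of the output file currently being written
with open(fname) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Split on whitespace; element 0 of the list is the sequence title
            a = line.split()
            name = a[0][1:]  # the title without the leading ">"
            print(name)
            if wname is not None:
                wname.close()  # finish the previous sequence's file
            wname = open(name, "w")  # open one file per sequence, named after its title
            wname.write(a[0] + "\n")  # write the title line once
        elif wname is not None:
            wname.write(line + "\n")  # write each sequence line on its own line
if wname is not None:
    wname.close()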
Result: [image: screenshot of the run output]
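One problem I ran into here: the file name should not contain the greater-than sign. It is a redirection operator in the shell and is not allowed in Windows file names, which is why the code strips it from the title with a[0][1:] before opening the file.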
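As a minimal sketch of that precaution, the helper below turns a FASTA title line into a file name. The function name `safe_name` and the set of characters treated as unsafe are my own assumptions, not part of the script above; it drops the leading `>` and replaces a few other characters that commonly cause trouble in file names with underscores.

```python
import re


# Hypothetical helper (not in the original script): build a file name
# from a FASTA title line such as ">KI270539.1 dna:scaffold ...".
def safe_name(title_line):
    # Keep only the first whitespace-separated token, i.e. the sequence ID.
    seq_id = title_line.strip().split()[0]
    # Drop the leading ">" that marks a FASTA title line.
    seq_id = seq_id.lstrip(">")
    # Assumption: these characters are awkward in file names on common
    # systems (path separators, redirection, wildcards), so replace them.
    return re.sub(r'[<>:"/\\|?*]', "_", seq_id)


print(safe_name(">KI270539.1 dna:scaffold scaffold:GRCh38:KI270539.1:1:993:1 REF"))
# prints: KI270539.1
```

The helper expects a non-empty title line; an empty string would raise an IndexError, so blank lines should be skipped before calling it.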
I extracted a piece of FASTA data from the human genome myself, shown below:
>KI270539.1 dna:scaffold scaffold:GRCh38:KI270539.1:1:993:1 REF
GCCCCACGTCCGGGAGGGAGGTGGGGGTGTCAGCCCCCCACCAGGCCAGCCGCCCCGTCC
GGGAGGGAGGTGGGGTCAGACCCCCGCCCGGCCAGCCGCCCCGTCCGGGAAGGGAGGGGC
GTCTCTGCCCAGCCACCCCTACTGGGAAGTGAGGAGCTCCTCTGCCGGGCCAGCCACCCC
GTCCGGGAGGGAGGTGGGGGGCTCAGCCCCCCGCCCGGCCAGCCGCCCCGTCCGGGAGGG
AGGTGGGGGGGCCAGCCCCCCGCCCGGCCAGCCACCCCGTCCGGGGAGTGAGGGGCGCCT
CTGCCCGGCCGCCCCTACTGGGAAGTGAGGAACTCCTCTGTCCGGCCAGCCGCCCCGTCC
GGGAGGGAGGTGGAGGGGTCAGCCCCCCCGCCCGGCCAGACGCCCCGTCTGGGAGGGAGG
TGGGGGGGTCAGCCCCACGTCCGGGAGGGAGGTGTGGGGGGGTCAGCCCCCTGCCAGGCC
AGCCGCCCCGTCCGGGAGGGAGGTGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCCGTCCG
GGAGGTGAGGGGCGCCTCTGCCCAGCCGCCCCTACTGGGAAGTGAGGAGCCCCTCTGCCG
GGCCAGCCACCCCGTCCGGGAGGGAGGTAGGGGGCTCAGCCCCCCGCCCGGCCAGCCGCC
CCATCCGGGAGGGAGGTGGGGGGTCAGCCCCCCGCCCGGCCAGCCACCCCGTCCGGGGGG
TGAGGGGCGCCTCTGCCCGGCCGCCCCTACTGGGAAGTGAGGAACCCCTCTGCCCGGCCA
GCCGCCCCCTCCGGGAGGGAGGTGGGGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCTGTC
TGGGAGGGAGGTGGGGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCCATGCGGGAGGTGAG
GGGCGCCTCTGCCTGGCCGCCCCTACTAGGAAGTGAGGCGCCCCGCTGCCCGGCCAGCCG
CCCCGTCCGGGAGGGAGGTGGGGGGTCAGCCCT
>KI270385.1 dna:scaffold scaffold:GRCh38:KI270385.1:1:990:1 REF
TTTCATAGAGCATGTTTGAAACACTCTTTCTGTAGTATCTGAAAACGGACATTTCAAGCG
CTTTCAGGCCTATGGTAAGAAAGGAAATATCTTCAAATAAAAACTAGAGAGAAGCATTCT
CAGAAACTTATTTGCGATGTGTGTCCTCAACTAACAGAGTTGAACCTTTGTTCTGATACA
ACATTTTGGAAACACTCTTTTTGTAGAATCTGCAAGTGGATAATTGGATAGCTTTGAAGG
TTTCGTTGGAAACGGGAATATCTTCATATAAAATCAAGACAGAAGCATTCTCAGAAACTT
CTCTGTGATGTTTGCATTCAACTCATAGAGTTGAACACTTCCCTTCATAGAGCAGGTTTG
AAACACTCTTTTTGTAATATTTGGAAGTGGACATTTGCAGCGCTTTGAGGCCTATGTTGA
AAAAGGAAATATCTTCTCCTGAAAACCAGACAGAAGCATTCTCAGAAACTTCCTTGTGAT
GTGTGTACTCAAGTAACAGAGTTGAACCTTACTTTTGACAGAGCCGTTTTGAAACAGTCT
TTTTGTAGAATCTGGAAGTAGATATTTGGACACCTTTGAGGATTTCTTTGGAAACGGGAT
ATCTTCATATAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATGTCCTC
AATTAGCAGAGTTGAACCTTTGTGTGGATACAGCATTTTGGAAACACCCCTTTAGTAGGA
TATGCAAGTTGATATTTAGATAACTAGGAAGATTTCCTTGGAAACGGAATATCTTCATAT
AAATCTAGACGGAAGCATTCTCAGAAAGTGCTTTGTGATGTTTGCATTCAAGTCACAGAG
TTGATATTCCCTTTTATAGAGCAGGTTTGAAACACTCTTTCTGCACTTACCTGAAGAAGA
CTTTTGCAGCGCTTTGAGGCCTATGTTGAAAAAGGAATATCTTCCCATAAAACTAAACAG
AGCATTCTCAGAAACTTGTTGTGATGTGTG
>KI270423.1 dna:scaffold scaffold:GRCh38:KI270423.1:1:981:1 REF
AGATTTCGTTGGAACGGGATAAACTTCCCAGAACTACACGGAATCATTCTCAAAAACTTC
ATTGTGATGTTTGCATTCAACTCACAGAGTTGAACCTTGCTTTCATAGTTCAGCTTTCAA
ACACTCTTTTTGTAGAATCTGCTAGTGGATATTTGGACCACTTTGTGGCCTTCCTTCGAA
ACGGGTATATCTTCACATCAAACCTAGACAGAAGCATTCTCAGAATGTTTCCTGTGATGA
CTGCATTCAACTCACAGAGGTGAGCAATCCTGTTGATGGAGCAGTTTTGAAACTCTCTTT
CTTTGGAATCTGCAAGTGGATGTGTGGACCTCTTTGAAGATTTCGTTGGAAACAGGTTCT
TCTTCACAGAAAAACTAAACAGAAGCATTCTCAGAAACTACTTTATGACGTTTGTGTTCA
ACTTGCAGAGTGAAATTTCCTCTTGACAGAGCAGCTATGAAACATTGCTTTTCTTGAATC
TGCAAGTGGACATTTGGAGGGCTTTGAGGCCTGTGGCGGAAACGTTAATATCTGCATATA
AAAACTAGATAGAAGCATTCTGAGAATCTACTTTATGATGATTGCATTCGACTCACAGAG
TTGAACCTTCCAATGGATAGAGCAGTTTGTAAACACTCTTTTTGTAGAATCTGTGATTGC
TGATTTGGACTGCATTGAGGCCTACGGTACTAAAGGAAATAACTTCACCTAAAATCCAAA
CGGAAGCATTCACAGAAAATTCTTTGTGATGATTGGATTGAACTAAGAGAGCTGAACATT
CCTTTAGATGGCGCAGTTTCCAAACACACTTTCTGTAGAATCTGCAGGTGGATATTTGGA
CCTCTCTGAGGATTTCGTTGGAAATGGGATAAACTTCCCAGAACTACACGGAAGCATTCT
CAGAAACTTCTTTGTGATGTTTGCATTCACTCACAGAGTTGAACCTTGCTTTCATAGTTC
AGCTTTCAAACACTCTTTTTG
>KI270392.1 dna:scaffold scaffold:GRCh38:KI270392.1:1:971:1 REF
ATTTGCGATGTGTGTTCTCAACTAACAGAGTTGAACCTTTGTTTTGATGCAGCATTTTGG
AAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGCTTTGAAGGTTTCGTTGG
AAACGGGAATATCTTCATATAAAATCAATACAGAAGCCTTCTCAGAAACTTCTCTGTGAT
GTTTGCATTGAACTCACAGAGTTGAACACTTCCTTTCATAGAGCTGGTTTGAAATACTCT
TTTTGTAATATTTGGAAGTGGACATTGGCAGCGCTTTGAAGCCTGTGGTGAAAAAGGAGA
TATCTTCTCCTAAAAACCATACAGAAGCATTCTCAGAATCTTTCTTGTGATGTGTGTACT
CAAGTAACACAGTTGAACCTTCAATTTGACAGAGCAGTTTTGAAGCACTCTTTTTGTAGA
ATCTGCAAGTGGATATTTTGATACCTTTGAGGATTTCGTTGGACACTGGATATCTTCATA
TAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATGTCCTCAATTAACAG
AGTTGAACCTTTGTTTCGATACAGCATTTTGGAAACATTCCTTTAGTAGAATCTGCAAGT
TGATATTCAGATAGCTAGGAAGATTTCCTTGGAAACGGGAATATCTTCATATAAAATCTA
GACGGAAGCATTCTCAGAAAGTGCTTTGTGATGTTTGCATTCAAGTCACAGAGTTGAATA
TTCCCTTTCATAGAGTAGGTTTGAAACACTCTTTCTGTAGTATCTGCAAGTGGACGTTTC
AAGCGCTTTCAGGCCTGTGGTGAAAAAGGAAATATCTTCAAATAAAAATTAGACAGAAGC
ATTCTCAGAAACTTATTTGCGATGTGTGTTCTCAACTAACAGAGTTGAACATTTGTTTTG
ATACAGCATTTTGGAAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGCTTT
GAAGTTTTCGT
>KI270394.1 dna:scaffold scaffold:GRCh38:KI270394.1:1:970:1 REF
AAGTGGATATTTGGATAGCTTTGAGGATTTCGTTGGAAACGGGATTACATATAAAATCTA
GAGAGAAGCATTCTCAGGAACTTCTTTGTGATGTTTGCATTCAAGTCACAGAACTGAACA
TTCCCTTTCATAGAGCAGGTTTGAAACACTCTTTCTGTAGTATCTGCAAACGGACATTTC
ATACGCTTTCAGGCCTATGGTGAGAAAGGAAATATCTTCAAATAAAAACTAGACAGAAGC
ATTCTCAGAAACTTATTTGCGATGTGTGTCCTCAACTAACAGAGTTGAACCTTTGTTTTG
ATACAGCATTTTGGAAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGGTTT
GAAGGTTTCGTTGGAAACGGGAATATCTTCATATAAAATCAACACAGAAGCATTCTCAGA
AACTTCTCTGCGATGTTTGCATTCAACTCATAGAGTTGAACACTTCCTTTCATAGAGCTG
GTTTGAAATACTCTTTTTGTAATATTTGGAAGTGGACATTGGCAGCGCTTTGAAGCCTAT
GTTGAAAATGGAAATATCTTCTCCTAAAAACCAGACAGAAGCATTCTCAGAAACTTCCTT
GTGATGTGTGTACTCAAGTAACAGAGTTGAACCTTACTTTTGACAGAGCCGTTTTGAAAC
AGTCTTTTTGTAGAATCTGGAAGTAGATATTTGGATACCTTTGAGGATTTCTTTGGAAAC
GGGATATCTTCATATAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATG
TCCTCAATTAACAGAGTTGAACCTTTGTGTGGATACAGCATTTTGGAAACACTCCTTTAG
TAGAATCTGCAAGTTGATACTTAGATAGGAAGATTTCCTTGGAAACGGGAATATCTTCAT
ATAAAATCTAGACGGAAGCATTCTCGGAAACTTCTTTGTGCTGTATGTCCTCAATAACAG
AGTTGAACCT
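The title lines above carry a field such as `scaffold:GRCh38:KI270539.1:1:993:1`, whose fourth and fifth parts appear to be the start and end coordinates (1 to 993 for the first record, which matches its 993 bases). One way to sanity-check a split is to compare that span with the number of bases actually read. The sketch below is only an illustration: `read_fasta` and `declared_length` are hypothetical helper names, and the parsing assumes headers shaped exactly like the ones shown here.

```python
def read_fasta(lines):
    """Yield (title_line, sequence) pairs from an iterable of FASTA lines."""
    title, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if title is not None:
                yield title, "".join(chunks)  # emit the previous record
            title, chunks = line, []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)  # emit the last record


def declared_length(title):
    """Return end - start + 1 from a field like 'scaffold:GRCh38:ID:1:993:1'.

    Assumption: the third whitespace-separated token has the form
    type:assembly:name:start:end:strand, as in the headers shown above.
    """
    fields = title.split()[2].split(":")
    start, end = int(fields[3]), int(fields[4])
    return end - start + 1


# Tiny made-up record in the same header style (not real genome data).
demo = [
    ">demo.1 dna:scaffold scaffold:TEST:demo.1:1:10:1 REF",
    "ACGTAC",
    "GTAC",
]
for title, seq in read_fasta(demo):
    # Print the ID, the number of bases read, and the length the title declares.
    print(title.split()[0][1:], len(seq), declared_length(title))
# prints: demo.1 10 10
```

To check a real file, pass an open file handle instead of the list, for example `read_fasta(open(fname))`; a mismatch between the two numbers would suggest that lines were lost or duplicated.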