Python in Practice

Author: M78_a | Published: 2020-04-05 14:50

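Goal: split a multi-record FASTA file into one file per sequence, each named after its sequence title.

[image.png]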

The code:

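#!/usr/bin/env python
import sys

fname = sys.argv[1]  # path to the input FASTA file
wname = None  # output file for the record currently being written
with open(fname) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Split on whitespace; element 0 of the list is the sequence title
            title = line.split()[0]
            print(title[1:])  # print the title without the leading ">"
            if wname is not None:
                wname.close()  # finish the previous record's file
            # Each title line opens a new file named after that title
            wname = open(title[1:], "w")
            wname.write(title + "\n")  # write the title line once per file
        elif wname is not None:
            wname.write(line + "\n")  # write each sequence line on its own line
if wname is not None:
    wname.close()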

Result:

[image.png]

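One problem I ran into: the file name should not contain the greater-than sign. Windows forbids ">" in file names, and in a Unix shell it means output redirection, so the leading ">" has to be stripped from the title before it is used as a file name.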
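
A minimal sketch of that clean-up, assuming the file name should be the first whitespace-separated token of the title line. The helper name `safe_filename` and the rule of replacing every character outside letters, digits, `.`, `_` and `-` with `_` are my own illustrative choices, not part of the script above:

```python
import re

def safe_filename(header):
    """Turn a FASTA title line into a file name (hypothetical helper)."""
    # First token of the title line, e.g. ">KI270539.1"
    token = header.strip().split()[0]
    # Drop the leading ">" that marks a FASTA title line
    name = token.lstrip(">")
    # Replace anything outside a conservative character set (an assumption)
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)

print(safe_filename(">KI270539.1 dna:scaffold scaffold:GRCh38:KI270539.1:1:993:1 REF"))
```

Titles that contain characters such as `|` or `:` in their first token would be mapped to underscores under this rule, which may or may not be what you want.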

I extracted a few FASTA sequences from the human genome (GRCh38) myself to use as input:

>KI270539.1 dna:scaffold scaffold:GRCh38:KI270539.1:1:993:1 REF
GCCCCACGTCCGGGAGGGAGGTGGGGGTGTCAGCCCCCCACCAGGCCAGCCGCCCCGTCC
GGGAGGGAGGTGGGGTCAGACCCCCGCCCGGCCAGCCGCCCCGTCCGGGAAGGGAGGGGC
GTCTCTGCCCAGCCACCCCTACTGGGAAGTGAGGAGCTCCTCTGCCGGGCCAGCCACCCC
GTCCGGGAGGGAGGTGGGGGGCTCAGCCCCCCGCCCGGCCAGCCGCCCCGTCCGGGAGGG
AGGTGGGGGGGCCAGCCCCCCGCCCGGCCAGCCACCCCGTCCGGGGAGTGAGGGGCGCCT
CTGCCCGGCCGCCCCTACTGGGAAGTGAGGAACTCCTCTGTCCGGCCAGCCGCCCCGTCC
GGGAGGGAGGTGGAGGGGTCAGCCCCCCCGCCCGGCCAGACGCCCCGTCTGGGAGGGAGG
TGGGGGGGTCAGCCCCACGTCCGGGAGGGAGGTGTGGGGGGGTCAGCCCCCTGCCAGGCC
AGCCGCCCCGTCCGGGAGGGAGGTGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCCGTCCG
GGAGGTGAGGGGCGCCTCTGCCCAGCCGCCCCTACTGGGAAGTGAGGAGCCCCTCTGCCG
GGCCAGCCACCCCGTCCGGGAGGGAGGTAGGGGGCTCAGCCCCCCGCCCGGCCAGCCGCC
CCATCCGGGAGGGAGGTGGGGGGTCAGCCCCCCGCCCGGCCAGCCACCCCGTCCGGGGGG
TGAGGGGCGCCTCTGCCCGGCCGCCCCTACTGGGAAGTGAGGAACCCCTCTGCCCGGCCA
GCCGCCCCCTCCGGGAGGGAGGTGGGGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCTGTC
TGGGAGGGAGGTGGGGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCCATGCGGGAGGTGAG
GGGCGCCTCTGCCTGGCCGCCCCTACTAGGAAGTGAGGCGCCCCGCTGCCCGGCCAGCCG
CCCCGTCCGGGAGGGAGGTGGGGGGTCAGCCCT
>KI270385.1 dna:scaffold scaffold:GRCh38:KI270385.1:1:990:1 REF
TTTCATAGAGCATGTTTGAAACACTCTTTCTGTAGTATCTGAAAACGGACATTTCAAGCG
CTTTCAGGCCTATGGTAAGAAAGGAAATATCTTCAAATAAAAACTAGAGAGAAGCATTCT
CAGAAACTTATTTGCGATGTGTGTCCTCAACTAACAGAGTTGAACCTTTGTTCTGATACA
ACATTTTGGAAACACTCTTTTTGTAGAATCTGCAAGTGGATAATTGGATAGCTTTGAAGG
TTTCGTTGGAAACGGGAATATCTTCATATAAAATCAAGACAGAAGCATTCTCAGAAACTT
CTCTGTGATGTTTGCATTCAACTCATAGAGTTGAACACTTCCCTTCATAGAGCAGGTTTG
AAACACTCTTTTTGTAATATTTGGAAGTGGACATTTGCAGCGCTTTGAGGCCTATGTTGA
AAAAGGAAATATCTTCTCCTGAAAACCAGACAGAAGCATTCTCAGAAACTTCCTTGTGAT
GTGTGTACTCAAGTAACAGAGTTGAACCTTACTTTTGACAGAGCCGTTTTGAAACAGTCT
TTTTGTAGAATCTGGAAGTAGATATTTGGACACCTTTGAGGATTTCTTTGGAAACGGGAT
ATCTTCATATAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATGTCCTC
AATTAGCAGAGTTGAACCTTTGTGTGGATACAGCATTTTGGAAACACCCCTTTAGTAGGA
TATGCAAGTTGATATTTAGATAACTAGGAAGATTTCCTTGGAAACGGAATATCTTCATAT
AAATCTAGACGGAAGCATTCTCAGAAAGTGCTTTGTGATGTTTGCATTCAAGTCACAGAG
TTGATATTCCCTTTTATAGAGCAGGTTTGAAACACTCTTTCTGCACTTACCTGAAGAAGA
CTTTTGCAGCGCTTTGAGGCCTATGTTGAAAAAGGAATATCTTCCCATAAAACTAAACAG
AGCATTCTCAGAAACTTGTTGTGATGTGTG
>KI270423.1 dna:scaffold scaffold:GRCh38:KI270423.1:1:981:1 REF
AGATTTCGTTGGAACGGGATAAACTTCCCAGAACTACACGGAATCATTCTCAAAAACTTC
ATTGTGATGTTTGCATTCAACTCACAGAGTTGAACCTTGCTTTCATAGTTCAGCTTTCAA
ACACTCTTTTTGTAGAATCTGCTAGTGGATATTTGGACCACTTTGTGGCCTTCCTTCGAA
ACGGGTATATCTTCACATCAAACCTAGACAGAAGCATTCTCAGAATGTTTCCTGTGATGA
CTGCATTCAACTCACAGAGGTGAGCAATCCTGTTGATGGAGCAGTTTTGAAACTCTCTTT
CTTTGGAATCTGCAAGTGGATGTGTGGACCTCTTTGAAGATTTCGTTGGAAACAGGTTCT
TCTTCACAGAAAAACTAAACAGAAGCATTCTCAGAAACTACTTTATGACGTTTGTGTTCA
ACTTGCAGAGTGAAATTTCCTCTTGACAGAGCAGCTATGAAACATTGCTTTTCTTGAATC
TGCAAGTGGACATTTGGAGGGCTTTGAGGCCTGTGGCGGAAACGTTAATATCTGCATATA
AAAACTAGATAGAAGCATTCTGAGAATCTACTTTATGATGATTGCATTCGACTCACAGAG
TTGAACCTTCCAATGGATAGAGCAGTTTGTAAACACTCTTTTTGTAGAATCTGTGATTGC
TGATTTGGACTGCATTGAGGCCTACGGTACTAAAGGAAATAACTTCACCTAAAATCCAAA
CGGAAGCATTCACAGAAAATTCTTTGTGATGATTGGATTGAACTAAGAGAGCTGAACATT
CCTTTAGATGGCGCAGTTTCCAAACACACTTTCTGTAGAATCTGCAGGTGGATATTTGGA
CCTCTCTGAGGATTTCGTTGGAAATGGGATAAACTTCCCAGAACTACACGGAAGCATTCT
CAGAAACTTCTTTGTGATGTTTGCATTCACTCACAGAGTTGAACCTTGCTTTCATAGTTC
AGCTTTCAAACACTCTTTTTG
>KI270392.1 dna:scaffold scaffold:GRCh38:KI270392.1:1:971:1 REF
ATTTGCGATGTGTGTTCTCAACTAACAGAGTTGAACCTTTGTTTTGATGCAGCATTTTGG
AAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGCTTTGAAGGTTTCGTTGG
AAACGGGAATATCTTCATATAAAATCAATACAGAAGCCTTCTCAGAAACTTCTCTGTGAT
GTTTGCATTGAACTCACAGAGTTGAACACTTCCTTTCATAGAGCTGGTTTGAAATACTCT
TTTTGTAATATTTGGAAGTGGACATTGGCAGCGCTTTGAAGCCTGTGGTGAAAAAGGAGA
TATCTTCTCCTAAAAACCATACAGAAGCATTCTCAGAATCTTTCTTGTGATGTGTGTACT
CAAGTAACACAGTTGAACCTTCAATTTGACAGAGCAGTTTTGAAGCACTCTTTTTGTAGA
ATCTGCAAGTGGATATTTTGATACCTTTGAGGATTTCGTTGGACACTGGATATCTTCATA
TAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATGTCCTCAATTAACAG
AGTTGAACCTTTGTTTCGATACAGCATTTTGGAAACATTCCTTTAGTAGAATCTGCAAGT
TGATATTCAGATAGCTAGGAAGATTTCCTTGGAAACGGGAATATCTTCATATAAAATCTA
GACGGAAGCATTCTCAGAAAGTGCTTTGTGATGTTTGCATTCAAGTCACAGAGTTGAATA
TTCCCTTTCATAGAGTAGGTTTGAAACACTCTTTCTGTAGTATCTGCAAGTGGACGTTTC
AAGCGCTTTCAGGCCTGTGGTGAAAAAGGAAATATCTTCAAATAAAAATTAGACAGAAGC
ATTCTCAGAAACTTATTTGCGATGTGTGTTCTCAACTAACAGAGTTGAACATTTGTTTTG
ATACAGCATTTTGGAAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGCTTT
GAAGTTTTCGT
>KI270394.1 dna:scaffold scaffold:GRCh38:KI270394.1:1:970:1 REF
AAGTGGATATTTGGATAGCTTTGAGGATTTCGTTGGAAACGGGATTACATATAAAATCTA
GAGAGAAGCATTCTCAGGAACTTCTTTGTGATGTTTGCATTCAAGTCACAGAACTGAACA
TTCCCTTTCATAGAGCAGGTTTGAAACACTCTTTCTGTAGTATCTGCAAACGGACATTTC
ATACGCTTTCAGGCCTATGGTGAGAAAGGAAATATCTTCAAATAAAAACTAGACAGAAGC
ATTCTCAGAAACTTATTTGCGATGTGTGTCCTCAACTAACAGAGTTGAACCTTTGTTTTG
ATACAGCATTTTGGAAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGGTTT
GAAGGTTTCGTTGGAAACGGGAATATCTTCATATAAAATCAACACAGAAGCATTCTCAGA
AACTTCTCTGCGATGTTTGCATTCAACTCATAGAGTTGAACACTTCCTTTCATAGAGCTG
GTTTGAAATACTCTTTTTGTAATATTTGGAAGTGGACATTGGCAGCGCTTTGAAGCCTAT
GTTGAAAATGGAAATATCTTCTCCTAAAAACCAGACAGAAGCATTCTCAGAAACTTCCTT
GTGATGTGTGTACTCAAGTAACAGAGTTGAACCTTACTTTTGACAGAGCCGTTTTGAAAC
AGTCTTTTTGTAGAATCTGGAAGTAGATATTTGGATACCTTTGAGGATTTCTTTGGAAAC
GGGATATCTTCATATAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATG
TCCTCAATTAACAGAGTTGAACCTTTGTGTGGATACAGCATTTTGGAAACACTCCTTTAG
TAGAATCTGCAAGTTGATACTTAGATAGGAAGATTTCCTTGGAAACGGGAATATCTTCAT
ATAAAATCTAGACGGAAGCATTCTCGGAAACTTCTTTGTGCTGTATGTCCTCAATAACAG
AGTTGAACCT
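
To check the split files, one option is to read a FASTA file back into a dictionary and look at the sequence lengths. The sketch below assumes every title is unique. `read_fasta` is a hypothetical helper name, and the demo writes a tiny made-up file rather than the sequences above:

```python
import os
import tempfile

def read_fasta(path):
    """Return {title: sequence} for a FASTA file (hypothetical helper)."""
    records = {}
    title = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # Key on the first token without the leading ">"
                title = line.split()[0][1:]
                records[title] = []
            elif title is not None:
                records[title].append(line)
    # Join the wrapped sequence lines into one string per record
    return {name: "".join(parts) for name, parts in records.items()}

# Demo on a small made-up file (not real genome data)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.fa")
with open(demo_path, "w") as out:
    out.write(">seq1 first record\nACGT\nAC\n>seq2 second record\nGGG\n")

lengths = {name: len(seq) for name, seq in read_fasta(demo_path).items()}
print(lengths)
```

For the records above, the title lines carry a field such as `1:993`. If that field is start:end, the length computed for KI270539.1 should come out as 993.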
