Python in Practice


Author: M78_a | Published: 2020-03-31 23:58

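Status so far: these notes are still fairly messy, so readers are better off going to 生信宝典, whose material is very well written. I have recently been working through its exercises.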

Goal: write the contents of a variable to a file

context = '''The best way to learn python contains two steps:
1. Remember basic things mentioned here masterly.
2. Practise with real demands.
'''
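Method 1:
fh = open("test_file.txt", "w")
print(context, file=fh)
fh.close()

# Read the file back; end="" avoids doubling the newline each line already ends with
with open("test_file.txt") as fh:
    for line in fh:
        print(line, end="")

# Alternatively, strip the surrounding whitespace and let print() add the newline
with open("test_file.txt") as fh:
    for line in fh:
        print(line.strip())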

Method 2: with open(...) as fh, then print
with open("test_file2.txt", "w") as fh:
    print(context, file=fh)
# No fh.close() is needed: the with block closes the file on exit
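For comparison, here is a minimal sketch of the same round trip using fh.write() instead of print(..., file=fh). The sample text and the file name demo_write.txt are hypothetical, chosen only for illustration. The point is that write() stores the string exactly as given, while print() appends one extra newline.

```python
import os
import tempfile

text = "line one\nline two\n"  # hypothetical sample content

# Work inside a temporary directory that is removed when the block exits
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_write.txt")  # hypothetical file name

    # fh.write() writes the string exactly as given
    with open(path, "w") as fh:
        fh.write(text)
    with open(path) as fh:
        written_exact = fh.read()

    # print(..., file=fh) appends one more newline after the string
    with open(path, "w") as fh:
        print(text, file=fh)
    with open(path) as fh:
        written_print = fh.read()

print(written_exact == text)
print(written_print == text + "\n")
```

If the extra newline from print() is unwanted, either use fh.write() or pass end="" to print().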

Goal: store FASTA sequences in a dict, and print each multi-line sequence as a single line

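#!/usr/bin/env python
import sys
fname = sys.argv[1]
with open(fname) as f:
    lines = f.readlines()

# First, store the sequences in a dict
aDict = {}
a = ""
b = ""
for i in lines:
    if i.startswith(">"):
        a = i.strip()
        b = ""  # start a new record; without this reset, sequences pile up across records
    else:
        b = b + i.strip()
        aDict[a] = b
print(aDict)

# Loop over the dict just built and print each record
for k, v in aDict.items():
    print(k)
    print(v)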
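A quick way to check a parser like this is to run it on a tiny in-memory example where the right answer is obvious. The sketch below is one possible way to package the logic as a function. The name read_fasta and the two sample records are made up for illustration and are not part of the original script.

```python
def read_fasta(lines):
    """Return {header: sequence}, joining each multi-line sequence into one string."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line
            records[header] = []  # a fresh list per record keeps sequences separate
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-record input, small enough to verify by eye
demo_lines = [">seq1 first\n", "ACGT\n", "TTGA\n", ">seq2 second\n", "GGCC\n"]
result = read_fasta(demo_lines)
print(result)
```

Each header should map only to its own sequence lines. If the second record also contained the letters of the first, the accumulator was not reset at the new header.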


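Goal: print the sequences in a FASTA file with 80 letters per line
Approach:
The FASTA sequences have already been stored in a dict. To print 80 letters per line, the natural tool is slicing a list or a string. Slicing needs index values, and here they follow a regular pattern: the end index is always 80 more than the start index. The range() function can generate exactly these start indices.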
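The idea can be tried out on a small string before touching a real file. In this sketch the 200-letter sequence is made up for illustration; range() produces the start index of each line and a slice takes up to 80 letters from there.

```python
seq = "ACGT" * 50  # hypothetical 200-letter sequence
width = 80

# Start indices of each output line: 0, 80, 160
starts = list(range(0, len(seq), width))
print(starts)

# A slice that runs past the end of the string is simply cut short,
# so the last chunk needs no special handling
chunks = [seq[n:n + width] for n in starts]
print([len(chunk) for chunk in chunks])
```

The last chunk holds the remaining 40 letters, which shows that the end index n + 80 may safely exceed the length of the string.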
Method 1: the dict values are lists.

#!/usr/bin/env python
import sys
fname = sys.argv[1]
with open(fname) as f:
    lines = f.readlines()
aDict = {}
for i in lines:
    if i.startswith(">"):
        name = i.rstrip()
        aDict[name] = []
    else:
        aDict[name].append(i.rstrip())

for a, b in aDict.items():
    print(a)
    c = ''.join(b)  # join the stored lines into one sequence string
    for n in range(0, len(c), 80):
        print(c[n:n+80])

Method 2: the dict values are strings. I find this one simpler.

#!/usr/bin/env python
# Store the FASTA sequences in a dict; the simplest approach to follow so far
import sys
fname = sys.argv[1]
with open(fname) as f:
    lines = f.readlines()
aDict = {}
a = ""
b = ""
for i in lines:
    if i.startswith(">"):
        a = i.strip()
        b = ""  # reset for the new record so sequences do not accumulate
    else:
        b = b + i.strip()
        aDict[a] = b

for k, v in aDict.items():
    print(k)
    for n in range(0, len(v), 80):
        print(v[n:n+80])


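Goal: write sortFasta.py, which reads test2.fa, takes the part of each original sequence name before the first space as the new name, and prints the records sorted by name
• Knowledge points used
– sort
– dict
– aDict[key] = []
– aDict[key].append(value)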

#!/usr/bin/env python
import sys
fname = sys.argv[1]
with open(fname) as f:
    lines = f.readlines()
aDict = {}
a = ""
b = ""
for i in lines:
    if i.startswith(">"):
        a = i.strip().split()[0]  # keep only the text before the first space
        b = ""  # reset for the new record so sequences do not accumulate
    else:
        b = b + i.strip()
        aDict[a] = b

for i in sorted(aDict):  # sorted() on a dict returns its keys in ascending order
    print(i)
    print(aDict[i])
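The two steps that matter here, shortening the name and sorting the keys, can be seen in isolation in the sketch below. It uses three header lines from the practice data at the end of these notes; the empty strings are placeholders standing in for the sequences.

```python
headers = [
    ">KI270539.1 dna:scaffold scaffold:GRCh38:KI270539.1:1:993:1 REF",
    ">KI270385.1 dna:scaffold scaffold:GRCh38:KI270385.1:1:990:1 REF",
    ">KI270423.1 dna:scaffold scaffold:GRCh38:KI270423.1:1:981:1 REF",
]

# split() with no argument splits on whitespace; [0] is the part before the first space
records = {header.split()[0]: "" for header in headers}  # "" is a placeholder sequence

# Iterating over a dict yields its keys, so sorted(records) returns the sorted key list
ordered = sorted(records)
print(ordered)
```

The keys are compared as strings, character by character, so the order is lexicographic rather than numeric. For these fixed-width IDs the two orders coincide.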

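  1. Extract the sequences with given names
    • Write grepFasta.py: extract from test2.fa the sequences whose names are listed in fasta.name, and print them to the screen.
    • Write grepFastq.py: extract from test1.fq the sequences whose names are listed in fastq.name, and write them to a file.
    – Knowledge points used
  • print(..., file=fh) (written as print >>fh in Python 2), or fh.write()
  • Modulo operation, 4 % 2 == 0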
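#!/usr/bin/env python
# Store the FASTA sequences in a dict; the simplest approach to follow so far
import sys
fname = sys.argv[1]
with open(fname) as f:
    lines = f.readlines()
aDict = {}
a = ""
b = ""
for i in lines:
    if i.startswith(">"):
        a = i.strip().split()[0]
        b = ""  # reset for the new record so sequences do not accumulate
    else:
        b = b + i.strip()
        aDict[a] = b
print(aDict)

# Print in key order by sorting the keys
"""
for i in sorted(aDict):
    print(i)
    print(aDict[i])
"""
"""
# Look up one key and print its value
key = input("Enter ID: ")
print(key)
print(aDict.get(key, "not exist"))
"""

# Write to a file
key = input("Enter ID: ")  # keys keep the leading ">", e.g. ">KI270539.1"
print(key)
print(aDict.get(key, "not exist"))  # print it to check
context = aDict.get(key, "not exist")  # assign the extracted sequence to a variable

with open("test_file3.txt", "w") as fh:  # create the file in write mode
    print(context, file=fh)  # write the content; the with block closes the file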

That said, it seems some of the listed knowledge points were not used.
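The modulo operation and fh.write() fit the FASTQ half of the exercise more naturally. Below is a rough sketch of how they might be combined, not a reference solution. It assumes a well-formed FASTQ in which every record is exactly 4 lines with no blank lines, and that the wanted names are given without the leading "@". The function name grep_fastq and the sample reads are hypothetical, and an in-memory buffer stands in for the output file.

```python
import io


def grep_fastq(lines, wanted, out):
    """Write every 4-line record whose name is in `wanted` to the file-like `out`."""
    keep = False
    for index, line in enumerate(lines):
        if index % 4 == 0:
            # Lines 0, 4, 8, ... are headers: "@name optional description"
            name = line.strip().split()[0][1:]  # drop the leading "@"
            keep = name in wanted
        if keep:
            out.write(line)  # lines already end with a newline


# Hypothetical two-record FASTQ input
fastq_lines = [
    "@read1 extra\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "TTGA\n", "+\n", "HHHH\n",
]
buffer = io.StringIO()  # stands in for a file opened with open(..., "w")
grep_fastq(fastq_lines, {"read2"}, buffer)
extracted = buffer.getvalue()
print(extracted, end="")
```

Counting lines with index % 4 is safer than testing line.startswith("@"), because a quality line may itself begin with "@".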

The practice sequences are taken from a segment of the human genome:

>KI270539.1 dna:scaffold scaffold:GRCh38:KI270539.1:1:993:1 REF
GCCCCACGTCCGGGAGGGAGGTGGGGGTGTCAGCCCCCCACCAGGCCAGCCGCCCCGTCC
GGGAGGGAGGTGGGGTCAGACCCCCGCCCGGCCAGCCGCCCCGTCCGGGAAGGGAGGGGC
GTCTCTGCCCAGCCACCCCTACTGGGAAGTGAGGAGCTCCTCTGCCGGGCCAGCCACCCC
GTCCGGGAGGGAGGTGGGGGGCTCAGCCCCCCGCCCGGCCAGCCGCCCCGTCCGGGAGGG
AGGTGGGGGGGCCAGCCCCCCGCCCGGCCAGCCACCCCGTCCGGGGAGTGAGGGGCGCCT
CTGCCCGGCCGCCCCTACTGGGAAGTGAGGAACTCCTCTGTCCGGCCAGCCGCCCCGTCC
GGGAGGGAGGTGGAGGGGTCAGCCCCCCCGCCCGGCCAGACGCCCCGTCTGGGAGGGAGG
TGGGGGGGTCAGCCCCACGTCCGGGAGGGAGGTGTGGGGGGGTCAGCCCCCTGCCAGGCC
AGCCGCCCCGTCCGGGAGGGAGGTGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCCGTCCG
GGAGGTGAGGGGCGCCTCTGCCCAGCCGCCCCTACTGGGAAGTGAGGAGCCCCTCTGCCG
GGCCAGCCACCCCGTCCGGGAGGGAGGTAGGGGGCTCAGCCCCCCGCCCGGCCAGCCGCC
CCATCCGGGAGGGAGGTGGGGGGTCAGCCCCCCGCCCGGCCAGCCACCCCGTCCGGGGGG
TGAGGGGCGCCTCTGCCCGGCCGCCCCTACTGGGAAGTGAGGAACCCCTCTGCCCGGCCA
GCCGCCCCCTCCGGGAGGGAGGTGGGGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCTGTC
TGGGAGGGAGGTGGGGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCCATGCGGGAGGTGAG
GGGCGCCTCTGCCTGGCCGCCCCTACTAGGAAGTGAGGCGCCCCGCTGCCCGGCCAGCCG
CCCCGTCCGGGAGGGAGGTGGGGGGTCAGCCCT
>KI270385.1 dna:scaffold scaffold:GRCh38:KI270385.1:1:990:1 REF
TTTCATAGAGCATGTTTGAAACACTCTTTCTGTAGTATCTGAAAACGGACATTTCAAGCG
CTTTCAGGCCTATGGTAAGAAAGGAAATATCTTCAAATAAAAACTAGAGAGAAGCATTCT
CAGAAACTTATTTGCGATGTGTGTCCTCAACTAACAGAGTTGAACCTTTGTTCTGATACA
ACATTTTGGAAACACTCTTTTTGTAGAATCTGCAAGTGGATAATTGGATAGCTTTGAAGG
TTTCGTTGGAAACGGGAATATCTTCATATAAAATCAAGACAGAAGCATTCTCAGAAACTT
CTCTGTGATGTTTGCATTCAACTCATAGAGTTGAACACTTCCCTTCATAGAGCAGGTTTG
AAACACTCTTTTTGTAATATTTGGAAGTGGACATTTGCAGCGCTTTGAGGCCTATGTTGA
AAAAGGAAATATCTTCTCCTGAAAACCAGACAGAAGCATTCTCAGAAACTTCCTTGTGAT
GTGTGTACTCAAGTAACAGAGTTGAACCTTACTTTTGACAGAGCCGTTTTGAAACAGTCT
TTTTGTAGAATCTGGAAGTAGATATTTGGACACCTTTGAGGATTTCTTTGGAAACGGGAT
ATCTTCATATAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATGTCCTC
AATTAGCAGAGTTGAACCTTTGTGTGGATACAGCATTTTGGAAACACCCCTTTAGTAGGA
TATGCAAGTTGATATTTAGATAACTAGGAAGATTTCCTTGGAAACGGAATATCTTCATAT
AAATCTAGACGGAAGCATTCTCAGAAAGTGCTTTGTGATGTTTGCATTCAAGTCACAGAG
TTGATATTCCCTTTTATAGAGCAGGTTTGAAACACTCTTTCTGCACTTACCTGAAGAAGA
CTTTTGCAGCGCTTTGAGGCCTATGTTGAAAAAGGAATATCTTCCCATAAAACTAAACAG
AGCATTCTCAGAAACTTGTTGTGATGTGTG
>KI270423.1 dna:scaffold scaffold:GRCh38:KI270423.1:1:981:1 REF
AGATTTCGTTGGAACGGGATAAACTTCCCAGAACTACACGGAATCATTCTCAAAAACTTC
ATTGTGATGTTTGCATTCAACTCACAGAGTTGAACCTTGCTTTCATAGTTCAGCTTTCAA
ACACTCTTTTTGTAGAATCTGCTAGTGGATATTTGGACCACTTTGTGGCCTTCCTTCGAA
ACGGGTATATCTTCACATCAAACCTAGACAGAAGCATTCTCAGAATGTTTCCTGTGATGA
CTGCATTCAACTCACAGAGGTGAGCAATCCTGTTGATGGAGCAGTTTTGAAACTCTCTTT
CTTTGGAATCTGCAAGTGGATGTGTGGACCTCTTTGAAGATTTCGTTGGAAACAGGTTCT
TCTTCACAGAAAAACTAAACAGAAGCATTCTCAGAAACTACTTTATGACGTTTGTGTTCA
ACTTGCAGAGTGAAATTTCCTCTTGACAGAGCAGCTATGAAACATTGCTTTTCTTGAATC
TGCAAGTGGACATTTGGAGGGCTTTGAGGCCTGTGGCGGAAACGTTAATATCTGCATATA
AAAACTAGATAGAAGCATTCTGAGAATCTACTTTATGATGATTGCATTCGACTCACAGAG
TTGAACCTTCCAATGGATAGAGCAGTTTGTAAACACTCTTTTTGTAGAATCTGTGATTGC
TGATTTGGACTGCATTGAGGCCTACGGTACTAAAGGAAATAACTTCACCTAAAATCCAAA
CGGAAGCATTCACAGAAAATTCTTTGTGATGATTGGATTGAACTAAGAGAGCTGAACATT
CCTTTAGATGGCGCAGTTTCCAAACACACTTTCTGTAGAATCTGCAGGTGGATATTTGGA
CCTCTCTGAGGATTTCGTTGGAAATGGGATAAACTTCCCAGAACTACACGGAAGCATTCT
CAGAAACTTCTTTGTGATGTTTGCATTCACTCACAGAGTTGAACCTTGCTTTCATAGTTC
AGCTTTCAAACACTCTTTTTG
>KI270392.1 dna:scaffold scaffold:GRCh38:KI270392.1:1:971:1 REF
ATTTGCGATGTGTGTTCTCAACTAACAGAGTTGAACCTTTGTTTTGATGCAGCATTTTGG
AAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGCTTTGAAGGTTTCGTTGG
AAACGGGAATATCTTCATATAAAATCAATACAGAAGCCTTCTCAGAAACTTCTCTGTGAT
GTTTGCATTGAACTCACAGAGTTGAACACTTCCTTTCATAGAGCTGGTTTGAAATACTCT
TTTTGTAATATTTGGAAGTGGACATTGGCAGCGCTTTGAAGCCTGTGGTGAAAAAGGAGA
TATCTTCTCCTAAAAACCATACAGAAGCATTCTCAGAATCTTTCTTGTGATGTGTGTACT
CAAGTAACACAGTTGAACCTTCAATTTGACAGAGCAGTTTTGAAGCACTCTTTTTGTAGA
ATCTGCAAGTGGATATTTTGATACCTTTGAGGATTTCGTTGGACACTGGATATCTTCATA
TAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATGTCCTCAATTAACAG
AGTTGAACCTTTGTTTCGATACAGCATTTTGGAAACATTCCTTTAGTAGAATCTGCAAGT
TGATATTCAGATAGCTAGGAAGATTTCCTTGGAAACGGGAATATCTTCATATAAAATCTA
GACGGAAGCATTCTCAGAAAGTGCTTTGTGATGTTTGCATTCAAGTCACAGAGTTGAATA
TTCCCTTTCATAGAGTAGGTTTGAAACACTCTTTCTGTAGTATCTGCAAGTGGACGTTTC
AAGCGCTTTCAGGCCTGTGGTGAAAAAGGAAATATCTTCAAATAAAAATTAGACAGAAGC
ATTCTCAGAAACTTATTTGCGATGTGTGTTCTCAACTAACAGAGTTGAACATTTGTTTTG
ATACAGCATTTTGGAAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGCTTT
GAAGTTTTCGT
>KI270394.1 dna:scaffold scaffold:GRCh38:KI270394.1:1:970:1 REF
AAGTGGATATTTGGATAGCTTTGAGGATTTCGTTGGAAACGGGATTACATATAAAATCTA
GAGAGAAGCATTCTCAGGAACTTCTTTGTGATGTTTGCATTCAAGTCACAGAACTGAACA
TTCCCTTTCATAGAGCAGGTTTGAAACACTCTTTCTGTAGTATCTGCAAACGGACATTTC
ATACGCTTTCAGGCCTATGGTGAGAAAGGAAATATCTTCAAATAAAAACTAGACAGAAGC
ATTCTCAGAAACTTATTTGCGATGTGTGTCCTCAACTAACAGAGTTGAACCTTTGTTTTG
ATACAGCATTTTGGAAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGGTTT
GAAGGTTTCGTTGGAAACGGGAATATCTTCATATAAAATCAACACAGAAGCATTCTCAGA
AACTTCTCTGCGATGTTTGCATTCAACTCATAGAGTTGAACACTTCCTTTCATAGAGCTG
GTTTGAAATACTCTTTTTGTAATATTTGGAAGTGGACATTGGCAGCGCTTTGAAGCCTAT
GTTGAAAATGGAAATATCTTCTCCTAAAAACCAGACAGAAGCATTCTCAGAAACTTCCTT
GTGATGTGTGTACTCAAGTAACAGAGTTGAACCTTACTTTTGACAGAGCCGTTTTGAAAC
AGTCTTTTTGTAGAATCTGGAAGTAGATATTTGGATACCTTTGAGGATTTCTTTGGAAAC
GGGATATCTTCATATAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATG
TCCTCAATTAACAGAGTTGAACCTTTGTGTGGATACAGCATTTTGGAAACACTCCTTTAG
TAGAATCTGCAAGTTGATACTTAGATAGGAAGATTTCCTTGGAAACGGGAATATCTTCAT
ATAAAATCTAGACGGAAGCATTCTCGGAAACTTCTTTGTGCTGTATGTCCTCAATAACAG
AGTTGAACCT
