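Note: these notes are still rather disorganized. Readers may prefer to go straight to 生信宝典, whose material is very well written; I have recently been working through the exercises there.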
Goal: write the contents of a variable to a file
context = '''The best way to learn python contains two steps:
1. Remember basic things mentioned here masterly.
2. Practise with real demands.
'''
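Method 1: open(), print(..., file=fh), close()
fh = open("test_file.txt", "w")
print(context, file=fh)  # print() appends one extra newline after context
fh.close()
# Read the file back. Each line keeps its own newline, so suppress print()'s newline.
with open("test_file.txt") as fh:
    for line in fh:
        print(line, end="")
# Alternatively, strip the surrounding whitespace and let print() add the newline.
with open("test_file.txt") as fh:
    for line in fh:
        print(line.strip())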
Method 2: with open(...) as fh, then print(..., file=fh)
with open("test_file2.txt", "w") as fh:
    print(context, file=fh)
# No fh.close() is needed: the with block closes the file on exit.
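The topic lists later in these notes also mention fh.write(). As a minimal sketch of how it differs from print(), the helper below writes a string both ways and reads the file back. The function name write_and_read_back and the use of a temporary directory are assumptions made for illustration; they are not part of the original exercise.

```python
import os
import tempfile


def write_and_read_back(text, use_print):
    """Write text to a temporary file and return what the file then contains."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "test_file.txt")
        with open(path, "w") as fh:
            if use_print:
                print(text, file=fh)  # print() appends a trailing newline
            else:
                fh.write(text)        # write() stores the text exactly as given
        with open(path) as fh:
            return fh.read()


context = "line 1\nline 2\n"
print(repr(write_and_read_back(context, use_print=True)))
print(repr(write_and_read_back(context, use_print=False)))
```

With print() the file ends in two newlines (one from context, one from print()), whereas write() leaves the text unchanged.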
Goal: store the FASTA sequences in a dict, and print each multi-line sequence on a single line
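#!/usr/bin/env python
import sys
fname = sys.argv[1]
with open(fname) as f:
    lines = f.readlines()
# First, store the sequences in a dict (header line -> sequence as one string)
aDict = {}
a = ""
b = ""
for i in lines:
    if i.startswith(">"):
        a = i.strip()
        b = ""  # reset, otherwise each sequence would also contain all earlier ones
    else:
        b = b + i.strip()
        aDict[a] = b
print(aDict)
# Loop over the dict just built and print each record
for k, v in aDict.items():
    print(k)
    print(v)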
Result: [screenshot of the script's output]
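For a quick check without a FASTA file on disk, the same idea can be sketched as a function that takes a list of lines. The name parse_fasta and the tiny made-up records are assumptions for illustration only; the function collects the pieces of each record in a list and joins them once at the end.

```python
def parse_fasta(lines):
    """Return a dict mapping each header line to its sequence joined into one string."""
    pieces = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                    # skip blank lines
        if line.startswith(">"):
            name = line
            pieces[name] = []           # start a new record
        elif name is not None:
            pieces[name].append(line)   # sequence line of the current record
    # Join once per record instead of concatenating strings line by line
    return {header: "".join(parts) for header, parts in pieces.items()}


# Made-up example records, not taken from the practice data
example = [">seq1 demo\n", "ACGT\n", "TTGA\n", ">seq2\n", "GGCC\n"]
for header, seq in parse_fasta(example).items():
    print(header)
    print(seq)
```

Each record is printed as its header followed by the whole sequence on one line.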

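Goal: print the sequences of a FASTA file with 80 letters per line
Approach:
The FASTA sequences are already stored in a dict. Printing 80 letters per line naturally calls for slicing a list or a string. Where do the slice indices come from? They follow a regular pattern: within each slice the end index is 80 more than the start index, and each start index is 80 more than the previous one. The range() function, called with a step of 80, provides exactly these start indices.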
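The index pattern described above can be sketched on a made-up sequence. The helper name wrap_sequence and the 200-letter test string are assumptions for illustration; only range() and slicing are taken from the notes.

```python
def wrap_sequence(seq, width=80):
    """Split seq into consecutive chunks of at most width characters."""
    return [seq[start:start + width] for start in range(0, len(seq), width)]


seq = "ACGT" * 50                      # a made-up sequence of 200 letters
print(list(range(0, len(seq), 80)))    # the start indices: [0, 80, 160]
for chunk in wrap_sequence(seq):
    print(len(chunk), chunk)
```

The last slice is simply shorter (40 letters here), because slicing past the end of a string does not raise an error.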
Method 1: the dict values are lists.
#!/usr/bin/env python
import sys
fname = sys.argv[1]
with open(fname) as f:
    lines = f.readlines()
#print(lines)
aDict = {}
for i in lines:
    if i.startswith(">"):
        name = i.rstrip()
        aDict[name] = []
    else:
        i = i.rstrip()
        aDict[name].append(i)
#print(aDict)
for a, b in aDict.items():
    print(a)
    c = ''.join(b)  # join the pieces of the value into one string
    #print(c)
    for n in range(0, len(c), 80):  # start index of each 80-letter slice
        print(c[n:n+80])
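Method 2: the dict values are strings. I find this one simpler.
#!/usr/bin/env python
# Store the FASTA sequences in a dict; the easiest approach for me to understand so far
import sys
fname = sys.argv[1]
with open(fname) as f:
    lines = f.readlines()
#print(lines)
aDict = {}
a = ""
b = ""
for i in lines:
    if i.startswith(">"):
        a = i.strip()
        b = ""  # reset, otherwise each sequence would also contain all earlier ones
    else:
        b = b + i.strip()
        aDict[a] = b
#print(aDict)
for k, v in aDict.items():
    print(k)
    #print(v)
    #print(len(v))
    for n in range(0, len(v), 80):  # start index of each 80-letter slice
        print(v[n:n+80])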

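Goal: write a program sortFasta.py that reads test2.fa, uses the part of each original sequence name before the first space as the new sequence name, and prints the records sorted by name
• Topics used
– sort
– dict
– aDict[key] = []
– aDict[key].append(value)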
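#!/usr/bin/env python
import sys
fname = sys.argv[1]
with open(fname) as f:
    lines = f.readlines()
#print(lines)
aDict = {}
a = ""
b = ""
for i in lines:
    if i.startswith(">"):
        a = i.strip().split()[0]  # keep only the text before the first space
        b = ""  # reset, otherwise each sequence would also contain all earlier ones
    else:
        b = b + i.strip()
        aDict[a] = b
#print(aDict)
for i in sorted(aDict):  # sorted() on a dict returns its keys in ascending order
    print(i)
    print(aDict[i])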
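The two steps that matter here, shortening the name and sorting the keys, can be tried on their own. In the sketch below the header lines come from the practice data at the end of these notes, while the helper name short_name and the short placeholder sequences are assumptions for illustration.

```python
def short_name(header):
    """Return the part of a header line before the first whitespace."""
    return header.strip().split()[0]


# Placeholder sequences, deliberately much shorter than the real ones
records = {
    short_name(">KI270539.1 dna:scaffold scaffold:GRCh38:KI270539.1:1:993:1 REF"): "GCCCC",
    short_name(">KI270385.1 dna:scaffold scaffold:GRCh38:KI270385.1:1:990:1 REF"): "TTTCA",
    short_name(">KI270423.1 dna:scaffold scaffold:GRCh38:KI270423.1:1:981:1 REF"): "AGATT",
}

# Iterating over a dict yields its keys, so sorted(records) sorts the names
for name in sorted(records):
    print(name)
    print(records[name])
```

The names are compared as strings, so the order is lexicographic: >KI270385.1, >KI270423.1, >KI270539.1.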

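- Extract the sequences with given names
• Write a program grepFasta.py that extracts from test2.fa the sequences whose names are listed in fasta.name, and prints them to the screen.
• Write a program grepFastq.py that extracts from test1.fq the sequences whose names are listed in fastq.name, and writes them to a file.
– Topics used
- print(..., file=fh) (written print >>fh, ... in Python 2) or fh.write()
- The modulo operation, e.g. 4 % 2 == 0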
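#!/usr/bin/env python
# Store the FASTA sequences in a dict; the easiest approach for me to understand so far
import sys
fname = sys.argv[1]
with open(fname) as f:
    lines = f.readlines()
#print(lines)
aDict = {}
a = ""
b = ""
for i in lines:
    if i.startswith(">"):
        a = i.strip().split()[0]  # the key keeps the leading ">", e.g. >KI270539.1
        b = ""  # reset, otherwise each sequence would also contain all earlier ones
    else:
        b = b + i.strip()
        aDict[a] = b
print(aDict)
# Print in key order, i.e. sort the keys first
"""
for i in sorted(aDict):
    print(i)
    print(aDict[i])
"""
"""
# Look up one key and print its value
key = input("Enter an ID: ")
print(key)
print(aDict.get(key, "not exist"))
"""
# Write to a file
key = input("Enter an ID: ")
print(key)
print(aDict.get(key, "not exist"))  # print it to have a look
context = aDict.get(key, "not exist")  # assign the extracted sequence to a variable
with open("test_file3.txt", "w") as fh:  # create the file in write mode
    print(context, file=fh)  # write the content; the with block closes the file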
That said, it seems some of the listed topics were not used.
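The modulo operation and fh.write() fit the grepFastq.py task, which the script above does not cover. The following is only a sketch under stated assumptions: every FASTQ record occupies exactly four lines, the read name is the text between "@" and the first space, and the function name grep_fastq, the example reads, and the output file name selected.fq are all made up for illustration.

```python
import os
import tempfile


def grep_fastq(fastq_lines, wanted_names):
    """Return the lines of the 4-line FASTQ records whose read name is wanted."""
    wanted = set(wanted_names)
    selected = []
    keep = False
    for index, line in enumerate(fastq_lines):
        if index % 4 == 0:                  # lines 0, 4, 8, ... are name lines
            fields = line.split()
            read_name = fields[0][1:] if fields else ""  # drop the leading "@"
            keep = read_name in wanted
        if keep:
            selected.append(line)
    return selected


# Made-up reads; note that a quality line may itself start with "@"
fastq_lines = [
    "@read1 sample\n", "ACGT\n", "+\n", "IIII\n",
    "@read2 sample\n", "TTGA\n", "+\n", "@III\n",
    "@read3 sample\n", "GGCC\n", "+\n", "IIII\n",
]
selected = grep_fastq(fastq_lines, ["read2"])

# Write the selected records to a file with fh.write()
out_path = os.path.join(tempfile.gettempdir(), "selected.fq")
with open(out_path, "w") as fh:
    for line in selected:
        fh.write(line)                      # each line already ends with "\n"
print("".join(selected), end="")
```

Counting lines with index % 4 rather than testing startswith("@") avoids mistaking a quality line that begins with "@" for the start of a new record.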
The practice sequences are taken from a stretch of the human genome:
>KI270539.1 dna:scaffold scaffold:GRCh38:KI270539.1:1:993:1 REF
GCCCCACGTCCGGGAGGGAGGTGGGGGTGTCAGCCCCCCACCAGGCCAGCCGCCCCGTCC
GGGAGGGAGGTGGGGTCAGACCCCCGCCCGGCCAGCCGCCCCGTCCGGGAAGGGAGGGGC
GTCTCTGCCCAGCCACCCCTACTGGGAAGTGAGGAGCTCCTCTGCCGGGCCAGCCACCCC
GTCCGGGAGGGAGGTGGGGGGCTCAGCCCCCCGCCCGGCCAGCCGCCCCGTCCGGGAGGG
AGGTGGGGGGGCCAGCCCCCCGCCCGGCCAGCCACCCCGTCCGGGGAGTGAGGGGCGCCT
CTGCCCGGCCGCCCCTACTGGGAAGTGAGGAACTCCTCTGTCCGGCCAGCCGCCCCGTCC
GGGAGGGAGGTGGAGGGGTCAGCCCCCCCGCCCGGCCAGACGCCCCGTCTGGGAGGGAGG
TGGGGGGGTCAGCCCCACGTCCGGGAGGGAGGTGTGGGGGGGTCAGCCCCCTGCCAGGCC
AGCCGCCCCGTCCGGGAGGGAGGTGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCCGTCCG
GGAGGTGAGGGGCGCCTCTGCCCAGCCGCCCCTACTGGGAAGTGAGGAGCCCCTCTGCCG
GGCCAGCCACCCCGTCCGGGAGGGAGGTAGGGGGCTCAGCCCCCCGCCCGGCCAGCCGCC
CCATCCGGGAGGGAGGTGGGGGGTCAGCCCCCCGCCCGGCCAGCCACCCCGTCCGGGGGG
TGAGGGGCGCCTCTGCCCGGCCGCCCCTACTGGGAAGTGAGGAACCCCTCTGCCCGGCCA
GCCGCCCCCTCCGGGAGGGAGGTGGGGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCTGTC
TGGGAGGGAGGTGGGGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCCATGCGGGAGGTGAG
GGGCGCCTCTGCCTGGCCGCCCCTACTAGGAAGTGAGGCGCCCCGCTGCCCGGCCAGCCG
CCCCGTCCGGGAGGGAGGTGGGGGGTCAGCCCT
>KI270385.1 dna:scaffold scaffold:GRCh38:KI270385.1:1:990:1 REF
TTTCATAGAGCATGTTTGAAACACTCTTTCTGTAGTATCTGAAAACGGACATTTCAAGCG
CTTTCAGGCCTATGGTAAGAAAGGAAATATCTTCAAATAAAAACTAGAGAGAAGCATTCT
CAGAAACTTATTTGCGATGTGTGTCCTCAACTAACAGAGTTGAACCTTTGTTCTGATACA
ACATTTTGGAAACACTCTTTTTGTAGAATCTGCAAGTGGATAATTGGATAGCTTTGAAGG
TTTCGTTGGAAACGGGAATATCTTCATATAAAATCAAGACAGAAGCATTCTCAGAAACTT
CTCTGTGATGTTTGCATTCAACTCATAGAGTTGAACACTTCCCTTCATAGAGCAGGTTTG
AAACACTCTTTTTGTAATATTTGGAAGTGGACATTTGCAGCGCTTTGAGGCCTATGTTGA
AAAAGGAAATATCTTCTCCTGAAAACCAGACAGAAGCATTCTCAGAAACTTCCTTGTGAT
GTGTGTACTCAAGTAACAGAGTTGAACCTTACTTTTGACAGAGCCGTTTTGAAACAGTCT
TTTTGTAGAATCTGGAAGTAGATATTTGGACACCTTTGAGGATTTCTTTGGAAACGGGAT
ATCTTCATATAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATGTCCTC
AATTAGCAGAGTTGAACCTTTGTGTGGATACAGCATTTTGGAAACACCCCTTTAGTAGGA
TATGCAAGTTGATATTTAGATAACTAGGAAGATTTCCTTGGAAACGGAATATCTTCATAT
AAATCTAGACGGAAGCATTCTCAGAAAGTGCTTTGTGATGTTTGCATTCAAGTCACAGAG
TTGATATTCCCTTTTATAGAGCAGGTTTGAAACACTCTTTCTGCACTTACCTGAAGAAGA
CTTTTGCAGCGCTTTGAGGCCTATGTTGAAAAAGGAATATCTTCCCATAAAACTAAACAG
AGCATTCTCAGAAACTTGTTGTGATGTGTG
>KI270423.1 dna:scaffold scaffold:GRCh38:KI270423.1:1:981:1 REF
AGATTTCGTTGGAACGGGATAAACTTCCCAGAACTACACGGAATCATTCTCAAAAACTTC
ATTGTGATGTTTGCATTCAACTCACAGAGTTGAACCTTGCTTTCATAGTTCAGCTTTCAA
ACACTCTTTTTGTAGAATCTGCTAGTGGATATTTGGACCACTTTGTGGCCTTCCTTCGAA
ACGGGTATATCTTCACATCAAACCTAGACAGAAGCATTCTCAGAATGTTTCCTGTGATGA
CTGCATTCAACTCACAGAGGTGAGCAATCCTGTTGATGGAGCAGTTTTGAAACTCTCTTT
CTTTGGAATCTGCAAGTGGATGTGTGGACCTCTTTGAAGATTTCGTTGGAAACAGGTTCT
TCTTCACAGAAAAACTAAACAGAAGCATTCTCAGAAACTACTTTATGACGTTTGTGTTCA
ACTTGCAGAGTGAAATTTCCTCTTGACAGAGCAGCTATGAAACATTGCTTTTCTTGAATC
TGCAAGTGGACATTTGGAGGGCTTTGAGGCCTGTGGCGGAAACGTTAATATCTGCATATA
AAAACTAGATAGAAGCATTCTGAGAATCTACTTTATGATGATTGCATTCGACTCACAGAG
TTGAACCTTCCAATGGATAGAGCAGTTTGTAAACACTCTTTTTGTAGAATCTGTGATTGC
TGATTTGGACTGCATTGAGGCCTACGGTACTAAAGGAAATAACTTCACCTAAAATCCAAA
CGGAAGCATTCACAGAAAATTCTTTGTGATGATTGGATTGAACTAAGAGAGCTGAACATT
CCTTTAGATGGCGCAGTTTCCAAACACACTTTCTGTAGAATCTGCAGGTGGATATTTGGA
CCTCTCTGAGGATTTCGTTGGAAATGGGATAAACTTCCCAGAACTACACGGAAGCATTCT
CAGAAACTTCTTTGTGATGTTTGCATTCACTCACAGAGTTGAACCTTGCTTTCATAGTTC
AGCTTTCAAACACTCTTTTTG
>KI270392.1 dna:scaffold scaffold:GRCh38:KI270392.1:1:971:1 REF
ATTTGCGATGTGTGTTCTCAACTAACAGAGTTGAACCTTTGTTTTGATGCAGCATTTTGG
AAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGCTTTGAAGGTTTCGTTGG
AAACGGGAATATCTTCATATAAAATCAATACAGAAGCCTTCTCAGAAACTTCTCTGTGAT
GTTTGCATTGAACTCACAGAGTTGAACACTTCCTTTCATAGAGCTGGTTTGAAATACTCT
TTTTGTAATATTTGGAAGTGGACATTGGCAGCGCTTTGAAGCCTGTGGTGAAAAAGGAGA
TATCTTCTCCTAAAAACCATACAGAAGCATTCTCAGAATCTTTCTTGTGATGTGTGTACT
CAAGTAACACAGTTGAACCTTCAATTTGACAGAGCAGTTTTGAAGCACTCTTTTTGTAGA
ATCTGCAAGTGGATATTTTGATACCTTTGAGGATTTCGTTGGACACTGGATATCTTCATA
TAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATGTCCTCAATTAACAG
AGTTGAACCTTTGTTTCGATACAGCATTTTGGAAACATTCCTTTAGTAGAATCTGCAAGT
TGATATTCAGATAGCTAGGAAGATTTCCTTGGAAACGGGAATATCTTCATATAAAATCTA
GACGGAAGCATTCTCAGAAAGTGCTTTGTGATGTTTGCATTCAAGTCACAGAGTTGAATA
TTCCCTTTCATAGAGTAGGTTTGAAACACTCTTTCTGTAGTATCTGCAAGTGGACGTTTC
AAGCGCTTTCAGGCCTGTGGTGAAAAAGGAAATATCTTCAAATAAAAATTAGACAGAAGC
ATTCTCAGAAACTTATTTGCGATGTGTGTTCTCAACTAACAGAGTTGAACATTTGTTTTG
ATACAGCATTTTGGAAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGCTTT
GAAGTTTTCGT
>KI270394.1 dna:scaffold scaffold:GRCh38:KI270394.1:1:970:1 REF
AAGTGGATATTTGGATAGCTTTGAGGATTTCGTTGGAAACGGGATTACATATAAAATCTA
GAGAGAAGCATTCTCAGGAACTTCTTTGTGATGTTTGCATTCAAGTCACAGAACTGAACA
TTCCCTTTCATAGAGCAGGTTTGAAACACTCTTTCTGTAGTATCTGCAAACGGACATTTC
ATACGCTTTCAGGCCTATGGTGAGAAAGGAAATATCTTCAAATAAAAACTAGACAGAAGC
ATTCTCAGAAACTTATTTGCGATGTGTGTCCTCAACTAACAGAGTTGAACCTTTGTTTTG
ATACAGCATTTTGGAAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGGTTT
GAAGGTTTCGTTGGAAACGGGAATATCTTCATATAAAATCAACACAGAAGCATTCTCAGA
AACTTCTCTGCGATGTTTGCATTCAACTCATAGAGTTGAACACTTCCTTTCATAGAGCTG
GTTTGAAATACTCTTTTTGTAATATTTGGAAGTGGACATTGGCAGCGCTTTGAAGCCTAT
GTTGAAAATGGAAATATCTTCTCCTAAAAACCAGACAGAAGCATTCTCAGAAACTTCCTT
GTGATGTGTGTACTCAAGTAACAGAGTTGAACCTTACTTTTGACAGAGCCGTTTTGAAAC
AGTCTTTTTGTAGAATCTGGAAGTAGATATTTGGATACCTTTGAGGATTTCTTTGGAAAC
GGGATATCTTCATATAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATG
TCCTCAATTAACAGAGTTGAACCTTTGTGTGGATACAGCATTTTGGAAACACTCCTTTAG
TAGAATCTGCAAGTTGATACTTAGATAGGAAGATTTCCTTGGAAACGGGAATATCTTCAT
ATAAAATCTAGACGGAAGCATTCTCGGAAACTTCTTTGTGCTGTATGTCCTCAATAACAG
AGTTGAACCT