Python in Practice


Author: M78_a | Published: 2020-03-31 23:58

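    Note: so far these notes are rather disorganized. If you are reading them, please head over to 生信宝典 instead. Its material is excellently written, and I have recently been working through its example exercises.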

    Goal: write the contents of a variable to a file

    context = '''The best way to learn python contains two steps:
    1. Remember the basic things mentioned here thoroughly.
    2. Practise with real demands.
    '''
    Method 1:
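    fh = open("test_file.txt", "w")
    print(context, file=fh)      # print() writes to the file instead of the screen
    fh.close()

    # Read it back (same file name, same case). Each line already ends with "\n",
    # so suppress print()'s own newline
    with open("test_file.txt") as fh:
        for line in fh:
            print(line, end="")

    # Alternatively, strip the newline and let print() add one
    with open("test_file.txt") as fh:
        for line in fh:
            print(line.strip())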
    
    Method 2: with open(...) as fh, plus print()
    with open("test_file2.txt", "w") as fh:
        print(context, file=fh)
    # no fh.close() needed: leaving the with block closes the file automatically
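Both methods above write with print(..., file=fh). The file-object method fh.write(), which is listed among the concepts later in these notes, behaves slightly differently. print() appends its default end="\n", while write() writes the string exactly as given. The following is a minimal sketch that compares the two. It works in a temporary directory so no files are left behind. The function name write_both_ways and the file names are hypothetical.

```python
import os
import tempfile

def write_both_ways(text, directory):
    """Write text once with print(file=...) and once with fh.write(); return both file contents."""
    path_print = os.path.join(directory, "via_print.txt")   # hypothetical file names
    path_write = os.path.join(directory, "via_write.txt")

    with open(path_print, "w") as fh:
        print(text, file=fh)        # print() adds its default end="\n"

    with open(path_write, "w") as fh:
        fh.write(text)              # write() adds nothing

    with open(path_print) as fh:
        printed = fh.read()
    with open(path_write) as fh:
        written = fh.read()
    return printed, written

with tempfile.TemporaryDirectory() as d:
    printed, written = write_both_ways("line 1\nline 2", d)
    print(repr(printed))   # 'line 1\nline 2\n'
    print(repr(written))   # 'line 1\nline 2'
```

This is why the context string above, which already ends with a newline, gains one extra blank line when written with print().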
    

    Goal: store FASTA sequences in a dictionary, and print each multi-line sequence as a single line

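    #!/usr/bin/env python
    import sys

    fname = sys.argv[1]
    with open(fname) as f:
        lines = f.readlines()

    # First store the sequences in a dictionary: header -> sequence
    aDict = {}
    a = ""
    b = ""
    for i in lines:
        if i.startswith(">"):
            a = i.strip()
            b = ""            # reset for each new record, otherwise sequences pile up
        else:
            b = b + i.strip()
            aDict[a] = b      # store the sequence collected so far
    print(aDict)

    # Loop over the dictionary we just built and print each record
    for k, v in aDict.items():
        print(k)
        print(v)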
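The same logic can be packaged as a reusable function, which makes it easier to test and to reuse in the later exercises. The following is a sketch under stated assumptions. The name read_fasta is hypothetical. Header lines are assumed to start with ">". Lines before the first header and blank lines are ignored. Unlike the script above, it also keeps records that have no sequence lines, storing them with an empty string.

```python
def read_fasta(lines):
    """Return a dict mapping each header line (with '>') to its sequence on one line."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            header = line
            records[header] = []          # collect pieces, join once at the end
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}

example = """>seq1 first
ACGT
TTGA
>seq2 second
GGCC
""".splitlines()

for name, seq in read_fasta(example).items():
    print(name)
    print(seq)
```

Because an open file is also an iterable of lines, `with open(fname) as f: aDict = read_fasta(f)` works on a real FASTA file.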
    

    Let's look at the result:


    [Screenshot: program output]

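    Goal: print each sequence in the FASTA file with 80 letters per line
    Idea:
    The FASTA sequences are already stored in a dictionary. To print 80 letters per line, the natural tool is slicing a list or string. But where do the slice indices come from? They follow a regular pattern: each slice ends 80 positions after it starts, and the next slice starts where the previous one ended. The range() function with a step of 80 produces exactly these start positions.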
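The range-plus-slice idea can be seen on a short string before it is applied to a whole FASTA file. The following minimal sketch uses a width of 4 so the output stays small. The helper name chunks is hypothetical. Slicing past the end of a string simply stops at the end, so the last piece may be shorter.

```python
def chunks(seq, width=80):
    """Split seq into consecutive pieces of at most `width` characters."""
    # range(0, len(seq), width) yields the start of each piece: 0, width, 2*width, ...
    return [seq[n:n + width] for n in range(0, len(seq), width)]

print(chunks("ACGTACGTAC", 4))   # ['ACGT', 'ACGT', 'AC']
```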
    Method 1: the dictionary values are lists.

    #!/usr/bin/env python
    import sys

    fname = sys.argv[1]
    with open(fname) as f:
        lines = f.readlines()
    #print(lines)
    aDict = {}
    for i in lines:
        if i.startswith(">"):
            name = i.rstrip()
            aDict[name] = []              # one list of sequence lines per record
        else:
            aDict[name].append(i.rstrip())
    #print(aDict)

    for a, b in aDict.items():
        print(a)
        c = ''.join(b)                    # join the list of lines into one string
        #print(c)
        for n in range(0, len(c), 80):    # start positions 0, 80, 160, ...
            print(c[n:n + 80])
    

    Method 2: the dictionary values are strings. I find this one simpler.

    #!/usr/bin/env python
    # Store FASTA sequences in a dictionary: the easiest approach to understand so far
    import sys

    fname = sys.argv[1]
    with open(fname) as f:
        lines = f.readlines()
    #print(lines)
    aDict = {}
    a = ""
    b = ""
    for i in lines:
        if i.startswith(">"):
            a = i.strip()
            b = ""            # reset the sequence for each new record
        else:
            b = b + i.strip()
            aDict[a] = b
    #print(aDict)

    for k, v in aDict.items():
        print(k)
        #print(v)
        #print(len(v))
        for n in range(0, len(v), 80):    # start positions 0, 80, 160, ...
            print(v[n:n + 80])
    
    
    [Screenshot: program output]
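As an alternative to the range() slicing used in both methods, the standard-library textwrap module can produce the same fixed-width lines. This is a swapped-in technique, not the method from these notes. textwrap.wrap() breaks over-long words into width-sized pieces by default (break_long_words=True). It is therefore only a safe stand-in for sequences that contain no spaces or hyphens, which it would treat as break points. The sequence below is made up.

```python
import textwrap

seq = "ACGT" * 50                      # a made-up 200-letter sequence
for piece in textwrap.wrap(seq, width=80):
    print(piece)                       # lines of 80, 80 and 40 letters
```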

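    Goal: write a program sortFasta.py that reads test2.fa, uses the part of each original sequence name before the first space as the new name, and prints the records sorted by name
    • Concepts used
    – sort
    – dict
    – aDict[key] = []
    – aDict[key].append(value)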

    #!/usr/bin/env python
    import sys

    fname = sys.argv[1]
    with open(fname) as f:
        lines = f.readlines()
    #print(lines)
    aDict = {}
    a = ""
    b = ""
    for i in lines:
        if i.startswith(">"):
            a = i.strip().split()[0]   # keep only the name before the first space
            b = ""                     # reset the sequence for each new record
        else:
            b = b + i.strip()
            aDict[a] = b
    #print(aDict)

    for i in sorted(aDict):   # sorted() on a dict returns its keys in ascending order
        print(i)
        print(aDict[i])
    
    
    [Screenshot: program output]
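The sortFasta.py logic can also be written as one testable function, using the aDict[key] = [] / aDict[key].append(value) pattern from the concept list. The following is a sketch under stated assumptions. The name sort_fasta is hypothetical. Names are compared as plain strings, which is lexicographic order, not numeric order. If two headers share the same short name, the later record replaces the earlier one.

```python
def sort_fasta(lines):
    """Return (short_name, sequence) pairs sorted by name.

    The short name is the header up to the first space, e.g.
    '>KI270539.1 dna:scaffold ...' -> '>KI270539.1'.
    """
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line.split()[0]
            records[name] = []             # aDict[key] = []
        elif line and name is not None:
            records[name].append(line)     # aDict[key].append(value)
    # keys are unique, so sorting the items only ever compares the names
    return [(n, "".join(parts)) for n, parts in sorted(records.items())]

example = [">KI270539.1 dna:scaffold", "GCCC", "CACG",
           ">KI270385.1 dna:scaffold", "TTTC"]
for name, seq in sort_fasta(example):
    print(name)
    print(seq)
```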
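    1. Extract the sequences for a given list of names
      • Write a program grepFasta.py that extracts the sequences in test2.fa whose names are listed in fasta.name, and prints them to the screen.
      • Write a program grepFastq.py that extracts the sequences in test1.fq whose names are listed in fastq.name, and writes them to a file.
      – Concepts used
        • print >>fh (Python 2 syntax; the Python 3 equivalent is print(..., file=fh)), or fh.write()
        • The modulo operator, e.g. 4 % 2 == 0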
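    #!/usr/bin/env python
    # Store FASTA sequences in a dictionary: the easiest approach to understand so far
    import sys

    fname = sys.argv[1]
    with open(fname) as f:
        lines = f.readlines()
    #print(lines)
    aDict = {}
    a = ""
    b = ""
    for i in lines:
        if i.startswith(">"):
            a = i.strip().split()[0]
            b = ""            # reset the sequence for each new record
        else:
            b = b + i.strip()
            aDict[a] = b
    print(aDict)

    # Print the records in key order (sort the keys)
    """
    for i in sorted(aDict):
        print(i)
        print(aDict[i])
    """
    """
    # Look up one key and print its sequence
    key = input("Enter an ID: ")
    print(key)
    print(aDict.get(key, "not exist"))
    """

    # Write to a file
    # Keys keep the leading ">", so enter an ID such as >KI270539.1
    key = input("Enter an ID: ").strip()
    print(key)
    print(aDict.get(key, "not exist"))       # print it to check
    context = aDict.get(key, "not exist")    # assign the extracted sequence to a variable

    with open("test_file3.txt", "w") as fh:  # create the file in write mode
        print(context, file=fh)              # write the content
    # leaving the with block closes fh, so no fh.close() is needed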
    
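    However, some of the concepts listed above (fh.write() and the modulo operator) do not seem to have been used here.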
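The two unused concepts fit the original exercises. The following is a sketch of grepFasta and grepFastq logic, not the author's solution, under stated assumptions. The name files (fasta.name, fastq.name) are assumed to hold one name per line, with or without a leading ">" or "@". A record's name is taken as the first word of its header. Every FASTQ record is assumed to be exactly 4 lines (header, sequence, "+", quality). The modulo test i % 4 == 0 matters because a quality line can itself start with "@", so startswith("@") alone would misidentify headers. The function names load_names, grep_fasta and grep_fastq are hypothetical.

```python
import io

def load_names(lines):
    """One name per line; a leading '>' or '@' is dropped so both styles match."""
    return {line.strip().lstrip(">@") for line in lines if line.strip()}

def grep_fasta(lines, wanted):
    """Yield the lines of FASTA records whose name (first word after '>') is in wanted."""
    keep = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            keep = bool(parts) and parts[0] in wanted
        if keep:
            yield line

def grep_fastq(lines, wanted, out):
    """Write 4-line FASTQ records whose name is in wanted to the file-like object out."""
    keep = False
    for i, line in enumerate(lines):
        if i % 4 == 0:                    # every 4th line (0, 4, 8, ...) is a header
            parts = line[1:].split()
            keep = bool(parts) and parts[0] in wanted
        if keep:
            out.write(line.rstrip("\n") + "\n")   # fh.write() adds no newline itself

# FASTA: print the selected records to the screen
fasta = [">KI270539.1 dna:scaffold", "GCCC", ">KI270385.1 dna:scaffold", "TTTC"]
names = load_names(["KI270385.1"])
for line in grep_fasta(fasta, names):
    print(line)

# FASTQ: the quality line "@@@@" starts with "@" but is not a header
fastq = ["@read1 x", "ACGT", "+", "@@@@", "@read2 y", "GGCC", "+", "IIII"]
buf = io.StringIO()                       # stands in for a file opened with "w"
grep_fastq(fastq, load_names(["@read1"]), buf)
print(buf.getvalue(), end="")
```

For real files, pass open file objects instead. For example, read the names with `load_names(open("fastq.name"))` and write the output through `with open(..., "w") as fh: grep_fastq(open("test1.fq"), names, fh)`.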
    

    The practice sequences are a few pieces of the human genome sequence (GRCh38 scaffolds):

    >KI270539.1 dna:scaffold scaffold:GRCh38:KI270539.1:1:993:1 REF
    GCCCCACGTCCGGGAGGGAGGTGGGGGTGTCAGCCCCCCACCAGGCCAGCCGCCCCGTCC
    GGGAGGGAGGTGGGGTCAGACCCCCGCCCGGCCAGCCGCCCCGTCCGGGAAGGGAGGGGC
    GTCTCTGCCCAGCCACCCCTACTGGGAAGTGAGGAGCTCCTCTGCCGGGCCAGCCACCCC
    GTCCGGGAGGGAGGTGGGGGGCTCAGCCCCCCGCCCGGCCAGCCGCCCCGTCCGGGAGGG
    AGGTGGGGGGGCCAGCCCCCCGCCCGGCCAGCCACCCCGTCCGGGGAGTGAGGGGCGCCT
    CTGCCCGGCCGCCCCTACTGGGAAGTGAGGAACTCCTCTGTCCGGCCAGCCGCCCCGTCC
    GGGAGGGAGGTGGAGGGGTCAGCCCCCCCGCCCGGCCAGACGCCCCGTCTGGGAGGGAGG
    TGGGGGGGTCAGCCCCACGTCCGGGAGGGAGGTGTGGGGGGGTCAGCCCCCTGCCAGGCC
    AGCCGCCCCGTCCGGGAGGGAGGTGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCCGTCCG
    GGAGGTGAGGGGCGCCTCTGCCCAGCCGCCCCTACTGGGAAGTGAGGAGCCCCTCTGCCG
    GGCCAGCCACCCCGTCCGGGAGGGAGGTAGGGGGCTCAGCCCCCCGCCCGGCCAGCCGCC
    CCATCCGGGAGGGAGGTGGGGGGTCAGCCCCCCGCCCGGCCAGCCACCCCGTCCGGGGGG
    TGAGGGGCGCCTCTGCCCGGCCGCCCCTACTGGGAAGTGAGGAACCCCTCTGCCCGGCCA
    GCCGCCCCCTCCGGGAGGGAGGTGGGGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCTGTC
    TGGGAGGGAGGTGGGGGGGTCAGCCCCCCGCCCGGCCAGCCGCCCCATGCGGGAGGTGAG
    GGGCGCCTCTGCCTGGCCGCCCCTACTAGGAAGTGAGGCGCCCCGCTGCCCGGCCAGCCG
    CCCCGTCCGGGAGGGAGGTGGGGGGTCAGCCCT
    >KI270385.1 dna:scaffold scaffold:GRCh38:KI270385.1:1:990:1 REF
    TTTCATAGAGCATGTTTGAAACACTCTTTCTGTAGTATCTGAAAACGGACATTTCAAGCG
    CTTTCAGGCCTATGGTAAGAAAGGAAATATCTTCAAATAAAAACTAGAGAGAAGCATTCT
    CAGAAACTTATTTGCGATGTGTGTCCTCAACTAACAGAGTTGAACCTTTGTTCTGATACA
    ACATTTTGGAAACACTCTTTTTGTAGAATCTGCAAGTGGATAATTGGATAGCTTTGAAGG
    TTTCGTTGGAAACGGGAATATCTTCATATAAAATCAAGACAGAAGCATTCTCAGAAACTT
    CTCTGTGATGTTTGCATTCAACTCATAGAGTTGAACACTTCCCTTCATAGAGCAGGTTTG
    AAACACTCTTTTTGTAATATTTGGAAGTGGACATTTGCAGCGCTTTGAGGCCTATGTTGA
    AAAAGGAAATATCTTCTCCTGAAAACCAGACAGAAGCATTCTCAGAAACTTCCTTGTGAT
    GTGTGTACTCAAGTAACAGAGTTGAACCTTACTTTTGACAGAGCCGTTTTGAAACAGTCT
    TTTTGTAGAATCTGGAAGTAGATATTTGGACACCTTTGAGGATTTCTTTGGAAACGGGAT
    ATCTTCATATAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATGTCCTC
    AATTAGCAGAGTTGAACCTTTGTGTGGATACAGCATTTTGGAAACACCCCTTTAGTAGGA
    TATGCAAGTTGATATTTAGATAACTAGGAAGATTTCCTTGGAAACGGAATATCTTCATAT
    AAATCTAGACGGAAGCATTCTCAGAAAGTGCTTTGTGATGTTTGCATTCAAGTCACAGAG
    TTGATATTCCCTTTTATAGAGCAGGTTTGAAACACTCTTTCTGCACTTACCTGAAGAAGA
    CTTTTGCAGCGCTTTGAGGCCTATGTTGAAAAAGGAATATCTTCCCATAAAACTAAACAG
    AGCATTCTCAGAAACTTGTTGTGATGTGTG
    >KI270423.1 dna:scaffold scaffold:GRCh38:KI270423.1:1:981:1 REF
    AGATTTCGTTGGAACGGGATAAACTTCCCAGAACTACACGGAATCATTCTCAAAAACTTC
    ATTGTGATGTTTGCATTCAACTCACAGAGTTGAACCTTGCTTTCATAGTTCAGCTTTCAA
    ACACTCTTTTTGTAGAATCTGCTAGTGGATATTTGGACCACTTTGTGGCCTTCCTTCGAA
    ACGGGTATATCTTCACATCAAACCTAGACAGAAGCATTCTCAGAATGTTTCCTGTGATGA
    CTGCATTCAACTCACAGAGGTGAGCAATCCTGTTGATGGAGCAGTTTTGAAACTCTCTTT
    CTTTGGAATCTGCAAGTGGATGTGTGGACCTCTTTGAAGATTTCGTTGGAAACAGGTTCT
    TCTTCACAGAAAAACTAAACAGAAGCATTCTCAGAAACTACTTTATGACGTTTGTGTTCA
    ACTTGCAGAGTGAAATTTCCTCTTGACAGAGCAGCTATGAAACATTGCTTTTCTTGAATC
    TGCAAGTGGACATTTGGAGGGCTTTGAGGCCTGTGGCGGAAACGTTAATATCTGCATATA
    AAAACTAGATAGAAGCATTCTGAGAATCTACTTTATGATGATTGCATTCGACTCACAGAG
    TTGAACCTTCCAATGGATAGAGCAGTTTGTAAACACTCTTTTTGTAGAATCTGTGATTGC
    TGATTTGGACTGCATTGAGGCCTACGGTACTAAAGGAAATAACTTCACCTAAAATCCAAA
    CGGAAGCATTCACAGAAAATTCTTTGTGATGATTGGATTGAACTAAGAGAGCTGAACATT
    CCTTTAGATGGCGCAGTTTCCAAACACACTTTCTGTAGAATCTGCAGGTGGATATTTGGA
    CCTCTCTGAGGATTTCGTTGGAAATGGGATAAACTTCCCAGAACTACACGGAAGCATTCT
    CAGAAACTTCTTTGTGATGTTTGCATTCACTCACAGAGTTGAACCTTGCTTTCATAGTTC
    AGCTTTCAAACACTCTTTTTG
    >KI270392.1 dna:scaffold scaffold:GRCh38:KI270392.1:1:971:1 REF
    ATTTGCGATGTGTGTTCTCAACTAACAGAGTTGAACCTTTGTTTTGATGCAGCATTTTGG
    AAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGCTTTGAAGGTTTCGTTGG
    AAACGGGAATATCTTCATATAAAATCAATACAGAAGCCTTCTCAGAAACTTCTCTGTGAT
    GTTTGCATTGAACTCACAGAGTTGAACACTTCCTTTCATAGAGCTGGTTTGAAATACTCT
    TTTTGTAATATTTGGAAGTGGACATTGGCAGCGCTTTGAAGCCTGTGGTGAAAAAGGAGA
    TATCTTCTCCTAAAAACCATACAGAAGCATTCTCAGAATCTTTCTTGTGATGTGTGTACT
    CAAGTAACACAGTTGAACCTTCAATTTGACAGAGCAGTTTTGAAGCACTCTTTTTGTAGA
    ATCTGCAAGTGGATATTTTGATACCTTTGAGGATTTCGTTGGACACTGGATATCTTCATA
    TAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATGTCCTCAATTAACAG
    AGTTGAACCTTTGTTTCGATACAGCATTTTGGAAACATTCCTTTAGTAGAATCTGCAAGT
    TGATATTCAGATAGCTAGGAAGATTTCCTTGGAAACGGGAATATCTTCATATAAAATCTA
    GACGGAAGCATTCTCAGAAAGTGCTTTGTGATGTTTGCATTCAAGTCACAGAGTTGAATA
    TTCCCTTTCATAGAGTAGGTTTGAAACACTCTTTCTGTAGTATCTGCAAGTGGACGTTTC
    AAGCGCTTTCAGGCCTGTGGTGAAAAAGGAAATATCTTCAAATAAAAATTAGACAGAAGC
    ATTCTCAGAAACTTATTTGCGATGTGTGTTCTCAACTAACAGAGTTGAACATTTGTTTTG
    ATACAGCATTTTGGAAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGCTTT
    GAAGTTTTCGT
    >KI270394.1 dna:scaffold scaffold:GRCh38:KI270394.1:1:970:1 REF
    AAGTGGATATTTGGATAGCTTTGAGGATTTCGTTGGAAACGGGATTACATATAAAATCTA
    GAGAGAAGCATTCTCAGGAACTTCTTTGTGATGTTTGCATTCAAGTCACAGAACTGAACA
    TTCCCTTTCATAGAGCAGGTTTGAAACACTCTTTCTGTAGTATCTGCAAACGGACATTTC
    ATACGCTTTCAGGCCTATGGTGAGAAAGGAAATATCTTCAAATAAAAACTAGACAGAAGC
    ATTCTCAGAAACTTATTTGCGATGTGTGTCCTCAACTAACAGAGTTGAACCTTTGTTTTG
    ATACAGCATTTTGGAAACACTCTTTTTGTAGGATCTGCAGGTGGATATTTGGATAGGTTT
    GAAGGTTTCGTTGGAAACGGGAATATCTTCATATAAAATCAACACAGAAGCATTCTCAGA
    AACTTCTCTGCGATGTTTGCATTCAACTCATAGAGTTGAACACTTCCTTTCATAGAGCTG
    GTTTGAAATACTCTTTTTGTAATATTTGGAAGTGGACATTGGCAGCGCTTTGAAGCCTAT
    GTTGAAAATGGAAATATCTTCTCCTAAAAACCAGACAGAAGCATTCTCAGAAACTTCCTT
    GTGATGTGTGTACTCAAGTAACAGAGTTGAACCTTACTTTTGACAGAGCCGTTTTGAAAC
    AGTCTTTTTGTAGAATCTGGAAGTAGATATTTGGATACCTTTGAGGATTTCTTTGGAAAC
    GGGATATCTTCATATAAAATCTAGACAGAAGCATTCTCAGAAACTTCTTTGTGCTGTATG
    TCCTCAATTAACAGAGTTGAACCTTTGTGTGGATACAGCATTTTGGAAACACTCCTTTAG
    TAGAATCTGCAAGTTGATACTTAGATAGGAAGATTTCCTTGGAAACGGGAATATCTTCAT
    ATAAAATCTAGACGGAAGCATTCTCGGAAACTTCTTTGTGCTGTATGTCCTCAATAACAG
    AGTTGAACCT
    


          Article link: https://www.haomeiwen.com/subject/tzwuuhtx.html