美文网首页python---生信Python
python处理fasta文件序列

python处理fasta文件序列

作者: 东风008 | 来源:发表于2020-04-20 14:39 被阅读0次

    1.获取给定DNA序列的反向互补序列:

    思路:从一个文件中导入序列,然后将互补后的结果输出到另外一个文件中。先定义的一个字典: complement = {'C': 'G', 'G': 'C', 'T': 'A', 'A': 'T'}
    然后 for i in list(seq):
        rev_dna += complement[i]
        rev_dna = rev_dna[::-1]
      print (rev_dna)
    直接得到互补后的结果

    n =  0
    complement = {'A':'T','G':'C','C':'G','T':'A'}
    rev_seq: str= ''
    with open(r'/home/yt/PycharmProjects/test/1minia_result.txt', 'w') as f1:
        with open(r'/home/yt/PycharmProjects/test/1minia.contigs.fa', 'r') as f:
           for dna_seq in f:
               n +=1
               if n < 100:
                   rev_seq = ''
                   if dna_seq.startswith(">"):
                       f1.write(dna_seq)
                   else:
                       dna_seq = list(dna_seq.strip())
                       for i in dna_seq:
                           rev_seq += complement[i]
                           rev_seq = rev_seq[::-1]
                       f1.write(rev_seq)
                       f1.write('\n')
    

    2.把DNA转换成RNA

    思路:将DNA序列中的T换成U就可以啦,直接import re,然后用sub就行啦

    import re
    dnaSeq = ''
    with open('/home/yt/PycharmProjects/test/1minia.contigs.fa') as f:
        for line in f:
            line = line.rstrip()
            dnaSeq += line.upper()
    rnaSeq1 = re.sub('T', 'U', dnaSeq)
    rnaSeq2 = dnaSeq.replace('T', 'U')
    print(rnaSeq1)
    print(rnaSeq2)
    

    3.提取指定ID的序列

    思路:将序列存入字典,打开字典读取序列,利用字典提取指定ID序列

    dict={}
    fw = open('/home/yt/PycharmProjects/test/1minia.result2.txt', 'w')
    fr = open('/home/yt/PycharmProjects/test/1minia.contigs.fa', 'r')
    for line in fr:
        if line.startswith('>'):
            line = line.strip('\n')
            line = line.replace(' ', '')
            name = line
            dict[name] = ''
        else:
            dict[name] += line.replace('\n','')
    fr.close()
    for ID in dict.keys():
        if ID == '>14__len__4122':
            fw.write(ID + "\n" + dict[ID])
    fw.close()
    

    4.每行指定长度输出序列

    思路:使用循环输出列表,利用 计数器控制输出数量,当输出到第十个,计数器归零,重新开始计数,print输出增加end参数可以控制输出后以什么结尾,这里使用range方法快速生成10-90的数字添加进list列表

    results = list(range(10, 90))
    n = 10  # 每10个数换一行
    for i in range(len(results)):
        print(results[i], end=' ')
        if (i+1) % 10 == 0:
            print("\n") 
    
    with open(r'/home/yt/PycharmProjects/test/1minia.result4.txt', 'w') as f1:
        with open(r'/home/yt/PycharmProjects/test/1minia.contigs.fa', 'r') as f:
          seq = ''
          for line in f:
                    seq = seq.strip('\n')
                    if line.startswith('>'):
                        seq = line
                        f1.write(seq)
                    else:
                        seqlist = list(line)
                        seqlist = seqlist[0:20]
                        seq = "".join(seqlist)
                        f1.write(seq+'\n')
    f.close
    

    5.按照序列长度/名字排序

    fasta = {}
    f1 =  open('/home/yt/PycharmProjects/test/1minia.result5.txt', 'w')
    f = open('/home/yt/PycharmProjects/test/1minia.contigs.fa', 'r')
    sortseq = ''
    for line in f:
        if line.startswith('>'):
            line = line.strip('\n')
            name = line + '\n'
            fasta[name] = ''
        else:
            fasta[name] = line
    for key, value in fasta.items():
        seqsort = sorted(fasta.items(), key=lambda i:len(i[1]))
        for n in seqsort:
            for t in n:
                sortseq += str(t)
    f1.write(sortseq)
    

    6.RNA翻译为蛋白质

    首先我们要建立一个dictionary将RNA codon table存进去,key为RNA 序列,value是对应的氨基酸。

    import re
    from collections import OrderedDict
    codonTable = OrderedDict()
    f1 = open('/home/yt/PycharmProjects/test/RNA.result','w')
    f = open('/home/yt/PycharmProjects/test/RNA','r')
        for line in f:
            line = line.rstrip()
            lst = re.split('\s+', line)
            for i in [0, 2, 4, 6]:
                codonTable[lst[i]] = lst[i + 1]
    rnaSeq = ''
    with open('/home/yt/PycharmProjects/test/RNA', 'rt') as f:
        for line in f:
            line = line.rstrip()
            rnaSeq += line.upper()
    aminoAcids = []
    i = 0
    while i < len(rnaSeq):
        codon = rnaSeq[i:i + 3]
        if codonTable[codon] != 'Stop':
            aminoAcids.append(codonTable[codon])
        i += 3
    peptide = ''.join(aminoAcids)
    f1.write(peptide)
    

    相关文章

      网友评论

        本文标题:python处理fasta文件序列

        本文链接:https://www.haomeiwen.com/subject/drqdihtx.html