Processing FASTQ file sequences with Python

Author: 东风008 | Source: published 2020-04-27 10:24

    1. Sequence length distribution statistics

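    # Map each read ID to the length of its sequence.
    lengths = {}
    with open('/home/yt/Desktop/1.fastq') as input_fastq:
        for index, line in enumerate(input_fastq):
            if index % 4 == 0:
                # Line 1 of each 4-line record: the read ID.
                read_id = line.strip('\n')
            elif index % 4 == 1:
                # Line 2 of each record: the sequence itself.
                lengths[read_id] = len(line.strip('\n'))
    # Count the reads that fall into each length range.
    count_0_50 = 0
    count_50_120 = 0
    for read_id, length in lengths.items():
        if 0 < length < 50:
            count_0_50 += 1
        elif 50 <= length < 120:
            count_50_120 += 1
    print("count_0_50:", count_0_50)
    print("count_50_120:", count_50_120)
    # The original opened this result file but never wrote to it or closed it;
    # here the same two counts are saved to it.
    with open('/home/yt/PycharmProjects/test/1.fastq.result', 'w') as output:
        output.write("count_0_50: {}\n".format(count_0_50))
        output.write("count_50_120: {}\n".format(count_50_120))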
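The two fixed ranges above can be generalized to a histogram with uniform bins. The following is only a minimal sketch under stated assumptions: the helper names `read_lengths` and `length_histogram`, the `bin_size` parameter, and the in-memory example reads are all hypothetical and not part of the original script, and the input is assumed to use strict 4-line FASTQ records.

```python
import io
from collections import Counter


def read_lengths(handle):
    """Yield the sequence length of each 4-line FASTQ record in `handle`."""
    for index, line in enumerate(handle):
        if index % 4 == 1:  # the second line of each record is the sequence
            yield len(line.rstrip('\n'))


def length_histogram(lengths, bin_size=50):
    """Count lengths per half-open bin [start, start + bin_size)."""
    counts = Counter((length // bin_size) * bin_size for length in lengths)
    return dict(sorted(counts.items()))


# Tiny in-memory FASTQ (made-up reads) standing in for a real file.
example = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGTACGTAC\n+\nIIIIIIIIII\n"
)
histogram = length_histogram(read_lengths(example), bin_size=5)
for start, count in histogram.items():
    print("{}-{}: {}".format(start, start + 5 - 1, count))
```

To run it on a real file, pass an open file object instead of the `io.StringIO` example. Because `read_lengths` is a generator, the file is processed one line at a time and read IDs never need to be stored.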
    

    2. Converting FASTQ to FASTA

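    def fastaq_2_fasta(fastaq):
        """Return {FASTA header: sequence} parsed from the text of a FASTQ file."""
        fasta_dict = {}
        # Splitting on the bare '+' separator line leaves the ID and sequence
        # as the last two lines of every chunk. The final chunk holds only the
        # last quality line, so it is dropped. This assumes the separator line
        # is exactly '+', with no ID repeated after it.
        fasta_q_split = fastaq.split('\n+\n')[:-1]
        for chunk in fasta_q_split:
            read_id, sequence = chunk.split('\n')[-2:]
            # FASTQ IDs start with '@'; FASTA headers start with '>'.
            if read_id.startswith('@'):
                read_id = read_id[1:]
            fasta_dict['>' + read_id] = sequence
        return fasta_dict
    file_read_name = '/home/yt/Desktop/1.fastq'
    with open(file_read_name) as fastqfile:
        fileRead = fastqfile.read()
    fasta = fastaq_2_fasta(fileRead)
    file_save_name = '/home/yt/PycharmProjects/test/1.fastaq_to_fasta.fasta'
    with open(file_save_name, 'w') as save_file:
        for name in fasta:
            # One header line followed by one sequence line per read.
            string = name + '\n' + fasta[name] + '\n'
            save_file.write(string)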
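Reading the whole file into memory and splitting on the separator can be avoided by converting one record at a time. The sketch below is one possible alternative, not the author's method: the function name `fastq_to_fasta` and the example reads are hypothetical, and it assumes strict 4-line records with no blank lines between them.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Copy each 4-line FASTQ record to `fasta_handle` as a 2-line FASTA record.

    Returns the number of records written.
    """
    written = 0
    while True:
        header = fastq_handle.readline().rstrip('\n')
        if not header:
            break  # end of input (or a blank line)
        sequence = fastq_handle.readline().rstrip('\n')
        fastq_handle.readline()  # '+' separator line, ignored
        fastq_handle.readline()  # quality line, ignored
        if not header.startswith('@'):
            raise ValueError("expected '@' at start of record: " + header)
        # Replace the leading '@' with the FASTA '>' marker.
        fasta_handle.write('>' + header[1:] + '\n' + sequence + '\n')
        written += 1
    return written


# Made-up reads; the second record repeats its ID after the '+'.
source = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+read2\nIIII\n")
target = io.StringIO()
count = fastq_to_fasta(source, target)
print(target.getvalue(), end='')
```

Because the separator line is skipped by position rather than matched by content, this version also handles records whose `+` line repeats the read ID, and it keeps reads that share the same ID instead of overwriting them in a dictionary.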
    

    3. Counting each base and GC%

    seq = []
    with open('/home/yt/Desktop/1.fastq') as input_fastq:
        for index, line in enumerate(input_fastq):
            # Line 2 of each 4-line record holds the sequence.
            if index % 4 == 1:
                seq.append(line.strip())
    # Join all reads into one string and count its G and C bases.
    seq1 = ''.join(seq)
    gc = 0
    for i in seq1:
        if i == 'G' or i == 'C':
            gc += 1
    # Report once, after every base has been scanned.
    print('The total length is {}'.format(len(seq1)))
    print('GC% is {}%'.format(gc / len(seq1) * 100))
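The script above reports only the total length and GC%, while the section title also mentions per-base counts. A minimal sketch of one way to get both with `collections.Counter` follows. The helper names `base_counts` and `gc_percent` and the example reads are hypothetical. Upper-casing the sequence and counting `N` toward the total are assumptions made here, the latter matching the original's use of the full sequence length as the denominator.

```python
import io
from collections import Counter


def base_counts(handle):
    """Count every base character across all sequence lines of a FASTQ file."""
    counts = Counter()
    for index, line in enumerate(handle):
        if index % 4 == 1:  # the second line of each record is the sequence
            counts.update(line.strip().upper())
    return counts


def gc_percent(counts):
    """Return the share of G and C among all counted bases, in percent."""
    total = sum(counts.values())
    if total == 0:
        return 0.0  # avoid dividing by zero on empty input
    return (counts['G'] + counts['C']) / total * 100


# Made-up reads standing in for a real file.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCN\n+\nIIII\n")
counts = base_counts(example)
for base in sorted(counts):
    print("{}: {}".format(base, counts[base]))
print("GC% is {}%".format(gc_percent(counts)))
```

Updating the counter line by line avoids building one large joined string, which helps with big FASTQ files.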
    
