Python: Finding the Core Genes of a Genome Cluster

Author: 胡童远 | Published: 2021-07-30 09:36

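    The method is simple: walk through the gene set of every genome and intersect them one after another with set & set. Whatever survives all intersections is the core gene set.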
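As a minimal sketch of the idea, the snippet below intersects three toy gene sets in turn. The variable names and the gene IDs assigned to each genome are made up for illustration and do not come from the real data.

```python
# Toy gene sets for three hypothetical genomes (contents are illustrative only)
genome_a = {"COG0006", "COG0008", "COG0009", "COG0012"}
genome_b = {"COG0006", "COG0009", "COG0012", "COG0015"}
genome_c = {"COG0006", "COG0008", "COG0012", "COG0015"}

# Start from the first genome and intersect with each of the others in turn
core = genome_a
for genes in (genome_b, genome_c):
    core = core & genes  # keep only the genes present in both sets

print(sorted(core))  # ['COG0006', 'COG0012']
```

Because set intersection is associative and commutative, the order in which the genomes are processed does not change the result.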

    1 Genome cluster list

    head total_lacto.list
    CNGBCC1950658
    CNGBCC1950669
    CNGBCC1950686
    CNGBCC1950698
    CNGBCC1950902
    

    2 Per-genome gene list

    head ../cog_uniq/CNGBCC1950658.tsv
    COG0006
    COG0008
    COG0009
    COG0012
    COG0015
    

    3 Computing the core genes with Python

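    Approach:
    Read the genome list line by line and strip the trailing newline from each entry.
    Load the gene list of the first genome into head.
    Then open each remaining gene list in turn as a temporary tail, and compute the intersection with set & set.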

    #!/usr/bin/env python
    g_list = "total_lacto.list"

    def read_genes(path):
        # One gene ID per line; strip newlines so that a last line without
        # a trailing "\n" still matches, and skip blank lines
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    # Build the gene-list path for every genome named in the list file
    with open(g_list) as g_list_file:
        tmp_list = ["../cog_uniq/{}.tsv".format(line.strip())
                    for line in g_list_file if line.strip()]

    # Successive intersection, starting from the first genome's gene set
    head = read_genes(tmp_list[0])
    print("\t head done...")
    for num, tail in enumerate(tmp_list[1:], start=2):
        # Core step: keep only the genes also present in this genome
        head &= read_genes(tail)
        print("\t intersect {} done...".format(num))

    # Output: one core gene per line, sorted so the file is reproducible
    out_name = "./lacto_core_cog.tsv"
    with open(out_name, 'w') as o:
        o.write(''.join(gene + "\n" for gene in sorted(head)))
        print("\t write done...")
    

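    To verify the result, manually spot-check a sample: pick a few genes from the output and confirm that each one appears in every genome's gene list.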
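The spot check can also be scripted. The sketch below is one possible way to do it, not the author's procedure: `spot_check` is a hypothetical helper, and the toy `genome_sets` dictionary stands in for the gene sets that would be loaded from the per-genome .tsv files.

```python
import random

def spot_check(core_genes, genome_sets, sample_size=5, seed=0):
    """Return the sampled core genes that are missing from at least one genome."""
    rng = random.Random(seed)          # fixed seed so the check is repeatable
    pool = sorted(core_genes)          # sample from a stable ordering
    sample = rng.sample(pool, min(sample_size, len(pool)))
    return [gene for gene in sample
            if not all(gene in genes for genes in genome_sets.values())]

# Toy data (hypothetical): genome name -> set of gene IDs
genome_sets = {
    "genome_1": {"COG0006", "COG0008", "COG0012"},
    "genome_2": {"COG0006", "COG0012", "COG0015"},
}
core_genes = {"COG0006", "COG0012"}

# An empty list means every sampled gene was found in every genome
print(spot_check(core_genes, genome_sets))  # []
```

A non-empty result lists genes that were reported as core but are absent from some genome, which would point to a problem such as mismatched newlines in the input files.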

    Article link: https://www.haomeiwen.com/subject/payqvltx.html