python: prevalence & annotate

Author: 胡童远 | Published: 2022-06-14 17:06

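1 Select the samples that belong to a given group
2 Extract the KO abundance table for those samples
3 Load the KO annotation into a dictionary
4 For each KO, count the zeros, record the number of samples in the group, and compute the percentage of non-zero values; store these in a dictionary
5 Annotate each KO from the annotation dictionary; KOs missing from it are left unannotated (the database needs updating)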
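
Steps 1, 2 and 4 can be sketched on a toy table before reading the full script. This is only an illustration: the sample names, group assignments and abundance values are made up, the helper `prevalence` is hypothetical, and only the column names `Sample` and `Group` come from the script. Prevalence is taken here as the non-zero fraction described in step 4.

```python
import pandas as pd

# Toy metadata; sample names and group labels are invented for illustration.
meta = pd.DataFrame({
    "Sample": ["s1", "s2", "s3", "s4"],
    "Group": ["G1", "G1", "G2", "G2"],
})

# Toy KO abundance table: rows are KOs, columns are samples (values invented).
ko = pd.DataFrame(
    {"s1": [0.0, 1.5], "s2": [2.0, 0.0], "s3": [0.0, 0.0], "s4": [3.0, 0.0]},
    index=["K00001", "K00002"],
)


def prevalence(ko_table, meta_table, group):
    """Per-KO zero count, sample count and non-zero fraction for one group."""
    # Step 1: samples of the requested group.
    samples = meta_table.loc[meta_table["Group"] == group, "Sample"]
    # Step 2: abundance columns for those samples only.
    sub = ko_table.loc[:, list(samples)]
    # Step 4: zeros per KO, group size, fraction of non-zero samples.
    num_zero = (sub == 0).sum(axis=1)
    num_sample = sub.shape[1]
    return pd.DataFrame({
        "num_zero": num_zero,
        "num_sample": num_sample,
        "percent": (num_sample - num_zero) / num_sample,
    })


result_g1 = prevalence(ko, meta, "G1")
result_g2 = prevalence(ko, meta, "G2")
print(result_g2)
```

In this toy data, K00002 is absent from both G2 samples, so its prevalence in G2 is 0, while K00001 is present in one of two G2 samples.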

#!/usr/bin/env python3
import pandas as pd
import numpy as np
import os, re, sys

# meta
meta = pd.read_table("../association_ko_species/meta_group.txt")
sample_g1 = meta[meta['Group'] == "G1"]['Sample']
sample_g2 = meta[meta['Group'] == "G2"]['Sample']
sample_g3 = meta[meta['Group'] == "G3"]['Sample']
sample_g4 = meta[meta['Group'] == "G4"]['Sample']
sample_g5 = meta[meta['Group'] == "G5"]['Sample']
sample_g6 = meta[meta['Group'] == "G6"]['Sample']
# ko
ko = pd.read_table("../association_ko_species/merge_out.txt", 
                   index_col=0)
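# keep only the G6 samples; change sample_g6 here to process another group
# (a sample listed in meta but absent from the table becomes an all-NaN column)
ko_tmp = ko.reindex(columns=list(sample_g6))
# ko_tmp = ko_tmp.iloc[0:10,:]  # uncomment to test on the first 10 KOs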
# ko annotation: KO id -> tab-joined level_1, level_2, pathway, gene
anno_path = "/public/home/zzumgg03/huty/databases/kofam/KEGG_KO.txt"
Anno = {}
with open(anno_path, 'r') as handle:  # do not reuse the path variable as the file handle
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        Anno[fields[0]] = "\t".join(fields[1:5])
        
# summary
zero = (ko_tmp == 0).astype(int).sum(axis=1)
sample = np.full(ko_tmp.shape[0], ko_tmp.shape[1], dtype='int') 

Dict = {}
Dict['num_zero'] = np.array(zero) # number of zero
Dict['num_sample'] = np.array(sample) # number of samples
Dict['ko'] = list(ko_tmp.index) # ko id
# prevalence: fraction of samples in which the KO is non-zero (step 4)
Dict['percent'] = (Dict['num_sample'] - Dict['num_zero'])/Dict['num_sample']
# anno geneset ko; a KO missing from the annotation gets four empty fields
# so that every output row has the same 8 columns as the header
Dict['Anno'] = [Anno.get(each, "\t\t\t") for each in Dict['ko']]
    
# save
with open("ko_percent_g6.txt", 'w') as o:
    o.write("ko\tnum_zero\tnum_sample\tpercent\tlevel_1\tlevel_2\tpathway\tgene\n")
    for i in range(0, len(Dict['ko'])):
        o.write("{}\t{}\t{}\t{}\t{}\n".format(
            Dict['ko'][i],
            Dict['num_zero'][i],
            Dict['num_sample'][i],
            Dict['percent'][i],
            Dict['Anno'][i]))
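
The header written by the script has eight columns, four of which come from the annotation. One way to keep every output row at that width, even when an annotation line has fewer than four fields or a KO is missing from the database, is to pad to four fields when loading and looking up. The sketch below assumes a tab-separated annotation file with the KO id in the first column; the helpers `load_annotation` and `annotate` and the toy annotation values are hypothetical.

```python
import io

N_FIELDS = 4  # level_1, level_2, pathway, gene


def load_annotation(handle, n_fields=N_FIELDS):
    """Map KO id -> list of exactly n_fields strings, padding short lines."""
    anno = {}
    for line in handle:
        parts = line.rstrip("\n").split("\t")
        if not parts[0]:
            continue  # skip blank lines
        fields = parts[1:1 + n_fields]
        fields += [""] * (n_fields - len(fields))  # pad missing fields
        anno[parts[0]] = fields
    return anno


def annotate(ko_id, anno, n_fields=N_FIELDS):
    """Return the tab-joined annotation, or empty fields for unknown KOs."""
    return "\t".join(anno.get(ko_id, [""] * n_fields))


# Toy annotation text standing in for the real file (values are placeholders).
toy = io.StringIO(
    "K00001\tlevel1_a\tlevel2_a\tpathway_a\tgene_a\n"
    "K00002\tlevel1_b\n"
)
anno = load_annotation(toy)
for ko_id in ["K00001", "K00002", "K99999"]:
    print(repr(annotate(ko_id, anno)))
```

With this padding, a downstream `pd.read_table` on the output file sees the same number of columns in every row.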

Run

##
nohup python3 sc_ko_percent_anno_g1.py &
nohup python3 sc_ko_percent_anno_g2.py &
nohup python3 sc_ko_percent_anno_g3.py &
nohup python3 sc_ko_percent_anno_g4.py &
nohup python3 sc_ko_percent_anno_g5.py &
nohup python3 sc_ko_percent_anno_g6.py &
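
The six commands above run six copies of the script that presumably differ only in the group they select. A possible alternative, sketched here under that assumption, is a single script that takes the group on the command line. The `--group` option, the helper names, and the script name `sc_ko_percent_anno.py` are hypothetical and not part of the original; the output file name follows the `ko_percent_g6.txt` pattern used in the script.

```python
import argparse

GROUPS = ["G1", "G2", "G3", "G4", "G5", "G6"]


def parse_args(argv=None):
    """Parse the group to process; argv=None falls back to sys.argv."""
    parser = argparse.ArgumentParser(
        description="KO prevalence and annotation for one sample group")
    parser.add_argument("--group", required=True, choices=GROUPS,
                        help="group label as found in the Group column")
    return parser.parse_args(argv)


def output_name(group):
    """Build the output file name, e.g. G6 -> ko_percent_g6.txt."""
    return "ko_percent_{}.txt".format(group.lower())


# Example with an explicit argument list instead of the real command line.
args = parse_args(["--group", "G6"])
print(args.group, "->", output_name(args.group))
```

Inside the script, `meta[meta['Group'] == args.group]['Sample']` would then replace the six hard-coded `sample_g1` to `sample_g6` selections, and each group could be launched as, for example, nohup python3 sc_ko_percent_anno.py --group G6 &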

Article link: https://www.haomeiwen.com/subject/gemzmrtx.html
