test5: merging two data frames with pandas merge

Author: 夕颜00 | Published 2020-06-09 15:02

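Goal: download files from the STRING database for PPI (protein-protein interaction) analysis.
Problem: the protein IDs in the files downloaded from the STRING website look like 9606.ENSP*, so they have to be converted to gene names.
1. File 1 (interactions): 9606.protein.links.v11.0.txt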

protein1 protein2 combined_score
9606.ENSP00000000233 9606.ENSP00000272298 490
9606.ENSP00000000233 9606.ENSP00000253401 198
9606.ENSP00000000233 9606.ENSP00000401445 159
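The excerpt above can be parsed directly to see why the script passes a space separator. A minimal sketch, assuming the full file has the same three space-separated columns as the excerpt; `sample_links` is an illustrative name holding only the three rows shown.

```python
import io

import pandas as pd

# The three rows from the links excerpt above (assumed representative of the full file).
sample_links = (
    "protein1 protein2 combined_score\n"
    "9606.ENSP00000000233 9606.ENSP00000272298 490\n"
    "9606.ENSP00000000233 9606.ENSP00000253401 198\n"
    "9606.ENSP00000000233 9606.ENSP00000401445 159\n"
)

# The file uses single spaces between columns. With read_table's default tab
# separator each line would be read as one single column, so sep=" " is required.
links = pd.read_csv(io.StringIO(sample_links), sep=" ")
print(links.dtypes)  # protein1/protein2 are strings, combined_score is an integer
```

`pd.read_table(..., sep=" ")`, as used in the script, behaves the same way here.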

2. Annotation file: 9606.protein.info.v11.0.txt

protein_external_id preferred_name  protein_size    annotation
9606.ENSP00000000233    ARF5    180 ADP-ribosylation...
9606.ENSP00000000412    M6PR    277 Cation-dependent...
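The annotation file is effectively a lookup table from STRING ID to gene name (`preferred_name`). A minimal sketch of that view, assuming the file is tab-separated (which the script's plain `pd.read_table(infofile)` call relies on) and using only the two excerpt rows; `sample_info`, `id_to_name` and `ids` are illustrative names. An ID missing from the table maps to NaN, just as an unmatched row does in a left merge.

```python
import io

import pandas as pd

# The two excerpt rows above with tabs restored between columns;
# the annotation text stays truncated as in the excerpt.
sample_info = (
    "protein_external_id\tpreferred_name\tprotein_size\tannotation\n"
    "9606.ENSP00000000233\tARF5\t180\tADP-ribosylation...\n"
    "9606.ENSP00000000412\tM6PR\t277\tCation-dependent...\n"
)
info = pd.read_table(io.StringIO(sample_info))

# Each STRING ID should occur once, otherwise a lookup by ID is ambiguous.
assert info["protein_external_id"].is_unique

# Series indexed by STRING ID whose values are the gene names.
id_to_name = info.set_index("protein_external_id")["preferred_name"]

# Translate a column of IDs; the third ID is not in the excerpt, so it becomes NaN.
ids = pd.Series(["9606.ENSP00000000412", "9606.ENSP00000000233", "9606.ENSP00000272298"])
names = ids.map(id_to_name)
print(names.tolist())
```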

3. Target output after conversion:

gene1  gene2
ARF5  CALM2
ARF5  ARHGEF9
ARF5  ERN1

4. Script:

import pandas as pd

infofile = "E:/Script/python/test5/9606.protein.info.v11.0.txt"
linkfile = "E:/Script/python/test5/9606.protein.links.v11.0.txt"
out = "E:/Script/python/test5/merge.txt"

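info = pd.read_table(infofile)                 # annotation file, tab-separated
links = pd.read_table(linkfile, sep=" ")       # the links file is space-separated, not the usual tab-separated
# print(links.head())
# print(links.columns)

# Attach the gene name of protein1. Columns 0-4 are then:
# protein1, protein2, combined_score, protein_external_id, preferred_name
result1 = pd.merge(links, info, left_on="protein1", right_on="protein_external_id", how="left").iloc[:, 0:5]

# Attach the gene name of protein2. Column names present in both frames get _x/_y
# suffixes, so column 4 is preferred_name_x (gene of protein1) and
# column 6 is preferred_name_y (gene of protein2).
result2 = pd.merge(result1, info, left_on="protein2", right_on="protein_external_id", how="left").iloc[:, [4, 6]]
result2.columns = ["gene1", "gene2"]
# print(result2.head())
result2.to_csv(out, sep="\t", index=False)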
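The positional selection `.iloc[:, [4, 6]]` depends on how pandas names overlapping columns in the second merge. A small self-contained check of that, assuming the files have the columns shown in the excerpts. The partner gene names (CALM2, ARHGEF9, ERN1) are read off the target table and assume its rows follow the order of the links rows; `protein_size` and `annotation` for those partners are unknown and left empty. The sketch also adds `validate="many_to_one"`, which makes pandas raise an error if `info` ever contained a duplicated protein ID (something the original script does not check).

```python
import io

import pandas as pd

# Links excerpt from file 1 (space-separated).
links = pd.read_csv(io.StringIO(
    "protein1 protein2 combined_score\n"
    "9606.ENSP00000000233 9606.ENSP00000272298 490\n"
    "9606.ENSP00000000233 9606.ENSP00000253401 198\n"
    "9606.ENSP00000000233 9606.ENSP00000401445 159\n"
), sep=" ")

# Hypothetical annotation rows: ARF5 comes from the excerpt, the partner names from
# the target table; unknown sizes/annotations are left as missing values.
info = pd.DataFrame({
    "protein_external_id": ["9606.ENSP00000000233", "9606.ENSP00000272298",
                            "9606.ENSP00000253401", "9606.ENSP00000401445"],
    "preferred_name": ["ARF5", "CALM2", "ARHGEF9", "ERN1"],
    "protein_size": [180, None, None, None],
    "annotation": ["ADP-ribosylation...", None, None, None],
})

# Same two left merges as the script, plus a check that each ID in info is unique.
result1 = pd.merge(links, info, left_on="protein1", right_on="protein_external_id",
                   how="left", validate="many_to_one").iloc[:, 0:5]
merged = pd.merge(result1, info, left_on="protein2", right_on="protein_external_id",
                  how="left", validate="many_to_one")

# Shows preferred_name_x at position 4 and preferred_name_y at position 6.
print(list(merged.columns))

# Selecting by name gives the same two columns without relying on positions.
result2 = merged[["preferred_name_x", "preferred_name_y"]]
result2.columns = ["gene1", "gene2"]
print(result2)
```

Selecting `preferred_name_x`/`preferred_name_y` by name keeps working if the annotation file ever gains or reorders columns, whereas the positional indexes would silently pick the wrong ones.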

Article link: https://www.haomeiwen.com/subject/dbwstktx.html