test5: merging two DataFrames with pandas merge

Author: 夕颜00 | Source: published 2020-06-09 15:02

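    Goal: download data files from the STRING database for a PPI (protein-protein interaction) analysis.
    Problem: the protein IDs in files downloaded from the STRING website have the form 9606.ENSP* (the NCBI taxonomy ID for human, 9606, followed by an Ensembl protein ID), and they need to be converted to gene names (gene_name).
    1. File 1 (interaction links): 9606.protein.links.v11.0.txt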

    protein1 protein2 combined_score
    9606.ENSP00000000233 9606.ENSP00000272298 490
    9606.ENSP00000000233 9606.ENSP00000253401 198
    9606.ENSP00000000233 9606.ENSP00000401445 159
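Because this file is space-separated rather than tab-separated, it is worth checking the separator on a few rows first. A minimal sketch, assuming the three rows shown above are held in an in-memory string (`io.StringIO` stands in for the real file; `links_text` is a hypothetical name):

```python
import io

import pandas as pd

# The first lines of 9606.protein.links.v11.0.txt as shown above,
# used as an in-memory stand-in for the real download.
links_text = """protein1 protein2 combined_score
9606.ENSP00000000233 9606.ENSP00000272298 490
9606.ENSP00000000233 9606.ENSP00000253401 198
9606.ENSP00000000233 9606.ENSP00000401445 159
"""

# The separator is a single space, so it must be passed explicitly:
# read_csv would otherwise assume commas and read_table would assume tabs.
links = pd.read_csv(io.StringIO(links_text), sep=" ")

print(links.columns.tolist())  # ['protein1', 'protein2', 'combined_score']
print(links.shape)             # (3, 3)
```

If the separator were wrong, pandas would typically read everything into a single column, so the column list is a quick sanity check.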
    

    2. Annotation file: 9606.protein.info.v11.0.txt

    protein_external_id preferred_name  protein_size    annotation
    9606.ENSP00000000233    ARF5    180 ADP-ribosylation...
    9606.ENSP00000000412    M6PR    277 Cation-dependent...
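The annotation file is essentially a lookup table from STRING IDs to gene names. A minimal sketch, assuming the two rows above in tab-separated form (the annotation text is kept truncated exactly as in the excerpt; `info_text` and `id_to_gene` are hypothetical names):

```python
import io

import pandas as pd

# The two annotation rows shown above, tab-separated like the real file.
info_text = (
    "protein_external_id\tpreferred_name\tprotein_size\tannotation\n"
    "9606.ENSP00000000233\tARF5\t180\tADP-ribosylation...\n"
    "9606.ENSP00000000412\tM6PR\t277\tCation-dependent...\n"
)

# read_table defaults to sep="\t", which matches this file.
info = pd.read_table(io.StringIO(info_text))

# A Series indexed by the STRING ID maps each 9606.ENSP* ID to its gene name.
id_to_gene = info.set_index("protein_external_id")["preferred_name"]

print(id_to_gene["9606.ENSP00000000233"])  # ARF5
```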
    

    3. Target file after conversion:

    gene1  gene2
    ARF5  CALM2
    ARF5  ARHGEF9
    ARF5  ERN1
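An alternative to merging, shown here as a sketch rather than the author's method, is to map each ID column through an ID-to-name Series. It assumes each `protein_external_id` appears only once in the annotation file (`Series.map` needs a unique index) and uses the excerpt rows above, so partner IDs that are not in the two-row annotation excerpt come out as NaN:

```python
import io

import pandas as pd

links = pd.read_csv(io.StringIO(
    "protein1 protein2 combined_score\n"
    "9606.ENSP00000000233 9606.ENSP00000272298 490\n"
    "9606.ENSP00000000233 9606.ENSP00000253401 198\n"
    "9606.ENSP00000000233 9606.ENSP00000401445 159\n"
), sep=" ")
info = pd.read_table(io.StringIO(
    "protein_external_id\tpreferred_name\tprotein_size\tannotation\n"
    "9606.ENSP00000000233\tARF5\t180\tADP-ribosylation...\n"
    "9606.ENSP00000000412\tM6PR\t277\tCation-dependent...\n"
))

# Lookup table: STRING ID -> gene name (hypothetical name id_to_gene).
id_to_gene = info.set_index("protein_external_id")["preferred_name"]

# Series.map replaces each ID with its gene name; IDs missing from the
# lookup become NaN, which mirrors how='left' in a merge.
genes = pd.DataFrame({
    "gene1": links["protein1"].map(id_to_gene),
    "gene2": links["protein2"].map(id_to_gene),
})

print(genes["gene1"].tolist())         # ['ARF5', 'ARF5', 'ARF5']
print(genes["gene2"].isna().tolist())  # [True, True, True]
```

With the full annotation file the partner IDs would be found as well, producing a two-column table like the target above.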
    

    4. Script:

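    import pandas as pd
    
    infofile = "E:/Script/python/test5/9606.protein.info.v11.0.txt"
    linkfile = "E:/Script/python/test5/9606.protein.links.v11.0.txt"
    out = "E:/Script/python/test5/merge.txt"
    
    info = pd.read_table(infofile)  # annotation file: tab-separated (read_table's default)
    links = pd.read_table(linkfile, sep=" ")  # the links file is space-separated, not the usual tab-separated format
    # print(links.head())
    # print(links.columns)
    
    # Merge 1: look up protein1 in the annotation table and keep the first 5 columns:
    # protein1, protein2, combined_score, protein_external_id, preferred_name
    result1 = pd.merge(links, info, left_on="protein1", right_on="protein_external_id", how="left").iloc[:, 0:5]
    # Merge 2: look up protein2. Column names present in both frames get the default
    # suffixes _x/_y, so column 4 is preferred_name_x (gene of protein1) and
    # column 6 is preferred_name_y (gene of protein2)
    result2 = pd.merge(result1, info, left_on="protein2", right_on="protein_external_id", how="left").iloc[:, [4, 6]]
    result2.columns = ["gene1", "gene2"]
    # print(result2.head())
    result2.to_csv(out, sep="\t", index=False)  # tab-separated output without the row index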
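The positional selections `.iloc[:, 0:5]` and `.iloc[:, [4, 6]]` work, but they silently depend on column order and on pandas' default `_x`/`_y` suffixes. A sketch of a name-based variant of the same two left merges, run on the excerpt rows above held in memory; `lookup`, `step1`, `step2` and `result` are illustrative names, not part of the original script:

```python
import io

import pandas as pd

links = pd.read_csv(io.StringIO(
    "protein1 protein2 combined_score\n"
    "9606.ENSP00000000233 9606.ENSP00000272298 490\n"
    "9606.ENSP00000000233 9606.ENSP00000253401 198\n"
    "9606.ENSP00000000233 9606.ENSP00000401445 159\n"
), sep=" ")
info = pd.read_table(io.StringIO(
    "protein_external_id\tpreferred_name\tprotein_size\tannotation\n"
    "9606.ENSP00000000233\tARF5\t180\tADP-ribosylation...\n"
    "9606.ENSP00000000412\tM6PR\t277\tCation-dependent...\n"
))

# Keep only the key and the name, so no other columns can collide.
lookup = info[["protein_external_id", "preferred_name"]]

# Merge 1: gene name for protein1, renamed right away to gene1.
step1 = links.merge(lookup, left_on="protein1", right_on="protein_external_id", how="left")
step1 = step1.rename(columns={"preferred_name": "gene1"}).drop(columns="protein_external_id")

# Merge 2: gene name for protein2; nothing overlaps any more, so no suffixes appear.
step2 = step1.merge(lookup, left_on="protein2", right_on="protein_external_id", how="left")
step2 = step2.rename(columns={"preferred_name": "gene2"}).drop(columns="protein_external_id")

# Select the output columns by name instead of by position.
result = step2[["gene1", "gene2"]]
print(result["gene1"].tolist())  # ['ARF5', 'ARF5', 'ARF5']
```

Because of `how="left"`, every link row is kept; IDs missing from the annotation file stay NaN (here, all three partners, since they are not in the two-row excerpt). `result.dropna()` drops such rows if they are not wanted.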
    


          Article link: https://www.haomeiwen.com/subject/dbwstktx.html