test_xia9: Matching on multiple VCF columns to annotate variants

Author: 夕颜00 | Published: 2020-09-09 16:17

    1. File 1: the sample mutation file, out.csv


    2. COSMIC annotation file: hg19_cosmic89_somatic.txt.gz

    3. Output file:
    The sample mutation sites, annotated with the matching COSMIC fields.

    II. Script:

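    import gzip
    import pandas as pd

    sample = 'out.csv'
    output = 'out_9.csv'

    # Load the COSMIC annotation table from the gzipped, tab-separated file.
    with gzip.open('hg19_cosmic89_somatic.txt.gz', 'rt') as f:
        cosmic = pd.read_table(f)

    # Add a "chr" prefix so chromosome names line up with the sample's CHROM column.
    cosmic['#Chr'] = 'chr' + cosmic['#Chr'].astype(str)
    # Build a chrom-pos-ref-alt key for every COSMIC record.
    cosmic['ID'] = cosmic['#Chr'] + '-' + cosmic['Start'].map(str) + '-' + cosmic['Ref'] + '-' + cosmic['Alt']
    anno = cosmic[['ID', 'cosmic_id', 'cosmic_CDS', 'cosmic_pHGVs', 'cosmic_CNT']]
    cosmic_index = anno.set_index('ID')

    # Build the same key for the sample variants.
    data = pd.read_csv(sample, engine='python')
    data['sum'] = data['CHROM'] + '-' + data['POS'].map(str) + '-' + data['REF'] + '-' + data['ALT']
    data_index = data.set_index('sum')

    # Left join on the key: every sample variant is kept, unmatched ones get NaN.
    res = pd.merge(data_index, cosmic_index, left_index=True, right_index=True, how='left')
    # The key lives only in the index, so index=False leaves it out of the output.
    res.to_csv(output, index=False)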
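
To see what the key-based left join does without the real files, here is a minimal sketch on toy in-memory tables. The helper names `build_key` and `annotate`, the toy values, and the `drop_duplicates` step are assumptions added for illustration; they are not part of the original script. Only two of the COSMIC columns are carried along to keep the example short.

```python
import pandas as pd


def build_key(df, chrom, pos, ref, alt, add_chr_prefix=False):
    """Return a 'chrom-pos-ref-alt' string key for each row (hypothetical helper)."""
    chrom_col = df[chrom].astype(str)
    if add_chr_prefix:
        chrom_col = "chr" + chrom_col
    return (chrom_col + "-" + df[pos].astype(str)
            + "-" + df[ref].astype(str) + "-" + df[alt].astype(str))


def annotate(sample, cosmic):
    """Left-join COSMIC fields onto sample variants via the shared key."""
    cosmic = cosmic.copy()
    cosmic["ID"] = build_key(cosmic, "#Chr", "Start", "Ref", "Alt", add_chr_prefix=True)
    # Assumption: keep one annotation row per key so the join cannot multiply sample rows.
    anno = cosmic[["ID", "cosmic_id", "cosmic_CNT"]].drop_duplicates("ID")

    sample = sample.copy()
    sample["ID"] = build_key(sample, "CHROM", "POS", "REF", "ALT")
    # validate raises if the annotation side still has duplicate keys.
    return sample.merge(anno, on="ID", how="left", validate="many_to_one")


# Toy data with made-up values, mirroring the column names used in the script.
sample = pd.DataFrame({
    "CHROM": ["chr1", "chr2"],
    "POS": [100, 200],
    "REF": ["A", "G"],
    "ALT": ["T", "C"],
})
cosmic = pd.DataFrame({
    "#Chr": [1, 3],
    "Start": [100, 300],
    "Ref": ["A", "C"],
    "Alt": ["T", "G"],
    "cosmic_id": ["COSM_demo1", "COSM_demo2"],
    "cosmic_CNT": [5, 2],
})

result = annotate(sample, cosmic)
print(result)
```

The first toy variant matches a COSMIC record and picks up its fields; the second has no match and is kept with missing values, which is the point of `how='left'`. Note that the match is an exact string comparison, so it only works when both files write chromosome names, positions, and alleles in the same convention.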
       
    
    
    
