[Bioinformatics Analysis] Text-Mining Target Genes and Assessing Their Oncogenic Potential: Did You Know?

Author: 研平方 | Published 2021-07-17 21:34

    Yuque (语雀): 左手柳叶刀右手炭火烧
    WeChat official account: 研平方 | Jianshu (简书): 研平方

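    Recently, while sweeping through the literature, I came across an application that really caught my interest: using text mining to assess the association between selected genes and cancer. With text mining involved, I naturally had to dig in.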

    1. The original text

    Literature evidence for the identified target genes in cancer

    We used OncoScore, a text mining tool to assess the associations between each gene and specific cancers based on the literature. A cutoff value of 21.09 was suggested to determine true positives and the true negatives in cancer gene identification.

    2. Looking it up

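    Out of habit I opened a browser, ready to get to the bottom of it, and was pleasantly surprised: OncoScore is a ready-made R package hosted on Bioconductor, so it can be installed and used directly. The paper appeared in Sci Rep, but I still think the tool is worth a try.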

    3. What it can do

    The OncoScore analysis consists of two parts. One can estimate a score to assess the
    oncogenic potential of a set of genes, given the literature knowledge, at the time of the
    analysis, or one can study the trend of such score over time.

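    So OncoScore can not only score the oncogenic potential of a given list of target genes based on what is known in the literature, but also track how that score changes over time.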

    4. Showtime: grab a seat and watch

    4.1 Preparation

    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    
    BiocManager::install("OncoScore")
    
    # load the library
    library(OncoScore)
    # Define a query
    query = perform.query(c("ASXL1","IDH1","IDH2","SETBP1","TET2"))
    
    ### Starting the queries for the selected genes.
    
    ### Performing queries for cancer literature 
        Number of papers found in PubMed for ASXL1 was: 923 
        Number of papers found in PubMed for IDH1 was: 3691 
        Number of papers found in PubMed for IDH2 was: 1318 
        Number of papers found in PubMed for SETBP1 was: 177 
        Number of papers found in PubMed for TET2 was: 1609 
    
    ### Performing queries for all the literature 
        Number of papers found in PubMed for ASXL1 was: 1018 
        Number of papers found in PubMed for IDH1 was: 3902 
        Number of papers found in PubMed for IDH2 was: 1499 
        Number of papers found in PubMed for SETBP1 was: 229 
        Number of papers found in PubMed for TET2 was: 2117
    

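    As the output shows, the search returns two counts for each gene: the number of cancer-related papers and the total number of papers that mention the gene.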
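
To make the two counts easier to compare, here is a minimal base R sketch that copies the numbers printed above into a data frame and computes the share of each gene's papers that are cancer-related. It does not call OncoScore or PubMed. The column names `CitationsGene` and `CitationsGeneInCancer` follow the package output shown further down; `PercentInCancer` is a name made up for this illustration.

```r
# Counts copied by hand from the perform.query() output above.
counts <- data.frame(
  CitationsGene         = c(1018, 3902, 1499, 229, 2117),
  CitationsGeneInCancer = c(923, 3691, 1318, 177, 1609),
  row.names = c("ASXL1", "IDH1", "IDH2", "SETBP1", "TET2")
)

# Percentage of each gene's papers that are cancer-related
# (PercentInCancer is an illustrative column name, not an OncoScore field).
counts$PercentInCancer <- round(
  100 * counts$CitationsGeneInCancer / counts$CitationsGene, 1
)

print(counts)
```

For all five genes, more than three quarters of the retrieved papers are cancer-related.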

    OncoScore provides a function to merge gene names if requested by the user. This function is useful when there are aliases in the gene list.

    combine.query.results(query, c('IDH1', 'IDH2'), 'new_gene')
             CitationsGene CitationsGeneInCancer
    ASXL1             1018                   923
    SETBP1             229                   177
    TET2              2117                  1609
    new_gene          5401                  5009
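
The merged row in the output above equals the plain sum of the IDH1 and IDH2 rows (3902 + 1499 = 5401 and 3691 + 1318 = 5009). The base R sketch below reproduces that arithmetic. `merge_rows` is a hypothetical helper written for this illustration, not an OncoScore function, and it only mimics what the printed output suggests.

```r
# Per-gene counts copied by hand from the perform.query() output above.
query_counts <- matrix(
  c(1018,  923,
    3902, 3691,
    1499, 1318,
     229,  177,
    2117, 1609),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("ASXL1", "IDH1", "IDH2", "SETBP1", "TET2"),
                  c("CitationsGene", "CitationsGeneInCancer"))
)

# Hypothetical helper (not part of OncoScore): sum the rows of the aliases,
# drop the alias rows, and append the sum under a new name.
merge_rows <- function(counts, aliases, new_name) {
  merged <- colSums(counts[aliases, , drop = FALSE])
  kept <- counts[!(rownames(counts) %in% aliases), , drop = FALSE]
  out <- rbind(kept, merged)
  rownames(out)[nrow(out)] <- new_name
  out
}

merged_counts <- merge_rows(query_counts, c("IDH1", "IDH2"), "new_gene")
print(merged_counts)
```

Because the counts are simply added, a paper that mentions both merged names is presumably counted twice. That matters little for true aliases of one gene, but it is worth remembering when merging distinct genes as in this demonstration.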
    

    OncoScore can also retrieve genes by chromosomal location; that is not demonstrated here.

    4.2 The main event

    4.2.1 Computing the oncogenic score of each gene
    result = compute.oncoscore(query)
    
    ### Processing data
    ### Computing frequencies scores 
    ### Estimating oncogenes
    ### Results:
         ASXL1 -> 81.59349 
         IDH1 -> 86.66355 
         IDH2 -> 79.59096 
         SETBP1 -> 67.43283 
         TET2 -> 69.12424
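
The scores above can be reproduced by hand from the counts in section 4.1. The sketch below assumes that the score is the percentage of cancer-related citations, damped by a factor of 1 / log2(total citations). This formula is inferred from the printed numbers, not quoted from the package documentation, and `approx_oncoscore` is a name invented here. The sketch then applies the cutoff of 21.09 quoted in section 1.

```r
# Counts copied by hand from the perform.query() output in section 4.1.
counts <- data.frame(
  CitationsGene         = c(1018, 3902, 1499, 229, 2117),
  CitationsGeneInCancer = c(923, 3691, 1318, 177, 1609),
  row.names = c("ASXL1", "IDH1", "IDH2", "SETBP1", "TET2")
)

# Assumed formula (inferred from the output, not taken from the package docs):
# percentage of cancer citations, reduced by a share of 1 / log2(total).
# Requires total > 1, since log2(1) is 0.
approx_oncoscore <- function(total, in_cancer) {
  perc_cancer <- 100 * in_cancer / total
  perc_cancer * (1 - 1 / log2(total))
}

counts$score <- approx_oncoscore(counts$CitationsGene,
                                 counts$CitationsGeneInCancer)

# Cutoff quoted from the paper in section 1.
cutoff <- 21.09
counts$above_cutoff <- counts$score > cutoff

print(counts)
```

Under this assumption a gene with few papers is penalised more heavily than a well-studied one with the same cancer share, which is the point of the damping term. All five genes sit far above the 21.09 cutoff.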
    
    4.2.2 Time-trend analysis (OncoScore timeline analysis)
    query.timepoints = perform.query.timeseries(c("ASXL1","IDH1","IDH2","SETBP1","TET2"),
                                                c("2012/03/01", "2013/03/01", "2014/03/01", "2015/03/01", "2016/03/01"))
    
    ### Starting the queries for the selected genes.
    ### Quering PubMed for timepoint 2012/03/01 
        ### Performing queries for cancer literature 
        Number of papers found in PubMed for ASXL1 was: 86 
        Number of papers found in PubMed for IDH1 was: 409 
        Number of papers found in PubMed for IDH2 was: 173 
        Number of papers found in PubMed for SETBP1 was: 5 
        Number of papers found in PubMed for TET2 was: 173 
        ### Performing queries for all the literature 
        Number of papers found in PubMed for ASXL1 was: 92 
        Number of papers found in PubMed for IDH1 was: 489 
        Number of papers found in PubMed for IDH2 was: 235 
        Number of papers found in PubMed for SETBP1 was: 10 
        Number of papers found in PubMed for TET2 was: 197 
    ### Quering PubMed for timepoint 2013/03/01 
        ### Performing queries for cancer literature 
        Number of papers found in PubMed for ASXL1 was: 135 
        Number of papers found in PubMed for IDH1 was: 662 
        Number of papers found in PubMed for IDH2 was: 267 
        Number of papers found in PubMed for SETBP1 was: 11 
        Number of papers found in PubMed for TET2 was: 258 
        ### Performing queries for all the literature 
        Number of papers found in PubMed for ASXL1 was: 150 
        Number of papers found in PubMed for IDH1 was: 753 
        Number of papers found in PubMed for IDH2 was: 336 
        Number of papers found in PubMed for SETBP1 was: 18 
        Number of papers found in PubMed for TET2 was: 303 
    ### Quering PubMed for timepoint 2014/03/01 
        ### Performing queries for cancer literature 
        Number of papers found in PubMed for ASXL1 was: 188 
        Number of papers found in PubMed for IDH1 was: 904 
        Number of papers found in PubMed for IDH2 was: 365 
        Number of papers found in PubMed for SETBP1 was: 29 
        Number of papers found in PubMed for TET2 was: 347
        ### Performing queries for all the literature 
        Number of papers found in PubMed for ASXL1 was: 209 
        Number of papers found in PubMed for IDH1 was: 1003 
        Number of papers found in PubMed for IDH2 was: 440 
        Number of papers found in PubMed for SETBP1 was: 36 
        Number of papers found in PubMed for TET2 was: 431 
    ### Quering PubMed for timepoint 2015/03/01 
        ### Performing queries for cancer literature 
        Number of papers found in PubMed for ASXL1 was: 257 
        Number of papers found in PubMed for IDH1 was: 1198 
        Number of papers found in PubMed for IDH2 was: 468 
        Number of papers found in PubMed for SETBP1 was: 51 
        Number of papers found in PubMed for TET2 was: 461 
        ### Performing queries for all the literature 
        Number of papers found in PubMed for ASXL1 was: 286 
        Number of papers found in PubMed for IDH1 was: 1304 
        Number of papers found in PubMed for IDH2 was: 551 
        Number of papers found in PubMed for SETBP1 was: 66 
        Number of papers found in PubMed for TET2 was: 583 
    ### Quering PubMed for timepoint 2016/03/01 
        ### Performing queries for cancer literature 
        Number of papers found in PubMed for ASXL1 was: 323 
        Number of papers found in PubMed for IDH1 was: 1506 
        Number of papers found in PubMed for IDH2 was: 569 
        Number of papers found in PubMed for SETBP1 was: 68 
        Number of papers found in PubMed for TET2 was: 587
        ### Performing queries for all the literature 
        Number of papers found in PubMed for ASXL1 was: 359 
        Number of papers found in PubMed for IDH1 was: 1625 
        Number of papers found in PubMed for IDH2 was: 661 
        Number of papers found in PubMed for SETBP1 was: 89 
        Number of papers found in PubMed for TET2 was: 745 
    

    The perform.query.timeseries() function retrieves the literature counts as of each of the specified time points.

    result.timeseries = compute.oncoscore.timeseries(query.timepoints)
    
    ### Computing oncoscore for timepoint 2012/03/01 
    ### Processing data
    ### Computing frequencies scores 
    ### Estimating oncogenes
    ### Results:
         ASXL1 -> 79.14893 
         IDH1 -> 74.27776 
         IDH2 -> 64.27063 
         SETBP1 -> 34.9485 
         TET2 -> 76.29579 
    ### Computing oncoscore for timepoint 2013/03/01 
    ### Processing data
    ### Computing frequencies scores 
    ### Estimating oncogenes
    ### Results:
         ASXL1 -> 77.54983 
         IDH1 -> 78.71551 
         IDH2 -> 69.99559 
         SETBP1 -> 46.4559 
         TET2 -> 74.81894 
    ### Computing oncoscore for timepoint 2014/03/01 
    ### Processing data
    ### Computing frequencies scores 
    ### Estimating oncogenes
    ### Results:
         ASXL1 -> 78.28121 
         IDH1 -> 81.08963 
         IDH2 -> 73.50788 
         SETBP1 -> 64.97398 
         TET2 -> 71.31087 
    ### Computing oncoscore for timepoint 2015/03/01 
    ### Processing data
    ### Computing frequencies scores 
    ### Estimating oncogenes
    ### Results:
         ASXL1 -> 78.84769 
         IDH1 -> 82.99363 
         IDH2 -> 75.60886 
         SETBP1 -> 64.48853 
         TET2 -> 70.46695 
    ### Computing oncoscore for timepoint 2016/03/01 
    ### Processing data
    ### Computing frequencies scores 
    ### Estimating oncogenes
    ### Results:
         ASXL1 -> 79.37202 
         IDH1 -> 83.9881 
         IDH2 -> 76.89328 
         SETBP1 -> 64.60591 
         TET2 -> 70.53378 
    
    4.2.3 Visualization
    ## Oncogenic potential of the considered genes
    plot.oncoscore(result, col = 'darkblue')
    
    ## Absolute values of the oncogenic potential of the considered genes over time
    plot.oncoscore.timeseries(result.timeseries)
    
    ## Variations of the oncogenic potential of the considered genes over time
    plot.oncoscore.timeseries(result.timeseries,
                              incremental = TRUE,
                              ylab = 'absolute variation')
    
    ## Variations, as relative values, of the oncogenic potential of the considered genes over time
    plot.oncoscore.timeseries(result.timeseries,
                              incremental = TRUE,
                              relative = TRUE,
                              ylab = 'relative variation')
    
    [Figures: plot1.png, plot2.png, plot3.png, plot4.png]
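
Since the figures are not reproduced here, the variations can also be inspected numerically. The base R sketch below copies the scores printed in section 4.2.2 into a matrix and computes the change between consecutive time points. The relative variation is defined here as the change divided by the previous score. That definition is an assumption for illustration, and the package may normalise differently when `relative = TRUE`.

```r
time_points <- c("2012/03/01", "2013/03/01", "2014/03/01",
                 "2015/03/01", "2016/03/01")

# Scores copied by hand from the compute.oncoscore.timeseries() output.
scores <- rbind(
  ASXL1  = c(79.14893, 77.54983, 78.28121, 78.84769, 79.37202),
  IDH1   = c(74.27776, 78.71551, 81.08963, 82.99363, 83.98810),
  IDH2   = c(64.27063, 69.99559, 73.50788, 75.60886, 76.89328),
  SETBP1 = c(34.94850, 46.45590, 64.97398, 64.48853, 64.60591),
  TET2   = c(76.29579, 74.81894, 71.31087, 70.46695, 70.53378)
)
colnames(scores) <- time_points

# Absolute variation: difference between consecutive time points, per gene.
absolute_variation <- t(apply(scores, 1, diff))

# Relative variation: change divided by the previous score
# (an assumed definition, see the note above).
relative_variation <- absolute_variation / scores[, -ncol(scores)]

print(round(absolute_variation, 2))
print(round(relative_variation, 3))
```

In these numbers SETBP1 stands out: its score rises from about 35 to about 65 between 2012 and 2014 and then levels off, while TET2 drifts slightly downward over the same period.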

