GENESPACE: an R package for genome synteny visualization

Author: 小明的数据分析笔记本 | Published 2022-03-20 22:58

    Paper

    GENESPACE: syntenic pan-genome annotations for eukaryotes

    https://www.biorxiv.org/content/10.1101/2022.03.09.483468v1

    Not yet formally published (bioRxiv preprint)

    GitHub page

    https://github.com/jtlovell/GENESPACE

    Detailed overview

    https://htmlpreview.github.io/?https://github.com/jtlovell/GENESPACE/blob/master/doc/genespaceOverview.html

    Windows is not supported yet. The package only runs on macOS or Linux, so I am trying it on Linux.

    First, install OrthoFinder

    conda install -c bioconda orthofinder 
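
Before moving on, it can help to confirm that the programs are actually visible from the shell. The sketch below is not part of GENESPACE: `check_tools` is a hypothetical helper name, and it assumes that `orthofinder` and `diamond` (the names passed to `path2orthofinder` and `path2diamond` later in this post) should be on `PATH` in the active conda environment.

```shell
# Report whether each named command can be found on PATH.
# Returns non-zero if any of them is missing.
# (check_tools is a hypothetical helper, not a GENESPACE or conda command.)
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example: the two programs that the GENESPACE call refers to by name.
# The '|| true' keeps a script going even when something is missing.
check_tools orthofinder diamond || true
```

If either program is reported as missing, the usual first thing to check is whether the conda environment that OrthoFinder was installed into is activated.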
    

    Install MCScanX

    https://github.com/wyp1125/MCScanX

    git clone https://github.com/wyp1125/MCScanX.git
    cd MCScanX
    make
    

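    The build reported three errors, but three executables were still produced. A quick test showed that they run; I do not know yet whether the errors will cause problems later.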
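
Since the build finished with errors, a small check that the binaries exist and are executable can be reassuring. This is only a sketch: `check_mcscanx` is a hypothetical helper, and the three program names are an assumption taken from the MCScanX README rather than from this post, so adjust them if your build differs.

```shell
# Check that the expected MCScanX programs exist in a directory and are executable.
# The three names are assumed from the MCScanX README.
# (check_mcscanx is a hypothetical helper, not part of MCScanX.)
check_mcscanx() {
    local dir="$1"
    local failed=0
    local exe
    for exe in MCScanX MCScanX_h duplicate_gene_classifier; do
        if [ -x "$dir/$exe" ]; then
            echo "ok:      $dir/$exe"
        else
            echo "missing: $dir/$exe"
            failed=1
        fi
    done
    return "$failed"
}

# Example (hypothetical location: the cloned repository in the current directory).
check_mcscanx ./MCScanX || echo "some MCScanX programs were not built"
```

The directory that passes this check is the one to give to `path2mcscanx` when calling `init_genespace`.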


    Install the required R packages

    conda install r-data.table r-dbscan r-R.utils r-devtools
    conda install bioconductor-Biostrings bioconductor-rtracklayer
    

    Install GENESPACE

    # Start R (here via radian)
    devtools::install_github("jtlovell/GENESPACE", upgrade = FALSE)
    

    Run the example data

    library(GENESPACE)
    runwd <- file.path("./testGenespace/")
    make_exampleDataDir(writeDir = runwd) ## this step downloads the example data
    
    gids <- c("human", "chimp", "rhesus")
    gpar <- init_genespace(
      genomeIDs = gids,
      speciesIDs = gids,
      versionIDs = gids,
      ploidy = rep(1, 3),
      wd = runwd,
      gffString = "gff",
      pepString = "pep",
      path2orthofinder = "orthofinder",
      path2mcscanx = "/home/myan/scratch/apps/mingyan/Biotools/MCScanX",
      path2diamond = "diamond",
      diamondMode = "fast",
      orthofinderMethod = "fast",
      rawGenomeDir = file.path(runwd, "rawGenomes")
    )
    
    parse_annotations(
      gsParam = gpar,
      gffEntryType = "gene",
      gffIdColumn = "locus",
      gffStripText = "locus=",
      headerEntryIndex = 1,
      headerSep = " ",
      headerStripText = "locus="
    )
    # I did not fully understand what the call above does
    
    gpar <- run_orthofinder(gsParam = gpar)
    
    ## Running this line produced a warning:
    # Warning message:
    # In system2(gsParam$paths$orthofinderCall, com, stdout = TRUE, stderr = TRUE) :
    #   running command ''orthofinder' -b ./testGenespace//orthofinder -t 4 -a 1 -X -og 2>&1' had status 120 and error message 'Interrupted system call'
    ## I am not sure whether this affects later steps. One possible cause is the extra trailing slash in runwd <- file.path("./testGenespace/"). Rerunning the step completed without the warning.
    
    gpar <- synteny(gsParam = gpar)
    
    ## Plot the results
    
    pdf(file = "abc.pdf", width = 10, height = 8)
    plot_riparianHits(gpar)
    dev.off()
    

    More plotting parameters

    pdf(file = "abc.pdf", width = 9.6, height = 4)
    plot_riparianHits(
      gpar,
      refGenome = "chimp",
      invertTheseChrs = data.frame(genome = "rhesus", chr = 2),
      genomeIDs = c("chimp", "human", "rhesus"),
      labelTheseGenomes = c("chimp", "rhesus"),
      gapProp = .001,
      refChrCols = c("#BC4F43", "#F67243"),
      blackBg = FALSE,
      returnSourceData = TRUE,
      verbose = FALSE
    )
    dev.off()
    

    You can also restrict the plot to regions of interest

    regs <- data.frame(
      genome = c("human", "human", "chimp", "rhesus"),
      chr = c(3, 3, 4, 5),
      start = c(0, 50e6, 0, 60e6),
      end = c(10e6, 70e6, 50e6, 90e6),
      cols = c("pink", "gold", "cyan", "dodgerblue")
    )
    pdf(file = "abc2.pdf", width = 9.6, height = 4)
    plot_riparianHits(gpar, onlyTheseRegions = regs, blackBg = FALSE)
    dev.off()
    
    

    Build the pan-genome

    pg <- pangenome(gpar)
    

    This writes a file, results/human_pangenomeDB.txt.gz

    I opened this file to look at the first part of the result.

    For now I have not worked out how to read this result.
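
A first step toward making sense of the file is simply to list its columns and look at a few rows from the shell. The sketch below assumes the file is gzip-compressed, tab-delimited text with a header line, which I have not verified against the GENESPACE source; `peek_gz` is a hypothetical helper name.

```shell
# Print the column names (one per line) and the first few data rows of a
# gzip-compressed text file. Assumes a tab-delimited file with a header line.
# (peek_gz is a hypothetical helper, not part of GENESPACE.)
peek_gz() {
    local file="$1"
    local rows="${2:-5}"
    echo "== columns =="
    gzip -cd "$file" | head -n 1 | tr '\t' '\n'
    echo "== first $rows rows =="
    gzip -cd "$file" | head -n "$(( $rows + 1 ))" | tail -n +2
}

# Example: the path mentioned above, relative to the GENESPACE working directory.
if [ -f results/human_pangenomeDB.txt.gz ]; then
    peek_gz results/human_pangenomeDB.txt.gz 5
fi
```

Seeing the column names on separate lines makes it easier to match them against the descriptions in the GENESPACE overview document.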

    The help documentation says:

    This is the source data that can be manipulated programatically to extract your regions of interest. Future GENESPACE releases will have auxilary functions that let the user access the pan-genome by rules (e.g. contains these genes, in these regions etc.). For now, we’ll leave this work to scripting by the user.

    The next step is to work out how to prepare my own data.

