Microbial pangenomics with anvi'o

Author: 你猜我菜不菜 | Published: 2018-12-30 22:01 | Read 24 times

    1. Download the data

    (anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome$ wget https://ndownloader.figshare.com/files/11857577 -O Prochlorococcus_31_genomes.tar.gz
    --2018-12-18 14:14:36--  https://ndownloader.figshare.com/files/11857577
    Resolving ndownloader.figshare.com (ndownloader.figshare.com)... 34.240.49.185
    Connecting to ndownloader.figshare.com (ndownloader.figshare.com)|34.240.49.185|:443... connected.
    HTTP request sent, awaiting response... 302 Found
    Location: https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/11857577/Prochlorococcus_31_genomes.tar.gz [following]
    --2018-12-18 14:14:38--  https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/11857577/Prochlorococcus_31_genomes.tar.gz
    Resolving s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)... 52.218.36.170
    Connecting to s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)|52.218.36.170|:443... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 32657898 (31M) [binary/octet-stream]
    Saving to: ‘Prochlorococcus_31_genomes.tar.gz’
    
    Prochlorococcus_31_ 100%[===================>]  31.14M  51.8KB/s    in 13m 50s 
    
    2018-12-18 14:28:30 (38.4 KB/s) - ‘Prochlorococcus_31_genomes.tar.gz’ saved [32657898/32657898]
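
The transfer above was slow (about 14 minutes for 31 MB), so it can be worth confirming that the archive is complete before extracting it. The following is a minimal sketch and not part of the original workflow: it compares the file size with the `Length: 32657898` reported by wget and then asks tar to read the whole archive. The helper name `check_archive` is hypothetical.

```shell
#!/usr/bin/env bash
# check_archive FILE EXPECTED_BYTES
# Returns 0 if FILE has exactly EXPECTED_BYTES bytes and tar can read it as a
# gzip-compressed archive; prints a message to stderr and returns non-zero otherwise.
check_archive() {
    local file="$1" expected="$2" actual
    if [ ! -f "$file" ]; then
        echo "missing: $file" >&2
        return 1
    fi
    # wc may pad its output with spaces on some systems, so strip whitespace.
    actual=$(wc -c < "$file" | tr -d '[:space:]')
    if [ "$actual" != "$expected" ]; then
        echo "size mismatch: got $actual bytes, expected $expected" >&2
        return 1
    fi
    # Listing the archive forces tar to decompress and walk every member.
    tar -tzf "$file" > /dev/null
}
```

For the file in this post the call would be `check_archive Prochlorococcus_31_genomes.tar.gz 32657898 && echo OK`.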
    

    2. Extract the archive

    (anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome$ tar -zxvf Prochlorococcus_31_genomes.tar.gz
    (anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome$ cd Prochlorococcus_31_genomes/
    (anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ ls
    AS9601.db                          MIT9311.db
    CCMP1375.db                        MIT9312.db
    EQPAC1.db                          MIT9313.db
    external-genomes.txt               MIT9314.db
    fix_functional_occurence_table.py  MIT9321.db
    GP2.db                             MIT9322.db
    layer-additional-data.txt          MIT9401.db
    LG.db                              MIT9515.db
    MED4.db                            NATL1A.db
    MIT9107.db                         NATL2A.db
    MIT9116.db                         PAC1.db
    MIT9123.db                         pan-state.json
    MIT9201.db                         PROCHLORO-functions-collection.txt
    MIT9202.db                         PROCHLORO-manual-default-state.json
    MIT9211.db                         SB.db
    MIT9215.db                         SS2.db
    MIT9301.db                         SS35.db
    MIT9302.db                         SS51.db
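
`external-genomes.txt` is the file that later tells `anvi-gen-genomes-storage` where each contigs database lives. A quick sanity check, sketched below, is to confirm that every path it lists exists. This assumes the file is tab-delimited with a header row and the database path in the second column (the `name` / `contigs_db_path` layout that anvi'o documents for external genomes). The function name `list_missing_dbs` is made up for illustration.

```shell
#!/usr/bin/env bash
# list_missing_dbs FILE
# Prints every database path from FILE (column 2, header skipped) that does
# not exist on disk. Relative paths are resolved against FILE's directory,
# which matches the layout here, where everything sits in one folder.
list_missing_dbs() {
    local file="$1" dir tab cr name path _rest
    dir=$(dirname "$file")
    tab=$(printf '\t')
    cr=$(printf '\r')
    # tail -n +2 skips the header; the "|| [ -n ... ]" part keeps a final
    # line that has no trailing newline.
    tail -n +2 "$file" | while IFS="$tab" read -r name path _rest || [ -n "$name" ]; do
        path=${path%"$cr"}          # tolerate Windows line endings
        [ -z "$path" ] && continue  # skip blank or one-column lines
        case "$path" in
            /*) ;;                  # absolute path: use as-is
            *)  path="$dir/$path" ;;
        esac
        [ -e "$path" ] || echo "$path"
    done
}
```

Running `list_missing_dbs external-genomes.txt` in the extracted directory prints nothing when every listed database is present.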
    

    3. Build the genomes storage database

    (anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ anvi-migrate-db *.db
    (anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ ls
    AS9601.db                          MIT9311.db
    CCMP1375.db                        MIT9312.db
    EQPAC1.db                          MIT9313.db
    external-genomes.txt               MIT9314.db
    fix_functional_occurence_table.py  MIT9321.db
    GP2.db                             MIT9322.db
    layer-additional-data.txt          MIT9401.db
    LG.db                              MIT9515.db
    MED4.db                            NATL1A.db
    MIT9107.db                         NATL2A.db
    MIT9116.db                         PAC1.db
    MIT9123.db                         pan-state.json
    MIT9201.db                         PROCHLORO-functions-collection.txt
    MIT9202.db                         PROCHLORO-manual-default-state.json
    MIT9211.db                         SB.db
    MIT9215.db                         SS2.db
    MIT9301.db                         SS35.db
    MIT9302.db                         SS51.db
    (anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ anvi-gen-genomes-storage -e external-genomes.txt -o PROCHLORO-GENOMES.db
    
    WARNING
    ===============================================
    Good news! Anvi'o found all these functions that are common to all of your
    genomes and will use them for downstream analyses and is very proud of you:
    'COG_CATEGORY, COG_FUNCTION'.
    
    Internal genomes .............................: 0 have been initialized.        
    External genomes .............................: 31 found.                       
    
    PLEASE READ CAREFULLY
    ===============================================
    Some of your genomes had gene calls identified by gene callers other than the
    gene caller anvi'o used, which was set to 'prodigal' either by default, or
    because you asked for it. The following genomes contained genes that were not
    processed (this may be exactly what you expect to happen, but if was not, you
    may need to use the `--gene-caller` flag to make sure anvi'o is using the gene
    caller it should be using): AS9601 (2 gene calls by "Ribosomal_RNAs"), CCMP1375
    (2 gene calls by "Ribosomal_RNAs"), EQPAC1 (2 gene calls by "Ribosomal_RNAs"),
    GP2 (2 gene calls by "Ribosomal_RNAs"), LG (2 gene calls by "Ribosomal_RNAs"),
    MED4 (2 gene calls by "Ribosomal_RNAs"), MIT9107 (2 gene calls by
    "Ribosomal_RNAs"), MIT9116 (2 gene calls by "Ribosomal_RNAs"), MIT9123 (2 gene
    calls by "Ribosomal_RNAs"), MIT9201 (2 gene calls by "Ribosomal_RNAs"), MIT9202
    (2 gene calls by "Ribosomal_RNAs"), MIT9211 (2 gene calls by "Ribosomal_RNAs"),
    MIT9215 (2 gene calls by "Ribosomal_RNAs"), MIT9301 (2 gene calls by
    "Ribosomal_RNAs"), MIT9302 (2 gene calls by "Ribosomal_RNAs"), MIT9303 (4 gene
    calls by "Ribosomal_RNAs"), MIT9311 (2 gene calls by "Ribosomal_RNAs"), MIT9312
    (2 gene calls by "Ribosomal_RNAs"), MIT9313 (4 gene calls by "Ribosomal_RNAs"),
    MIT9314 (2 gene calls by "Ribosomal_RNAs"), MIT9321 (2 gene calls by
    "Ribosomal_RNAs"), MIT9322 (2 gene calls by "Ribosomal_RNAs"), MIT9401 (2 gene
    calls by "Ribosomal_RNAs"), MIT9515 (2 gene calls by "Ribosomal_RNAs"), NATL1A
    (2 gene calls by "Ribosomal_RNAs"), NATL2A (2 gene calls by "Ribosomal_RNAs"),
    PAC1 (2 gene calls by "Ribosomal_RNAs"), SB (2 gene calls by "Ribosomal_RNAs"),
    SS2 (2 gene calls by "Ribosomal_RNAs"), SS35 (2 gene calls by "Ribosomal_RNAs"),
    SS51 (2 gene calls by "Ribosomal_RNAs").
    
                                                                                    
    * AS9601 is stored with 1,869 genes (0 of which were partial)
    * CCMP1375 is stored with 1,826 genes (0 of which were partial)                 
    * EQPAC1 is stored with 1,892 genes (6 of which were partial)                   
    * GP2 is stored with 1,825 genes (22 of which were partial)                     
    * LG is stored with 1,840 genes (24 of which were partial)                      
    * MED4 is stored with 1,891 genes (0 of which were partial)                     
    * MIT9107 is stored with 1,924 genes (20 of which were partial)                 
    * MIT9116 is stored with 1,914 genes (40 of which were partial)                 
    * MIT9123 is stored with 1,931 genes (31 of which were partial)                 
    * MIT9201 is stored with 1,907 genes (38 of which were partial)                 
    * MIT9202 is stored with 1,918 genes (0 of which were partial)                  
    * MIT9211 is stored with 1,740 genes (0 of which were partial)                  
    * MIT9215 is stored with 1,951 genes (0 of which were partial)                  
    * MIT9301 is stored with 1,846 genes (0 of which were partial)                  
    * MIT9302 is stored with 1,957 genes (25 of which were partial)                 
    * MIT9303 is stored with 2,715 genes (0 of which were partial)                  
    * MIT9311 is stored with 1,921 genes (32 of which were partial)                 
    * MIT9312 is stored with 1,900 genes (0 of which were partial)                  
    * MIT9313 is stored with 2,556 genes (0 of which were partial)                  
    * MIT9314 is stored with 1,924 genes (26 of which were partial)                 
    * MIT9321 is stored with 1,884 genes (16 of which were partial)                 
    * MIT9322 is stored with 1,881 genes (17 of which were partial)                 
    * MIT9401 is stored with 1,893 genes (25 of which were partial)                 
    * MIT9515 is stored with 1,871 genes (0 of which were partial)                  
    * NATL1A is stored with 2,030 genes (0 of which were partial)                   
    * NATL2A is stored with 1,991 genes (1 of which were partial)                   
    * PAC1 is stored with 2,059 genes (10 of which were partial)                    
    * SB is stored with 1,855 genes (8 of which were partial)                       
    * SS2 is stored with 1,844 genes (33 of which were partial)                     
    * SS35 is stored with 1,835 genes (17 of which were partial)                    
    * SS51 is stored with 1,833 genes (18 of which were partial)                    
    
    The new genomes storage ......................: PROCHLORO-GENOMES.db (v6, signature: hash0cde9439)
    Number of genomes ............................: 31 (internal: 0, external: 31)
    Number of gene calls .........................: 60,223
    Number of partial gene calls .................: 409
    

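    4. Pangenome analysis

    # Cluster the genes into a pangenome. --use-ncbi-blast searches with NCBI blastp instead of DIAMOND, --minbit 0.5 sets the minbit cutoff for discarding weak hits, and --mcl-inflation 10 gives the fine-grained clustering suited to closely related genomes.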
    (anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ anvi-pan-genome -g PROCHLORO-GENOMES.db --project-name "Prochlorococcus_Pan" --output-dir PROCHLORO --num-threads 12 --minbit 0.5 --mcl-inflation 10 --use-ncbi-blast
    
    # Import additional information about the genomes (clade and light) as layer data
    (anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ anvi-import-misc-data layer-additional-data.txt -p PROCHLORO/Prochlorococcus_Pan-PAN.db --target-data-table layers
    
    New data for 'layers' in data group 'default'
    ===============================================
    Data key "clade" .............................: Predicted type: str
    Data key "light" .............................: Predicted type: str
    
    
    NEW DATA
    ===============================================
    Database .....................................: pan
    Data group ...................................: default
    Data table ...................................: layers
    
    (anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ anvi-import-state -p PROCHLORO/Prochlorococcus_Pan-PAN.db \
    >                   --state pan-state.json \
    >                   --name default
    (anvio) czh@czh-ubuntu:~/Desktop/add_disk/Anvio_work/31_pangenome/Prochlorococcus_31_genomes$ anvi-display-pan -g PROCHLORO-GENOMES.db -p PROCHLORO/Prochlorococcus_Pan-PAN.db
    Interactive mode .............................: pan                             
    Genomes storage .............................................: Initialized (storage hash: hash0cde9439)
    Num genomes in storage ......................................: 31
    Num genomes will be used ....................................: 31
    Pan DB ......................................................: Initialized: PROCHLORO/Prochlorococcus_Pan-PAN.db (v. 12)
    Gene cluster homogeneity estimates ..........................: Functional: [YES]; Geometric: [YES]
                                                                                    
    * Gene clusters are initialized for all 7383 gene clusters in the database.
    
                                                                                    
    * The server is now listening the port number "8080". When you are finished, press
    CTRL+C to terminate the server.
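
The steps above can be collected into one script so that the run is repeatable. This is a sketch under assumptions: it reuses only the commands and flags shown in the transcript, while the `run` wrapper, the `DRY_RUN` variable, and the function name `pangenome_workflow` are additions of this example and not anvi'o features. It does not handle re-runs over outputs that already exist.

```shell
#!/usr/bin/env bash
# Re-run the pangenome workflow from this post. Call pangenome_workflow from
# inside the extracted Prochlorococcus_31_genomes/ directory with the anvi'o
# environment active.

# run CMD [ARGS...]: print the command, then execute it unless DRY_RUN=1.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

pangenome_workflow() {
    local storage="PROCHLORO-GENOMES.db"
    local pan_db="PROCHLORO/Prochlorococcus_Pan-PAN.db"

    # Bring the downloaded contigs databases up to the installed anvi'o version.
    run anvi-migrate-db *.db || return 1

    # Collect all 31 external genomes into one genomes storage.
    run anvi-gen-genomes-storage -e external-genomes.txt -o "$storage" || return 1

    # Compute the pangenome with the same settings as in the post.
    run anvi-pan-genome -g "$storage" --project-name Prochlorococcus_Pan \
        --output-dir PROCHLORO --num-threads 12 --minbit 0.5 \
        --mcl-inflation 10 --use-ncbi-blast || return 1

    # Attach the per-genome layer data and the saved display state.
    run anvi-import-misc-data layer-additional-data.txt -p "$pan_db" \
        --target-data-table layers || return 1
    run anvi-import-state -p "$pan_db" --state pan-state.json --name default || return 1

    # Start the interactive display; stop it with CTRL+C.
    run anvi-display-pan -g "$storage" -p "$pan_db"
}
```

`DRY_RUN=1 pangenome_workflow` only prints the six commands, which is a cheap way to review them before starting the long-running steps with a plain `pangenome_workflow`.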
    


    The functions in the pangenome can be analyzed further from here; if you are interested, the anvi'o website covers how.
