qiime2R: bridging QIIME 2 and R to visualize and analyze 16S data

Author: 无言_俗人 | Published 2022-03-06 20:06

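    Background: A QIIME 2 artifact stores QIIME 2's inputs and outputs together with their associated metadata, and records how each result was produced. However, the artifacts QIIME 2 generates (for example .qza files, even though a .qza is just a compressed archive) cannot be used directly as input in R. They first have to go through a series of conversions into files R can read. The qiime2R package simplifies this path from QIIME 2 artifacts to R inputs while preserving as much of the artifacts' information as possible, mainly through the read_qza function.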

    Principle: The artifact is unpacked into a temporary directory, and the raw data and associated metadata are read into a named list (see below). Data are typically returned as a data.frame, a phylo object (trees), or a DNAStringSet (nucleic acid sequences).

    2. Installing qiime2R

    Install from GitHub:

    if (!requireNamespace("devtools", quietly = TRUE)){install.packages("devtools")}
    devtools::install_github("jbisanz/qiime2R")
    

    3. Reading artifacts (.qza)

    Artifacts are read with read_qza(), which takes the path to a .qza file. For example:

    SVs<-read_qza("table.qza")
    names(SVs)
    [1] "uuid"       "type"       "format"     "contents"   "version"   
    [6] "data"       "provenance"
    
    SVs$data[1:5,1:5] # show the first 5 features (rows) and the first 5 samples (columns)
    #                                 L1S105 L1S140 L1S208 L1S257 L1S281
    #4b5eeb300368260019c1fbc7a3c718fc   2183      0      0      0      0
    #fe30ff0f71a38a39cf1717ec2be3a2fc      5      0      0      0      0
    #d29fe3c70564fc0f69f2c03e0d1e5561      0      0      0      0      0
    #868528ca947bc57b69ffdf83e6b73bae      0   2249   2117   1191   1737
    #154709e160e8cada6bfb21115acc80f5    802   1174    694    406    242
    

    data: the raw data, e.g. an OTU table as a matrix or a tree in phylo format
    uuid: the unique identifier of the artifact
    type: the semantic type of the object (e.g. FeatureData[Sequence])
    format: the format of the QIIME 2 artifact
    provenance: information tracking how the object was created
    contents: a table of all files contained in the artifact and their file sizes
    version: the version reported for the artifact; a warning may be issued if a newer version is encountered
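
    For a feature table, $data is an ordinary matrix with features in rows and samples in columns, so base R can work on it directly. The sketch below types in the 5 x 5 corner printed above by hand, so it runs without any QIIME 2 files. It computes per-sample read depth, per-feature prevalence, and column-wise proportions, which is roughly the kind of transform make_proportion() is described as doing. The object names are illustrative; in real use SVs$data would take the place of the hand-typed counts.

```r
# Hand-typed copy of the 5 x 5 corner shown above (features x samples)
counts <- matrix(
  c(2183,    0,    0,    0,    0,
       5,    0,    0,    0,    0,
       0,    0,    0,    0,    0,
       0, 2249, 2117, 1191, 1737,
     802, 1174,  694,  406,  242),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    c("4b5eeb300368260019c1fbc7a3c718fc", "fe30ff0f71a38a39cf1717ec2be3a2fc",
      "d29fe3c70564fc0f69f2c03e0d1e5561", "868528ca947bc57b69ffdf83e6b73bae",
      "154709e160e8cada6bfb21115acc80f5"),
    c("L1S105", "L1S140", "L1S208", "L1S257", "L1S281")))

depth <- colSums(counts)              # reads per sample (within this corner only)
prevalence <- rowSums(counts > 0)     # number of samples each feature occurs in
prop <- sweep(counts, 2, depth, "/")  # divide each column by its total: columns sum to 1
depth
prevalence
```

    These depths cover only the five features shown, not the whole table. A sample with zero reads would give NaN proportions and should be dropped before dividing.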

    4. Reading metadata

    Metadata files are read with read_q2metadata():

    metadata<-read_q2metadata("sample-metadata.tsv")
    head(metadata) # show top lines of metadata
    #  SampleID barcode-sequence body-site year month day   subject reported-antibiotic-usage days-since-experiment-start
    #2     L1S8     AGCTGACTAGTC       gut 2008    10  28 subject-1                       Yes                           0
    #3    L1S57     ACACACTATGGC       gut 2009     1  20 subject-1                        No                          84
    #4    L1S76     ACTACGTGTGGT       gut 2009     2  17 subject-1                        No                         112
    #5   L1S105     AGTGCGATGCGT       gut 2009     3  17 subject-1                        No                         140
    #6   L2S155     ACGATGCGACCA left palm 2009     1  20 subject-1                        No                          84
    #7   L2S175     AGCTATCCACGA left palm 2009     2  17 subject-1                        No                         112
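
    QIIME 2 metadata files can carry a second header line starting with #q2:types, which declares each column as categorical or numeric. To illustrate why that line matters, here is a base-R sketch that reads such a file and applies the declared types. read_metadata_sketch() is a hypothetical helper, not qiime2R's implementation. The three-row file is toy data modeled on the output above.

```r
# Hypothetical helper (not qiime2R's code): read a QIIME 2-style metadata TSV
# and, if line 2 is a "#q2:types" directive, apply the declared column types.
read_metadata_sketch <- function(path) {
  lines <- readLines(path)
  has_types <- length(lines) >= 2 && startsWith(lines[2], "#q2:types")
  body <- if (has_types) lines[-2] else lines
  # Read every column as character first so no type is guessed silently
  df <- read.delim(text = paste(body, collapse = "\n"),
                   check.names = FALSE, colClasses = "character")
  if (has_types) {
    # Field 1 is the "#q2:types" tag itself; fields 2..n describe columns 2..n
    types <- strsplit(lines[2], "\t", fixed = TRUE)[[1]][-1]
    for (i in seq_along(types)) {
      if (types[i] == "numeric") df[[i + 1]] <- as.numeric(df[[i + 1]])
    }
  }
  df
}

# Toy three-row file modeled on the moving-pictures metadata
path <- tempfile(fileext = ".tsv")
writeLines(c(
  "sample-id\tbody-site\tdays-since-experiment-start",
  "#q2:types\tcategorical\tnumeric",
  "L1S8\tgut\t0",
  "L1S57\tgut\t84",
  "L2S155\tleft palm\t84"
), path)

md <- read_metadata_sketch(path)
str(md)
```

    The ID column header is taken as-is. A file using another accepted header such as #SampleID would simply keep that name.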
    

    5. Reading taxonomy

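    When read_qza() reads a taxonomy artifact, it returns the feature IDs, the unsplit taxonomy strings, and the confidence scores. Downstream analyses, however, need the annotation split into separate ranks (kingdom, phylum, class, order, family, genus, species), which is what parse_taxonomy() does.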

    taxonomy<-read_qza("taxonomy.qza")
    head(taxonomy$data)
    #                        Feature.ID                                                                                                                            Taxon Confidence
    #1 4b5eeb300368260019c1fbc7a3c718fc                          k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__  0.9972511
    #2 fe30ff0f71a38a39cf1717ec2be3a2fc                           k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__Neisseria  0.9799427
    #3 d29fe3c70564fc0f69f2c03e0d1e5561                                k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__Streptococcaceae; g__Streptococcus  1.0000000
    #4 868528ca947bc57b69ffdf83e6b73bae                          k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__  0.9955859
    #5 154709e160e8cada6bfb21115acc80f5                               k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides  1.0000000
    #6 1d2e5f3444ca750c85302ceee2473331 k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae  0.9455365
    taxonomy<-parse_taxonomy(taxonomy$data)
    head(taxonomy)
    #                                  Kingdom         Phylum               Class           Order           Family         Genus        Species
    #4b5eeb300368260019c1fbc7a3c718fc Bacteria  Bacteroidetes         Bacteroidia   Bacteroidales   Bacteroidaceae   Bacteroides           <NA>
    #fe30ff0f71a38a39cf1717ec2be3a2fc Bacteria Proteobacteria  Betaproteobacteria    Neisseriales    Neisseriaceae     Neisseria           <NA>
    #d29fe3c70564fc0f69f2c03e0d1e5561 Bacteria     Firmicutes             Bacilli Lactobacillales Streptococcaceae Streptococcus           <NA>
    #868528ca947bc57b69ffdf83e6b73bae Bacteria  Bacteroidetes         Bacteroidia   Bacteroidales   Bacteroidaceae   Bacteroides           <NA>
    #154709e160e8cada6bfb21115acc80f5 Bacteria  Bacteroidetes         Bacteroidia   Bacteroidales   Bacteroidaceae   Bacteroides           <NA>
    #1d2e5f3444ca750c85302ceee2473331 Bacteria Proteobacteria Gammaproteobacteria  Pasteurellales  Pasteurellaceae   Haemophilus parainfluenzae
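
    The splitting step can be sketched in base R by turning Greengenes-style strings (k__...; p__...; ... s__...) into one column per rank. split_taxonomy_sketch() is a hypothetical helper written for illustration, not code taken from parse_taxonomy(). It assumes ";"-separated ranks with one-letter prefixes; databases using other prefixes would need a different pattern.

```r
# Hypothetical helper: split "k__...; p__...; ..." strings into rank columns.
# Assumes one-letter rank prefixes (Greengenes style); other databases may differ.
split_taxonomy_sketch <- function(taxon, ids,
                                  ranks = c("Kingdom", "Phylum", "Class", "Order",
                                            "Family", "Genus", "Species")) {
  parts <- strsplit(taxon, ";", fixed = TRUE)
  rows <- lapply(parts, function(p) {
    p <- trimws(p)                # drop the space after each ";"
    p <- sub("^[a-z]__", "", p)   # strip the rank prefix, e.g. "g__"
    p[p == ""] <- NA              # an empty rank such as "s__" becomes NA
    length(p) <- length(ranks)    # pad missing trailing ranks with NA
    p
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  colnames(out) <- ranks
  rownames(out) <- ids
  out
}

tax <- split_taxonomy_sketch(
  c("k__Bacteria; p__Bacteroidetes; c__Bacteroidia; o__Bacteroidales; f__Bacteroidaceae; g__Bacteroides; s__",
    "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria; o__Neisseriales; f__Neisseriaceae; g__Neisseria",
    "k__Bacteria; p__Proteobacteria; c__Gammaproteobacteria; o__Pasteurellales; f__Pasteurellaceae; g__Haemophilus; s__parainfluenzae"),
  ids = c("4b5eeb300368260019c1fbc7a3c718fc",
          "fe30ff0f71a38a39cf1717ec2be3a2fc",
          "1d2e5f3444ca750c85302ceee2473331"))
tax
```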
    

    6. Creating a phyloseq object

    qza_to_phyloseq() combines several read_qza() calls to build a single phyloseq object for downstream analysis:

    physeq<-qza_to_phyloseq(
        features="inst/artifacts/2020.2_moving-pictures/table.qza",
        tree="inst/artifacts/2020.2_moving-pictures/rooted-tree.qza",
        taxonomy="inst/artifacts/2020.2_moving-pictures/taxonomy.qza",
        metadata = "inst/artifacts/2020.2_moving-pictures/sample-metadata.tsv"
        )
    physeq
    ## phyloseq-class experiment-level object
    ## otu_table()   OTU Table:         [ 759 taxa and 34 samples ]
    ## sample_data() Sample Data:       [ 34 samples by 10 sample variables ]
    ## tax_table()   Taxonomy Table:    [ 759 taxa by 7 taxonomic ranks ]
    ## phy_tree()    Phylogenetic Tree: [ 759 tips and 757 internal nodes ]
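
    Building one object means the inputs have to line up by ID. Feature IDs must match across table rows, taxonomy rows and tree tips, and sample IDs across table columns and metadata rows. If the summary above reports fewer taxa or samples than expected, mismatched IDs are one thing worth checking. The sketch below is a hypothetical base-R check of that overlap on toy objects. check_ids_sketch() and the SampleID column name are assumptions, and the check does not replicate what qza_to_phyloseq() does internally.

```r
# Hypothetical helper: list IDs that do not line up between components.
check_ids_sketch <- function(counts, taxonomy, metadata, id_col = "SampleID") {
  list(
    samples_without_metadata  = setdiff(colnames(counts), metadata[[id_col]]),
    metadata_without_counts   = setdiff(metadata[[id_col]], colnames(counts)),
    features_without_taxonomy = setdiff(rownames(counts), rownames(taxonomy))
  )
}

# Toy inputs: 2 features x 3 samples, metadata for 3 samples, taxonomy for 1 feature
counts <- matrix(c(10, 0, 5, 3, 0, 7), nrow = 2,
                 dimnames = list(c("4b5eeb300368260019c1fbc7a3c718fc",
                                   "fe30ff0f71a38a39cf1717ec2be3a2fc"),
                                 c("L1S8", "L1S57", "L1S76")))
metadata <- data.frame(SampleID = c("L1S8", "L1S57", "L2S155"),
                       stringsAsFactors = FALSE)
taxonomy <- data.frame(Genus = "Bacteroides",
                       row.names = "4b5eeb300368260019c1fbc7a3c718fc")

res <- check_ids_sketch(counts, taxonomy, metadata)
res
```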
    

    7. Other functions

    • read_qza() - Function for reading artifacts (.qza).
    • qza_to_phyloseq() - Imports multiple artifacts to produce a phyloseq object.
    • read_q2metadata() - Reads a qiime2 metadata file containing the q2-types definition line (the second line of the metadata file must declare which columns are categorical (character) and which are numeric).
    • write_q2manifest() - Writes a read manifest file to import data into qiime2
    • theme_q2r() - A ggplot2 theme for clean figures.
    • print_provenance() - A function to display provenance information, i.e. the steps by which the data were produced.
    • is_q2metadata() - A function to check if a file is a qiime2 metadata file.
    • parse_taxonomy() - A function to parse taxonomy strings and return a table where each column is a taxonomic class.
    • parse_ordination() - A function to parse the internal ordination format.
    • read_q2biom() - A function for reading QIIME2 biom files in format v2.1
    • make_clr() - Transform feature table using centered log2 ratio.
    • make_proportion() - Transform feature table to proportion (sum to 1).
    • make_percent() - Transform feature table to percent (sum to 100).
    • interactive_table() - Create an interactive table in RStudio viewer or rmarkdown html.
    • summarize_taxa() - Create a list of tables with abundances summed to each taxonomic level.
    • taxa_barplot() - Create a stacked barplot using ggplot2.
    • taxa_heatmap() - Create a heatmap of taxonomic abundances using ggplot2.
    • corner() - Show top corner of a large table-like object.
    • min_nonzero() - Find the smallest non-zero, non-NA in a numeric vector.
    • mean_sd() - Return mean and standard deviation for plotting.
    • subsample_table() - Subsample a table with or without replacement.
    • filter_features() - Remove low abundance features by number of counts and number of samples they appear in.
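
    Several of the helpers listed above are simple transforms of the feature table. Below are base-R sketches on a toy 3 x 3 matrix, as a rough guide to what make_clr(), min_nonzero(), filter_features() and subsample_table() are described as doing. All function and argument names (clr_sketch(), min_total, min_samples, and so on) are hypothetical. The pseudocount of 0.5 is an assumption. None of this is qiime2R's own code, whose defaults may differ.

```r
# Hypothetical sketches of common feature-table transforms (features x samples).

# Centered log-ratio per sample: log2(counts + pseudocount) minus the column mean.
# The pseudocount avoids log(0); 0.5 is an assumed value, not a qiime2R default.
clr_sketch <- function(counts, pseudocount = 0.5) {
  logged <- log2(counts + pseudocount)
  sweep(logged, 2, colMeans(logged), "-")
}

# Smallest value that is neither zero nor NA
min_nonzero_sketch <- function(x) min(x[!is.na(x) & x != 0])

# Keep features with >= min_total reads overall that occur in >= min_samples samples
filter_sketch <- function(counts, min_total, min_samples) {
  keep <- rowSums(counts) >= min_total & rowSums(counts > 0) >= min_samples
  counts[keep, , drop = FALSE]
}

# Subsample every sample to `depth` reads (without replacement by default)
rarefy_sketch <- function(counts, depth, replace = FALSE) {
  out <- vapply(seq_len(ncol(counts)), function(j) {
    reads <- rep(seq_len(nrow(counts)), times = counts[, j])  # one entry per read
    drawn <- reads[sample.int(length(reads), depth, replace = replace)]
    tabulate(drawn, nbins = nrow(counts))                     # reads back to counts
  }, integer(nrow(counts)))
  matrix(out, nrow = nrow(counts), dimnames = dimnames(counts))
}

# Toy table: 3 features x 3 samples
counts <- matrix(c(10, 0, 5,   3, 0, 7,   1, 1, 0), nrow = 3,
                 dimnames = list(c("f1", "f2", "f3"), c("s1", "s2", "s3")))

clr <- clr_sketch(counts)
filtered <- filter_sketch(counts, min_total = 5, min_samples = 2)
set.seed(1)  # rarefying is random; fix the seed for reproducibility
rarefied <- rarefy_sketch(counts, depth = 2)
```

    Without replacement, sample.int() stops with an error if a sample has fewer reads than the requested depth, so such samples need to be removed before rarefying.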

