gget: A Powerful Tool for Efficiently Querying Genomic Reference Databases

Author: chSNP | Published: 2023-01-05 12:36

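    The open-source Python and command-line program gget enables efficient and easy programmatic access to information stored in a variety of large, public genomic reference databases. gget works alongside existing tools that fetch user-generated sequencing data, replacing the inefficient and potentially error-prone manual web queries performed during genomic data analysis. Although the gget modules were inspired by tedious single-cell RNA-seq data analysis tasks, we expect them to be useful for a wide range of bioinformatics tasks.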


    The gget article

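    gget can be installed from the command line by running "pip install gget". The figure below shows one use case and the corresponding output for each gget tool. Each gget tool comes with an extensive manual, available as function documentation in a Python environment or as standard output via the help flag [-h] on the command line.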

    [Figure: gget_overview]
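The manuals mentioned above are ordinary Python docstrings, so they can be read programmatically as well as through `help()`. The sketch below is a minimal illustration, not part of gget: `tool_doc` is a hypothetical helper name, and it is demonstrated on the standard library's `json.dumps` so that it runs even where gget is not installed.

```python
import importlib
import inspect


def tool_doc(module_name: str, func_name: str) -> str:
    """Return the docstring of module_name.func_name, or '' if it has none."""
    module = importlib.import_module(module_name)
    func = getattr(module, func_name)
    return inspect.getdoc(func) or ""


# Demonstrated on the standard library so the snippet runs without gget.
first_line = tool_doc("json", "dumps").splitlines()[0]
print(first_line)

# With gget installed, the same helper could be called as:
#   print(tool_doc("gget", "ref"))
```

On the command line, the equivalent is the help flag described above, for example `gget ref -h`.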

    gget links

    gget website: https://pachterlab.github.io/gget/
    gget examples repository: https://github.com/pachterlab/gget_examples

    Installing gget

    pip install --upgrade gget
    

    or

    conda install -c bioconda gget
    

    Importing gget in Jupyter Lab / Google Colab

    import gget
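
Before importing gget in a notebook, it can help to confirm that the package is present in the active environment. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not a gget function.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("gget")
if version is None:
    print("gget is not installed; run: pip install --upgrade gget")
else:
    print(f"gget {version} is installed")
```

This only reports what is installed in the current environment; it does not check whether a newer release exists.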
    

    gget modules

    gget quick start

    Command line:

    # Fetch all Homo sapiens reference and annotation FTPs from the latest Ensembl release
    $ gget ref homo_sapiens
    
    # Get Ensembl IDs of human genes with "ace2" or "angiotensin converting enzyme 2" in their name/description
    $ gget search -s homo_sapiens 'ace2' 'angiotensin converting enzyme 2'
    
    # Look up gene ENSG00000130234 (ACE2) and its transcript ENST00000252519
    $ gget info ENSG00000130234 ENST00000252519
    
    # Fetch the amino acid sequence of the canonical transcript of gene ENSG00000130234
    $ gget seq --translate ENSG00000130234
    
    # Quickly find the genomic location of (the start of) that amino acid sequence
    $ gget blat MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS
    
    # BLAST (the start of) that amino acid sequence
    $ gget blast MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS
    
    # Align nucleotide or amino acid sequences stored in a FASTA file
    $ gget muscle path/to/file.fa
    
    # Use Enrichr for an ontology analysis of a list of genes
    $ gget enrichr -db ontology ACE2 AGT AGTR1 ACE AGTRAP AGTR2 ACE3P
    
    # Get the human tissue expression of gene ACE2
    $ gget archs4 -w tissue ACE2
    
    # Get the protein structure (in PDB format) of ACE2 as stored in the Protein Data Bank (PDB ID returned by gget info)
    $ gget pdb 1R42 -o 1R42.pdb
    
    # Predict the protein structure of GFP from its amino acid sequence
    $ gget setup alphafold # setup only needs to be run once
    $ gget alphafold MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK
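
The commands above can also be driven from a Python script, for instance inside a larger pipeline. The sketch below assumes only that the `gget` executable is on the PATH; `build_gget_command` and `run_gget` are hypothetical helper names. Passing the arguments as a list avoids shell quoting problems with search terms that contain spaces.

```python
import shlex
import shutil
import subprocess
from typing import List


def build_gget_command(module: str, *args: str) -> List[str]:
    """Build an argument list for the gget CLI, one list item per argument."""
    return ["gget", module, *args]


def run_gget(cmd: List[str]) -> str:
    """Run a gget command and return its standard output as text."""
    if shutil.which("gget") is None:
        raise FileNotFoundError("gget executable not found on PATH")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


cmd = build_gget_command(
    "search", "-s", "homo_sapiens", "ace2", "angiotensin converting enzyme 2"
)
# shlex.join shows the equivalent, safely quoted shell command line.
print(shlex.join(cmd))

# With gget installed and network access available:
#   print(run_gget(cmd))
```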
    

    Python (Jupyter Lab / Google Colab):

    import gget
    gget.ref("homo_sapiens")
    gget.search(["ace2", "angiotensin converting enzyme 2"], "homo_sapiens")
    gget.info(["ENSG00000130234", "ENST00000252519"])
    gget.seq("ENSG00000130234", translate=True)
    gget.blat("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
    gget.blast("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
    gget.muscle("path/to/file.fa")
    gget.enrichr(["ACE2", "AGT", "AGTR1", "ACE", "AGTRAP", "AGTR2", "ACE3P"], database="ontology", plot=True)
    gget.archs4("ACE2", which="tissue")
    gget.pdb("1R42", save=True)
    
    gget.setup("alphafold") # setup only needs to be run once
    gget.alphafold("MSKGEELFTGVVPILVELDGDVNGHKFSVSGEGEGDATYGKLTLKFICTTGKLPVPWPTLVTTFSYGVQCFSRYPDHMKQHDFFKSAMPEGYVQERTIFFKDDGNYKTRAEVKFEGDTLVNRIELKGIDFKEDGNILGHKLEYNYNSHNVYIMADKQKNGIKVNFKIRHNIEDGSVQLADHYQQNTPIGDGPVLLPDNHYLSTQSALSKDPNEKRDHMVLLEFVTAAGITHGMDELYK")
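
In the quick start, the start of the sequence fetched by `gget.seq` is passed on to `gget.blat` and `gget.blast`. The sketch below shows one way to automate that hand-off. It assumes the sequence result is available as a list of FASTA-style strings (a `>` header followed by sequence lines); that format is an assumption to verify against the gget manual. `fasta_lines_to_dict` and `sequence_start` are hypothetical helper names, and the example header is made up.

```python
from typing import Dict, List, Optional


def fasta_lines_to_dict(lines: List[str]) -> Dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def sequence_start(sequence: str, length: int = 40) -> str:
    """Return the first `length` residues of a sequence."""
    return sequence[:length]


# Hypothetical input in the assumed format: one header, then its sequence.
example = [
    ">example_id hypothetical header",
    "MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS",
]
records = fasta_lines_to_dict(example)
query = sequence_start(records["example_id hypothetical header"], 20)
print(query)
```

With gget installed, `example` could be replaced by the result of `gget.seq("ENSG00000130234", translate=True)` and `query` passed to `gget.blat` or `gget.blast`, provided the returned format matches the assumption above.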
    

    Call gget from R using reticulate:

    system("pip install gget")
    install.packages("reticulate")
    library(reticulate)
    gget <- import("gget")
    
    gget$ref("homo_sapiens")
    gget$search(list("ace2", "angiotensin converting enzyme 2"), "homo_sapiens")
    gget$info(list("ENSG00000130234", "ENST00000252519"))
    gget$seq("ENSG00000130234", translate=TRUE)
    gget$blat("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
    gget$blast("MSSSSWLLLSLVAVTAAQSTIEEQAKTFLDKFNHEAEDLFYQSSLAS")
    gget$muscle("path/to/file.fa", out="path/to/out.afa")
    gget$enrichr(list("ACE2", "AGT", "AGTR1", "ACE", "AGTRAP", "AGTR2", "ACE3P"), database="ontology")
    gget$archs4("ACE2", which="tissue")
    gget$pdb("1R42", save=TRUE)
    
