TransDecoder, blast+, and hmmer installation & Pf

Author: vicLeo | Published: 2022-02-23 15:06

    Installing TransDecoder, blast+, and hmmer

    Install transdecoder inside the conda environment named python3:

    conda activate python3
    conda install -y -c bioconda/label/cf201901 transdecoder
    The download is fairly slow. Once it finishes, running TransDecoder.Predict directly fails with "command not found".
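Before digging further, it can help to confirm which TransDecoder entry points the current shell can actually see. The following is a minimal sketch, assuming the conda environment is already activated; `report_tool` is a hypothetical helper name, not part of TransDecoder.

```shell
#!/usr/bin/env bash
# Report whether a command is visible on PATH in the current shell.
report_tool() {
    local tool="$1" path
    if path="$(command -v "$tool")"; then
        echo "found: $tool -> $path"
    else
        echo "missing: $tool"
        return 1
    fi
}

# Check both TransDecoder entry points; a missing tool does not abort the loop.
for tool in TransDecoder.LongOrfs TransDecoder.Predict; do
    report_tool "$tool" || true
done
```

A "missing" line here usually means the environment is not active or the package did not install its scripts into the environment's bin directory.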
    
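    While checking the installation with whereis transdecoder, a warning appeared that the Perl URI::Escape module was not installed. Install it with CPAN:
    ### Start the Perl CPAN shell
    
    perl -MCPAN -e shell
    
    ### Install the Perl URI::Escape module:
    
    install URI::Escape
    
    ### Quit: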
    q
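The same check can be done without entering the interactive CPAN shell. This is a small sketch, assuming the `perl` on PATH is the one TransDecoder uses; `perl_module_available` is a hypothetical helper, and the suggested `cpan URI::Escape` command is the non-interactive counterpart of the steps above.

```shell
#!/usr/bin/env bash
# Return success if the given Perl module can be loaded by the perl on PATH.
perl_module_available() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if perl_module_available URI::Escape; then
    echo "URI::Escape is available"
else
    echo "URI::Escape is missing; install it with: cpan URI::Escape"
fi
```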
    

    Once the Perl module is installed, running TransDecoder.Predict again works:

    (python3) [u20111230014@cpu10 ~]$ TransDecoder.Predict
    ########################################################################################
    #             ______                 ___                  __
    #            /_  __/______ ____ ___ / _ \___ _______  ___/ /__ ____
    #             / / / __/ _ `/ _\(_-</ // / -_) __/ _ \/ _  / -_) __/
    #            /_/ /_/ \_,_/_//_/___/____/\__/\__/\___/\_,_/\__/_/   .Predict
    #
    ########################################################################################
    #
    #  Transdecoder.LongOrfs|http://transdecoder.github.io> - Transcriptome Protein Prediction
    #
    #
    #  Required:
    #
    #   -t <string>                            transcripts.fasta
    #
    #  Common options:
    #
    #
    #   --retain_long_orfs_mode <string>        'dynamic' or 'strict' (default: dynamic)
    #                                        In dynamic mode, sets range according to 1%FDR in random sequence of same GC content.
    #
    # 
    #   --retain_long_orfs_length <int>         under 'strict' mode, retain all ORFs found that are equal or longer than these many nucleotides even if no other evidence 
    #                                         marks it as coding (default: 1000000) so essentially turned off by default.)
    #
    #   --retain_pfam_hits <string>            domain table output file from running hmmscan to search Pfam (see transdecoder.github.io for info)     
    #                                        Any ORF with a pfam domain hit will be retained in the final output.
    # 
    #   --retain_blastp_hits <string>          blastp output in '-outfmt 6' format.
    #                                        Any ORF with a blast match will be retained in the final output.
    #
    #   --single_best_only                     Retain only the single best orf per transcript (prioritized by homology then orf length)
    #
    #   --output_dir | -O  <string>            output directory from the TransDecoder.LongOrfs step (default: basename( -t val ) + ".transdecoder_dir")
    #
    #   -G <string>                            genetic code (default: universal; see PerlDoc; options: Euplotes, Tetrahymena, Candida, Acetabularia, ...)
    #
    #   --no_refine_starts                     start refinement identifies potential start codons for 5' partial ORFs using a PWM, process on by default.
    #
    ##  Advanced options
    #
    #    -T <int>                            Top longest ORFs to train Markov Model (hexamer stats) (default: 500)
    #                                        Note, 10x this value are first selected for removing redundancies,
    #                                        and then this -T value of longest ORFs are selected from the non-redundant set.
    #  Genetic Codes
    #
    #
    #   --genetic_code <string>                Universal (default)
    #
    #        Genetic Codes (derived from: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi)
    #
    #
    Acetabularia
    Candida
    Ciliate
    Dasycladacean
    Euplotid
    Hexamita
    Mesodinium
    Mitochondrial-Ascidian
    Mitochondrial-Chlorophycean
    Mitochondrial-Echinoderm
    Mitochondrial-Flatworm
    Mitochondrial-Invertebrates
    Mitochondrial-Protozoan
    Mitochondrial-Pterobranchia
    Mitochondrial-Scenedesmus_obliquus
    Mitochondrial-Thraustochytrium
    Mitochondrial-Trematode
    Mitochondrial-Vertebrates
    Mitochondrial-Yeast
    Pachysolen_tannophilus
    Peritrich
    SR1_Gracilibacteria
    Tetrahymena
    Universal
    #
    #  --version                           show version (5.5.0)
    #
    #########################################################################################
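Based on the options listed in the help text above, a typical run could be sketched as follows. The input name `transcripts.fasta` and the hit-table names `pfam.domtblout` and `blastp.outfmt6` are placeholders, and `run` is a hypothetical dry-run helper that only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/usr/bin/env bash
# Print commands instead of executing them unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

transcripts="transcripts.fasta"   # placeholder input file

# Step 1: extract candidate long ORFs from the transcripts.
run TransDecoder.LongOrfs -t "$transcripts"

# Step 2: predict coding regions, retaining ORFs with Pfam or blastp support.
run TransDecoder.Predict -t "$transcripts" \
    --retain_pfam_hits pfam.domtblout \
    --retain_blastp_hits blastp.outfmt6 \
    --single_best_only
```

Running the script as written only echoes the two commands; setting `DRY_RUN=0` executes them, which requires the homology tables to exist first.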
    
    ## Install blast+
    conda install -y blast
    Run blastp -h to confirm the installation:
    (python3) [u20111230014@cpu10 ~]$ blastp -h
    USAGE
      blastp [-h] [-help] [-import_search_strategy filename]
        [-export_search_strategy filename] [-task task_name] [-db database_name]
        [-dbsize num_letters] [-gilist filename] [-seqidlist filename]
        [-negative_gilist filename] [-negative_seqidlist filename]
        [-taxids taxids] [-negative_taxids taxids] [-taxidlist filename]
        [-negative_taxidlist filename] [-ipglist filename]
        [-negative_ipglist filename] [-entrez_query entrez_query]
        [-db_soft_mask filtering_algorithm] [-db_hard_mask filtering_algorithm]
        [-subject subject_input_file] [-subject_loc range] [-query input_file]
        [-out output_file] [-evalue evalue] [-word_size int_value]
        [-gapopen open_penalty] [-gapextend extend_penalty]
        [-qcov_hsp_perc float_value] [-max_hsps int_value]
        [-xdrop_ungap float_value] [-xdrop_gap float_value]
        [-xdrop_gap_final float_value] [-searchsp int_value] [-seg SEG_options]
        [-soft_masking soft_masking] [-matrix matrix_name]
        [-threshold float_value] [-culling_limit int_value]
        [-best_hit_overhang float_value] [-best_hit_score_edge float_value]
        [-subject_besthit] [-window_size int_value] [-lcase_masking]
        [-query_loc range] [-parse_deflines] [-outfmt format] [-show_gis]
        [-num_descriptions int_value] [-num_alignments int_value]
        [-line_length line_length] [-html] [-sorthits sort_hits]
        [-sorthsps sort_hsps] [-max_target_seqs num_sequences]
        [-num_threads int_value] [-mt_mode int_value] [-ungapped] [-remote]
        [-comp_based_stats compo] [-use_sw_tback] [-version]
    
    DESCRIPTION
       Protein-Protein BLAST 2.12.0+
    
    Use '-help' to print detailed descriptions of command line arguments
    (python3) [u20111230014@cpu10 ~]$ whereis blastp -h
    blastp: /home/u20111230014/miniconda3/envs/python3/bin/blastp /opt/app/anaconda3/bin/blastp
    
    Usage:
     whereis [options] file
    
    Options:
     -b         search only for binaries
     -B <dirs>  define binaries lookup path
     -m         search only for manuals
     -M <dirs>  define man lookup path
     -s         search only for sources
     -S <dirs>  define sources lookup path
     -f         terminate <dirs> argument list
     -u         search for unusual entries
     -l         output effective lookup paths
    
    For more details see whereis(1).
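With blastp installed, the tabular hits that TransDecoder.Predict accepts through `--retain_blastp_hits` can be produced roughly as below. This is a sketch under assumptions: the file names in the usage line are placeholders, the E-value cutoff is an arbitrary choice, and `blast_orfs` and `count_queries_with_hits` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Build a protein BLAST database, then search ORF peptides against it,
# writing tabular (-outfmt 6) hits to the given output file.
blast_orfs() {
    local db_fasta="$1" query_pep="$2" out="$3" threads="${4:-4}"
    makeblastdb -in "$db_fasta" -dbtype prot &&
    blastp -query "$query_pep" -db "$db_fasta" \
        -max_target_seqs 1 -outfmt 6 -evalue 1e-5 \
        -num_threads "$threads" -out "$out"
}

# Count distinct query IDs (column 1) in a tabular hit file.
count_queries_with_hits() {
    awk -F'\t' '!seen[$1]++ { n++ } END { print n + 0 }' "$1"
}

# usage (placeholder file names):
#   blast_orfs uniprot_sprot.fasta longest_orfs.pep blastp.outfmt6 8
#   count_queries_with_hits blastp.outfmt6
```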
    
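    ## Install hmmer
    conda install -y hmmer
    Check the installation with hmmbuild. As the session below shows, HMMER has no single hmmer command, and its programs take -h rather than --help: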
    (python3) [u20111230014@cpu10 ~]$ hmmer -h
    -bash: hmmer: command not found
    (python3) [u20111230014@cpu10 ~]$ hmmbuild --help
    Failed to parse command line:
    No such option "--help".
    Usage: hmmbuild [-options] <hmmfile_out> <msafile>
    
    where basic options are:
      -h     : show brief help on version and usage
      -n <s> : name the HMM <s>
      -o <f> : direct summary output to file <f>, not stdout
      -O <f> : resave annotated, possibly modified MSA to file <f>
    
    To see more help on other available options, do:
      hmmbuild -h
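Since HMMER is a suite of separate programs, one way to verify the install is to ask each program for its `-h` banner and pull out the version. The sketch below assumes the banner contains a line of the form `# HMMER <version> (<date>); ...`; `hmmer_version` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Extract the version token from the "# HMMER <version> (...)" banner line on stdin.
hmmer_version() {
    sed -n 's/^# HMMER \([^ ]*\).*/\1/p' | head -n 1
}

# Check the programs used for building, pressing, and searching profile HMMs.
for tool in hmmbuild hmmpress hmmscan hmmsearch; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: HMMER $("$tool" -h | hmmer_version)"
    else
        echo "$tool: not found on PATH"
    fi
done
```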
    

    Pfam search

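    Each accession in the Pfam database represents one protein family. Pfam is split into two databases, A and B: Pfam-A is a manually curated, high-quality database, while Pfam-B is of lower quality but can still be used to find conserved sites in protein families.

    Download the Pfam database (the latest release at the time of writing is 35; release 33.1 is used here):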
    
    ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam33.1/Pfam-A.hmm.gz 
    
    Decompress it: gunzip Pfam-A.hmm.gz
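The download and decompression steps can be wrapped so that an interrupted transfer is resumed and a corrupt archive is caught before unpacking. This is a sketch, assuming `wget` and `gzip` are available; `unpack_keep` and `fetch_pfam` are hypothetical helpers, and unlike plain `gunzip` this version keeps the `.gz` file.

```shell
#!/usr/bin/env bash
PFAM_URL="ftp://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam33.1/Pfam-A.hmm.gz"

# Decompress FILE.gz to FILE, keeping the .gz and never overwriting FILE.
unpack_keep() {
    local gz="$1" out="${1%.gz}"
    if [ -e "$out" ]; then
        echo "exists, skipping: $out"
        return 0
    fi
    # Verify archive integrity first, then write the decompressed copy.
    gzip -t "$gz" && gunzip -c "$gz" > "$out"
}

# Download into the current directory (resuming a partial file), then unpack.
fetch_pfam() {
    wget -c "$PFAM_URL" && unpack_keep "$(basename "$PFAM_URL")"
}

# usage: fetch_pfam
```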
    
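    This gives the HMM file for the Pfam database. The HMM file is plain text, so it needs to be converted into a compressed binary format to speed up searches.
    # Build the indexed database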
    hmmpress Pfam-A.hmm
    [u20111230014@workstation Pfam-A]$ ll
    total 3036508
    -rw-r--r-- 1 u20111230014 u20111230014 1459135873 Feb 23 12:17 Pfam-A.hmm
    -rw-rw-r-- 1 u20111230014 u20111230014  334380860 Feb 23 17:09 Pfam-A.hmm.h3f
    -rw-rw-r-- 1 u20111230014 u20111230014    1259976 Feb 23 17:09 Pfam-A.hmm.h3i
    -rw-rw-r-- 1 u20111230014 u20111230014  604042224 Feb 23 17:09 Pfam-A.hmm.h3m
    -rw-rw-r-- 1 u20111230014 u20111230014  710553501 Feb 23 17:09 Pfam-A.hmm.h3p
    lrwxrwxrwx 1 u20111230014 u20111230014         69 Feb 23 20:24 uniprot_sprot_index.fasta -> /home/u20111230014/workspace/genome/uniprot/uniprot_sprot_index.fasta
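Once the four index files shown in the listing exist, the database can be searched with hmmscan to produce the domain table that `--retain_pfam_hits` expects. The following is a sketch under assumptions: the peptide and output file names in the usage line are placeholders, and `pfam_is_pressed` and `scan_pfam` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Succeed only if all four hmmpress index files sit next to the given HMM file.
pfam_is_pressed() {
    local hmm="$1" ext
    for ext in h3f h3i h3m h3p; do
        [ -s "$hmm.$ext" ] || { echo "missing: $hmm.$ext" >&2; return 1; }
    done
}

# Scan peptide sequences against a pressed Pfam-A database and
# write the per-domain table; the main report is discarded.
scan_pfam() {
    local hmm="$1" pep="$2" out="$3" cpus="${4:-4}"
    pfam_is_pressed "$hmm" || return 1
    hmmscan --cpu "$cpus" --domtblout "$out" "$hmm" "$pep" > /dev/null
}

# usage (placeholder file names):
#   scan_pfam Pfam-A.hmm longest_orfs.pep pfam.domtblout 8
```

Checking for the index files up front gives a clearer message than letting hmmscan fail on an unpressed database.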
    
