

Author: Balloon_vine | Published 2019-07-10 12:38

Running the Xialab Training Environment



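  1. Log in to your own account (zzh's account is used as the example)
  2. List the conda environments currently available
  3. Try to activate the "training" environment (activation fails with an error)
  4. Log out and log back in
  5. Run a command from a package installed in the environment to check the setup
  6. Switch back to your own conda environment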


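The data are stored in /home/tmp/data. Copy them, or symlink the directory into your own home directory. Future training data will also be added to this directory.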
(base) [zaohai_zeng@localhost ~]$ ln -s /home/tmp/data/
(base) [zaohai_zeng@localhost ~]$ cd data/
(base) [zaohai_zeng@localhost data]$ ll -htr
total 298M
-rw-r--r--. 1 root xialab  62M Jul 10 17:55 embryophyta_odb9.tar.gz
-rw-r--r--. 1 root xialab 102M Jul 10 17:55 eudicotyledons_odb10.tar.gz
-rw-r--r--. 1 root xialab  13M Jul 10 17:55 eukaryota_odb9.tar.gz
-r-xr-xr-x. 1 root xialab 117M Jul 10 18:00 Arabidopsis_thaliana.genome.fa
-r-xr-xr-x. 1 root xialab 4.8M Jul 10 18:00 Arabidopsis_thaliana.genome.gff3
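
The transcript above runs ln -s without a link name, which creates a link called data in the current directory. As a minimal sketch (not part of the original setup), the same step can be wrapped so that it refuses to touch an existing path. The function name link_data is hypothetical and only used for illustration.

```shell
# Sketch: link a shared data directory into place without overwriting anything.
# link_data is a hypothetical helper name, not part of any tool.
link_data() {
    src="$1"
    dest="$2"
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    if [ -e "$dest" ] || [ -L "$dest" ]; then
        echo "already present, left untouched: $dest"
        return 0
    fi
    ln -s "$src" "$dest" && echo "linked $dest -> $src"
}

# Example call for this server (uncomment to run):
# link_data /home/tmp/data "$HOME/data"
```

Because the link points at the shared directory rather than a copy, files added to /home/tmp/data later show up through the link automatically.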



1. Log in to your own account (zzh's account is used as the example)

Choose 3 to enter user zzh's home directory.

--    Welcome to Terminal Menu    --
[1] > Start New Selection.(UTF-8 character)
[2] > Start New Selection.(GBK character)

History sessions:
[3] <  zaohai_zeng  SSH  UTF-8

[q] < Quit.

Choice: 3

Prepare to login to the target device, Please wait a second.

Last login: Wed Jul 10 12:05:41 2019 from

2. List the conda environments currently available


(base) [zaohai_zeng@localhost ~]$ /opt/anaconda/bin/conda info --envs
# conda environments:
                      *  /home/zaohai_zeng/miniconda2
base                     /opt/anaconda
flye_test                /opt/anaconda/envs/flye_test
training                 /opt/anaconda/envs/training
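
To check for one environment without reading the listing by eye, the output of conda info --envs can be filtered on its first column. This is a rough sketch; env_listed is a hypothetical helper name, and it assumes the listing format shown above (a header line starting with #, then one environment per line with the name first).

```shell
# env_listed NAME: read `conda info --envs` output on stdin and succeed
# if NAME appears in the name column. (env_listed is a hypothetical helper.)
env_listed() {
    awk -v name="$1" '
        /^#/ { next }              # skip the header comment
        $1 == name { found = 1 }   # first field is the environment name
        END { exit found ? 0 : 1 }
    '
}

# Example on this server (requires the shared conda install):
# /opt/anaconda/bin/conda info --envs | env_listed training && echo "training is available"
```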

3. Try to activate the "training" environment (activation fails with an error)


(base) [zaohai_zeng@localhost ~]$ /opt/anaconda/bin/conda activate training

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

As the message suggests, run /opt/anaconda/bin/conda init, which writes the conda initialization block into your .bashrc file.

(base) [zaohai_zeng@localhost ~]$ /opt/anaconda/bin/conda init
no change     /opt/anaconda/condabin/conda
no change     /opt/anaconda/bin/conda
no change     /opt/anaconda/bin/conda-env
no change     /opt/anaconda/bin/activate
no change     /opt/anaconda/bin/deactivate
no change     /opt/anaconda/etc/profile.d/conda.sh
no change     /opt/anaconda/etc/fish/conf.d/conda.fish
no change     /opt/anaconda/shell/condabin/Conda.psm1
no change     /opt/anaconda/shell/condabin/conda-hook.ps1
no change     /opt/anaconda/lib/python3.7/site-packages/xontrib/conda.xsh
no change     /opt/anaconda/etc/profile.d/conda.csh
modified      /home/zaohai_zeng/.bashrc

==> For changes to take effect, close and re-open your current shell. <==
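
The "modified" line above indicates that conda init edited .bashrc. One way to confirm this, sketched here under the assumption that the block starts with the marker line "# >>> conda initialize >>>" (as it does in the .bashrc excerpt later in this post), is to grep for that marker. The helper name has_conda_block is hypothetical.

```shell
# has_conda_block FILE: succeed if FILE contains the opening marker that
# `conda init` writes. (has_conda_block is a hypothetical helper name.)
has_conda_block() {
    grep -q '^# >>> conda initialize >>>' "$1" 2>/dev/null
}

if has_conda_block "$HOME/.bashrc"; then
    echo "conda init block found in $HOME/.bashrc"
else
    echo "no conda init block in $HOME/.bashrc"
fi
```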

4. Log out and log back in

(base) [zaohai_zeng@localhost ~]$ exit

--    Welcome to Terminal Menu    --
[1] > Start New Selection.(UTF-8 character)
[2] > Start New Selection.(GBK character)

History sessions:
[3] <  zaohai_zeng  SSH  UTF-8

[q] < Quit.

Choice: 3

Prepare to login to the target device, Please wait a second.

Last login: Wed Jul 10 12:09:19 2019 from
(base) [zaohai_zeng@localhost ~]$

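After logging back in, run conda activate training. If the name in parentheses in front of [user] (here the user is zaohai_zeng@localhost) changes from base to training, you have successfully entered the environment.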

(base) [zaohai_zeng@localhost ~]$ which conda
(base) [zaohai_zeng@localhost ~]$ conda info --envs
# conda environments:
base                  *  /opt/anaconda
flye_test                /opt/anaconda/envs/flye_test
training                 /opt/anaconda/envs/training

(base) [zaohai_zeng@localhost ~]$ conda activate training
(training) [zaohai_zeng@localhost ~]$
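
Besides watching the prompt, a script can test which environment is active. conda activate exports the active environment's name in the CONDA_DEFAULT_ENV variable, so a minimal sketch (in_env is a hypothetical helper name) looks like this:

```shell
# in_env NAME: succeed if the active conda environment is NAME.
# Relies on CONDA_DEFAULT_ENV, which `conda activate` exports.
in_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if in_env training; then
    echo "training is active"
else
    echo "training is not active (current: ${CONDA_DEFAULT_ENV:-none})"
fi
```

This is handy at the top of an analysis script, where running in the wrong environment would otherwise only surface later as a missing command.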

5. Run a command from a package installed in the environment; if you see output like the following, your environment is configured correctly


(training) [zaohai_zeng@localhost ~]$ augustus
AUGUSTUS (3.3.2) is a gene prediction tool
written by M. Stanke, O. Keller, S. König, L. Gerischer and L. Romoth.

augustus [parameters] --species=SPECIES queryfilename

'queryfilename' is the filename (including relative path) to the file containing the query sequence(s)
in fasta format.

SPECIES is an identifier for the species. Use --species=help to see a list.

--strand=both, --strand=forward or --strand=backward
--genemodel=partial, --genemodel=intronless, --genemodel=complete, --genemodel=atleastone or --genemodel=exactlyone
  partial      : allow prediction of incomplete genes at the sequence boundaries (default)
  intronless   : only predict single-exon genes like in prokaryotes and some eukaryotes
  complete     : only predict complete genes
  atleastone   : predict at least one complete gene
  exactlyone   : predict exactly one complete gene
  predict genes independently on each strand, allow overlapping genes on opposite strands
  This option is turned off by default.
  When this option is used the prediction considering hints (extrinsic information) is turned on.
  hintsfilename contains the hints in gff format.
  path to config directory (if not specified as environment variable)
  report alternative transcripts when they are suggested by hints
  report alternative transcripts generated through probabilistic sampling
  For a description of these parameters see section 4 of README.TXT.
  When this option is used the prediction will consider the protein profile provided as parameter.
  The protein profile extension is described in section 7 of README.TXT.
  show a progressmeter
  output in gff3 format
--predictionStart=A, --predictionEnd=B
  A and B define the range of the sequence for which predictions should be found.
  predict the untranslated regions in addition to the coding sequence. This currently works only for a subset of species.
  Do not report transcripts with in-frame stop codons. Otherwise, intron-spanning stop codons could occur. Default: false
  If true and input is in genbank format, no prediction is made. Useful for getting the annotated protein sequences.
  If true, output gene identifyers like this: seqname.gN

For a complete list of parameters, type "augustus --paramlist".
An exhaustive description can be found in the file README.TXT.


(training) [zaohai_zeng@localhost ~]$ braker.pl

braker.pl     Pipeline for predicting genes with GeneMark-ET and AUGUSTUS with


braker.pl [OPTIONS] --genome=genome.fa --bam=rnaseq.bam


--genome=genome.fa                  fasta file with DNA sequences
--bam=rnaseq.bam                    bam file with spliced alignments from
--hints=hints.gff                   Alternatively to calling braker.pl with a
                                    bam file, it is possible to call it with a
                                    file that contains introns extracted from
                                    RNA-Seq (or other data) in gff format.
                                    This flag also allows the usage of hints
                                    from additional extrinsic sources for gene
                                    prediction with AUGUSTUS. To consider such
                                    additional extrinsic information, you need
                                    to use the flag --extrinsicCfgFiles to
                                    specify parameters for all sources in the
                                    hints file (including the source "E" for
                                    intron hints from RNA-Seq).
--prot_seq=prot.fa                  A protein sequence file in multiple fasta
                                    format. This file will be used to generate
                                    protein hints for AUGUSTUS by running one
                                    of the three alignment tools Exonerate
                                    (--prg=exonerate), Spaln (--prg=spaln) or
                                    GenomeThreader (--prg=gth). Default is
                                    GenomeThreader if the tool is not
                                    specified.  Currently, hints from protein
                                    sequences are only used in the prediction
                                    step with AUGUSTUS.
--prot_aln=prot.aln                 Alignment file generated from aligning
                                    protein sequences against the genome with
                                    either Exonerate (--prg=exonerate), or
                                    Spaln (--prg=spaln), or GenomeThreader
                                    To prepare alignment file, run Spaln2 with
                                    the following command:
                                    spaln -O0 ... > spalnfile
                                    To prepare alignment file, run Exonerate
                                    with the following command:
                                    exonerate --model protein2genome \
                                        --showtargetgff T ... > exfile
                                    To prepare alignment file, run
                                    GenomeThreader with the following command:
                                    gth -genomic genome.fa  -protein \
                                        protein.fa -gff3out \
                                        -skipalignmentout ... -o gthfile
                                    A valid option prg=... must be specified
                                    in combination with --prot_aln. Generating
                                    tool will not be guessed.
                                    Currently, hints from protein alignment
                                    files are only used in the prediction step
                                    with AUGUSTUS.
--AUGUSTUS_ab_initio                output ab initio predictions by AUGUSTUS
                                    in addition to predictions with hints by


--species=sname                     Species name. Existing species will not be
                                    overwritten. Uses Sp_1 etc., if no species
                                    is assigned
--softmasking                       Softmasking option for soft masked genome
                                    files. (Disabled by default.)
--esmode                            Run GeneMark-ES (genome sequence only) and
                                    train AUGUSTUS on long genes predicted by
                                    GeneMark-ES. Final predictions are ab initio
--epmode                            Run GeneMark-EP with intron hints provided
                                    from protein data. This mode is not
                                    comptabile with using the aligners
                                    GenomeThreader, Exonerate and Spaln within
                                    braker.pl because etpmode and epmode require
                                    a large database of proteins and such
                                    mapping should be done outside of braker.pl
                                    e.g. on a cluster.
--etpmode                           Run GeneMark-ETP with hints provided from
                                    proteins and RNA-Seq data. This mode is not
                                    compatible with using the aligners
                                    GenomeThreader, Exonerate and Spaln within
                                    braker.pl because etpmode and epmode require
                                    a large database of proteins and such
                                    mapping should be done outside of braker.pl
                                    e.g. on a cluster.
--gff3                              Output in GFF3 format (default is gtf
--cores                             Specifies the maximum number of cores that
                                    can be used during computation. Be aware:
                                    optimize_augustus.pl will use max. 8
                                    cores; augustus will use max. nContigs in
                                    --genome=file cores.
--workingdir=/path/to/wd/           Set path to working directory. In the
                                    working directory results and temporary
                                    files are stored
--nice                              Execute all system calls within braker.pl
                                    and its submodules with bash "nice"
                                    (default nice value)

--alternatives-from-evidence=true   Output alternative transcripts based on
                                    explicit evidence from hints (default is
--crf                               Execute CRF training for AUGUSTUS;
                                    resulting parameters are only kept for
                                    final predictions if they show higher
                                    accuracy than HMM parameters.
--keepCrf                           keep CRF parameters even if they are not
                                    better than HMM parameters
--UTR=on                            create UTR training examples from RNA-Seq
                                    coverage data; requires options
                                    --bam=rnaseq.bam and --softmasking.
                                    Alternatively, if UTR parameters already
                                    exist, training step will be skipped and
                                    those pre-existing parameters are used.
--prg=gth|exonerate|spaln           Alignment tool gth (GenomeThreader),
                                    exonerate (Exonerate) or Spaln2
                                    (spaln) that will be used to generate
                                    protein alignments that will be the
                                    basis for hints generation for gene
                                    prediction with AUGUSTUS (if specified
                                    in combination with --prot_seq) or that
                                    was used to externally generate an
                                    alignment file with the commands listed in
                                    description of --prot_aln (if used in
                                    combination with --prot_aln).
--gth2traingenes                    Generate training gene structures for
                                    AUGUSTUS from GenomeThreader alignments.
                                    (These genes can either be used for
                                    training AUGUSTUS alone with
                                    --trainFromGth; or in addition to
                                    GeneMark-ET training genes if also a
                                    bam-file is supplied.)
--trainFromGth                      No GeneMark-Training, train AUGUSTUS from
                                    GenomeThreader alignments
--version                           Print version number of braker.pl
--help                              Print this help message


--AUGUSTUS_CONFIG_PATH=/path/       Set path to config directory of AUGUSTUS
                                    (if not specified as environment
                                    variable). BRAKER1 will assume that the
                                    directories ../bin and ../scripts of
                                    AUGUSTUS are located relative to the
                                    AUGUSTUS_CONFIG_PATH. If this is not the
                                    case, please specify AUGUSTUS_BIN_PATH
                                    (and AUGUSTUS_SCRIPTS_PATH if required).
                                    The braker.pl commandline argument
                                    --AUGUSTUS_CONFIG_PATH has higher priority
                                    than the environment variable with the
                                    same name.
--AUGUSTUS_BIN_PATH=/path/          Set path to the AUGUSTUS directory that
                                    contains binaries, i.e. augustus and
                                    etraining. This variable must only be set
                                    if AUGUSTUS_CONFIG_PATH does not have
                                    ../bin and ../scripts of AUGUSTUS relative
                                     to its location i.e. for global AUGUSTUS
                                    installations. BRAKER1 will assume that
                                    the directory ../scripts of AUGUSTUS is
                                    located relative to the AUGUSTUS_BIN_PATH.
                                    If this is not the case, please specify
--AUGUSTUS_SCRIPTS_PATH=/path/      Set path to AUGUSTUS directory that
                                    contains scripts, i.e. splitMfasta.pl.
                                    This variable most only be set if
                                    AUGUSTUS_CONFIG_PATH or AUGUSTUS_BIN_PATH
                                    do not contains the ../scripts directory
                                    of AUGUSTUS relative to their location,
                                    i.e. for special cases of a global
                                    AUGUSTUS installation.
--BAMTOOLS_PATH=/path/to/           Set path to bamtools (if not specified as
                                    environment BAMTOOLS_PATH variable). Has
                                    higher priority than the environment
--GENEMARK_PATH=/path/to/           Set path to GeneMark-ET (if not specified
                                    as environment GENEMARK_PATH variable).
                                    Has higher priority than environment
--SAMTOOLS_PATH=/path/to/           Optionally set path to samtools (if not
                                    specified as environment SAMTOOLS_PATH
                                    variable) to fix BAM files automatically,
                                    if necessary. Has higher priority than
                                    environment variable.
--ALIGNMENT_TOOL_PATH=/path/to/tool Set path to alignment tool
                                    (GenomeThreader, Spaln, or Exonerate) if
                                    not specified as environment
                                    ALIGNMENT_TOOL_PATH variable. Has higher
                                    priority than environment variable.
--BLAST_PATH=/path/to/blastall      Set path to NCBI blastall and formatdb
                                    executables if not specified as
                                    environment variable. Has higher priority
                                    than environment variable.
--PYTHON3_PATH=/path/to             Set path to python3 executable (if not
                                    specified as envirnonment variable and if
                                    executable is not in your $PATH).


--augustus_args="--some_arg=bla"    One or several command line arguments to
                                    be passed to AUGUSTUS, if several
                                    arguments are given, separated by
                                    whitespace, i.e.
                                    "--first_arg=sth --second_arg=sth".
--overwrite                         Overwrite existing files (except for
                                    species parameter files)
--skipGeneMark-ES                   Skip GeneMark-ES and use provided
                                    GeneMark-ES output (e.g. provided with
--skipGeneMark-ET                   Skip GeneMark-ET and use provided
                                    GeneMark-ET output (e.g. provided with
--skipGeneMark-EP                   Skip GeneMark-EP and use provided
                                    GeneMark-EP output (e.g. provided with
--skipGeneMark-ETP                  Skip GeneMark-ETP and use provided
                                    GeneMark-ETP output (e.g. provided with
--geneMarkGtf=file.gtf              If skipGeneMark-ET is used, braker will by
                                    default look in the working directory in
                                    folder GeneMarkET for an already existing
                                    gtf file. Instead, you may provide such a
                                    file from another location. If geneMarkGtf
                                    option is set, skipGeneMark-ES/ET/EP/ETP is
                                    automatically also set.
--rounds                            The number of optimization rounds used in
                                    optimize_augustus.pl (default 5)
--skipAllTraining                   Skip GeneMark-EX (training and
                                    prediction), skip AUGUSTUS training, only
                                    runs AUGUSTUS with pre-trained and already
                                    existing parameters (not recommended).
                                    Hints from input are still generated.
                                    This option automatically sets
                                    --useexisting to true.
--useexisting                       Use the present config and parameter files
                                    if they exist for 'species'
--filterOutShort                    It may happen that a "good" training gene,
                                    i.e. one that has intron support from
                                    RNA-Seq in all introns predicted by
                                    GeneMark, is in fact too short. This flag
                                    will discard such genes that have
                                    supported introns and a neighboring
                                    RNA-Seq supported intron upstream of the
                                    start codon within the range of the
                                    maximum CDS size of that gene and with a
                                    multiplicity that is at least as high as
                                    20% of the average intron multiplicity of
                                    that gene.
--skipOptimize                      Skip optimize parameter step (not
--skipGetAnnoFromFasta              Skip calling the python3 script
                                    getAnnoFastaFromJoingenes.py from the
                                    AUGUSTUS tool suite. This script requires
                                    python3, biopython and re (regular
                                    expressions) to be installed. It produces
                                    coding sequence and protein FASTA files
                                    from AUGUSTUS gene predictions and provides
                                    information about genes with in-frame stop
                                    codons. If you enable this flag, these files
                                    will not be produced and python3 and
                                    the required modules will not be necessary
                                    for running braker.pl.
--fungus                            GeneMark-ET option: run algorithm with
                                    branch point model (most useful for fungal
--rnaseq2utr_args=params            Expert option: pass alternative parameters
                                    to rnaseq2utr as string, default parameters:
                                    -r 76 -v 100 -n 15 -i 0.7 -m 0.3 -w 70
                                    -c 100 -p 0.5
--eval=reference.gtf                Reference set to evaluate predictions
                                    against (using the eval package)
--AUGUSTUS_hints_preds=s            File with AUGUSTUS hints predictions; will
                                    use this file as basis for UTR training;
                                    only UTR training and prediction is
                                    performed if this option is given.
--flanking_DNA=n                    Size of flanking region, must only be
                                    specified if --AUGUSTUS_hints_preds is given
                                    (for UTR training in a separate braker.pl
                                    run that builds on top of an existing run)
--verbosity=n                       0 -> run braker.pl quiet (no log)
                                    1 -> only log warnings
                                    2 -> also log configuration
                                    3 -> log all major steps
                                    4 -> very verbose, log also small steps
--downsampling_lambda=d             The distribution of introns in training
                                    gene structures generated by GeneMark-EX
                                    has a huge weight on single-exon and
                                    few-exon genes. Specifying the lambda
                                    parameter of a poisson distribution will
                                    make braker call a script for downsampling
                                    of training gene structures according to
                                    their number of introns distribution, i.e.
                                    genes with none or few exons will be
                                    downsampled, genes with many exons will be
                                    kept. Default value is 2.
                                    If you want to avoid downsampling, you have
                                    to specify 0.


--splice_sites=patterns             list of splice site patterns for UTR
                                    prediction; default: GTAG, extend like this:
--extrinsicCfgFiles=file1,file2,... Depending on the mode in which braker.pl
                                    is executed, it may require one ore several
                                    extrinsicCfgFiles. Don't use this option
                                    unless you know what you are doing!
--stranded=+,-,+,-,...              If UTRs are trained, i.e.~strand-specific
                                    bam-files are supplied and coverage
                                    information is extracted for gene prediction,
                                    create stranded ep hints. The order of
                                    strand specifications must correspond to the
                                    order of bam files. Possible values are
                                    +, -, .
                                    If stranded data is provided, ONLY coverage
                                    data from the stranded data is used to
                                    generate UTR examples! Coverage data from
                                    unstranded data is used in the prediction
                                    step, only.
                                    The stranded label is applied to coverage
                                    data, only. Intron hints are generated
                                    from all libraries treated as "unstranded"
                                    (because splice site filtering eliminates
                                    intron hints from the wrong strand, anyway).
--optCfgFile=ppx.cfg                Optional custom config file for AUGUSTUS
                                    for running PPX (currently not



braker.pl [OPTIONS] --genome=genome.fa --species=speciesname \
braker.pl [OPTIONS] --genome=genome.fa --species=speciesname \

To run with protein data from remote species and GeneMark-EP:

braker.pl [OPTIONS] --genome=genome.fa --hints=proteinintrons.gff --epmode=1

To run with protein data from a very closely related species:

braker.pl [OPTIONS] --genome=genome.fa --prot_seq=proteins.fa --prg=gth \
    --gth2traingenes --trainFromGth
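
Instead of reading two long help screens, the check in this step can be reduced to asking whether each command is on PATH. The following is a sketch only; check_tools is a hypothetical helper name, and the two tool names in the example are the ones run above.

```shell
# check_tools CMD...: report each command that is missing from PATH and
# return non-zero if any is missing. (check_tools is a hypothetical helper.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "missing  $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example inside the training environment:
# check_tools augustus braker.pl
```

Note that this only confirms the commands can be found; it does not verify that they run correctly, which is what the help output above demonstrates.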

6. How to switch back to your own conda environment

① Temporarily switch to the user's own conda environment (the switch is lost once you exit the shell and log in again)


(base) [zaohai_zeng@localhost ~]$ which conda
(base) [zaohai_zeng@localhost ~]$ conda info --envs
# conda environments:
base                  *  /opt/anaconda
busco                    /opt/anaconda/envs/busco
flye_test                /opt/anaconda/envs/flye_test
training                 /opt/anaconda/envs/training

Locate the conda command of the current user's own conda installation (usually under /path/to/miniconda2/bin). For this user:

(base) [zaohai_zeng@localhost ~]$ ls /home/zaohai_zeng/miniconda2/bin/conda

List the environments in the current user's own conda installation:

(base) [zaohai_zeng@localhost ~]$ /home/zaohai_zeng/miniconda2/bin/conda info --envs
# conda environments:
base                     /home/zaohai_zeng/miniconda2
py2                      /home/zaohai_zeng/miniconda2/envs/py2

Then run source /home/zaohai_zeng/miniconda2/bin/activate py2 to switch into the py2 environment:

(base) [zaohai_zeng@localhost ~]$ source /home/zaohai_zeng/miniconda2/bin/activate py2
(py2) [zaohai_zeng@localhost ~]$ which python
(py2) [zaohai_zeng@localhost ~]$ python --version
Python 2.7.13 :: Continuum Analytics, Inc.

② Permanently remove the /opt/anaconda environment

Open .bashrc in an editor and delete the block that conda init added:


vim .bashrc


# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/opt/anaconda/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/opt/anaconda/etc/profile.d/conda.sh" ]; then
        . "/opt/anaconda/etc/profile.d/conda.sh"
    else
        export PATH="/opt/anaconda/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<



Exit the shell and log back in; you are then permanently back in the user's own conda environment.
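
The same removal can be done without opening an editor. As a hedged sketch, sed can delete everything between the two marker lines that delimit the block, leaving a backup copy next to the file. The helper name remove_conda_block is hypothetical, and the sketch assumes both marker lines are present exactly as shown in the block above.

```shell
# remove_conda_block FILE: delete everything between the two marker lines
# that `conda init` writes, keeping a backup at FILE.bak.
# (remove_conda_block is a hypothetical helper name.)
remove_conda_block() {
    sed -i.bak '/^# >>> conda initialize >>>/,/^# <<< conda initialize <<</d' "$1"
}

# Example (uncomment to run against your own file):
# remove_conda_block "$HOME/.bashrc"
```

If the result is not what you expected, the original file can be restored from the .bak copy.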



