falcon-1

Author: tobebettergirl | Published 2019-06-29 17:07

    https://pb-falcon.readthedocs.io/en/latest/

    The current PacBio Assembly suite documentation, which includes new bioconda instructions for installing FALCON, FALCON_unzip and their associated dependencies, can be found here: pb_assembly

    https://github.com/PacificBiosciences/pb-assembly

    input_fofn: a file of file names (FOFN) listing the paths to the input FASTA files
    
    input_type: raw or preads
    # with preads, the pipeline skips the entire 0-rawreads pre-assembly phase.
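A minimal sketch of preparing these inputs, assuming the usual FOFN layout of one FASTA path per line. `write_fofn` and `runs_preassembly` are hypothetical helpers for illustration, not part of FALCON; the second one only restates the rule above that `preads` input skips 0-rawreads.

```python
from pathlib import Path

VALID_INPUT_TYPES = {"raw", "preads"}


def write_fofn(fasta_paths, fofn_path):
    """Write one absolute FASTA path per line (the common FOFN layout)."""
    lines = [str(Path(p).resolve()) for p in fasta_paths]
    Path(fofn_path).write_text("\n".join(lines) + "\n")
    return lines


def runs_preassembly(input_type):
    """True if the 0-rawreads pre-assembly phase would run for this input_type."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError(
            f"input_type must be one of {sorted(VALID_INPUT_TYPES)}, got {input_type!r}"
        )
    # raw reads need pre-assembly; preads have already been corrected
    return input_type == "raw"
```

For example, `write_fofn(["reads1.fasta", "reads2.fasta"], "input.fofn")` produces a two-line FOFN that `input_fofn` can point to.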
    
    # large genomes
    pa_DBsplit_option=-x500 -s200
    ovlp_DBsplit_option=-x500 -s200
    
    # small genomes (<10Mb)
    pa_DBsplit_option = -x500 -s50
    ovlp_DBsplit_option = -x500 -s50
    # -x: filters out reads shorter than the specified length (bp)
    # -s: controls the size of the DB blocks (Mb)
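To make the two DBsplit flags concrete, here is a toy model, assuming `-x` is a minimum read length in bp and `-s` a target block size in megabases. `dbsplit_blocks` is a hypothetical helper that packs reads in order; the real DBsplit operates on a Dazzler database and its exact block boundaries may differ.

```python
def dbsplit_blocks(read_lengths, min_len, block_mb):
    """Toy model of DBsplit: drop reads shorter than min_len (-x), then pack
    the remaining reads, in order, into blocks of roughly block_mb Mb (-s)."""
    target_bp = block_mb * 1_000_000
    blocks, current, current_bp = [], [], 0
    for length in read_lengths:
        if length < min_len:
            continue  # -x: reads below the threshold are excluded
        current.append(length)
        current_bp += length
        if current_bp >= target_bp:  # close the block once it reaches the target size
            blocks.append(current)
            current, current_bp = [], 0
    if current:
        blocks.append(current)  # the last block may be smaller than the target
    return blocks
```

For example, `dbsplit_blocks([400, 600_000, 500_000, 300, 700_000], min_len=500, block_mb=1)` drops the 400 bp and 300 bp reads and returns `[[600000, 500000], [700000]]`. A smaller `-s` yields more, smaller blocks, and therefore roughly more (and shorter) alignment jobs.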
    
    pa_HPCTANmask_option
    # additional arguments for tandem repeat masking, passed to HPC.TANmask
    pa_REPmask_code
    # The second phase of masking deals with interspersed repeats and can run for up to 3 iterations, specified with the pa_REPmask_code option. Each iteration needs a group size and a coverage, given as group,coverage pairs separated by semicolons.
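The `group,coverage;group,coverage` format described above can be checked with a small parser. `parse_repmask_code` is a hypothetical helper, and the string in the usage note is an illustrative value, not a recommended setting.

```python
def parse_repmask_code(code, max_iterations=3):
    """Parse 'group,coverage;group,coverage' into a list of (group, coverage) tuples."""
    pairs = []
    for item in code.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        group, coverage = (int(x) for x in item.split(","))
        pairs.append((group, coverage))
    # each pair is one masking iteration; the notes allow at most 3
    if len(pairs) > max_iterations:
        raise ValueError(f"at most {max_iterations} iterations allowed, got {len(pairs)}")
    return pairs
```

For example, `parse_repmask_code("1,100;2,80;3,60")` returns `[(1, 100), (2, 80), (3, 60)]`, one tuple per iteration.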
    
    genome_size=200000
    
    seed_coverage=30
    # recommended seed coverage: 20-40x
    
    length_cutoff=-1
    # -1: calculate the cutoff automatically from genome_size and seed_coverage
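How can a cutoff be derived from `genome_size` and `seed_coverage`? A minimal sketch, assuming the rule is: sort reads longest first and accumulate until the total reaches `genome_size * seed_coverage`; the length of the last read added becomes the cutoff. This illustrates the idea and is not FALCON's own code; `auto_length_cutoff` is a hypothetical name.

```python
def auto_length_cutoff(read_lengths, genome_size, seed_coverage):
    """Longest-first cutoff: add reads from longest to shortest until their
    total length reaches genome_size * seed_coverage, then return the length
    of the last read added."""
    target = genome_size * seed_coverage
    total = 0
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    raise ValueError(
        f"only {total} bp of reads; {target} bp needed for {seed_coverage}x seed coverage"
    )
```

With `genome_size=200000` and `seed_coverage=30` the target is 6,000,000 bp, so the longest reads that together reach 6 Mb form the seed set and every shorter read falls below the cutoff.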
    
    pa_daligner_option=-h70 -e.75 -l1000 -s100 -k18
    # -e: average correlation rate (average sequence identity), 0.70 (low-quality data) to 0.80 (high-quality data). A higher value helps prevent haplotype collapse.
    # -l: minimum overlap length, 1000 (shorter library) to 5000 (longer library)
    # -k: k-mer size, 14 (low-quality data) to 18 (high-quality data). Lower -k values give higher sensitivity at the cost of more disk space, more memory and slower run time, and tend to work best with lower-quality data. A larger -k gives higher specificity, uses fewer system resources and runs faster, but is only suitable for high-quality data.
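A small checker can flag values that fall outside the ranges quoted above. This is a sketch: it assumes each option is written as a single token such as `-e.75`, and the `RECOMMENDED` table only encodes the ranges from these notes, which are guidelines rather than hard limits. Both function names are hypothetical.

```python
import re

# Ranges quoted in the notes: (low-quality / shorter-library end, high-quality / longer-library end)
RECOMMENDED = {"e": (0.70, 0.80), "l": (1000, 5000), "k": (14, 18)}


def parse_daligner_options(option_string):
    """Turn '-h70 -e.75 -l1000' into {'h': 70.0, 'e': 0.75, 'l': 1000.0}."""
    opts = {}
    for token in option_string.split():
        m = re.fullmatch(r"-([A-Za-z])([0-9.]+)", token)
        if m:  # flags without a numeric value (e.g. -v) are ignored
            opts[m.group(1)] = float(m.group(2))
    return opts


def out_of_range(opts):
    """Return the flags whose values fall outside the recommended ranges."""
    return sorted(
        flag
        for flag, (lo, hi) in RECOMMENDED.items()
        if flag in opts and not lo <= opts[flag] <= hi
    )
```

For the `pa_daligner_option` line above, `out_of_range(parse_daligner_options("-h70 -e.75 -l1000 -s100 -k18"))` returns an empty list.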
    
    falcon_sense_option=--output-multi --min-idt 0.70 --min-cov 4 --max-n-read 200
    
    # --output-multi flag is necessary for generating proper fasta headers and should not be removed unless your specific use case requires it. 
    # The parameters --min-idt, --min-cov and --max-n-read set the minimum alignment identity, minimum coverage necessary and max number of reads, respectively, for calling consensus to make the preads.
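The three thresholds can be illustrated with a deliberately simplified model. Assumptions: `--min-cov` is treated as a count of supporting reads rather than per-base coverage, and the reads kept under `--max-n-read` are the highest-identity ones; falcon_sense's actual selection may differ. `select_reads_for_consensus` is hypothetical.

```python
def select_reads_for_consensus(alignments, min_idt=0.70, min_cov=4, max_n_read=200):
    """Toy model of the falcon_sense thresholds.

    alignments: list of (read_id, identity) pairs for reads aligned to one seed read.
    Returns the read ids used for consensus, or None if support is too low.
    """
    # --min-idt: discard alignments below the identity threshold
    passing = [a for a in alignments if a[1] >= min_idt]
    # --max-n-read: cap the number of reads (assumption: keep the best-identity ones)
    passing.sort(key=lambda a: a[1], reverse=True)
    passing = passing[:max_n_read]
    # --min-cov: simplified here to a minimum number of supporting reads
    if len(passing) < min_cov:
        return None
    return [read_id for read_id, _ in passing]
```

With the defaults above, a seed read whose alignments are `[("r1", 0.9), ("r2", 0.65), ("r3", 0.8), ("r4", 0.75), ("r5", 0.72)]` keeps r1, r3, r4 and r5; raising `min_cov` to 5 leaves too little support and no pread is called.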
    
    pa_HPCdaligner_option=-v -B24 -M16
    # the -v parameter is passed to the LAsort and LAmerge programs, while -B and -M are passed to the daligner sub-commands.
    
    [job.defaults]
    job_type=sge
    # job_type: allowed values are sge, pbs, torque, slurm, lsf and local.
    pwatcher_type=blocking
    # pwatcher_type: blocking or fs_based
    # fs_based: the default; the pipeline periodically polls the file system for a sentinel file whose appearance signals that it can continue
    # blocking: uses a blocking process watcher instead, which can help on systems with file-system latency issues
    
    JOB_QUEUE = default
    MB = 32768
    NPROC = 6
    njobs = 32
    submit = qsub -S /bin/bash -sync y -V  \
      -q ${JOB_QUEUE}     \
      -N ${JOB_NAME}      \
      -o "${JOB_STDOUT}"  \
      -e "${JOB_STDERR}"  \
      -pe smp ${NPROC}    \
      -l h_vmem=${MB}M    \
      "${JOB_SCRIPT}"
    
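The `[job.defaults]` block combines two mechanisms: `${VAR}` placeholders in `submit`, and a process watcher that decides when a job is done. The sketch below models both under stated assumptions. The pipeline performs its own placeholder substitution; Python's `string.Template` is used here only because it accepts the same `${VAR}` syntax. `-sync y` makes SGE's qsub wait until the job finishes, so the command itself blocks; under `fs_based`, the pipeline instead waits for a sentinel file, which `wait_for_sentinel` imitates. All function names and per-job values are hypothetical.

```python
import time
from pathlib import Path
from string import Template

# The submit string above, with its backslash-newline continuations joined into one line.
SUBMIT = ('qsub -S /bin/bash -sync y -V -q ${JOB_QUEUE} -N ${JOB_NAME} '
          '-o "${JOB_STDOUT}" -e "${JOB_STDERR}" -pe smp ${NPROC} '
          '-l h_vmem=${MB}M "${JOB_SCRIPT}"')


def render_submit(template, settings, job):
    """Fill ${VAR} placeholders from section settings plus per-job values.
    Raises KeyError if any placeholder is left undefined."""
    return Template(template).substitute({**settings, **job})


def wait_for_sentinel(sentinel, poll_seconds=1.0, timeout_seconds=None):
    """fs_based-style wait: poll until the sentinel file exists; return seconds waited."""
    start = time.monotonic()
    path = Path(sentinel)
    while not path.exists():
        if timeout_seconds is not None and time.monotonic() - start > timeout_seconds:
            raise TimeoutError(f"{path} did not appear within {timeout_seconds} s")
        time.sleep(poll_seconds)  # slow file-system updates lengthen this loop
    return time.monotonic() - start


if __name__ == "__main__":
    settings = {"JOB_QUEUE": "default", "MB": "32768", "NPROC": "6"}
    # Hypothetical per-job values; the real pipeline generates its own.
    job = {"JOB_NAME": "job1", "JOB_STDOUT": "run.out",
           "JOB_STDERR": "run.err", "JOB_SCRIPT": "run.sh"}
    print(render_submit(SUBMIT, settings, job))
```

Run as a script, this prints `qsub -S /bin/bash -sync y -V -q default -N job1 -o "run.out" -e "run.err" -pe smp 6 -l h_vmem=32768M "run.sh"`.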
    [job.step.da]
    NPROC=4
    # NPROC: number of processors per job
    
    MB=49152
    # MB: memory (in MB) allocated per job
    
    njobs=240
    # njobs: number of concurrently running jobs
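As the section names suggest, values in `[job.step.da]` override those in `[job.defaults]` for that step. The sketch below models that fallback with Python's `configparser` and then estimates the peak request if all `njobs` run at once, assuming `MB` is per job as the comment above says (with SGE's `h_vmem`, the limit may instead apply per slot, depending on the site). `effective_step_settings` and `peak_footprint` are hypothetical helpers.

```python
import configparser

CFG = """
[job.defaults]
NPROC = 6
MB = 32768
njobs = 32

[job.step.da]
NPROC=4
MB=49152
njobs=240
"""


def effective_step_settings(cfg_text, step):
    """Start from [job.defaults] and let [job.step.<step>] override matching keys."""
    parser = configparser.ConfigParser(interpolation=None)  # leave ${...} untouched
    parser.optionxform = str  # keep key case (NPROC, MB) instead of lower-casing
    parser.read_string(cfg_text)
    settings = dict(parser["job.defaults"])
    section = f"job.step.{step}"
    if parser.has_section(section):
        settings.update(parser[section])
    return settings


def peak_footprint(settings):
    """Cores and memory (MB) requested if all njobs run at once with MB per job."""
    njobs, nproc, mb = (int(settings[k]) for k in ("njobs", "NPROC", "MB"))
    return njobs * nproc, njobs * mb
```

For the da step this gives 240 x 4 = 960 cores and 240 x 49152 = 11,796,480 MB (11,520 GB at 1,024 MB per GB); a step with no override section falls back to the defaults, 192 cores and 1,048,576 MB.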
    

    Parameter reference: https://github.com/PacificBiosciences/pb-assembly
