2018-04-26 snakemake


Author: aldlhy | Published: 2018-04-26 19:23

    Snakemake is a workflow management system written in Python. It uses Python-style code to define rules that describe how to create output files from input files.

    Features

    *  Similar to GNU Make, you specify targets in terms of a pseudo-rule at the top.

    *  For each target and intermediate file, you create rules that define how they are created from input files.

    *  Snakemake determines the rule dependencies by matching file names.

    *  Input and output files can contain multiple named wildcards.

    *  Rules can either use shell commands, plain Python code or external Python or R scripts to create output files from input files.

    *  Snakemake workflows can be executed on **workstations**, **clusters**, **the grid**, and **in the cloud** without modification. Job scheduling can be constrained by arbitrary resources such as available CPU cores, memory, or GPUs.

    *  Snakemake can automatically deploy required software dependencies of a workflow using [Conda](https://conda.io/) or [Singularity](http://singularity.lbl.gov/).

    *  Snakemake can use Amazon S3, Google Storage, Dropbox, FTP, WebDAV, SFTP and iRODS to access input or output files and further access input files via HTTP and HTTPS.
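The idea that rule dependencies follow from matching file names can be illustrated with a small toy model in plain Python. This is only a sketch under simplifying assumptions, not Snakemake's implementation: the `RULES` table, its file names, and the `plan` function are hypothetical, and each rule is reduced to one output file with a list of inputs.

```python
# Toy model only: each hypothetical rule maps one output file to its input files.
RULES = {
    "counts.txt": ["aligned.bam"],
    "aligned.bam": ["reads.fastq", "genome.fa"],
}

def plan(target, rules, order=None):
    """Return the outputs that must be built for `target`, dependencies first."""
    if order is None:
        order = []
    if target not in rules:
        # No rule produces this file, so it is treated as an existing source file.
        return order
    for dep in rules[target]:
        # An input that is another rule's output becomes a dependency.
        plan(dep, rules, order)
    if target not in order:
        order.append(target)
    return order

build_order = plan("counts.txt", RULES)
print(build_order)  # ['aligned.bam', 'counts.txt']
```

Requesting `counts.txt` pulls in `aligned.bam` first because the input of one rule has the same name as the output of another; no dependency is declared explicitly.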

    Workflow

    In Snakemake, workflows are specified as Snakefiles. Inspired by GNU Make, a Snakefile contains rules that denote how to create output files from input files. Dependencies between rules are handled implicitly, by matching the filenames of input files against output files. Wildcards in these filenames make it possible to write general rules.
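To make the wildcard idea concrete, the following plain-Python sketch matches a requested filename against an output pattern and extracts the wildcard values. The helper names `pattern_to_regex` and `match_wildcards` are hypothetical, and the sketch assumes each wildcard appears once in the pattern and may match any non-empty string; Snakemake's own matching handles more cases.

```python
import re

def pattern_to_regex(pattern):
    """Turn '{name}' placeholders into named regex groups and escape the rest."""
    # re.split with a capturing group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)        # literal text
        else:
            regex += f"(?P<{part}>.+)"      # wildcard: any non-empty string
    return re.compile(regex)

def match_wildcards(pattern, filename):
    """Return the wildcard values if `filename` fits `pattern`, else None."""
    m = pattern_to_regex(pattern).fullmatch(filename)
    return m.groupdict() if m else None

print(match_wildcards("{sample}.txt", "Sample1.txt"))  # {'sample': 'Sample1'}
print(match_wildcards("{sample}.txt", "Sample1.csv"))  # None
```

Once the wildcard values are known from the requested output, the same values can be substituted into the rule's input patterns to find the files it needs.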

    Components

    input files --- the files a rule reads

    output files --- the files a rule creates

    rules --- describe how to create output files from input files

    Rules

    rule all

    rule my_rule

    Example

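    SAMPLES = ['Sample1', 'Sample2']

    # Pseudo-rule listing the final targets: one {sample}.txt per sample.
    rule all:
        input:
            expand('{sample}.txt', sample=SAMPLES)

    # Generic rule: the {sample} wildcard is inferred from the requested output file.
    rule quantify_genes:
        input:
            genome = 'genome.fa',
            r1 = 'fastq/{sample}.R1.fastq.gz',
            r2 = 'fastq/{sample}.R2.fastq.gz'
        output:
            '{sample}.txt'
        # Placeholder command: writes the input file names to the output file.
        shell:
            'echo {input.genome} {input.r1} {input.r2} > {output}'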
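The `expand` call in `rule all` turns the pattern and the sample list into concrete target filenames. A rough plain-Python equivalent is sketched below; `expand_pattern` is a hypothetical stand-in written for illustration, not Snakemake's `expand`, and it assumes every wildcard is given as a list of values that are combined as a Cartesian product.

```python
from itertools import product

def expand_pattern(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]

SAMPLES = ['Sample1', 'Sample2']

targets = expand_pattern('{sample}.txt', sample=SAMPLES)
print(targets)  # ['Sample1.txt', 'Sample2.txt']
```

With these two targets, the generic `quantify_genes` rule would be applied once per sample, reading `fastq/Sample1.R1.fastq.gz` and `fastq/Sample1.R2.fastq.gz` for the first and the corresponding `Sample2` files for the second.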

    Reference

    Snakemake — Snakemake 4.8.1+0.g7f3006d.dirty documentation

    Snakemake—a scalable bioinformatics workflow engine
