上一次讲述了如何利用snakemake进行RNA-seq分析流程的构建:
RNAseq分析流程:从测序Rawdata到Count输出,但是只针对单端测序数据。这次我又改进了一点,对双端测序结果也适用。
cat pairEndsnakemake
configfile: "pairedEnd.yaml"
rule all:
input:
expand("results/{sample}.count",sample=config["samples"])
rule bwa:
input:
fw=lambda wildcards: expand("samples/{sample}_R1.fq.gz", sample=wildcards.sample),
re=lambda wildcards: expand("samples/{sample}_R2.fq.gz", sample=wildcards.sample)
params:INDEX=config["annotation"]["reference"]
output:
"results/{sample}.sam"
shell:
"bowtie2 -x {params.INDEX} -1 {input.fw} -2 {input.re} -S {output}"
rule sam2bam:
input:
"results/{sample}.sam"
output:
"results/{sample}.bam"
shell:
"samtools view -Sb {input} > {output}"
rule samSort:
input:
"results/{sample}.bam"
output:
"sorted/{sample}_sort.bam"
shell:
"samtools sort {input} -o {output}"
rule bamIndex:
input:
"sorted/{sample}_sort.bam"
output:
"results/{sample}.bam.bai"
shell:
"samtools index {input} {output}"
rule bamCount:
input:
sample="sorted/{sample}_sort.bam",
annotation=config["annotation"]["gtf"]
output:
"results/{sample}.count"
shell:
"htseq-count -f bam -r name -s no {input.sample} {input.annotation} > {output}"
有需要代码指导的可以联系我,我也会持续更新,先暂时讲到这里。
网友评论