美文网首页
snakemake学习笔记002:hisat2+samtools

snakemake学习笔记002:hisat2+samtools

作者: 小明的数据分析笔记本 | 来源:发表于2022-04-14 00:38 被阅读0次

    今天的内容增加了config文件

    input_folder:
        "/home/myan/scratch/private/practice_data/RNAseq/chrX_data/"
    output_folder:
        "/home/myan/scratch/private/practice_data/RNAseq/snakemake.rnaseq/"
    
    index_folder:
        "/mnt/shared/scratch/myan/private/practice_data/RNAseq/chrX_data/indexes/"
    
    index_name:
        "chrX_tran"
    
    samples_folder:
        "/mnt/shared/scratch/myan/private/practice_data/RNAseq/chrX_data/samples/"
    
    gtf_folder:
        "/mnt/shared/scratch/myan/private/practice_data/RNAseq/chrX_data/genes/chrX.gtf"
    

    config文件主要用来指定文件的存贮路径

    snakemake文件的内容

    configfile: "config.yaml"
    
    import os
    import glob
    print(config)
    print(config['input_folder'])
    print(config['output_folder'])
    
    SRR,FRR = glob_wildcards(config['samples_folder']+"{srr}_chrX_{frr}.fastq.gz")
    
    rule all:
        input:
            expand(config["output_folder"] + "output.sam/{srr}.sam",srr=SRR),
            expand(config['output_folder'] + "output.bam/{srr}.bam",srr=SRR),
            expand(config['output_folder'] + "output.gtf/{srr}.gtf",srr=SRR)
    
    rule hisat2:
        input:
            read01 = config['samples_folder'] + "{srr}_chrX_1.fastq.gz",
            read02 = config['samples_folder'] + "{srr}_chrX_2.fastq.gz",
    
        output:
            sam = config['output_folder'] + "output.sam/{srr}.sam"
        threads:
            8
        params:
            index_file = config['index_folder']+config['index_name']
        shell:
            """
            hisat2 -p {threads} --dta -x {params.index_file} -1 {input.read01} -2 {input.read02} -S {output.sam}
            """
    
    rule samtools:
        input:
            sam = rules.hisat2.output.sam
        output:
            bam = config['output_folder'] + "output.bam/{srr}.bam"
        threads:
            8
        shell:
            """
            samtools sort -@ {threads} -o {output.bam} {input.sam}
            """
    
    rule stringtie:
        input:
            gtf = config['gtf_folder'],
            bam = rules.samtools.output.bam
        output:
            gtf = config['output_folder'] + "output.gtf/{srr}.gtf"
        params:
            l = "{srr}"
        threads:
            8
        shell:
            """
            stringtie -p {threads} -G {input.gtf} -o {output.gtf} -l {params.l} {input.bam}
            """
    

    后面转录本定量的步骤还没有写完,好像还可以把差异表达分析的脚本嵌入进来

    未完待续

    示例数据用到的是论文

    Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie, and Ballgown

    中的数据

    相关文章

      网友评论

          本文标题:snakemake学习笔记002:hisat2+samtools

          本文链接:https://www.haomeiwen.com/subject/upwzsrtx.html