美文网首页snakemake与RNAseq
windows平台通过虚拟机安装ubuntu

windows平台通过虚拟机安装ubuntu

作者: dming1024 | 来源:发表于2019-09-22 11:13 被阅读0次
    1. 下载ubuntu桌面版本
      https://ubuntu.com/download
    2. VMware for windows 下载
      https://my.vmware.com/web/vmware/details?downloadGroup=WKST-1510-WIN&productId=799&rPId=33369
      官方需要注册,所以我在这个完整上找了个不是最新版本的进行下载:
      https://www.cr173.com/soft/68480.html
    3. 安装步骤主要参考以下这篇教程
      如何在Windows环境下安装Linux系统虚拟机
      捕获.PNG
      安装过程

    登录界面

    安装git

    sudo apt-get install git  
    

    安装miniconda

    wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash /Miniconda3-latest-Linux-x86_64.sh
    #添加conda channels
    conda config --add channels bioconda
    conda config --add channels conda-forge
    #显示安装的channels
    conda config --get channels
    

    创建新的conda环境变量

    #利用git clone命令将之前的snakemake pipline克隆到本地服务器
    git clone https://github.com/fanqiang1024/RNA-seq_snakemake.git ./
    Cloning into '.'...
    remote: Enumerating objects: 46, done.
    remote: Counting objects: 100% (46/46), done.
    remote: Compressing objects: 100% (36/36), done.
    Unpacking objects:  60% (28/46)
    
    remote: Total 46 (delta 17), reused 29 (delta 8), pack-reused 0
    Unpacking objects: 100% (46/46), done.
    
    ls
    config.yaml  dag.svg  merge.R  pairEndSnakemake  
    README.md  snakefileR  snakemake.yaml  v3.11.0.tar.bz2
    

    剩下的可以根据之前的教程进行snakemake分析环境的创建
    conda环境变量设置与转移

    利用snakemake构建分析流程
    参考文章:
    Writing a RNA-Seq workflow with snakemake

    1.下载数据

    cd samples/raw
    for i in control_R1 control_R2 treated_R1 treated_R2;
    do 
    wget --no-clobber http://pedagogix-tagc.univ-mrs.fr/courses/data/ngs/abd/${i}.fq.gz 
    2>/dev/null;
    done
    

    2.构建19号染色体的参考基因组索引

    mkdir -p projects/indexes/bowtie2/mm10 # Create a directory to store the index
    cd projects/indexes/bowtie2/mm10       # go to projects/indexes/bowtie2/mm10
    # Download chr19 sequence (mm10 version)
    wget --no-clobber http://hgdownload.soe.ucsc.edu/goldenPath/mm10/chromosomes/chr19.fa.gz 2> chr19.fa_wget.log
    gunzip chr19.fa.gz # uncompress the file
    # to get help about bowtie-build type:
    # bowtie2-build -h
    # To uncomment
    bowtie2-build chr19.fa chr19 &> chr19.fa_bowtie2-build.log
    

    3.下载gtf文件

    cd projects
    mkdir -p annotations/gtf
    cd annotations/gtf
    wget ftp://ftp.ensembl.org/pub/release-83/gtf/mus_musculus/Mus_musculus.GRCm38.83.chr.gtf.gz | gunzip -c | grep "^19" | sed 's/^19/chr19/' > GRCm38.83.chr19.gtf
    

    4.在 UCSC web site 下载染色体信息
    Clade: Mammal.
    Genome: Mouse.
    Assembly: mm10.
    Group: All Tables.
    Table: chromInfo.
    Select all fields from selected table for output. Use the following string “chromInfo_mm10.txt” as output file name.
    Click on get output
    Create a directory projects/annotations/txt and place the downloaded file in that directory.

    5. 创建Snakefile文件

    BASE_DIR = "/Users/projects"
    INDEX = BASE_DIR + "/indexes/bowtie2/mm10/chr19"
    GTF   = BASE_DIR + "/annotations/gtf/GRCm38.83.chr19.gtf"
    CHR   = BASE_DIR + "/annotations/txt/chromInfo_mm10.txt"
    FASTA = BASE_DIR + "/indexes/bowtie2/mm10/chr19.fa"
    
    #录入样本信息
    SAMPLES, = glob_wildcards("samples/raw/{smp}_R1.fq.gz")
    NB_SAMPLES = len(SAMPLES)
    
    for smp in SAMPLES:
      message("Sample " + smp + " will be processed")
    
    #第一个rule,对rawdata进行trimming
    rule final: 
      input: expand("samples/trimmed/{smp}_R1_t.fq", smp=SAMPLES)
    
    rule trimming:
      input:  fwd="samples/raw/{smp}_R1.fq.gz", rev="samples/raw/{smp}_R2.fq.gz"
      output: fwd="samples/trimmed/{smp}_R1_t.fq", 
              rev="samples/trimmed/{smp}_R2_t.fq", 
              single="samples/trimmed/{smp}_R1_singletons.fq"
      message: """--- Trimming."""
      shell: """
            sickle pe -f {input.fwd} -r {input.rev}  -l 25 -q 20 -t sanger  -o {output.fwd} -p {output.rev} -s {output.single} &> {input.fwd}.log                                                                               
      """
    

    6. 增加质控rule

    rule final: 
      input: expand("samples/fastqc/{smp}/{smp}_R1_t.fq_fastqc.zip", smp=SAMPLES)
    
    rule trimming:
      input:  fwd="samples/raw/{smp}_R1.fq.gz", rev="samples/raw/{smp}_R2.fq.gz"
      output: fwd="samples/trimmed/{smp}_R1_t.fq", 
              rev="samples/trimmed/{smp}_R2_t.fq", 
              single="samples/trimmed/{smp}_R1_singletons.fq"
      message: """--- Trimming reads."""
      shell: """                                                                                                                                                                                                                                                                                                                                                            
            sickle pe -f {input.fwd} -r {input.rev}  -l 40 -q 20 -t sanger  -o {output.fwd} -p {output.rev} -s {output.single} &> {input.fwd}.log                                                                               
      """
    
    rule fastqc:
            input:  fwd="samples/trimmed/{smp}_R1_t.fq", 
                    rev="samples/trimmed/{smp}_R2_t.fq"
            output: fwd="samples/fastqc/{smp}/{smp}_R1_t.fq_fastqc.zip", rev="samples/fastqc/{smp}/{smp}_R2_t.fq_fastqc.zip"
            message: """--- Quality check of raw data with Fastqc."""
            shell: """                                                                                                                                                                                        
                  fastqc --outdir  samples/fastqc/{wildcards.smp} --extract  -f fastq {input.fwd} {input.rev}                                                                                                                                                                                                                                                                                    
         """
    

    7. 对数据进行mapping

    rule tophat:
            input:fwd="samples/trimmed/{smp}_R1_t.fq", 
                  rev="samples/trimmed/{smp}_R2_t.fq"
            params: gtf=GTF, index=INDEX
            output: "samples/bam/{smp}.bam"
            shell: """
                    mkdir -p samples/tophat/{wildcards.sample}
                    tophat2                                         \
                            -o samples/tophat/{wildcards.sample}    \
                            -g 1                                    \
                            --library-type fr-firststrand           \
                            -G {params.gtf}                          \
                            -x 1                                    \
                            -p 5                                    \
                            {params.index}                             \
                            {input.fwd} {input.rev} &> samples/tophat/{wildcards.sample}/run_tophat.log
                    cd samples/tophat/{wildcards.sample}
                    mv accepted_hits.bam ../../bam/{wildcards.smp}.bam
                    """
    

    8.寻找novel转录本

    rule cufflinks:
            input: bam="samples/bam/{smp}.bam"
            output: gtf="samples/cufflinks/{smp}/transcripts.gtf"
            params:  gtf=GTF
            message: "--- Searching novel transcript with cufflinks."
            shell: """
              cufflinks -g {params.gtf} -p 5 --library-type fr-firststrand  -o samples/cufflinks/{wildcards.smp}  {input.bam} &> {output}.log
            """
    

    9. 将novel转录本与已知的进行比对
    10. 合并novel和已知的转录本和基因名
    11. 利用featureCounts进行定量
    也可以使用htseq-count进行定量

    rule quantification_with_featureCounts:
            input: novel="samples/new_annotation/all_transcripts.gtf", bam=expand("samples/bam/{smp}.bam", smp=SAMPLES)
            output: "results/counts/gene_counts.txt",  "results/counts/gene_counts_mini.txt"
            shell: """
            featureCounts -p -s 2 -T 15 -t exon -g gene_id -a {input.novel} -o {output[0]} {input.bam} &> {output[0]}.log
            cut -f 1,7- {output[0]}| awk 'NR > 1' | awk '{{gsub("samples/bam/","",$0); print}}'  > {output[1]}
            """
    

    12. R语言作图展示

    rule diagnostic_plot:
            input: "results/counts/gene_counts_mini.txt"
            output: "results/diagnostic_plot/diagnostic.pdf"
            run: R("""
                dir.create("results/diagnostic_plot")
                data <- read.table("{input}", 
                                    sep="\t", 
                                    header=T, 
                                    row.names=1)
                data <- data[rowSums(data) > 0, ]
                data <- log2(data + 1)
                pdf("{output}")
                dev.null <- apply(data, 2, hist, border="white", col="blue")
                boxplot(data, color="blue", pch=16)
                pairs(data, pch=".", col="blue")
                dev.off()
                cat("etc...")
          
            """)
    

    相关文章

      网友评论

        本文标题:windows平台通过虚拟机安装ubuntu

        本文链接:https://www.haomeiwen.com/subject/ljyayctx.html