RNA-seq analysis

Author: zhoujj2013 | Published: 2019-07-31 16:33

Download dataset

Navigate to https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE102116

Copy the sample (GSM) list from the page.

Open a new file in the terminal to paste it into:

cd yourdir
vim gsm.lst

Right-click to paste the copied list into vim.

Save and quit:

:wq
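
Before moving on, it can help to confirm that the paste captured the accessions you expected. The sketch below only lists the unique GSM accessions found in a file; `list_gsm` is a hypothetical helper name, and the exact layout that get_srr_from_gsm.pl expects in gsm.lst is not shown in this post, so treat this purely as a sanity check.

```shell
# List the unique GSM accessions that appear anywhere in the file given as $1.
# (Hypothetical helper; it does not change the file.)
list_gsm() {
    grep -oE 'GSM[0-9]+' "$1" | sort -u
}

# Example: list_gsm gsm.lst
```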

Run the following command to generate the list of SRR accessions to download:

zhoujj 15:55:27 ~/project/06ChenProject/data_GSE102116
$perl /home/zhoujj/github/jjUtil/dl/get_srr_from_gsm.pl gsm.lst > srr.lst

Inspect the generated list:

zhoujj 16:00:09 ~/project/06ChenProject/data_GSE102116
$cat srr.lst
GSM2724132      WT_rep1_Day0    SRR5886648
GSM2724133      WT_rep2_Day0    SRR5886652
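
The later commands read column 3 of srr.lst with `cut -f 3`, so the file has to be tab-separated with the SRR accession in the third column. A minimal sketch of a format check, assuming exactly three columns per line as in the listing above; `check_srr_list` is a hypothetical helper name.

```shell
# Return non-zero (and report the offending lines on stderr) if any line of $1
# is not "GSM<digits> <TAB> sample name <TAB> SRR<digits>".
check_srr_list() {
    awk -F '\t' '
        NF != 3 || $1 !~ /^GSM[0-9]+$/ || $3 !~ /^SRR[0-9]+$/ {
            printf "line %d looks malformed: %s\n", NR, $0 > "/dev/stderr"
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Example: check_srr_list srr.lst && echo "srr.lst looks fine"
```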

Download the SRA files:

zhoujj 16:01:24 ~/project/06ChenProject/data_GSE102116
$cut -f 3 srr.lst | while read line; do echo prefetch $line;done > prefetch.sh;
zhoujj 16:01:33 ~/project/06ChenProject/data_GSE102116
$sh prefetch.sh
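
If a download is interrupted, the script can be regenerated for only the accessions that are still missing. The following is a sketch under the assumption that finished downloads appear as `<accession>.sra` in a single directory (here `~/ncbi/public/sra`, as used in this post); `pending_prefetch` is a hypothetical helper name.

```shell
# Print one "prefetch <accession>" line for every accession in column 3 of the
# list ($1) that has no matching .sra file in the directory $2 yet.
pending_prefetch() {
    cut -f 3 "$1" | while read -r acc; do
        [ -n "$acc" ] || continue                    # skip empty lines
        [ -e "$2/$acc.sra" ] || echo "prefetch $acc" # only missing files
    done
}

# Example: pending_prefetch srr.lst ~/ncbi/public/sra > prefetch.sh
```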

When the download finishes, locate the downloaded files:

zhoujj 16:01:33 ~/project/06ChenProject/data_GSE102116
$ls ~/ncbi/public/sra/
zhoujj 16:01:33 ~/project/06ChenProject/data_GSE102116
$cut -f 3 srr.lst | while read line; do mv  ~/ncbi/public/sra/$line.sra .;done;

The SRA files are now downloaded.

Extract the reads from the SRA file:

zhoujj 16:07:15 ~/project/06ChenProject/data_GSE102116
$ls
GSM2724134.html  gsm.lst  prefetch.sh  SRR6880514.sra  srr.lst  SRX3052556.html  work.sh
zhoujj 16:07:17 ~/project/06ChenProject/data_GSE102116
$fastq-dump --split-files ./SRR6880514.sra
Read 14772104 spots for ./SRR6880514.sra
Written 14772104 spots for ./SRR6880514.sra
$ls
GSM2724134.html  gsm.lst  prefetch.sh  SRR6880514_1.fastq  SRR6880514_2.fastq  SRR6880514.sra  srr.lst  SRX3052556.html  work.sh

SRR6880514_1.fastq contains read 1 of each pair.
SRR6880514_2.fastq contains read 2 of each pair.
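
With several samples, the same conversion can be scripted in the style used for prefetch.sh above. This is a sketch that assumes every `.sra` file sits in the current directory and is named after column 3 of srr.lst; `make_fastq_dump_script` is a hypothetical helper name.

```shell
# Print one "fastq-dump --split-files" command per accession in column 3 of $1.
make_fastq_dump_script() {
    cut -f 3 "$1" | while read -r acc; do
        [ -n "$acc" ] || continue   # skip empty lines
        echo "fastq-dump --split-files ./$acc.sra"
    done
}

# Example: make_fastq_dump_script srr.lst > fastq_dump.sh && sh fastq_dump.sh
```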

Run RNA-seq (details omitted here)

  1. Preparation:
zhoujj 16:10:52 ~/project/06ChenProject/data_GSE102116
$mkdir rnaseq
zhoujj 16:11:18 ~/project/06ChenProject/data_GSE102116
$cd rnaseq/
zhoujj 16:12:32 ~/project/06ChenProject/data_GSE102116/rnaseq
$cp /home/zhoujj/gitee/ngskit/rnaseq_rmdup_star_cufflinks/bin/config.txt .
zhoujj 16:12:58 ~/project/06ChenProject/data_GSE102116/rnaseq
$vim config.txt

Check the read length:

zhoujj 16:16:30 ~/project/06ChenProject/data_GSE102116/rnaseq
$head ../SRR6880514_2.fastq
@SRR6880514.1 R0209720:515:C7WURACXX:1:1101:2358:2187 length=51
TGGTGAATTTCTCTGATCTAGCATGATAAGTAGAAACATTAAACTGTGATA
+SRR6880514.1 R0209720:515:C7WURACXX:1:1101:2358:2187 length=51
@CCDFFFFHHHHHJJJJJJJJJJJJHIJJJJJJIJJJJJJJJJJJJIIGIG
@SRR6880514.2 R0209720:515:C7WURACXX:1:1101:3400:2240 length=51
TCTCCAGGGCATGTCAGAGATGTTTGCGGCAGCCCCTCCCATCACAGGCCT
+SRR6880514.2 R0209720:515:C7WURACXX:1:1101:3400:2240 length=51
C@CFFFFFFGHDHHIIIGAFCHEEDHI<FHHH1DDFEGFHI<FHIEIIFI<
@SRR6880514.3 R0209720:515:C7WURACXX:1:1101:4539:2113 length=51
TCTTTTTACTTAGGATTGTCTTGGCTATATGGCTCTTTTTTGGTTTCATAT

The read length is 51.
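
Looking at the first few records only shows a handful of reads. A small sketch that tabulates the lengths of all reads in a file, assuming the plain four-line FASTQ layout shown above (sequence on every second line of each record); `read_length_hist` is a hypothetical helper name.

```shell
# Print "length<TAB>count" for every read length found in the FASTQ file $1,
# sorted by length. Assumes exactly four lines per record.
read_length_hist() {
    awk 'NR % 4 == 2 { count[length($0)]++ }
         END { for (len in count) print len "\t" count[len] }' "$1" | sort -n
}

# Example: read_length_hist ../SRR6880514_2.fastq
```

If more than one length shows up, use the longest for READLEN only after checking how the pipeline interprets that parameter.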

So check the parameters in config.txt:

OUTDIR  ./
SAMPLE  ./samples.lst

# parameter
READLEN 51 # check this parameter
MINLEN  32 # check this parameter; must be >= 32
THREAD  24

#STANDTYPE      FR/FF/RF/RR
# pro
BIN     /home/zhoujj/gitee/ngskit/rnaseq_rmdup_star_cufflinks/bin/
FASTQC  /home/zhoujj/software/FastQC/fastqc
STAR    /home/zhoujj/software/STAR/bin/Linux_x86_64_static/STAR
CUFFLINKS       /home/zhoujj/software/cufflinks-2.2.1.Linux_x86_64/cufflinks
SAMTOOLS        /usr/bin/samtools
HOMER   /home/zhoujj/software/homer/bin

# for STAR
GTF     /home/zhoujj/data/hg19/hg19/refGene.gtf
SPE     human
INDEX   /home/zhoujj/data/hg19/star_index
CHROMSIZE       /home/zhoujj/data/hg19/hg19.chrom.sizes
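
Most entries in config.txt are absolute paths that are specific to the author's machine, so they need to be adapted before the pipeline will run elsewhere. A minimal sketch that reports entries whose path does not exist, assuming whitespace-separated `KEY value` lines and paths without spaces; `check_config_paths` is a hypothetical helper name and is not part of the pipeline.

```shell
# For every non-comment line of $1 whose second field is an absolute path,
# print "KEY: path not found" if that path does not exist.
check_config_paths() {
    awk '!/^[ \t]*#/ && NF >= 2 && $2 ~ /^\// { print $1, $2 }' "$1" |
    while read -r key path; do
        [ -e "$path" ] || echo "$key: $path not found"
    done
}

# Example: check_config_paths config.txt
```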

  2. Prepare samples.lst
    Find read files:
zhoujj 16:18:22 ~/project/06ChenProject/data_GSE102116/rnaseq
$ll /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_1.fastq
-rw-rw-r-- 1 zhoujj zhoujj 3655968420 Jul 31 16:09 /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_1.fastq
zhoujj 16:18:32 ~/project/06ChenProject/data_GSE102116/rnaseq
$ll /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_2.fastq
-rw-rw-r-- 1 zhoujj zhoujj 3655968420 Jul 31 16:09 /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_2.fastq

Contents of samples.lst:

WT_rep1_Day0    /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_1.fastq    /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_2.fastq
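
For more than a few samples, samples.lst can be derived from srr.lst instead of being typed by hand. This sketch assumes the three-column, tab-separated layout shown above (sample name, read 1 path, read 2 path), one SRR accession per sample, and FASTQ files named `<SRR>_1.fastq` and `<SRR>_2.fastq` in one directory; `make_samples_list` is a hypothetical helper name.

```shell
# $1 = srr.lst (GSM, sample name, SRR; tab-separated)
# $2 = absolute path of the directory holding the FASTQ files
make_samples_list() {
    awk -F '\t' -v dir="$2" 'NF >= 3 {
        printf "%s\t%s/%s_1.fastq\t%s/%s_2.fastq\n", $2, dir, $3, dir, $3
    }' "$1"
}

# Example: make_samples_list ../srr.lst "$(cd .. && pwd)" > samples.lst
```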

Recheck files:

zhoujj 16:20:52 ~/project/06ChenProject/data_GSE102116/rnaseq
$ll
total 16
drwxrwxr-x 2 zhoujj zhoujj 4096 Jul 31 16:20 ./
drwxrwxr-x 3 zhoujj zhoujj 4096 Jul 31 16:16 ../
-rw-rw-r-- 1 zhoujj zhoujj  989 Jul 31 16:12 config.txt
-rw-rw-r-- 1 zhoujj zhoujj  151 Jul 31 16:20 samples.lst

  3. Create the makefile and run the RNA-seq pipeline

Create makefile:

zhoujj 16:20:54 ~/project/06ChenProject/data_GSE102116/rnaseq
$perl /home/zhoujj/gitee/ngskit/rnaseq_rmdup_star_cufflinks/bin/rnaseq.pl config.txt
OUTDIR  ./
SAMPLE  ./samples.lst
READLEN 51
MINLEN  32
THREAD  24
BIN     /home/zhoujj/gitee/ngskit/rnaseq_rmdup_star_cufflinks/bin/
FASTQC  /home/zhoujj/software/FastQC/fastqc
STAR    /home/zhoujj/software/STAR/bin/Linux_x86_64_static/STAR
CUFFLINKS       /home/zhoujj/software/cufflinks-2.2.1.Linux_x86_64/cufflinks
SAMTOOLS        /usr/bin/samtools
HOMER   /home/zhoujj/software/homer/bin
GTF     /home/zhoujj/data/hg19/hg19/refGene.gtf
SPE     human
INDEX   /home/zhoujj/data/hg19/star_index
CHROMSIZE       /home/zhoujj/data/hg19/hg19.chrom.sizes

Run RNA-seq pipeline:

zhoujj 16:23:58 ~/project/06ChenProject/data_GSE102116/rnaseq
$cut -f 1 samples.lst | while read line; do echo "cd $line && make && cd -";done > run.sh;
zhoujj 16:25:29 ~/project/06ChenProject/data_GSE102116/rnaseq
$sh run.sh
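
One caveat with `cd $line && make && cd -`: if `make` fails for a sample, the `cd -` is skipped and the following lines of run.sh start from the wrong directory. A possible variant runs each sample in a subshell so the working directory is always restored, and reports failures on stderr. It is a sketch that assumes sample names without quote characters; `make_run_script` is a hypothetical helper name.

```shell
# Print one line per sample (column 1 of $1) that runs make inside the sample
# directory in a subshell and prints a FAILED message if make does not succeed.
make_run_script() {
    cut -f 1 "$1" | while read -r sample; do
        [ -n "$sample" ] || continue   # skip empty lines
        echo "(cd '$sample' && make) || echo 'FAILED: $sample' >&2"
    done
}

# Example: make_run_script samples.lst > run.sh && sh run.sh
```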

Check results

Check statistics:

zhoujj 16:27:02 ~/project/06ChenProject/data_GSE102116/rnaseq
$cat samples.lst
WT_rep1_Day0    /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_1.fastq    /home/zhoujj/project/06ChenProject/data_GSE102116/SRR6880514_2.fastq
zhoujj 16:27:09 ~/project/06ChenProject/data_GSE102116/rnaseq
$perl /home/zhoujj/gitee/ngskit/rnaseq_rmdup_star_cufflinks/bin/getMatrics.pl WT_rep1_Day0 > stat.txt

Combine expression profile from multiple samples:

zhoujj 16:27:09 ~/project/06ChenProject/data_GSE102116/rnaseq
$python /home/zhoujj/gitee/ngskit/rnaseq_rmdup_star_cufflinks/bin/combine_cuff_expr.py WT_rep1_Day0/02quantification/genes.fpkm_tracking:WT_rep1_Day0 WT_rep1_Day3/02quantification/genes.fpkm_tracking:WT_rep1_Day3 > gene.expr
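
The `path:label` arguments follow a fixed pattern, so they can be generated from samples.lst rather than typed out. A minimal sketch, assuming every sample directory contains `02quantification/genes.fpkm_tracking` as in the command above; `fpkm_args` is a hypothetical helper name.

```shell
# Print one "<sample>/02quantification/genes.fpkm_tracking:<sample>" argument
# per sample name found in column 1 of $1.
fpkm_args() {
    cut -f 1 "$1" | while read -r sample; do
        [ -n "$sample" ] || continue   # skip empty lines
        printf '%s/02quantification/genes.fpkm_tracking:%s\n' "$sample" "$sample"
    done
}

# Example: pass $(fpkm_args samples.lst) as the arguments to combine_cuff_expr.py
```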

Finished.
