1. Search for GSEXXX
2. Download the data with prefetch
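- Installing and using prefetch
prefetch -h  # if the help text is displayed, the installation succeeded
# To download data such as an SRR file, just pass the accession ID and specify the output directory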
prefetch SRRxxxxxxx -O PATH
- Installing Aspera
wget http://download.asperasoft.com/download/sw/connect/3.7.4/aspera-connect-3.7.4.147727-linux-64.tar.gz
tar zxvf aspera-connect-3.7.4.147727-linux-64.tar.gz
# Install
bash aspera-connect-3.7.4.147727-linux-64.sh
# Then cd to your home directory and check whether a .aspera folder exists; if it does, the installation succeeded
cd && ls -a
# Add the Aspera binaries to PATH and reload ~/.bashrc so the change takes effect
echo 'export PATH=~/.aspera/connect/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
# Finally, check that ascp works
ascp --help
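To confirm in one go that the tools installed above are on PATH, a small sketch like the following can help. check_tools is a hypothetical helper name; the tool names you pass to it are the ones used in this post.

```shell
#!/usr/bin/env bash
# Report which of the given command-line tools can be found on PATH.
# check_tools is a hypothetical helper, not part of sra-tools or Aspera.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of missing tools (0 means everything was found).
  return "$missing"
}
```

Usage: run check_tools prefetch fastq-dump ascp after opening a new shell (or after source ~/.bashrc); any line starting with "missing:" points to a step that did not take effect.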
3. Download the data
wkd=/home/project/single-cell/MCC
mkdir -p "$wkd/raw" && cd "$wkd/raw"   # create the raw-data directory if it does not exist yet
# for patient 2586-4
cat >SRR_Acc_List-2586-4.txt
SRR7722937
SRR7722938
SRR7722939
SRR7722940
SRR7722941
SRR7722942
while read -r i
do prefetch "$i" -O "$(pwd)" && echo "** ${i}.sra done **"
done < SRR_Acc_List-2586-4.txt
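Accession lists edited on Windows or copied from a web page often carry \r characters, stray spaces or blank lines, which make prefetch fail on an otherwise valid ID. Below is a sketch of the same loop that cleans each line first. download_list and the DRY_RUN switch (print the commands instead of running them) are hypothetical names; only prefetch <accession> -O <dir> comes from the post.

```shell
#!/usr/bin/env bash
# Download every accession in a list file with prefetch.
# download_list and DRY_RUN are hypothetical names used for illustration.
download_list() {
  local list=$1 outdir=${2:-$PWD} acc
  # "|| [ -n "$acc" ]" also handles a last line without a trailing newline
  while IFS= read -r acc || [ -n "$acc" ]; do
    acc=${acc//[[:space:]]/}      # remove whitespace, including \r from Windows line endings
    [ -z "$acc" ] && continue     # skip blank lines
    if [ -n "${DRY_RUN:-}" ]; then
      echo "prefetch $acc -O $outdir"
    else
      prefetch "$acc" -O "$outdir" && echo "** ${acc}.sra done **"
    fi
  done < "$list"
}
```

Usage: DRY_RUN=1 download_list SRR_Acc_List-2586-4.txt "$(pwd)" prints the commands for review; drop DRY_RUN=1 to actually download.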
The accession list can be downloaded from the SRA Run Selector, which is linked from the GEO series page.
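If you downloaded the full run table instead of the accession list, the run IDs can be pulled out of it. This is a sketch under assumptions: the table is taken to be comma-separated with a header row containing a column named "Run", and fields are assumed not to contain quoted commas; check your own file's header first. runs_from_table is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the "Run" column of a comma-separated run table, one ID per line.
# Assumes a header row and no quoted fields containing commas.
# runs_from_table is a hypothetical helper name.
runs_from_table() {
  awk -F',' '
    NR == 1 { sub(/\r$/, ""); for (i = 1; i <= NF; i++) if ($i == "Run") col = i; next }
    col     { print $col }
  ' "$1"
}
```

Usage: runs_from_table SraRunTable.txt > SRR_Acc_List.txt produces a file in the same one-ID-per-line shape as the list used above.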
If the authors deposited the data at EBI instead,
see https://www.jianshu.com/p/9040b7573380 for details.
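For data hosted at EBI/ENA, the FTP paths of the FASTQ files (for example the fastq_ftp field of the run's file report, where paired files are separated by ";") can be turned into ascp commands. This sketch rests on assumptions not taken from this post: the era-fasp@fasp.sra.ebi.ac.uk endpoint, port 33001, the 300m rate limit and the key file under ~/.aspera/connect/etc/; verify them against ENA's current download documentation. ena_ascp_cmds is a hypothetical helper name, and it only prints the commands.

```shell
#!/usr/bin/env bash
# Turn ENA FTP paths into ascp commands (printed, not executed).
# Endpoint, port, rate limit and key location are assumptions; check ENA's docs.
# ena_ascp_cmds is a hypothetical helper name.
ENA_KEY="$HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh"

ena_ascp_cmds() {
  local field=$1 dest=${2:-.} path
  local -a paths
  IFS=';' read -r -a paths <<< "$field"   # paired files are separated by ';'
  for path in "${paths[@]}"; do
    path=${path#ftp://}                   # accept paths with or without ftp://
    # ftp.sra.ebi.ac.uk/vol1/... -> era-fasp@fasp.sra.ebi.ac.uk:/vol1/...
    echo "ascp -QT -l 300m -P33001 -i $ENA_KEY era-fasp@fasp.sra.ebi.ac.uk:/${path#ftp.sra.ebi.ac.uk/} $dest"
  done
}
```

Printing the commands rather than running them lets you inspect them first; pipe the output to bash once they look right.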
Understanding the raw sequencing read files:
- I1: library barcode (sample index), the smallest file.
  Used to multiplex several samples on one sequencing lane (8 bp).
- R1: cell barcode + UMI.
  The cell barcode (16 bp) identifies the cell the read came from; the UMI identifies reads that arise during PCR amplification.
- R2: sequencing read, the largest file.
  Used to identify the gene the read came from (91-98 bp).
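To see the R1 layout described above in your own data, a sequence can be split into the 16 bp cell barcode and the UMI that follows it. The UMI length is an assumption that depends on the chemistry (for example 10 bp for 10x v2 and 12 bp for v3), so it is an optional argument; without it, everything after the barcode is returned. split_r1 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Split an R1 sequence into cell barcode (first 16 bp) and UMI.
# Optional second argument: UMI length (chemistry-dependent, an assumption).
# split_r1 is a hypothetical helper name.
split_r1() {
  local seq=$1 umi_len=${2:-} umi
  if [ "${#seq}" -lt 16 ]; then
    echo "read shorter than 16 bp: $seq" >&2
    return 1
  fi
  if [ -n "$umi_len" ]; then
    umi=${seq:16:$umi_len}
  else
    umi=${seq:16}
  fi
  printf 'barcode=%s\tumi=%s\n' "${seq:0:16}" "$umi"
}
```

Usage on a FASTQ file (the sequence is line 2 of every 4-line record): zcat sample_R1.fastq.gz | awk 'NR % 4 == 2' | head -n 3 | while read -r s; do split_r1 "$s"; done. The file name here is hypothetical.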
Convert SRA files to FASTQ
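# $i is an SRR accession: run this inside a loop over the accession list, as in the download step
time fastq-dump --gzip --split-3 -A "$i" "${i}.sra" && echo "** ${i}.sra to fastq done **"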
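Here is a sketch of that conversion wrapped in a loop over the whole list. As an assumption to verify locally: depending on the sra-tools version, prefetch saves either <dir>/<acc>.sra or <dir>/<acc>/<acc>.sra, so both locations are checked. sra_path, convert_list and DRY_RUN are hypothetical names; the fastq-dump options are the ones used above.

```shell
#!/usr/bin/env bash
# Convert every downloaded .sra file in a list to gzipped FASTQ.
# sra_path, convert_list and DRY_RUN are hypothetical names.

# Print the path of an accession's .sra file, trying both layouts.
sra_path() {
  local dir=$1 acc=$2
  if [ -f "$dir/$acc.sra" ]; then
    echo "$dir/$acc.sra"
  elif [ -f "$dir/$acc/$acc.sra" ]; then
    echo "$dir/$acc/$acc.sra"
  else
    return 1
  fi
}

convert_list() {
  local list=$1 dir=${2:-$PWD} acc sra
  while IFS= read -r acc || [ -n "$acc" ]; do
    acc=${acc//[[:space:]]/}      # remove whitespace, including \r
    [ -z "$acc" ] && continue     # skip blank lines
    if ! sra=$(sra_path "$dir" "$acc"); then
      echo "no .sra file found for $acc" >&2
      continue
    fi
    if [ -n "${DRY_RUN:-}" ]; then
      echo "fastq-dump --gzip --split-3 -A $acc $sra"
    else
      fastq-dump --gzip --split-3 -A "$acc" "$sra" && echo "** ${acc}.sra to fastq done **"
    fi
  done < "$list"
}
```

Usage: DRY_RUN=1 convert_list SRR_Acc_List-2586-4.txt "$(pwd)" to review the commands, then run it again without DRY_RUN=1 (prefix with time if you want the timing).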
About the cat command:
to finish typing input (e.g. the accession list after cat >file), press Ctrl + D.
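A non-interactive alternative to typing the list and pressing Ctrl + D is a here-document, which also works inside scripts. The accessions below are the ones listed for patient 2586-4; quoting 'EOF' keeps the shell from expanding anything inside the list.

```shell
#!/usr/bin/env bash
# Write the accession list for patient 2586-4 without interactive input.
cat > SRR_Acc_List-2586-4.txt <<'EOF'
SRR7722937
SRR7722938
SRR7722939
SRR7722940
SRR7722941
SRR7722942
EOF
```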