1 Software installation
https://www.jianshu.com/p/eb89ab4af035
Software to install on Linux: fastqc, fastp, hisat2, samtools, htseq
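Once everything is installed, it can be worth confirming that each executable is actually on PATH before starting. The sketch below assumes the usual executable names (htseq provides its counter as htseq-count); missing_tools is a helper name made up for this example.

```shell
# Print the name of every tool that cannot be found on PATH.
# missing_tools is a hypothetical helper, not part of any of these packages.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names assumed from the software list above.
missing_tools fastqc fastp hisat2 hisat2-build samtools htseq-count
```

No output means every tool was found; any name printed still needs to be installed or added to PATH.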
2 Download the genome sequence and genome annotation file
Aspergillus niger N402 genome:
Ensembl Fungi
or NCBI:
Aspergillus niger (ID 429) - Genome - NCBI (nih.gov)
wget -c https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/248/155/GCA_900248155.1_Aniger_ATCC_64974_N402/GCA_900248155.1_Aniger_ATCC_64974_N402_genomic.fna.gz
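Build the index files
# The download is gzip-compressed; decompress it to get the .fna file used below
gunzip GCA_900248155.1_Aniger_ATCC_64974_N402_genomic.fna.gz
hisat2-build -p 3 GCA_900248155.1_Aniger_ATCC_64974_N402_genomic.fna genome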
Download the genome annotation file
wget -c https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/900/248/155/GCA_900248155.1_Aniger_ATCC_64974_N402/GCA_900248155.1_Aniger_ATCC_64974_N402_genomic.gff.gz
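htseq-count later selects features by type (GFF column 3) and names them by an attribute in column 9, so it may help to look at which feature types the annotation actually contains. A rough sketch, where gff_feature_types is a name invented here:

```shell
# Count how often each feature type (GFF column 3) occurs.
# Reads a gzip-compressed GFF; comment lines starting with '#' are skipped.
gff_feature_types() {
    gzip -dc "$1" \
        | awk -F '\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print n[t], t }' \
        | sort -k1,1nr -k2,2
}

# Example call, guarded so nothing happens if the file is not there yet.
gff=GCA_900248155.1_Aniger_ATCC_64974_N402_genomic.gff.gz
if [ -f "$gff" ]; then gff_feature_types "$gff"; fi
```

If the feature type you plan to count does not appear in this listing, htseq-count will have no features to assign reads to.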
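Filter raw reads
mkdir -p fastp
# Single-end reads. -5 and -3 are switches (sliding-window trimming from each end);
# the mean-quality threshold is set with -M (20 is also fastp's default).
for id in *.fastq.gz;
do
fastp -5 -3 -M 20 -i "$id" -o "${id%%.*}.clean.fq.gz" \
-h "./fastp/${id%%.*}.html" -j "./fastp/${id%%.*}.json";
done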
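The loop builds each output name with the shell expansion ${id%%.*}, which deletes everything from the first dot onward. The sketch below shows the effect on made-up file names; sample_name is a hypothetical wrapper used only for illustration.

```shell
# ${var%%.*} strips the longest suffix that begins with a dot.
sample_name() {
    id=$1
    printf '%s\n' "${id%%.*}"
}

sample_name WT_1.fastq.gz        # prints WT_1
sample_name WT.rep1.fastq.gz     # prints WT (the ".rep1" part is lost as well)
```

If sample names contain dots, two inputs could map to the same output name and overwrite each other; removing the exact suffix instead, as in ${id%.fastq.gz}, avoids that.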
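Alignment
# -x takes the index prefix that was passed to hisat2-build; adjust the path to your own index.
# The alignment summary goes to stderr, so it is captured in a per-sample log.
for id in *clean.fq.gz;
do
hisat2 -t -p 3 -x /media/lzx/0000678400004823/Indexs/Hisat2/Aspergillus_niger/Aspergillus_niger \
-U "$id" \
2>"${id%%.*}.hisat2.log" \
|samtools sort -@ 3 -o "${id%%.*}_ht2p.bam" -
done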
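Each log written by the loop above ends with HISAT2's summary line of the form "NN.NN% overall alignment rate". A small sketch for collecting that figure across samples follows; alignment_rates is a helper name made up for this example.

```shell
# Print "<log file><TAB><rate>" for each HISAT2 log given.
alignment_rates() {
    for log in "$@"; do
        rate=$(grep 'overall alignment rate' "$log" | awk '{ print $1 }')
        printf '%s\t%s\n' "$log" "$rate"
    done
}

# Run over the logs in the current directory, if any exist yet.
for f in *.hisat2.log; do
    if [ -f "$f" ]; then alignment_rates "$f"; fi
done
```

A rate that is much lower for one sample than for the others is usually a reason to look at that sample's fastp report before counting.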
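Counting
mkdir -p htseq
# Sequence names in the GFF must match those in the index used for alignment.
# Note that this GFF is GCF_000002855.3, not the GCA_900248155.1 file downloaded above.
for id in *.bam;
do
htseq-count -f bam -s no -t gene -i Dbxref "$id" /media/lzx/0000678400004823/Gtf_gff/Aspergillus_niger/GCF_000002855.3_ASM285v2_genomic.gff \
1>"./htseq/${id%_*}.txt" 2>"./htseq/${id%_*}.HTseq.log"
done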
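htseq-count writes one tab-separated "feature, count" row per gene, followed by summary rows whose names start with two underscores (__no_feature, __ambiguous, and so on). As a quick sanity check, the sketch below totals the reads assigned to genes in each output file; assigned_reads is a hypothetical helper name.

```shell
# Sum the reads assigned to features, ignoring the "__" summary rows
# that htseq-count appends at the end of its output.
assigned_reads() {
    awk -F '\t' '$1 !~ /^__/ { total += $2 } END { print total + 0 }' "$1"
}

# Report the total for every count file produced so far.
for f in ./htseq/*.txt; do
    if [ -f "$f" ]; then printf '%s\t%s\n' "$f" "$(assigned_reads "$f")"; fi
done
```

A total of 0 for every sample often means the -t or -i setting does not match the annotation, or that the annotation and the index come from different assemblies.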
Download the ID conversion file:
Aspergillus niger (ID 429) - Genome - NCBI (nih.gov)