Downloading USEARCH
https://drive5.com/cgi-bin/upload3.py?license=2020032323051700172
After downloading, add the folder containing the executable to the PATH environment variable.
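The commands in these notes use bash syntax (ls | while read, ${id%%_*}) with the win32 binary, so a bash shell such as Git Bash on Windows is assumed. A minimal sketch of adding the binary's folder to PATH in such a shell follows. The folder $HOME/tools/usearch is a hypothetical location; replace it with wherever you saved the executable. In a plain Windows cmd setup you would edit the system environment variables instead.

```shell
#!/usr/bin/env bash
# Hypothetical folder holding usearch11.0.667_win32.exe; change it to your own.
USEARCH_DIR="$HOME/tools/usearch"
mkdir -p "$USEARCH_DIR"

# Prepend the folder to PATH for the current shell session only.
export PATH="$USEARCH_DIR:$PATH"

# To keep the setting for new sessions, append the export line to ~/.bashrc once:
#   echo 'export PATH="$HOME/tools/usearch:$PATH"' >> ~/.bashrc

# Report whether the shell can now find the executable.
if command -v usearch11.0.667_win32.exe >/dev/null 2>&1; then
  echo "usearch found: $(command -v usearch11.0.667_win32.exe)"
else
  echo "usearch11.0.667_win32.exe not found in $USEARCH_DIR"
fi
```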
Downloading the RDP reference database for SINTAX
https://drive5.com/usearch/manual/sintax_downloads.html
Decompress the sequence files
gzip -d *.gz
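Rename the files to the _R1/_R2 convention (ren is the Windows cmd rename command). When -reverse is not given, fastq_mergepairs finds the reverse-read file by replacing _R1 with _R2 in the forward file name, so the merge loop relies on this naming.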
ren *_1.fastq *_R1.fq
ren *_2.fastq *_R2.fq
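Because ren is not available in bash, here is a sketch of the same renaming as a bash loop, assuming SRA-style pair names X_1.fastq / X_2.fastq. The function name rename_pairs and the demo file SRR001 are hypothetical. The last line shows which output name the merge loop derives with ${id%%_*}, which deletes everything from the first underscore.

```shell
#!/usr/bin/env bash
# Rename X_1.fastq / X_2.fastq to X_R1.fq / X_R2.fq (bash equivalent of "ren").
rename_pairs() {
  local f
  for f in *_1.fastq *_2.fastq; do
    [ -e "$f" ] || continue              # skip a glob that matched nothing
    case "$f" in
      *_1.fastq) mv -- "$f" "${f%_1.fastq}_R1.fq" ;;
      *_2.fastq) mv -- "$f" "${f%_2.fastq}_R2.fq" ;;
    esac
  done
}

# Demo in a throwaway directory with hypothetical empty files.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch SRR001_1.fastq SRR001_2.fastq
rename_pairs
ls

# Output name used by the merge loop: everything before the first "_".
for id in *_R1.fq; do echo "$id -> ${id%%_*}.fq"; done
```

Note the consequence of ${id%%_*}: a sample named soil_A would be truncated to soil, so sample names containing underscores can collide.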
Merge the paired-end reads of all samples in a batch
ls *_R1.fq|while read id;
do
usearch11.0.667_win32.exe -fastq_mergepairs $id -relabel @ -fastq_maxdiffs 10 -fastq_pctid 80 -fastqout ${id%%_*}.fq;
done
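-fastq_maxdiffs: maximum number of mismatches allowed in the overlap alignment
-fastq_pctid: minimum percent identity of the overlap alignment
-relabel @: relabel each merged read with a sample name derived from the input file name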
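To make the two thresholds concrete, the sketch below checks an already-aligned overlap against -fastq_maxdiffs and -fastq_pctid. It is a simplified illustration, not USEARCH's merge algorithm: it assumes the two overlap strings are given pre-aligned and of equal length, and it ignores quality scores. The function name overlap_ok is hypothetical.

```shell
#!/usr/bin/env bash
# overlap_ok FWD REV [MAXDIFFS] [MINPCT]
# FWD/REV: the overlapping region of the forward read and of the
# reverse-complemented reverse read, already aligned (same length).
overlap_ok() {
  awk -v a="$1" -v b="$2" -v maxdiffs="${3:-10}" -v minpct="${4:-80}" 'BEGIN {
    n = length(a)
    if (length(b) != n) { print "error: overlaps differ in length"; exit 1 }
    diffs = 0
    for (i = 1; i <= n; i++)
      if (substr(a, i, 1) != substr(b, i, 1)) diffs++   # count mismatches
    pct = 100 * (n - diffs) / n                        # percent identity
    ok = (diffs <= maxdiffs && pct >= minpct)
    printf "diffs=%d pctid=%.1f %s\n", diffs, pct, (ok ? "merge" : "reject")
  }'
}

overlap_ok ACGTACGTAC ACGTTCGTAC 10 80   # one mismatch
overlap_ok ACGTACGTAC TTTTACGTAC 10 80   # three mismatches, 70% identity
```

A pair must pass both limits: the second call has few enough mismatches for -fastq_maxdiffs 10 but fails the 80% identity requirement.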
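Quality filtering: discard merged reads whose total expected errors exceed the threshold (the raw _R1/_R2 files also end in .fq, so they are excluded from the loop)
ls *.fq | grep -v '_R[12]\.fq$' | while read id;
do
usearch11.0.667_win32.exe -fastq_filter $id -fastq_maxee 1.0 -fastaout ${id%%.*}.fa
done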
-fastq_maxee: maximum expected errors per read; reads with a higher total are discarded
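The expected number of errors in a read is the sum over its bases of the error probability 10^(-Q/10), where Q is the Phred quality score. The sketch below computes this per read with awk and applies the same keep/discard rule as -fastq_maxee (discard if EE > maxee). It assumes Phred+33 encoding and plain 4-line FASTQ records; the function name expected_errors and the demo reads are hypothetical.

```shell
#!/usr/bin/env bash
# expected_errors FILE.fq [MAXEE]: print label, expected errors, keep/discard.
expected_errors() {
  awk -v maxee="${2:-1.0}" '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
    NR % 4 == 1 { label = substr($0, 2) }                  # "@label" header line
    NR % 4 == 0 {                                          # quality line
      ee = 0
      for (j = 1; j <= length($0); j++) {
        q = ord[substr($0, j, 1)] - 33                     # Phred+33 decoding
        ee += 10 ^ (-q / 10)                               # error probability of base j
      }
      printf "%s\t%.4f\t%s\n", label, ee, (ee <= maxee ? "keep" : "discard")
    }
  ' "$1"
}

tmp=$(mktemp -d)
# r1: four bases at Q40 ("I"); r2: four bases at Q2 ("#").
printf '@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n' > "$tmp/demo.fq"
expected_errors "$tmp/demo.fq" 1.0
```

Four Q40 bases give 4 x 0.0001 = 0.0004 expected errors, while four Q2 bases already give about 2.52, so a single short stretch of very low quality is enough to exceed maxee 1.0.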
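Pool the per-sample FASTA files (make sure *.fa matches only the filtered sample files; rdp_16s_v16.fa, uniques.fa or otus.fa left in the same folder would be pooled too)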
cat *.fa > sample.fa
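A quick sanity check on the pooled file is to count reads per sample from the FASTA headers. This sketch assumes that -relabel @ produced labels of the form sample.number (for example >SRR001.1) and that sample names contain no period; the function name reads_per_sample and the demo data are hypothetical.

```shell
#!/usr/bin/env bash
# reads_per_sample FILE.fa: count reads per sample, using the label text before the first ".".
reads_per_sample() {
  awk -F. '/^>/ { n[substr($1, 2)]++ } END { for (s in n) print s "\t" n[s] }' "$1" |
    LC_ALL=C sort
}

tmp=$(mktemp -d)
printf '>A.1\nACGT\n>A.2\nACGA\n>B.1\nTTGT\n' > "$tmp/sample.fa"
reads_per_sample "$tmp/sample.fa"
```

Samples with unexpectedly few reads here usually point to a failed merge or a filtering problem upstream.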
Dereplicate: find the unique sequences and add size (abundance) annotations
usearch11.0.667_win32.exe -fastx_uniques sample.fa -fastaout uniques.fa -sizeout -relabel Uniq
Output: uniques.fa
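Dereplication itself is simple: identical sequences are collapsed into one record whose size annotation is the number of copies. The awk sketch below mimics that idea (sorted by decreasing abundance, relabelled Uniq1, Uniq2, ...) and then checks that the sizes add up to the number of input reads. It is an illustration, not USEARCH's implementation; it assumes unwrapped FASTA with one sequence line per record, and the label format Uniq1;size=N is only an approximation of USEARCH's output. The function name derep is hypothetical.

```shell
#!/usr/bin/env bash
# derep FILE.fa: collapse identical sequences, largest clusters first.
derep() {
  awk '!/^>/ && NF { count[$0]++ } END { for (s in count) print count[s] "\t" s }' "$1" |
    LC_ALL=C sort -k1,1nr -k2,2 |                       # by abundance, then by sequence
    awk -F'\t' '{ printf ">Uniq%d;size=%d\n%s\n", NR, $1, $2 }'
}

tmp=$(mktemp -d)
printf '>A.1\nACGT\n>A.2\nACGT\n>B.1\nACGT\n>B.2\nTTTT\n' > "$tmp/sample.fa"
derep "$tmp/sample.fa" > "$tmp/uniques.fa"
cat "$tmp/uniques.fa"

# The size annotations should sum to the number of input reads (4 here).
awk '/^>/ { match($0, /size=[0-9]+/); total += substr($0, RSTART + 5, RLENGTH - 5) }
     END { print total }' "$tmp/uniques.fa"
```

The same size-summing one-liner can be run on the real uniques.fa and compared with the read count of sample.fa.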
Cluster the unique sequences into OTUs (UPARSE, -cluster_otus) and build the OTU table
usearch11.0.667_win32.exe -cluster_otus uniques.fa -otus otus.fa -relabel Otu
usearch11.0.667_win32.exe -otutab sample.fa -otus otus.fa -otutabout otubab.txt
The second command (-otutab) takes relatively long. Example log:
#00:00 5.6Mb 100.0% Reading otus.fa
#00:00 5.5Mb 100.0% Masking (fastnucleo)
#00:00 6.4Mb 100.0% Word stats
#00:00 6.4Mb 100.0% Alloc rows
#00:00 6.7Mb 100.0% Build index
#01:23 48Mb 100.0% Searching, 66.4% matched
#123497 / 185971 mapped to OTUs (66.4%)
#01:23 48Mb Writing otubab.txt
#01:23 48Mb Writing otubab.txt ...done.
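The OTU table is tab-separated with a header row (#OTU ID followed by sample names) and one row per OTU. The sketch below sums each sample column and the grand total; since every mapped read adds one count, the grand total should match the mapped-read figure in the log (123497 in the run above). The function name otutab_totals and the demo table are hypothetical.

```shell
#!/usr/bin/env bash
# otutab_totals TABLE: per-sample read totals plus the grand total.
otutab_totals() {
  awk -F'\t' '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }  # header row
    { for (i = 2; i <= NF; i++) sum[i] += $i }                           # OTU rows
    END {
      for (i = 2; i <= ncol; i++) { print name[i] "\t" sum[i]; total += sum[i] }
      print "TOTAL\t" total
    }
  ' "$1"
}

tmp=$(mktemp -d)
printf '#OTU ID\tS1\tS2\nOtu1\t10\t5\nOtu2\t0\t7\n' > "$tmp/otutab.txt"
otutab_totals "$tmp/otutab.txt"
```

On the real output, run otutab_totals otubab.txt; large differences between sample totals are a hint to rarefy or normalize before comparing samples.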
Taxonomy prediction
Convert rdp_16s_v16.fa into a UDB database
usearch11.0.667_win32.exe -makeudb_usearch rdp_16s_v16.fa -output rdp_16s.udb
Classify the OTUs with the SINTAX algorithm, using a bootstrap confidence cutoff of 0.8
usearch11.0.667_win32.exe -sintax otus.fa -db rdp_16s.udb -tabbedout otu.sintax -strand both -sintax_cutoff 0.8
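The cutoff keeps only the ranks whose bootstrap confidence reaches the threshold. This sketch re-applies a cutoff to the second column of otu.sintax, which lets you try other thresholds without rerunning SINTAX. It assumes that column holds comma-separated per-rank predictions written as rank:name(confidence); the demo line is illustrative, not real output, and the function name sintax_filter is hypothetical. It stops at the first rank below the cutoff, on the assumption that confidence does not increase toward lower ranks.

```shell
#!/usr/bin/env bash
# sintax_filter FILE [CUTOFF]: print OTU label and the ranks with confidence >= CUTOFF.
sintax_filter() {
  awk -F'\t' -v cutoff="${2:-0.8}" '
  {
    n = split($2, ranks, ",")
    kept = ""
    for (i = 1; i <= n; i++) {
      p = index(ranks[i], "(")                          # e.g. g:Bacillus(0.6000)
      name = substr(ranks[i], 1, p - 1)
      conf = substr(ranks[i], p + 1, length(ranks[i]) - p - 1) + 0
      if (conf < cutoff) break                          # drop this and all lower ranks
      kept = kept (kept == "" ? "" : ",") name
    }
    print $1 "\t" kept
  }' "$1"
}

tmp=$(mktemp -d)
printf 'Otu1\td:Bacteria(1.0000),p:Firmicutes(0.9900),c:Bacilli(0.9500),o:Bacillales(0.8500),f:Bacillaceae(0.7000),g:Bacillus(0.6000)\t+\n' > "$tmp/otu.sintax"
sintax_filter "$tmp/otu.sintax" 0.8
sintax_filter "$tmp/otu.sintax" 0.9
```

Raising the cutoff trades resolution for reliability: at 0.8 the demo OTU is assigned down to order, at 0.9 only down to class.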
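Build an OTU tree: compute the pairwise distance matrix, then run agglomerative clustering with minimum linkage at 80% identity
usearch11.0.667_win32.exe -calc_distmx otus.fa -tabbedout mx.txt -maxdist 0.2 -termdist 0.3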
usearch11.0.667_win32.exe -cluster_aggd mx.txt -treeout clusters.tree -clusterout clusters.txt -id 0.80 -linkage min
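With minimum (single) linkage, the clusters at a given threshold are the connected groups of OTUs joined by pairs at or below the distance cut. Assuming distance = 1 - identity, -id 0.80 corresponds to a cut of 0.2, matching -maxdist 0.2 above. The union-find sketch below computes those clusters from a three-column matrix (label1, label2, distance). It only illustrates the clustering rule; it does not build clusters.tree, and OTUs with no pair within the cut never appear in mx.txt, so they are missing here too. The function name single_linkage and the demo matrix are hypothetical.

```shell
#!/usr/bin/env bash
# single_linkage MATRIX [MAXDIST]: print "cluster_root<TAB>member" lines.
single_linkage() {
  awk -F'\t' -v maxdist="${2:-0.2}" '
    function find(x) { while (parent[x] != x) x = parent[x]; return x }   # follow to root
    {
      for (k = 1; k <= 2; k++) if (!($k in parent)) parent[$k] = $k
      if ($1 != $2 && $3 + 0 <= maxdist) {           # link the pair if close enough
        a = find($1); b = find($2)
        if (a != b) parent[a] = b                    # merge the two clusters
      }
    }
    END { for (x in parent) print find(x) "\t" x }
  ' "$1" | LC_ALL=C sort
}

tmp=$(mktemp -d)
printf 'Otu1\tOtu2\t0.05\nOtu2\tOtu3\t0.15\nOtu4\tOtu5\t0.10\n' > "$tmp/mx.txt"
single_linkage "$tmp/mx.txt" 0.2
```

Otu1 and Otu3 end up in the same cluster even though their direct distance is not listed, because single linkage chains through Otu2.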