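Chromosomes longer than 512 Mb cause bwa and bedtools to fail, so they have to be split.
The unanchored scaffolds in the assembly also need to be merged into a single pseudo-chromosome, chrUn, for the downstream WGC analysis.
The commands are as follows: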
###Split the chromosome; first choose the cut point
$ m10 chr5_2parts.bed
chr5LG3 0 419514300
chr5LG3 419514300 579269071
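To find which sequences need a BED file like the one above, the lengths can be read from a `samtools faidx` style index (name, then length, tab-separated). The following is only a rough sketch: the `split_bed` helper is hypothetical, the limit is assumed to be 512 × 1024 × 1024 bp, and the cut is placed at the midpoint purely for illustration. The cut point used above (419514300) was picked by hand, and in practice it should be chosen the same way.

```shell
#!/usr/bin/env bash
# Sketch: print a two-part BED for every sequence longer than LIMIT.
# Assumption: LIMIT is 512 * 1024 * 1024 bp; adjust it if your tools differ.
LIMIT=536870912

split_bed() {
    # $1: .fai-like table (column 1 = name, column 2 = length in bp)
    awk -v limit="$LIMIT" 'BEGIN { OFS = "\t" }
        $2 > limit {
            cut = int($2 / 2)      # midpoint, for illustration only
            print $1, 0, cut
            print $1, cut, $2
        }' "$1"
}

# Toy index: the chr5LG3 length from the BED file above plus one short scaffold
tmp=$(mktemp)
printf 'chr5LG3\t579269071\nscaffold03481\t68434\n' > "$tmp"
split_bed "$tmp"
rm -f "$tmp"
```

Only chr5LG3 is reported, because the scaffold is far below the limit.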
$ m10 chaifen.sh
#!/usr/bin/bash
bedtools getfasta -fi chr5.fa -bed chr5_2parts.bed -fo chr5_2parts.fasta
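# bedtools getfasta names each record chrom:start-end, so select each part by its own ID
# (using the same chrlist for both commands would write two identical files)
seqkit grep -p "chr5LG3:0-419514300" chr5_2parts.fasta > chr5_part1.fa
seqkit grep -p "chr5LG3:419514300-579269071" chr5_2parts.fasta > chr5_part2.fa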
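Once reads are mapped against the split parts, positions on the second part are offset by the cut point (419514300 here). The sketch below shows one way to shift BED coordinates back onto the original chromosome. It assumes the records keep the `chrom:start-end` names written by `bedtools getfasta` and that the chromosome name itself contains no `:` or `-`. The `lift_back` helper is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert BED coordinates on "chrom:start-end" records back to
# coordinates on the original chromosome.
lift_back() {
    awk 'BEGIN { OFS = "\t" }
        {
            n = split($1, a, /[:-]/)   # a[1]=chrom, a[2]=part start, a[3]=part end
            if (n == 3) {
                $1 = a[1]
                $2 += a[2]             # shift start by the part offset
                $3 += a[2]             # shift end by the part offset
            }
            print
        }' "$@"
}

# Toy interval on the second part of chr5LG3
printf 'chr5LG3:419514300-579269071\t100\t200\n' | lift_back
```

Lines whose first column does not match the `chrom:start-end` pattern are printed unchanged.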
###Merge the scaffolds into one chromosome. This also needs a .bed file
$ m10 chrun2part.bed
scaffold03481 0 68434
scaffold03482 0 131982
scaffold03483 0 131980
scaffold03484 0 66466
scaffold03485 0 36393
scaffold03486 0 133068
scaffold03487 0 61210
scaffold03488 0 69082
scaffold03489 0 131627
scaffold03490 0 131625
$ m10 hebing.sh
#!/usr/bin/bash
bedtools getfasta -fi Pisum_sativum_v1a.fa -bed chrun1part.bed -fo chrun1.fasta
grep "^>" -v chrun1.fasta | awk '{ ORS = "NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN"; $1 = $1; print $0}' > chrUn1.fasta
fold -w 80 chrUn1.fasta > chrUnplaced1.fasta
sed -i '1 i\>chrUn1' chrUnplaced1.fasta
##The last line has no trailing newline, so append an N (echo also writes the newline)
echo "N" >>chrUnplaced1.fasta
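The merge steps above (drop the headers, join the scaffolds with 100 N, wrap at 80 columns, add a header) can also be sketched in a single awk pass. This is an illustration under a few assumptions, not a replacement for the script: the `merge_fasta` helper is hypothetical, the spacer goes only between scaffolds rather than after each one, and the whole sequence is held in memory, which is fine for a toy input but may be slow for a large chrUn.

```shell
#!/usr/bin/env bash
# Sketch: merge a multi-FASTA from stdin into one record named $1,
# with 100 N between scaffolds and 80 bases per line.
merge_fasta() {
    awk -v name="$1" -v gap=100 -v width=80 '
        BEGIN { for (i = 0; i < gap; i++) spacer = spacer "N" }
        /^>/  { if (seq != "") seq = seq spacer; next }   # spacer between records
              { seq = seq $0 }                            # collect sequence lines
        END   {
            print ">" name
            n = length(seq)
            for (i = 1; i <= n; i += width) print substr(seq, i, width)
        }'
}

# Toy input: two short scaffolds
printf '>s1\nACGT\n>s2\nGG\n' | merge_fasta chrUn1
```

Because awk prints every line with a newline here, the output already ends cleanly and no extra N has to be appended.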