# SNP download:
ftp://ftp-mouse.sanger.ac.uk/current_snps/strain_specific_vcfs/
# VCF format:
ref:http://www.internationalgenome.org/wiki/Analysis/vcf4.0
# Filtering:
ref:https://biopet.github.io/vcffilter/0.2/index.html (includes the download)
java -jar /home/pc/biosoft/vcffilter-assembly-0.1.jar --help
This tool can also filter, but it was not used here.
# Filter command:
ref:https://github.com/vcflib/vcflib
/home/pc/biosoft/Vcflib/vcflib/bin/vcffilter -f "FILTER = PASS" BALB_cJ.mgp.v5.snps.dbSNP142.vcf > BALB.vcf
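Proportion that passed:
Records before filtering: 5203549; records after filtering (PASS): 4576884; passed: 4576884/5203549 = 0.87956969 (about 88%)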
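The ratio above can be recomputed by counting records directly. A minimal sketch, assuming the FILTER value sits in column 7 of a tab-separated VCF; `pass_fraction` is a hypothetical helper name and the toy records below are made up for illustration:

```shell
# Count data records (non-header lines) and how many carry FILTER == PASS.
pass_fraction() {
  awk -F'\t' '!/^#/ { total++; if ($7 == "PASS") pass++ }
       END { if (total) printf "%d/%d=%.4f\n", pass, total, pass / total }' "$1"
}

# Toy VCF standing in for BALB_cJ.mgp.v5.snps.dbSNP142.vcf (made-up records).
vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t100\t.\tA\tG\t50\tPASS\t.\n'
  printf '1\t200\t.\tC\tT\t50\tLowQual\t.\n'
  printf '1\t300\t.\tG\tA\t50\tPASS\t.\n'
  printf '2\t150\t.\tT\tC\t50\tPASS\t.\n'
} > "$vcf"

pass_fraction "$vcf"   # prints 3/4=0.7500
```

On the real file, the same function would be called with the unfiltered VCF as its argument.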
# Software: vcftools
ref:https://sourceforge.net/projects/vcftools/
ref2:http://vcftools.sourceforge.net/index.html
# Command:
cat ref.fa | vcf-consensus file.vcf.gz > out.fa
# Reference genome:
ftp-mouse.sanger.ac.uk/ref/GRCm38_68.fa
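After downloading, compare:
Compared the VCF file against the Sanger FASTA file, UCSC mm10, GENCODE, and Ensembl: the Sanger and UCSC references are the same (GENCODE as well); Ensembl differs.
Conclusion: building with the UCSC mm10 reference genome is OK!
Steps:
1. Download the tbi index file for the VCF: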
axel -n 10 ftp://ftp-mouse.sanger.ac.uk/current_snps/strain_specific_vcfs/BALB_cJ.mgp.v5.snps.dbSNP142.vcf.gz.tbi
Or build the index yourself:
gunzip BALB.vcf.gz
bgzip -c BALB.vcf > BALB.vcf.gz
tabix -p vcf BALB.vcf.gz
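The file is recompressed with bgzip because tabix indexes BGZF files, not plain gzip. A small sketch for telling the two apart from the header bytes, assuming `od` and `tr` are available; `is_bgzf` is a hypothetical helper name, not part of htslib:

```shell
# Report whether a .gz file is BGZF (what bgzip writes) or plain gzip.
is_bgzf() {
  # First 16 bytes of the file as one hex string without separators.
  hex=$(od -An -tx1 -N16 "$1" | tr -d ' \n')
  case "$hex" in
    # gzip magic, FLG=04 (extra field present), then the "BC" subfield id at bytes 12-13.
    1f8b0804????????????????4243*) echo "BGZF" ;;
    1f8b*) echo "plain gzip" ;;
    *) echo "not gzip" ;;
  esac
}

# Only runs if the file from the steps above is present.
if [ -f BALB.vcf.gz ]; then
  is_bgzf BALB.vcf.gz
fi
```

If this reports plain gzip, the gunzip and bgzip steps above are the ones to rerun before calling tabix.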
2. Run vcftools (vcf-consensus):
cat ../mm10.chr.fa | vcf-consensus BALB.vcf.gz > BALB.fa
Found a problem with the chr prefix on the FASTA sequence names, so strip it, rerun, then add it back:
sed 's/>chr/>/g' ../mm10.chr.fa > mm10.fa
cat mm10.fa | vcf-consensus BALB.vcf.gz > BALB.fa
sed -i 's/>/>chr/g' BALB.fa
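The prefix handling can be tried on a toy FASTA first. This is only a sketch: `strip_chr` and `add_chr` are hypothetical helper names, the sequences are made up, and the patterns are anchored with `^` so that only the start of a header line is touched:

```shell
# Remove or add the chr prefix on FASTA header lines (read stdin, write stdout).
strip_chr() { sed 's/^>chr/>/'; }
add_chr()   { sed 's/^>/>chr/'; }

# Toy FASTA standing in for mm10.chr.fa (made-up sequences).
fa=$(mktemp)
printf '>chr1\nACGT\n>chrX\nGGCC\n' > "$fa"

# Headers after stripping the prefix.
strip_chr < "$fa" | grep '^>'

# Stripping and then re-adding the prefix should reproduce the input exactly.
strip_chr < "$fa" | add_chr | cmp -s - "$fa" && echo "round trip OK"
```

Listing the header lines of the real reference with `grep '^>'` before running vcf-consensus is a quick way to see which naming it uses.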
Use samtools faidx to check the base at a SNP position in both files:
In BALB.fa the base was changed to C. OK!
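With samtools, a single base can be pulled out using a region such as `samtools faidx BALB.fa chr1:POS-POS`. Where samtools is not at hand, the same check can be sketched in awk. `base_at` is a hypothetical helper, and the two toy files and the position below are made up; since only SNPs are applied, coordinates are the same in both files:

```shell
# Print the base at a 1-based position of a named sequence in a FASTA file.
# usage: base_at file.fa seqname pos
base_at() {
  awk -v name="$2" -v pos="$3" '
    /^>/ { cur = substr($1, 2); off = 0; next }   # new sequence: reset offset
    cur == name {
      len = length($0)
      if (pos > off && pos <= off + len) { print substr($0, pos - off, 1); exit }
      off += len                                  # bases seen so far in this sequence
    }' "$1"
}

# Toy reference and consensus differing at position 6 (made-up data).
ref=$(mktemp); cons=$(mktemp)
printf '>chr1\nACGT\nTTGA\n' > "$ref"
printf '>chr1\nACGT\nTCGA\n' > "$cons"

base_at "$ref" chr1 6    # prints T
base_at "$cons" chr1 6   # prints C
```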