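The rat gene-expression probe re-annotation I covered last time ran into a small problem: the final step of the re-annotation printed this warning: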
intersectBed -a Rat230_2_probe.bed -b Rattus_norvegicus.Rnor_6.0.96.gtf -wa -wb > x.txt
***** WARNING: File Rat230_2_probe.bed has inconsistent naming convention for record:
chr6 108169080 108169105 Rat230_2:1367452_at; 1 -
***** WARNING: File Rat230_2_probe.bed has inconsistent naming convention for record:
chr6 108169080 108169105 Rat230_2:1367452_at; 1 -
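The warning says that the record at this chr6 position uses an inconsistent naming convention, so I looked up what was going on at that position: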
cat Rat230_2_probe.bed|sed -n '/108169080/p'
chr6 108169080 108169105 Rat230_2:1367452_at; 1 -
Then I looked through the whole .bed file. It turns out the inconsistency starts at the very first line:
cat Rat230_2_probe.bed|less -SN
1 chr6 108169080 108169105 Rat230_2:1367452_at; 1 -
2 chr5 15325895 15325920 Rat230_2:1367452_at; 1 +
3 chr10 105608591 105608616 Rat230_2:1367452_at; 1 -
4 chr5 15325937 15325962 Rat230_2:1367452_at; 1 +
5 chr5 15325986 15326011 Rat230_2:1367452_at; 1 +
6 chr5 15326001 15326026 Rat230_2:1367452_at; 1 +
7 chr6 108168877 108168902 Rat230_2:1367452_at; 1 -
8 chr6 108168798 108168823 Rat230_2:1367452_at; 1 -
9 chr6 108168774 108168799 Rat230_2:1367452_at; 1 -
10 chr10 105608337 105608362 Rat230_2:1367452_at; 1 -
11 chr11 81380587 81380612 Rat230_2:1367452_at; 1 +
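I had to Google it. Sure enough, the cause is that the two files name chromosomes differently: in the .bed file every chromosome name starts with chr, while the .gtf file uses bare names such as 1, 2, 3... That makes the fix easy: strip the leading chr from every line of the .bed file: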
sed 's/^chr//' Rat230_2_probe.bed > x_chr.bed
cat x_chr.bed |less -SN
1 6 108169080 108169105 Rat230_2:1367452_at; 1 -
2 5 15325895 15325920 Rat230_2:1367452_at; 1 +
3 10 105608591 105608616 Rat230_2:1367452_at; 1 -
4 5 15325937 15325962 Rat230_2:1367452_at; 1 +
5 5 15325986 15326011 Rat230_2:1367452_at; 1 +
6 5 15326001 15326026 Rat230_2:1367452_at; 1 +
7 6 108168877 108168902 Rat230_2:1367452_at; 1 -
8 6 108168798 108168823 Rat230_2:1367452_at; 1 -
9 6 108168774 108168799 Rat230_2:1367452_at; 1 -
10 10 105608337 105608362 Rat230_2:1367452_at; 1 -
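Before rerunning the intersection, it can help to confirm that the chromosome names in the two files now agree. The following is only a minimal sketch: the function names chrom_names and bed_only_chroms are hypothetical helpers made up for illustration, and it assumes that column 1 of both files holds the chromosome name and that GTF header lines start with #.

```shell
# Hypothetical helper: print the distinct values of column 1
# (the chromosome name), skipping '#' header lines and blank lines.
chrom_names() {
    awk '!/^#/ && NF { print $1 }' "$1" | sort -u
}

# Hypothetical helper: print chromosome names that occur in the BED file ($1)
# but not in the GTF file ($2). No output means every BED name exists in the GTF.
bed_only_chroms() {
    bed_tmp=$(mktemp) || return 1
    gtf_tmp=$(mktemp) || return 1
    chrom_names "$1" > "$bed_tmp"
    chrom_names "$2" > "$gtf_tmp"
    # comm -23 keeps only the lines unique to the first (sorted) file
    comm -23 "$bed_tmp" "$gtf_tmp"
    rm -f "$bed_tmp" "$gtf_tmp"
}

# Usage (file names taken from this post):
#   bed_only_chroms Rat230_2_probe.bed Rattus_norvegicus.Rnor_6.0.96.gtf
#   bed_only_chroms x_chr.bed Rattus_norvegicus.Rnor_6.0.96.gtf
```

Run against the original .bed file, this should list the chr-prefixed names. Run against x_chr.bed, any names still printed are ones the .gtf does not contain at all, and probes on them will never intersect anything.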
After that, running intersectBed again works:
intersectBed -a x_chr.bed -b Rattus_norvegicus_lincRNA.gtf -wa -wb > Rattus_probe.txt
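As a quick sanity check on the result, one can count how many distinct probe IDs received at least one overlap. This is a rough sketch rather than part of the original workflow: count_annotated_probes is a hypothetical helper name, and it assumes the output is tab-separated with the probe ID in column 4, as in the 6-column .bed records shown above (with -wa -wb, each output line is the original -a record followed by the overlapping -b record).

```shell
# Hypothetical helper: count the distinct values in column 4 (the probe ID)
# of a tab-separated intersectBed -wa -wb output file.
count_annotated_probes() {
    cut -f4 "$1" | sort -u | awk 'END { print NR }'
}

# Usage (file name taken from this post):
#   count_annotated_probes Rattus_probe.txt
```

A count of 0 would suggest that the chromosome names still do not match, or that none of the probes overlap a feature in the .gtf file.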