问题:两个玉米自交系A和B,先进行杂交,产生的F1与B进行回交,假设有100对标记用于区分A和B。在BC1中,我们利用这100对标记对每个单株进行了分型。求每个单株的亲本片段占比?
考虑到每个标记代表的片段长度不相同,我们用如下脚本完成这个比例计算。
需要三个输入文件:
- 基因组长度文件 (以玉米B73 V4为例,每行包含染色体编号及长度)
file1:
1 307041717
2 244442276
3 235667834
4 246994605
5 223902240
6 174033170
7 182381542
8 181122637
9 159769782
10 150982314
- 标记位置文件(只需要染色体和位置信息即可,用-连接)
file2:
1-12699720
- 需要计算的BC1单株的分型结果中一个亲本的标记(每行一个单株)
file3:
plant1 1-185105598 1-201198031 1-223298828 3-69002888 3-69002888
plant2 1-223298828 3-69002888 3-186335692 3-211312218 3-229253739
plant3 6-38164584 6-40024641 6-83579108 7-14105029 7-49113016
perl脚本如下:
#!/usr/bin/perl
use warnings;
use strict;
open GENOME_LEN,"file1";
my %genome_length = ();
my $genome_length = 0;
while (<GENOME_LEN>) {
chomp;
my @a = split/\s+/;
$genome_length{$a[0]} = $a[1];
$genome_length += $a[1];
}
close GENOME_LEN;
open MARKER_INFO,"file2";
my %marker_info = ();
while (<MARKER_INFO>) {
chomp;
my @a = split/-/;
push @{$marker_info{$a[0]}},$a[1];
}
close MARKER_INFO;
my %marker = ();
foreach my $chr (keys %marker_info) {
chomp $chr;
my @sort_marker_info = sort{$a<=>$b}@{$marker_info{$chr}};
my $start = 0;
my $end = 0;
foreach my $a (0..$#sort_marker_info) {
if ($a == 0) {
$start = 0;
}else{
$start = int(($sort_marker_info[$a] + $sort_marker_info[$a-1])/2);
}
if ($a == $#sort_marker_info) {
$end = $genome_length{$chr};
}else{
$end = int(($sort_marker_info[$a+1] + $sort_marker_info[$a])/2);
}
$marker{$chr}{$sort_marker_info[$a]} = $end - $start;
}
}
open POSITIVE_MARKER,"file3";
open OUT,">recovery_rate.txt";
print OUT "Marker_name\tGenome_length\tRecovery_length\tRecovery_rate\n";
while (<POSITIVE_MARKER>) {
chomp;
next if /^$/;
my @a = split/\s+/;
print OUT "$a[0]\t";
my $recovery_length = 0;
foreach (1..$#a) {
$a[$_] =~ /(\d+)-(\d+)/;
$recovery_length += $marker{$1}{$2};
#print OUT "\n$1-$2\t$marker{$1}{$2}\n";
}
print OUT "$genome_length\t$recovery_length\t";
printf OUT "%.2f",(($recovery_length/$genome_length)*100);
print OUT "%\n";
}
close POSITIVE_MARKER;
close OUT;
输出结果包含四列:Marker_name、Genome_length、Recovery_length和Recovery_rate。
网友评论