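Suppose you have two files and want to compare their contents: whenever the first two columns of a line match, the matching line is written to a new file.
Reading line by line and matching in a nested loop becomes very slow once the files get large.
If the files are not too large, you can read one file into a hash and look each key up directly instead of scanning the whole file every time. This speeds up matching considerably, especially when the file has many entries.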
#!/usr/bin/perl
use strict;
use warnings;
my $filename1 = shift;
my $filename2 = shift;
my $output_filename = shift;
die "Usage: $0 file1 file2 output_file\n" unless defined $output_filename;
open(my $fh1, '<', $filename1) or die "Cannot open the file '$filename1': $!";
open(my $fh2, '<', $filename2) or die "Cannot open the file '$filename2': $!";
open(my $output_fh, '>', $output_filename) or die "Cannot create the file '$output_filename': $!";
my %data2_hash;

# Read file2 into a hash keyed on its first two columns.
# Each key maps to a list of lines, so rows in file2 that share a key are all kept.
while (my $line2 = <$fh2>) {
    chomp $line2;
    my @data2 = split(' ', $line2);        # split on whitespace, ignoring leading blanks
    next if @data2 < 2;                    # skip blank or single-column lines
    my $key = join("\t", @data2[0, 1]);    # a tab cannot occur inside a field
    push @{ $data2_hash{$key} }, $line2;
}

# Look up each line of file1 in the hash and write the matching file2 lines.
while (my $line1 = <$fh1>) {
    chomp $line1;
    my @data1 = split(' ', $line1);
    next if @data1 < 2;
    my $key = join("\t", @data1[0, 1]);
    if (exists $data2_hash{$key}) {
        print $output_fh "$_\n" for @{ $data2_hash{$key} };
    }
}
close($fh1);
close($fh2);
close($output_fh);
If the first two columns of a line in file 2 match those of a line in file 1, that line of file 2 is written to the new file.
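The script above keeps every line of file 2 in memory. If file 2 is the larger of the two, one possible variation is to store only the keys of file 1 and stream file 2 past them. The sketch below is an illustration under that assumption, not a drop-in replacement: the helper names `first_two_columns_key` and `matching_lines` and the sample data are made up for the example, and unlike the script above it writes each matching line of file 2 once, in file 2 order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a lookup key from the first two
# whitespace-separated columns; returns undef for shorter lines.
sub first_two_columns_key {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return if @fields < 2;
    return join "\t", @fields[0, 1];
}

# Hypothetical helper: return the lines read from $fh2 whose key
# also occurs somewhere in $fh1.
sub matching_lines {
    my ($fh1, $fh2) = @_;

    # Remember only the keys of file 1, not its full lines.
    my %seen;
    while (my $line = <$fh1>) {
        chomp $line;
        my $key = first_two_columns_key($line);
        $seen{$key} = 1 if defined $key;
    }

    # Stream file 2 and keep each line whose key was seen in file 1.
    my @matches;
    while (my $line = <$fh2>) {
        chomp $line;
        my $key = first_two_columns_key($line);
        push @matches, $line if defined $key && $seen{$key};
    }
    return @matches;
}

# Made-up sample data, read through in-memory filehandles.
my $file1 = "chr1 100 A\nchr2 200 B\n";
my $file2 = "chr2 200 x\nchr3 300 y\nchr1 100 z\nchr2 200 w\n";
open(my $fh1, '<', \$file1) or die "Cannot open in-memory file 1: $!";
open(my $fh2, '<', \$file2) or die "Cannot open in-memory file 2: $!";

print "$_\n" for matching_lines($fh1, $fh2);
```

With this sample data the sketch prints the three file 2 lines whose first two columns are `chr2 200` or `chr1 100`, and skips the `chr3 300` line. For real files, the in-memory filehandles would be replaced by ordinary `open` calls on the file names.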
###################
For comparison, here is a script whose logic is simpler but which runs much more slowly:
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);
my $start_time = [gettimeofday];
my $filename1 = shift;
my $filename2 = shift;
my $output_filename = shift;
die "Usage: $0 file1 file2 output_file\n" unless defined $output_filename;
open(my $fh1, '<', $filename1) or die "Cannot open the file '$filename1': $!";
open(my $fh2, '<', $filename2) or die "Cannot open the file '$filename2': $!";
open(my $output_fh, '>', $output_filename) or die "Cannot create the file '$output_filename': $!";
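# For every line of file1, rescan file2 from the start (n * m comparisons in total).
while (my $line1 = <$fh1>) {
    chomp $line1;
    my @data1 = split(' ', $line1);
    next if @data1 < 2;    # skip blank or single-column lines
    while (my $line2 = <$fh2>) {
        chomp $line2;
        my @data2 = split(' ', $line2);
        next if @data2 < 2;
        if ($data1[0] eq $data2[0] && $data1[1] eq $data2[1]) {
            print $output_fh "$line2\n";
        }
    }
    # Rewind file2 so the next line of file1 is compared against all of it.
    seek($fh2, 0, 0) or die "Cannot rewind '$filename2': $!";
}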
close($fh1);
close($fh2);
close($output_fh);
my $end_time = [gettimeofday];
my $elapsed_time = tv_interval($start_time, $end_time);
print "Script execution time: $elapsed_time seconds\n";