This program finds the differences between two files. It is intended for the following situations:
- The files have different numbers of lines.
- The files are unordered.
- `diff` cannot isolate the differences, for example because the same records appear in a different order.
- The files are large (this is why sorting is delegated to the shell `sort` command rather than done in memory).
- Each line holds a single record.
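For example, imagine you have two files, each containing a list of users, exported from a database at different times. The listings differ because users have been added to and removed from the database in between. This script finds those differences when `diff` cannot. Lines are compared case-insensitively. A line found only in the first file is printed with a `<` prefix and a line found only in the second file with `>`; once one file is exhausted, the remaining lines of the other are printed with `<<` or `>>`.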
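To make the scenario concrete, here is a minimal in-memory sketch of the intended comparison. The user names and the `compare_lists` helper are hypothetical and exist only for illustration; the full script below works on files sorted on disk rather than on arrays.

```perl
use strict;
use warnings;

# Hypothetical sample data: two snapshots of a user list, in arbitrary order.
my @snapshot1 = qw(carol alice bob dave);
my @snapshot2 = qw(bob erin alice Carol);

# Sort both lists case-insensitively, then walk them in step and collect
# the entries that appear in only one of them.
sub compare_lists {
    my ($list1, $list2) = @_;
    my @left  = sort { uc($a) cmp uc($b) } @$list1;
    my @right = sort { uc($a) cmp uc($b) } @$list2;
    my @out;
    my ($i, $j) = (0, 0);
    while ($i < @left and $j < @right) {
        my $cmp = uc($left[$i]) cmp uc($right[$j]);
        if ($cmp == 0) {
            # Present in both lists: nothing to report.
            $i++;
            $j++;
        } elsif ($cmp < 0) {
            # Only in the first list.
            push @out, "<$left[$i]";
            $i++;
        } else {
            # Only in the second list.
            push @out, ">$right[$j]";
            $j++;
        }
    }
    # One list is exhausted: everything left in the other is unmatched.
    while ($i < @left)  { push @out, "<<$left[$i]";  $i++; }
    while ($j < @right) { push @out, ">>$right[$j]"; $j++; }
    return @out;
}

print "$_\n" for compare_lists(\@snapshot1, \@snapshot2);
```

Run as is, this prints `<dave` and `>>erin`: `dave` appears only in the first snapshot, `erin` only in the second, and `carol` matches `Carol` because case is ignored.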
#!/usr/bin/perl
use strict;
use warnings;
my $file1= $ARGV[0];
my $file2= $ARGV[1];
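# Both file names are required.
unless (defined $file1 and defined $file2) {
    print STDERR "Usage: $0 <file1> <file2>\n";
    exit 1;
}
# Each argument must name an existing regular file.
unless (-f $file1) {
    print STDERR "File 1 does not exist: [$file1]\n";
    exit 1;
}
unless (-f $file2) {
    print STDERR "File 2 does not exist: [$file2]\n";
    exit 1;
}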
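use File::Temp qw(tempfile);

# Create two private temporary files. File::Temp picks unique names, so
# concurrent runs cannot overwrite each other's data.
my (undef, $tmp_file1) = tempfile('f1_XXXXXX', TMPDIR => 1);
my (undef, $tmp_file2) = tempfile('f2_XXXXXX', TMPDIR => 1);

# Sort case-insensitively (-f) in the C locale so that the order on disk
# matches the uc() comparison below. The list form of system() bypasses
# the shell, so file names containing spaces or metacharacters are safe.
{
    local $ENV{LC_ALL} = 'C';
    system('sort', '-f', '-o', $tmp_file1, '--', $file1) == 0
        or die "sort failed for [$file1]\n";
    system('sort', '-f', '-o', $tmp_file2, '--', $file2) == 0
        or die "sort failed for [$file2]\n";
}

open(my $fh1, '<', $tmp_file1) or die "Cannot open $tmp_file1: $!\n";
open(my $fh2, '<', $tmp_file2) or die "Cannot open $tmp_file2: $!\n";

# Walk both sorted files in step. $s1 and $s2 hold the current line of
# each file and become undef once that file is exhausted.
my $s1 = <$fh1>;
my $s2 = <$fh2>;
while (defined $s1 and defined $s2) {
    my $cmp = uc($s1) cmp uc($s2);
    if ($cmp == 0) {
        # Same record in both files: nothing to report.
        $s1 = <$fh1>;
        $s2 = <$fh2>;
    } elsif ($cmp > 0) {
        # $s2 sorts first, so it exists only in file 2.
        print ">$s2";
        $s2 = <$fh2>;
    } else {
        # $s1 sorts first, so it exists only in file 1.
        print "<$s1";
        $s1 = <$fh1>;
    }
}

# One file is exhausted: every remaining line of the other is unmatched,
# including the line already held in $s1 or $s2.
while (defined $s1) { print "<<$s1"; $s1 = <$fh1>; }
while (defined $s2) { print ">>$s2"; $s2 = <$fh2>; }

close $fh1;
close $fh2;
unlink $tmp_file1 or die "Cannot remove $tmp_file1: $!\n";
unlink $tmp_file2 or die "Cannot remove $tmp_file2: $!\n";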
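
The script relies on `sort` so that it can handle files too large to hold in memory. For small inputs, a hash-based set difference is a possible alternative that needs no temporary files. The sketch below shows that different technique, not the method of the script above; `read_lines` and `hash_diff` are hypothetical helper names, and the code assumes both files fit in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file into a hash keyed by the upper-cased
# line, so that lookups ignore case. The value keeps the original text.
sub read_lines {
    my ($path) = @_;
    my %seen;
    open(my $fh, '<', $path) or die "Cannot open $path: $!\n";
    while (my $line = <$fh>) {
        chomp $line;
        $seen{uc $line} = $line;
    }
    close $fh;
    return \%seen;
}

# Hypothetical helper: return the lines present in only one of the two
# files, prefixed with "<" (file 1 only) or ">" (file 2 only).
sub hash_diff {
    my ($path1, $path2) = @_;
    my $seen1 = read_lines($path1);
    my $seen2 = read_lines($path2);
    my @out;
    for my $key (sort keys %$seen1) {
        push @out, "<$seen1->{$key}" unless exists $seen2->{$key};
    }
    for my $key (sort keys %$seen2) {
        push @out, ">$seen2->{$key}" unless exists $seen1->{$key};
    }
    return @out;
}

# Print the differences when two file names are given on the command line.
if (@ARGV == 2) {
    print "$_\n" for hash_diff(@ARGV);
}
```

Unlike the sort-based script, this version collapses duplicate lines within a file and reports all of file 1's unique lines before file 2's, so its output is not interleaved in sorted order.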