Americas

  • United States
sandra_henrystocker
Unix Dweeb

Nine ways to compare files on Unix

How-To
Apr 17, 20176 mins
Data CenterLinux

apple orange
Credit: Thinkstock

Sometimes you want to know if some files are different. Sometimes you want to how they’re different. Sometimes you might want to compare files that are compressed and sometimes you might want to compare executables. And, regardless of what you want to compare, you probably want to select the most convenient way to see those differences. The good news is that you have a lot more options than you probably imagine when you need to focus on file differences.

First: diff

The command most likely to come to mind for this task is diff. The diff command will show you the differences between two text files or tell you if two binaries are different, but it also has quite a few very useful options.

For text files, the diff command by default uses a format that shows the differences using and > characters to represent the first and second of the two files and designations like 1c1 or 8d7 to report these differences in a format that could be used by the patch command.

$ diff file1 file2
1c1
 0 top of file 2
3c3
 2 two tomatoes
6c6
 5
8d7

If you just happen to have the patch command on your system, you can compare the two files, save the diff output to a file, and then use that file to force the second file to be the same as the first. The only time you'd likely want to do something like this is if you were trying to update files on a number of systems (rather than simply replace them) as the differences might be very small. You could do it like this:

Create the differences file:

$ diff file1 file2 > diffs

Use the differences file to make the seocnd file just like the first:

$ patch -i diffs file2

At this point, you'd have two identical files. Your file2 would be just like your file1. You could then use the diffs file on any number of systems to update the targeted file.

For most of us, this use of diff is probably not something we'd do very often.

If you only want to know if the files are different, you can try a simpler approach.

$ diff -q file1 file2
Files file1 and file2 differ

Second: side-by-side diff

If you want to see the differences between two files, but not the instructions that patch could use, you might like diff's side-by-side view. Note that lines with differences include a |.

$ diff -y file1 file2
0 top of file one                         | 0 top of file 2
1 one                                       1 one
2                                         | 2 two tomatoes
3 three                                     3 three
4                                           4
5 five bananas                            | 5
6                                           6
7 the end                                 

Some of the most useful diff options are these:

-b	ignore white space differences
-B	ignore blank lines
-w	ignore all white space
-i	ignore case differences
-y	side-by-side

Third: top and bottom diff

The output from the diff command with the -c option displays the files sequentially with the different lines marked by exclamation points.

$ diff -c file1 file2
*** file1       2017-04-17 11:16:31.687059543 -0400
--- file2       2017-04-17 12:02:44.194623979 -0400
***************
*** 1,8 ****
! 0 top of file one
  1 one
! 2
  3 three
  4
! 5 five bananas
  6
- 7 the end
--- 1,7 ----
! 0 top of file 2
  1 one
! 2 two tomatoes
  3 three
  4
! 5
  6

Fourth: comparing binary files with diff

You can also use the diff command to compare binary files, but it will only tell you if the files are different unless you use the -s option.

$ diff /usr/bin/diff /usr/bin/cmp
Binary files /usr/bin/diff and /usr/bin/cmp differ
$ diff /usr/bin/diff mydiff
$ diff -s /usr/bin/diff mydiff
Files /usr/bin/diff and mydiff are identical

Fifth: cmp

The cmp command tells you if two files are different and where the first difference appears.

Here's an example comparing text files:

$ cmp file1 file2
file1 file2 differ: byte 15, line 1

Here's an example comparing binary files:

$ cmp /usr/bin/diff /usr/bin/cmp
/usr/bin/diff /usr/bin/cmp differ: byte 25, line 1

To illustrate just why we're getting this particular response with the second command above, you can use the od command to view the top of each of these files. What you see below is the heading that is assigned to binary files. The content that represents the coding of these executables begins at the 25th byte.

$ od -bc /usr/bin/diff | head -4
0000000 177 105 114 106 002 001 001 000 000 000 000 000 000 000 000 000
        177   E   L   F 002 001 001  
sandra_henrystocker
Unix Dweeb

Sandra Henry-Stocker has been administering Unix systems for more than 30 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She lives in the mountains in Virginia where, when not working with or writing about Unix, she's chasing the bears away from her bird feeders.

The opinions expressed in this blog are those of Sandra Henry-Stocker and do not necessarily represent those of IDG Communications, Inc., its parent, subsidiary or affiliated companies.