Verify a burned CD/DVD image on Linux

Recently, much to my surprise, I found out that some critical data I had burned to DVD could not be read back from the medium. A closer look showed that many of the files on the DVD had different sizes from the ones on the hard disk. That is when I started to think seriously about verifying my critical backups.

I still don’t know what caused that faulty copy, but the incident made me add a data verification step to the backup process, at least for my server backups. I am aware that some graphical burning programs can verify a copy, but I needed a CLI method, and that is what this small article describes.

It seems that there are two ways to verify the burned data:

  1. Make a list of the files on the hard disk and get their md5 sums, then do the same for the burned files and finally compare the md5sums for each file.
  2. Move the files to a central directory, make an ISO image, burn the image and finally compare the on-disk image’s md5 sum with the md5 sum of the written data on the medium.
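The first method can be sketched roughly as follows. This is only a sketch under a few assumptions: verify_tree is a hypothetical helper name, GNU find and md5sum are available, and the burned disc is already mounted somewhere.

```shell
# verify_tree SRC DST  (hypothetical helper, not part of any package)
# Checksums every regular file under SRC, then re-checks the same
# relative paths under DST (e.g. the mount point of the burned disc).
verify_tree() {
    src=$1
    dst=$2
    list=$(mktemp) || return 1

    # Record checksums with paths relative to SRC.
    if ! ( cd "$src" && find . -type f -exec md5sum {} + ) > "$list"; then
        rm -f "$list"
        return 1
    fi

    # Verify the same relative paths under DST; --quiet prints failures only.
    ( cd "$dst" && md5sum --quiet -c "$list" )
    status=$?

    rm -f "$list"
    return "$status"
}
```

It could be called as `verify_tree /path/to/backups /mnt/dvd`, where /mnt/dvd is just an example mount point. Note that it only checks files that exist in the source tree; extra files on the disc go unnoticed.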

I tend to stick to the second method because it is less painful, even though it requires moving the files into a single directory in order to create the ISO image. Be advised, though, that this method can give false negatives (a good copy reported as bad) if it’s not done the right way.

So, assuming that all backups have been moved to a directory, create the ISO image with mkisofs:

$ mkisofs -J -l -R -V "Sep2006" -o sep2006.iso /path/to/backups/

The sep2006.iso image gets created and mkisofs (or genisoimage) prints something like the following:

Total translation table size: 0
Total rockridge attributes bytes: 39644
Total directory bytes: 71018
Path table size(bytes): 142
Max brk space used 42000
169383 extents written (330 MB)

The critical thing to note is on the last line: the number of extents (blocks) that have been written to the image.
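If the mkisofs output is captured, the number can be extracted automatically instead of being copied by hand. A minimal sketch, assuming the summary line always has the form shown above; extents_written is a hypothetical helper name.

```shell
# extents_written: read mkisofs/genisoimage output on stdin and print
# the number from the "... extents written (...)" summary line.
extents_written() {
    awk '/extents written/ { print $1 }'
}

# Example using two of the summary lines shown above:
printf '%s\n' 'Max brk space used 42000' \
              '169383 extents written (330 MB)' | extents_written
```

With a real run, the messages of mkisofs would have to be piped into the function. Which stream they arrive on (probably stderr, since the image itself can be sent to stdout) is an assumption worth checking on your own system.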

Use any program you like to write the image to a CD/DVD. For example, to write it to a DVD using growisofs, the command would be:

$ growisofs -Z /dev/hdc=sep2006.iso

Although growisofs accepts mkisofs (or genisoimage) options, which makes it easy to write the files directly to the DVD with the desired extensions, the image-creation stage is still necessary in order to calculate the md5 sum of the on-disk data easily. I bet there is a way to pass the directory contents through md5sum with a long Bash one-liner, but I haven’t tried it.

Also note that growisofs, once it has finished writing, prints the number of blocks it has written to the DVD. For example, in my test it was:

builtin_dd: 169392*2KB out @ average 4.4x1385KBps

This is not the number of blocks in the ISO image. growisofs also writes some other data to the medium, e.g. when closing the session, so do not take this number into account. (In this test the figure, 169392, is the image size rounded up to the next multiple of 16 blocks, which looks like padding to a 32 KB boundary.) Use either the number from the mkisofs (or genisoimage) output, or calculate the number of extents (blocks of 2048 bytes) yourself with ls and awk:

$ echo $(( $(ls -l sep2006.iso | awk '{ print $5 }') / 2048 ))
169383

The above divides the image’s filesize by 2048 and prints the result.
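The same calculation can be written with a sanity check added. This is a sketch only: iso_blocks is a hypothetical helper name, and it assumes that a well-formed ISO image is always a whole number of 2048-byte blocks, so that a remainder suggests a truncated or damaged file.

```shell
# iso_blocks IMAGE  (hypothetical helper)
# Print the number of 2048-byte blocks in IMAGE, or fail if the file
# size is not an exact multiple of 2048 bytes.
iso_blocks() {
    size=$(wc -c < "$1") || return 1

    if [ $(( size % 2048 )) -ne 0 ]; then
        echo "warning: $1 is not a multiple of 2048 bytes" >&2
        return 1
    fi

    echo $(( size / 2048 ))
}
```

For the image used in this article, `iso_blocks sep2006.iso` should print the same number as the ls and awk command.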

So, getting back to the md5 sum calculations, you can get the on-disk image’s md5 sum with the following:

$ cat sep2006.iso | md5sum
cc363de222ba6fe7455258e72b6c26ca  -

The final step is to calculate the md5 sum of the burned data. The dd command can be used to read the DVD, but the crucial part is that dd must read exactly as much data as the size of the ISO image. Otherwise, it is almost certain that you’ll get a false negative about the quality of the copy.

Data is written to a CD/DVD in blocks of 2048 bytes. The number of written blocks is the number of extents that mkisofs (or genisoimage) printed when creating the ISO image. The following command instructs dd to read 169383 blocks of 2048 bytes each and pipe them to md5sum:

$ dd if=/dev/hdc bs=2048 count=169383 | md5sum
169383+0 records in
169383+0 records out
cc363de222ba6fe7455258e72b6c26ca  -

The two md5 sums are identical, which means that the DVD copy is good.
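The two steps can be wrapped in one small function so that the block count never has to be typed by hand. This is a sketch under stated assumptions: verify_burn is a hypothetical helper name, and the block count is derived from the image's file size rather than from the mkisofs output.

```shell
# verify_burn IMAGE DEVICE  (hypothetical helper)
# Compare the md5 sum of IMAGE with the md5 sum of the first
# size(IMAGE)/2048 blocks read from DEVICE.
verify_burn() {
    image=$1
    device=$2

    # Number of 2048-byte blocks in the image.
    blocks=$(( $(wc -c < "$image") / 2048 ))

    # Checksum of the image on the hard disk.
    want=$(md5sum < "$image" | cut -d' ' -f1)

    # Checksum of exactly the same number of blocks read back from the disc.
    got=$(dd if="$device" bs=2048 count="$blocks" 2>/dev/null | md5sum | cut -d' ' -f1)

    if [ "$want" = "$got" ]; then
        echo "OK $want"
    else
        echo "MISMATCH image=$want disc=$got" >&2
        return 1
    fi
}
```

With the names used in this article it would be called as `verify_burn sep2006.iso /dev/hdc`. A read error on the disc makes dd deliver less data, which also ends in a mismatch.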

A mistake I had been making, before I started taking the number of blocks into account, was calculating the DVD’s md5 sum with the following:

$ dd if=/dev/hdc | md5sum

This is the wrong approach because, apart from the ISO image data, it also feeds md5sum with whatever other data has been written to the medium, e.g. when closing the session. I can’t say exactly what that extra data is, but the fact remains that this method does not produce a checksum that can be compared with the image’s.
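The effect is easy to reproduce without a disc. In the sketch below, ordinary temporary files stand in for the image and the medium, and 9 zero-filled blocks are appended to mimic trailing data. That is an assumption made for illustration; the extra data on a real disc may look different.

```shell
img=$(mktemp)    # stands in for the ISO image
disc=$(mktemp)   # stands in for the burned medium

# A 16-block "image", copied to the "medium" and followed by 9 extra blocks.
dd if=/dev/urandom of="$img" bs=2048 count=16 2>/dev/null
cat "$img" > "$disc"
dd if=/dev/zero bs=2048 count=9 2>/dev/null >> "$disc"

ref=$(md5sum < "$img" | cut -d' ' -f1)
whole=$(md5sum < "$disc" | cut -d' ' -f1)
limited=$(dd if="$disc" bs=2048 count=16 2>/dev/null | md5sum | cut -d' ' -f1)

# Reading everything does not match; reading only 16 blocks does.
[ "$whole" = "$ref" ]   && echo "whole medium: match"    || echo "whole medium: MISMATCH"
[ "$limited" = "$ref" ] && echo "first 16 blocks: match" || echo "first 16 blocks: MISMATCH"

rm -f "$img" "$disc"
```

Only the read limited to the image's own block count reproduces the image's checksum, even though the first 16 blocks on the "medium" are a perfect copy.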

The procedure described above may seem a bit complicated, but it’s not. This small article was written at a very fast pace, but I hope the procedure is clear.

Verify a burned CD/DVD image on Linux by George Notaras is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright © 2006 - Some Rights Reserved

About George Notaras

George Notaras is the editor of the G-Loaded Journal, a technical blog about Free and Open-Source Software. George, among other things, is an enthusiast self-taught GNU/Linux system administrator. He has created this web site to share the IT knowledge and experience he has gained over the years with other people. George primarily uses CentOS and Fedora. He has also developed some open-source software projects in his spare time.

19 responses on “Verify a burned CD/DVD image on Linux”

  1. linportal

    This is an interesting approach. But, here’s another one that I’m using:

    Mount freshly burned DVD and run:

    diff -urN /ondisk /dvd
    echo $?
    dmesg

    diff will give you info if all the files match content, doublecheck with $? (it will be 0 if everything’s OK) and triplecheck with dmesg (see if you have any bad sector errors).
    This can work for iso images too if you mount them via the loop device.
    Finally, if you check video DVDs like this, mount them as a UDF filesystem so the files on them are lowercase.

    Beside all this, there’s a neat tool called cdck that reports some interesting statistics about your media.

  2. George Notaras (post author)

    This is very interesting. I never thought of using diff for this, but I’ll also try the method you mentioned next time.
    I had tried cdck in the past. It’s a good tool for checking optical media, but, IIRC, it needed a significant amount of time to check a single disc. I ended up with md5sum because it can be computed fast and generally, if the two sums don’t match, gives an idea that something might have gone wrong.

    Thanks for this info.

  3. Algis Kabaila

    This is a great article! I must confess that after reading it I have settled on the simpler diff procedure, outlined in the comment by linportal.

    Why? I am a really old man and much as I like computers in general and Linux in particular, my time is now limited “by external factors”, so I have not learned some of the finer points of commands, invokable only in CLI. Also, some of the lines of code in the article seem unnecessarily complex (I may well be totally wrong in this…) For instance, the author uses the following command:

    $cat sep2006.iso | md5sum

    Would it not be the same to do in one step without piping, viz
    $md5sum sep2006.iso

    This is not criticism, just would like to know… Great article, loved it!
    OldAl.

  4. NotME

    linportal: your method only works if the DVD/CD contains simply data that can be mounted. It won’t work for audio tracks or extra data like redundant recovery information (look at dvdisaster)

  5. George Notaras (post author)

    Hi Al,
    You’re right about the md5sum usage. I have got used to using the “cat” command so often, because I usually pipe the data to a little “pipe-monitoring” program, called pv, in order to have some kind of progress indicator about the whole operation, since md5sum does not have one. So, the actual command was (for example):

    $ cat knoppix.iso | pv | md5sum

    But, when I was writing the post, I just stripped the “pv” part off the command line so that I wouldn’t have to provide any explanation about it, since it’s not very popular. But, I liked your feedback because it gave me the chance to write about the pipe-viewer, which I find very useful in many occasions.
    Also – just to criticize myself a bit :) – the iso_size/2048 calculation could have taken place inside the awk statement. I’ll correct these things when I have some free time, because, in the way I have written them, they add unnecessary complexity to the whole operation.

    Regards,
    GNot

  6. George Notaras (post author)

    NotME: very useful information. btw, dvdisaster is an excellent program. I just now noticed that it has a CLI interface too.

  7. Jetero

    I was searching for a simple way to check the written CD image when I said to myself – this should be really easy because Linux people like things to be straight forward and as simple as they can. Tried:

    $jetero@jetero-desktop:/etc$ md5sum /media/cdrom
    md5sum: /media/cdrom: Is a directory

    then:
    $jetero@jetero-desktop:/etc$ md5sum /media/cdrom0
    md5sum: /media/cdrom0: Is a directory

    I thought I had made some small stupid mistake or missed an argument. I kept searching but everything I found seemed so complicated compared to the everyday task I wanted to accomplish. Just before I started this adventure I was removing the automount option in fstab. I remembered that in fstab my cd was /dev/hda. Then I tried:

    $jetero@jetero-desktop:/etc$ md5sum /dev/hda
    b950a4d7cf3151e5f213843e2ad77fe3 /dev/hda

    Worked perfectly well and even got the same sum (ubuntu-6.10-desktop-i386.iso).

    I am new to Linux, so please forgive me if I am totally wrong. The results above were obtained on Ubuntu 6.06.1
    I hope this will help some1 :)

  8. Hellfire

    Hello GNot,
    As far as I understand your approach, there is one major gap in your verification process: You verify the burned dvd against the on-disk-image but have no means to assure that this image is correct compared to the original data in your directory tree.
    So, in my opinion, linportal’s approach is safer because it compares the data on the dvd directly with the source data as a whole and not only by some hash value. I found no information about the exact method of comparing the contents of files used by diff but I assume it to be more exact than comparing the hashes (there is a -distinct- possibility of two files having the same hash).

  9. George Notaras (post author)

    Jetero,
    as mentioned in the article, md5sum /dev/hdX (where hdX is the cdrom device node) is the wrong approach. CD/DVD burning utilities write slightly more data on the medium than the image data, e.g. when closing sessions.

  10. George Notaras (post author)

    Hello Hellfire,
    This is correct. The on-hard-disk ISO image is not checked, but it is assumed that the hard disk itself and mkisofs are functioning correctly, so the image contains the exact data as it is in the directory tree. This is indeed a gap in the verification process, but I suppose that it must be extremely rare to create a bad ISO image with mkisofs.

  11. john

    Hi,

    I ended up here when trying to verify my dvd vs the downloaded debian iso file. GNot’s method makes sense to me because of the huge volume.

    In my case, growisofs apparently added 4096 bytes at the end of the dvd. growisofs has an option “-dvd-compat”; I wonder whether this may be the perfect solution.

  12. George Notaras (post author)

    Hi John,
    IIRC the -dvd-compat switch should be used when burning dvd video, so as to preserve compatibility with Hi-Fi players.

  13. Dan

    Regarding “cat FILE | md5sum” vs. “md5sum FILE”:
    I also use the first one, as it happens that some md5sum versions have a problem with the 2 GB file limit (basically not giving all needed options to the open() call). So given that DVD ISO images are mostly larger than 2 GB, one can bypass this error by letting cat do the open/reading and stream the result into md5sum.

  14. George Notaras (post author)

    Hi Dan. Thanks for this piece of info. I almost always use the “cat FILE | md5sum” form but I didn’t know that there were md5sum builds around without large file support.

  15. dee

    A handy tool for md5sums is md5deep ( http://md5deep.sourceforge.net/ )
    I usually generate quickly a list of md5sums with it and I include it in the medium. Not exactly what you are trying to achieve, but a handy utility nevertheless.

  16. Luc (Fr)

    Hi!
    I found another way of doing it here:
    http://wiki.linuxquestions.org/wiki/Md5sum

    To check the md5sum from a just burned dvd:

    $ md5sum /dev/dvd

    Trailing zeros and NULs at the end can change the MD5 hash. So to calculate the md5sum we need to:
    1) find the size of the ISO in bytes
    2) run dd with this exact size in bytes: dd if=/dev/dvd | head --bytes= | md5sum
    So for example:

    $ dd if=/dev/dvd | head --bytes=3621957632 | md5sum

  17. taygan

    Whoa, this is great! THANK YOU!

    I’ve been using this info a lot, it’s a well-used bookmark. I *LOVE* pv for watching the progress. cdck is excellent too, but I found I need dmesg *after* a cdck to check the bad sectors.

    Now just to combine everything with md5check from etree-scripts for those DVD-archives of my flac files..

    Now for a frontend or script that combines the md5sum, cdck and dmesg, uhhhh, I’ll put it on the project list :)

  18. George Notaras (post author)

    That would be a nice idea! Msg me to add a link whenever or if it happens :-)

    Also, thanks for your kind words. I am glad you have found this information helpful.