<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule"
>
<channel>
	<title>G-Loaded Journal &#187; Verification</title>
	<atom:link href="http://www.g-loaded.eu/tag/verification/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.g-loaded.eu</link>
	<description>An open-source software and technology related journal</description>
	<lastBuildDate>Mon, 05 Dec 2011 19:55:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/3.0/</creativeCommons:license>
		<item>
		<title>Cheap Biometrics &#8211; Use Keystroke Dynamics to Identify and Verify Users</title>
		<link>http://www.g-loaded.eu/2008/05/08/cheap-biometrics-use-keystroke-dynamics-to-identify-and-verify-users/</link>
		<comments>http://www.g-loaded.eu/2008/05/08/cheap-biometrics-use-keystroke-dynamics-to-identify-and-verify-users/#comments</comments>
		<pubDate>Thu, 08 May 2008 17:49:25 +0000</pubDate>
		<dc:creator>George Notaras</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[Authentication]]></category>
		<category><![CDATA[Biometrics]]></category>
		<category><![CDATA[Review]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Verification]]></category>
		<guid isPermaLink="false">http://www.g-loaded.eu/?p=485</guid>
		<description><![CDATA[You may have obtained my password, but you can&#8217;t type it like me! This could be the summary of the excellent article, titled Identify and verify users based on how they type by Nathan Harrington, which demonstrates how it is possible to enhance a computer system&#8217;s security by using a special algorithm which, in addition [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>You may have obtained my password, but you can&#8217;t type it like me!</p></blockquote>
<p>This could be the summary of the excellent article, titled <em><a href="http://www.ibm.com/developerworks/opensource/library/os-identify/?ca=dgr-lnxw02GDMTyping&#038;S_TACT=105AGX59">Identify and verify users based on how they type</a></em> by <em>Nathan Harrington</em>, which demonstrates how it is possible to enhance a computer system&#8217;s security by using a special algorithm which, in addition to the validity of the password, checks whether the keyboard buttons have been pressed/released in the user&#8217;s pre-recorded and unique way of typing that particular password. The author provides all the necessary code in order to add this biometric technique to the <a href="http://www.gnome.org/projects/gdm/">GNOME Display Manager</a> (<em>GDM</em>).</p>
<p>It may sound like a relatively easy implementation, but, taking into account that it is almost impossible for our neuro-myo-skeletal system to produce two identical patterns while performing a complicated action such as typing, user identification and verification using this biometric technique becomes a real challenge. The article author notes:</p>
<blockquote><p>As a biometric, keystroke dynamics are relatively imprecise. Unlike Iris scans or fingerprints, even the most highly repetitive individuals make subtle variations in their typing patterns. The challenge in using keystroke dynamics in an authentication or verification context is to discern acceptable variations from incorrect credentials.</p></blockquote>
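<p>To make the idea of "acceptable variations" concrete, here is a minimal sketch in Python. It is not the algorithm from Harrington's article: the function names, the sample timings and the 40 ms tolerance are all assumptions chosen for illustration. It averages a few recordings of the intervals between key presses into a profile and accepts a new attempt only if its mean deviation from that profile is small.</p>

```python
from statistics import mean

def timing_profile(samples):
    """Average several recordings of inter-key intervals (in ms) into one profile."""
    return [mean(column) for column in zip(*samples)]

def matches_profile(profile, attempt, tolerance_ms=40.0):
    """Accept the attempt if its mean absolute deviation stays within tolerance."""
    if len(profile) != len(attempt):
        return False
    deviation = mean(abs(p - a) for p, a in zip(profile, attempt))
    return tolerance_ms >= deviation

# Three hypothetical enrolment recordings of the same password.
enrolled = [[120, 95, 210, 80], [130, 90, 200, 85], [125, 100, 205, 75]]
profile = timing_profile(enrolled)

print(matches_profile(profile, [128, 92, 207, 82]))  # rhythm close to the profile
print(matches_profile(profile, [60, 200, 90, 300]))  # very different rhythm
```

<p>Lowering <code>tolerance_ms</code> makes the check stricter, which is exactly where the trade-off between impostors getting through and legitimate users being rejected shows up.</p>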
<p>One could say that the pros and cons of such an implementation are quite obvious.</p>
<ul>
<li>Unlike iris or fingerprint scanners, this biometric does not require any special hardware equipment. It is just an algorithm that can be compiled and run on any computer system. This generally means: <em>cheap biometric methods</em>.</li>
<li>It can be added on top of the currently used authentication systems without requiring any extra action from the users at all.</li>
<li>The combination of such a biometric method with the existing user/password authentication scheme greatly enhances security.</li>
</ul>
<p>On the other hand:</p>
<ul>
<li>It is extremely easy for anyone who has privileged access to the computer system to record each user&#8217;s typing pattern. This could be done by using keylogging software or, worse, using a specially crafted keyboard in case of physical access to the system.</li>
<li>It requires that the users are actually familiar with the keyboard, at least to the extent that they are able to repeat the password typing pattern without big variations.</li>
<li>Depending on the strictness of the algorithm, false negatives might occur.</li>
</ul>
<p>I didn&#8217;t have the necessary free time to patch GDM with the provided code, compile it and test the algorithm&#8217;s effectiveness.</p>
<p>I found this article very interesting mainly because it is proof that we have <em>just scratched the surface of biometrics</em>. The simplicity of the concept behind this biometric method indicates that there are many new human identification techniques to be discovered and implemented (not only in computer systems), possibly at a cost in terms of personal freedom as a result of misuse of such technologies. But I guess this has always been the problem with technological progress, so, once again, we will have to deal with and resolve any issues that may arise in the future.</p>
<div class="cc-block"><em><a href="http://www.g-loaded.eu/2008/05/08/cheap-biometrics-use-keystroke-dynamics-to-identify-and-verify-users/">Cheap Biometrics &#8211; Use Keystroke Dynamics to Identify and Verify Users</a></em>, unless otherwise expressly stated, is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/">Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License</a>. Terms and conditions beyond the scope of this license may be available at <a href="http://www.g-loaded.eu/about/disclaimer-and-license/">www.g-loaded.eu</a>.</div>
<h4>Related Articles</h4>
<ul><li><a href="http://www.g-loaded.eu/2007/12/01/veritar-verify-checksums-of-files-within-a-tar-archive/" rel="bookmark">VeriTAR &#8211; Verify checksums of files within a TAR archive</a></li>
<li><a href="http://www.g-loaded.eu/2007/02/12/lock-out-a-user-after-n-failed-login-attempts/" rel="bookmark">Lock out a user after N failed login attempts</a></li>
<li><a href="http://www.g-loaded.eu/2006/10/07/verify-a-burned-cddvd-image-on-linux/" rel="bookmark">Verify a burned CD/DVD image on Linux</a></li>
<li><a href="http://www.g-loaded.eu/2007/10/19/zim-a-desktop-wiki/" rel="bookmark">Zim &#8211; a Desktop Wiki</a></li>
<li><a href="http://www.g-loaded.eu/2009/05/07/descramble-passwords-from-gftp-bookmarks-using-python/" rel="bookmark">Descramble Passwords from gftp Bookmarks using Python</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://www.g-loaded.eu/2008/05/08/cheap-biometrics-use-keystroke-dynamics-to-identify-and-verify-users/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/3.0/</creativeCommons:license>
	</item>
		<item>
		<title>VeriTAR &#8211; Verify checksums of files within a TAR archive</title>
		<link>http://www.g-loaded.eu/2007/12/01/veritar-verify-checksums-of-files-within-a-tar-archive/</link>
		<comments>http://www.g-loaded.eu/2007/12/01/veritar-verify-checksums-of-files-within-a-tar-archive/#comments</comments>
		<pubDate>Sat, 01 Dec 2007 14:24:44 +0000</pubDate>
		<dc:creator>George Notaras</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[Filesystem]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[TAR]]></category>
		<category><![CDATA[Verification]]></category>
		<guid isPermaLink="false">http://www.g-loaded.eu/2007/12/01/veritar-verify-checksums-of-files-within-a-tar-archive/</guid>
		<description><![CDATA[In my opinion, the biggest problem of the tar format (&#8216;ustar&#8216;) is that it does not store the checksums of the files it contains. So, in order to be able to verify the contents of the tar archive, you either need to keep the original data on the hard drive and compare the archive contents [...]]]></description>
			<content:encoded><![CDATA[<p>In my opinion, the biggest problem of the <a href="http://en.wikipedia.org/wiki/Tar_(file_format)#USTAR_format">tar format</a> (&#8216;<em>ustar</em>&#8216;) is that it does not store the checksums of the files it contains. So, in order to be able to verify the contents of the tar archive, you either need to keep the original data on the hard drive and compare the archive contents against that data using the <code>-d</code> tar switch or keep the MD5 sums of the files in a separate document and also use an external program in order to check them against the calculated MD5 sums of the archived files. In this short post I introduce you to a method of creating tar archives and keeping the md5sums of the files at the same time and a utility, veritar, which can compare those md5 sums with the checksums of the contents of the archive in-place, without the need to extract.<br />
<span id="more-468"></span></p>
<h4>Creation of the TAR archive and the MD5 sums file</h4>
<p>In the following example it is assumed that the files to backup reside in the <code>myfiles/</code> subdirectory, the name of the tar archive will be <code>mybackup.tar</code> and the name of the file containing the md5sums will be <code>mybackup.md5</code>.</p>
<pre class="console">
$ tar -cvpf mybackup.tar myfiles/ \
    | xargs -I '{}' sh -c "test -f '{}' &#038;&#038; md5sum '{}'" \
    | tee mybackup.md5
</pre>
<p>Some notes:</p>
<ul>
<li>You can use any tar switch for the creation of the archive except <strong>-C</strong>. If you need to change to another directory, do it using <strong>cd</strong> or else no md5 sums will be recorded.</li>
<li>Make sure that you include the <strong>-v</strong> (<strong>&#8211;verbose</strong>) switch when invoking tar, as the paths need to be printed to stdout in order to be processed by <strong>xargs</strong>.</li>
<li>In the xargs statement, the <strong>-I &#8216;{}&#8217;</strong> part indicates that the <code>'{}'</code> string will be replaced by the path that is passed to xargs through the pipe.</li>
<li>The <strong>sh -c &#8220;test -f &#8216;{}&#8217; &#038;&#038; md5sum &#8216;{}&#8217;&#8221;</strong> part does two things: it tests whether the path (<code>'{}'</code>) is a regular file and, if so, calculates its md5 sum.</li>
<li>In the last part, <strong>tee</strong> is used in order to print the md5sum to the stdout and also to the <code>mybackup.md5</code> file.</li>
</ul>
<p>When this operation ends, you will end up with two files: <strong>mybackup.tar</strong> and <strong>mybackup.md5</strong>.</p>
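<p>For readers who prefer to stay in Python, the same two files could be produced roughly as follows. This is only a sketch under assumptions: <code>create_tar_with_md5</code> and <code>md5_of_file</code> are hypothetical helper names, not part of any existing tool, and the paths are written in the same <code>digest&nbsp;&nbsp;path</code> layout that <code>md5sum</code> prints.</p>

```python
import hashlib
import os
import tarfile

def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def create_tar_with_md5(source_dir, tar_path, md5_path):
    """Archive source_dir and write one md5sum-style line per regular file."""
    with tarfile.open(tar_path, "w") as archive, open(md5_path, "w") as sums:
        archive.add(source_dir)
        for root, _dirs, files in os.walk(source_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                # Mirror the "test -f" step, but skip symbolic links as well.
                if os.path.isfile(path) and not os.path.islink(path):
                    sums.write("%s  %s\n" % (md5_of_file(path), path))
```

<p>Calling <code>create_tar_with_md5("myfiles", "mybackup.tar", "mybackup.md5")</code> from the parent directory keeps the recorded paths relative, so they match the member names stored in the archive.</p>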
<p><strong>Special thanks</strong> to:</p>
<p> <strong>*</strong> <em>Anvil</em> for suggesting the <code>bash -c "...test goes here..."</code> approach.<br />
 <strong>*</strong> <em><a href="http://keramida.wordpress.com/">Giorgos Keramidas</a></em> for the improvement he suggested, so that the md5 sum calculation is not limited to regular files only:</p>
<pre class="codesnp">sh -c "test -d '{}' || md5sum '{}'"</pre>
<p>VeriTAR verifies the md5 sums of regular files only, so whichever of the two tests you use when creating the TAR archive, the result is still fine.</p>
<h4>VeriTAR &#8211; Tar archive verification</h4>
<p><strong>VeriTAR</strong> [<code>Veri(fy)TAR</code>] is a command-line utility that verifies the md5 sums of files within a tar archive. Due to the limitations of the tar (&#8216;<code>ustar</code>&#8216;) format, the md5 sums are retrieved from a separate file and are checked against the md5 sums of the files within the tar archive. The process takes place without actually extracting the files.</p>
<p>It works with corrupted tar archives. The program carries on to the next file within the archive, skipping the damaged parts. At the moment, this relies on the internal functions of Python&#8217;s tarfile module.</p>
<p>VeriTAR is written in Python.</p>
<p>Works with compressed TAR archives (gzip or bz2).</p>
<ul>
<li><a href="http://www.codetrax.org/projects/veritar/wiki">VeriTAR Development Website and Bug Tracking</a></li>
<li><a href="http://www.codetrax.org/projects/veritar/files">Downloads</a></li>
</ul>
<p>Provided that you have used the method above (or any other method) in order to create a file with the md5 sums together with the tar archive, you can easily verify the contents of the archive with veritar.</p>
<pre class="console">
$ veritar mybackup.tar mybackup.md5
</pre>
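<p>The core idea, hashing each archived file straight from the archive and comparing it with the recorded sum, can be sketched with Python's standard <code>tarfile</code> and <code>hashlib</code> modules. This is not VeriTAR's actual code: <code>read_md5_file</code> and <code>verify_tar</code> are hypothetical names, and unlike VeriTAR the sketch makes no attempt to carry on past damaged parts of an archive.</p>

```python
import hashlib
import tarfile

def read_md5_file(md5_path):
    """Parse md5sum-style lines ('digest  path') into a {path: digest} dict."""
    expected = {}
    with open(md5_path) as sums:
        for line in sums:
            digest, _sep, path = line.rstrip("\n").partition("  ")
            if path:
                expected[path] = digest
    return expected

def verify_tar(tar_path, md5_path):
    """Return the member paths whose MD5 differs from, or is missing in, md5_path."""
    expected = read_md5_file(md5_path)
    failed = []
    # "r:*" lets tarfile detect gzip or bz2 compression transparently.
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            if not member.isfile():
                continue
            digest = hashlib.md5()
            stream = archive.extractfile(member)
            for chunk in iter(lambda: stream.read(65536), b""):
                digest.update(chunk)
            if expected.get(member.name) != digest.hexdigest():
                failed.append(member.name)
    return failed
```

<p>An empty list returned by <code>verify_tar("mybackup.tar", "mybackup.md5")</code> would mean that every regular file in the archive matched its recorded sum; nothing is written to disk in the process.</p>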
<p>Please note that veritar&#8217;s output and command-line switches need some work, but for now it does the job.</p>
<p>Veritar is released under the <a href="http://www.codetrax.org/licenses/ApacheLicenseV2">Apache License version 2</a>.</p>
<p>It is completely unsupported, but you can still get community support at our software forums. This is also the place where you can inform me about any bugs.</p>
<h5>Known issues</h5>
<ol>
<li>Multi-volume tar archives are not supported at the moment</li>
<li>Tar archives in which the metadata of the first archived file has been corrupted cannot be processed due to a limitation in the tarfile Python module at the time of writing</li>
<li>Although the checksum of any algorithm, <strong>md5</strong>, <strong>sha1</strong>, <strong>crc</strong>(<strong>crc32</strong>), could be used, the current alpha version is not very flexible.</li>
<li>It may crash on damaged archives on older Python versions.</li>
</ol>
<div class="cc-block"><em><a href="http://www.g-loaded.eu/2007/12/01/veritar-verify-checksums-of-files-within-a-tar-archive/">VeriTAR &#8211; Verify checksums of files within a TAR archive</a></em>, unless otherwise expressly stated, is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/">Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License</a>. Terms and conditions beyond the scope of this license may be available at <a href="http://www.g-loaded.eu/about/disclaimer-and-license/">www.g-loaded.eu</a>.</div>
<h4>Related Articles</h4>
<ul><li><a href="http://www.g-loaded.eu/2006/10/07/verify-a-burned-cddvd-image-on-linux/" rel="bookmark">Verify a burned CD/DVD image on Linux</a></li>
<li><a href="http://www.g-loaded.eu/2007/12/01/choosing-a-format-for-data-backups-tar-vs-cpio/" rel="bookmark">Choosing a format for data backups &#8211; tar vs cpio</a></li>
<li><a href="http://www.g-loaded.eu/2008/01/28/how-to-extract-rpm-or-deb-packages/" rel="bookmark">How to extract RPM or DEB packages</a></li>
<li><a href="http://www.g-loaded.eu/2007/02/25/error-when-using-old-runbin-installers-under-linux/" rel="bookmark">Error when using old run/bin installers under Linux</a></li>
<li><a href="http://www.g-loaded.eu/2008/05/08/cheap-biometrics-use-keystroke-dynamics-to-identify-and-verify-users/" rel="bookmark">Cheap Biometrics &#8211; Use Keystroke Dynamics to Identify and Verify Users</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://www.g-loaded.eu/2007/12/01/veritar-verify-checksums-of-files-within-a-tar-archive/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/3.0/</creativeCommons:license>
	</item>
		<item>
		<title>Choosing a format for data backups &#8211; tar vs cpio</title>
		<link>http://www.g-loaded.eu/2007/12/01/choosing-a-format-for-data-backups-tar-vs-cpio/</link>
		<comments>http://www.g-loaded.eu/2007/12/01/choosing-a-format-for-data-backups-tar-vs-cpio/#comments</comments>
		<pubDate>Sat, 01 Dec 2007 02:00:02 +0000</pubDate>
		<dc:creator>George Notaras</dc:creator>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Archives]]></category>
		<category><![CDATA[Backup]]></category>
		<category><![CDATA[Comparison]]></category>
		<category><![CDATA[Filesystem]]></category>
		<category><![CDATA[Storage]]></category>
		<category><![CDATA[Verification]]></category>
		<guid isPermaLink="false">http://www.g-loaded.eu/2007/12/01/choosing-a-format-for-data-backups-tar-vs-cpio/</guid>
		<description><![CDATA[A few days ago, I had decided to revise my data backup methods, so to be able to easily recover as much data as possible after a partial corruption of the medium, a DVD that is, on which the data has been stored. I should clarify that by corruption I by no means include the [...]]]></description>
			<content:encoded><![CDATA[<p>A few days ago, I decided to revise my data backup methods, so as to be able to easily recover as much data as possible after a partial corruption of the medium, a DVD that is, on which the data has been stored. I should clarify that by <em>corruption</em> I by no means include the possibility of mechanical damage of the medium.  After some research on the web, some questions on mailing lists and IRC channels, the quest ended with two formats to choose from, tar and cpio.<br />
<span id="more-467"></span><br />
What I need most when it comes to partial corruption of a backup is to be able to easily extract the healthy archived files. In order to decide which format to choose, I performed the following tests:</p>
<ul>
<li>Tests using tar:
<ol>
<li>Random 1-byte corruption.</li>
<li>Partial corruption of the metadata of one of the archived files.</li>
</ol>
</li>
<li>Tests with cpio:
<ol>
<li>Random 1-byte corruption.</li>
<li>Total corruption of the metadata of one of the archived files (same result with partial header corruption).</li>
</ol>
</li>
</ul>
<p>Information about the two formats was found at the following web pages:</p>
<ul>
<li><a href="http://leaf.dragonflybsd.org/cgi/web-man?command=cpio&#038;section=5">CPIO specification</a> (New <code>ASCII</code> format with <code>CRC</code> added)</li>
<li><a href="http://en.wikipedia.org/wiki/Tar_(file_format)#USTAR_format">TAR specification</a> (<code>USTAR</code> format)</li>
</ul>
<p>The following tests assume the directory and file structure outlined below:</p>
<pre class="codesnp">
WORKING_DIR/
          bak/
               1.pdf
               2.pdf
               3.pdf
</pre>
<p>Before continuing I would like to thank the folks at the <em>Linux-Greek-Users</em> mailing list for their advice and ideas. I had initially posted the following material in the LGU list.</p>
<h4>TAR Tests</h4>
<p>Testing corruption of tar archives.</p>
<h5>Random 1-byte corruption of the tar archive</h5>
<p>In this test one random byte of the archive was replaced by a zero (0).</p>
<pre class="console">
$ md5sum bak/*
11875e4e35a40686d81a37aa448aac2e  bak/1.pdf
30c63be455dbada1ffc985c5465d0723  bak/2.pdf
096dc1c77a2a0f4d9f953abd7264843f  bak/3.pdf
</pre>
<pre class="console">
$ tar -cvf bak.tar bak/
bak/
bak/2.pdf
bak/3.pdf
bak/1.pdf
</pre>
<pre class="console">
$ tar -dvf bak.tar bak/
bak/
bak/2.pdf
bak/3.pdf
bak/1.pdf
</pre>
<pre class="console">
$ python -c 'f=open("bak.tar","r+"); f.seek(12334); f.write("0"); f.close()'
</pre>
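<p>The one-liner above was run under Python 2. Under Python 3 the file would have to be opened in binary mode; a rough equivalent might look like the sketch below, where <code>overwrite_byte</code> is a hypothetical helper name. Like the original, it writes the character <code>0</code> (byte value 0x30) rather than a NUL byte.</p>

```python
def overwrite_byte(path, offset, value=b"0"):
    """Overwrite a single byte of the file at `offset` and return the old byte."""
    with open(path, "r+b") as handle:
        handle.seek(offset)
        original = handle.read(1)
        handle.seek(offset)
        handle.write(value)
    return original
```

<p>For this test the call would be <code>overwrite_byte("bak.tar", 12334)</code>; the returned byte makes it possible to restore the archive afterwards.</p>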
<pre class="console">
$ tar -dvf bak.tar bak/
bak/
bak/2.pdf
bak/3.pdf
bak/3.pdf: Contents differ
bak/1.pdf
</pre>
<pre class="console">
$ mkdir out
</pre>
<pre class="console">
$ tar -xvf bak.tar -C out/
bak/
bak/2.pdf
bak/3.pdf
bak/1.pdf
</pre>
<pre class="console">
$ md5sum out/bak/*
11875e4e35a40686d81a37aa448aac2e  out/bak/1.pdf
30c63be455dbada1ffc985c5465d0723  out/bak/2.pdf
2d0b2aa54047d6e97b45fbb43f8f1bdc  out/bak/3.pdf
</pre>
<p><strong>Conclusion</strong>: The md5 sums of the original 3.pdf and the extracted 3.pdf differ. The rest of the files have been extracted accurately.</p>
<h5>Partial corruption of the metadata of one of the archived files</h5>
<p>In this test, 200 bytes of the total 500 bytes of metadata of the 2nd archived file are destroyed. Note that the 1st archived file is the directory <code>bak/</code>.</p>
<pre class="console">
$ md5sum bak/*
b0ec395ca8cb79f2ce98397ec0e00981  bak/1.pdf
fbe2f3f799579251682ee6de0e4d828d  bak/2.pdf
afb18f2dbbb43673c641691b458dbcce  bak/3.pdf
</pre>
<pre class="console">
$ tar -cvf bak.tar bak/
bak/
bak/2.pdf
bak/3.pdf
bak/1.pdf
</pre>
<pre class="console">
$ tar -dvf bak.tar bak/
bak/
bak/2.pdf
bak/3.pdf
bak/1.pdf
</pre>
<p>In the USTAR format, the header fields occupy 500 bytes of a 512-byte block. The tar <strong>magic string</strong> starts at offset 257 from the start of the header. In this test, as already mentioned, 200 bytes of the header are destroyed (offsets 200->400), which is why the session below seeks to 57 bytes before the magic string:</p>
<pre class="console">
$ python
Python 2.5.1 (r251:54863, Oct 30 2007, 13:54:11)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-33)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> magic = "ustar  \x00"
>>> f = open("bak.tar", "rb+")
>>> magic2_pos = f.read().find(magic, 258)
>>> meta2_start = magic2_pos - 57
>>> f.seek(meta2_start)
>>> f.write("0"*200)
>>> f.close()
>>>
</pre>
<pre class="console">
$ tar -dvf bak.tar bak/
bak/
tar: Skipping to next header
bak/3.pdf
bak/1.pdf
tar: Error exit delayed from previous errors
</pre>
<pre class="console">
$ mkdir out
</pre>
<pre class="console">
$ tar -xvf bak.tar -C out/
bak/
tar: Skipping to next header
bak/3.pdf
bak/1.pdf
tar: Error exit delayed from previous errors
</pre>
<pre class="console">
$ md5sum out/bak/*
b0ec395ca8cb79f2ce98397ec0e00981  out/bak/1.pdf
afb18f2dbbb43673c641691b458dbcce  out/bak/3.pdf
</pre>
<p><strong>Conclusion</strong>: Although the metadata of one of the archived files has been destroyed, tar has managed to successfully extract the rest of the files, even though they were located after the corrupted part of the archive. The success of the extraction is confirmed by comparing the extracted files&#8217; md5 sums with the checksums of the original files.</p>
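<p>What makes this kind of recovery possible is that every tar header sits on a 512-byte boundary and carries its own checksum, so a reader can walk the archive block by block until it finds a block that validates. The following is a minimal sketch of that idea and not tar's actual implementation; <code>header_is_valid</code> and <code>find_valid_headers</code> are hypothetical names. Because it tests every block, file data that happens to contain a complete tar header (a tar inside a tar, for instance) would also be reported.</p>

```python
BLOCK = 512

def header_is_valid(block):
    """Check a 512-byte tar header block against its stored checksum."""
    if len(block) != BLOCK or block[257:262] != b"ustar":
        return False
    stored = block[148:156].strip(b" \x00")
    try:
        expected = int(stored, 8)
    except ValueError:
        return False
    # The checksum is computed with the checksum field itself read as spaces.
    actual = sum(block[:148]) + 8 * ord(" ") + sum(block[156:])
    return actual == expected

def find_valid_headers(tar_path):
    """Return the offsets of every block that looks like an intact header."""
    offsets = []
    with open(tar_path, "rb") as archive:
        offset = 0
        while True:
            block = archive.read(BLOCK)
            if len(block) != BLOCK:
                break
            if header_is_valid(block):
                offsets.append(offset)
            offset += BLOCK
    return offsets
```

<p>Run against the damaged <code>bak.tar</code>, such a scan would be expected to report the headers of the intact members while omitting the one whose metadata was overwritten.</p>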
<h4>CPIO Tests</h4>
<p>Testing corruption of cpio archives.</p>
<h5>Random 1-byte corruption of the cpio archive</h5>
<p>In this test one random byte of the archive was replaced by a zero (0).</p>
<pre class="console">
$ md5sum bak/*
11875e4e35a40686d81a37aa448aac2e  bak/1.pdf
30c63be455dbada1ffc985c5465d0723  bak/2.pdf
096dc1c77a2a0f4d9f953abd7264843f  bak/3.pdf
</pre>
<pre class="console">
$ find bak/ | cpio -v -o -H crc > bak.cpio
bak/
bak/2.pdf
bak/3.pdf
bak/1.pdf
25919 blocks
</pre>
<pre class="console">
$ cpio -vi --only-verify-crc < bak.cpio
bak/
bak/2.pdf
bak/3.pdf
bak/1.pdf
25919 blocks
</pre>
<pre class="console">
$ python -c 'f=open("bak.cpio","r+"); f.seek(12334); f.write("0"); f.close()'
</pre>
<pre class="console">
$ cpio -v -i --only-verify-crc < bak.cpio
bak/
bak/2.pdf
cpio: bak/3.pdf: checksum error (0x2b7dbd48, should be 0x2b7dbda8)
bak/3.pdf
bak/1.pdf
25919 blocks
</pre>
<pre class="console">
$ mkdir out2
</pre>
<pre class="console">
$ cd out2/
</pre>
<pre class="console">
$ cpio -vid < ../bak.cpio
bak
bak/2.pdf
cpio: bak/3.pdf: checksum error (0x2b7dbd48, should be 0x2b7dbda8)
bak/3.pdf
bak/1.pdf
25919 blocks
</pre>
<pre class="console">
$ cd ..
</pre>
<pre class="console">
$ md5sum out2/bak/*
11875e4e35a40686d81a37aa448aac2e  out2/bak/1.pdf
30c63be455dbada1ffc985c5465d0723  out2/bak/2.pdf
cd9ea8e6298a42f44b59322b31e55958  out2/bak/3.pdf
</pre>
<p><strong>Conclusion</strong>: The md5 sums of the original 3.pdf and the extracted 3.pdf differ. The rest of the files have been extracted accurately.</p>
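<p>Despite its name, the per-file check value stored by the cpio <code>crc</code> format is not a true CRC but a plain sum of the file's bytes, truncated to 32 bits. A short sketch of that calculation follows; the function name and the sample bytes are made up for illustration.</p>

```python
def cpio_crc_checksum(data):
    """Sum of all bytes, truncated to 32 bits, as stored in 'crc' format headers."""
    return sum(data) % (2 ** 32)

# Hypothetical file content ending in a byte of value 0x90 ...
original = bytes([0x25, 0x50, 0x44, 0x46, 0x90])
# ... and the same content after that byte is overwritten with the character "0".
damaged = original[:-1] + b"0"

print(hex(cpio_crc_checksum(original) - cpio_crc_checksum(damaged)))  # 0x60

# A plain sum cannot tell reordered bytes apart.
print(cpio_crc_checksum(b"ab") == cpio_crc_checksum(b"ba"))  # True
```

<p>The two values in the error message above differ by 0x60 as well, which is consistent with a single byte of value 0x90 having been replaced by the character <code>0</code> (0x30). The second print illustrates why such a sum is a weaker check than an md5 sum.</p>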
<h5>Corruption of the metadata of one of the archived files</h5>
<p>In this test the metadata of one of the archived files is destroyed.</p>
<pre class="console">
$ md5sum bak/*
11875e4e35a40686d81a37aa448aac2e  bak/1.pdf
30c63be455dbada1ffc985c5465d0723  bak/2.pdf
096dc1c77a2a0f4d9f953abd7264843f  bak/3.pdf
</pre>
<pre class="console">
$ find bak/ | cpio -v -o -H crc > bak.cpio
bak/
bak/2.pdf
bak/3.pdf
bak/1.pdf
25919 blocks
</pre>
<pre class="console">
$ python
Python 2.5.1 (r251:54863, Oct 30 2007, 13:54:11)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-33)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> magic = "070702"
>>> f = open("bak.cpio", "r+")
>>> magic2_pos = f.read().find(magic, 1)
>>> f.seek(magic2_pos)
>>> metadata_length = magic2_pos + 6 + 13*8 + 4  # 4: part of the pathname
>>> f.write("0"*metadata_length)
>>> f.close()
>>>
</pre>
<pre class="console">
$ cpio -v -i --only-verify-crc < bak.cpio
bak/
cpio: premature end of file
</pre>
<pre class="console">
$ mkdir out3
</pre>
<pre class="console">
$ cd out3
</pre>
<pre class="console">
$ cpio -vid < ../bak.cpio
bak
cpio: premature end of file
</pre>
<p><strong>Conclusion</strong>: Neither verification nor extraction is possible. cpio (at least Fedora's version) does not have the ability to skip to a healthy header, so the operation ends prematurely. A recovery tool is required in order to recover the healthy files within the archive.</p>
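<p>A recovery tool for this format would essentially have to do what cpio does not: search the archive for the next plausible header. In the new ASCII formats a header is the 6-character magic (<code>070701</code>, or <code>070702</code> with checksums) followed by 13 fields of 8 hexadecimal digits each, 110 bytes in total. The sketch below only locates such headers; it is an illustration under those assumptions, <code>scan_newc_headers</code> is a hypothetical name, and file data that happens to contain the same pattern would produce false positives.</p>

```python
import re

FIELDS = ("ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
          "devmajor", "devminor", "rdevmajor", "rdevminor", "namesize", "check")
# Magic number followed by 13 fields of 8 hex digits (104 characters).
HEADER = re.compile(rb"07070[12]([0-9A-Fa-f]{104})")

def scan_newc_headers(data):
    """Yield (offset, name, fields) for every parsable new-ASCII header in data."""
    for match in HEADER.finditer(data):
        raw = match.group(1)
        fields = {key: int(raw[i * 8:i * 8 + 8], 16)
                  for i, key in enumerate(FIELDS)}
        name_start = match.end()
        name = data[name_start:name_start + fields["namesize"]].rstrip(b"\x00")
        yield match.start(), name.decode("utf-8", "replace"), fields
```

<p>Given the bytes of the damaged <code>bak.cpio</code>, iterating over <code>scan_newc_headers(data)</code> would list the entries whose headers survived, together with the <code>filesize</code> needed to cut their data out of the archive.</p>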
<h4>Conclusion</h4>
<p>Here follow the pros and cons (this is not a complete list) of each format:</p>
<p><strong>CPIO</strong><br />
+ per-file CRC checksum. The backed up data on the DVD can be verified in-place without the need of any 3rd party software.<br />
+ No limit for pathnames.</p>
<p>- when the cpio archive gets partially corrupted, as it can happen on a DVD, then the cpio program cannot skip the damaged files and move on to the next healthy archived file. The use of recovery software is needed.<br />
- you have to use the find command's tests in order to include/exclude files in/from the archive.<br />
- It cannot save extended attributes.</p>
<p><strong>TAR</strong><br />
+ Even if some part of the archive gets corrupted, the tar program can skip to the next healthy archived file and extract it. This is very important as it eliminates the need of the 3rd party recovery software.<br />
+ File and directory inclusions/exclusions are possible with command-line options and with file/dir lists read from a file.<br />
+ It can save extended attributes, but 3rd party software may not be able to read the archive correctly.</p>
<p>- No CRC checksum is saved, so checking the data in-place requires two things: to have kept the checksums of the archived files and to have an external program that can check those checksums against the archived data. If this is not possible, then keeping the data on the hard drive in addition to the backup is needed in order to compare them using tar's -d switch.<br />
- A pathname in the USTAR format is limited to 256 bytes: a 155-byte prefix field and a 100-byte name field, joined by a slash.</p>
<p>It is obvious that both formats and/or programs are incomplete. The pros of one are the cons of the other. This was rather a surprise.</p>
<p>My final choice was the <strong>tar</strong> format because I consider the fact that it does not need a 3rd party program to extract the data from a damaged archive a great advantage. I have also created a utility, <a href="http://www.g-loaded.eu/2007/12/01/veritar-verify-checksums-of-files-within-a-tar-archive/">Veritar</a>, that can verify the md5 sums of the files inside a tar archive against the md5 sums that have been kept in a separate file during the creation of the archive. More information in my upcoming post about <a href="http://www.g-loaded.eu/2007/12/01/veritar-verify-checksums-of-files-within-a-tar-archive/">tar crc/md5 verification</a>....</p>
<div class="cc-block"><em><a href="http://www.g-loaded.eu/2007/12/01/choosing-a-format-for-data-backups-tar-vs-cpio/">Choosing a format for data backups &#8211; tar vs cpio</a></em>, unless otherwise expressly stated, is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/3.0/">Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License</a>. Terms and conditions beyond the scope of this license may be available at <a href="http://www.g-loaded.eu/about/disclaimer-and-license/">www.g-loaded.eu</a>.</div>
<h4>Related Articles</h4>
<ul><li><a href="http://www.g-loaded.eu/2007/12/01/veritar-verify-checksums-of-files-within-a-tar-archive/" rel="bookmark">VeriTAR &#8211; Verify checksums of files within a TAR archive</a></li>
<li><a href="http://www.g-loaded.eu/2008/01/28/how-to-extract-rpm-or-deb-packages/" rel="bookmark">How to extract RPM or DEB packages</a></li>
<li><a href="http://www.g-loaded.eu/2009/01/22/effective-data-wiping-with-a-single-complete-overwrite/" rel="bookmark">Effective data wiping with a single complete overwrite</a></li>
<li><a href="http://www.g-loaded.eu/2006/12/08/more-data-recovery-tools/" rel="bookmark">More Data Recovery Tools</a></li>
<li><a href="http://www.g-loaded.eu/2010/02/27/regular-data-backups/" rel="bookmark">The importance of regular data backups</a></li></ul>]]></content:encoded>
			<wfw:commentRss>http://www.g-loaded.eu/2007/12/01/choosing-a-format-for-data-backups-tar-vs-cpio/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	<creativeCommons:license>http://creativecommons.org/licenses/by-nc-sa/3.0/</creativeCommons:license>
	</item>
	</channel>
</rss>

