TimeMachine does not correctly back up hardlinked files

Originator: fracai
Number: rdar://8005841
Date Originated: 19-May-2010 07:57 PM
Status: Duplicate/4858514 (Open)
Resolved:
Product: Mac OS X
Product Version: 10.6.3
Classification: Other Bug
Reproducible: Always
 
Summary:
When backing up hardlinked files, TimeMachine always creates independent data records for each file rather than one record for each set of hardlinks.



Steps to Reproduce:
Create a test directory, and within it, a test file and several hardlinks to that file.

$ mkdir ~/test
$ cd ~/test
$ dd if=/dev/urandom of=link0 count=1 bs=1m
$ ln link0 link1
$ ln link0 link2
$ ln link0 link3
$ ln link0 link4

Check the files to ensure that they are indeed hardlinks.

$ ls -lih
total 5.0M
28897309 -rw-r--r-- 5 arno staff 1.0M May 19 19:25 link0
28897309 -rw-r--r-- 5 arno staff 1.0M May 19 19:25 link1
28897309 -rw-r--r-- 5 arno staff 1.0M May 19 19:25 link2
28897309 -rw-r--r-- 5 arno staff 1.0M May 19 19:25 link3
28897309 -rw-r--r-- 5 arno staff 1.0M May 19 19:25 link4

$ du -hc link*
1.0M	link0
1.0M	total
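
The inode check above can also be scripted. The following is a minimal sketch, not part of the original report: the function name count_distinct_inodes is hypothetical, and it assumes file names without embedded newlines. It prints how many distinct inodes back the given names, so an intact set of hardlinks yields 1.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report).
# Prints the number of distinct inodes among the paths given as arguments.
# An intact set of hardlinks yields 1; independent copies yield one per file.
count_distinct_inodes() {
    # ls -di1 prints "<inode> <name>", one entry per line, without
    # descending into directories; keep only the inode column.
    ls -di1 -- "$@" | awk '{ print $1 }' | sort -u | wc -l | tr -d ' '
}
```

Run as `count_distinct_inodes ~/test/link*`, this should print 1 for the test directory created above.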

Initiate a TimeMachine backup (ensure that the test directory is not excluded from backups).



Expected Results:
The files backed up by TimeMachine should exactly mirror the original; hardlinks should be preserved.  This both saves space and correctly represents the original file structure.
Checking the files on the TimeMachine volume should produce the following output:

$ cd /Volumes/Backup\ of\ Computer/Backups.backupdb/Computer/Latest/MacHD/Users/arno/test/
$ ls -lih
total 5.0M
108469821 -rw-r--r-- 5 arno staff 1.0M May 19 19:25 link0
108469821 -rw-r--r-- 5 arno staff 1.0M May 19 19:25 link1
108469821 -rw-r--r-- 5 arno staff 1.0M May 19 19:25 link2
108469821 -rw-r--r-- 5 arno staff 1.0M May 19 19:25 link3
108469821 -rw-r--r-- 5 arno staff 1.0M May 19 19:25 link4

$ du -hc link*
1.0M	link0
1.0M	total



Actual Results:
The hardlinked files backed up by TimeMachine are copied independently, breaking the hardlink structure.  While no file data is lost, the file structure is not exactly mirrored.  The biggest harm is that TimeMachine backups end up taking more space than the original data.  For example, a 100 MB file with 10 hardlinked names occupies 100 MB on the original drive.  When backed up by TimeMachine, the hardlinks are not respected, so the files are stored as 10 independent copies and occupy 1 GB.  Restoring from this backup will also produce 1 GB of data.

The actual output of checking the data on the backup is as follows: 

$ cd /Volumes/Backup\ of\ Computer/Backups.backupdb/Computer/Latest/MacHD/Users/arno/test/
$ ls -lih
total 5.0M
108469821 -rw-r--r-- 1 arno staff 1.0M May 19 19:25 link0
108469822 -rw-r--r-- 1 arno staff 1.0M May 19 19:25 link1
108469823 -rw-r--r-- 1 arno staff 1.0M May 19 19:25 link2
108469824 -rw-r--r-- 1 arno staff 1.0M May 19 19:25 link3
108469825 -rw-r--r-- 1 arno staff 1.0M May 19 19:25 link4

$ du -hc link*
1.0M	link0
1.0M	link1
1.0M	link2
1.0M	link3
1.0M	link4
5.0M	total
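
The comparison between the original directory and its backup can be automated. The sketch below is an illustration only: the function name links_preserved is hypothetical, and it assumes a shell whose test builtin supports the -ef operator (bash, zsh, and dash do). It succeeds only if every pair of names that share an inode in the source directory also share one in the backup directory.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report).
# Usage: links_preserved SOURCE_DIR BACKUP_DIR
# Returns 0 if every pair of names hardlinked together in SOURCE_DIR is
# also hardlinked together in BACKUP_DIR, and 1 otherwise.
links_preserved() {
    lp_src=$1
    lp_bak=$2
    for lp_a in "$lp_src"/*; do
        for lp_b in "$lp_src"/*; do
            # "x -ef y" is true when both names refer to the same file
            # (same device and inode). It is false if either is missing,
            # so a file absent from the backup also counts as a failure.
            if [ "$lp_a" -ef "$lp_b" ] &&
               ! [ "$lp_bak/${lp_a##*/}" -ef "$lp_bak/${lp_b##*/}" ]; then
                return 1
            fi
        done
    done
    return 0
}
```

Given the listing above, calling it with ~/test and the corresponding directory on the backup volume would be expected to return 1, since each backed-up name has its own inode.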



Regression:
--



Notes:
Hardlinks cannot span volumes, but they should be re-created on the backup disk in order to appropriately mirror the original data.  No file data is lost by breaking the hardlinks, but restoring from this backup could cause inconsistencies where hardlinked files are expected.  Of course, re-creating the links may not be possible when only one of the hardlinked files is restored, but the link should at least be retained during the backup.
