NetBoot: diskless booting not reliable, incorrect Shadow file permissions (bootpd)

Originator: brunerd
Number: rdar://13937874    Date Originated: 5/20/2013
Status: Closed    Resolved:
Product: OS X Server    Product Version: 10.8.3/12D78
Classification: Serious Bug    Reproducible: Sometimes
 
Summary:
When attempting a diskless NetBoot with OS X Server (2.2.1), the Shadow file is placed on the client's local drive instead of on the server. Checking the server reveals incorrect permissions on the Shadow file. After an arbitrary number of client reboots (usually 1 or 2), the server corrects the permissions and the Shadow file is then kept on the server, which allows the local drive to be unmounted for repair or reinitializing.

Steps to Reproduce:
1. Set up NetBoot on OS X 10.8 with Server, using a NetBoot image set to diskless. (BTW, why did that option go away in the GUI unless you first edit the plist?!)
2. Boot up to that volume
3. Check Disk Utility to see what the mount point is for Macintosh HD (or run mount from Terminal)
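
For step 3, the check from Terminal can be scripted. The following is a minimal sketch, not part of the original report: the function name is hypothetical, and it assumes the usual OS X mount output format of "<device> on <mount point> (<options>)". It only matches /dev/ devices, so that a network mount at the same path, if any, is not counted.

```shell
#!/bin/bash
# Hypothetical helper: reads `mount` output on stdin and prints the device
# of every local (/dev/...) volume mounted at /private/var/netboot.
# Assumes lines of the form: <device> on <mount point> (<options>)
local_disks_at_netboot_mount() {
    sed -n 's|^\(/dev/[^ ]*\) on /private/var/netboot (.*$|\1|p'
}

# Usage on the NetBooted client:
#   mount | local_disks_at_netboot_mount
# Any output means a local disk is mounted at /private/var/netboot;
# no output means no local disk is mounted there.
```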

Expected Results:
Expected mount point for Macintosh HD is /Volumes/Macintosh HD

Actual Results:
Usually the mount point for Macintosh HD is /private/var/netboot, so Macintosh HD cannot be unmounted for disk utilities or re-imaging.

Checking the server before and after rebooting the client shows that the Shadow file initially has erroneous group permissions (rw-rw----), which are eventually removed.

Permissions on the server when the mount point is incorrect:
/Library/NetBoot/NetBootClients0/NetBoot147:
total 98304
-rw-rw----  1 netboot111  admin  50331648 May 20 09:15 Shadow


Permissions after several reboots of the client computer, when the mount point is correct:
/Library/NetBoot/NetBootClients0/NetBoot147:
total 98304
-rw-------  1 netboot111  admin  50331648 May 20 09:25 Shadow


Regression:
_Never_ had this issue with 10.6 Server
Issue started in 10.7 with Server

Notes:
I have followed ALL the steps in KB article TS4316 for NetBoot issues (turning off services, deleting /Library/NetBoot/NetBootClients0). Even when starting with a clean clients directory, the permissions are still set incorrectly at first, and only after several boot attempts are they corrected by the server.

Also puzzling: why do the folder names and the owner names not correspond? Otherwise I'd write a script that enumerates all the netboot users, creates folders with the correct permissions, and touches a Shadow file with the correct permissions as well, to try to mitigate this. However, since folder names and file owners do not match, this does not seem possible.
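
A narrower check is still possible without knowing which owner belongs to which folder, because it looks only at the mode bits. The sketch below is an assumption-laden illustration, not a tested workaround: the function names are hypothetical, and whether tightening the permissions by hand actually prevents the client from falling back to its local drive has not been verified.

```shell
#!/bin/bash
# Hypothetical helpers; run as root on the NetBoot server.

# Print every Shadow file under the given clients directory that still has
# the group-write bit set (e.g. rw-rw---- instead of rw-------).
list_group_writable_shadows() {
    local clients_dir="$1"
    find "$clients_dir" -type f -name Shadow -perm -020 -print
}

# Reset those files to owner-only read/write (rw-------).
# Untested assumption: this mirrors what the server eventually does itself.
tighten_shadow_permissions() {
    local clients_dir="$1"
    find "$clients_dir" -type f -name Shadow -perm -020 -exec chmod 600 {} +
}

# Usage:
#   list_group_writable_shadows /Library/NetBoot/NetBootClients0
#   tighten_shadow_permissions  /Library/NetBoot/NetBootClients0
```

Neither function touches ownership, so the mismatch between folder names and owner names does not get in the way.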

I also corrected the Bash parameter expansion syntax error in /private/etc/rc.netboot on the NetBoot image, but to no avail:
INCORRECT ===> NETBOOT_SHADOW=${NETBOOT_SHADOW:-NETWORK-}
CORRECT ===> NETBOOT_SHADOW=${NETBOOT_SHADOW:--NETWORK-}

(It doesn't matter in practice, because the case statement falls through to the wildcard *. But FYI, the :- operator here yields the default value "NETWORK-", not "-NETWORK-" as intended, so a double hyphen (:--) is needed.)
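
The difference between the two expansions can be seen in isolation. This is a small stand-alone sketch, not an excerpt from rc.netboot: the variable names as_shipped and as_intended and the classify function are made up for illustration, and the case statement is only a stand-in for the real one.

```shell
#!/bin/bash
# Shows how ${VAR:-default} treats the text that follows ":-".

unset NETBOOT_SHADOW

# Single hyphen: the "-" is part of the ":-" operator, so the default
# value starts at "N" and the intended leading hyphen is lost.
as_shipped=${NETBOOT_SHADOW:-NETWORK-}

# Double hyphen: the first "-" completes the operator, the second is the
# first character of the default value.
as_intended=${NETBOOT_SHADOW:--NETWORK-}

echo "as shipped:  ${as_shipped}"
echo "as intended: ${as_intended}"

# Illustrative stand-in for a case statement that tests for "-NETWORK-".
classify() {
    case "$1" in
        -NETWORK-) echo "matched -NETWORK-" ;;
        *)         echo "fell through to *" ;;
    esac
}

classify "$as_shipped"
classify "$as_intended"
```

With the variable unset, the first form produces a value without the leading hyphen, so only the wildcard branch can match it.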

21-May-2013 12:03 PM Joel Bruner:
Adding an fs_usage log that shows bootpd creating the file; then, after several client reboots, AppleFileServer corrects the file permissions. Extraneous messages (mds, mdworker, repeated messages) have been edited out.

##############################################

# fs_usage -w -f filesys | grep Shadow

#initial creation of Shadow, file permissions are incorrect (rw-rw----)
11:17:01.903533  chown                  [  2]           /Library/NetBoot/NetBootClients0/NetBoot092/Shadow                            0.000005   bootpd.5177
11:17:01.903572  open              F=5        (_WC_T_)  /Library/NetBoot/NetBootClients0/NetBoot092/Shadow                   0.000039   bootpd.5177
11:17:01.903702  chown                                  /Library/NetBoot/NetBootClients0/NetBoot092/Shadow                             0.000010   bootpd.5177

#client reboot - file perms still wrong (rw-rw----)
11:29:51.936328  stat64                                 /Library/NetBoot/NetBootClients0/NetBoot092/Shadow                                0.000007   bootpd.5177

#client reboot - file perms still wrong (rw-rw----)
11:37:01.534399  stat64                                 /Library/NetBoot/NetBootClients0/NetBoot092/Shadow                                0.000006   bootpd.5177
11:37:50.091950  getattrlist                            /Library/NetBoot/NetBootClients0/NetBoot092/Shadow                                0.000013   AppleFileServer.10771

#finally permissions are corrected to (rw-------) by the AFP service
11:37:50.099502  setattrlist                            /Library/NetBoot/NetBootClients0/NetBoot092/Shadow                                0.000040   AppleFileServer.10775
11:37:50.099974  getattrlist                            /Library/NetBoot/NetBootClients0/NetBoot092/Shadow                                0.000012   AppleFileServer.10774
11:37:50.135514  lstat64                                /Library/NetBoot/NetBootClients0/NetBoot092/Shadow                                0.000005   AppleFileServer.10776

#the file is opened read/write and the client hard drive mounts to /Volumes instead of /private/var/netboot
11:37:50.135547  open              F=15       (RW____)  /Library/NetBoot/NetBootClients0/NetBoot092/Shadow                                0.000015   AppleFileServer.10776
11:38:08.943086  getattrlist                            /Library/NetBoot/NetBootClients0/NetBoot092/Shadow                                0.000012   AppleFileServer.10701

#############################################
file listing showing incorrect, then corrected permissions
#############################################

bash-3.2# ls -lRt
total 16
drwxrwx---  3 netboot129  admin   102 May 21 11:17 NetBoot092
-rw-rw-r--@ 1 root        admin  6148 May  9 10:28 .DS_Store

./NetBoot092:
total 98304
-rw-rw----  1 netboot129  admin  50331648 May 21 11:17 Shadow

bash-3.2# ls -lRt
total 16
drwxrwx---  3 netboot129  admin   102 May 21 11:17 NetBoot092
-rw-rw-r--@ 1 root        admin  6148 May  9 10:28 .DS_Store

./NetBoot092:
total 258064
-rw-------  1 netboot129  admin  132126720 May 21 11:39 Shadow

Comments

Engineering reports this fixed in Yosemite; however, the rc.netboot typo remains unfixed.

