APFS: Provide posix_fallocate() (WWDC 2017)

Originator: mark
Number: rdar://32720223
Date Originated: 2017-06-12
Status: Open
Resolved:
Product: macOS + SDK
Product Version: 10.13db1 17A264c
Classification: Suggestion
Reproducible:
 
In context of https://developer.apple.com/videos/play/wwdc2017/715/ from WWDC 2017:

Since the operating system and default filesystem (APFS) now support sparse files by default, it should also provide the posix_fallocate() system call. See http://pubs.opengroup.org/onlinepubs/9699919799/functions/posix_fallocate.html.

posix_fallocate() can be used to ensure that disk blocks are physically allocated for a file.
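For illustration, here is a minimal sketch of the call being requested, assuming a platform that already provides posix_fallocate() (Linux with glibc, for example). The helper name open_preallocated() and its error-reporting convention are hypothetical, chosen only for this example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or open) `path` and ask the filesystem to
 * allocate storage for its first `length` bytes. Returns an open file
 * descriptor on success, or -1 with *err set to an errno value. */
int open_preallocated(const char *path, off_t length, int *err) {
    assert(length > 0);  /* posix_fallocate() rejects a zero length */

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        *err = errno;
        return -1;
    }

    /* posix_fallocate() returns the error number directly instead of
     * setting errno. On a full disk this is where ENOSPC would show up,
     * as an ordinary return value. It also grows the file to `length`
     * if the file is currently shorter. */
    int rc = posix_fallocate(fd, 0, length);
    if (rc != 0) {
        close(fd);
        *err = rc;
        return -1;
    }

    *err = 0;
    return fd;
}
```

A caller would check the result once, before mapping the file, and report any failure through its normal error path.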

Why does this matter? Why would anyone care about whether blocks are physically allocated when they’ll read as zero?

When a file is mapped into memory with mmap(), the memory-mapped region is backed by the file on disk, and writes to the region result in writes to the file on disk. Traditional write paths such as the write() system call report failure through a return value and errno. Errors that occur during writes to memory-mapped files are instead delivered as signals. Signals are much trickier to handle correctly, and when the signal is not handled, its default disposition will crash the application.

This situation will arise during normal operation when attempting to write to an unallocated block in a sparse file on a full disk. Where write() would fail with ENOSPC, which is trivial to handle gracefully, the write to mapped memory will result in SIGBUS.
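To make the contrast concrete, here is a rough sketch of the two write paths side by side. The function names store_via_write() and store_via_mmap() are hypothetical, and both assume the file is already at least `len` bytes long.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store `len` bytes at offset 0 with pwrite().
 * Returns 0 on success or an errno value such as ENOSPC. A short write
 * is reported as EIO to keep this sketch brief. */
int store_via_write(int fd, const void *src, size_t len) {
    ssize_t n = pwrite(fd, src, len, 0);
    if (n < 0)
        return errno;  /* the failure is an ordinary return value */
    return (size_t)n == len ? 0 : EIO;
}

/* Hypothetical helper: store the same bytes through a shared mapping.
 * Returns 0 on success, -1 if mmap(), msync(), or munmap() fails. */
int store_via_mmap(int fd, const void *src, size_t len) {
    assert(len > 0);  /* mmap() rejects a zero-length mapping */

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -1;

    /* There is no return value to check on this store. If the filesystem
     * cannot allocate a block to back the page, the process receives
     * SIGBUS at this point. */
    memcpy(map, src, len);

    int rc = msync(map, len, MS_SYNC);
    if (munmap(map, len) != 0)
        rc = -1;
    return rc;
}
```

Only the first function gives the caller a place to handle a full disk. The second has no such place unless the blocks were allocated beforehand.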

Because of this risk, my application fully allocates disk blocks for files intended to be used with memory-mapped I/O in this manner. The preferred mechanism for this is to call posix_fallocate() to ensure that physical blocks are allocated, which allows the underlying filesystem to choose the most expeditious manner to perform this task. While it’s possible to write a substitute for a proper posix_fallocate() system call entirely in user space, doing so is inherently racy (it may lead to data loss) and can never be as efficient as allowing the kernel and filesystem to perform the operation on their own.
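As a sketch of the kind of user-space substitute described above (not necessarily the code my application uses), the function below reads one byte from each block and writes it back, which forces the filesystem to allocate the block. The name touch_every_block() is hypothetical. It also assumes that st_blksize is no larger than the filesystem's allocation unit, which is not guaranteed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical user-space fallback: rewrite one byte per block so that
 * every block in the first `length` bytes of the file gets allocated.
 * Returns 0 on success or an errno value. */
int touch_every_block(int fd, off_t length) {
    assert(length >= 0);

    struct stat st;
    if (fstat(fd, &st) != 0)
        return errno;

    /* Assumption: st_blksize is a usable stride for this filesystem. */
    off_t block = st.st_blksize > 0 ? (off_t)st.st_blksize : 512;

    /* Grow the file first if it is shorter than the requested length. */
    if (st.st_size < length && ftruncate(fd, length) != 0)
        return errno;

    for (off_t off = 0; off < length; off += block) {
        char byte = 0;
        if (pread(fd, &byte, 1, off) < 0)
            return errno;

        /* Race window: if another writer changes this byte between the
         * pread() above and the pwrite() below, the stale value read
         * here overwrites it. This is the data-loss risk that a kernel
         * implementation avoids. */
        ssize_t n = pwrite(fd, &byte, 1, off);
        if (n < 0)
            return errno;  /* ENOSPC is reported here on a full disk */
        if (n != 1)
            return EIO;
    }
    return 0;
}
```

Besides the race, this costs two system calls per block, where a kernel implementation could reserve the whole range in one operation.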

Here’s where we ensure that files used for memory-mapped I/O have all blocks physically allocated on disk: https://chromium.googlesource.com/chromium/src/+/0413fd9f7c660ba7504dade7bb838862f138348e/base/files/memory_mapped_file_posix.cc#104. You’ll note that macOS is currently excluded because up until now, the default filesystem, HFS+, did not support sparse files. We will need to change this to force physical allocation of these files because the new filesystem, APFS, does support sparse files.
