Memory corruption in ATSFontDeactivate

Originator:mark
Number:rdar://9976774 Date Originated:2011-08-18
Status:Duplicate/9632502 Resolved:
Product:Mac OS X Product Version:10.7.1 11B26
Classification:Crash/Hang/Data Loss Reproducible:Always
 
I know that ATS is deprecated, but I’ve found a memory-smash bug in ATS, and I’ll even tell you how to fix it, so you might just want to treat this as some low-hanging fruit and eat it.

I see this bug on 10.7.1 11B26 and 10.7.0 11A511. Based on disassembly, the bug is present in 10.6.8 10K549, but we’ve never attributed any memory corruption to it on 10.6. It has shown up in our application in a very large way in 10.7, though.

We’ve discovered memory corruption occurring in Google Chrome Helper (renderer) processes within our calls to ATSFontDeactivate. The actual corruption occurs in SendDeactivateFontsInContainerMessage, which appears to have multiple bugs. A complete write-up of the bug is at http://crbug.com/93191. Important bits are excerpted here.

The symptom is that the process will crash, log messages like this, or both:

Google Chrome Helper(42302,0xaca812c0) malloc: *** error for object 0x59fb084: incorrect checksum for freed object - object was probably modified after being freed.
*** set a breakpoint in malloc_error_break to debug

In this situation, the stack looks like

Breakpoint 1, 0x9982fc97 in malloc_error_break ()
(gdb) bt
#0  0x9982fc97 in malloc_error_break ()
#1  0x997f14ce in szone_error ()
#2  0x997f154e in free_list_checksum_botch ()
#3  0x997f7a31 in tiny_malloc_from_free_list ()
#4  0x997f8903 in szone_malloc_should_clear ()
#5  0x997f966b in szone_malloc ()
#6  0x008f9435 in base::(anonymous namespace)::oom_killer_malloc(_malloc_zone_t*, unsigned long) (in Google Chrome Framework) (process_util_mac.mm:522)
#7  0x9982f962 in malloc_zone_malloc ()
#8  0x9a4a0198 in __CFAllocatorSystemAllocate ()
#9  0x9a4a0174 in CFAllocatorAllocate ()
#10 0x9a49fed1 in _CFRuntimeCreateInstance ()
#11 0x9a4a8d6a in CFBasicHashCreate ()
#12 0x9a4a8cc7 in __CFDictionaryCreateGeneric ()
#13 0x9a4a8742 in CFDictionaryCreateMutable ()
#14 0x9145ee61 in GetCachedInMemoryContainer ()
#15 0x91460aca in SendDeactivateFontsInContainerMessage ()
#16 0x9147335a in _eFODeactivateFontsInContainer ()
#17 0x91482b2d in _eATSFontDeactivate ()
#18 0x9149c961 in ATSFontDeactivate ()
#19 0x019777c8 in WebCore::FontCustomPlatformData::~FontCustomPlatformData() (in Google Chrome Framework) (FontCustomPlatformData.cpp:88)
#20 0x01cb4b01 in WebCore::CachedFont::allClientsRemoved() (in Google Chrome Framework) (CachedFont.cpp:183)

The object malloc is complaining about:

(gdb) p (char*)0x59fb084
Unsafe to call functions on thread 13: function: malloc_error_break on stack
warning: Canceling operation - malloc lock could be held on current thread.
$1 = 0x59fb084 "Text.framework/Versions/A/CoreText"

And, actually, backing up a bit to the nearest 0-byte on the front end:

(gdb) p (char*) 0x59fb010
Unsafe to call functions on thread 13: function: malloc_error_break on stack
warning: Canceling operation - malloc lock could be held on current thread.
$2 = 0x59fb010 "/System/Library/Frameworks/ApplicationServices.framework/Frameworks/???\020?c\032;???;\t??(?W?\r\005i?\\\032?\022\016?/?%\017g.2$??\022???~CoreText.framework/Versions/A/CoreText"

With MallocScribble set, the garbage becomes a scribble pattern making it fairly obvious that "CoreText.framework/Versions/A/CoreText" was written to a freed block (in this case, the '?' are 0xaa, the “allocated but unused” pattern; the 0x55 'U' are the “freed” pattern):

(gdb) p (char*)0x5bc8b50
Unsafe to call functions on thread 1: function: malloc_error_break on stack
warning: Canceling operation - malloc lock could be held on current thread.
$1 = 0x5bc8b50 "/System/Library/Frameworks/ApplicationServices.framework/Frameworks/", '?' <repeats 39 times>, "UUUUUCoreText.framework/Versions/A/CoreText"

The problem lies in SendDeactivateFontsInContainerMessage (ATS.framework). A complete annotated disassembly of the interesting parts of that function is at http://crbug.com/93191#c12 . Translated into C:

  // This is where it gets pretty interesting.
  const char ct_font_manager_unregister_font_for_data_name[] =
      "_CTFontManagerUnregisterFontForData";
  void* ct_font_manager_unregister_font_for_data =
      dlsym(RTLD_DEFAULT, ct_font_manager_unregister_font_for_data_name);
  if (!ct_font_manager_unregister_font_for_data) {
    // This is where it gets really interesting.

    // expect "/System/Library/Frameworks/ApplicationServices.framework/Frameworks/ATS.framework/Resources"
    const char* ats_resources_dir_path = GetATSResourcesDirPath();

    if (ats_resources_dir_path) {
      const char ats_framework_name[] = "ATS.framework";
      const char* application_services_frameworks_path_end =
          strstr(ats_resources_dir_path, ats_framework_name);
      if (application_services_frameworks_path_end) {
        const char coretext_suffix[] = "CoreText.framework/Versions/A/CoreText";
        size_t coretext_suffix_size = arraysize(coretext_suffix);
        size_t coretext_dylib_path_size =
            application_services_frameworks_path_end -
            ats_resources_dir_path +
            coretext_suffix_size;

        // malloc some memory that won't ever be freed, but it should probably
        // only happen once per process.
        char* coretext_dylib_path = malloc(coretext_dylib_path_size);
        if (coretext_dylib_path) {
          // expect 68
          size_t application_services_frameworks_path_length =
              application_services_frameworks_path_end - ats_resources_dir_path;

          // The strncpy does not '\0'-terminate coretext_dylib_path. It will
          // copy at most application_services_frameworks_path_length (68)
          // bytes from ats_resources_dir_path (a string with length 92).
          strncpy(coretext_dylib_path,
                  ats_resources_dir_path,
                  application_services_frameworks_path_length);

          // Since coretext_dylib_path is not '\0'-terminated at this point,
          // what strlen returns is anyone's best guess. It will be at least
          // 68, but it might be more. In order for there to be no bug, it
          // must be exactly 68. application_services_frameworks_path_length
          // could be reused here. Alternatively, the strncpy above could be
          // modified to pass coretext_suffix_size instead of
          // application_services_frameworks_path_length, or use some other
          // form of safer string handling.
          memcpy(coretext_dylib_path + strlen(coretext_dylib_path),
                 coretext_suffix,
                 coretext_suffix_size);

          // This is where it stops being really interesting, but it's still
          // pretty interesting.

          // dlopen a dylib that will never be dlclosed, but it will probably
          // only happen once per process.
          void* coretext_dylib =
              dlopen(coretext_dylib_path, RTLD_LAZY | RTLD_LOCAL);
          if (coretext_dylib) {
            ct_font_manager_unregister_font_for_data =
                dlsym(coretext_dylib,
                      ct_font_manager_unregister_font_for_data_name);
          }
        }
      }
    }
  }

  if (ct_font_manager_unregister_font_for_data) {
    // Use it.
  }

  // This is where it stops being interesting.

Let’s ignore the leaks (sloppy, sloppy, sloppy) because we’ve got bigger fish to fry.

The memory errors are in some incredibly poor string handling within the “very interesting” section. The code uses strncpy with an |n| (size) argument guaranteed to be too small so that strncpy never '\0'-terminates the |s1| (destination) buffer. It then uses strlen to compute the length of the string that it just copied, despite the fact that it already knows how much data was just copied (it’s exactly |n| bytes), and because there’s no guarantee that there’s a '\0' after the strncpyed string, strlen might return too large a length. Notably, malloc doesn’t zero memory it returns. Sometimes this code gets lucky and there’s a '\0' after the strncpyed string. Other times, there isn’t, so strlen returns too large a result, and the subsequent memcpy is asked to write to memory beyond the end of the malloced buffer. Whatever’s beyond the end of the buffer may not belong to anyone or it may belong to something else, but it sure won’t belong to this code. That’s why we saw “incorrect checksum for freed object - object was probably modified after being freed” (when the memory didn’t belong to anyone and malloc found it had been tampered with on a subsequent allocation attempt) as well as random crashes throughout the code (when the memory did belong to someone else and it got trashed).

I am currently working around this bug in my application by providing my own public symbol for __CTFontManagerUnregisterFontForData corresponding to an |extern "C"| function named _CTFontManagerUnregisterFontForData that’s a no-op. Since SendDeactivateFontsInContainerMessage looks the symbol up with dlsym(RTLD_DEFAULT), it finds my copy. Since ATS is apparently never successfully calling CTFontManagerUnregisterFontForData now because it looks the symbol up by an incorrect name, the no-op stub works around the bug without altering functionality.

The correct solution here is anything that prevents ATS from writing to memory that it doesn’t own. Since SendDeactivateFontsInContainerMessage was apparently unsuccessful at calling CTFontManagerUnregisterFontForData all along, you may not want to “fix” the symbol name, because it might result in behavior changes. On the other hand, you will certainly want to fix the poor string handling in this function.

Comments


Please note: Reports posted here will not necessarily be seen by Apple. All problems should be submitted at bugreport.apple.com before they are posted here. Please only post information for Radars that you have filed yourself, and please do not include Apple confidential information in your posts. Thank you!