CoreML reports NaN confidence values when performing inference on GPU

Originator:srichey
Number:rdar://47670895 Date Originated:01/30/2019
Status:Open Resolved:
Product:CoreML Product Version:
Classification:Other bug Reproducible:Yes
 
Summary:

We are developing an iOS application that uses image classification to identify products in a large catalog. We originally had 30-60 products (and thus 30-60 output labels), but as we scale up to the full catalog of 400+ products we have encountered an issue where the confidence values output by the model are all NaN when inference runs on the GPU on an iOS device.

Initially, we believed this was an issue with one specific CoreML file produced by a training session, since the next model we trained did not exhibit the problem. However, we have now seen the issue recur with numerous models, leading us to believe it is a bug in iOS. The problem does not occur when performing inference on the CPU, and it does not appear to matter whether the mlmodel file is compiled on device or as part of Xcode's build process.
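For reference, the CPU-vs-GPU comparison and the two compilation paths mentioned above were exercised roughly along the lines of the sketch below. This is a minimal sketch, not the exact code in the attached project; "ProductClassifier" is a placeholder for the actual model file name.

import CoreML

// Minimal sketch: load the model with either CPU-only or GPU-enabled inference,
// from either an on-device-compiled .mlmodel or the .mlmodelc that Xcode produces.
// "ProductClassifier" is a placeholder file name.
func loadModel(cpuOnly: Bool, compileOnDevice: Bool) throws -> MLModel {
    let config = MLModelConfiguration()
    config.computeUnits = cpuOnly ? .cpuOnly : .all   // .all lets Core ML use the GPU

    let compiledURL: URL
    if compileOnDevice {
        // Raw .mlmodel bundled as a resource and compiled at runtime.
        let rawURL = Bundle.main.url(forResource: "ProductClassifier", withExtension: "mlmodel")!
        compiledURL = try MLModel.compileModel(at: rawURL)
    } else {
        // .mlmodelc produced by Xcode's build step.
        compiledURL = Bundle.main.url(forResource: "ProductClassifier", withExtension: "mlmodelc")!
    }
    return try MLModel(contentsOf: compiledURL, configuration: config)
}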

Steps to Reproduce:

These steps depend on the exact model used, as only some models exhibit the issue.

1. Use Turi Create to create a CoreML model based on resnet-50 (the issue is most likely to occur with a high number of output labels)
2. Import this CoreML file into a new Xcode project using standard settings
3. Add an image to the Xcode project
4. In the default ViewController, add code to viewDidLoad that performs the following (see the sketch after these steps):
  a. Read the image into a CVPixelBuffer
  b. Create the MLModel from the included mlmodel file
  c. Perform prediction on the given image
  d. Either print the label/confidence outputs of the model to the log, or display them in a UILabel

Example project attached.
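For completeness, the body of step 4 looks roughly like the sketch below. It is a minimal sketch rather than the attached project verbatim: the asset name "sample", the model file name "ProductClassifier", the "image" input feature name, and the 224x224 input size are assumptions that need to match the actual model description. It also uses the generic MLModel/MLFeatureProvider API rather than the class Xcode generates from the mlmodel, purely so the example does not depend on a generated class name.

import UIKit
import CoreML

// Sketch of steps 4a-4d. "sample", "ProductClassifier", the "image" input
// feature name, and the 224x224 input size are assumptions.
final class ViewController: UIViewController {

    override func viewDidLoad() {
        super.viewDidLoad()
        runPrediction()
    }

    private func runPrediction() {
        guard let uiImage = UIImage(named: "sample"),              // image added in step 3
              let buffer = pixelBuffer(from: uiImage) else { return }
        do {
            let config = MLModelConfiguration()
            config.computeUnits = .all                             // switch to .cpuOnly to compare

            // 4b: create the MLModel from the compiled model in the app bundle.
            let url = Bundle.main.url(forResource: "ProductClassifier", withExtension: "mlmodelc")!
            let model = try MLModel(contentsOf: url, configuration: config)

            // 4c: perform prediction on the given image.
            let input = try MLDictionaryFeatureProvider(
                dictionary: ["image": MLFeatureValue(pixelBuffer: buffer)])
            let output = try model.prediction(from: input)

            // 4d: print every dictionary-typed output (label -> confidence) to the log.
            for name in output.featureNames {
                if let probabilities = output.featureValue(for: name)?.dictionaryValue {
                    for (label, confidence) in probabilities {
                        print("\(label): \(confidence)")
                    }
                }
            }
        } catch {
            print("Prediction failed: \(error)")
        }
    }

    // 4a: render the UIImage into a 224x224 BGRA CVPixelBuffer.
    private func pixelBuffer(from image: UIImage, width: Int = 224, height: Int = 224) -> CVPixelBuffer? {
        let attrs = [kCVPixelBufferCGImageCompatibilityKey: kCFBooleanTrue,
                     kCVPixelBufferCGBitmapContextCompatibilityKey: kCFBooleanTrue] as CFDictionary
        var pixelBufferOut: CVPixelBuffer?
        guard CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                                  kCVPixelFormatType_32BGRA, attrs, &pixelBufferOut) == kCVReturnSuccess,
              let pixelBuffer = pixelBufferOut,
              let cgImage = image.cgImage else { return nil }

        CVPixelBufferLockBaseAddress(pixelBuffer, [])
        defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

        guard let context = CGContext(data: CVPixelBufferGetBaseAddress(pixelBuffer),
                                      width: width, height: height,
                                      bitsPerComponent: 8,
                                      bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
                                      space: CGColorSpaceCreateDeviceRGB(),
                                      bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
                                          | CGBitmapInfo.byteOrder32Little.rawValue) else { return nil }

        context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
        return pixelBuffer
    }
}

With computeUnits set to .all, the printed confidences are all NaN on the affected devices listed below; with .cpuOnly, they are normal values.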

Expected Results:

We would expect all confidence values to be non-NaN, as we have seen in the past, and as we see when performing inference on the CPU.

Actual Results:

On certain devices, the output confidence values are NaN. As the per-device results in the “Version/Build” section below show, this does not occur on all devices running iOS.

Version/Build:

The term “valid” below means that all output confidence values were ordinary finite floating-point values, i.e. neither NaN nor infinite. In one case the results array was empty and no error was given.
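For clarity, the per-value check amounts to Swift's standard floating-point predicates (a sketch; the confidence is read as a Double from the model's output dictionary):

func isValidConfidence(_ confidence: Double) -> Bool {
    // "Valid" = an ordinary finite value, i.e. neither NaN nor ±infinity.
    return confidence.isFinite
}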

iPhone 8 Plus, iOS 12.1.2: Valid on CPU, NaN on GPU
iPhone XR, iOS 12.1.3: Valid on CPU, Valid on GPU
iPad Pro (9.7 inch), iOS 12.0: Valid on CPU, NaN on GPU
iPhone 6s Plus, iOS 12.1.2: Valid on CPU, Valid on GPU
iPhone XS Max, iOS 12.1.3: Valid on CPU, Valid on GPU
iPhone 7 Plus, iOS 12.1.3: Valid on CPU, NaN on GPU
iPhone 7, iOS 12.1: Valid on CPU, NaN on GPU
iPad Air 2, iOS 12.1.1: Valid on CPU, NaN on GPU
iPad mini 3, iOS 11.2.5: Valid on CPU, Valid on GPU
iPad Pro (12.9 inch), iOS 11.0.1: Valid on CPU, Empty on GPU
iPad Pro (11 inch), iOS 12.1.1: Valid on CPU, Valid on GPU

Configuration:

iMac Pro 2017
macOS Mojave 10.14
Turi Create 5.2.1
Python 2.7

The image classifier uses resnet-50 as the base model, trained on over 100,000 images with over 200 output labels.
