Metal should expose shuffle instructions

Number: rdar://31155555
Date Originated: 2017-03-20
Status: open
Resolved:
Product: iOS SDK
Product Version:
Classification: Enhancement
Reproducible:

Warp/lane shuffle operations are essential to doing register-level blocking of computation collaboratively across threads/work items. They can be challenging to use in general cases, but are critical to extracting peak performance on very important workloads like dense matrix-matrix multiply (and many neural net layer types), as well as local stream compaction for fine-grained dynamic collaboration between threads.

For this reason, all other compute APIs expose shuffle operations. Metal should as well. Without them, it is impossible to write competitive GEMM, convolution layer, and other important kernels on most current GPU architectures.
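
For illustration, here is a minimal sketch of the kind of operation being requested, written in CUDA since that is one API that already exposes it. It sums one value per lane across a warp using only register-to-register shuffles, with no threadgroup/shared memory. The names warp_sum, warp_sum_kernel, and run_warp_sum are hypothetical, and the sketch assumes a 32-lane warp and CUDA 9 or later (for __shfl_down_sync). It is not a proposal for Metal's actual syntax.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: sum one float per lane across a 32-lane warp.
// Each step reads a register from the lane `offset` positions higher
// and adds it, halving the distance each time.
__device__ float warp_sum(float v) {
    for (int offset = 16; offset > 0; offset >>= 1) {
        v += __shfl_down_sync(0xffffffffu, v, offset);
    }
    return v;  // only lane 0 is guaranteed to hold the full sum
}

// Hypothetical kernel: lane i contributes the value i.
__global__ void warp_sum_kernel(float *out) {
    float v = static_cast<float>(threadIdx.x);
    float total = warp_sum(v);
    if (threadIdx.x == 0) {
        *out = total;
    }
}

// Hypothetical host wrapper: launches a single warp and returns lane 0's result.
// Error checking is omitted to keep the sketch short.
float run_warp_sum() {
    float *d_out = nullptr;
    float h_out = 0.0f;
    cudaMalloc(&d_out, sizeof(float));
    warp_sum_kernel<<<1, 32>>>(d_out);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return h_out;
}
```

With lane i contributing i, lane 0 ends up with 0 + 1 + ... + 31 = 496 after five shuffle steps. The same pattern, applied to tiles of matrix operands instead of a single scalar, is what makes register-level blocking across threads possible in GEMM-style kernels.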


