Metal should expose shuffle instructions

Number:rdar://31155555 Date Originated:2017-03-20
Status:open Resolved:
Product:iOS SDK Product Version:
Classification:Enhancement Reproducible:

Warp/lane shuffle operations are essential for register-level blocking of computation shared collaboratively across threads/work items: they let threads exchange register values directly, without a round trip through threadgroup (shared) memory. They can be challenging to use in the general case, but they are critical to extracting peak performance on very important workloads such as dense matrix-matrix multiply (GEMM) and many neural network layer types, as well as local stream compaction for fine-grained dynamic collaboration between threads.
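
To make the stream compaction case concrete, here is a minimal sketch in plain C++ (not Metal, since Metal lacks the primitive being requested) that emulates a single 32-lane warp on the CPU, with each array slot standing in for one lane's register. The names kLanes, WarpReg, shfl_up, warp_inclusive_scan and warp_compact are hypothetical; shfl_up is modeled on the semantics of CUDA's __shfl_up_sync, where lanes below delta keep their own value. On a GPU, the scan that computes each lane's output slot needs only log2(32) = 5 shuffle steps and no threadgroup memory.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical emulation of one warp: slot i is lane i's register.
constexpr int kLanes = 32;
using WarpReg = std::array<int, kLanes>;

// Shuffle-up: lane i receives the value held by lane i - delta.
// Lanes with i < delta keep their own value (as with CUDA's __shfl_up_sync).
WarpReg shfl_up(const WarpReg& v, int delta) {
    assert(delta > 0 && delta < kLanes);
    WarpReg out{};
    for (int lane = 0; lane < kLanes; ++lane)
        out[lane] = (lane >= delta) ? v[lane - delta] : v[lane];
    return out;
}

// Inclusive prefix sum across lanes using only register exchanges
// (Hillis-Steele scan: offsets 1, 2, 4, 8, 16).
WarpReg warp_inclusive_scan(WarpReg v) {
    for (int delta = 1; delta < kLanes; delta *= 2) {
        WarpReg up = shfl_up(v, delta);  // snapshot before this step's updates
        for (int lane = 0; lane < kLanes; ++lane)
            if (lane >= delta) v[lane] += up[lane];
    }
    return v;
}

// Local stream compaction: each lane holds one element; lanes whose element
// passes `keep` write it to a dense slot given by the scan of the keep flags.
std::vector<int> warp_compact(const WarpReg& values, bool (*keep)(int)) {
    WarpReg flags{};
    for (int lane = 0; lane < kLanes; ++lane)
        flags[lane] = keep(values[lane]) ? 1 : 0;

    WarpReg incl = warp_inclusive_scan(flags);
    // The last lane's inclusive sum is the number of kept elements.
    std::vector<int> out(static_cast<std::size_t>(incl[kLanes - 1]));
    for (int lane = 0; lane < kLanes; ++lane)
        if (flags[lane])
            out[static_cast<std::size_t>(incl[lane] - 1)] = values[lane];  // exclusive offset
    return out;
}
```

For example, compacting lanes that hold 0 through 31 with an is-even predicate yields the 16 values 0, 2, ..., 30 in lane order. A production kernel would typically combine this with a ballot/population-count primitive, but the scan version shows why register-to-register exchange is the core requirement.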

Because of this, the other major compute APIs expose shuffle operations (CUDA, for example, provides the __shfl family of intrinsics), and Metal should as well. Without them, it is impossible to write competitive GEMM, convolution-layer, and other important kernels on most current GPU architectures.
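
The GEMM case can be sketched the same way. In this hypothetical CPU emulation (plain C++; shfl, Tile and warp_gemm_tile are illustrative names, not any real API), a 32-lane warp computes one 32x32 tile C = A * B. Each lane keeps a full row of C as 32 register accumulators and loads only one element of each row of B itself; the other 31 elements arrive through an indexed broadcast shuffle modeled on CUDA's __shfl_sync(mask, var, srcLane) with a full mask. On a GPU this replaces a threadgroup-memory store, barrier, and reload with direct register exchange. A real kernel would also block over K and use wider per-lane tiles; this only shows the data movement.

```cpp
#include <array>
#include <cassert>

// Hypothetical emulation of one 32-lane warp; slot i is lane i's register.
constexpr int kLanes = 32;
using WarpReg = std::array<float, kLanes>;
using Tile = std::array<std::array<float, kLanes>, kLanes>;  // row-major 32x32 tile

// Indexed shuffle (broadcast): every lane receives the value held by src_lane.
WarpReg shfl(const WarpReg& v, int src_lane) {
    assert(src_lane >= 0 && src_lane < kLanes);
    WarpReg out{};
    out.fill(v[src_lane]);
    return out;
}

// C = A * B for one tile. Lane `lane` owns row `lane` of A and of C.
// It reads only B[k][lane] from memory; the rest of row k of B comes
// from the other lanes through shuffles instead of threadgroup memory.
Tile warp_gemm_tile(const Tile& A, const Tile& B) {
    Tile C{};  // zero-initialized accumulators
    for (int k = 0; k < kLanes; ++k) {
        WarpReg b_k{};
        for (int lane = 0; lane < kLanes; ++lane)
            b_k[lane] = B[k][lane];  // one load per lane
        for (int j = 0; j < kLanes; ++j) {
            WarpReg b_kj = shfl(b_k, j);  // every lane now holds B[k][j]
            for (int lane = 0; lane < kLanes; ++lane)
                C[lane][j] += A[lane][k] * b_kj[lane];
        }
    }
    return C;
}
```

Multiplying by an identity A tile returns B unchanged, which is a quick way to check that each lane received the right values from its neighbors.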
