UITextView can't handle emojis with more than one Unicode codepoint
Originator: villev
Number: rdar://35050622
Date Originated: October 18 2017
Status: Open
Resolved: No
Product: UIKit or Swift compiler
Product Version:
Classification:
Reproducible: Yes
Summary: UITextView text count is calculated incorrectly for emojis that consist of more than one Unicode code point. When the emoji "👦🏿" is typed into a UITextView 64 times, textView.text.characters.count returns the correct number: 64. However, once 65 or more are typed, textView.text.characters.count returns double the correct amount: 65 -> 130, 66 -> 132, etc. With the "👦" emoji, which has only a single Unicode code point, the count is correct.
Steps to Reproduce: Create a text view, insert 65 or more "👦🏿" emojis into it, and print textView.text.characters.count.
Expected Results: 65 or more
Actual Results: 130, 132, 134...
Version/Build: Xcode 9, Swift 3
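The expected and actual numbers above line up with two different ways of counting the same text. The following is a minimal sketch in plain Swift (no UIKit), assuming the emoji in question is the boy emoji (U+1F466) followed by the dark skin tone modifier (U+1F3FF); that choice is an assumption for illustration. It uses `count` from Swift 4 and later, where the report's Swift 3 code would use `characters.count`.

```swift
// Assumption: the two-code-point emoji is U+1F466 (boy) + U+1F3FF (skin tone modifier).
let modified = "\u{1F466}\u{1F3FF}"

// Build the same content the report types into the text view: 65 emojis.
let text = String(repeating: modified, count: 65)

let graphemeCount = text.count                // user-perceived characters
let scalarCount = text.unicodeScalars.count   // Unicode code points
let utf16Count = text.utf16.count             // UTF-16 code units

// The first value is the expected result, the second matches the reported actual result.
print(graphemeCount, scalarCount, utf16Count)  // prints "65 130 260"
```

On a native Swift string the grapheme count is 65, while 130 is the number of code points, which is consistent with each code point being treated as its own Character in the failing case.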
Comments
Our team did some research on the issue.
Here is a summary of the findings:
UITextView and UITextField seem to use NSBigMutableString as their internal buffer (i.e. when you invoke .text on them, that's what gets returned)
When an NSBigMutableString that contains 256 or fewer UTF-16 code units gets bridged to a Swift String, it works correctly
When an NSBigMutableString that contains more than 256 UTF-16 code units gets bridged to a Swift String, grapheme clustering is not performed correctly: each code point gets converted to a Character, even ones that should be clustered together with previous code points (like variation selectors)
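The 256-unit threshold matches the 64/65 boundary in the original report, since each two-code-point emoji there occupies four UTF-16 code units. The sketch below shows that arithmetic, again assuming the emoji is U+1F466 followed by U+1F3FF. The `nativeCopy` helper is hypothetical: it rebuilds the text as a native Swift string from its UTF-16 units so that its count can be compared with the count of the bridged string. Whether such a copy avoids the problem on the affected toolchain has not been verified here.

```swift
// Assumption: same two-code-point emoji as in the report (U+1F466 + U+1F3FF).
let emoji = "\u{1F466}\u{1F3FF}"

// Hypothetical helper: rebuild a native Swift String from the UTF-16 code units,
// so grapheme clustering runs on Swift-owned storage rather than a bridged NSString.
func nativeCopy(_ s: String) -> String {
    return String(decoding: Array(s.utf16), as: UTF16.self)
}

for n in [64, 65] {
    let text = String(repeating: emoji, count: n)
    let units = text.utf16.count      // 4 UTF-16 code units per emoji
    let overThreshold = units > 256   // threshold reported in the findings above
    print(n, units, overThreshold, nativeCopy(text).count)
}
// prints "64 256 false 64" and then "65 260 true 65"
```

In an app, comparing `textView.text.count` with `nativeCopy(textView.text).count` would be one way to detect whether the bridged string is being clustered incorrectly.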