macOS fails to include apparent ligatures in pasted text

Originator:latext
Number:rdar://30480068 Date Originated:2017-02-11
Status:Open Resolved:
Product:macOS + SDK Product Version:10.12.3
Classification:Usability Reproducible:Yes
 
Area:
Preview 9.0

Summary:
When I select a block of text in a PDF in Preview, copy it to the Clipboard, and then paste it elsewhere, the ligatures (such as "fi", "fl", etc.) are missing altogether.

Steps to Reproduce:
1. Decompress the attached archive (MRFP_June 2015_AFM982 8 15 FRENCH.pdf.zip) to obtain the PDF file (MRFP_June 2015_AFM982 8 15 FRENCH.pdf). DO NOT DO ANYTHING TO THE PDF FILE (rename, copy, annotate, etc.). Leave it as it is, where it is.
2. Open the PDF file in Preview.
3. Go to page 37 of the PDF.
4. Select the heading text "Fonds canadien à versement fixe imaxx".
5. Press command-C to copy the text.
6. Switch to TextEdit.
7. Open a new document in TextEdit.
8. Press command-V to paste the text.

Expected Results:
I expect macOS to insert the text "Fonds canadien à versement fixe imaxx"

Actual Results:
macOS inserts "Fonds canadien à versement xe imaxx"

(the "fi" in "fixe" is missing)
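One way to confirm exactly which characters are lost is to diff the expected string against the pasted one. A minimal sketch using Python's standard `difflib` module, assuming the two strings are copied verbatim from the Expected and Actual Results above (the helper name `dropped_segments` is hypothetical, for illustration only):

```python
import difflib


def dropped_segments(expected: str, pasted: str) -> list[str]:
    """Return the pieces of `expected` that do not survive in `pasted`.

    Hypothetical helper: it reports every span that SequenceMatcher marks
    as deleted or replaced when turning `expected` into `pasted`.
    """
    matcher = difflib.SequenceMatcher(None, expected, pasted)
    return [
        expected[i1:i2]
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes()
        if tag in ("delete", "replace")
    ]


expected = "Fonds canadien à versement fixe imaxx"
pasted = "Fonds canadien à versement xe imaxx"
print(dropped_segments(expected, pasted))  # ['fi']
```

The same check can be run on any other heading pasted from the affected PDF to see whether the losses are always "fi"-like pairs.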

Version:
10.12.3 (16D32)

Notes:
Two very weird things:

*1* In the PDF, the "fi" in "fixe" is NOT EVEN A LIGATURE. Yet somehow Preview treats it as one, and fails to copy it properly.

*2* Even weirder: If I do ANYTHING to the PDF file in Sierra, e.g. rename it, annotate it, etc., then the problem disappears. This suggests that the changes somehow cause Sierra to reindex the text, after which "fi" no longer causes problems.

See for example "MRFP_June 2015_AFM982 8 15 FRENCH ANNOTATED.pdf" (also attached). It is the same PDF, with a red rectangle added to the front page.

You can also see the problem with "fi" in "fixe" by using double-click and drag to select word by word. Somehow "fi" and "xe" are treated as TWO WORDS.

Configuration:
2014 Mac Pro with 32 GB of RAM running macOS 10.12.3 + Sharp PN-K321 Display and Apple Cinema HD Display

Attachments:
https://www.dropbox.com/s/1lhgjrcm3ld7xkk/MRFP_June%202015_AFM982%208%2015%20FRENCH.pdf.zip?dl=0
https://www.dropbox.com/s/84s1fd5ljtieyy3/MRFP_June%202015_AFM982%208%2015%20FRENCH%20ANNOTATED.pdf?dl=0

Comments

Also reproducible in 10.11.6

The same issue is also present in 10.11.6 with Preview 8.1. Moreover, using XeLaTeX it is possible to produce a much shorter MWE (minimal working example), available at this URL: https://www.dropbox.com/s/lihs13fy794gwt6/ligatures_mwe.pdf?dl=1

When copied from Preview, the text reads as: This text should be entirely copiable, but it is not: gure literature ligature nal.

The same text is perfectly copiable with Acrobat Reader XI.

I just filed another radar with this MWE attached: rdar://31065125.

