[ld/as] Data reordering cannot be selectively disabled

Originator: jmaebe
Number: rdar://7201596
Date Originated: 05-Sep-2009 05:30 PM
Status: Open
Resolved:
Product: Developer Tools
Product Version: 3.2/ld64-95.2.12/cctools-750~70
Classification: Other bug
Reproducible: Always
 
* Summary

Currently, the automatically activated data reordering functionality from Xcode 3.2's ld64 causes programs generated by our compiler to crash.

To solve this problem, and to prevent similar problems from happening in the future, I would like to ask for an assembler directive to either designate
a) symbols that must not be reordered by the linker, or
b) a group of symbols that must remain together in the original order at all times

Furthermore, and this is very important to us, it should be possible to use something like the following:

.if __NO_REORDER_SYM_SUPPORTED__
.noreorder sym1
.noreorder sym2
.noreorder sym3
.endif

This way, we can output code that will correctly assemble no matter which version of "as" our user is using.


Explanation:

Our compiler supports a language feature called "resource strings". Each string has a unique identifier, which makes it easy to replace the strings with translated versions (much like the translation strings in plists).

The resource strings themselves are collected per unit in a table with the following structure in the assembler file:

***
.data
 global_unitx_start_label:
 [metadata]

 global_string1_label:
 [string 1 data]

 global_string2_label:
 [string 2 data]

 global_unitx_end_label:
***

I.e., there are global start and end symbols, and each string in between is also identified by a global symbol. This makes it easy to refer to the string values from other compilation units.

The compiler also adds an array to the main program that collects references to all of these start and end labels:

.globl  FPC_RESOURCESTRINGTABLES
FPC_RESOURCESTRINGTABLES:
       .long   1                        // number of start/end pairs
       .long   global_unitx_start_label
       .long   global_unitx_end_label

When the program ends, code walks over this array and subsequently over all resource strings between the start and end labels to finalise the values assigned during run time.

It is clear that for this to work, the start and end labels as well as the individual resource string symbols must remain in their original order.
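To illustrate why the order matters, here is a minimal Python model of such a walk. It is not our runtime's actual code: the 0x20-byte header size and stride, the addresses, and the name entries_walked are assumptions chosen for illustration only.

```python
ENTRY_SIZE = 0x20  # assumed size of the metadata header and of each string entry


def entries_walked(start, end, header_size=ENTRY_SIZE, entry_size=ENTRY_SIZE):
    """Return the addresses a finaliser would visit between two labels.

    The walk begins right after the (assumed) metadata header at `start`
    and advances one entry at a time until it reaches `end`.
    """
    addr = start + header_size
    visited = []
    while addr < end:
        visited.append(addr)
        addr += entry_size
    return visited


# Labels in their original order: header, two strings, then the end label.
intact = entries_walked(start=0x1000, end=0x1060)

# Hypothetical layout in which the end label ends up far from the strings:
# the walk now runs across memory that holds no resource strings at all.
moved = entries_walked(start=0x1000, end=0x12A0)

print(len(intact), len(moved))
```

In this model the intact layout yields exactly the two string entries, while the layout with the displaced end label makes the walk treat 18 additional, unrelated 32-byte blocks as resource strings.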

When compiling a dynamic library under Mac OS X 10.6, ld notices that the data following the start label and all string labels contains relocations and hence moves them to the start of the data section (as explained in the ld man page comments for the -no_order_data option). The end label remains behind, and as a result the code crashes when it walks over the array to finalise it.

What I therefore would need is a way to specify in the assembler file that particular symbols must not be reordered by the linker.

I can work around this particular problem by simply adding a dummy relocated value after the end label (it will then be moved along with the rest). I assume, however, that ld will become more aggressive regarding all kinds of reordering in the future, and this workaround will probably break at that point.

I know about the "-order_file" linker parameter, but it is too blunt: I'm not interested in telling the linker *where* it should put this data, only that it should either leave it alone or keep it together.


* Steps to reproduce

Unpack the attached archive, then (x86_64 code)
1) cd ldreorderdata
2) ./build.sh -no_order_data
3) nm liblib.dylib |grep '_RESSTR_P\$TEST'|sort
4) ./prog
5) ./build.sh
6) nm liblib.dylib |grep '_RESSTR_P\$TEST'|sort
7) ./prog


* Expected results

The symbols displayed by nm should all be exactly 32 bytes (0x20 bytes) apart.

Furthermore, "prog" should print the following output on both runs:

firststring
secondstring


* Actual results

When linked with -no_order_data, everything is fine.

When linked without -no_order_data, the address for the end symbol is wrong:

00000000000269c0 D _RESSTR_P$TEST_START
00000000000269e0 D _RESSTR_P$TEST_FIRSTSTRING
0000000000026a00 D _RESSTR_P$TEST_SECONDSTRING
0000000000026c50 D _RESSTR_P$TEST_END

(0x26c50 - 0x26a00 == 0x250 != 0x20)
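The gap arithmetic above can be checked mechanically. The following Python sketch uses only the four nm lines quoted in this report; the helper name symbol_gaps is made up for illustration.

```python
NM_OUTPUT = """\
00000000000269c0 D _RESSTR_P$TEST_START
00000000000269e0 D _RESSTR_P$TEST_FIRSTSTRING
0000000000026a00 D _RESSTR_P$TEST_SECONDSTRING
0000000000026c50 D _RESSTR_P$TEST_END
"""

EXPECTED_GAP = 0x20  # each symbol should sit 32 bytes after the previous one


def symbol_gaps(nm_text):
    """Return (symbol, distance from the previous symbol) pairs, by address."""
    rows = []
    for line in nm_text.splitlines():
        addr, _sym_type, name = line.split()
        rows.append((int(addr, 16), name))
    rows.sort()
    return [
        (name, addr - prev_addr)
        for (prev_addr, _), (addr, name) in zip(rows, rows[1:])
    ]


gaps = symbol_gaps(NM_OUTPUT)
bad = [(name, gap) for name, gap in gaps if gap != EXPECTED_GAP]

for name, gap in gaps:
    print(f"{name}: {gap:#x}")
```

Only the end symbol shows up in `bad`, with a gap of 0x250 instead of 0x20.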

Due to this problem, the program crashes with a bus error:

$ ./prog 
firststring
secondstring
Bus error


* Regression

The problem does not occur with versions of Xcode before 3.2 (except possibly in the iPhone SDK), because they did not reorder data.


* Notes

I've worked around it in a currently unreleased version of our compiler by adding a relocated value after the end label. As mentioned in the summary, however, I assume things may break again if ld starts reordering more aggressively.

Another possible workaround, as demonstrated above, is to use -no_order_data. Since our compiler also runs on Mac OS X 10.3, that is not a very inviting approach, because we'd first have to detect somehow whether the linker supports this option.

Comments

16-Sep-2009 09:17 AM Jonas Maebe:

Thanks for the feedback. I now understand why the linker figured it would be no problem to reorder the symbols, and I agree that in essence, this is a perfectly reasonable thing to do.

I'd like to make two remarks though:

a) the data structure does not assume that the section is atomic, only that the relative order of sections is constant (but I agree that this is also a wrong assumption to make). It would still work perfectly fine if one of the resource string entries were stripped out by the linker because it is unused (and in fact, that is the reason why it is structured like this, rather than with a single starting label and offsets to reference individual string entries). For some reason, the linker does not do this though, see below.

b) when .subsections_via_symbols was introduced, the ".reference" assembler directive was added at the same time to help massage code that would otherwise not work with this feature. I would really like to ask that similar functionality be added now (whose presence can be detected within the assembler code this time using a '.if functionality_present'-style expression, so we can generate code that is backwards compatible with previous Xcode assembler releases), so that it does not become an unfortunate choice for us between "don't have data ordering" and "don't have dead code stripping".

About the missing dead code (or rather data) stripping: I've attached a slightly modified copy of lib.s, from which I've removed all references to _RESSTR_P$TEST_FIRSTSTRING (namely the writeln from the initialisation section). I'd expect ld to remove this block of data from the linked library in response:

.globl _RESSTR_P$TEST_FIRSTSTRING
_RESSTR_P$TEST_FIRSTSTRING:
        .quad $TEST$Ld6
        .quad $TEST$Ld4
        .quad $TEST$Ld4
        .long 197560295,0

However, it does not do so. If you rebuild the library with

./build.sh -why_live '_RESSTR_P$TEST_FIRSTSTRING'

(mind the single quotes, because the symbol name contains a dollar sign) then all that ld says is:

0x1003012e0 _RESSTR_P$TEST_FIRSTSTRING from lib.o

I can't find where this reference comes from (there's no ".reference" referring to it either). As an aside, for '_RESSTR_P$TEST_SECONDSTRING', it correctly mentions that it's still referenced from the main routine:

0x1003011f0 _RESSTR_P$TEST_SECONDSTRING from lib.o
  0x100300f20 _P$TESTmain from lib.o

I can file a separate radar for this if you'd prefer, since I know how annoying it is if you can't close an issue because somewhat related things keep getting added to it. I'm just mentioning it in case I'm missing something obvious.

Thanks.

'lib.s' was successfully uploaded

15-Sep-2009 10:01 PM N. N. :

15-Sep-2009 10:01 PM KIT CHEUNG : Engineering has requested the following information in order to further investigate this issue:

In the file lib.s, if you remove the last line (.subsections_via_symbols), the test case builds and runs correctly, even without -no_order_data.

The .subsections_via_symbols flag tells the linker that each section can be broken down into chunks at label boundaries. Since your data structures assume entire sections will remain atomic, you should not use the .subsections_via_symbols directive.

