... Linkers and Loaders, chapter 10 .\" $Header: /usr/home/johnl/book/linker/RCS/linker10.txt,v 2.3 1999/06/15 03:30:36 johnl Exp $ .CH "Dynamic Linking and Loading" .I "$Revision: 2.3 $" .br .I "$Date: 1999/06/15 03:30:36 $" .P .BT .I "Dynamic linking" defers much of the linking process until a program starts running. It provides a variety of benefits that are hard to get otherwise: .BL .LI Dynamically linked shared libraries are easier to create than static linked shared libraries. .LI Dynamically linked shared libraries are easier to update than static linked shared libraries. .LI The semantics of dynamically linked shared libraries can be much closer to those of unshared libraries. .LI Dynamic linking permits a program to load and unload routines at runtine, a facility that can otherwise be very difficult to provide. .EL .P There are a few disadvantages, of course. The runtime performance costs of dynamic linking are substantial compared to those of static linking, since a large part of the linking process has to be redone every time a program runs. Every dynamically linked symbol used in a program has to be looked up in a symbol table and resolved. (Windows DLLs mitigate this cost somewhat, as we describe below.) Dynamic libraries are also larger than static libraries, since the dynamic ones have to include symbol tables. .ET .P Beyond issues of call compatibility, a chronic source of problems is changes in library semantics. Since dynamic shared libraries are so easy to update compared to unshared or static shared libraries, it's easy to change libraries that are in use by existing programs, which means that the behavior of those programs changes even though "nothing has changed". This is a frequent source of problems on Microsoft Windows, where programs use a lot of shared libraries, libraries go through a lot of versions, and library version control is not very sophisticated. Most programs ship with copies of all of the libraries they use, and installers often will inadvertently install an older version of a shared library on top of a newer one, breaking programs that are expecting features found in the newer one. Well-behaved applications pop up a warning before installing an older library over a newer one, but even so, programs that depend on semantics of older libraries have been known to break when newer versions replace the older ones. . .H1 "ELF dynamic linking" . Sun Microsystems' SunOS introduced dynamic shared libraries to UNIX in the late 1980s. UNIX System V Release 4, which Sun co-developed, introduced the ELF object format and adapted the Sun scheme to ELF. ELF was clearly an improvement over the previous object formats, and by the late 1990s it had become the standard for UNIX and UNIX like systems including Linux and BSD derivatives. . .H1 "Contents of an ELF file" . As mentioned in Chapter 3, an ELF file can be viewed as a set of .I sections , interpreted by the linker, or a set of .I segments , interpreted by the program loader. ELF programs and shared libraries have the same general structure, but with different sets of segments and sections. .P ELF shared libraries can be loaded at any address, so they invariably use position independent code (PIC) so that the text pages of the file need not be relocated and can be shared among multiple processes. As described in Chapter 8, ELF linkers support PIC code with a Global Offset Table (GOT) in each shared library that contains pointers to all of the static data referenced in the program, Figure \n+F. The dynamic linker resolves and relocates all of the pointers in the GOT. This can be a performance issue but in practice the GOT is small except in very large libraries; a commonly used version of the standard C library has only 180 entries in the GOT for over 350K of code. .P Since the GOT is in the same loadable ELF file as the code that references it, and the relative addresses within a file don't change regardless of where the program is loaded, the code can locate the GOT with a relative address, load the address of the GOT into a register, and then load pointers from the GOT whenever it needs to address static data. A library need not have a GOT if it references no static data, but in practice all libraries do. .P To support dynamic linking, each ELF shared libary and each executable that uses shared libraries has a Procedure Linkage Table (PLT). The PLT adds a level of indirection for function calls analogous to that provided by the GOT for data. The PLT also permits "lazy evaluation", that is, not resolving procedure addresses until they're called for the first time. Since the PLT tends to have a lot more entries than the GOT (over 600 in the C library mentioned above), and most of the routines will never be called in any given program, that can both speed startup and save considerable time overall. .FG \nF "PLT and GOT" picture of program with PLT .br picture of library with PLT and GOT .EF We discuss the details of the PLT below. .P An ELF dynamically linked file contains all of the linker information that the runtime linker will need to relocate the file and resolve any undefined symbols. The .T .dynsym section, the dynamic symbol table, contains all of the file's imported and exported symbols. The .T .dynstr and .T .hash sections contain the name strings for the symbol, and a hash table the runtime linker can use to look up symbols quickly. .P The final extra piece of an ELF dynamically linked file is the .T DYNAMIC segment (also marked as the .T .dynamic section) which runtime dynamic linker uses to find the information about the file the linker needs. It's loaded as part of the data segment, but is pointed to from the ELF file header so the runtime dynamic linker can find it. The DYNAMIC section is a list of tagged values and pointers. Some entry types occur just in programs, some just in libraries, some in both. .BL .LI NEEDED: the name of a library this file needs. (Always in programs, sometimes in libraries when one library is dependend on another, can occur more than once.) .LI SONAME: "shared object name", the name of the file the linker uses. (Libraries.) .LI SYMTAB, STRTAB, HASH, SYMENT, STRSZ,: point to the symbol table, associated string and hash tables, size of a symbol table entry, size of string table. (Both.) .LI PLTGOT: points to the GOT, or on some architectures to the PLT (Both.) .LI REL, RELSZ, and RELENT or RELA, RELASZ, and RELAENT: pointer to, number of, and size of relocation entries. REL entries don't contain addends, RELA entries do. (Both.) .LI JMPREL, PLTRELSZ, and PLTREL: pointer to, size, and format (REL or RELA) of relocation table for data referred to by the PLT. (Both.) .LI INIT and FINI: pointer to initializer and finalizer routines to be called at program startup and finish. (Optional but usual in both.) .LI A few other obscure types not often used. .EL An entire ELF shared library might look like Figure \n+F. First come the read-only parts, including the symbol table, PLT, text, and read-only data, then the read-write parts including regular data, GOT, and the dynamic section. The bss logically follows the last read-write section, but as always isn't present in the file. .FG \nF "An ELF shared library" (Lots of pointer arrows here) .sp read-only pages: .br \&.hash .br \&.dynsym .br \&.dynstr .br \&.plt .br \&.text .br \&.rodata .sp read-write pages: .br \&.data .br \&.got .br \&.dynamic .sp \&.bss .EF An ELF program looks much the same, but in the read-only segment has init and fini routines, and an INTERP section near the front of the file to specify the name of the dynamic linker (usually .T ld.so ). The data segment has no GOT, since program files aren't relocated at runtime. . .H1 "Loading a dynamically linked program" . Loading a dynamically linked ELF program is a lengthy but straightforward process. . .H2 "Starting the dynamic linker" . When the operating system runs the program, it maps in the file's pages as normal, but notes that there's an INTERPRETER section in the executable. The specified interpreter is the dynamic linker, ld.so, which is itself in ELF shared library format. Rather than starting the program, the system maps the dynamic linker into a convenient part of the address space as well and starts ld.so, passing on the stack an .I "auxiliary vector" of information needed by the linker. The vector includes: .BL .LI AT_PHDR, AT_PHENT, and AT_PHNUM: The address of the program header for the program file, the size of each entry in the header, and the number of entries. This structure describes the segments in the loaded file. If the system hasn't mapped the program into memory, there may instead be a AT_EXECFD entry that contains the file descriptor on which the program file is open. .LI AT_ENTRY: starting address of the program, to which the dynamic linker jumps after it has finished initialization. .LI AT_BASE: The address at which the dynamic linker was loaded .EL .P At this point, bootstrap code at the beginning of ld.so finds its own GOT, the first entry in which points to the DYNAMIC segment in the ld.so file. From the dynamic segment, the linker can find its own relocation entries, relocate pointers in its own data segment, and resolve code references to the routines needed to load everything else. (The Linux ld.so names all of the essential routines with names starting with .T _dt _ , and special-case code looks for symbols that start with the string and resolves them.) .P The linker then initializes a chain of symbol tables with pointers to the program's symbol table and the linker's own symbol table. Conceptually, the program file and all of the libraries loaded into a process share a single symbol table. But rather than build a merged symbol table at runtime, the linker keeps a linked list of the symbol tables in each file. each file contains a hash table to speed symbol lookup, with a set of hash headers and a hash chain for each header. The linker can search for a symbol quickly by computing the symbol's hash value once, then running through apprpriate hash chain in each of the symbol tables in the list. . .H2 "Finding the libraries" . Once the linker's own initializations are done, it finds the names of the libraries required by the program. The program's program header has a pointer to the "dynamic" segment in the file that contains dynamic linking information. That segment contains a pointer, DT_STRTAB, to the file's string table, and entries DT_NEEDED each of which contains the offset in the string table of the name of a required library. .P For each library, the linker finds the library's ELF shared library file, which is in itself a fairly complex process. The library name in a DT_NEEDED entry is something like .I libXt.so.6 (the Xt toolkit, version 6.) The library file might in in any of several library directories, and might not even have the same file name. On my system, the actual name of that library is /usr/X11R6/lib/libXt.so.6.0, with the ``.0'' at the end being a minor version number. .P The linker looks in these places to find the library: .BL .LI If the dynamic segment contains an entry called DT_RPATH, it's a colon-separated list of directories to search for libraries. This entry is added by a command line switch or environment variable to the regular (not dynamic) linker at the time a program is linked. It's mostly used for subsystems like databases that load a collection of programs and supporting libraries into a single directory. .LI If there's an environment symbol LD_LIBRARY_PATH, it's treated as a colon-separated list of directories in which the linker looks for the library. This lets a developer build a new version of a library, put it in the LD_LIBRARY_PATH and use it with existing linked programs either to test the new library, or equally well to instrument the behavior of the program. (It skips this step if the program is set-uid, for security reasons.) .LI The linker looks in the library cache file .T /etc/ld.so.conf which contains a list of library names and paths. If the library name is present, it uses the corresponding path. This is the usual way that most libraries are found. (The file name at the end of the path need not be exactly the same as the library name, see the section on library versions, below.) .LI If all else fails, it looks in the default directory /usr/lib, and if the library's still not found, displays an error message and exits. .EL .P Once it's found the file containing the library, the dynamic linker opens the file, and reads the ELF header to find the program header which in turn points to the file's segments including the dynamic segment. The linker allocates space for the library's text and data segments and maps them in, along with zeroed pages for the bss. From the library's dynamic segment, it adds the library's symbol table to the chain of symbol tables, and if the library requires further libraries not already loaded, adds any new libraries to the list to be loaded. .P When this process terminates, all of the libraries have been mapped in, and the loader has a logical global symbol table consisting of the union of all of the symbol tables of the program and the mapped library. . .H2 "Shared library initialization" . Now the loader revisits each library and handles the library's relocation entries, filling in the library's GOT and performing any relocations needed in the library's data segment. Load-time relocations on an x86 include: .BL .LI R_386_GLOB_DAT, used to initialize a GOT entry to the address of a symbol defined in another library. .LI R_386_32, a non-GOT reference to a symbol defined in another library, generally a pointer in static data. .LI R_386_RELATIVE, for relocatable data references, typically a pointer to a string or other locally defined static data. .LI R_386_JMP_SLOT, used to initialize GOT entries for the PLT, described later. .EL .P If a library has an .T .init section, the loader calls it to do library-specific initializations, such as C++ static constructors, and any .T .fini section is noted to be run at exit time. (It doesn't do the init for the main program, since that's handled in the program's own startup code.) When this pass is done, all of the libraries are fully loaded and ready to execute, and the loader calls the program's entry point to start the program. . .H1 "Lazy procedure linkage with the PLT" . Programs that use shared libraries generally contain calls to a lot of functions. In a single run of the program many of the functions are never called, in error routines or other parts of the program that aren't used. Furthermore, each shared library also contains calls to functions in other libraries, even fewer of which will be executed in a given program run since many of them are in routines that the program never calls either directly or indirectly. .P To speed program startup, dynamically linked ELF programs use lazy binding of procedure addresses. That is, the address of a procedure isn't bound until the first time the procedure is called. .P ELF supports lazy binding via the Procedure Linkage Table, or PLT. Each dynamically bound program and shared library has a PLT, with the PLT containing an entry for each non-local routine called from the program or library, Figure \n+F. Note that the PLT in PIC code is itself PIC, so it can be part of the read-only text segment. .FG \nF "PLT structure in x86 code" Special first entry .BC PLT0: pushl GOT+4 jmp *GOT+8 .EC Regular entries, non-PIC code: .BC PLTn: jmp *GOT+m push #reloc_offset jmp PLT0 .EC Regular entries, PIC code: .BC PLTn: jmp *GOT+m(%ebx) push #reloc_offset jmp PLT0 .EC .EF All calls within the program or library to a particular routine are adjusted when the program or library is built to be calls to the routine's entry in the PLT. The first time the program or library calls a routine, the PLT entry calls the runtime linker to resolve the actual address of the routine. After that, the PLT entry jumps directly to the actual address, so after the first call, the cost of using the PLT is a single extra indirect jump at a procedure call, and nothing at a return. .P The first entry in the PLT, which we call PLT0, is special code to call the dynamic linker. At load time, the dynamic linker automatically places two values in the GOT. At GOT+4 (the second word of the GOT) it puts a code that identifies the particular library. At GOT+8, it puts the address of the dynamic linker's symbol resolution routine. .P The rest of the entries in the PLT, which we call PLTn, each start with an indirect jump through a GOT entry. Each PLT entry has a corresponding GOT entry which is initially set to point to the push instruction in the PLT entry that follows the jmp. (In a PIC file this requires a loadtime relocation, but not an expensive symbol lookup.) Following the jump is a push instruction which pushes a relocation offset, the offset in the file's relocation table of a special relocation entry of type .T R_386_JMP_SLOT . The relocation entry's symbol reference points to the symbol in the file's symbol table, and its address points to the GOT entry. .P This compact but rather baroque arragement means that the first time the program or library calls a PLT entry, the first jump in the PLT entry in effect does nothing, since the GOT entry through which it jumps points back into the PLT entry. Then the push instruction pushes the offset value which indirectly identifies both the symbol to resolve and the GOT entry into which to resolve it, and jumps to PLT0. The instructions in PLT0 push another code that identifies which program or library it is, and then jumps into stub code in the dynamic linker with the two identifying codes at the top of the stack. Note that this was a jump, rather than a call, above the two identifying words just pushed is the return address back to the routine that called into the PLT. .P Now the stub code saves all the registers and calls an internal routine in the dynamic linker to do the resolution. the two identifying words suffice to find the library's symbol table and the routine's entry in that symbol table. The dynamic linker looks up the symbol value using the concatenated runtime symbol table, and stores the routine's address into the GOT entry. Then the stub code restores the registers, pops the two words that the PLT pushed, and jumps off to the routine. The GOT entry having been updated, subsequent calls to that PLT entry jump directly to the routine itself without entering the dynamic linker. . .H1 "Other peculiarities of dynamic linking . The ELF linker and dynamic linker have a lot of obscure code to handle special cases and try and keep the runtime semantics as similar as possible to whose of unshared libraries. . .H2 "Static initializations" . If a program has an external reference to a global variable defined in a shared library, the linker has to create in the program a copy of the variable, since program data addresses have to be bound at link time, Figure \n+F. This poses no problem for the code in the shared library, since the code can refer to the variable via a GOT pointer which the dynamic linker can fix up, but there is a problem if the library initializes the variable. To deal with this problem, the linker puts an entry in the program's relocation table (which otherwise just contains .T R_386_JMP_SLOT , .T R_386_GLOB_DAT , .T R_386_32 , and .T R_386_RELATIVE entries) of type .T R_386_COPY that points to the place in the program where the copy of the variable is defined, and tells the dynamic linker to copy the initial value of that word of data from the shared library. .FG \nF "Global data initialization Main program: .BC extern int token; .EC Routine in shared library: .BC int token = 42; .EC .EF .P Although this feature is essential for certain kinds of code, it occurs very rarely in practice. This is a band-aid, since it only works for single word data. The initializers that do occur are always pointers to procedures or other data, so the band-aid suffices. . .H2 "Library versions" . Dynamic libraries are generally named with major and minor versions numbers, like .T libc.so.1.1 but programs should be bound only to major version numbers like .T libc.so.1 since minor versions are supposed to be upward compatible. .P To keep program loading reasonably fast, the system manager maintains a cache file containing the full pathname most recent version of each library, which is updated by a configuration program whenever a new library is installed. .P To support this design, each dynamically linked library can have a "true name" called the .I SONAME assigned at library creation time. For example, the library called .T libc.so.1.1 would have a SONAME of .T libc.so.1 . (The SONAME defaults to the library's name.) When the linker builds a program that uses shared libraries, it lists the SONAMEs of the libraries it used rather than the actual names of the libraries. The cache creation program scans all of the directories that contain shared libraries, finds all of the shared libraries, extracts the SONAME from each one, and where there are multiple libraries with the same SONAME, discards all but the highest version number. Then it writes the cache file with SONAMEs and full pathnames so at runtime the dynamic linker can quickly find the current version of each library. . .H1 "Dynamic loading at runtime" . Although the ELF dynamic linker is usually called implcitly at program load time and from PLT entries, programs can also call it explicitly using .T dlopen() to load a shared library and .T dlsym() to find the address of a symbol, usually a procedure to call. Those two routines are actually simple wrappers that call back into the dynamic linker. When the dynamic linker loads a library via .T dlopen() , it does the same relocation and symbol resolution it does on any other library, so the dynamically loaded program can without any special arrangements call back to routines already loaded and refer to global data in the running program. .P This permits users to add extra functionality to programs without access to the source code of the programs and without even having to stop and restart the programs (useful when the program is something like a database or a web server.) Mainframe operating systems have provided access to "exit routines" like this since at least the early 1960s, albeit without such a convenient interface, and it's long been a way to add great flexibility to packaged applications. It also provides a way for programs to extend themselves; there's no reason a program couldn't write a routine in C or C++, run the compiler and linker to create a shared library, then dynamically load and run the new code. (Mainframe sort programs have linked and loaded custom inner loop code for each sort job for decades.) . .H1 "Microsoft Dynamic Link Libraries" . Microsoft Windows also provides shared libraries, called dynamic-link libraries or DLLs in a fashion similar to but somewhat simpler than ELF shared libraries. The design of DLLs changed substantially between the 16 bit Windows 3.1 and the 32 bit Windows NT and 95. This discussion addresses only the more modern Win32 libraries. DLLs import procedure addresses using a PLT-like scheme. Although the design of DLLs would make it possible to import data addresses using a GOT-like scheme, in practice they use a simpler scheme that requires explicit program code to dereference imported pointers to shared data. .P In Windows, both programs and DLLs are PE format (portable executable) files are intended to be memory mapped into a process. Unlike Windows 3.1, where all applications shared a single address space, Win32 gives each application its own address space and executables and libraries are mapped into each address space where they are used. For read-only code this doesn't make any practical difference, but for data it means that each application using a DLL gets its own copy of the DLL's data. (That's a slight oversimplification, since PE files can mark some sections as shared data with a single copy shared among all applications that use the file, but most data is unshared.) .P Loading a Windows executable and DLLs is similar to loading a dynamically linked ELF program, although in the Windows case the dynamic linker is part of the kernel. First the kernel maps in the executable file, guided by section info in the PE headers. Then it maps in all of the DLLs that the executable refers to, again guided by the PE headers in each DLL. .P PE files can contain relocation entries. An executable generally won't contain them and so has to be mapped at the address for which it was linked. DLLs all do contain relocation entries, and are relocated when they're mapped in if the address space for which they were linked isn't available. (Microsoft calls runtime relocation .I rebasing .) .P All PE files, both executables and DLLs, have an entry point, and the loader calls a DLL's entry point when the DLL is loaded, when the DLL is unloaded, and each time a process thread attaches to or detaches from the DLL. (The loader passes an argument to say why it's making each call.) This provides a hook for static initializers and destructors analogous to the ELF .T .init and .T .fini sections. . .H2 "Imported and exported symbols in PE files" . PE supports shared libraries with two special sections of the file, .T .edata , for exported data, that lists the symbols exported from a file, and .T .idata , that lists the symbols imported into a file. Program files generally have only an .T .idata section, while DLLs always have an .T .edata and may have a .T .idata if they use other DLLs. Symbols can be exported either by symbol name, or by "ordinal", a small integer that gives the index of the symbol in the export address table. Linking by ordinals is slightly more efficient since it avoids a symbol lookup, but considerably more error prone since it's up to the person who builds a DLL to ensure that ordinals stay the same from one library version to another. In practice ordinals are usually used to call system services that rarely change, and names for everything else. .P The .T .edata section contains an export directory table that describes the rest of the section, followed by the tables that define the exported symbols, Figure \n+F. .FG \nF "Structure of .edata section" export directory pointing to: .br export address table .br ordinal table .br name pointer table .br name strings .EF The export address table contains the RVA (relative virtual address, relative to the base of the PE file) of the symbol. If the RVA points back into the .T .edata section, it's a "forwarder" reference, and the value pointed to is a string naming the symbol to use to satisfy the reference, probably defined in a different DLL. The ordinal and name pointer tables are parallel, with each entry in the name pointer table being the RVA of the name string for the symbol, and the ordinal being the index in the export address table. (Ordinals need not be zero-based; the ordinal base to subtract from ordinal values to get the index in the export address table is stored in the export directory and is most often 1.) Exported symbols need not all have names, although in practice they always do. The symbols in the name pointer table are in alphabetical order to permit the loader to use a binary search. .P The .T .idata section does the converse of what the .T .edata section does, by mapping symbols or ordinals back into virtual addresses. The section consists of a null-terminated array of import directory tables, one per DLL from which symbols are imported, followed by an import lookup table per DLL, followed by a name table with hints, Figure \n+F. .FG \nF "Structure of .idata section" array of import directory tables, with lotsa arrows .br each has import lookup table RVA, time/date stamp, forwarder chain (unused?), DLL name, import address RVA table .br NULL .br import table, entries with high bit flag (table per DLL) .br hint/name table .EF For each imported DLL, there is an array of import addresses, typically in the program's text segment, into which the program loader places the resolved addresses. The import lookup table identifies the symbols to import, with the entries in the import lookup table being parallel to those in the import address table. The lookup table consists of 32 bit entries. If the high bit of an entry is set, the low 31 bits are the ordinal of the symbol to import, otherwise the entry is the RVA of an entry in the hint/name table. Each hint/name entry consists of a four-byte hint that guesses the index of the symbol in the DLL's export name pointer table, followed by the null terminated symbol name. The program loader uses the hint to probe the export table, and if the symbol name matches, it uses that symbol, otherwise it binary searches the entire export table for the name. (If the DLL hasn't changed, or at least its list of exported symbols hasn't changed, since the program that uses the DLL was linked, the guess will be right.) .P Unlike ELF imported symbols, the values of symbols imported via .T .idata are only placed in the import address table, not fixed up anywhere else in the importing file. For code addresses, this makes little difference. When the linker builds an executable or DLL, it creates in the text section a table of misnamed "thunks", indirect jumps through the entries in the import address table, and uses the addresses of the thunks as the address of the imported routine, which is transparent to the programmer. (The thunks as well as most of the data in the .T .idata section actually come from a stub library created at the same time as the DLL.) In recent versions of Microsoft's C and C++ compiler, if the programmer knows that a routine will be called in a DLL, the routine can be declared "dllimport", and the compiler will emit an indirect call to the address table entry, avoiding the extra indirect jump. For data addresses, the situation is more problematical, since it's harder to hide the extra level of indirection required to address a symbol in another executable. Traditionally, programmers just bit the bullet and explicitly declared imported variables to be pointers to the real values and explicitly dereferencd the pointers. Recent versions of Microsoft's C and C++ compiler also let the programmer declare global data to be "dllimport" and the compiler will emit the extra pointer deferences, much like ELF code that references data indirectly via pointers in the GOT. . .H2 "Lazy binding" . Recent versions of Windows compilers have added delay loaded imports to permit lazy symbol binding for procedures, somewhat like the ELF PLT. A delay-loaded DLL has a structure similar to the .T .idata import directory table, but not in the .I .idata section so the program loader doesn't handle it automatically. The entries in the import address table initially all point to a helper routine that finds and loads the DLL and replaces the contents of the address table with the actual addresses. The delay-loaded directory table has a place to store the original contents of the import address table so the values can be put back if the DLL is later unloaded. Microsoft provides a standard helper routine, but its interfaces are documented and programmers can write their own versions if need be. .P Windows also permits programs to load and unload DLLs explicitly using .T LoadLibrary and .T FreeLibrary , and to find addresses of symbols using .T GetProcAddress . . .H2 "DLLs and threads" . One area in which the Windows DLL model doesn't work particularly well is thread local storage. A Windows program can start multiple threads in the same process, which share the process' address space. Each thread has a small chunk of thread local storage (TLS) to keep data specific to that thread, such as pointers to data structures and resources that the thead is using. The TLS needs "slots" for the data from the executable and from each DLL that uses TLS. The Windows linker can create a .tls section in a PE executable, that defines the layout for the TLS needed by routines in the executable and any DLLs to which it directly refers. Each time the process creates a thread, the new thread gets its own TLS, created using the .tls section as a template. .P The problem is that most DLLs can either be linked implicitly from the executable, or loaded explicitly with .T LoadLibrary . DLLs loaded explicitly don't automatically get .tls storage, and since a DLL's author can't predict whether a library will be invoked implicitly or explicitly, it can't depend on the .tls section. .P Windows defines runtime system calls that allocate slots at the end of the TLS. DLLs use those calls rather than .tls unless the DLL is known only to be invoked implicitly. .H1 "OSF/1 pseudo-static shared libraries" . OSF/1, the ill-fated UNIX variant from the Open Software Foundation, used a shared library scheme intermediate between static and dynamic linking. Its authors noted that static linking is a lot faster than dynamic since less relocation is needed, and that libraries are updated infrequently enough that system managers are willing to endure some pain when they update shared libraries, although not the agony of relinking every executable program in the entire system. .P So OSF/1 took the approach of maintaining a global symbol table visible to all processes, and loaded all the shared libraries into a sharable address space at system boot time. This assigned all of the libraries addresses that wouldn't change while the system was running. Each time a program started, if it used shared libraries, it would map in the shared libraries and symbol table and resolve undefined references in the executable using the global symbol table. No load-time relocation was ever required since programs were all linked to load in a part of the address space that was guaranteed to be available in each process, and the library relocation had already happened when they were loaded at boot time. .P When one of the shared libraries changed, the system just had to be rebooted normally, at which point the system loaded the new libraries and created a new symbol table for executables to use. .P This scheme was clever, but it wasn't very satisfactory. For one thing, processing symbol lookups is considerably slower than processing relocation entries, so avoiding relocation wasn't that much of a performance advantage. For another, dynamic linking provides the ability to load and run a library at runtime, and the OSF/1 scheme didn't provide for that. . .H1 "Making shared libraries fast" . .BT Shared libraries, and ELF shared libraries in particular, can be very slow. The slowdowns come from a variety of sources, several of which we mentioned in Chapter 8: .BL .LI Load-time relocation of libraries .LI Load-time symbol resolution in libraries and executables .LI Overhead due to PIC function prolog code .LI Overhead due to PIC indirect data references .LI Slower code due to PIC reserved addressing registers .EL The first two problems can be ameliorated by caching, the latter two by retreating from pure PIC code. .P .ET On modern computers with large address spaces, it's usually possible to choose an address range for a shared library that's available in all or at least most of the processes that use the library. One very effective technique is similar to the Windows approach. Either when the library is linked or the first time a library is loaded, tentatively bind its addresses to a chunk of address space. After that, each time a program links to the library, use the same addresses of possible, which means that no relocation will be necessary. If that address space isn't available in a new process, the library is relocated as before. .P SGI systems use the term .I QUICKSTART to describe the process of pre-relocating objects at linktime, or in a separate pass over the shared library. BeOS caches the relocated library the first time it's loaded into a process. If multiple libraries depend on each other, in principle it should be possible to pre-relocate and then pre-resolve symbol references among libraries, although I'm not aware of any linkers that do so. .P If a system uses pre-relocated libraries, PIC becomes a lot less important. All the processes that load a library at its pre-relocated address can share the library's code whether it's PIC or not, so a non-PIC library at a well-chosen address can in practice be as sharable as PIC without the performance loss of PIC. This is basically the static linked library approach from Chapter 9, except that in case of address space collisions, rather than the program failing the dynamic linker moves the libraries at some loss of performance. Windows uses this approach. .P BeOS implements cached relocated libraries with great thoroughness, including preserving correct semantics when libaries change. When a new version of a library is installed BeOS notes the fact and creates a new cached version rather than using the old cached version when programs refer to the library. Library changes can have a ripple effect. When library A refers to symbols in library B and B is updated, a new cached version of A will also have to be created if any of the referenced symbols in B have moved. This does make the programmer's life easier, but it's not clear to me that libraries are in practice updated often enough to merit the considerable amount of system code needed to track library updates. . .H1 "Comparison of dynamic linking approaches" . The Unix/ELF and Windows/PE dynamic linking differ in several interesting ways. .P The ELF scheme uses a single name space per program, while the PE scheme uses a name space per library. An ELF executable lists the symbols it needs and the libraries it needs, but it doesn't record which symbol is in which library. A PE file, on the other hand, lists the symbols to import from each library. The PE scheme is less flexible but also more resistant to inadvertent spoofing. Imagine that an executable calls routine AFUNC which is found in library A and BFUNC which is found in library B. If a new version of library A happens to define its own BFUNC, an ELF program could use the new BFUNC in preference to the old one, while a PE program wouldn't. This is a problem with some large libraries; one partial solution is to use the poorly documented DT_FILTER and DT_AUXILIARY fields to tell the dynamic linker what libraries this one imports symbols from, so the linker will search those libraries for imported symbols before searching the executable and the rest of the libraries. The DT_SYMBOLIC field tells the dynamic linker to search the library's own symbol table first, so that other libraries cannot shadow intra-library references. (This isn't always desirable; consider the malloc hack described in the previous chapter.) These ad-hoc approaches make it less likely that symbols in unrelated libraries will inadvertently shadow the correct symbols, but they're no substitude for a hierarchical link-time name space as we'll see in Chapter 11 that Java has. .P The ELF scheme tries considerably harder than the PE scheme to maintain the semantics of static linked programs. In an ELF program, references to data imported from another library are automatically resolved, while a PE program needs to treat imported data specially. The PE scheme has trouble comparing the values of pointers to functions, since the address of an imported function is the address of the "thunk" that calls it, not the address of the actual function in the other library. ELF handles all pointers the same. .P At run-time, nearly all of the Windows dynamic linker is in the operating system, while the ELF dynamic linker runs entirely as part of the application, with the kernel merely mapping in the initial files. The Windows scheme is arguably faster, since it doesn't have to map and relocate the dynamic linker in each process before it starts linking. The ELF scheme is definitely a lot more flexible. Since each executable names the "interpreter" program (now always the dynamic linker named ld.so) to use, different executables could use different interpreters without requring any operating system changes. In practice, this makes it easier to support executables from variant versions of Unix, notably Linux and BSD, by making a dynamic linker that links to compatibility libraries that support non-native executables. . .H1 Exercises . In ELF shared libraries, libraries are often linked so that calls from one routine to another within a single shared library go through the PLT and have their addresses bound at runtime. Is this useful? Why or why not? .P Imagine that a program calls a library routine .T plugh() that is found in a shared library, and the programmer builds a dynamically linked program that uses that library. Later, the system manager notices that .T plugh is a silly name for a routine and installs a new version of the library that calls the routine .T xsazq instead. What happens when the next time the programmer runs the program? .P If the runtime environment variable .T LD_BIND_NOW is set, the ELF dynamic loader binds all of the program's PLT entries at load time. What would happen in the situtation in the previous problem if .T LD_BIND_NOW were set? .P Microsoft implemented lazy procedure binding without operating system assistance by adding some extra cleverness in the linker and using the existing facilities in the operating system. How hard would it be to provide transparent access to shared data, avoiding the extra level of pointers that the current scheme uses? . .H1 Project . It's impractical to build an entire dynamic linking system for our project linker, since much of the work of dynamic linking happens at runtime, not link time. Much of the work of building a shared library was already done in the project 8-3 that created PIC executables. A dynamically linked shared library is just a PIC executable with a well-defined list of imported and exported symbols and a list of other libraries on which it depends. To mark the file as a shared library or an executable that uses shared libraries, the first line is: .BC LINKLIB lib1 lib2 ... or LINK lib1 lib2 ... .EC .br where the lib's are the names of other shared libraries on which this one depends. .P .I "Project 10-1:" Starting with the version of the linker from project 8-3, extend the linker to produce shared libraries and executables that need shared libraries. The linker needs to take as its input a list of input files to combine into the output executable or library, as well as other shared libraries to search. The output file contains a symbol table with defined (exported) and undefined (imported) symbols. Relocation types are the ones for PIC files along with AS4 and RS4 for references to imported symbols. .P .I "Project 10-2:" Write a run-time binder, that is, a program that takes an executable that uses shared libraries and resolves its references. It should read in the executable, then read in the necessary libraries, relocating them to non-overlapping available addresses, and creating a logically merged symbol table. (You may want to actually create such a table, or use a list of per-file tables as ELF does.) Then resolve all of the relocations and external references. When you're done, all code and data should be assigned memory addresses, and all addresses in the code and data should be resolved and relocated to the assigned addresses.