... Linkers and Loaders, chapter 3
.\" $Header: /usr/home/johnl/book/linker/RCS/linker03.txt,v 2.6 1999/06/29 04:21:48 johnl Exp $
.CH "Object Files"
.I "$Revision: 2.6 $"
.br
.I "$Date: 1999/06/29 04:21:48 $"
.P
.BT
Compilers and assemblers create object files containing the generated
binary code and data for a source file.
Linkers combine multiple object files into one, and loaders take object
files and load them into memory.
(In an integrated programming environment, the compilers, assemblers, and linkers
are run implicitly when the user tells it to build a program, but they're there
under the covers.)
In this chapter we delve into the details of object file formats and
contents.
.ET
.
.H1 "What goes into an object file?"
An object file contains five kinds of information.
.BL
.LI
.I "Header information:"
overall information about the file, such as the
size of the code, name of the source file it was translated from,
and creation date.
.LI
.I "Object code:"
Binary instructions and data generated by a compiler or assembler.
.LI
.I "Relocation:"
A list of the places in the object code that have to be fixed up when
the linker changes the addresses of the object code.
.LI
.I "Symbols:"
Global symbols defined in this module, symbols to be imported from
other modules or defined by the linker.
.LI
.I "Debugging information:"
Other information about the object code not needed for linking but of
use to a debugger.
This includes source file and line number information, local symbols, and
descriptions of data structures used by the object code, such as C
structure definitions.
.EL
(Some object files contain even more than this, but these are plenty
to keep us occupied in this chapter.)
.P
Not all object formats contain all of these kinds of information, and
it's possible to have quite useful formats with little or no
information beyond the object code.
.
.H2 "Designing an object format"
.
The design of an object format is a compromise driven by the various
uses to which an object file is put.
A file may be
.I linkable ,
used as input by a link editor or linking loader.
It may be
.I executable ,
capable of being loaded into memory and run as a program,
.I loadable ,
capable of being loaded into memory as a library along with a program,
or any combination of the three.
Some formats support just one or two of these uses; others support all three.
.P
A linkable file contains extensive symbol and relocation information needed
by the linker along with the object code.
The object code is often divided up into many small logical segments that
will be treated differently by the linker.
An executable file contains object code, usually page aligned to permit the
file to be mapped into the address space, but doesn't need any symbols (unless
it will do runtime dynamic linking), and needs little or no relocation
information.
The object code is a single large segment or a small set of segments that
reflect the hardware execution environment, most often read-only vs.
read-write pages.
Depending on the details of a system's runtime environment,
a loadable file may consist solely of object code, or may contain
complete symbol and relocation information to permit runtime symbolic
linking.
.P
There is some conflict among these applications.
The logically oriented grouping of linkable segments rarely matches
the hardware oriented grouping of executable segments.
Particularly on smaller computers, linkable files are read and written
by the linker a piece at a time, while executable files are loaded in
their entirety into main memory.
This distinction is most obvious in the completely different MS-DOS
linkable OMF format and executable EXE format. 
.P
We'll tour a series of popular formats, starting with the simplest,
and working up to the most complicated.
.H1 "The null object format: MS-DOS .COM files"
.
It's quite possible to have a usable object file with no information
in it whatsoever other than the runnable binary code.
The MS-DOS .COM format is the best-known example.
A .COM file literally consists of nothing other than binary code.
When the operating system runs a .COM file, it merely loads the
contents of the file into a chunk of free memory starting at offset
0x100 (offsets 0 through 0xFF hold the PSP, the Program Segment Prefix, with command line
arguments and other parameters),
sets the x86 segment registers all to point to the PSP,
sets the SP (stack pointer) register to the end of the segment,
since the stack grows downward,
and jumps to the beginning of the loaded program.
.P
The segmented architecture of the x86 makes this work.
Since all x86 program addresses are interpreted relative to the base
of the current segment and the
segment registers all point to the base of the segment, the
program is always loaded at segment-relative location 0x100.
Hence, for a program that fits in a single segment, no fixups are
needed since segment-relative addresses can be determined at link time.
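.P
The loading steps for a .COM file can be sketched as a small C simulation.
This is a minimal sketch rather than the actual DOS loader: the structure
.T com_image ,
the function
.T load_com ,
and the choice of 0xFFFE as the initial stack pointer are all assumptions made
for illustration.
.BC
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEG_SIZE   0x10000u  /* one 64K real-mode segment */
#define COM_ORIGIN 0x100u    /* code starts just past the 256-byte PSP */

/* Hypothetical model of a loaded .COM program: one segment plus registers. */
struct com_image {
    uint8_t  mem[SEG_SIZE];  /* PSP at 0..0xFF, program at 0x100 and up */
    uint16_t ip;             /* initial instruction pointer */
    uint16_t sp;             /* initial stack pointer */
};

/* Copy the raw file contents to offset 0x100 and set up ip and sp.
   Returns 0 on success, -1 if the file cannot fit in the segment. */
int load_com(struct com_image *img, const uint8_t *file, size_t len)
{
    assert(img != NULL && file != NULL);
    if (len > SEG_SIZE - COM_ORIGIN)
        return -1;
    memset(img->mem, 0, sizeof img->mem);
    memcpy(img->mem + COM_ORIGIN, file, len);  /* no header, no fixups */
    img->ip = COM_ORIGIN;
    img->sp = 0xFFFE;        /* assumed: top of segment, stack grows down */
    return 0;
}
```
.EC
.P
Nothing in the loop depends on where the segment itself sits in physical memory,
which is the point: every address in the program is relative to the segment base.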
.P
For programs that don't fit in a single segment, the fixups are the
programmer's problem, and there are indeed programs that start out by
fetching one of their segment registers, and adding its contents to
stored segment values elsewhere in the program.
Of course, this is exactly the sort of tedium that linkers and loaders
are intended to automate, and MS-DOS does that with .EXE files,
described later in this chapter.
.
.H1 "Code sections: Unix a.out files"
.
Computers with hardware memory relocation (nearly all of them, these
days) usually create a new process with an empty address space for each
newly run program, in which case programs can be linked to start at a
fixed address and require no relocation at load time.
The Unix a.out object format handles this situation.
.P
In the simplest case, an a.out file consisted of a small header
followed by the executable code (called the text section for
historical reasons) and the initial values for static data, Figure \n+F.
The PDP-11 had only 16 bit addressing, which limited programs to a
total of 64K. 
This limit quickly became too small, so later models in the PDP-11
line provided separate address spaces for code (I for Instruction
space) and data (D space), so a single program could contain both 64K
of code and 64K of data.
To support this feature, the compilers, assembler, and linker
were modified to create two-section object files, with the code in
the first section and the data in the second section, and the program
loader loaded the first section into a process' I space and the
second into the D space.
.FG \nF "Simplified a.out"
a.out header

text section

data section

other sections
.EF
.P
Separate I and D space had another performance advantage: since a
program couldn't change its own I space, multiple copies of a single
program could share a single copy of a program's code, while keeping
separate copies of the program's data.
On a time-shared system like Unix, multiple copies of the shell (the
command interpreter) and network daemons are common, and shared
program code saves considerable real memory.
.P
The only currently common computer that still uses separate addressing
for code and data is the 286 (or 386 in 16 bit protected mode).
Even on more modern machines with large address spaces, the operating system
can handle shared read-only code pages in virtual memory much more efficiently
than read/write pages, so all modern loaders support them.
This means that linker formats must at the least mark read-only
versus read-write sections.
In practice, most linker formats have many sections, such as read-only
data, symbols and relocation for subsequent linking, debugging
symbols, and shared library information.
(Unix convention confusingly calls the file sections segments, so we use
that term in discussions of Unix file formats.)
.
.H2 "a.out headers"
The header varies somewhat from one version of Unix to another,
but the version in BSD Unix, Figure \n+F, is typical.
(In the examples in this chapter, int values are 32 bits, and short are 16 bits.)
.FG \nF "a.out header"
.BC
int	a_magic;	// magic number
int	a_text;		// text segment size
int	a_data;		// initialized data size
int	a_bss;		// uninitialized data size
int	a_syms;		// symbol table size
int	a_entry;	// entry point
int	a_trsize;	// text relocation size
int	a_drsize;	// data relocation size
.EC
.EF
.P
The magic number
.T a_magic
indicates what kind of executable file this is.
(
.I "Make this a footnote:"
Historically, the magic number on the original PDP-11 was octal 407, which
was a branch instruction that would jump over the next seven words of the
header to the beginning of the text segment.
That permitted a primitive form
of position independent code.
A bootstrap loader could load the entire executable, including the
file header, into memory,
usually at location zero, and then jump to
the beginning of the loaded file to start the program.
Only a few standalone programs ever used this ability, but the
407 magic number is still with us 25 years later.)
Different magic numbers tell the operating system program loader to
load the file into memory differently; we discuss these variations
below.
The text and data segment sizes
.T a_text
and
.T a_data
are the sizes in bytes of the read-only code and read-write data that
follow the header. 
Since Unix automatically initializes newly allocated memory to zero,
any data with initial contents of zero or whose contents don't
matter need not be present in the a.out file.
The uninitialized size
.T a_bss
says how much uninitialized (really zero-initialized) data
logically follows the data in the a.out file.
.P
The
.T a_entry
field gives the starting address of the program, while
.T a_syms ,
.T a_trsize ,
and
.T a_drsize
say how much symbol table and relocation information follows the data
segment in the file.
Programs that have been linked and are ready to run need neither symbol
nor relocation info, so these fields are zero in runnable files unless
the linker has included symbols for the debugger.
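.P
Since each part of the file simply follows the previous one, the header sizes are
enough to find everything.
The following is a sketch, assuming the simple layout in which the text starts
right after the 32-byte header, and assuming the order of the trailing parts
(text relocation, data relocation, symbols, strings) that we see in the
relocatable a.out layout later in this chapter.
The names
.T aout_layout
and
.T aout_offsets
are made up for this example.
.BC
```c
#include <assert.h>
#include <stdint.h>

/* The BSD a.out header from the figure, with explicit 32-bit fields. */
struct aout_hdr {
    uint32_t a_magic, a_text, a_data, a_bss;
    uint32_t a_syms, a_entry, a_trsize, a_drsize;
};

/* File offsets of each part of the file (hypothetical helper type). */
struct aout_layout {
    uint32_t text, data, trel, drel, syms, strs;
};

/* Compute where each part begins by accumulating the header's sizes. */
struct aout_layout aout_offsets(const struct aout_hdr *h)
{
    struct aout_layout l;
    assert(h != 0);
    l.text = (uint32_t)sizeof(struct aout_hdr);
    l.data = l.text + h->a_text;
    l.trel = l.data + h->a_data;     /* BSS takes no space in the file */
    l.drel = l.trel + h->a_trsize;
    l.syms = l.drel + h->a_drsize;
    l.strs = l.syms + h->a_syms;     /* string table follows the symbols */
    return l;
}
```
.EC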
.
.H2 "Interactions with virtual memory"
The process involved when the operating system loads and starts
a simple two-segment file is straightforward, Figure \n+F:
.
.FG \nF "Loading an a.out into a process"
picture of file and segments with arrows pointing out data flows
.EF
.P
.BL
.LI
Read the a.out header to get the segment sizes.
.LI
Check to see if there's already a sharable code segment for this file.
If so, map that segment into the process' address space.
If not, create one, map it into the address space,
and read the text segment from the file into the new memory segment.
.LI
Create a private data segment large enough for the combined data and
BSS, map it into the process, and read the data segment from the file
into the data segment.
Zero out the BSS segment.
.LI
Create and map in a stack segment (usually separate from the data
segment, since the data heap and stack grow separately).
Place arguments from the command line or calling program on the stack.
.LI
Set registers appropriately and jump to the starting address.
.EL
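.P
The first three steps above can be sketched in C, using ordinary heap memory
in place of real address space mappings.
This is only an illustration under several assumptions: the file is already in a
buffer, the header is in the host's byte order, and the shared code segment check,
the stack setup, and the final jump are left out.
The names
.T process_image
and
.T load_aout
are hypothetical.
.BC
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct aout_hdr {
    uint32_t a_magic, a_text, a_data, a_bss;
    uint32_t a_syms, a_entry, a_trsize, a_drsize;
};

/* Hypothetical stand-in for a process address space. */
struct process_image {
    uint8_t *text;      /* would be shared and read-only in a real system */
    uint8_t *data;      /* private: initialized data followed by BSS */
    uint32_t text_size, data_size, bss_size, entry;
};

/* Returns 0 on success, -1 on a short file or allocation failure. */
int load_aout(struct process_image *p, const uint8_t *file, size_t len)
{
    struct aout_hdr h;
    assert(p != NULL && file != NULL);
    if (len < sizeof h)
        return -1;
    memcpy(&h, file, sizeof h);                 /* read the header */
    if (len - sizeof h < (uint64_t)h.a_text + h.a_data)
        return -1;

    p->text = malloc(h.a_text ? h.a_text : 1);  /* text segment */
    /* data segment sized for data plus BSS; calloc zeroes the BSS */
    p->data = calloc((size_t)h.a_data + h.a_bss + 1, 1);
    if (p->text == NULL || p->data == NULL) {
        free(p->text);
        free(p->data);
        return -1;
    }
    memcpy(p->text, file + sizeof h, h.a_text);
    memcpy(p->data, file + sizeof h + h.a_text, h.a_data);

    p->text_size = h.a_text;
    p->data_size = h.a_data;
    p->bss_size = h.a_bss;
    p->entry = h.a_entry;                       /* where execution would begin */
    return 0;
}
```
.EC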
.P
This scheme (known as NMAGIC, where the N means new, as of about 1975)
works quite well, and PDP-11 and early VAX Unix systems
used it for years for all object files, and linkable files used it
throughout the life of the a.out format into the 1990s.
When Unix systems gained virtual memory,
several improvements to this simple scheme sped up program loading and
saved considerable real memory.
.P
On a paging system, the simple scheme above allocates fresh virtual
memory for each text segment and data segment.
Since the a.out file is already stored on the disk, the object file
itself can be mapped into the process' address space.
This saves disk space, since new disk space for virtual memory need
only be allocated for pages that the program writes into, and can speed
program startup, since the virtual memory system need only load in
from disk the pages that the program's actually using, not the whole
file.
.P
A few changes to the a.out format make this possible, Figure \n+F,
and create what's known as ZMAGIC format.
These changes align the segments in the object file on page boundaries.
On systems with 4K pages, the a.out header is expanded to 4K, and the text
segment's size is rounded up to the next 4K boundary.
There's no need to round up the size of the data segment, since the BSS segment
logically follows the data segment, and is zeroed by the program loader
anyway.
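.P
The page rounding arithmetic is simple enough to show directly.
The sketch below assumes the 4K page size used in the text; the function names
are invented for the example.
.BC
```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u   /* assumed page size, as in the text */

/* Round n up to the next multiple of PAGE_SIZE (a power of two). */
uint32_t page_round(uint32_t n)
{
    assert(n <= UINT32_MAX - (PAGE_SIZE - 1));  /* avoid wraparound */
    return (n + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/* File offset of the data segment in a page-aligned file: a header
   padded to a full page, followed by text rounded up to a page. */
uint32_t zmagic_data_offset(uint32_t text_size)
{
    return PAGE_SIZE + page_round(text_size);
}

/* Bytes of padding in the file between the end of the text and the data. */
uint32_t zmagic_text_gap(uint32_t text_size)
{
    return page_round(text_size) - text_size;
}
```
.EC
.P
For a 5000 byte text segment, the data starts at file offset 12288 and
3192 bytes of the second text page are padding.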
.
.FG \nF "Mapping an a.out into a process"
Picture of file and segments, with page frames mapping into segments
.EF
.
.P
ZMAGIC files reduce unneeded paging, but at the cost of wasting a lot of
disk space.
The a.out header is only 32 bytes long, yet an entire 4K of disk space
is allocated.
The gap between the text and data also wastes 2K, half a 4K page, on average.
Both of these are fixed in the compact pagable format known as QMAGIC.
.P
Compact pagable files consider the a.out header to be part of the text
segment, since there's no particular reason that the code in the text
segment has to start at location zero.
Indeed, location zero is a particularly bad place to load a program since uninitialized
pointer variables often contain zero.
The code actually starts immediately after the header, and the whole
page is mapped into the second page of the process, leaving the first page unmapped so
that pointer references to location zero will fail, Figure \n+F.
This has the harmless side-effect of mapping the header into the process as
well.
.
.FG \nF "Mapping a compact a.out into a process"
Picture of file and segments, with page frames mapping into segments
.EF
.P
The text and data segments in a QMAGIC executable are each rounded up
to a full page, so the system can easily map file pages to address
space pages.
The last page of the data segment is padded out with zeros for BSS
data; if there is more BSS data than fits in the padding area, the
a.out header contains the size of the remaining BSS area to allocate.
.P
Although BSD Unix loads programs at location zero (or 0x1000 for QMAGIC), other versions
of Unix load programs at other addresses.
For example, System V for the Motorola 68K series loads at 0x80000000, and for the 386
loads at 0x8048000.
It doesn't matter where the load address is so long as it's page aligned, and the linker
and operating system can permanently agree what it is.
.H1 "Relocation: MS-DOS EXE files"
.
The a.out format is quite adequate for systems that assign a fresh address space
to each process so that every program can be loaded at the same logical address.
Many systems are not so fortunate.  
Some load all the programs into the same address space.
Others give each program its own address space, but don't always load the program
at the same address.
(32 bit versions of Windows fall into this last category.)
.P
In these cases, executable files contain
.I "relocation entries" ,
often called
.I fixups ,
that identify the places in the program where addresses need to be modified when
the program is loaded.
One of the simplest formats with fixups is the MS-DOS EXE format.
.P
As we saw with the .COM format above, DOS loads a program into a contiguous
chunk of available real-mode memory.
If the program doesn't fit in one 64K segment, the program has to use explicit
segment numbers to address program and data,
and at load time the segment numbers in the
program have to be fixed up to match the address where the program is actually
loaded.
The segment numbers in the file are stored as though the program will be loaded at location zero,
so the fixup action is to add to every stored segment number the base paragraph
number at which the program is actually loaded.
That is, if the program is loaded at location 0x5000, which is paragraph 0x500, a reference
to segment 0x12 is relocated to be a reference to segment 0x512.
The offsets within the segments don't change, since the program is relocated as a
unit, so the loader needn't adjust anything other than the segment numbers.
.P
Each .EXE file starts with a header shown in Figure \n+F.
Following the header is some extra information of variable length (used for overlay
loaders,
self-extracting archives, and other application-specific hackery)
and a list of the fixup addresses in 32 bit segment:offset format.
The fixup addresses are relative to the base of the program, so the fixups themselves
have to be relocated to find the addresses in the program to change.
After the fixups comes the program code.
There may be more information, ignored by the program loader, after the code.
(In the example below, far pointers are 32 bits with a 16 bit segment number and
16 bit offset.)
.FG \nF "Format of .EXE file header"
.BC
char signature[2] = "MZ";	// magic number
short lastsize;	// # bytes used in last block
short nblocks;	// number of 512 byte blocks
short nreloc;	// number of relocation entries
short hdrsize;	// size of file header in 16 byte paragraphs
short minalloc;	// minimum extra memory to allocate
short maxalloc;	// maximum extra memory to allocate
void far *sp;	// initial stack pointer
short checksum;	// ones complement of file sum
void far *ip;	// initial instruction pointer
short relocpos;	// location of relocation fixup table
short noverlay;	// Overlay number, 0 for program
char extra[];	// extra material for overlays, etc.
void far *relocs[]; // relocation entries, starts at relocpos
.EC
.EF
.P
Loading an .EXE file is only slightly more complicated than loading a .COM file.
.BL
.LI
Read in the header, check the magic number for validity.
.LI
Find a suitable area of memory.
The
.T minalloc
and
.T maxalloc
fields say the minimum and maximum number of extra paragraphs of memory to allocate
beyond the end of the loaded program.
(Linkers invariably default the minimum to the size of the program's BSS-like
uninitialized data, and the maximum to 0xFFFF.)
.LI
Create a PSP, the control area at the head of the program.
.LI
Read in the program code immediately after the PSP.
The
.T nblocks
and
.T lastsize
fields define the length of the code.
.LI
Start reading
.T nreloc
fixups at
.T relocpos .
For each fixup, add the base address of the program code to the segment number
in the fixup,
then use the relocated fixup as a pointer to a program address to which to add the
base address of the program code.
.LI
Set the stack pointer to
.T sp ,
relocated,
and jump to
.T ip ,
relocated, to start the program.
.EL
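.P
Two of the steps above involve a little arithmetic, which the following C sketch
makes concrete.
It is an illustration, not the DOS loader: the code is assumed to be already in a
byte array, so indexing that array takes the place of relocating the fixup address
itself, and only the stored segment number needs the base paragraph added.
We also assume that a
.T lastsize
of zero means the last 512 byte block is full.
The names
.T fixup ,
.T exe_code_size ,
and
.T exe_relocate
are hypothetical.
.BC
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One relocation entry: a far pointer relative to the program base. */
struct fixup { uint16_t offset, segment; };

/* Bytes of program code: file length implied by nblocks and lastsize,
   minus the header, whose size is given in 16-byte paragraphs. */
uint32_t exe_code_size(uint16_t nblocks, uint16_t lastsize, uint16_t hdrsize)
{
    uint32_t file_size = (uint32_t)nblocks * 512;
    assert(nblocks > 0 && lastsize < 512);
    if (lastsize != 0)
        file_size -= 512u - lastsize;   /* last block only partly used */
    return file_size - (uint32_t)hdrsize * 16;
}

/* Add base_para to each 16-bit segment number named by a fixup.
   Returns 0 on success, -1 if a fixup points outside the code. */
int exe_relocate(uint8_t *code, uint32_t code_size,
                 const struct fixup *fx, size_t nreloc, uint16_t base_para)
{
    for (size_t i = 0; i < nreloc; i++) {
        /* segment:offset to linear offset from the program base */
        uint32_t where = (uint32_t)fx[i].segment * 16 + fx[i].offset;
        if (where + 2 > code_size)
            return -1;
        /* stored segment numbers are little-endian on the x86 */
        uint16_t seg = (uint16_t)(code[where] | (code[where + 1] << 8));
        seg = (uint16_t)(seg + base_para);
        code[where] = (uint8_t)(seg & 0xFF);
        code[where + 1] = (uint8_t)(seg >> 8);
    }
    return 0;
}
```
.EC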
.P
Other than the peculiarities associated with segmented addressing, this is a pretty
typical setup for program loading.
In a few cases, different pieces of the program are
relocated differently.
In 286 protected mode, which EXE files do not support,
each segment of code or data in the executable file is loaded into
a separate segment in the system, but the segment numbers cannot for architectural reasons
be consecutive.
Each protected mode executable has a table near the beginning listing all of the segments
that the program will require.
The system makes a table of actual segment numbers
corresponding to each segment in the executable.
When processing fixups, the system looks up the logical segment number in that table
and replaces it with the actual segment number, a process more akin to symbol binding
than to relocation.
.P
Some systems permit symbol resolution at load time as well, but we save that topic
for Chapter 10. 
.
.H1 "Symbols and relocation"
.
The object formats we've considered so far are all loadable, that is, they can be
loaded into memory and run directly.
Most object files aren't loadable, but rather are intermediate files passed from a
compiler or assembler to a linker or library manager.
These linkable files can be considerably more complex than runnable ones.
Runnable files have to be simple enough to run on the ``bare metal'' of the computer,
while linkable files are processed by a layer of software which can do very sophisticated
processing.
In principle, a linking loader could do all of the functions of a linker as a program
was loaded, but for efficiency reasons the loader is generally as simple as possible
to speed program startup.
(Dynamic linking, which we cover in chapter 10, moves a lot of the function
of the linker into the loader, with attendant performance loss, but modern computers
are fast enough that the gains from dynamic linking outweigh the performance penalty.)
.P
We look at five formats of increasing complexity: relocatable a.out used on BSD UNIX
systems, ELF used on System V, IBM 360 objects, the extended COFF linkable and PE executable formats
used on 32 bit Windows, and the OMF linkable
format used on pre-COFF Windows systems.
.
.H1 "Relocatable a.out"
.
Unix systems have always used a single object format for both runnable and linkable files,
with the runnable files leaving out the sections of use only to the linker.
The a.out format we saw in Figure 2 includes several fields
used by the linker.
The sizes of the relocation tables for the text and data segments are in
.T a_trsize
and
.T a_drsize ,
and the size of the symbol table is in
.T a_syms .
The three sections follow the text and data, Figure \n+F.
.FG \nF "Simplified a.out"
a.out header

text section

data section

text relocation

data relocation

symbol table

string table
.EF
.H2 "Relocation entries"
Relocation entries serve two functions.
When a section of code is relocated to a different base address, relocation entries mark
the places in the code that have to be modified.
In a linkable file, there are also relocation entries that mark references to
undefined symbols, so the linker knows where to patch in the symbol's value when the
symbol is finally defined.
.P
Figure \n+F shows the format of a relocation entry.
Each entry contains the address within the text or data section to be relocated,
along with information that defines what to do.
The address is the offset from the beginning of the text or data segment of a
relocatable item.
The length field says how long the item is; values 0 through 3 mean 1, 2, 4, or
(on some architectures) 8 bytes.
The pcrel flag means that this is a ``PC relative'' item, that is, it's used in an
instruction as a relative address.
.FG \nF "Relocation entry format"
Draw this with boxes

-- four byte address

-- three byte index, one bit pcrel flag, 2 bit length field, one bit extern flag, four spare bits
.EF
.P
The extern flag controls the interpretation of the index field to determine 
which segment or symbol the relocation refers to.
If the extern flag is off, this is a plain relocation item, and the index tells which
segment (text, data, or BSS) the item is addressing.
If the extern flag is on, this is a reference to an external symbol, and the index
is the symbol number in the file's symbol table.
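.P
Unpacking a relocation entry is a matter of shifting and masking.
The sketch below assumes the second word holds the index in its low 24 bits,
followed by the pcrel, length, and extern fields in that order; the actual bit
positions depend on the architecture and compiler, so treat the layout as an
assumption.
The names
.T reloc
and
.T reloc_decode
are made up for the example.
.BC
```c
#include <assert.h>
#include <stdint.h>

/* A decoded relocation entry (hypothetical in-memory form). */
struct reloc {
    uint32_t address;    /* offset within the text or data segment */
    uint32_t index;      /* segment number or symbol number (24 bits) */
    unsigned pcrel;      /* 1 = PC-relative item */
    unsigned size;       /* item size in bytes: 1, 2, 4, or 8 */
    unsigned is_extern;  /* 1 = index is a symbol table index */
};

/* Unpack the two 32-bit words of an entry, under the assumed layout. */
struct reloc reloc_decode(uint32_t address, uint32_t info)
{
    struct reloc r;
    r.address = address;
    r.index = info & 0xFFFFFFu;           /* low three bytes */
    r.pcrel = (info >> 24) & 1u;
    r.size = 1u << ((info >> 25) & 3u);   /* length code 0..3 to bytes */
    r.is_extern = (info >> 27) & 1u;      /* top four bits are spare */
    assert(r.size == 1 || r.size == 2 || r.size == 4 || r.size == 8);
    return r;
}
```
.EC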
.P
This relocation format is adequate for most machine architectures, but some of the more
complex ones need extra flag bits to indicate, e.g., three-byte 370 address constants or
high and low half constants on the SPARC.
.H2 "Symbols and strings"
The final section of an a.out file is the symbol table.
Each entry is 12 bytes and describes a single symbol, Figure \n+F.
.FG \nF "Symbol format"
Draw this with boxes, too:

- four byte name offset

- one byte type

- one spare byte

- two byte debugger info

- four byte value
.EF
.P
Unix compilers permit arbitrarily long identifiers, so the name strings are all in a
string table that follows the symbol table.
The first item in a symbol table entry is the offset in the string table of the
null-terminated name of the symbol.
In the type byte, if the low bit is set the symbol is external (a misnomer; it
would better be called global, that is, visible to other
modules).
Non-external symbols are not needed for linking but can be used by debuggers.
The rest of the bits are the symbol type.
The most important types include:
.BL
.LI
.I text ,
.I data ,
or
.I bss :
A symbol defined in this module.
External bit may or may not be on.
Value is the relocatable address in the module corresponding to the symbol.
.LI
.I abs :
An absolute non-relocatable symbol.
(Rare outside of debugger info.)
External bit may or may not be on.
Value is the absolute value of the symbol.
.LI
.I undefined :
A symbol not defined in this module.
External bit must be on.
Value is usually zero, but see the ``common block hack'' below.
.EL
These symbol types are adequate for older languages such as C and Fortran and,
just barely, for C++.
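.P
A symbol table entry and its name lookup might look like this in C.
The numeric type codes below follow the BSD a.out convention as far as we know,
but check the a.out.h header on the system at hand before relying on them; the
helper function names are invented for the example.
.BC
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The 12-byte symbol entry from the figure. */
struct aout_sym {
    uint32_t name_off;  /* offset of the name in the string table */
    uint8_t  type;      /* low bit: external; other bits: symbol type */
    uint8_t  spare;
    uint16_t debug;     /* debugger info */
    uint32_t value;
};

/* Type codes (assumed to match BSD a.out.h). */
enum {
    N_UNDF = 0x00, N_EXT = 0x01, N_ABS = 0x02,
    N_TEXT = 0x04, N_DATA = 0x06, N_BSS = 0x08,
    N_TYPE = 0x1e       /* mask for the type bits */
};

/* Return the null-terminated name, or NULL if the offset is out of range. */
const char *sym_name(const struct aout_sym *s,
                     const char *strtab, size_t strtab_len)
{
    assert(s != NULL && strtab != NULL);
    return s->name_off < strtab_len ? strtab + s->name_off : NULL;
}

/* Nonzero if the symbol is visible to other modules. */
int sym_is_external(const struct aout_sym *s)
{
    return (s->type & N_EXT) != 0;
}

/* Nonzero if the symbol is not defined in this module. */
int sym_is_undefined(const struct aout_sym *s)
{
    return (s->type & N_TYPE) == N_UNDF;
}
```
.EC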
.P
As a special case, a compiler can use an undefined symbol to request that the linker
reserve a block of storage by that symbol's name.
If an undefined external symbol has a non-zero value, that value is
a hint to the linker how large a block of storage the program expects the symbol to
address.
At link time, if there is no definition of the symbol, the linker creates a block of
storage by that name in the BSS segment with the size being the largest hint value found
in any of the linked modules. 
If the symbol is defined in any module, the linker uses the definition and ignores the
size hints. 
This ``common block hack'' supports typical (albeit not standard-conforming)
usage of Fortran common blocks and uninitialized C external data.
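.P
The rule the linker applies can be sketched in a few lines of C.
The record type
.T sym_ref
and the function
.T common_size
are hypothetical; a real linker keeps this information in its global symbol
table rather than in a flat array.
.BC
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One module's view of a given external symbol (hypothetical record). */
struct sym_ref {
    int      defined;   /* nonzero if this module defines the symbol */
    uint32_t value;     /* size hint if undefined, address if defined */
};

/* Bytes of BSS the linker should reserve for the symbol: zero if any
   module defines it (the definition wins and the hints are ignored) or
   if no module gave a hint, otherwise the largest hint seen. */
uint32_t common_size(const struct sym_ref *refs, size_t n)
{
    uint32_t largest = 0;
    assert(refs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (refs[i].defined)
            return 0;
        if (refs[i].value > largest)
            largest = refs[i].value;
    }
    return largest;
}
```
.EC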
.
.H2 "a.out summary"
The a.out format is a simple and effective one for relatively simple
systems with paging.
It has fallen out of favor because it doesn't easily support dynamic linking.
Also, a.out doesn't support C++ very well, since C++ requires special treatment of
initializer and finalizer code.
.H1 "Unix ELF"
.
The traditional a.out format served the Unix community for over a decade, but with the
advent of Unix System V, AT&T decided that it needed something better to support
cross-compilation, dynamic linking and other modern system features.
Early versions of System V used COFF, Common Object File Format, which was originally
intended for cross-compiled embedded systems and didn't work all that well for a
time-sharing system, since it couldn't support C++ or dynamic linking without extensions. 
In later versions of System V, COFF was superseded by ELF, Executable and Linking Format.
ELF has been adopted by the popular freeware Linux and BSD variants of Unix as
well.
ELF has an associated debugging format called DWARF which we visit in Chapter 5.
In this discussion we treat the 32 bit version of ELF.
There are 64 bit variants that extend sizes and addresses to 64 bits in a straightforward way.
.P
ELF files come in three slightly different flavors: relocatable, executable, and
shared object.
Relocatable files are created by compilers and assemblers but need to be processed by
the linker before running.
Executable files have all relocation done and all symbols resolved except perhaps
shared library symbols to be resolved at runtime.
Shared objects are shared libraries, containing both symbol information for the linker
and directly runnable code for runtime.
.P
ELF files have an unusual dual nature, Figure \n+F.
Compilers, assemblers, and linkers treat the file as a set of logical sections described by
a section header table, while the system loader treats the file as a set of segments
described by a program header table.
A single segment will usually consist of several sections.
For example, a ``loadable read-only'' segment could contain sections for executable code,
read-only data, and symbols for the dynamic linker.
Relocatable files have section tables, executable files have program header tables, and
shared objects have both.
The sections are intended for further processing by a linker, while the segments are
intended to be mapped into memory.
.FG \nF "Two views of an ELF file"
linking view and execution view, adapted from fig 1-1 in Intel TIS document
.EF
.P
ELF files all start with the ELF header, Figure \n+F.
The header is designed to be decodable even on machines with a different byte order
from the file's target architecture.
The first four bytes are the magic number identifying an ELF file, followed by three
bytes describing the format of the rest of the header.
Once a program has read the
.T class
and
.T byteorder
flags, it knows the byte order and word size of the file and can do the necessary
byte swapping and size conversions.
Other fields provide the size and location of the section header and program header,
if present.
.FG \nF "ELF header"
.BC
char magic[4] = "\e177ELF";	// magic number
char class;	// address size, 1 = 32 bit, 2 = 64 bit
char byteorder;	// 1 = little-endian, 2 = big-endian
char hversion;	// header version, always 1
char pad[9];

short filetype;	// file type: 1 = relocatable, 2 = executable,
		// 3 = shared object, 4 = core image
short archtype;	// 2 = SPARC, 3 = x86, 4 = 68K, etc.
int fversion;	// file version, always 1
int entry;	// entry point if executable
int phdrpos;	// file position of program header or 0
int shdrpos;	// file position of section header or 0
int flags;	// architecture specific flags, usually 0
short hdrsize;	// size of this ELF header
short phdrent;	// size of an entry in program header
short phdrcnt;	// number of entries in program header or 0
short shdrent;	// size of an entry in section header
short shdrcnt;	// number of entries in section header or 0
short strsec;	// section number that contains section name strings
.EC
.EF
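.P
The byte order independence can be seen in a short C sketch that reads the
file type from a header held in a byte buffer, regardless of the host's own
byte order.
The function names are invented for the example, and only the first 18 bytes of
the header are examined.
.BC
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 16-bit field in the file's byte order (1 = little, 2 = big). */
uint16_t elf_get16(const uint8_t *p, int byteorder)
{
    assert(byteorder == 1 || byteorder == 2);
    return byteorder == 1 ? (uint16_t)(p[0] | (p[1] << 8))
                          : (uint16_t)((p[0] << 8) | p[1]);
}

/* Return the filetype field (1 = relocatable, 2 = executable, ...),
   or -1 if the buffer does not start with a plausible ELF header. */
int elf_filetype(const uint8_t *buf, size_t len)
{
    static const uint8_t magic[4] = { 0x7F, 'E', 'L', 'F' };
    int elf_class, byteorder;

    if (len < 18 || memcmp(buf, magic, 4) != 0)
        return -1;
    elf_class = buf[4];                /* single bytes need no swapping */
    byteorder = buf[5];
    if ((elf_class != 1 && elf_class != 2) ||
        (byteorder != 1 && byteorder != 2))
        return -1;
    return elf_get16(buf + 16, byteorder);  /* filetype follows the 16 ident bytes */
}
```
.EC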
.H2 "Relocatable files"
A relocatable or shared object file is considered to be a collection of sections, defined in section headers, Figure \n+F.
Each section contains a single type of information, such as program code,
read-only or read-write data, relocation entries, or symbols.
Every symbol defined in the module is defined relative to a section,
so a procedure's entry point would be relative to the program code section that
contains that procedure's code.
There are also two pseudo-sections
.T SHN_ABS
(number 0xfff1) which logically contains absolute non-relocatable symbols, and
.T SHN_COMMON
(number 0xfff2)
that contains uninitialized data blocks, the descendant of the a.out common block hack.
Section zero is always a null section, with an all-zero section table entry.
.FG \nF "Section header"
.BC
int sh_name;	// name, index into the string table
int sh_type;	// section type
int sh_flags;	// flag bits, below
int sh_addr;	// base memory address, if loadable, or zero
int sh_offset;	// file position of beginning of section
int sh_size;	// size in bytes
int sh_link;	// section number with related info or zero
int sh_info;	// more section-specific info
int sh_align;	// alignment granularity if section is moved
int sh_entsize;	// size of entries if section is an array
.EC
.EF
.P
Section types include:
.BL
.LI
.T PROGBITS :
Program contents including code, data, and debugger info.
.LI
.T NOBITS :
Like
.T PROGBITS
but no space is allocated in the file itself.
Used for BSS data allocated at program load time.
.LI
.T SYMTAB
and
.T DYNSYM :
Symbol tables, described in more detail later.
The
.T SYMTAB
table contains all symbols and is intended for the regular linker, while
.T DYNSYM
is just the symbols for dynamic linking.
(The latter table has to be loaded into memory at runtime, so it's kept as small as possible.)
.LI
.T STRTAB :
A string table, analogous to the one in a.out files.
Unlike a.out files, ELF files can and often do contain separate string tables for
separate purposes, e.g. section names, regular symbol names, and dynamic linker symbol
names.
.LI
.T REL
and
.T RELA :
Relocation information.
.T REL
entries add the relocation value to the base value stored in the code or data, while
.T RELA
entries include the base value for relocation in the relocation entries themselves.
(For historical reasons, x86 objects use
.T REL
relocation and 68K objects use
.T RELA .)
There are a bunch of relocation types for each architecture, similar to (and derived
from) the a.out relocation types.
.LI
.T DYNAMIC
and
.T HASH :
Dynamic linking information and the runtime symbol hash table.
.EL
There are three flag bits used:
.T ALLOC ,
which means that the section occupies memory when the program is loaded,
.T WRITE
which means that the section when loaded is writable, and
.T EXECINSTR
which means that the section contains executable machine code.
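.P
A linker or loader can decide how to treat a section from its type and flag bits alone.
The numeric values below are the ones given in the ELF specification; the helper
functions are a sketch with made-up names.
.BC
```c
#include <assert.h>
#include <stdint.h>

/* Section types and flag bits (values from the ELF specification). */
enum { SHT_PROGBITS = 1, SHT_NOBITS = 8 };
enum { SHF_WRITE = 0x1, SHF_ALLOC = 0x2, SHF_EXECINSTR = 0x4 };

/* Bytes the section occupies in the file: NOBITS sections take none. */
uint32_t sec_file_size(uint32_t sh_type, uint32_t sh_size)
{
    return sh_type == SHT_NOBITS ? 0 : sh_size;
}

/* Bytes the section occupies in memory when the program is loaded. */
uint32_t sec_mem_size(uint32_t sh_flags, uint32_t sh_size)
{
    return (sh_flags & SHF_ALLOC) ? sh_size : 0;
}

/* Coarse classification of a section by its flag bits. */
const char *sec_kind(uint32_t sh_flags)
{
    if (!(sh_flags & SHF_ALLOC))
        return "not loaded";
    if (sh_flags & SHF_EXECINSTR)
        return "code";
    if (sh_flags & SHF_WRITE)
        return "read-write data";
    return "read-only data";
}
```
.EC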
.P
A typical relocatable file has about a dozen sections.
Many of the section names are meaningful to the linker, which looks
for the section types it knows about for specific processing, while
either discarding or passing through unmodified sections (depending on
flag bits) that it doesn't know about.
.P
Sections include:
.BL
.LI
.T .text
which is type PROGBITS with attributes ALLOC+EXECINSTR.
It's the equivalent of the a.out text segment.
.LI
.T .data
which is type PROGBITS with attributes ALLOC+WRITE.
It's the equivalent of the a.out data segment.
.LI
.T .rodata
which is type
.T PROGBITS
with attribute
ALLOC.
It's read-only data, hence no WRITE.
.LI
.T .bss
which is type NOBITS with attributes ALLOC+WRITE.
The BSS section takes no space in the file, hence NOBITS, but is allocated at runtime,
hence ALLOC.
.LI
.T .rel.text ,
.T .rel.data ,
and
.T .rel.rodata ,
each of which is type REL or RELA.
The relocation information for the corresponding text or data section.
.LI
.T .init
and
.T .fini ,
each type PROGBITS with attributes ALLOC+EXECINSTR.
These are similar to
.T .text ,
but are code to be executed when the program starts up or terminates, respectively.
C and Fortran don't need these, but they're essential for C++ which has global data
with executable initializers and finalizers.
.LI
.T .symtab ,
and
.T .dynsym
types SYMTAB and DYNSYM respectively, regular and dynamic linker symbol tables.
The dynamic linker symbol table has ALLOC set, since it's loaded at runtime.
.LI
.T .strtab ,
and
.T .dynstr
both type STRTAB, a table of name strings, for a symbol table or the section
names for the section table.
The
.T .dynstr
section, the strings for the dynamic linker symbol table,
has ALLOC set since it's loaded at runtime.
.EL
There are also some specialized sections like
.T .got
and
.T .plt ,
the Global Offset Table and Procedure Linkage Table used for dynamic linking (covered
in Chapter 10),
.T .debug
which contains symbols for the debugger,
.T .line
which contains mappings from source line numbers to object code locations again for
the debugger, and
.T .comment
which contains documentation strings, usually version control version numbers.
.P
An unusual section is
.T .interp
which contains the name of a program to use as an interpreter.
If this section is present, rather than running the program directly, the system runs
the interpreter and passes it the ELF file as an argument.
Unix has for many years had self-running interpreted text files, using
.BC
#! /path/to/interpreter
.EC
as the first line of the file.
ELF extends this facility to interpreters which run non-text programs.
In practice this is used to call the run-time dynamic linker to load
the program and link in any required shared libraries.
.P
The ELF symbol table is similar to the a.out symbol table.
It consists of an array of entries, Figure \n+F.
.FG \nF "ELF symbol table"
.BC
int name;	// position of name string in string table
int value;	// symbol value, section relative in reloc,
		// absolute in executable
int size;	// object or function size
char type:4;	// data object, function, section, or special case file
char bind:4;	// local, global, or weak
char other;	// spare
short sect;	// section number, ABS, COMMON or UNDEF
.EC
.EF
The a.out symbol entry is fleshed out with a few more fields.
The size field tells how large a data object is (particularly for
undefined BSS, the common block hack again.)
A symbol's binding can be local, just visible in this module, global,
visible everywhere, or weak.
A weak symbol is a half-hearted global symbol: if a definition is
available for an undefined weak symbol, the linker will use it, but if
not the value defaults to zero.
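.P
The following sketch shows how the binding and type fields might be unpacked
and how the weak symbol rule plays out.
In the file the two four-bit fields share a single byte, with the binding in
the high half; the numeric codes are those of the ELF specification.
The function resolve_undefined is hypothetical and greatly simplified, since a
real linker does this as part of a much larger symbol resolution pass.
.BC
// Binding and type codes from the ELF specification.
enum { STB_LOCAL = 0, STB_GLOBAL = 1, STB_WEAK = 2 };
enum { STT_NOTYPE = 0, STT_OBJECT = 1, STT_FUNC = 2,
       STT_SECTION = 3, STT_FILE = 4 };

// The two 4-bit fields share one byte: binding in the high half,
// type in the low half.
#define ST_BIND(info) ((unsigned char)(info) >> 4)
#define ST_TYPE(info) ((info) & 0xf)
#define ST_INFO(bind, type) (((bind) << 4) | ((type) & 0xf))

// Hypothetical helper: pick the value for an undefined symbol.
// Returns 0 on success, -1 for an unresolved non-weak symbol.
int resolve_undefined(int bind, int have_def, unsigned long def_value,
                      unsigned long *out)
{
    if (have_def) { *out = def_value; return 0; }
    if (bind == STB_WEAK) { *out = 0; return 0; }  // weak defaults to zero
    return -1;                                     // undefined global
}
.EC
An info byte of 0x12, for example, unpacks to a global binding and a function
type.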
.P
The symbol's type is normally data or function.
There is a section symbol defined for each section, usually with the
same name as the section itself, for the benefit of relocation
entries.
(ELF relocation entries are all relative to symbols, so a section
symbol is necessary to indicate that an item is relocated relative to
one of the sections in the file.)
A file entry is a pseudo-symbol containing the name of the source
file.
.P
The section number is the section relative to which the symbol is
defined, e.g., function entry points are defined relative to
.T .text .
Three special pseudo-sections also appear, UNDEF for undefined
symbols, ABS for non-relocatable absolute symbols, and COMMON for
common blocks not yet allocated.
(The value of a COMMON symbol gives the required alignment
granularity, and the size gives the minimum size.
Once allocated by the linker, COMMON symbols move into the
.T .bss
section.)
.P
A typical complete ELF file, Figure \n+F, contains quite a few sections for
code, data, relocation information, linker symbols, and debugger
symbols.
If the file is a C++ program, it will probably also contain
.T .init ,
.T .fini ,
.T .rel.init ,
and
.T .rel.fini
sections as well.
.FG \nF "Sample relocatable ELF file"
ELF header
.br
\&.text
.br
\&.data
.br
\&.rodata
.br
\&.bss
.br
\&.symtab
.br
\&.rel.text
.br
\&.rel.data
.br
\&.rel.rodata
.br
\&.line
.br
\&.debug
.br
\&.strtab
.br
(section table, not considered to be a section)
.EF
.
.H2 "ELF executable files"
An ELF executable file has the same general format as a relocatable
ELF, but the data are arranged so that the file can be mapped into
memory and run.
The file contains a program header that follows the ELF header in the file.
The program header defines the segments to be mapped.
The program header, Figure \n+F, is an array of segment descriptions.
.FG \nF "ELF program header"
.BC
int type;	// loadable code or data, dynamic linking info, etc.
int offset;	// file offset of segment
int virtaddr;	// virtual address to map segment
int physaddr;	// physical address, not used
int filesize;	// size of segment in file
int memsize;	// size of segment in memory (bigger if contains BSS)
int flags;	// Read, Write, Execute bits
int align;	// required alignment, invariably hardware page size
.EC
.EF
An executable usually has only a handful of segments, a read-only one
for the code and read-only data, and a read-write one for read/write
data.
All of the loadable sections are packed into the appropriate segments
so the system can map the file with one or two operations.
.P
ELF files extend the ``header in the address space'' trick used in
QMAGIC a.out files to make the executable files as compact as possible
at the cost of some slop in the address space.
A segment can start and end at arbitrary file offsets, but the virtual
starting address for the segment must have the same low bits modulo the
alignment as the starting offset in the file, i.e., must start at the same offset within a page.
The system maps in the entire range from the page where the segment
starts to the page where the segment ends, even if the segment
logically only occupies part of the first and last pages mapped.
Figure \n+F shows a typical segment arrangement.
.FG \nF "ELF loadable segments"
.TS
|l|n|n|l|.
_
	File offset	Load address	Type
_
ELF header	0	0x8000000
_
Program header	0x40	0x8000040
_
Read only text	0x100	0x8000100	LOAD, Read/Execute
(size 0x4500)	\^	\^	\^
_
Read/write data	0x4600	0x8005600	LOAD, Read/Write/Execute
(file size 0x2200,	\^	\^	\^
memory size 0x3500)	\^	\^	\^
_
.TE
non-loadable info and optional section headers
.EF
The mapped text segment consists of the ELF header, program header,
and read-only text, since the ELF and program headers are in the same
page as the beginning of the text.
The read/write data segment in the file starts immediately after the text segment,
so the two share a page of the file.
The page from the file is mapped both read-only as the last page of the text
segment in memory and copy-on-write as the first page of the data segment.
In this example, if a computer has 4K pages,
and in an executable file the text ends at 0x80045ff, then the data starts at 0x8005600.
The file page is mapped into the last page of the text segment at location 0x8004000
where the first 0x600 bytes contain the
text from 0x8004000-0x80045ff, and into the data segment at 0x8005000
where the rest of the page contains the initial contents of data from 0x8005600-0x8005fff.
.P
The BSS section again is logically continuous with the end of the read
write sections in the data segment, in this case 0x1300 bytes, the
difference between the file size and the memory size.
The last page of the data segment is mapped in from the file, but as soon as the
operating system starts to zero the BSS segment, the copy-on-write system makes a
private copy of the page.
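.P
The arithmetic in this example can be checked with a few lines of C.
This is only a sketch: the function names are made up for illustration, and it
assumes the alignment is a power of two, as a hardware page size always is.
.BC
// All values are byte addresses or sizes; align is a power of two.
typedef unsigned long addr_t;

// A segment is mappable only if its file offset and virtual address
// agree modulo the alignment.
int seg_congruent(addr_t offset, addr_t vaddr, addr_t align)
{
    return (offset & (align - 1)) == (vaddr & (align - 1));
}

// First address of the page containing a.
addr_t page_floor(addr_t a, addr_t align) { return a & ~(align - 1); }

// First page boundary at or above a.
addr_t page_ceil(addr_t a, addr_t align)
{
    return (a + align - 1) & ~(align - 1);
}

// Size of the BSS: the part of the segment that exists in memory
// but not in the file.
addr_t bss_size(addr_t filesize, addr_t memsize)
{
    return memsize - filesize;
}
.EC
With 4K pages, the data segment at file offset 0x4600 and address 0x8005600
passes the congruence test, its mapping starts at page_floor(0x8005600, 0x1000),
which is 0x8005000, and bss_size(0x2200, 0x3500) gives the 0x1300 bytes of BSS.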
.P
If the file contains
.T .init
or
.T .fini
sections, those sections are part of the read only text segment, and
the linker inserts code at the entry point to call the
.T .init
section code before it calls the main program, and the
.T .fini
section code after the main program returns.
.P
An ELF shared object contains all the baggage of a relocatable and an
executable file.
It has the program header table at the beginning, followed by the
sections in the loadable segments, including dynamic linking
information.
Following the sections that make up the loadable segments are the
relocatable symbol table and other information that the linker needs
while creating executable programs that refer to the shared object,
with the section table at the end.
.H2 "ELF summary"
ELF is a moderately complex format, but it serves its purposes well.
It's a flexible enough relocatable format to support C++, while being
an efficient executable format for a virtual memory system with
dynamic linking, and makes it easy to map executable pages directly into the
program address space.
It also permits cross-compilation and cross-linking from one platform
to another, with enough information in each ELF file to identify the
target architecture and byte order.
.
.H1 "IBM 360 object format"
.
The IBM 360 object format was designed in the early 1960s, but remains
in use today.
It was originally designed for 80 column punch cards, but has been adapted
for disk files on modern systems.
Each object file contains a set of control sections (csects), which are
optionally named separately relocatable chunks of code and/or data.
Typically each source routine is compiled into one csect, or perhaps one csect
for code and another for data.
A csect's name, if it has one,
can be used as a symbol that addresses the beginning of the
csect; other types of symbols include those defined within a csect,
undefined external symbols, common blocks, and a few others.
Each symbol defined or used in an object file is assigned a small integer
External Symbol ID (ESID).
An object file is a sequence of 80 byte records in a common format, Figure \n+F.
The first byte of each record is 0x02, a value that marks the record as part of an object
file.
(A record that starts with a blank is treated as a command by the linker.)
Bytes 2-4 are the record type, TXT for program code or "text", ESD for
an external symbol directory that defines symbols and ESIDs, RLD for
Relocation Directory, and END for the last record that also defines
the starting point.
The rest of the record up through byte 72 is specific to the record type.
Bytes 73-80 are ignored.
On actual punch cards they were usually a sequence number.
.P
An object file starts with some ESD records that define the csects and all symbols,
then the TXT records, the RLD records and the END.
There's quite a lot of flexibility in the order of the records.
Several TXT records can redefine the contents of a single location, with the last one in the
file winning.
This made it possible (and not uncommon) to punch a few ``patch'' cards to stick at the end of
an object deck, rather than reassembling or recompiling. 
.FG \nF "IBM object record format"
.BC
char flag = 0x2;
char rtype[3];	// three letter record type
char data[68];	// format specific data
char seq[8];	// ignored, usually sequence numbers
.EC
.EF
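.P
A reader of this format might classify records along the following lines.
The sketch assumes the three type letters are stored in EBCDIC, like the
symbol names; the enum names and the function classify are hypothetical.
.BC
// EBCDIC codes for the letters used in the record type names.
enum { E_D = 0xC4, E_E = 0xC5, E_L = 0xD3, E_N = 0xD5,
       E_R = 0xD9, E_S = 0xE2, E_T = 0xE3, E_X = 0xE7 };

enum rectype { REC_NONE, REC_ESD, REC_TXT, REC_RLD, REC_END };

// Classify one 80-byte record.  REC_NONE means the record does not
// start with 0x02 or has a type this sketch does not know.
enum rectype classify(const unsigned char rec[80])
{
    if (rec[0] != 0x02) return REC_NONE;
    unsigned char a = rec[1], b = rec[2], c = rec[3];
    if (a == E_E && b == E_S && c == E_D) return REC_ESD;
    if (a == E_T && b == E_X && c == E_T) return REC_TXT;
    if (a == E_R && b == E_L && c == E_D) return REC_RLD;
    if (a == E_E && b == E_N && c == E_D) return REC_END;
    return REC_NONE;
}
.EC
A record that begins with an EBCDIC blank (0x40) comes back as REC_NONE, which
a linker would then go on to treat as a command.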
.
.H2 "ESD records"
Each object file starts with ESD records, Figure \n+F,
that define the csects and
symbols used in the file and give them all ESIDs.
.
.FG \nF "ESD format"
.BC
char flag = 0x2;	// 1
char rtype[3] = "ESD";	// 2-4 three letter type
char pad1[6];
short nbytes;		// 11-12 number of bytes of info: 16, 32, or 48
char pad2[2];
short esid;		// 15-16 ESID of first symbol

{			// 17-72, up to 3 symbols
  char name[8];		// blank padded symbol name
  char type;		// symbol type
  char base[3];		// csect origin or label offset
  char bits;		// attribute bits
  char len[3];		// length of object or csect ESID
} 
.EC
.EF
Each ESD record defines up to three symbols with sequential ESIDs.
Symbols are up to eight EBCDIC characters.
The symbol types are:
.BL
.LI
SD and PC: Section Definition or Private Code, defines a csect.
The csect origin is the logical address of the beginning of the csect, usually
zero, and the length is the length of the csect.
The attribute byte contains flags saying whether the csect uses 24 or 31
bit program addressing, and whether it needs to be loaded into a 24 or 31 bit
address space.
PC is a csect with a blank name; names of csects must be unique within a
program but there can be multiple unnamed PC sections.
.LI
LD: label definition.
The base is the label's offset within its csect, the len field is the ESID
of the csect.
No attribute bits.
.LI
CM: common.
Len is the length of the common block, other fields are ignored.
.LI
ER and WX: external reference and weak external.
Symbols defined elsewhere.
The linker reports an error if an ER symbol isn't defined elsewhere in
the program, but an undefined WX is not an error.
.LI
PR: pseudoregister, a small area of storage defined at link time but
allocated at runtime.
Attribute bits give the required alignment, 1 to 8 bytes, and len is
the size of the area.
.EL
.H2 "TXT records"
Next come text records, Figure \n+F, that contain the program code and data.
Each text record defines up to 56 contiguous bytes within a single
csect.
.FG \nF "TXT format"
.BC
char flag = 0x2;	// 1
char rtype[3] = "TXT";	// 2-4 three letter type
char pad;
char loc[3];		// 6-8 csect relative origin of the text
char pad[2];
short nbytes;		// 11-12 number of bytes of info
char pad[2];
short esid;		// 15-16 ESID of this csect
char text[56];		// 17-72 data
.EC
.EF
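.P
To make the layout concrete, here is a sketch of how a very simple loader
might copy one TXT record into a memory image.
It assumes a single csect loaded at offset zero, so the csect-relative origin
is used directly as an offset into the image, and it ignores the ESID.
The function and helper names are hypothetical.
.BC
#include <stddef.h>
#include <string.h>

// Big-endian field decoders.  Offsets into rec are zero-based, so
// byte 1 in the figure is rec[0].
static unsigned be16(const unsigned char *p) { return (p[0] << 8) | p[1]; }
static unsigned long be24(const unsigned char *p)
{
    return ((unsigned long)p[0] << 16) | (p[1] << 8) | p[2];
}

// Copy the text of one TXT record into image.
// Returns 0 on success, -1 if the record is malformed or out of range.
int apply_txt(const unsigned char rec[80], unsigned char *image, size_t size)
{
    unsigned long loc = be24(rec + 5);   // bytes 6-8
    unsigned n = be16(rec + 10);         // bytes 11-12
    if (n > 56 || loc > size || n > size - loc) return -1;
    memcpy(image + loc, rec + 16, n);    // bytes 17-72
    return 0;
}
.EC
Because each call simply overwrites whatever is already in the image, applying
the records in file order gives the last-one-wins behavior that made patch
cards work.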
.H2 "RLD records"
After the text come
RLD records, Figure \n+F, each of which contains
a sequence of relocation entries.
.FG \nF "RLD format"
.BC
char flag = 0x2;	// 1
char rtype[3] = "RLD";	// 2-4 three letter type
char pad[6];
short nbytes;		// 11-12 number of bytes of info
char pad[4];		// 13-16

{			// 17-72 four or eight-byte relocation entries
  short t_esid;		// target, ESID of referenced csect or symbol
			// or zero for CXD (total size of PR defs)
  short p_esid;		// pointer, ESID of csect with reference
  char flags;		// type and size of ref,
  char addr[3];		// csect-relative ref address
}
.EC
.EF
Each entry has the ESIDs of the target and the pointer, a flag byte, and
the csect-relative address of the pointer.
The flag byte has bits giving the type of reference (code, data, PR, or CXD),
the length (1, 2, 3, or 4 bytes), a sign bit saying whether to add or
subtract the relocation, and a "same" bit.
If the "same" bit is set, the next entry omits the two ESIDs and uses the
same ESIDs as this entry.
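.P
The mixture of eight-byte and four-byte entries can be unpacked with a loop
like the one sketched below.
The text above does not say which bit of the flag byte is the "same" bit; the
sketch assumes it is the low-order bit, and the mask RLD_SAME, the structure
rld_ref, and the function rld_decode are all hypothetical names.
.BC
#include <stddef.h>

// ASSUMPTION: the "same" bit is the low-order bit of the flag byte.
// Adjust the mask if the real layout differs.
#define RLD_SAME 0x01

struct rld_ref {
    unsigned t_esid, p_esid;   // target and pointer ESIDs
    unsigned char flags;
    unsigned long addr;        // csect-relative address of the pointer
};

// Decode up to max entries from the nbytes of RLD data at p.
// Returns the number decoded, or -1 if the data is truncated.
int rld_decode(const unsigned char *p, size_t nbytes,
               struct rld_ref *out, int max)
{
    int count = 0, same = 0;
    unsigned t = 0, pe = 0;
    size_t i = 0;
    while (i < nbytes && count < max) {
        if (!same) {                      // full eight-byte entry
            if (nbytes - i < 8) return -1;
            t  = (p[i] << 8) | p[i + 1];
            pe = (p[i + 2] << 8) | p[i + 3];
            i += 4;
        } else if (nbytes - i < 4) {      // short four-byte entry
            return -1;
        }
        out[count].t_esid = t;
        out[count].p_esid = pe;
        out[count].flags = p[i];
        out[count].addr = ((unsigned long)p[i + 1] << 16)
                        | (p[i + 2] << 8) | p[i + 3];
        same = p[i] & RLD_SAME;           // does the next entry reuse the ESIDs?
        i += 4;
        count++;
    }
    return count;
}
.EC
A run of references from one csect to one symbol thus costs eight bytes for
the first entry and four for each one after it.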
.H2 "END records"
The end record, Figure \n+F, gives the starting address for the program,
either an address within a csect or the ESID of an external symbol.
.FG \nF "END format"
.BC
char flag = 0x2;	// 1
char rtype[3] = "END";	// 2-4 three letter type
char pad;
char loc[3];		// 6-8 csect relative start address or zero
char pad[6];
short esid;		// 15-16 ESID of csect or symbol
.EC
.EF
.H2 Summary
Although the 80 column records are quite dated, the IBM object format
is still surprisingly simple and flexible.
Extremely small linkers and loaders can handle this format; on one model
of 360, I used an absolute loader that fit on a single 80 column punch card
and could load a program, interpreting TXT and END records, and
ignoring the rest.
.P
Disk based systems either store object files as card images, or use a
variant version of the format with the same record types but much longer
records without sequence numbers.
The linkers for DOS (IBM's lightweight operating system for the 360)
produce a simplified output format with in effect one csect and a
stripped down RLD without ESIDs.
.P
Within object files,
the individual named csects permit a programmer or linker to arrange the
modules in a program as desired, putting all the code csects together, for
example.
The main places this format shows its age are the eight-character maximum
symbol length and the lack of type information about individual csects.
.
.H1 "Microsoft Portable Executable format"
.
Microsoft's Windows NT has an extremely mixed heritage
including earlier versions of MS-DOS and Windows, Digital's VAX VMS
(on which many of the programmers had worked), and Unix System V (on
which many of the rest of the programmers had worked.)
NT's format is adapted from COFF, a file format that Unix versions used
after a.out but before ELF.
We'll take a look at PE and, where it differs from PE, Microsoft's
version of COFF.
.P
Windows developed in an underpowered
environment with slow processors, limited RAM, and originally without
hardware paging, so there was always an emphasis on shared libraries
to save memory, and ad-hoc tricks to improve performance, some of which
are apparent in the PE/COFF design.
Most Windows executables contain
.I resources ,
a general term that refers to objects such as cursors,
icons, bitmaps, menus, and fonts that are shared between the program
and the GUI.
A PE file can contain a resource directory for all of the resources the
program code in that file uses.
.P
PE executable files are intended for a paged environment, so pages from
a PE file can
usually be mapped directly into memory and run, much like an ELF
executable.
PE's can be either EXE programs or DLL shared libraries (known as
dynamic link libraries).
The format of the two is the same, with a status bit identifying a PE as one
or the other.
Each can contain a list of
exported functions and data that can be used by other PE files loaded into the same address space,
and a list of imported functions and data that need to be resolved from other PE's at load time.
Each file contains a set of chunks analogous to ELF segments that have
variously been called sections, segments, and objects.
We call them sections here, the term that Microsoft now uses.
.P
A PE file, Figure \n+F, starts with a small DOS .EXE file that prints out something
like "This program needs Microsoft Windows."
(Microsoft's dedication to certain kinds of backward compatibility is
impressive.)
A previously unused field at the end of the EXE header points to the
PE signature, which is followed by the file header, consisting of a COFF header and
the ``optional'' header (which despite
its name appears in all PE files), and then a list of section headers.
The section headers describe the various
sections of the file.
A COFF object file starts with the COFF header, and omits the optional header.
.FG \nF "Microsoft PE and COFF file"
DOS header (PE only)
.br
DOS program stub (PE only)
.br
PE signature (PE only)
.br
COFF header
.br
Optional header (PE only)
.br
Section table
.br
Mappable sections (pointed to from section table)
.br
COFF line numbers, symbols, debug info (optional in PE File)
.EF
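.P
A program that wants to find the COFF header in a PE file has to follow that
pointer from the DOS header.
The sketch below assumes two details that are not spelled out above but are
part of Microsoft's published format: the DOS header starts with the letters
MZ, and the pointer is the little-endian 32-bit field at offset 0x3C.
The function name is hypothetical.
.BC
#include <stddef.h>

// Return the file offset of the COFF header (just past the four-byte
// PE signature), or -1 if buf does not look like a PE file.
long find_coff_header(const unsigned char *buf, size_t len)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') return -1;
    // Little-endian pointer to the PE signature.
    unsigned long off = buf[0x3C] | (buf[0x3D] << 8) |
                        ((unsigned long)buf[0x3E] << 16) |
                        ((unsigned long)buf[0x3F] << 24);
    if (off > len || len - off < 4) return -1;
    // The signature is the letters P and E followed by two zero bytes.
    if (buf[off] != 'P' || buf[off + 1] != 'E' ||
        buf[off + 2] != 0 || buf[off + 3] != 0) return -1;
    return (long)(off + 4);
}
.EC
For a COFF object file no search is needed, since the COFF header is at
offset zero.
.P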
Figure \n+F shows the PE, COFF, and "optional" headers.
The COFF header describes the contents of the file, with the most
important value being the number of entries in the section table.
The "optional" header contains pointers to the most commonly
used file sections.
Addresses are all kept as offsets from the place in memory that the
program is loaded, also called Relative Virtual Addresses or RVAs.
.FG \nF "PE and COFF header"
PE signature
.BC
char signature[4] = "PE\e0\e0";	// magic number, also shows byte order
.EC
COFF header
.BC
unsigned short  Machine;	// required CPU, 0x14C for 80386, etc.
unsigned short  NumberOfSections;	// number of entries in section table
unsigned long   TimeDateStamp;	// creation time or zero
unsigned long   PointerToSymbolTable;	// file offset of symbol table in COFF or zero
unsigned long   NumberOfSymbols;	// # entries in COFF symbol table or zero
unsigned short  SizeOfOptionalHeader;	// size of the following optional header
unsigned short  Characteristics;	// 02 = executable, 0x200 = nonrelocatable,
				//  0x2000 = DLL rather than EXE
.EC
Optional header that follows PE header,
not present in COFF objects
.BC
// COFF fields
unsigned short  Magic;	// octal 413, from a.out ZMAGIC
unsigned char   MajorLinkerVersion;
unsigned char   MinorLinkerVersion;
unsigned long   SizeOfCode;	// .text size
unsigned long   SizeOfInitializedData;	// .data size
unsigned long   SizeOfUninitializedData;	// .bss size
unsigned long   AddressOfEntryPoint;	// RVA of entry point
unsigned long   BaseOfCode;	// RVA of .text
unsigned long   BaseOfData;	// RVA of .data

// additional fields.

unsigned long   ImageBase;	// virtual address to map beginning of file
unsigned long   SectionAlignment;	// section alignment, typically 4096, or 64K
unsigned long   FileAlignment;	// file page alignment, typically 512
unsigned short  MajorOperatingSystemVersion;
unsigned short  MinorOperatingSystemVersion;
unsigned short  MajorImageVersion;
unsigned short  MinorImageVersion;
unsigned short  MajorSubsystemVersion;
unsigned short  MinorSubsystemVersion;
unsigned long   Reserved1;
unsigned long   SizeOfImage;	// total size of mappable image, rounded to SectionAlignment
unsigned long   SizeOfHeaders;	// total size of headers up through section table
unsigned long   CheckSum;	// often zero
unsigned short  Subsystem;// required subsystem: 1 = native, 2 = Windows GUI,
	// 3 = Windows non-GUI, 5 = OS/2, 7 = POSIX
unsigned short  DllCharacteristics;	// when to call initialization routine (obsolescent)
	// 1 = process start, 2 = process end, 4 = thread start, 8 = thread end
unsigned long   SizeOfStackReserve;	// size to reserve for stack
unsigned long   SizeOfStackCommit;	// size to allocate initially for stack
unsigned long   SizeOfHeapReserve;	// size to reserve for heap
unsigned long   SizeOfHeapCommit;	// size to allocate initially for heap
unsigned long   LoaderFlags;		// obsolete
unsigned long   NumberOfRvaAndSizes;	// number of entries in following image data directory
// following pair is repeated once for each directory
{
	unsigned long   VirtualAddress;	// relative virtual address of directory
	unsigned long   Size;
}
.EC
Directories are, in order:
.br
Export Directory
.br
Import Directory
.br
Resource Directory
.br
Exception Directory
.br
Security Directory
.br
Base Relocation Table
.br
Debug Directory
.br
Image Description String
.br
Machine specific data
.br
Thread Local Storage Directory
.br
Load Configuration Directory
.EF
Each PE file is created in a way that makes it straightforward for the
system loader to map it into memory.
Each section is physically aligned on a disk block boundary or greater
(the filealign value), and logically aligned on a memory page boundary
(4096 on the x86.)
The linker creates a PE file for a specific target address at which
the file will be mapped (imagebase).
If a chunk of address space at that address is available, as it almost always is,
no load-time fixups are needed.
In a few cases, such as the old win32s compatibility system,
target addresses aren't available so the loader has to map the file
somewhere else, in which case the file must contain relocation fixups in the .reloc section
that tell the loader what to change.
Shared DLL libraries also are subject to relocation, since the address
at which a DLL is mapped depends on what's already occupying the
address space.
.P
Following the PE header is the section table, an array of entries like
Figure \n+F.
.FG \nF "Section table"
.BC
// array of entries
unsigned char   Name[8];	// section name in ASCII
unsigned long   VirtualSize;	// size mapped into memory
unsigned long   VirtualAddress;	// memory address relative to image base 
unsigned long   SizeOfRawData;	// physical size, multiple of file alignment
unsigned long   PointerToRawData;	// file offset 
// next four entries present in COFF, present or 0 in PE
unsigned long   PointerToRelocations;	// offset of relocation entries
unsigned long   PointerToLinenumbers;	// offset of line number entries
unsigned short  NumberOfRelocations;	// number of relocation entries
unsigned short  NumberOfLinenumbers;	// number of line number entries
unsigned long   Characteristics;	// 0x20 = text, 0x40 = data, 0x80 = bss, 0x200 = no-load,
	// 0x800 = don't link, 0x10000000 = shared,
	// 0x20000000 = execute, 0x40000000 = read, 0x80000000 = write
.EC
.EF
Each section has both a file address and size (PointerToRawData and SizeOfRawData) and a
memory address and size (VirtualAddress and VirtualSize) which aren't necessarily the
same.
The CPU's page size is often larger than the disk's
block size, typically 4K pages and 512 byte disk blocks, and a
section that ends in the middle of a page need not have blocks for the
rest of the page allocated, saving small amounts of disk space.
Each section is marked with the hardware permissions appropriate for
the pages, e.g. read+execute for code and read+write for data.
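.P
Since the headers give addresses as RVAs but the data lives at file offsets,
tools that examine a PE file on disk need to translate one to the other.
A minimal sketch, using only the four section table fields involved; the
structure and function names are hypothetical:
.BC
#include <stddef.h>

// Just the section table fields needed here, named as in the figure.
struct section {
    unsigned long VirtualSize, VirtualAddress;
    unsigned long SizeOfRawData, PointerToRawData;
};

// Translate an RVA to a file offset.  Returns -1 if the RVA is not in
// any section, or lies in a part of a section that has no file data
// (for example zero-filled space past SizeOfRawData).
long rva_to_offset(const struct section *s, size_t n, unsigned long rva)
{
    for (size_t i = 0; i < n; i++) {
        if (rva < s[i].VirtualAddress) continue;
        unsigned long delta = rva - s[i].VirtualAddress;
        if (delta >= s[i].VirtualSize) continue;
        if (delta >= s[i].SizeOfRawData) return -1;
        return (long)(s[i].PointerToRawData + delta);
    }
    return -1;
}
.EC
The linear search is adequate in practice because section tables are short.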
.H2 "PE special sections"
A PE file includes .text, .data, and sometimes .bss sections like a Unix executable
(usually under those names, in fact) as well as a lot of
Windows-specific sections.
.BL
.LI
.I Exports :
A list of the symbols defined in this module and visible to other
modules.
EXE files typically export no symbols, or maybe one or two for debugging.
DLLs export symbols for the routines and data that they provide.
In keeping with Windows space saving tradition, exported symbols
can be referenced via small integers called export ordinals as well as by names.
The exports section contains an array of the RVAs of the exported
symbols.
It also contains two parallel arrays of the name of the symbol (as the
RVA of an ASCII string), and the export ordinal for the symbol, sorted
by string name.
To look up a symbol by name, perform a binary search in the string
name table, then find the entry in the ordinal table in the position
corresponding to the found name, and use that ordinal to index the
array of RVAs.
(This is arguably faster than iterating over an array of three-word
entries.)
Exports can also be ``forwarders'' in which case the RVA points to a string
naming the actual symbol which is found in another library.
.LI
.I Imports :
The imports table lists all of the symbols that need to be resolved at
load time from DLLs.
The linker predetermines which symbols will be found in which DLLs, so
the imports table starts with an import directory, consisting of one
entry per referenced DLL.
Each directory entry contains the name of the DLL, and two parallel arrays,
one identifying the required symbols, and the other being the
place in the image to store the symbol value.
The entries in the first array can be either an ordinal (if the high
bit is set), or a pointer to a name string preceded by a guess at the
ordinal to speed up the search.
The second array contains the place to store the
symbol's value; if the symbol is a procedure, the linker will already have adjusted all calls to
the symbol to call indirectly via that location; if the symbol is data, references in the
importing module are made using that location as a pointer to the actual data.
(Some compilers provide the indirection automatically, others require explicit program code.)
.
.LI
.I Resources :
The resource table is organized as a tree.
The structure supports arbitrarily deep trees, but in practice the
tree is three levels, resource type, name, and language.
(Language here means a natural language; this permits customizing executables
for speakers of languages other than English.)
Each resource can have either a name or a number.
A typical resource might be type DIALOG (Dialog box), name ABOUT (the
About This Program box), language English.
Unlike symbols which have ASCII names, resources have Unicode names to
support non-English languages.
The actual resources are chunks of binary data, with the format of the
resource depending on the resource type.
.LI
.I "Thread Local Storage" :
Windows supports multiple threads of execution per process.
Each thread can have its own private storage, Thread Local Storage or TLS.
This section points to a chunk of the image used to initialize TLS
when a thread starts, and also contains pointers to initialization
routines to call when each thread starts.
Generally present in EXE but not DLL files, because Windows doesn't
allocate TLS storage when a program dynamically links to a DLL.
(See Chapter 10.)
.LI
.I Fixups :
If the executable is moved, it is moved as a unit so all fixups have
the same value, the difference between the actual load address and the target address. 
The fixup table, if present, contains an array of fixup blocks, each
containing the fixups for one 4K page of the mapped executable.
(Executables with no fixup table can only be loaded at the linked target
address.)
Each fixup block contains the base RVA of the page, the number of fixups,
and an array of 16 bit fixup entries.
Each entry contains in the low 12 bits the offset in the block that
needs to be relocated, and in the high 4 bits the fixup type, e.g.,
add 32 bit value, adjust high 16 bits or low 16 bits (for MIPS
architecture).
This block-by-block scheme saves considerable space in the relocation
table, since each entry can be squeezed to two bytes rather than the 8
or 12 bytes the ELF equivalent takes.  
.EL
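.P
The name lookup described under Exports above can be sketched as follows.
To keep it short, the sketch holds the names as ordinary C string pointers
rather than RVAs of strings, and treats the entries of the ordinal table as
zero-based indexes into the RVA array, ignoring the ordinal base that a real
file applies.
The structure and function names are hypothetical.
.BC
#include <stddef.h>
#include <string.h>

// Simplified in-memory view of the three export arrays.
struct exports {
    const unsigned long *rvas;        // indexed by ordinal
    const char *const *names;         // sorted by name
    const unsigned short *ordinals;   // parallel to names
    size_t nnames;
};

// Binary search the name table, then go through the ordinal table to
// the RVA array.  Returns the symbol's RVA, or 0 if it is not found.
unsigned long export_lookup(const struct exports *e, const char *name)
{
    size_t lo = 0, hi = e->nnames;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(name, e->names[mid]);
        if (c == 0) return e->rvas[e->ordinals[mid]];
        if (c < 0) hi = mid; else lo = mid + 1;
    }
    return 0;
}
.EC
A lookup by ordinal skips the search entirely and indexes the RVA array
directly, which is the space and time saving the ordinal scheme was after.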
.H2 "Running a PE executable"
Starting a PE executable process is a relatively straightforward
procedure.
.BL
.LI
Read in the first page of the file with the DOS header, PE header, and
section headers.
.LI
Determine whether the target area of the address space is available,
if not allocate another area.
.LI
Using the information in the section headers, map all of the sections
of the file to the appropriate place in the allocated address space.
.LI
If the file is not loaded into its target address, apply fixups.
.LI
Go through the list of DLLs in the imports section and load any that
aren't already loaded.
(This process may be recursive.)
.LI
Resolve all the imported symbols in the imports section.
.LI
Create the initial stack and heap using values from the PE header.
.LI
Create the initial thread and start the process.
.EL
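.P
The fixup step in the list above is simple enough to sketch.
The code below applies one fixup block, handling only the ``add 32 bit value''
type; the type codes 0 (a no-op used for padding) and 3 are taken from
Microsoft's PE documentation rather than from the text, and the function name
and argument layout are hypothetical.
It assumes a little-endian image such as an x86 executable.
.BC
#include <stddef.h>

// Type codes in the high 4 bits of an entry.  Other types are not
// handled in this sketch.
enum { REL_ABSOLUTE = 0, REL_HIGHLOW = 3 };

// Apply one fixup block to an image already mapped at image[0..size).
// delta is the actual load address minus the linked target address.
// Returns 0 on success, -1 on an unknown type or out-of-range entry.
int apply_fixups(unsigned char *image, size_t size, unsigned long page_rva,
                 const unsigned short *entries, size_t n, unsigned long delta)
{
    for (size_t i = 0; i < n; i++) {
        unsigned type = entries[i] >> 12;
        unsigned long at = page_rva + (entries[i] & 0xfff);
        if (type == REL_ABSOLUTE) continue;
        if (type != REL_HIGHLOW || at > size || size - at < 4) return -1;
        unsigned char *p = image + at;   // little-endian 32-bit word
        unsigned long v = p[0] | (p[1] << 8) | ((unsigned long)p[2] << 16)
                        | ((unsigned long)p[3] << 24);
        v = (v + delta) & 0xffffffffUL;
        p[0] = v & 0xff; p[1] = (v >> 8) & 0xff;
        p[2] = (v >> 16) & 0xff; p[3] = (v >> 24) & 0xff;
    }
    return 0;
}
.EC
Since the whole file moves as a unit, the same delta is passed for every block.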
.H2 "PE and COFF"
A Windows COFF relocatable object file has the same COFF file header and section
headers as  a PE, but the structure is more similar to that of a
relocatable ELF file.
COFF files don't have the DOS header nor the optional header following
the PE header.
Each code or data section also carries along relocation
and line number information.
(The line numbers in an EXE file, if any, are collected in a debug
section not handled by the system loader.)
COFF objects have section-relative relocations, like ELF files,
rather than RVA relative relocations, and invariably contain a symbol
table with the symbols needed.
COFF files from language compilers typically do not contain any
resources, rather, the resources are in a separate object file created
by a specialized resource compiler.
.P
COFF files can also have several other section types not used in PE.
The most notable is the .drectve section which contains text command
strings for the linker.
Compilers usually use .drectve to tell the linker to search the
appropriate language-specific libraries.
Some compilers including MSVC also include linker directives to export
code and data symbols when creating a DLL.
(This mixture of commands and object code goes way back; IBM linkers
accepted mixed card decks of commands and object files in the early 1960s.)
.H2 "PE summary"
.
The PE file format is a competent format for a linearly addressed operating
system with virtual memory, with only small amounts of historical baggage
from its DOS heritage.
It includes some extra features such as ordinal imports and exports
intended to speed
up program loading on small systems, but of debatable effectiveness on
modern 32 bit systems.
The earlier NE format for 16 bit segmented executables
was far more complicated, and PE is a definite improvement.
.
.H1 "Intel/Microsoft OMF files"
.
The penultimate format we look at in this chapter is one of the oldest
formats still in use, the Intel Object Module Format.
Intel originally defined OMF in the late 1970s for the 8086.
Over the years a variety of vendors, including Microsoft, IBM, and
Phar Lap (who wrote a very widely used set of 32 bit extension tools
for DOS), defined their own extensions.
The current Intel OMF is the union of the original spec and most of
the extensions, minus a few extensions that either collided with other
extensions or were never used.
.P
All of the formats we've seen so far are intended for environments
with random access disks and enough RAM to do compiler and linker
processing in straightforward ways.
OMF dates from the early days of microprocessor development when
memories were tiny and storage was often punched paper tape.
As a result, OMF divides the object file into a series of short
records, Figure \n+F.
Each record contains a type byte, a two-byte length, the contents, and a
checksum byte that makes the byte-wise sum of the entire record zero.
(Paper tape equipment had no built-in error detection, and errors due
to dust or sticky parts were not rare.)
OMF files are designed so that a linker on a machine without mass
storage can do its job with a minimum number of passes over the files.
Usually 1 1/2 passes do the trick: a partial pass to find the symbol names,
which are placed near the front of each file, and then a full pass to do the
linking and produce the output.
.FG \nF "OMF record format"
picture of
.br
-- type byte
.br
-- two-byte length
.br
-- variable length data
.br
-- checksum byte
.EF
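.P
As a rough sketch of the record layout, the Python below builds and reads
records and checks each checksum.
The source text doesn't say what the length field counts; the code assumes,
following the OMF spec, that it covers the contents plus the checksum byte
and is stored low byte first.
The function names are invented for the example.

```python
import struct

def omf_checksum(body):
    """Return the byte that makes the sum of body plus checksum zero mod 256."""
    return (-sum(body)) & 0xFF

def make_record(rec_type, contents):
    """Build one record: type byte, two-byte length, contents, checksum."""
    # Assumption: the length counts the contents and the checksum byte.
    body = struct.pack("<BH", rec_type, len(contents) + 1) + contents
    return body + bytes([omf_checksum(body)])

def read_records(data):
    """Yield (type, contents) for each record, verifying checksums."""
    pos = 0
    while pos < len(data):
        rec_type, length = struct.unpack_from("<BH", data, pos)
        end = pos + 3 + length
        record = data[pos:end]
        if len(record) != 3 + length:
            raise ValueError("truncated record")
        if sum(record) & 0xFF != 0:
            raise ValueError("bad checksum")
        yield rec_type, record[3:-1]     # strip header and checksum
        pos = end
```

Because every record carries its own length, a reader can step from record
to record without understanding the contents, which is what lets a linker
skim a file quickly in its partial first pass.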
.P
OMF is greatly complicated by the need to deal with the 8086 segmented
architecture.
One of the major goals of an OMF linker is to pack code and data into
a minimum number of segments and segment groups.
Every piece of code or data in an OMF object is assigned to a segment,
and each segment in turn can be assigned to a segment group or segment
class.
(A group must be small enough to be addressed by a single segment
value, while a class can be any size, so groups are used for both addressing
and storage management, while classes are just for storage management.)
Code can reference segments and groups by name, and can also reference
code within a segment relative to the base of the segment or the base
of the group.
.P
OMF also contains some support for overlay linking, although no
OMF linker I know of has ever supported it,
taking overlay instructions instead from a separate directive file.
.H2 "OMF records"
OMF currently defines at least 40 record types, too many to enumerate
here, so we'll look at a simple OMF file.
(The complete spec is in the Intel TIS documents.)
.P
OMF uses several coding techniques to make records as short as possible.
All name strings are variable length, stored as a length byte followed
by characters.
A null name (valid in some contexts) is a single zero byte.
Rather than refer to segments, symbols, groups, etc. by name, an OMF
module lists each name once in an LNAMES record and subsequently uses
an index into the list of names to define the names of segments,
groups, and symbols.
The first name is 1, the second 2, and so forth through the entire set
of names no matter how many LNAMES records they might have taken.
(This saves a small amount of space in the not uncommon case that a
segment and an external symbol have the same name since the
definitions can refer to the same string.)
Indexes in the range 0 through 0x7f are stored as one byte.
Indexes from 0x80 through 0x7fff are stored as two bytes, with the
high bit in the first byte indicating a two-byte sequence.
Oddly, the low 7 bits of the first byte are the high 7 bits of the
value and the second byte is the low 8 bits of the value, the opposite
of the native Intel order.
Segments, groups, and external symbols are also referred to by index,
with separate index sequences for each.
For example, assume a module lists the names DGROUP, CODE, and DATA,
defining name indexes 1, 2, and 3.
Then the module defines two segments called CODE and DATA, referring
to names 2 and 3.
Since CODE is the first segment defined, it will be segment index 1
and DATA will be segment index 2.
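.P
The one- and two-byte index encoding is easy to get backwards, so here is
a small Python sketch of it.
The function names encode_index and decode_index are made up for
illustration; the byte layout follows the description above.

```python
def encode_index(value):
    """Encode an OMF index as one or two bytes."""
    if not 0 <= value <= 0x7FFF:
        raise ValueError("index out of range")
    if value <= 0x7F:
        return bytes([value])
    # High bit flags a two-byte index; high-order bits of the value come first.
    return bytes([0x80 | (value >> 8), value & 0xFF])

def decode_index(data, pos=0):
    """Return (index, position of the next field)."""
    first = data[pos]
    if first & 0x80:
        return ((first & 0x7F) << 8) | data[pos + 1], pos + 2
    return first, pos + 1
```

Under this scheme the index 0x1234 is stored as the bytes 0x92 0x34, high
part first, while any index below 0x80 is stored as itself.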
.P
The original OMF format was defined for the 16 bit Intel architecture.
For 32 bit programs, there are new OMF types defined for the record
types where the address size matters.
All of the 16 bit record types happened to have even numerical codes,
so the corresponding 32 bit record types have the odd code one greater
than the 16 bit type.
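.P
The even/odd convention means a program can tell the two flavors apart by
looking at the low bit of the type byte.
A short Python sketch follows; the numeric codes are taken from the TIS OMF
spec as best I recall it and should be checked against that document, and
the helper names are invented.

```python
# 16 bit forms of some record types that also have 32 bit variants.
# (Codes are an assumption to verify against the OMF spec.)
PUBDEF = 0x90
SEGDEF = 0x98
FIXUPP = 0x9C
LEDATA = 0xA0
LIDATA = 0xA2

def is_32bit(rec_type):
    """True if the type code is the odd, 32 bit form."""
    return (rec_type & 1) == 1

def base_type(rec_type):
    """Map either form to the even, 16 bit code."""
    return rec_type & ~1
```

A reader can then dispatch on base_type and use is_32bit only to decide
whether offsets and lengths in the record are two or four bytes wide.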
.H2 "Details of an OMF file"
Figure \n+F lists the records in a simple OMF file.
.FG \nF "Typical OMF record sequence"
THEADR program name
.br
COMENT flags and options
.br
LNAMES list of segment, group, and class names
.br
SEGDEF segment (one record per segment)
.br
GRPDEF group (one record per group)
.br
PUBDEF global symbols
.br
EXTDEF undefined external symbols (one per symbol)
.br
COMDEF common blocks
.br
COMENT end of pass1 info
.br
LEDATA chunk of code or data (multiple)
.br
LIDATA chunk of repeated data (multiple)
.br
FIXUPP relocations and external ref fixups, each following the LEDATA
or LIDATA to which it refers
.br
MODEND end of module
.EF
The file starts with a THEADR record that marks the start of the
module and gives the name of the module's source file as a string.
(If this module were part of a library, it would start with a similar
LHEADR record.)
.P
The second record is a badly misnamed COMENT record which contains
configuration information for the linker.
Each COMENT record contains some flag bits saying whether to keep the
comment when linked, a type byte, and the comment text.
Some comment types are indeed comments, e.g., the compiler version
number or a copyright notice, but several of them give essential
linker info such as the memory model to use (tiny through large), the
name of a library to search after processing this file, definitions of
weak external symbols, and a grab-bag of other types of data that
vendors shoe-horned into the OMF format.
.P
Next comes a series of LNAMES records that list all of the names used
in this module for segments, groups, classes, and overlays.
As noted above, all the names in all LNAMES records are logically
considered an array, with the index of the first name being 1.
.P
After the LNAMES records come SEGDEF records, one for each segment
defined in the module.
The SEGDEF includes an index for the name of the segment, and for the
class and overlay, if any, that it belongs to.
Also included are the segment's attributes including its alignment
requirements and rules for combining it with same-name segments in
other modules, and its length.
.P
Next come GRPDEF records, if any, defining the groups in the module.
Each GRPDEF has the index for the group name and the indices for the
segments in the group.
.P
PUBDEF records define "public" symbols visible to other modules.
Each PUBDEF defines one or more symbols within a single group or
segment.
The record includes the index of the segment or group and for each
symbol, the symbol's offset within the segment or group, its name, and
a one-byte compiler-specific type field.
.P
EXTDEF records define undefined external symbols.
Each record contains the name of one symbol and a byte or two of
debugger symbol type.
COMDEF records define common blocks, and are similar to EXTDEF records
except that they also define a minimum size for the symbol.
All of the EXTDEF and COMDEF symbols in the module are logically an array,
so fixups can refer to them by index.
.P
Next comes an optional specialized COMENT record that marks the end of pass 1
data.
It tells the linker that it can skip the rest of the file in the first
pass of the linking process.
.P
The rest of the file consists of the actual code and data of the
program, intermixed with fixup records containing relocation and
external reference information.
There are two kinds of data records, LEDATA (enumerated) and
LIDATA (iterated).
LEDATA simply has the segment index and starting offset, followed by
the data to store there.
LIDATA also starts with the segment and starting offset, but then has
a possibly nested set of repeated blocks of data.
LIDATA efficiently handles code generated for statements like this
Fortran:
.BC
INTEGER A(20,20) /400*42/
.EC
A single LIDATA can have a two- or four-byte block containing 42 and
repeat it 400 times.
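.P
To make the idea of nested repeated blocks concrete, here is a Python
sketch that expands an iterated block into plain bytes.
It works on an in-memory model invented for the example, a pair of repeat
count and content where the content is either raw bytes or a list of nested
blocks, and doesn't attempt to parse the actual LIDATA byte layout.

```python
def expand(block):
    """Expand (repeat, content) into bytes.

    content is either a bytes object or a list of nested (repeat, content)
    blocks.  This tuple model is hypothetical, not the on-disk encoding.
    """
    repeat, content = block
    if isinstance(content, (bytes, bytearray)):
        once = bytes(content)
    else:
        once = b"".join(expand(child) for child in content)
    return once * repeat

# The Fortran array above, assuming four-byte little-endian integers.
word = (42).to_bytes(4, "little")
flat = expand((400, word))
nested = expand((20, [(20, word)]))   # 20 rows of 20 elements each
```

Both forms describe the same 1600 bytes of initialized data, but the record
itself only has to carry one copy of the four-byte value.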
.P
Each LEDATA or LIDATA that needs a fixup must be immediately followed
by its FIXUPP records.
FIXUPP is by far the most complicated record type.
Each fixup requires three items: first the target, the address being
referenced, second the
frame, the position in a segment or group relative to which the address is
calculated, and third the location to be fixed up.
Since it's very common to refer to a single frame in many fixups and
somewhat common to refer to a single target in many fixups, OMF
defines fixup
.I threads ,
two-bit codes used as shorthands for frames or targets, so at any
point there can be up to four frames and four targets with thread
numbers defined.
Each thread number can be redefined as often as needed.
For example, if a module includes a data group, that group is usually
used as the frame for nearly every data reference in the module, so
defining a thread number for the base address of that group saves a
great deal of space.
In practice a GRPDEF record is almost invariably followed by a FIXUPP
record defining a frame thread for that group.
.P
Each FIXUPP record is a sequence of subrecords, with each subrecord
defining either a thread or a fixup.
A thread definition subrecord has flag bits saying whether it's
defining a frame or target thread.
A target thread definition contains the thread number, the kind
of reference (segment relative, group relative, external relative),
the index of the base segment, group or symbol, and optionally a base
offset.
A frame thread definition includes the thread number and the kind of
reference (all the kinds for target definition plus two common special
cases, same segment as the location and same segment as the target).
.P
Once the threads are defined, a fixup subrecord is relatively simple.
It contains the location to fix up, a code specifying the type of
fixup (16 bit offset, 16 bit segment, full segment:offset, 8 bit
relative, etc.), and the frame and target.
The frame and target can either refer to previously defined threads or
be specified in place.
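.P
The bookkeeping a linker needs for threads is small.
The Python sketch below keeps four frame slots and four target slots and
resolves a fixup's frame or target either from a thread or from a reference
given in place.
The class, the method names, and the use of tuples such as ("group", 1) in
place of the spec's numeric method codes are all assumptions made for
illustration.

```python
class FixupThreads:
    """Four frame slots and four target slots, addressed by two-bit numbers."""

    def __init__(self):
        self.tables = {"frame": [None] * 4, "target": [None] * 4}

    def define(self, kind, number, reference):
        """Define or redefine a thread; kind is "frame" or "target"."""
        if not 0 <= number <= 3:
            raise ValueError("thread numbers are two-bit codes, 0 through 3")
        self.tables[kind][number] = reference   # redefinition overwrites

    def lookup(self, kind, spec):
        """Resolve spec, either ("thread", n) or an in-place reference."""
        if spec[0] != "thread":
            return spec                         # specified in place
        reference = self.tables[kind][spec[1]]
        if reference is None:
            raise ValueError("thread used before it was defined")
        return reference
```

A module with a data group would define one frame thread for that group
once, and nearly every later data fixup could then name the frame with a
two-bit thread number rather than spelling out the group index.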
.P
After the LEDATA, LIDATA, and FIXUPP records, the end of the module is
marked by a MODEND record, which can optionally specify the entry
point if the module is the main routine in a program.
.P
A real OMF file would contain more record types for local symbols,
line numbers, and other debugger info, and in a Windows environment
also info to create the imports and exports sections in a target NE
file (the segmented 16 bit predecessor of PE),
but the structure of the module doesn't change.
The order of records is quite flexible, particularly if there's no end
of pass 1 marker.
The only hard and fast rules are that THEADR and MODEND must come
first and last, FIXUPPs must immediately follow the LEDATA and LIDATA
to which they refer, and no intra-module forward references are
allowed.
In particular, it's permissible to emit records for symbols, segments,
and groups as they're defined, so long as they precede other records
that refer to them.
.
.H2 "Summary of OMF"
.
The OMF format is quite complicated compared to the other formats
we've seen.
Part of the complication is due to tricks to compress the data, part
due to the division of each module into many small records, part
due to incremental features added over the years, and part due to the
inherent complexity of segmented program addressing.
The consistent record format with typed records is a strong point,
since it both permits extension in a straightforward way, and permits
programs that process OMF files to skip records they don't understand.
.P
Nonetheless, now that even small desktop computers have megabytes of
RAM and large disks, the OMF division of the object into many small
records has become more trouble than it's worth.
The small record type of object module was very common up through the
1970s, but is now obsolescent.
. 
.H1 "Comparison of object formats"
We've seen seven different object and executable formats in this chapter, ranging from
the trivial (.COM) to the sophisticated (ELF and PE) to the rococo (OMF).
Modern object formats such as ELF try to group all of the data of a
single type together to make it easier for linkers to process.
They also lay out the file with virtual memory considerations in mind,
so that the system loader can map the file into the program's address
space with as little extra work as possible.
.P
Each object format shows the style of the system for which it was
defined.
Unix systems have historically kept their internal interfaces simple
and well-defined, and the a.out and ELF formats reflect that in their
relative simplicity and the lack of special case features.
Windows has gone in the other direction, with process management and user
interface intertwined.
.H1 Project
.
Here we define the simple object format used in the project assignments
in this book.
Unlike nearly every other object format, this one consists entirely
of lines of ASCII text.
This makes it possible to create sample object files in a text editor,
as well as making it easier to check the output files from the project
linker.
Figure \n+F sketches the format.
The segment, symbol, and relocation entries are represented as
lines of text with fields separated by spaces.
Each line may have extra fields at the end which programs should be prepared
to ignore.
Numbers are all hexadecimal.
.FG \nF "Project object format"
LINK
.br
.I "nsegs nsyms nrels"
.br
-- segments --
.br
-- symbols --
.br
-- rels --
.br
-- data --
.EF
.P
The first line is the ``magic number,'' the word
.T LINK .
.P
The second line contains at least three decimal numbers, the number of segments
in the file, the number of symbol table entries, and the number of
relocation entries.
There may be other information after the three numbers for extended
versions of the linker.
If there are no symbols or relocations, the respective number is zero.
.P
Next come the segment definitions.
Each segment definition contains the segment name, the address where the
segment logically starts, the length of the segment in bytes, and a string
of code letters describing the segment.
Code letters include R for readable, W for writable, and P for present in
the object file.
(Other letters may be present as well.)
A typical set of segments for an a.out-like file would be:
.BC
\&.text 1000 2500 RP
\&.data 4000 C00 RWP
\&.bss  5000 1900 RW
.EC 
Segments are numbered in the order their definitions appear, with the
first segment being number 1.
.P
Next comes the symbol table.
Each entry is of the form:
.BC
name value seg type
.EC
The name is the symbol name.
The value is the hex value of the symbol.
Seg is the segment number relative to which the symbol is defined, or 0
for absolute or undefined symbols.
The type is a string of letters including D for defined or U for undefined.
Symbols are also numbered in the order they're listed, starting at 1.
.P
Next come the relocations, one to a line:
.BC
loc seg ref type ...
.EC
Loc is the location to be relocated, seg is the segment within which
the location is found, ref is the segment or symbol number to be relocated
there, and type is an architecture-dependent relocation type.
Common types are A4 for a four-byte absolute address, or R4 for a four-byte
relative address.
Some relocation types may have extra fields after the type.
.P
Following the relocations comes the object data.
The data for each segment is a single long hex string followed by a newline.
(This makes it easy to read and write section data in perl.)
Each pair of hex digits represents one byte.
The segment data strings are in the same order as the segment table, and there must
be segment data for each ``present'' segment.
The length of the hex string is determined by the defined length of the segment;
if the segment is 100 bytes long, the line of segment data is 200 characters, not
counting the newline at the end.
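.P
The relationship between a segment definition and its line of data can be
sketched briefly.
The Python below, a partial illustration rather than a solution to the
project, parses one segment definition and checks and decodes one line of
segment data.
It assumes the segment fields are hexadecimal as stated above, and the
function names and dictionary keys are invented for the example.

```python
def parse_segment(line):
    """Parse "name start length flags ...", ignoring any extra fields."""
    name, start, length, flags = line.split()[:4]
    return {"name": name,
            "start": int(start, 16),
            "length": int(length, 16),
            "flags": flags}

def has_data(segment):
    """Only segments marked P (present) have a line of data in the file."""
    return "P" in segment["flags"]

def decode_segment_data(segment, hex_line):
    """Turn a line of hex digits into bytes, checking its length."""
    text = hex_line.strip()                  # drop the trailing newline
    if len(text) != 2 * segment["length"]:
        raise ValueError("data length does not match segment length")
    return bytes.fromhex(text)
```

For the sample segments above, the text and data segments would each be
followed later in the file by a line of hex digits, while the bss segment,
which lacks the P flag, would have none.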
.P
.I "Project 3-1:"
Write a perl program that reads an object file in this format and stores
the contents in a suitable form in perl tables and arrays, then writes
the file back out.
The output file need not be identical to the input, although it should be
semantically equivalent.
For example, the symbols need not be written in the same order they were
read, although if they're reordered, the relocation entries must be adjusted
to reflect the new order of the symbol table.
.
.H1 Exercises
.
1.
Would a text object format like the project format be practical?
(Hint: See Fraser and Hanson's paper "A Machine-Independent Linker.")