At this point we know how to write programs at an extremely low level, and thus how to produce an executable file that very closely matches what we want. But how is our program code actually stored on disk?
We'll begin our investigation into executable formats by looking at the ELF binary specification and related tools, then move on to the PE binary specification and the tools for dealing with those binaries.
Recall that when a Linux program runs, we start at the _start function, and move on from there to __libc_start_main, and eventually to main, which is our code. So somehow the operating system is gathering together a whole lot of code from various places, and loading it into memory and then running it. How does it know what code goes where?
The answer on Linux and UNIX is the ELF binary specification. ELF specifies a standard format for mapping your code on disk to a complete executable image in memory that consists of your code, a stack, a heap (for malloc), and all the libraries you link against.
So let's go over the information needed for our purposes here, and refer the reader to the ELF spec to fill in the details if they wish. We'll start from the beginning of a typical executable and work our way down.
There are three header areas in an ELF file: The main ELF file header, the program headers, and then the section headers. The program code lies in between the program headers and the section headers.
TODO: Insert figure here to show a typical ELF layout.
NOTE: ELF is extremely flexible. Many of these sections can be shrunk, expanded, removed, etc. In fact, it is not outside the realm of possibility that some programs may deliberately make abnormal, yet valid, ELF headers and files to try to make reverse engineering difficult (VMware does this, for example).
The main ELF header basically tells us where everything is located in the file. It comes at the very beginning of the executable, and can be read directly from the first e_ehsize bytes of the file (52 bytes for a 32-bit ELF file) into this structure.
/* ELF File Header */
typedef struct
{
    unsigned char e_ident[EI_NIDENT]; /* Magic number and other info */
    Elf32_Half    e_type;             /* Object file type */
    Elf32_Half    e_machine;          /* Architecture */
    Elf32_Word    e_version;          /* Object file version */
    Elf32_Addr    e_entry;            /* Entry point virtual address */
    Elf32_Off     e_phoff;            /* Program header table file offset */
    Elf32_Off     e_shoff;            /* Section header table file offset */
    Elf32_Word    e_flags;            /* Processor-specific flags */
    Elf32_Half    e_ehsize;           /* ELF header size in bytes */
    Elf32_Half    e_phentsize;        /* Program header table entry size */
    Elf32_Half    e_phnum;            /* Program header table entry count */
    Elf32_Half    e_shentsize;        /* Section header table entry size */
    Elf32_Half    e_shnum;            /* Section header table entry count */
    Elf32_Half    e_shstrndx;         /* Section header string table index */
} Elf32_Ehdr;
The fields of interest to us are e_entry, e_phoff, e_shoff, and the sizes given. e_entry specifies the virtual address of _start, e_phoff gives the offset of the array of program headers from the start of the executable file, and e_shoff gives the same for the section headers.
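As a rough illustration of reading these fields, here is a minimal sketch in C built on the Elf32_Ehdr structure from <elf.h>. The function names parse_elf32_header and dump_elf32_header are made up for this example, and the sketch assumes a 32-bit ELF file whose byte order matches the machine running the code.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the ELF header out of a buffer that holds the
 * start of a file. Returns 0 on success, -1 if the buffer is too short,
 * lacks the ELF magic number, or is not a 32-bit ELF file.
 * Assumption: the file's byte order matches the host's. */
int parse_elf32_header(const unsigned char *buf, size_t len, Elf32_Ehdr *out)
{
    if (len < sizeof(Elf32_Ehdr))
        return -1;
    if (memcmp(buf, ELFMAG, SELFMAG) != 0)   /* "\177ELF" */
        return -1;
    if (buf[EI_CLASS] != ELFCLASS32)
        return -1;
    memcpy(out, buf, sizeof(*out));
    return 0;
}

/* Hypothetical helper: read the header from a file on disk and print the
 * fields discussed above. Returns 0 on success, -1 on any failure. */
int dump_elf32_header(const char *path)
{
    unsigned char buf[sizeof(Elf32_Ehdr)];
    Elf32_Ehdr eh;
    size_t n;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf), f);
    fclose(f);
    if (parse_elf32_header(buf, n, &eh) != 0)
        return -1;

    printf("e_entry  = 0x%lx\n", (unsigned long)eh.e_entry);
    printf("e_phoff  = %lu (%u entries of %u bytes)\n",
           (unsigned long)eh.e_phoff, eh.e_phnum, eh.e_phentsize);
    printf("e_shoff  = %lu (%u entries of %u bytes)\n",
           (unsigned long)eh.e_shoff, eh.e_shnum, eh.e_shentsize);
    printf("e_ehsize = %u\n", eh.e_ehsize);
    return 0;
}
```

Calling dump_elf32_header with the path of a 32-bit executable should print its entry point and the location of both header tables. A 64-bit binary is rejected by the class check rather than misread.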
The next portion of the file is the array of ELF program headers. These describe the segments of the file, including the executable program code, that get mapped into the program's address space as it loads.
/* Program segment header. */
typedef struct
{
    Elf32_Word p_type;   /* Segment type */
    Elf32_Off  p_offset; /* Segment file offset */
    Elf32_Addr p_vaddr;  /* Segment virtual address */
    Elf32_Addr p_paddr;  /* Segment physical address */
    Elf32_Word p_filesz; /* Segment size in file */
    Elf32_Word p_memsz;  /* Segment size in memory */
    Elf32_Word p_flags;  /* Segment flags */
    Elf32_Word p_align;  /* Segment alignment */
} Elf32_Phdr;
Keep in mind that there are going to be a few of these (usually 2) end-to-end (i.e. forming an array of structs) in a typical ELF executable. The interesting fields in this structure are p_offset, p_filesz, and p_memsz, all of which we will need to make use of in the code modification chapter.
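To make the "array of structs" idea concrete, the following sketch walks the program header table of a file that has already been loaded into memory as a byte buffer. The names get_elf32_phdr and print_elf32_phdrs are invented for this example, and it again assumes a 32-bit ELF file in the host's byte order. Entry i of the table sits at file offset e_phoff + i * e_phentsize.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy program header number idx out of a whole-file
 * image. Returns 0 on success, -1 if idx is out of range or the entry
 * would fall outside the image. */
int get_elf32_phdr(const unsigned char *image, size_t len,
                   const Elf32_Ehdr *eh, unsigned idx, Elf32_Phdr *out)
{
    unsigned long long off;

    if (idx >= eh->e_phnum || eh->e_phentsize < sizeof(Elf32_Phdr))
        return -1;
    /* Entries are laid out end-to-end starting at e_phoff. */
    off = (unsigned long long)eh->e_phoff
        + (unsigned long long)idx * eh->e_phentsize;
    if (off > len || len - off < sizeof(Elf32_Phdr))
        return -1;
    memcpy(out, image + off, sizeof(*out));
    return 0;
}

/* Hypothetical helper: print the fields discussed above for every
 * program header in the image. */
void print_elf32_phdrs(const unsigned char *image, size_t len,
                       const Elf32_Ehdr *eh)
{
    unsigned i;

    for (i = 0; i < eh->e_phnum; i++) {
        Elf32_Phdr ph;

        if (get_elf32_phdr(image, len, eh, i, &ph) != 0)
            break;
        printf("[%u] type=%lu offset=0x%lx vaddr=0x%lx filesz=0x%lx memsz=0x%lx\n",
               i, (unsigned long)ph.p_type, (unsigned long)ph.p_offset,
               (unsigned long)ph.p_vaddr, (unsigned long)ph.p_filesz,
               (unsigned long)ph.p_memsz);
    }
}
```

When p_memsz is larger than p_filesz, the segment occupies more room in memory than it does on disk, and the extra bytes are zero-filled by the loader. This is the usual home of uninitialized global data.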
The meat of the ELF file comes next. The actual locations and sizes of portions of the body are described by the program headers above, and contain the executable instructions from our assembly file, as well as string constants and global variable declarations. This will become important in the next chapter, program modification. (TODO: How to link to other chapters)
The ELF section headers describe various named sections in an executable file. Each section has an entry in the section headers array, which is found at the bottom of the executable and has the following format:
/* Section header. */
typedef struct
{
    Elf32_Word sh_name;      /* Section name (string tbl index) */
    Elf32_Word sh_type;      /* Section type */
    Elf32_Word sh_flags;     /* Section flags */
    Elf32_Addr sh_addr;      /* Section virtual addr at execution */
    Elf32_Off  sh_offset;    /* Section file offset */
    Elf32_Word sh_size;      /* Section size in bytes */
    Elf32_Word sh_link;      /* Link to another section */
    Elf32_Word sh_info;      /* Additional section information */
    Elf32_Word sh_addralign; /* Section alignment */
    Elf32_Word sh_entsize;   /* Entry size if section holds table */
} Elf32_Shdr;
The section headers are entirely optional, however. A list of common sections can be found on page 20 of the ELF Spec PDF.
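Since sh_name is only an index into a string table, it takes one extra step to get a section's actual name. The sketch below shows one way to do it for a file held in memory: look up the section header numbered e_shstrndx, which describes the section name string table, and treat sh_name as a byte offset into that table. The names get_elf32_shdr and elf32_section_name are invented for this example, and a 32-bit ELF file in the host's byte order is assumed.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy section header number idx out of a whole-file
 * image. Returns 0 on success, -1 if it is out of range. */
int get_elf32_shdr(const unsigned char *image, size_t len,
                   const Elf32_Ehdr *eh, unsigned idx, Elf32_Shdr *out)
{
    unsigned long long off;

    if (idx >= eh->e_shnum || eh->e_shentsize < sizeof(Elf32_Shdr))
        return -1;
    off = (unsigned long long)eh->e_shoff
        + (unsigned long long)idx * eh->e_shentsize;
    if (off > len || len - off < sizeof(Elf32_Shdr))
        return -1;
    memcpy(out, image + off, sizeof(*out));
    return 0;
}

/* Hypothetical helper: return a pointer to the NUL-terminated name of
 * section idx inside the image, or NULL if anything looks inconsistent. */
const char *elf32_section_name(const unsigned char *image, size_t len,
                               const Elf32_Ehdr *eh, unsigned idx)
{
    Elf32_Shdr sh, strtab;
    unsigned long long start, end;

    if (get_elf32_shdr(image, len, eh, idx, &sh) != 0)
        return NULL;
    /* e_shstrndx names the section that holds all section names. */
    if (get_elf32_shdr(image, len, eh, eh->e_shstrndx, &strtab) != 0)
        return NULL;
    if (sh.sh_name >= strtab.sh_size)
        return NULL;
    start = (unsigned long long)strtab.sh_offset + sh.sh_name;
    end = (unsigned long long)strtab.sh_offset + strtab.sh_size;
    if (end > len)
        return NULL;
    /* Refuse names that are not terminated inside the string table. */
    if (memchr(image + start, '\0', (size_t)(end - start)) == NULL)
        return NULL;
    return (const char *)(image + start);
}
```

Looping idx from 0 up to e_shnum and printing each returned name gives a simple section listing. A file with its section headers stripped has e_shnum equal to 0, so the lookup returns NULL for every index.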
Editing ELF is often desired during reverse engineering, especially when we want to insert bodies of code, or if we want to reverse engineer binaries with deliberately corrupted ELF headers.
Now you could edit these headers by hand using the <elf.h> header file and the structures above, but luckily there is already a nice editor called HT Editor that allows you to examine and modify all sections of an ELF program, from the ELF header to the actual instructions. (TODO: instructions, screenshots of HTE)
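For a sense of what editing by hand with <elf.h> might look like, here is a small sketch that overwrites the e_entry field of a 32-bit ELF file held in a memory buffer. The function name set_elf32_entry is made up for this example, the byte order of the file is assumed to match the host's, and writing the modified buffer back to disk is left out.

```c
#include <elf.h>
#include <string.h>

/* Hypothetical helper: overwrite the entry point stored in the ELF header
 * of a whole-file image. Returns 0 on success, -1 if the buffer does not
 * look like a 32-bit ELF file. Only the e_entry field is changed. */
int set_elf32_entry(unsigned char *image, size_t len, Elf32_Addr new_entry)
{
    Elf32_Ehdr eh;

    if (len < sizeof(eh))
        return -1;
    if (memcmp(image, ELFMAG, SELFMAG) != 0 || image[EI_CLASS] != ELFCLASS32)
        return -1;

    memcpy(&eh, image, sizeof(eh));   /* read the header into the struct */
    eh.e_entry = new_entry;           /* edit the field of interest */
    memcpy(image, &eh, sizeof(eh));   /* write the header back in place */
    return 0;
}
```

Copying the header in and out through memcpy, rather than casting the buffer pointer to Elf32_Ehdr *, sidesteps alignment problems if the buffer is not suitably aligned.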
Do note that changing the size of various program sections in the ELF headers will most likely break things. We will get into how to edit ELF in more detail when we are talking about actual code insertion, which is the next chapter.