
The Portable Executable Format (PE)

02 Apr 2023


Introduction

Is there any better way to continue this series than with this topic? After understanding how the Executable and Linkable Format works (you can read about that here), I understand you are eager for more, and not happy with only understanding Unix systems. That is why today we will cover the Portable Executable format, which (spoiler incoming) is the Windows analogue of ELF.

In this post I will give an overview of what the PE file is, what it is used for, and how it works.

There is obviously much more to cover than what can fit in the scope of a 10-minute post, so I will leave you a couple of links at the end for you to do your own research, but don’t worry, you will be able to conduct such research with ease after finishing this read.

One last important thing before the fun: in case you happen to be on a Unix machine and want to follow along with this guide, you can download a Windows 10 image for research purposes from here. It’s official and legal, so you don’t have to worry about me trying to install a keylogger on your system.

The other option you have is to compile the source code with a cross-compiler. I will be using MinGW, which can be installed with:

sudo apt-get install mingw-w64

The PE format

Let’s start with the Wikipedia definition, which I think is pretty solid :

The Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems. The PE format is a data structure that encapsulates the information necessary for the Windows OS loader to manage the wrapped executable code.

That is pretty understandable: the PE format tells the Windows loader how the binary must be mapped into memory to create the image of a process. As you might already know, it is the native executable format of Windows systems, and, just as some extra information, understanding it is fundamental if you want to do any kind of malware reversing, game hacking, etc.

Let’s have a look at its structure (sorry for the monster image but I think it is exceptional) :

[Image: diagram of the PE file structure]

To analyze the PE format, pefile is the go-to tool. You can install it with:

sudo apt-get install python3-pefile    # Debian-based distros
pip install pefile                     # from PyPI

DOS Header

The first thing that you will encounter is the DOS header, which is represented with the IMAGE_DOS_HEADER structure :

typedef struct _IMAGE_DOS_HEADER {
    WORD   e_magic;               // Magic number
    WORD   e_cblp;                // Bytes on last page of file
    WORD   e_cp;                  // Pages in file
    WORD   e_crlc;                // Relocations
    WORD   e_cparhdr;             // Size of header in paragraphs
    WORD   e_minalloc;            // Minimum extra paragraphs needed
    WORD   e_maxalloc;            // Maximum extra paragraphs needed
    WORD   e_ss;                  // Initial (relative) SS value
    WORD   e_sp;                  // Initial SP value
    WORD   e_csum;                // Checksum
    WORD   e_ip;                  // Initial IP value
    WORD   e_cs;                  // Initial (relative) CS value
    WORD   e_lfarlc;              // File address of relocation table
    WORD   e_ovno;                // Overlay number
    WORD   e_res[4];              // Reserved words
    WORD   e_oemid;               // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;             // OEM information; e_oemid specific
    WORD   e_res2[10];            // Reserved words
    LONG   e_lfanew;              // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

The first 2 bytes are the magic number, ‘MZ’ (0x5A4D when read as a little-endian WORD). You will run into this very characteristic value again and again along your journey (I can’t explain magic numbers in this post, but you can learn about them here).

Just a little bit of trivia: the letters are the initials of Mark Zbikowski, who designed the DOS executable format.

Of all the information in this struct, we will focus only on the field e_lfanew, which holds the file offset of the PE Header.

I have compiled a little hello world program with MinGW, and pefile lets us analyze its headers:

[Image: pefile output showing the DOS header]

According to this information, we will find the PE header at offset 0x80.
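If you would like to see that lookup without any tooling, here is a minimal sketch using only Python’s standard struct module. The function name read_e_lfanew and the 64-byte buffer are made up for illustration; the buffer is synthetic (not a real executable) and simply mirrors the 0x80 value reported above.

```python
import struct

DOS_HEADER_SIZE = 64      # sizeof(IMAGE_DOS_HEADER): 30 WORDs plus one LONG
E_LFANEW_OFFSET = 0x3C    # e_lfanew is the last field of the header


def read_e_lfanew(data: bytes) -> int:
    """Return the file offset of the PE header stored in e_lfanew."""
    if len(data) < DOS_HEADER_SIZE:
        raise ValueError("file is too small to hold a DOS header")
    # e_magic is a little-endian WORD, so 'MZ' on disk reads back as 0x5A4D.
    (e_magic,) = struct.unpack_from("<H", data, 0)
    if e_magic != 0x5A4D:
        raise ValueError("missing MZ magic number")
    # e_lfanew is declared as LONG, i.e. a signed 32-bit integer.
    (e_lfanew,) = struct.unpack_from("<l", data, E_LFANEW_OFFSET)
    return e_lfanew


# Synthetic DOS header for illustration: only e_magic and e_lfanew are set,
# every other field is left as zero.
fake_dos_header = b"MZ" + bytes(58) + struct.pack("<l", 0x80)
pe_offset = read_e_lfanew(fake_dos_header)
print(hex(pe_offset))
```

On a real binary you would pass the file contents instead, for example read_e_lfanew(open("hello.exe", "rb").read()), where hello.exe is a hypothetical file name.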

PE Header

The PE Header is represented by IMAGE_NT_HEADERS :

typedef struct _IMAGE_NT_HEADERS {
    DWORD                   Signature;
    IMAGE_FILE_HEADER       FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

As you can see, this header is composed of a signature (a DWORD whose value is 0x00004550, i.e. the bytes ‘PE’ followed by two null bytes), a file header and an optional header.

We can double-check this with pefile:
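The signature check can be sketched in a few lines of plain Python. This is only an illustration: has_pe_signature is a hypothetical helper, and the buffer is synthetic padding followed by the four signature bytes, placed at the 0x80 offset we found earlier.

```python
import struct

# The bytes 'P', 'E', 0x00, 0x00 read as a little-endian DWORD.
PE_SIGNATURE = 0x00004550


def has_pe_signature(data: bytes, e_lfanew: int) -> bool:
    """Check the 4-byte Signature field that opens IMAGE_NT_HEADERS."""
    if e_lfanew < 0 or e_lfanew + 4 > len(data):
        return False
    (signature,) = struct.unpack_from("<I", data, e_lfanew)
    return signature == PE_SIGNATURE


# Synthetic buffer for illustration: 0x80 bytes of padding, then the signature.
fake_file = bytes(0x80) + b"PE\x00\x00"
print(has_pe_signature(fake_file, 0x80))
```

The bounds check is there because e_lfanew comes from the file itself, so a malformed or malicious binary can point it anywhere.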

[Image: pefile output showing the PE header signature]

Windows PE Header.

The IMAGE_FILE_HEADER contains the following information :

typedef struct _IMAGE_FILE_HEADER {
    WORD  Machine;
    WORD  NumberOfSections;
    DWORD TimeDateStamp;
    DWORD PointerToSymbolTable;
    DWORD NumberOfSymbols;
    WORD  SizeOfOptionalHeader;
    WORD  Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

Here is where we start to see some valuable information. As you can observe, we learn the target machine (CPU architecture), the number of sections in the binary, the location of the symbol table, and the size of the optional header.

As you were expecting, pefile provides this information:
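As a rough sketch of how these seven fields map onto raw bytes, the struct can be decoded with a single format string. The names FileHeader and parse_file_header are my own, and the sample values are invented for illustration (0x014C is the machine code for 32-bit x86); in a real file this header sits right after the 4-byte signature, at e_lfanew + 4.

```python
import struct
from collections import namedtuple

FileHeader = namedtuple(
    "FileHeader",
    "Machine NumberOfSections TimeDateStamp PointerToSymbolTable "
    "NumberOfSymbols SizeOfOptionalHeader Characteristics",
)

# WORD, WORD, DWORD, DWORD, DWORD, WORD, WORD; '<' means little-endian, no padding.
FILE_HEADER_FORMAT = "<HHIIIHH"


def parse_file_header(data: bytes, offset: int) -> FileHeader:
    """Decode an IMAGE_FILE_HEADER found at `offset` inside `data`."""
    return FileHeader._make(struct.unpack_from(FILE_HEADER_FORMAT, data, offset))


# Invented sample values, not taken from a real binary.
raw = struct.pack(FILE_HEADER_FORMAT, 0x014C, 3, 0, 0, 0, 0xE0, 0x0102)
header = parse_file_header(raw, 0)
print(struct.calcsize(FILE_HEADER_FORMAT))
print(hex(header.Machine), header.NumberOfSections)
```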

[Image: pefile output showing the file header]

Windows File Header.

At this point, we have reached the optional header, and you will be surprised by how much information it stores:

typedef struct _IMAGE_OPTIONAL_HEADER {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
  DWORD                SizeOfUninitializedData;
  DWORD                AddressOfEntryPoint;
  DWORD                BaseOfCode;
  DWORD                BaseOfData;
  DWORD                ImageBase;
  DWORD                SectionAlignment;
  DWORD                FileAlignment;
  WORD                 MajorOperatingSystemVersion;
  WORD                 MinorOperatingSystemVersion;
  WORD                 MajorImageVersion;
  WORD                 MinorImageVersion;
  WORD                 MajorSubsystemVersion;
  WORD                 MinorSubsystemVersion;
  DWORD                Win32VersionValue;
  DWORD                SizeOfImage;
  DWORD                SizeOfHeaders;
  DWORD                CheckSum;
  WORD                 Subsystem;
  WORD                 DllCharacteristics;
  DWORD                SizeOfStackReserve;
  DWORD                SizeOfStackCommit;
  DWORD                SizeOfHeapReserve;
  DWORD                SizeOfHeapCommit;
  DWORD                LoaderFlags;
  DWORD                NumberOfRvaAndSizes;
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;

Don’t say I didn’t warn you.

Arguably, the most important element of the optional header is AddressOfEntryPoint, which tells us where execution starts once the image is loaded. It is a relative virtual address (RVA), that is, an offset from the address the image is loaded at, and it points to the first instruction that will run.

I will let you explore the rest of the fields, which you can look up in the MSDN documentation for IMAGE_OPTIONAL_HEADER32.
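To make that concrete: AddressOfEntryPoint is relative to the image base, so the address of the first instruction is ImageBase + AddressOfEntryPoint when the image is loaded at its preferred base (with ASLR the actual base may differ). The sketch below reads both fields from the 32-bit layout shown above. The helper name and the sample values are assumptions for illustration only, and the offsets do not apply to 64-bit (PE32+) binaries, whose layout differs.

```python
import struct

PE32_MAGIC = 0x10B    # value of Magic for the 32-bit optional header


def entry_point_va(optional_header: bytes) -> int:
    """Return ImageBase + AddressOfEntryPoint for a PE32 optional header."""
    (magic,) = struct.unpack_from("<H", optional_header, 0)
    if magic != PE32_MAGIC:
        raise ValueError("not a PE32 optional header")
    # Offsets follow the struct above: AddressOfEntryPoint is at byte 16,
    # ImageBase at byte 28.
    (entry_rva,) = struct.unpack_from("<I", optional_header, 16)
    (image_base,) = struct.unpack_from("<I", optional_header, 28)
    return image_base + entry_rva


# First 32 bytes of an invented optional header: Magic, linker version,
# three size fields, AddressOfEntryPoint, BaseOfCode, BaseOfData, ImageBase.
sample = struct.pack(
    "<HBBIIIIIII", PE32_MAGIC, 2, 30, 0, 0, 0, 0x14E0, 0x1000, 0x2000, 0x400000
)
print(hex(entry_point_va(sample)))
```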

There is an exception, though, the DataDirectory array, which holds information about a wide range of important structures of the PE file.

Each of these structures is described by an IMAGE_DATA_DIRECTORY struct, which looks like this:

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;
    DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

Each entry is identified by an address (an RVA) and a size, and these are the entries contained within the Data Directory:

[Image: table of the Data Directory entries]

That is a lot of useful information!
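Since every entry is just two DWORDs, walking the array is straightforward. Here is a minimal sketch; the short labels in DIRECTORY_NAMES are my own shorthand for the standard index order (export first, import second, and so on), and the sample array is synthetic with only the import entry filled in.

```python
import struct

# Shorthand labels for the 16 standard Data Directory slots, in index order.
DIRECTORY_NAMES = [
    "Export", "Import", "Resource", "Exception", "Security",
    "BaseReloc", "Debug", "Architecture", "GlobalPtr", "TLS",
    "LoadConfig", "BoundImport", "IAT", "DelayImport", "ComDescriptor",
    "Reserved",
]


def parse_data_directories(raw: bytes, count: int) -> dict:
    """Map each directory label to its (VirtualAddress, Size) pair."""
    entries = {}
    for index in range(min(count, len(DIRECTORY_NAMES))):
        # Each IMAGE_DATA_DIRECTORY is two little-endian DWORDs (8 bytes).
        rva, size = struct.unpack_from("<II", raw, index * 8)
        entries[DIRECTORY_NAMES[index]] = (rva, size)
    return entries


# Synthetic array of 16 empty entries with an invented import directory.
raw = bytearray(16 * 8)
struct.pack_into("<II", raw, 1 * 8, 0x6000, 0x5A4)
dirs = parse_data_directories(bytes(raw), 16)
print(dirs["Import"])
```

The count argument is meant to be NumberOfRvaAndSizes from the optional header; an entry whose address and size are both zero means that directory is absent.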

If you want to investigate anything in this table a little further, I would point you towards the export and import tables, which are what is really relevant for hacking.

In a future post that I am preparing (man, does it take time) we will see how this format can be exploited to access some Windows functionality that an attacker shouldn’t be able to reach.

PE sections

Following the PE Header, we reach the section table, an array in which each PE section is described by an IMAGE_SECTION_HEADER:

typedef struct _IMAGE_SECTION_HEADER {
    BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
      DWORD PhysicalAddress;
      DWORD VirtualSize;
    } Misc;
    DWORD VirtualAddress;
    DWORD SizeOfRawData;
    DWORD PointerToRawData;
    DWORD PointerToRelocations;
    DWORD PointerToLinenumbers;
    WORD  NumberOfRelocations;
    WORD  NumberOfLinenumbers;
    DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;

Windows Section Header.

We know how many sections we will encounter thanks to the NumberOfSections field that we saw in the PE Header (in IMAGE_FILE_HEADER).

pefile displays the contents of the struct so that we can easily interpret them:

[Image: pefile output showing the .text section header]

Don’t forget that the section headers are stored in an array, so there are a bunch of them; I’m only showing the .text section to save space.
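To tie the pieces together, the section table starts right after the optional header, at e_lfanew + 4 (signature) + 20 (file header) + SizeOfOptionalHeader, and each entry is 40 bytes long. The sketch below decodes one entry; the helper names and the .text values are invented for illustration and do not come from the binary in the screenshots.

```python
import struct

# Name[8], six DWORDs, two WORDs, one DWORD: 40 bytes in total.
SECTION_HEADER_FORMAT = "<8sIIIIIIHHI"


def section_table_offset(e_lfanew: int, size_of_optional_header: int) -> int:
    """File offset of the first IMAGE_SECTION_HEADER."""
    return e_lfanew + 4 + 20 + size_of_optional_header


def parse_section_header(data: bytes, offset: int) -> dict:
    """Decode the fields of one IMAGE_SECTION_HEADER that we care about."""
    (name, virtual_size, virtual_address, size_of_raw_data,
     pointer_to_raw_data, _reloc_ptr, _line_ptr, _reloc_count,
     _line_count, characteristics) = struct.unpack_from(
        SECTION_HEADER_FORMAT, data, offset)
    return {
        # Names shorter than 8 bytes are padded with null bytes.
        "Name": name.rstrip(b"\x00").decode("ascii", errors="replace"),
        "VirtualSize": virtual_size,
        "VirtualAddress": virtual_address,
        "SizeOfRawData": size_of_raw_data,
        "PointerToRawData": pointer_to_raw_data,
        "Characteristics": characteristics,
    }


# Invented .text entry: code, executable and readable (0x60000020).
raw = struct.pack(SECTION_HEADER_FORMAT, b".text", 0x1A34, 0x1000,
                  0x1C00, 0x400, 0, 0, 0, 0, 0x60000020)
text = parse_section_header(raw, 0)
print(text["Name"], hex(text["VirtualAddress"]))
print(hex(section_table_offset(0x80, 0xE0)))
```

To read the whole table you would call parse_section_header in a loop, NumberOfSections times, advancing the offset by 40 bytes on each iteration.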

Conclusion

I’m afraid that is all the contents of the Portable Executable!

Well, it contains way more information than we’ve covered in this post (you can analyze each section, each element in each struct, etc.), but as always I encourage you to go and find out for yourselves, which is always more fun than reading a random guy from the Internet.

Knowing how this format works is crucial for any kind of analysis of a binary file in a Windows environment, and I believe we have covered the bare essentials today.

This knowledge will get you ready to start diving into the MSDN (Microsoft Developer Network) documentation, so I consider that we have taken down the first barrier (sometimes things seem really hard from the outside, but once you get your hands dirty the fog clears).

Just as in every post, I would love to show my gratitude to you for reading this page. I hope I have been able to help you in some way.

Have a great week!

Resources

