Information leakage through PDB files


This document describes a source of information leakage discovered during research into Microsoft's Program DataBase (PDB) debugging information files.



This project started as an investigation into Microsoft's Program DataBase (PDB) debugging information files. I began by looking at the WINE implementation of the DbgHelp APIs, but quickly found that Microsoft had introduced a new, incompatible format with Visual Studio .NET. Whilst researching this new format, I found the Debug Information Accessor (DIA) SDK 2.0, which was handily provided with Visual Studio .NET.

The DIA SDK provided a great deal of functionality, much more than the old DbgHelp APIs. Unfortunately, it exposed that functionality through a COM interface that was quite hard to use. To ameliorate this, I started writing DiaLib, a C++ wrapper around the COM interface.

To test this library, I wrote PdbDump, which is a simple tool to dump out the contents of PDB files.


PDB Private Information

I had assumed, like most people, that PDB files contained just a simple list of function names and addresses; this is all that the old DbgHelp API exposed. However, while developing DiaLib, I noticed that on certain PDB files, PdbDump was outputting a lot more information than I had expected. This included C++ class prototypes, full enumeration definitions, local function variables, and much more.

Later I noticed that if I linked a debug version of a program against third-party .LIB files (or .OBJ files), the generated PDB file would contain private information from those files as well as from my own code.

This information is akin to the metadata in CLR or Java programs. It provides information on data structures, function prototypes, and certain variables in the compiled code. It does not contain any of the original source code. So while it doesn't provide people with details of private algorithms, it does provide them with a lot more private information than expected.

This situation is obviously not ideal if you are releasing a commercial library.


PDB Private Information in .OBJ/.LIB Files

This is a more complex issue, and is best described with the help of an example. The example is a sample VS.NET C++ project containing a static library and an executable which uses that library.

The static library contains a public function, a private function, and a private structure definition. If you compile the projects as is (Debug or Release targets), and run pdbdump on the resulting testexe.pdb file, you will find the following somewhere in the output (edited to remove irrelevancies):

struct privatestruct1 {
  // non-static data --------------------------------
  /*<thisrel this+0x0>*/ /*|0x4|*/ int a;
  /*<thisrel this+0x4>*/ /*|0x4|*/ int c;
  /*<thisrel this+0x8>*/ /*|0x4|*/ char* banana;
};
// <size 0xc>

char* __cdecl aPublicFunction(/*<regrel ebp+0x8>*/ /*|0x4|*/ char* banana);
// <rva 0x11a60>
// <size 0x57>
// <staticlocal /*<rva 0x11aaf>*/ /*|0x0|*/ ... >
// <staticlocal /*<rva 0x11aa3>*/ /*|0x0|*/ ... >
// <staticlocal /*<rva 0x11a9b>*/ /*|0x0|*/ ... >
// <staticlocal /*<rva 0x284e0>*/ /*|0x4|*/ int hello>
// <local /*<regrel ebp-0x10>*/ /*|0xc|*/ struct privatestruct1 struct1>

char* __cdecl aPrivateFunction(/*<regrel ebp+0x8>*/ /*|0x4|*/ struct privatestruct1* privateStructArg);

As you can see, a considerable amount of information has leaked out of the static library. The private structure definition is visible, and even the static variable in aPublicFunction() is exposed. PdbDump does not output line number information, but it is likely that this is present as well.
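For reference, library source along the following lines would be consistent with the dump above. This is a hypothetical reconstruction, not the actual sample project: only the names, types, and member order are taken from the PdbDump output, while the file name object1.cpp, the function bodies, and the initial value of hello are assumptions made for illustration.

```cpp
// object1.cpp: hypothetical reconstruction of the static library source.
// Names and types come from the dump; every function body is an assumption.
#include <cstring>

// Private structure: not part of the public header, yet fully visible in the PDB.
// On a 32-bit target its size is 0xc, matching the <size 0xc> line in the dump.
struct privatestruct1 {
    int a;
    int c;
    char* banana;
};

// Private helper (assumed body): not declared in the public header.
char* aPrivateFunction(struct privatestruct1* privateStructArg) {
    privateStructArg->a += 1;
    return privateStructArg->banana;
}

// The only function the library intends to expose.
char* aPublicFunction(char* banana) {
    static int hello = 0;           // reported as <staticlocal ... int hello>
    struct privatestruct1 struct1;  // reported as <local ... struct privatestruct1 struct1>
    struct1.a = hello++;
    struct1.c = static_cast<int>(std::strlen(banana));
    struct1.banana = banana;
    return aPrivateFunction(&struct1);
}
```

A customer of such a library would normally see only the declaration of aPublicFunction() in a header, so every other name in this listing is something the PDB gives away.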

The problem with .LIB files is that there are no visible PDB files present, yet somehow this information still leaks. If you dump the .LIB file with ar t testlibrary.lib (yes, .LIB files are just AR archives :), the only object visible is object1.obj. Therefore, the PDB information must be stored inside .OBJ files.
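If ar is not to hand, the member list can be read directly, since the archive layout is simple: an 8-byte "!<arch>\n" signature followed by members, each with a 60-byte text header. The following is a minimal sketch of that idea, assuming the whole file has already been read into a std::string; the function name listArchiveMembers is hypothetical and is not part of DiaLib or PdbDump.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Return the raw member names of an ar-format archive held in memory.
std::vector<std::string> listArchiveMembers(const std::string& data) {
    const std::string magic = "!<arch>\n";
    if (data.compare(0, magic.size(), magic) != 0)
        throw std::runtime_error("not an ar archive");

    const std::size_t headerSize = 60;
    std::vector<std::string> names;
    std::size_t pos = magic.size();
    while (pos + headerSize <= data.size()) {
        // Header layout: name[16] date[12] uid[6] gid[6] mode[8] size[10] end[2]
        if (data.compare(pos + 58, 2, "`\n") != 0)
            throw std::runtime_error("bad member header");
        std::string name = data.substr(pos, 16);
        name.erase(name.find_last_not_of(' ') + 1);  // strip space padding
        std::size_t size = std::stoul(data.substr(pos + 48, 10));
        names.push_back(name);
        pos += headerSize + size;
        if (pos % 2 != 0) ++pos;  // member data is padded to an even offset
    }
    return names;
}
```

Names are returned exactly as stored. In a Microsoft .LIB you should expect special members named "/" and "//" (the linker symbol tables and the long-name table) ahead of the object files, and ordinary names carry a trailing "/"; names too long for the 16-byte field are stored as "/" followed by an offset into the long-name table, which this sketch does not resolve.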


Stripping Private Information

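To remove this private information from PDB files, you must pass the /PDBSTRIPPED:filename option to Microsoft's linker. The linker then writes a second, stripped PDB file in addition to the full one. The stripped file contains public symbols such as function names, but no type information, local variables, or line numbers, so it is the one to ship.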

For static .OBJ or .LIB files, you can remove the information by disabling debug information generation completely when compiling the source files for the library. It should be noted that Visual Studio sets the debug information format to "Program Database for Edit & Continue (/ZI)" if you create a library project using its wizard, so debug information is generated by default. I have not been able to find a switch equivalent to /PDBSTRIPPED for .OBJ/.LIB files.

DLL files do not require any form of stripping as long as any PDB file supplied with them was generated with /PDBSTRIPPED. However, I believe there are other methods of storing debugging information; I have not investigated these for this problem.

I would recommend using PdbDump to check for information leakage in your commercially released PDB files, and also ensuring that no debug information is creeping into released .OBJ/.LIB files.
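One rough way to check an .OBJ file is to look at its section table: Microsoft's compilers conventionally place debug records in COFF sections named .debug$S (symbols) and .debug$T (types). The sketch below scans the section headers of an object file held in memory and reports any section whose name starts with ".debug". It is only an illustration under stated assumptions: the function name findDebugSections is hypothetical, the file is assumed to be a plain COFF object with the standard 20-byte file header (not one of the extended object formats), and the presence of such a section shows only that some debug information was emitted, not exactly what it reveals.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Read a little-endian 16-bit value from raw bytes.
static std::uint16_t readU16(const std::string& d, std::size_t off) {
    return static_cast<std::uint16_t>(
        static_cast<unsigned char>(d[off]) |
        (static_cast<unsigned char>(d[off + 1]) << 8));
}

// Return the names of all sections whose name begins with ".debug".
std::vector<std::string> findDebugSections(const std::string& obj) {
    const std::size_t fileHeaderSize = 20;     // COFF file header
    const std::size_t sectionHeaderSize = 40;  // one entry in the section table
    std::vector<std::string> found;
    if (obj.size() < fileHeaderSize) return found;

    std::size_t count = readU16(obj, 2);     // NumberOfSections
    std::size_t optSize = readU16(obj, 16);  // SizeOfOptionalHeader (normally 0 for objects)
    std::size_t pos = fileHeaderSize + optSize;
    for (std::size_t i = 0; i < count && pos + sectionHeaderSize <= obj.size(); ++i) {
        std::string name = obj.substr(pos, 8);  // 8-byte name field, NUL padded
        name.erase(name.find_last_not_of('\0') + 1);
        if (name.compare(0, 6, ".debug") == 0) found.push_back(name);
        pos += sectionHeaderSize;
    }
    return found;
}
```

For a .LIB file, the same check would be applied to each object member extracted from the archive. An empty result for every member suggests the library was compiled without debug information.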


Future work

The next focus of this project will be to write a program to strip the private information from .OBJ/.LIB files (like the /PDBSTRIPPED option). Basic debugging (with function names) should still be possible, but the private information will be removed.



DiaLib and PdbDump are both hosted on SourceForge. They are licensed under a BSD-style license.

A version of this document has also been published at