
microsoft-pdb's Introduction

Update: This repo has moved to archive. Currently we are not planning any new updates nor are we taking PRs. Please add any new requests related to debugging and PDB format to Visual Studio Feedback.

microsoft-pdb

This repo contains information from Microsoft about the PDB (Program Database) Symbol File format.

[WILL NOT currently build. There is a cvdump.exe till the repo is completed. pdb.h is in the langapi folder]

The intent here is to provide code that will show all the binary level formats and simple tools that can use the pdb.

Simply put ... We will make best efforts to roll this forward with the new compilers and tools that we ship every release. We will continue to innovate and change binary APIs and ABIs for all the Microsoft platforms, and we will try to include the community by keeping this PDB repo in sync with the latest retail products (compilers, linkers, debuggers) just shipped.

By publishing this source code, we are bypassing the publicly documented API we provided for only reading a PDB, which was DIA: https://msdn.microsoft.com/en-us/library/x93ctkx8.aspx

With this information we are now building the information for other compilers (and tools) to efficiently write a PDB.

The PDB format has not been officially documented, presenting a challenge for other compilers and toolsets (such as Clang/LLVM) that want to work with Windows or the Visual Studio debugger. We want to help the Open Source compilers to get onto the Windows platform.

The majority of content on this repo is presented as actual source files from the VC++ compiler toolset. Source code is the ultimate documentation :-) We hope that you will find it helpful. If you find that you need other information to successfully complete your project, please enter an Issue letting us know what information you need.

##Start here

The file pdb.h (in langapi) provides the API surface for mscorpdb.dll, which we ship with every compiler and toolset.

Important points:

• Mscorpdb.dll is what our linker and compiler use to create PDB files.
• Mscorpdb.dll implements the “stream” abstractions.

There is also another file that we ship that should allow you to determine whether you have correctly produced an “empty” PDB, one that contains the minimal encoding to let another tool open and correctly parse it. “Empty” here means a properly formatted file whose sections contain the correct information to indicate that zero records or symbols are present. A tool that I think we also ship, and that would easily verify your “empty” PDB file, is dia2dump.exe.

So in summary, by using the externally defined function entry points in pdb.h you can call into mscorpdb.dll.

##What is a PDB

PDBs are files with multiple ‘streams’ of information in them. You can almost think of each stream as an individual file, except that storing them as individual files would be wasteful and inconvenient, hence this multiple-streams approach. PDB streams are not NTFS streams, though. They could be implemented as NTFS streams, but since they have to be available on Win9X as well, they use a home-brewed implementation. The implementation allows a primitive form of two-phase commit protocol. The writers of PDB files write whatever they want to in PDBs, but it won’t be committed until an explicit commit is issued. This allows the clients quite a bit of flexibility. For example, a compiler can keep on writing information and simply not commit it if it encounters an error in the user’s source code.
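
The commit behavior described above can be pictured with a small in-memory sketch. Everything in it (the `StagedStreams` class and its methods) is hypothetical and is not the API exposed by pdb.h; it only illustrates the idea that writes stay invisible to readers until an explicit commit, and can be dropped instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical illustration only: NOT the real PDB stream implementation.
class StagedStreams {
public:
    using Bytes = std::vector<std::uint8_t>;

    // Append data to a stream; the data stays in the pending (uncommitted) state.
    void Append(std::uint32_t streamNo, const Bytes& data) {
        Bytes& s = pending_[streamNo];
        s.insert(s.end(), data.begin(), data.end());
    }

    // Make every pending write visible to readers.
    void Commit() { committed_ = pending_; }

    // Drop everything written since the last commit.
    void Rollback() { pending_ = committed_; }

    // Readers only ever see committed data.
    std::size_t CommittedSize(std::uint32_t streamNo) const {
        auto it = committed_.find(streamNo);
        return it == committed_.end() ? 0 : it->second.size();
    }

private:
    std::map<std::uint32_t, Bytes> committed_;
    std::map<std::uint32_t, Bytes> pending_;
};
```

A writer that hits an error would call `Rollback()` (or simply never call `Commit()`), which is the flexibility the paragraph describes for a compiler that encounters an error in user code.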

Each stream is identified with a unique stream number and an optional name. In a nutshell, here is what a PDB looks like:

| Stream No. | Contents | Short Description |
|---|---|---|
| 1 | Pdb (header) | Version information, and information to connect this PDB to the EXE |
| 2 | Tpi (Type manager) | All the types used in the executable. |
| 3 | Dbi (Debug information) | Holds section contributions, and list of ‘Mods’ |
| 4 | NameMap | Holds a hashed string table |
| 4-(n+4) | n Mod’s (Module information) | Each Mod stream holds symbols and line numbers for one compiland |
| n+4 | Global symbol hash | An index that allows searching in global symbols by name |
| n+5 | Public symbol hash | An index that allows searching in public symbols by addresses |
| n+6 | Symbol records | Actual symbol records of global and public symbols |
| n+7 | Type hash | Hash used by the TPI stream. |
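
As a small illustration, the four fixed stream numbers listed above can be written down as constants. This is only a sketch that mirrors the list as given here; the names `FixedStream` and `DescribeStream` are made up for the example and do not come from pdb.h.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Fixed stream numbers, copied from the list above (hypothetical enum name).
enum class FixedStream : std::uint32_t {
    Pdb = 1,      // header: version info, link to the EXE
    Tpi = 2,      // type manager
    Dbi = 3,      // debug information
    NameMap = 4,  // hashed string table
};

// Hypothetical helper: label a stream number using only the fixed entries.
// Everything else (Mods, symbol hashes, symbol records, type hash) depends
// on n, the number of Mods, so it is reported as variable here.
inline std::string DescribeStream(std::uint32_t streamNo) {
    switch (streamNo) {
    case 1: return "Pdb (header)";
    case 2: return "Tpi (Type manager)";
    case 3: return "Dbi (Debug information)";
    case 4: return "NameMap";
    default: return "variable (depends on number of Mods)";
    }
}
```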


microsoft-pdb's Issues

What's the status of this repository?

The README states (emphasis mine):

Simply put ...We will make best efforts to role this foward with the new compilers and tools that we ship every release. We will continue to innovate and change binary API's and ABI's for all the Microsoft platforms and we will try to include the community by keeping this PDB repo in synch with the latest retail products (compilers,linkers,debuggers) just shipped.

We could use an update here @AndrewPardoe. 😄

Register source language enumerator value

Hi,

Is it possible to "register" a source language identifier to get listed in CV_CFL_LANG here https://github.com/Microsoft/microsoft-pdb/blob/master/include/cvconst.h#L300 ?

I have extracted the D language expression evaluator of the mago debug engine (https://github.com/rainers/mago/tree/master/EED/MagoNatCC) as a Concord extension (https://github.com/Microsoft/ConcordExtensibilitySamples) and used 'D' (0x44) as the source language identifier for the S_COMPILE CodeView record emitted by the compiler. The installer adds appropriate "CodeView Compiler" registry entries for the VS debugger to map this identifier to the D Concord extension. Long term, this might cause conflicts with other languages using the same enumerator value, though.
(I have asked for the source language registration at the Concord repo, too, but didn't get an answer: microsoft/ConcordExtensibilitySamples#28)

LLVM got support for the S_COMPILE3 record recently, and I'd like the 'D' identifier to be added there, too. It might be easier to get addition of the appropriate translation with some "official blessing" of the actual value.

Thanks.

Document .SBR and .BSC file formats?

Would it be possible for you to release information about the formats of SBR files emitted by Visual C++ when using the /FR option and the BSC files built by BSCMAKE? I've figured out the basic structure of SBR files, but there are still corner cases that I can't figure out.

Using cvdump.exe

Dumb question time - how can I use cvdump.exe, and what would I use it for?

-Ben

Information about the PE format?

How about detailed information about the PE format?

I have software to generate a PE executable image that works fine in Windows XP/Vista/7, but not in Windows 8/10. I had to guess what to change to make the executable work.

Global Symbols and their segs

Hello,
in the struct:

    typedef struct DATASYM32 {
        unsigned short  reclen;     // Record length
        unsigned short  rectyp;     // S_LDATA32, S_GDATA32, S_LMANDATA, S_GMANDATA
        CV_typ_t        typind;     // Type index, or Metadata token if a managed symbol
        CV_uoff32_t     off;
        unsigned short  seg;
        unsigned char   name[1];    // Length-prefixed name
    } DATASYM32;

that is used for global data, there is that unsigned short seg. From looking at it and at the PE, it looks like (seg-1) is the PE section that one needs to offset from to find that global. The question arises: what about seg=0? It looks like it's used for numerous things, including __ImageBase with off=0, and it looks like the offset is then just relative to the virtual address of that PE.

My questions are:

  1. Is my thinking correct that seg-1 is the section to offset by when looking for the global?
  2. Does seg==0 mean the offset is relative to the virtual address of the PE?
  3. Why are there numerous globals with seg=0 and off=0 that are not recognized when typed into Visual Studio, such as __arct_country_count and _wpgmptr?
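
The reading proposed in question 1 can be written down as a short sketch. It assumes, as the post does, that seg is a 1-based index into the PE section table; the function name `SegOffToRva` and the `sectionRvas` parameter are hypothetical. It deliberately refuses seg == 0 rather than guessing, since that case is exactly what is being asked.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Resolve a seg:off pair to an RVA, ASSUMING seg is a 1-based index into the
// PE section table. sectionRvas[i] is the VirtualAddress of section i (0-based).
// Returns false for seg == 0 (meaning unresolved here) or an out-of-range seg.
inline bool SegOffToRva(std::uint16_t seg, std::uint32_t off,
                        const std::vector<std::uint32_t>& sectionRvas,
                        std::uint32_t& rvaOut) {
    if (seg == 0 || seg > sectionRvas.size()) {
        return false;
    }
    rvaOut = sectionRvas[seg - 1] + off;
    return true;
}
```

Adding the image base to the resulting RVA would then give the virtual address at which a debugger expects the global.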

Thanks,
The 8th mage

Visual Studio 2017 and PDB files

Does the new Visual Studio 2017 version (the one that just entered RC state) add anything new to the PDB format? Are there any plans to update this repository with any new information in the newer PDB files?

Publish stuff for undecorating C++ symbols

I would like to see code (or docs) for undecorating C++ symbols made available.

The Visual C++ runtime library contains a function called unDName (or something like that; it's referenced all over the CRT source code), but that function is undocumented and no official header for it is available (3rd party headers exist for it but they may not have the correct definitions). Also, not everyone working with PDB files can link to the Visual C++ runtime library (even if undname was usable). The Visual Studio 2015 CRT source code includes an undname.cxx file, but it is missing required header files (and can't be used anyway because of the license).

There is also a function UnDecorateSymbolName in dbghelp.dll but it doesn't support the full set of decorated names and not everyone working with pdb files is in a position to use dbghelp.dll or the UnDecorateSymbolName function.

There is a tool called undname.exe in the Windows SDK, but that can't be redistributed and it's a tool, not a library you can call, so that isn't an option.

There are 3rd party implementations for undecorating symbols but those may not be complete or accurate.

There is also code claiming to be from Microsoft at this URL https://github.com/sundayliu/windbg/tree/master/langapi/undname but that doesn't have any license attached, so we can't use it. Also, we don't know how old that is or if it was even legitimately released by Microsoft (it's quite possible this code came from one of the various source code thefts Microsoft has experienced in the past, e.g. Windows NT4 source or Windows 2000 source).

I would like to see usable complete source code for undecorating names published (under the same MIT license as the PDB code ideally) or if that is not possible, complete official header files required for using the undname function in Visual C++ libraries.

What do the values in S_PUB32 mean?

So, I used cvdump to dump my PDB file. I would like to understand what these values in S_PUB32 mean:

eg.
S_PUB32: [0001:00001B80], Flags: 00000002, ___security_init_cookie

I understand that 00001B80 means that __security_init_cookie is found at 00001B80 offset to the image base address, but I do not understand what the other values mean.
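
As a hedged sketch of how the fields might be read: cvdump prints addresses in other records as [segment:offset] (for example S_GPROC32: [0002:00001F40] elsewhere on this page), so 0001 is plausibly the segment and 00001B80 the offset within it rather than an offset from the image base. The flag bit values below are my reading of CV_PUBSYMFLAGS in cvinfo.h and should be checked against the header; the constant and function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bit values as I read them from CV_PUBSYMFLAGS in cvinfo.h (verify there).
const std::uint32_t kPubCode     = 0x1;  // symbol is code
const std::uint32_t kPubFunction = 0x2;  // symbol is a function
const std::uint32_t kPubManaged  = 0x4;  // managed code
const std::uint32_t kPubMsil     = 0x8;  // managed IL code

// Turn the Flags value printed by cvdump into a readable list.
inline std::string DescribePubFlags(std::uint32_t flags) {
    std::string out;
    auto add = [&](std::uint32_t bit, const char* name) {
        if (flags & bit) {
            if (!out.empty()) out += "|";
            out += name;
        }
    };
    add(kPubCode, "code");
    add(kPubFunction, "function");
    add(kPubManaged, "managed");
    add(kPubMsil, "msil");
    return out.empty() ? "none" : out;
}
```

Under that reading, `Flags: 00000002` in the example line would mark ___security_init_cookie as a function.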

line numbers on the asm block

you write this struct:

    struct CV_Line_t {
        unsigned long   offset;             // Offset to start of code bytes for line number
        unsigned long   linenumStart:24;    // line where statement/expression starts
        unsigned long   deltaLineEnd:7;     // delta to line where statement ends (optional)
        unsigned long   fStatement:1;       // true if a statement linenumber, else an expression line num
    };

But I find the value 0x80f00f00 for the second unsigned long, which means that the line number is 0xf00f00, which is not a valid line number. Can it be that you added more bits to the delta? Can you help me with that?
thanks, the mage.
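
To make the arithmetic in this question concrete, here is a sketch that splits the second 32-bit word by hand. It assumes bitfields are allocated starting from the least significant bit, which is how MSVC lays them out; `DecodedLine` and `DecodeLineWord` are names invented for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hand-decoded view of the second word of CV_Line_t.
struct DecodedLine {
    std::uint32_t linenumStart;  // bits 0..23
    std::uint32_t deltaLineEnd;  // bits 24..30
    bool fStatement;             // bit 31
};

// Assumes LSB-first bitfield allocation (MSVC behavior).
inline DecodedLine DecodeLineWord(std::uint32_t word) {
    return { word & 0x00FFFFFFu, (word >> 24) & 0x7Fu, (word >> 31) != 0 };
}
```

Decoding 0x80f00f00 this way gives linenumStart = 0xf00f00, deltaLineEnd = 0 and fStatement = true, so the field widths quoted above do reproduce the value in the question. That suggests 0xf00f00 may be a special marker rather than a real line number, but that is a guess and not something stated here.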

.debug$S and .debug$T sections in COFF obj files

Is there any documentation on the layout of the .debug$S and .debug$T sections in COFF obj files (as output by the C++ compiler in Visual Studio 2015 specifically) and which of the data structures (from the PDB documentation in this repo or the Windows SDK headers or whatever else) I need if I want to parse this information?

I have read the stuff in this repo but I can't see anything related to .debug$S or .debug$T, and I have read the PE COFF specs but they only say "contains debugging information" with no details on their format.

Merging multiple PDB and IDB files

Hi,
Is it possible to use this library to merge several PDB and IDB files (created by a distributed build) into a single file accepted by the MSBuild linker?

If so, can anyone please direct me to an example, or to the classes/functions I should use for it?

Thanks

Address of first section in Section Map substream

The section map substream has lengths of each section, which allows us to cumulatively add and get the address of the sections.
But, the length of first section (0th) is not there.
I used to assume the length to be 0x1000, and hence, the address of first section as 0x1000.
But in some PDB files, I have seen it to be 0x2000.

Where in the PDB file is this information mentioned?

Please add in CVDump tool

CVInfo is a good start, but please add information about the CVDump tool that outputs debug information from binaries in a more readable format.

S_THUNK32 Variant Field Location Incorrect?

In cvdump, when reading an S_THUNK32 symbol, I believe the location calculated for the variant field is incorrect. The current code is:

    const void *pVariant = psym->name + *psym->name + 1;

This assumes the 'name' field has been emitted using a length-prefixed form, but it looks like it is encoded as a UTF-8 null-terminated string instead.

Won't this prevent cvdump from correctly displaying the variant fields for the adjustor/vcall/pcode ordinals?
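
The difference between the two encodings can be shown with a small sketch. It returns the offset of the variant data relative to the start of the name field under each interpretation; both function names are hypothetical, and which encoding a given PDB actually uses is the open question of this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Offset of the variant data if the name is length-prefixed:
// one length byte followed by that many characters.
inline std::size_t VariantOffsetLengthPrefixed(const unsigned char* name) {
    return static_cast<std::size_t>(name[0]) + 1;
}

// Offset of the variant data if the name is a null-terminated string:
// the characters followed by one terminating zero byte.
inline std::size_t VariantOffsetZeroTerminated(const unsigned char* name) {
    return std::strlen(reinterpret_cast<const char*>(name)) + 1;
}
```

For a null-terminated name such as "foo", the length-prefixed formula treats 'f' (0x66) as a length and lands 103 bytes past the start of the name instead of 4, which is the kind of misread described above.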

The order of the entries with same hash in GLOBALS stream.

Hi,

First, some background. I am currently writing a tool to read and write PDB files related to managed CLR assemblies (and only those). Reading was the easy part, as I can skip most of the PDB infra stuff and just read the actual data. However, I am currently a bit stuck on one specific issue related to writing a PDB file.

This issue occurs when there are name collisions in GLOBALS stream (function/token -ref gets same hash value as some other function/token ref). The most common case for this are constructors, since all managed code constructors are seen as functions named '.ctor' in PDB.

When the hash collision occurs, the order of the items with the same hash seems to matter. If I write out collisioned entries with just some fixed order (e.g. ascending/descending by module index, or something simple like that), things won't work as expected, since then the ISymUnmanagedReader COM object will fail to locate some of the methods part of the collision.

After a bit of digging around, I've found out that only the name of the module seems to affect the order of the items with the same hash value. The token of the function and the order of the functions does not seem to have effect on how the collided entries are sorted in GLOBALS stream. Furthermore, the order seems to stem from some value calculated from the module name, most likely another hash. However, the lhashPbCb function of struct Hasher in PDB\include\misc.h (which is used by GLOBALS stream to hash the names of function/token ref entries themselves) does not seem to provide correct values. The lhashPbCb function of struct HasherV2 in the same file does not seem to be the one used to hash module names, either.

My question is therefore this: what is the logic of ordering the collided entries in GLOBALS stream, at least for the PDB files related to managed code? Can you provide directly the code behind it, or could you explain verbally how the order logic works? This seems, at the moment at least, one last issue preventing me from creating PDB file which is readable by ISymUnmanagedReader.

Breakpad Use Case

The README noted you would like issues pertaining to use cases, so here is a slightly complicated one. :)

If you are not aware, Breakpad symbol files are a way to get cross platform support for decoding callstacks from crash dumps/bug reports etc. The Breakpad symbol files are also ~20% of the size of the PDB they are generated from since they only contain a subset of the information.

I helped create a PDB reader here that has most of the support needed, but there are some things missing, as well as edge cases that sometimes crop up due to slight differences between eg. the standard MSVC++ Windows compiler and the XBox 1 compiler. (Though this will hopefully be resolved once the XB1 uses the same exact compiler which is coming soon in VS2015).

The reason this was made was because of several reasons:

  • Abysmal performance of DIA.
    • All strings are heap allocated, and they are all considered wide strings, even though I have yet to find string records for eg. function names/type names etc that are anything but standard ASCII.
    • The COM interface is an impediment to linear processing, it was only designed for doing single lookups of individual pieces of information it seems.
  • Ability to process anywhere
    • DIA makes the symbol processing have a hard dependency on Windows, most shops would like the option of being able to do the Breakpad generation on a Linux box if desired.

So basically I guess I am asking for...an almost full spec of the format. :) All of the information I used to reverse engineer parts of the format was either old, or only showed some of the pertinent data (eg. MS CCI).

.msf file archives?

Any possibility to archive some representative .msf files for testing? A minimum file (empty) and a large file would be great.

DBI signature 20000404?

I've produced a python port of the MS CLR PDB parser and when running it across a large cache of real PDBs I'm seeing it fail to parse any where the DBI signature is 20000404 [DBI version varies but is normally something like 1308076843]. In all cases the PDB stream signature is 20000404.

Looking through the code in this repo it appears you handle DBI streams with a signature of 19990903 as the default, and with the new style header, and streams with any other signature using the old style header. I've tried replicating this but the data I've got in the DBI stream definitely isn't in the old header format.

Does anyone have any pointers on how to handle DBI streams with a signature of 20000404, or even newer ones like DBIImpvV110 (20091201)?

Here's an example of the bytes from one of these DBI streams:

94 2e 31 01 2b ab f7 4d 01 00 00 00 99 44 fe 6b
00 7f 9b 49 b6 92 d0 a2 a6 90 6b fd 07 00 00 00
2f 6e 61 6d 65 73 00 01 00 00 00 02 00 00 00 01
00 00 00 02 00 00 00 00 00 00 00 00
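
As a sanity check on the dump, the leading fields can be decoded as little-endian 32-bit values. This sketch only does that arithmetic; `ReadU32LE` and `kSample` are names made up here, and the array holds the first 48 bytes quoted above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// First 48 bytes of the dump quoted above.
static const std::uint8_t kSample[] = {
    0x94, 0x2e, 0x31, 0x01, 0x2b, 0xab, 0xf7, 0x4d,
    0x01, 0x00, 0x00, 0x00, 0x99, 0x44, 0xfe, 0x6b,
    0x00, 0x7f, 0x9b, 0x49, 0xb6, 0x92, 0xd0, 0xa2,
    0xa6, 0x90, 0x6b, 0xfd, 0x07, 0x00, 0x00, 0x00,
    0x2f, 0x6e, 0x61, 0x6d, 0x65, 0x73, 0x00, 0x01,
    0x00, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x01,
};

// Read a 32-bit little-endian value from a byte pointer.
inline std::uint32_t ReadU32LE(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Read the zero-terminated text that starts at the given byte offset.
inline std::string ReadCStringAt(std::size_t offset) {
    return std::string(reinterpret_cast<const char*>(kSample + offset));
}
```

The first three words decode to 20000404, 1308076843 and 1. After 16 further bytes, the word at offset 28 is 7, followed by the 7-byte string "/names" (including its terminator). A version, a timestamp-like value, a small counter, 16 bytes that could be a GUID, and a named-stream string resemble what one would expect from the Pdb header stream rather than a DBI header. That is an inference from these bytes only, so it may be worth checking which stream number this data was read from.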

Update cvinfo.h

@AndrewPardoe
There appear to be some undocumented additions in VS2019 that are not covered in cvinfo.h

Known undocumented nodes:
Leaf 0x1609
Sym 0x1167 (S_FASTLINK)
Sym 0x1168 (S_INLINEES)
Sym 0x1176
Sym 0x1179

Tools such as llvm-pdbutil are unable to parse pdb files containing those undocumented nodes. Would it be possible to update cvinfo.h?

cvdump/type7.cpp is blank, DumpModTypC7 implementation missing...

No data was pushed for this file, it looks like cvdump code relies on it though: some calls to DumpModTypC7() are made and the headers declare DumpModTypC7, but the function can't be found anywhere.

type6.cpp contains code for DumpModTypC6, so my guess is type7.cpp should have the DumpModTypC7 code.

Is there any reason for type7.cpp not being released publicly? It would really come in helpful for understanding how cvdump parses types...

objcrt.pdb not loaded

We are trying to convert our iOS project to Windows using vsimporter.
Very often we get an error message saying that objcrt.pdb and Foundation.pdb are not loaded.
We are not able to understand the exact issue. If anyone can explain it that would be helpful.

Thanks & Regards,
Swathi

Document time-stamp & other nondeterministic content

I've previously struggled to get "reproducible builds" (https://reproducible-builds.org/) with Visual Studio on Windows. One of the problems has been non-deterministic data inside PDB files, which I suspect is due to time-stamps or similar serial numbers.

Would it be possible to make some form of wiki-page that documents sources of non-deterministic content inside PDBs, together with tips for how to avoid it?

Thanks in advance.

Please provide cvdump.pdb as well.

You provide the executable, which is nice, but as long as things are in a state where we cannot build the thing ourselves, providing the PDB for the executable that you do include would at least let us step through and get a better feel for how it works.

Providing both debug and release builds of the tool might be useful as well.

Right now I am looking at an issue where cvdump says the TYPES stream is corrupt. Why? Hard to say - I am having to step through in assembly language to try and deduce why this might be the case, and as you might expect this is a tedious process.

Missing many headers needed by cvdump

I know the README.md already says that the repository won't yet build, but I figured something more specific might be useful.

At least these headers are missing:

  • verstamp.h: used by the .rc file
  • version.h and vcver.h: Most likely not very necessary; vcver.h in particular seems irrelevant for non-integrated builds
  • cvexefmt.h: "format of CodeView information in exe" (there are some old versions floating around the web, though; evidently it has appeared in samples before)
  • _winnt2.h: Presumably a bastardized (or simply extra-fresh) version of winnt.h? We might be able to get away with using a sufficiently-fresh copy of winnt.h proper
  • ecoff.h
  • safestk.h: I'm extra certain we need this one
  • map_t.h: This one, too
  • armregs.h and arm64regs.h: I'm a bit surprised we're not missing more of these, but it looks like the symbol names for all other arches are hardcoded in dumpsym7.cpp.

(At least the register table for IA64 isn't indexed by the CV register numbers; it's too bad that C++ still doesn't have designated initializers like C does...)

And also these items mentioned in cvdump.nativeproj:

    <CCompile Include="..\..\..\misc\utf8.c"/>
    <Link Include="$(OutputLibPath)mspdb$(VCToolsProdVerSuffix)$(BuildSuffix)-libcmt.lib">
        <ProjectReference>$(VCToolsRootPath)\PDB\lic\lic.nativeproj</ProjectReference>
    </Link>
    <Link Include="$(OutputLibPath)msobj$(VCToolsProdVerSuffix)$(BuildSuffix)-libcmt.lib">
        <ProjectReference>$(VCToolsRootPath)\PDB\objfile\lic\lic.nativeproj</ProjectReference>
    </Link>
    <Link Include="$(OutputLibPath)msdia$(VCToolsProdVerSuffix)$(BuildSuffix)-libcmt.lib">
        <ProjectReference>$(VCToolsRootPath)\PDB\dia2\lic\lic.nativeproj</ProjectReference>
    </Link>

Address-to-Symbol Resolution

Is it possible to get some information on how to resolve an address in a DLL to a function name, the same way Visual Studio does with Windows DLLs while debugging? I once tried writing a small program that uses the Windows dbghelp.dll APIs to do this, but those APIs are so buggy as to be unusable. It would be even better if there is documentation that makes an implementation of a tool like atos(1) for Windows possible. Thanks!
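
One common way to do this kind of lookup, sketched here under the assumption that the symbol addresses have already been extracted from the PDB into a table sorted by RVA, is to pick the nearest symbol at or below the address with a binary search. The `Sym` struct and `FindSymbol` function are hypothetical and are not part of this repo or of dbghelp.dll.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical symbol table entry: start RVA and name.
struct Sym {
    std::uint32_t rva;
    std::string name;
};

// Return the symbol with the greatest start RVA <= rva, or nullptr if the
// address lies below the first symbol. syms must be sorted by rva ascending.
inline const Sym* FindSymbol(const std::vector<Sym>& syms, std::uint32_t rva) {
    auto it = std::upper_bound(
        syms.begin(), syms.end(), rva,
        [](std::uint32_t value, const Sym& s) { return value < s.rva; });
    if (it == syms.begin()) {
        return nullptr;
    }
    return &*(it - 1);
}
```

The caller would first subtract the module's load address from the runtime address to get an RVA. A fuller tool would also check the symbol's length, where the PDB records one, so that addresses in gaps between functions are not misattributed.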

Differentiating between function parameter and function variable

hello,
I'm trying to parse a PDB in order to write a debugger. I've got the following sample function:

    static void clear(PictureBuffer buffer, unsigned int Color)
    {
        PictureBuffer a = buffer;
        for (int i = 0; i < a.height; i++)
        {
            for (int j = 0; j < buffer.width; j++)
            {
                buffer.picture[i * buffer.pitch + j] = Color;
            }
        }
    }

As you can see, this function has two instances of PictureBuffer: one is passed as a parameter and one is a local in the function. The function parameter, as specified in the x64 calling convention, is passed as a pointer, and the instance in the scope sits as values on the stack.
Well, I've got the hex for the places in the PDB here, and I don't see any difference between the two instances. Can you point me to how I can know whether it's a pointer or not?

hex

also, is there a place to ask question about the pdb file format that is not in the github issues section? please point me to it if there is one.
thanks,
the_8th_mage.

cvdump does not recognize symbol type 0x1168. PDB generated by VS 2017 C++.

I compiled a file containing a call to an inline function, using VS 2017. Running cvdump.exe produces an error message about a symbol record type 1168 (which is hex). Apparently, the new VS 2017 toolset is producing this record, and your cvinfo.h does not include it in the SYM_ENUM_e type.
I looked at this record in the PDB, and it contains the number 2, and type IDs for the two inline functions called in a function.
cvinfo.h needs updating. This is especially important as this file is being cited in various other web resources as the 'official' documentation of the symbol records in a PDB.
I also ran cvdump.exe on the OBJ file, with the same result.

Here's my example:

    __inline int inl(int x)
    {
        return x + 1;
    }

    extern "C"
    int seh(void)
    {
        X xx;
        int x = xx.foo ();
        x = inl(x);
        ...

cvdump output is:

(000728) S_GPROC32: [0002:00001F40], Cb: 00000084, Type: 0x2249, seh
Parent: 00000000, End: 000008E4, Next: 00000000
Debug start: 0000000E, Debug end: 0000007E

(000754) S_FRAMEPROC:
Frame size = 0x00000060 bytes
Pad size = 0x00000040 bytes
Offset of pad in frame = 0x00000020
Size of callee save registers = 0x00000000
Address of exception handler = 0000:00000000
Function info: seh invalid_pgo_counts opt_for_speed Local=rbp Param=rbp (0x00128040)
Error: unknown symbol record type 1168!
