Ducible

This is a tool to make builds of Portable Executables (PEs) and PDBs reproducible.

Timestamps and other non-deterministic data are embedded in DLLs, EXEs, and PDBs. If the same source is compiled and linked twice with no changes in between, the resulting binary and PDB will not be bit-for-bit identical. This tool fixes that by modifying DLLs/EXEs in-place and rewriting PDBs.
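
To see one of those non-deterministic fields for yourself, the sketch below reads the TimeDateStamp from the COFF file header of a PE image. It relies only on the published PE layout (the 4-byte offset at 0x3C that points to the `PE\0\0` signature, followed by the COFF header). The function name `pe_timestamp` is made up for this illustration and is not part of Ducible.

```python
import struct

def pe_timestamp(data: bytes) -> int:
    """Return the COFF TimeDateStamp of a PE image held in memory."""
    if data[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ header")
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # The COFF header follows the signature: Machine (2 bytes),
    # NumberOfSections (2 bytes), then TimeDateStamp (4 bytes).
    (timestamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return timestamp
```

Reading the value from two untouched builds of the same source, for example with `pe_timestamp(open("MyModule.dll", "rb").read())`, will typically give two different numbers.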

Don't worry, Ducible won't mess with the functionality of your executable. All changes have no functional effect. It merely transforms one perfectly good executable into another perfectly good, yet reproducible(!), executable.

Why?

In general, reproducible builds give a verifiable path from source code to binary code. There are a number of security and practical reasons why this is good. More specifically, it enables:

  • confidence that two parties built a binary with the same environment,
  • recreating a release bit-for-bit from source code,
  • recreating debug symbols for a particular version of source code,
  • verifiable and correct distributed builds,
  • better caching of builds,
  • no spurious changes in binaries under version control.

See also https://reproducible-builds.org/ for more information on why you should want this.

Using It

Usage is as follows:

$ ducible IMAGE [PDB]

The EXE/DLL is specified as the first parameter and the PDB is optionally specified as the second. The PDB must be modified because changing the image invalidates the signature for the PDB.

As a post-build step, simply run:

$ ducible MyModule.dll MyModule.pdb

The files are overwritten in-place.
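
If a build produces several images, the post-build step can be scripted. The following is a minimal sketch, assuming each PDB sits next to its image and shares its base name (as in the `MyModule` example) and that `ducible` is on your PATH. The helper names `ducible_commands` and `run_ducible` are hypothetical.

```python
import subprocess
from pathlib import Path

def ducible_commands(build_dir):
    """Build one `ducible IMAGE [PDB]` command line per EXE/DLL in build_dir."""
    commands = []
    for image in sorted(Path(build_dir).iterdir()):
        if image.suffix.lower() not in (".exe", ".dll"):
            continue
        command = ["ducible", str(image)]
        pdb = image.with_suffix(".pdb")
        if pdb.is_file():
            # The PDB argument is optional; pass it only when one exists.
            command.append(str(pdb))
        commands.append(command)
    return commands

def run_ducible(build_dir):
    """Run ducible on every image; the files are rewritten in place."""
    for command in ducible_commands(build_dir):
        subprocess.run(command, check=True)
```

Keeping the command construction separate from the call to `subprocess.run` makes it easy to print the commands first and check them before anything is overwritten.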

Downloading It

See the releases for downloads.

Known Limitations

  1. This tool cannot prevent you from shooting yourself in the foot. Please don't ever have anything like this in your code:

    std::cout << "Build date: " << __DATE__ << " " << __TIME__ << std::endl;

    There is nothing that Ducible can do about this. Embedding dates or times might seem useful, but all they do is prevent reproducible builds. Once you have reproducible builds and a proper versioning scheme, embedding this information is pointless.

  2. Digital signing with trusted timestamping cannot be made reproducible (e.g., using Microsoft's signtool). Even while doing digital signing, you can still gain some of the benefits that using Ducible provides (e.g., recreating a PDB for debugging purposes). Digital signatures can also be stripped off after being applied to make comparing binaries possible.

  3. Incremental linking using /INCREMENTAL changes the executable quite extensively upon subsequent builds. Ducible will invalidate the .ilk file and force the linker to do a full relink every time. However, this isn't enough to make the build reproducible. You can work around this issue by disabling /INCREMENTAL in the linker settings. (Unfortunately this is usually enabled by default for Debug builds in Visual Studio.)
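
For the first limitation, source files can at least be checked for these macros before a build ships. The sketch below is a hypothetical helper (not part of Ducible) that scans C or C++ source text for `__DATE__`, `__TIME__`, and the common compiler extension `__TIMESTAMP__`. It is a plain text search, so it will also flag occurrences inside comments and string literals.

```python
import re

# Predefined macros that expand to the compile date or time.
TIME_MACROS = re.compile(r"\b(__DATE__|__TIME__|__TIMESTAMP__)\b")

def find_time_macros(source: str):
    """Return (line_number, macro) pairs for each time macro in source text."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        for match in TIME_MACROS.finditer(line):
            hits.append((number, match.group(1)))
    return hits
```

Running it over the offending line shown above reports both `__DATE__` and `__TIME__` on line 1.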

Building It

This is written in C++11. There are no third-party dependencies, and it should build and run on any platform (even non-Windows!).

Required build tools:

  1. Python 3 is used to generate src/version.h.

  2. Git is used to get the current commit hash. This is embedded in the src/version.h that is generated by Python.

Both of these tools must be in your PATH.

First, clone the repository:

git clone https://github.com/jasonwhite/ducible.git

Note: Downloading a zip of the source or cloning via SVN will cause the build to fail. The current commit hash is embedded in the executable to help trace the executable back to the exact source used to build it.
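
The dependency on a Git checkout follows from how the header is produced. The real generator script is not shown here, so the following is only a rough sketch of the idea: ask Git for the current commit and render it into a header. The function names and the `DUCIBLE_COMMIT` macro are hypothetical, not the names the project actually uses.

```python
import subprocess

def current_commit() -> str:
    """Ask Git for the hash of the checked-out commit."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()

def render_version_header(commit: str) -> str:
    """Render header text; DUCIBLE_COMMIT is a hypothetical macro name."""
    return (
        "#pragma once\n"
        f'#define DUCIBLE_COMMIT "{commit}"\n'
    )
```

Outside a Git checkout, `git rev-parse HEAD` exits with an error and `current_commit` raises, which is consistent with the build failing for zip downloads.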

Windows

Just open vs/vs2015/ducible.sln and build it. Of course, this requires Visual Studio 2015 or later. If another version of Visual Studio is needed, please submit an issue or, better yet, a pull request.

You can also use the free Visual Studio 2015 build tools to build. You just need to invoke msbuild directly (in a Visual Studio command prompt):

msbuild vs\vs2015\ducible.sln /m /t:Build /p:Configuration=Release /p:Platform=x64

Linux and Mac

Although this is primarily a Windows utility, it was developed in a Unix environment simply because it was faster and easier. One might also want to use it when compiling Windows binaries on Linux. Thus, it builds and runs on Linux and Mac as well.

To build it, just run make.

Related Work

I am only aware of the zap_timestamp tool in Syzygy. Unfortunately, it has a few problems:

  1. It does not work with 64-bit PE files (i.e., the PE32+ format).
  2. It does not create a reproducible PDB file.
  3. It is a pain to build. It is part of a larger suite of tools that operate on PE files. That suite then requires Google's depot_tools. The end result is that you're required to download hundreds of megabytes of tooling around something that should be very simple.

License

As always, this tool uses the very liberal MIT License. Use it for whatever nefarious purposes you like.

Contributors

chetanbhat, evetsso, jasonwhite, oscarfv

ducible's Issues

Relative object file paths in the PDB

This is part feature request/part question.

Even after running the DLL and PDB through ducible, I found that the PDB hashes would differ, so the PE hashes would also differ, leading to cache misses.
Turns out that the linker writes out absolute paths to .obj files in the module info substream!
This is particularly obvious across different developer machines where their usernames are different.

Have you considered adding support to replace those with relative or fake absolute paths?
Do you know if there are any downsides of doing something like that?
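
To make the request concrete, here is a small sketch of the path transformation being asked about. It only shows how an absolute .obj path could be rewritten relative to a build root. It does not read or write a PDB, and it is not something Ducible is known to do. The function name `relativize_obj_path` and the example paths are made up.

```python
from pathlib import PureWindowsPath

def relativize_obj_path(obj_path: str, build_root: str) -> str:
    """Rewrite an absolute .obj path relative to build_root when possible."""
    try:
        relative = PureWindowsPath(obj_path).relative_to(PureWindowsPath(build_root))
    except ValueError:
        # The path lies outside the build root; leave it untouched.
        return obj_path
    return str(relative)
```

With this, `C:\Users\alice\proj\obj\foo.obj` and `C:\Users\bob\proj\obj\foo.obj` both become `obj\foo.obj` when each is made relative to its own project root, so the username no longer leaks into the result.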

Known Limitations - Digitally Signed Executable

Hello, I have a problem. We have the original .exe and .pdb files of an application we released, which was digitally signed because it was released to customer desktops. The application has since crashed randomly, and we have a dump file from the crash.

My question is: does this mean there is no way for us to look at the dump file because the exe was digitally signed, or is there some way around this, either with the help of your tool or in general?

Incremental build fails due to PDB being different

When building incrementally, the linker fails with the error LNK1209: program database 'foo.pdb' differs from previous link; relink or rebuild.

This could be caused by a number of reasons, one being that the resulting PDB is simply not valid. The linker may also keep information about the state of the PDB separately (maybe in the .ilk file). This needs to be researched more.

No version.h in source tarballs

The source tarballs (or at least v1.2.1.tar.gz) have no version.h, so make tries to configure version.h.in, which fails because there is no git checkout.

Source tarballs are important for projects that package binaries, such as Linux distributions or MSYS2. They are more likely to include a package that is built from a source tarball corresponding to a version number. Packages built from version control checkouts are frowned upon because that is seen as a sign of instability.

Add support for static libs.

Hi Jason,

First, I would like to warmly thank you for this program. We added ducible to our build pipeline and it saves us a huge amount of time while debugging.

I would like to know if it is possible to make ducible work for .lib files. AFAIK building one generates a PDB as well, which has the same limitations as the .dll one.

Thank you for your time!

Add tests

A number of test cases need to be added to verify that builds are indeed deterministic. This will involve building every set of source files twice and comparing the checksums to verify they are identical. The negative case also needs to be tested. That is, modifying a source file should affect the output and produce non-identical files.

The plan is to write this test suite in Python as we already have a dependency on it.
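
The comparison at the heart of such a test could look roughly like the sketch below. It covers only the checksum step and assumes the two builds have already been produced. The helper names `sha256_of` and `outputs_identical` are placeholders, and SHA-256 is just one reasonable choice of checksum.

```python
import hashlib

def sha256_of(path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so that large PDBs need not fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def outputs_identical(first, second) -> bool:
    """True when two build outputs are bit-for-bit identical."""
    return sha256_of(first) == sha256_of(second)
```

A positive test asserts `outputs_identical(...)` for two builds of the same sources; the negative test asserts `not outputs_identical(...)` after a source file has been modified.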

Make PDB builds reproducible

Currently, PDB files are not patched to remove non-determinism.

This presents a larger challenge than just patching the PE file alone, because:

  1. The PDB file format is not well documented.
  2. The size of the PDB file seems to change with every build. Thus, we cannot use a memory map to make modifications to the PDB file as we will need to remove data.

References:

  1. https://github.com/Microsoft/microsoft-pdb

Use hash of input file instead of fixed timestamp

The timestamp of a DLL seems to have at least some kind of role as a unique identifier that is relevant to the loader. Using the same timestamp for different versions of a DLL might introduce weird side effects.

Instead, the timestamp should be set to the hash of the input file to guarantee a unique value. See https://devblogs.microsoft.com/oldnewthing/20180103-00/?p=97705, which describes how Windows 10 DLLs now also use a hash of the file as a timestamp.
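
One possible shape for this, as a sketch only: hash the image bytes and truncate the digest to the 4 bytes that the timestamp field holds. The choice of SHA-256, the truncation, and the name `timestamp_from_hash` are assumptions made for illustration; this is not what Ducible or the Windows toolchain is confirmed to do.

```python
import hashlib

def timestamp_from_hash(image: bytes) -> int:
    """Derive a 32-bit value for the PE TimeDateStamp field from file contents."""
    digest = hashlib.sha256(image).digest()
    # The field is 4 bytes wide, so keep only the first 4 bytes of the digest.
    return int.from_bytes(digest[:4], "little")
```

For the result to be reproducible, the bytes being hashed would need to have their non-deterministic fields normalized first (for example, the old timestamp zeroed); otherwise the hash itself changes from build to build. With only 32 bits available, collisions between different versions are unlikely but not impossible.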
