m4b / goblin
An impish, cross-platform binary parsing crate, written in Rust
License: MIT License
This will make them accessible to consumers who don't use std (because why not), and will also remove the warnings for those who use std but not endian_fd (binary loaders, dryad), and will also make the API more usable in general.
Neither function verifies that fileoff and fileoff+filesize are within the bytes range, which may result in a panic for invalid files.
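A minimal sketch of the missing check, with a hypothetical function name and signature (not goblin's API): validate the range before slicing, so malformed files yield an Err instead of a panic.

```rust
// Hypothetical names, not goblin's API: the point is returning Err on
// out-of-range instead of slicing unchecked and panicking.
fn segment_data(bytes: &[u8], fileoff: u64, filesize: u64) -> Result<&[u8], String> {
    let start = fileoff as usize;
    let end = start
        .checked_add(filesize as usize)
        .ok_or_else(|| "fileoff + filesize overflows".to_string())?;
    bytes
        .get(start..end) // None instead of a panic when out of bounds
        .ok_or_else(|| format!("range {}..{} outside of len {}", start, end, bytes.len()))
}

fn main() {
    let bytes = [0u8; 16];
    assert!(segment_data(&bytes, 8, 8).is_ok());
    assert!(segment_data(&bytes, 8, 9).is_err()); // would panic if sliced blindly
}
```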
Due to the log crate, std is required. To fix this, remove the use_std feature from log.
The idea is that in a non-pure setting, you'll typically always want endian_fd reading, which means we can use the generic Reader trait from std.
This will enable unit testing the endian fd readers by passing Cursor'd byte arrays.
We may be able to drop the no_endian_fd feature flag if this pans out the way I think it can.
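A sketch of the testing idea, assuming a reader generic over std::io::Read (the function here is made up, not goblin's actual trait bounds):

```rust
use std::io::{Cursor, Read};

// Hypothetical reader generic over std::io::Read (not goblin's actual
// trait bounds): unit tests can feed it a Cursor over an in-memory
// byte array instead of a real file descriptor.
fn read_u32_le<R: Read>(r: &mut R) -> std::io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn main() {
    // No file needed: a Cursor'd byte array stands in for the fd.
    let mut cursor = Cursor::new(vec![0x7f, b'E', b'L', b'F']);
    assert_eq!(read_u32_le(&mut cursor).unwrap(), 0x464c457f); // ELF magic, LE
}
```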
Similar to ELF's symbol iterator, we have a Result on invocation, and elements are result-less, because we know the size beforehand.
Import {
    name: "ORDINAL 0",
    dll: "WS2_32.dll",
    ordinal: 0,
    offset: 62264,
    rva: 0,
    size: 4
},
this ("ORDINAL 0") is basically a stupid hack, but it makes working with Import much nicer. Alternatively, we could make it an enum, but I dunno, that just annoys me for some reason.
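For comparison, the enum alternative mentioned above might look like this (names are made up, not goblin's API):

```rust
// Hypothetical enum alternative to the "ORDINAL 0" placeholder string
// (names made up, not goblin's API): the two import cases become
// explicit instead of being encoded into a synthetic name.
#[derive(Debug, PartialEq)]
enum ImportName<'a> {
    Name(&'a str),
    Ordinal(u16),
}

fn main() {
    let imports = [ImportName::Ordinal(0), ImportName::Name("WSAStartup")];
    for import in &imports {
        match import {
            ImportName::Name(n) => println!("by name: {}", n),
            ImportName::Ordinal(o) => println!("by ordinal: {}", o),
        }
    }
}
```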
I was investigating my options for parsing EXE files to determine what environment to auto-fill in my experimental game launcher (i.e., DOSBox, Wine, Wine+qemu-user, Mono, etc.), and I managed to trigger some panics in goblin.
====================
TESTING WITH GOBLIN:
====================
unknown magic: ./hello_owatcom_com.com
Parse error: ./hello_owatcom_os2v2.exe => Invalid magic number: 0x1
pe: ./hello_pacific.exe
Parse error: ./hello_owatcom_dos.upx.exe => requested range [309100590..309100594) from object of len 6881
Parse error: ./hello_owatcom_dos4g.exe => Invalid magic number: 0x1
Parse error: ./hello_owatcom_windows.exe => Invalid magic number: 0x0
pe: ./hello_mingw32.exe
pe: ./hello_csharp_exe_itanium.exe
pe: ./hello_owatcom_win95.exe
Parse error: ./hello_owatcom_dos4g.upx.exe => Invalid magic number: 0x1
elf: ./hello_gcc.x86
unknown magic: ./hello_djgpp.upx.coff.exe
unknown magic: ./hello_owatcom_com.upx.com
unknown magic: ./hello_dev86.upx.com
unknown magic: ./hello_dev86.com
pe: ./hello_mingw64.exe
Parse error: ./hello_djgpp.exe => Invalid magic number: 0x0
PANICKED on hello_mingw32.upx.exe
Parse error: ./hello_owatcom_dos.exe => Invalid magic number: 0x20
PANICKED on hello_owatcom_win95.upx.exe
pe: ./hello_owatcom_nt.exe
Parse error: ./hello_owatcom_win386.exe => Invalid magic number: 0x0
Parse error: ./hello_djgpp.upx.exe => Invalid magic number: 0x0
Parse error: ./hello_owatcom_dos4gnz.exe => Invalid magic number: 0x1
PANICKED on hello_mingw64.upx.exe
Parse error: ./hello_owatcom_os2.exe => Invalid magic number: 0x0
pe: ./hello_csharp_exe_arm.exe
pe: ./hello_csharp_exe_x64.exe
pe: ./hello_csharp_exe_x86.exe
elf: ./hello_gcc.x86_64
PANICKED on hello_owatcom_nt.upx.exe
Parse error: ./hello_pacific.upx.exe => requested range [309100590..309100594) from object of len 4527
Here's a backtrace which appears to represent all of the panics:
ssokolow@monolith test_exes [rusty-core] % RUST_BACKTRACE=1 ./pe-test/target/debug/goblin-test hello_owatcom_nt.upx.exe
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /checkout/src/libcore/option.rs:329
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
at /checkout/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:49
1: std::sys_common::backtrace::_print
at /checkout/src/libstd/sys_common/backtrace.rs:71
2: std::panicking::default_hook::{{closure}}
at /checkout/src/libstd/sys_common/backtrace.rs:60
at /checkout/src/libstd/panicking.rs:355
3: std::panicking::default_hook
at /checkout/src/libstd/panicking.rs:371
4: std::panicking::rust_panic_with_hook
at /checkout/src/libstd/panicking.rs:549
5: std::panicking::begin_panic
at /checkout/src/libstd/panicking.rs:511
6: std::panicking::begin_panic_fmt
at /checkout/src/libstd/panicking.rs:495
7: rust_begin_unwind
at /checkout/src/libstd/panicking.rs:471
8: core::panicking::panic_fmt
at /checkout/src/libcore/panicking.rs:69
9: core::panicking::panic
at /checkout/src/libcore/panicking.rs:49
10: <core::option::Option<T>>::unwrap
at /checkout/src/libcore/macros.rs:21
11: goblin::pe::import::SyntheticImportDirectoryEntry::parse
at /home/ssokolow/.cargo/registry/src/github.com-1ecc6299db9ec823/goblin-0.0.10/src/pe/import.rs:125
12: goblin::pe::import::ImportData::parse
at /home/ssokolow/.cargo/registry/src/github.com-1ecc6299db9ec823/goblin-0.0.10/src/pe/import.rs:158
13: goblin::pe::PE::parse
at /home/ssokolow/.cargo/registry/src/github.com-1ecc6299db9ec823/goblin-0.0.10/src/pe/mod.rs:80
14: goblin::parse
at /home/ssokolow/.cargo/registry/src/github.com-1ecc6299db9ec823/goblin-0.0.10/src/lib.rs:276
15: goblin_test::run
at ./pe-test/src/goblin.rs:17
16: goblin_test::main
at ./pe-test/src/goblin.rs:36
17: __rust_maybe_catch_panic
at /checkout/src/libpanic_unwind/lib.rs:98
18: std::rt::lang_start
at /checkout/src/libstd/panicking.rs:433
at /checkout/src/libstd/panic.rs:361
at /checkout/src/libstd/rt.rs:57
19: main
20: __libc_start_main
21: <unknown>
While this renders it unsuitable for my project (the mere fact that Goblin is capable of dying at an unwrap, when the other PE parser I've tried so far simply used Result to indicate a parse failure, means that using it in my project would cause me more worry than simply writing my own MZ/NE/PE parser with Nom), I thought you'd want to know so you can fix the problem for others.
If you want to re-create my test binaries, the source materials are in the test_exes folder of ssokolow/game_launcher, and build.sh contains instructions for the simplest, easiest way to install the requisite packages on a *buntu Linux 14.04 LTS machine like mine.
To reiterate what build.sh says, all compilers are optional, so producing just the binaries which caused panics here should only require apt-get install upx-ucl mingw-w64 and then downloading and unpacking OpenWatcom.
This deeply annoys me:
let peek = goblin::peek(&mut fd)?;
if let Hint::Unknown(magic) = peek {
    println!("unknown magic: {:#x}", magic)
} else {
    let bytes = { let mut v = Vec::new(); fd.read_to_end(&mut v)?; v };
    match peek {
        Hint::Elf(_) => {
I think there's an architectural problem and an ergonomics problem here.
There shouldn't be an Unknown variant - we already peeked to make sure the magic is good! - everything else is just a parse error for that respective file format. So, what I want is to have my cake and eat it too: I want the peek to ensure the magic is correct, route to the correct binary parser, and return this result, without the Unknown variant and without temporary allocations (or else the full fd read is passed through).
@philipc @endeav0r you seem to be using goblin::Object as clients; does this bother you?
Anyone who happens to be watching/reading this, I'm open to proposals for how to fix this / make it nicer.
Afaics, being flexible w.r.t. the bytes + reading is going to be tricky; the first thing that comes to mind is some kind of closure style or an inout, like:
// has no Unknown variant, and is also totally me just randomly typing stuff
let object: Option<Result<Object>> = Object::parse_and_fill(fd, &mut bytes);
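One way the routing could work, as a totally hypothetical sketch (none of these names are goblin's): peek reads the magic, rewinds, and reports unknown magic before anything is allocated.

```rust
use std::io::{self, Read, Seek, SeekFrom};

// Hypothetical sketch of a peek that routes without an Unknown
// variant: unknown magic is surfaced as None up front, so the parse
// step only ever deals with known formats.
#[derive(Debug, PartialEq)]
enum Kind { Elf, Pe }

fn peek_kind<R: Read + Seek>(fd: &mut R) -> io::Result<Option<Kind>> {
    let mut magic = [0u8; 4];
    fd.read_exact(&mut magic)?;
    fd.seek(SeekFrom::Start(0))?; // rewind so a parser sees all bytes
    Ok(match &magic {
        [0x7f, b'E', b'L', b'F'] => Some(Kind::Elf),
        [b'M', b'Z', _, _] => Some(Kind::Pe),
        _ => None, // unknown magic: caller bails, nothing was allocated
    })
}

fn main() -> io::Result<()> {
    let mut fd = io::Cursor::new(vec![0x7f, b'E', b'L', b'F', 0, 0]);
    assert_eq!(peek_kind(&mut fd)?, Some(Kind::Elf));
    Ok(())
}
```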
This would be a breaking change, but I think it's important to get right sooner rather than later.
Hello,
I ran into a problem while parsing the notes in a core dump of my test program. The error comes from bad alignment: my alignment is equal to 0.
match alignment {
    4 => bytes.gread_with::<Nhdr32>(offset, ctx.le)?.into(),
    // this is a guess; i haven't seen gcc/clang compilers emit 64-bit notes, and i don't have any non gcc/clang compilers
    8 => bytes.gread_with::<Nhdr64>(offset, ctx.le)?.into(),
    _ => return Err(error::Error::Malformed(format!("Notes has unimplemented alignment requirement: {:#x}", alignment)))
}
But readelf works perfectly fine and parses the core dump correctly. I looked into the source code of readelf and found this:
/* NB: Some note sections may have alignment value of 0 or 1. gABI
specifies that notes should be aligned to 4 bytes in 32-bit
objects and to 8 bytes in 64-bit objects. As a Linux extension,
we also support 4 byte alignment in 64-bit objects. If section
alignment is less than 4, we treate alignment as 4 bytes. */
if (align < 4)
align = 4;
else if (align != 4 && align != 8)
{
warn (_("Corrupt note: alignment %ld, expecting 4 or 8\n"),
(long) align);
return FALSE;
}
As I see it, the match should be like this:
match alignment {
    0 ... 4 => bytes.gread_with::<Nhdr32>(offset, ctx.le)?.into(),
    // this is a guess; i haven't seen gcc/clang compilers emit 64-bit notes, and i don't have any non gcc/clang compilers
    8 => bytes.gread_with::<Nhdr64>(offset, ctx.le)?.into(),
    _ => return Err(error::Error::Malformed(format!("Notes has unimplemented alignment requirement: {:#x}", alignment)))
}
Hello.
Is it possible to generate a file that can be correctly parsed by both PE and ELF readers, even if the executable does nothing, like int main() {}?
@m4b is this something you would be interested in having in goblin?
So I've been holding off on error libraries, but Failure genuinely looks exciting and cool.
I am ok with dynamic allocation, since the parser allocates already, and parsing binaries generally won't be in a hot loop, and if it is, it'll likely be dwarfed by io reads anyway.
This might also be an opportunity to provide better error messages, because errors in goblin aren't so great. But this is mostly because scroll's error messages suck pretty hard (though that's because it supports no-std).
For gimli-rs/object#45, we need to be able to parse all the formats when using #[no_std]. Currently, a lot of the parsing requires allocation. Completely avoiding allocations is too much work for now, and probably not a good use of time. And anyway, we can still use the alloc crate with no_std (but this currently requires building with nightly).
So I propose we add an alloc feature that is midway between no_std and std. This feature will cover everything that uses allocations, and so mostly what will be left in std will be things that use std::fs or std::io.
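A minimal sketch of the kind of split this implies (the function names and string-table logic here are assumptions for illustration, not goblin's code): borrow-only parsing needs just core, collecting results needs alloc, and anything touching the filesystem would stay behind std.

```rust
// Sketch only: names and split are assumptions, not goblin's code.
// core-only: borrow a NUL-terminated name out of a strtab slice.
fn name_at(strtab: &[u8], offset: usize) -> Option<&[u8]> {
    let rest = strtab.get(offset..)?;
    let end = rest.iter().position(|&b| b == 0)?;
    Some(&rest[..end])
}

// needs `alloc` (or `std`): collects borrowed names into a Vec.
fn all_names(strtab: &[u8]) -> Vec<&[u8]> {
    (0..strtab.len())
        .filter(|&i| i == 0 || strtab[i - 1] == 0) // entry starts
        .filter_map(|i| name_at(strtab, i))
        .filter(|n| !n.is_empty())
        .collect()
}

fn main() {
    let strtab = b"\0main\0printf\0";
    assert_eq!(name_at(strtab, 1), Some(&b"main"[..]));
    assert_eq!(all_names(strtab), vec![&b"main"[..], &b"printf"[..]]);
}
```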
I've done enough to verify that this approach works, so assuming this is acceptable, now I just need to clean it up and submit a PR.
One question I have is about the endian_fd feature. What exactly is this meant to cover? The readme says it 'parses according to the endianness in the binary', but it doesn't cover code such as https://github.com/m4b/goblin/blob/master/src/elf/section_header.rs#L490. Currently it gates the entire mach/pe/archive formats, and it also requires std. I want to relax this restriction as part of adding the alloc feature, but I'm not sure which parts of mach/pe/archive should still require it.
Similar to symbols, add relocation iterators, for maximum laziness. This will give immediate perf results for clients reading large binaries but not needing the relocations.
Some well-formed PEs may have an export data directory with an empty table of NumberOfFunctions (as well as NumberOfNames, AddressOfFunctions, etc.), for example the Export Data Directory of apisetschema.dll (a DLL for API redirection from Windows 7) on my Windows 7 SP1 machine:
But the PE parser of goblin will refuse such a PE, because of some checks, for example when parsing the AddressOfNames:
let name_pointer_table_offset = &mut utils::find_offset_or(export_directory_table.name_pointer_rva as usize, sections, &format!("Cannot map export_directory_table.name_pointer_rva ({:#x}) into offset", export_directory_table.name_pointer_rva))?;
Because export_directory_table.name_pointer_rva will be zero, the enclosing function parse returns immediately with an Err(_).
The same goes for the AddressOfNameOrdinals and AddressOfFunctions tables. A quick-and-dirty fix might be checking the values of export_directory_table.number_of_name_pointers and export_directory_table.address_table_entries against 0 before parsing the offsets of these tables:
let mut export_name_pointer_table: ExportNamePointerTable = Vec::with_capacity(number_of_name_pointers);
let mut export_ordinal_table: ExportOrdinalTable = Vec::with_capacity(number_of_name_pointers);
if number_of_name_pointers > 0 {
    let name_pointer_table_offset = &mut utils::find_offset_or(export_directory_table.name_pointer_rva as usize, sections, &format!("Cannot map export_directory_table.name_pointer_rva ({:#x}) into offset", export_directory_table.name_pointer_rva))?;
    for _ in 0..number_of_name_pointers {
        export_name_pointer_table.push(bytes.gread_with(name_pointer_table_offset, scroll::LE)?);
    }
    let export_ordinal_table_offset = &mut utils::find_offset_or(export_directory_table.ordinal_table_rva as usize, sections, &format!("Cannot map export_directory_table.ordinal_table_rva ({:#x}) into offset", export_directory_table.ordinal_table_rva))?;
    for _ in 0..number_of_name_pointers {
        export_ordinal_table.push(bytes.gread_with(export_ordinal_table_offset, scroll::LE)?);
    }
}
let mut export_address_table: ExportAddressTable = Vec::with_capacity(address_table_entries);
if address_table_entries > 0 {
    let export_address_table_offset = utils::find_offset_or(export_directory_table.export_address_table_rva as usize, sections, &format!("Cannot map export_directory_table.export_address_table_rva ({:#x}) into offset", export_directory_table.export_address_table_rva))?;
    let export_end = export_rva + size;
    let offset = &mut export_address_table_offset.clone();
    for _ in 0..address_table_entries {
        let rva: u32 = bytes.gread_with(offset, scroll::LE)?;
        if utils::is_in_range(rva as usize, export_rva, export_end) {
            export_address_table.push(ExportAddressTableEntry::ForwarderRVA(rva));
        } else {
            export_address_table.push(ExportAddressTableEntry::ExportRVA(rva));
        }
    }
}
Clients should never have to import, and implement error routes for, scroll when they use goblin (unless of course they also use scroll, but that is orthogonal). It is, in effect, an internal library within goblin.
Comparing goblin's implementation and my current working memory of dyld, I have two observations:
1. LC_MAIN and LC_UNIXTHREAD both provide entrypoint locations, but the environments provided by each are not interchangeable. (See dyld.cpp, dyldStartup.s.) Last time I needed to know the entrypoint of a Mach-O executable, I had to know which kind it was; struct MachO should probably retain this distinction.
2. What about the LC_UNIXTHREAD thread state? You know, the arch-specific thread states which are not currently handled by goblin? Good news: they don't matter in the slightest. dyld uses only the instruction pointer, and the rest are entirely discarded. This makes them unusable in practice, and thus they're always zero.
It would pass out references to Syms, which are byte-casted from a backing &[u8], and have a new -> Result API (which validates the bounds). We'd need two for elf32 and elf64, probably; or we can initialize as 32 or 64 bit (or pass a container context, which has less boolean blindness). This will be tricky and annoying, I think, due to type name punning stuff, so it's probably easiest to just add two typed versions and re-export them.
Similar to strtab, I would also want it to implement Index, so it can be literally drop-in replaced in code that previously used a &[Sym].
Another approach could just provide a newtype wrapper on &[Sym] that validates the backing bytes and a provided count, and then Derefs to a &[Sym] so we get indexing for free.
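A sketch of that newtype approach, with a placeholder Sym (not goblin's type): validation happens once in the constructor, and Deref gives slice indexing for free.

```rust
use std::ops::Deref;

// Placeholder Sym, not goblin's: just enough to show the pattern.
struct Sym { value: u64 }

// Newtype over a validated slice: bounds are checked once at
// construction, then Deref hands out &[Sym] so indexing, iteration,
// and the rest of the slice API come for free.
struct Symtab<'a>(&'a [Sym]);

impl<'a> Symtab<'a> {
    fn new(syms: &'a [Sym], count: usize) -> Result<Symtab<'a>, String> {
        if count > syms.len() {
            return Err(format!("count {} exceeds backing slice of {}", count, syms.len()));
        }
        Ok(Symtab(&syms[..count]))
    }
}

impl<'a> Deref for Symtab<'a> {
    type Target = [Sym];
    fn deref(&self) -> &[Sym] { self.0 }
}

fn main() {
    let backing = [Sym { value: 1 }, Sym { value: 2 }];
    let symtab = Symtab::new(&backing, 2).unwrap();
    assert_eq!(symtab[1].value, 2); // slice indexing via Deref
    assert_eq!(symtab.len(), 2);    // slice methods too
    assert!(Symtab::new(&backing, 3).is_err());
}
```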
Lots of options.
If we want to get fancy-pantsy, we might be able to generify it over both ELF and Mach symbols, and have it return references via something like:
get::<Symbol>(index) -> Result<&Symbol>
but that might not be worth the effort.
Hey @m4b, I saw you added structure definitions for 32-bit ELF files. Is there any plan to implement Elf::from_fd for 32 bits, too? I only had a brief look at it, but it looks like you'd need to duplicate most of the code for the 64-bit ELF files.
tests, more, add them.
Many overflow issues should disappear once upgraded to latest scroll
/cc @sanxiyn
I wrote some code to read the PDB70 info out of a PE using goblin, to implement the equivalent of symstore.exe. I found that CodeviewPDB70DebugInfo::signature wasn't really that useful as raw bytes, since it's intended to be a GUID in little-endian byte order. I wound up pulling in byteorder and writing a little function like:
fn sig_to_uuid(sig: &[u8; 16]) -> Result<Uuid, Error> {
    let mut rdr = Cursor::new(sig);
    Ok(Uuid::from_fields(rdr.read_u32::<LittleEndian>()?,
                         rdr.read_u16::<LittleEndian>()?,
                         rdr.read_u16::<LittleEndian>()?,
                         &sig[8..])?)
}
...but it seems likely that anyone touching this data would need the same thing. Since you're already using scroll here, it ought to be trivial to do this. I don't think the uuid crate is a particularly big dependency (and it's no-std by default; you have to enable the use_std feature explicitly).
What about rlib support, used by Rust itself? It's mostly an ar archive afaik, and the ar crate supports the Linux variant of it (no support for the macOS version for some reason).
This will essentially make the entire crate zero-allocation, lazy, and parallelizable.
See the lazy_transducer documentation for information if anyone wants to tackle this.
I've recently embedded C into a Rust project that needed the dl_iterate_phdr() interface.
From the OpenBSD manual:
SYNOPSIS
     #include <link.h>

     int
     dl_iterate_phdr(int (*callback)(struct dl_phdr_info *, size_t, void *),
         void *data);

DESCRIPTION
     The dl_iterate_phdr() function iterates over all shared objects loaded
     into a process's address space, calling callback for each shared object,
     passing it information about the object's program headers and the data
     argument.
The interface is somewhat portable, but there are slightly different semantics across platforms.
I would have written my interfacing code in Rust, but decided against it due to the need to define (and keep in sync) the required ELF structs.
Since goblin is fundamentally concerned with these structures, I wonder if goblin would be a good place to implement a Rust interface to dl_iterate_phdr()?
Thanks
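For reference, a minimal sketch of what the FFI surface might look like (Linux/glibc only; the dl_phdr_info fields are deliberately elided since this callback only counts entries, and a real binding would mirror the full struct from <link.h>):

```rust
use std::os::raw::{c_int, c_void};

// Opaque stand-in: a real binding would mirror struct dl_phdr_info.
#[repr(C)]
struct DlPhdrInfo { _opaque: [u8; 0] }

extern "C" {
    // glibc exports this from libc; signature from the man page.
    fn dl_iterate_phdr(
        callback: unsafe extern "C" fn(*mut DlPhdrInfo, usize, *mut c_void) -> c_int,
        data: *mut c_void,
    ) -> c_int;
}

// Callback: just count the loaded shared objects.
unsafe extern "C" fn count(_info: *mut DlPhdrInfo, _size: usize, data: *mut c_void) -> c_int {
    *(data as *mut usize) += 1;
    0 // 0 means "keep iterating"
}

fn main() {
    let mut n: usize = 0;
    unsafe { dl_iterate_phdr(count, &mut n as *mut usize as *mut c_void) };
    println!("{} loaded objects", n);
    assert!(n >= 1); // at least the main executable itself
}
```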
Currently it's tied to std, which forces non-std consumers to do without the strtab (redox, for example). This isn't much of an issue for most people, but in the future it will just be nice to have the strtab not tied to std (which it theoretically shouldn't need to be).
In goblin::mach::symbols::Nlist, the str_x field has type usize. Since this represents an external format, it should have a host-independent type.
For comparison, LLVM's corresponding data structure uses u32 for this field: https://github.com/llvm-mirror/llvm/blob/4604874612fa292ab4c49f96aedefdf8be1ff27e/include/llvm/BinaryFormat/MachO.h#L964
When using goblin with error chains, if I want to match on the errors from things like elf.dynstrtab.get(), I need to handle the scroll::error::Error error. This would require me to also include the scroll crate.
It would be nice if this became a goblin error instead, so I don't need to explicitly include another crate to handle this error.
Nonstandard configurations need CI coverage so people (like me) don't break them by accident.
make api runs:
cargo build --no-default-features
cargo build --no-default-features --features="std"
cargo build --no-default-features --features="elf32"
cargo build --no-default-features --features="elf32 elf64"
cargo build --no-default-features --features="elf32 elf64 std"
cargo build --no-default-features --features="elf32 elf64 endian_fd"
cargo build --no-default-features --features="archive"
cargo build --no-default-features --features="mach64"
cargo build --no-default-features --features="mach32"
cargo build --no-default-features --features="mach64 mach32"
cargo build --no-default-features --features="pe32"
cargo build --no-default-features --features="pe32 pe64"
cargo build
Is this the right list for Travis to build? Should Travis call make api, or should this get inlined into .travis.yml?
Should cargo test work in any configuration besides --default-features? It doesn't now, but it could be fixed and tested going forward.
load_command::CommandVariant::LoadWeakDylib isn't handled, which may result in a panic when retrieving imports.
Hmm, ok, filing an issue since I don't quite understand the interaction here...
If I compile goblin 8bbbcf5 directly with cargo +nightly build --no-default-features --features alloc, it works fine.
If I try to compile the object crate with the goblin dependency declared as follows:
[dependencies.goblin]
git = "https://github.com/m4b/goblin"
default-features = false
features = ["alloc", "endian_fd", "elf32", "elf64", "mach32", "mach64", "pe32", "pe64", "archive"]
Using cargo +nightly build --no-default-features or cargo +nightly build -v --no-default-features --features goblin/alloc, I get:
Compiling goblin v0.0.16 (https://github.com/m4b/goblin#8bbbcf5d)
error[E0432]: unresolved import `alloc::btree_map`
--> /Users/gz/.cargo/git/checkouts/goblin-d0c041c0a85ca4ca/8bbbcf5/src/archive/mod.rs:15:12
|
15 | use alloc::btree_map::BTreeMap;
| ^^^^^^^^^ Could not find `btree_map` in `alloc`
error: aborting due to previous error
For more information about this error, try `rustc --explain E0432`.
error: Could not compile `goblin`.
To learn more, run the command again with --verbose.
The following diff gz@a6369ac fixes the problem.
It's time to get Real Serious ™️ and add logging to goblin.
There are a number of places in goblin where things have "gone wrong", but not enough that we shouldn't parse.
Refactoring dyn to check the index of DT_NEEDED, to fix a bug found while fuzzing (ref #27), we should continue parsing, but the client only receives a None; this is fine for clients, but we may also want to know why we received None, hence a warn! would be appropriate.
There are also many times when debugging, e.g., #28, that I just need to see the execution state at the point of failure, which is precisely what debug! is for.
So: extern the log crate; use debug! for extremely verbose stuff, info! maybe, and warn! where the binary is malformed in some way, e.g., as in the first example. See #14.
This might be more tedious. We should also revert the archive header name back to the original byte array, to keep repr(C), which should allow the implementation to be derive(Pwrite, Pread).
E.g.:
impl fmt::Debug for Rel {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let sym = r_sym(self.r_info);
        let typ = r_type(self.r_info);
        write!(f,
               "r_offset: {:x} r_typ: {} r_sym: {}",
               self.r_offset,
               typ,
               sym)
    }
}
It should use a debug_struct so it can pretty print, etc. It just looks crappy now :/
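For comparison, a sketch of the same impl using debug_struct, with standalone stand-ins for Rel and the r_sym/r_type helpers (ELF64 packing: symbol in the high 32 bits, type in the low):

```rust
use std::fmt;

// Stand-ins for the real types/helpers, just to show the pattern.
struct Rel { r_offset: u64, r_info: u64 }

fn r_sym(info: u64) -> u32 { (info >> 32) as u32 }
fn r_type(info: u64) -> u32 { info as u32 }

impl fmt::Debug for Rel {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.debug_struct("Rel")
            .field("r_offset", &format_args!("{:#x}", self.r_offset))
            .field("r_sym", &r_sym(self.r_info))
            .field("r_type", &r_type(self.r_info))
            .finish()
    }
}

fn main() {
    let rel = Rel { r_offset: 0x1000, r_info: (5u64 << 32) | 7 };
    println!("{:#?}", rel); // pretty-prints one field per line
}
```

With debug_struct, the alternate flag ({:#?}) comes for free, so nested structures indent properly instead of the hand-rolled single-line output.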
greadelf can produce data like:
Displaying notes found in: .note.ABI-tag
Owner Data size Description
GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag)
OS: Linux, ABI: 2.6.24
Displaying notes found in: .note.gnu.build-id
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 42f22997b0796cdd2f49d3f3bd148081b8fe2845
Displaying notes found in: .note.gnu.gold-version
Owner Data size Description
GNU 0x00000009 NT_GNU_GOLD_VERSION (gold version)
Version: gold 1.11
I want this data -- particularly NT_GNU_BUILD_ID -- for collating with external debugging data.
My understanding is that the linker (usually?) consolidates all the note sections into a single PT_NOTE segment, and that the segment remains parseable even if the section headers are stripped.
I think the PT_NOTE segment is a series of target-endian structs like:
struct Note<'a> {
    namesz: u32,
    descsz: u32,
    type: u32,
    name: &'a [u8], // NUL terminated string, where `namesz` includes the terminator
    // padding such that namesz + padding % 4 == 0
    desc: &'a [u8], // arbitrary data of length `descsz`
    // padding such that descsz + padding % 4 == 0
}
The meaning of type depends on name, meaning that if I want to determine that value, I need to find a note having both name == b"GNU\0" and type == NT_GNU_BUILD_ID == 3.
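A sketch of that search under the assumptions above (native little-endian, 4-byte padding; not goblin's API):

```rust
use std::convert::TryInto;

// Sketch, not goblin's API: walk a PT_NOTE segment assuming native
// little-endian and 4-byte padding, returning the desc of the note
// whose name and type both match.
fn align4(n: usize) -> usize { (n + 3) & !3 }

fn find_note<'a>(seg: &'a [u8], name: &[u8], typ: u32) -> Option<&'a [u8]> {
    let mut off = 0;
    while off + 12 <= seg.len() {
        let get = |i: usize| u32::from_le_bytes(seg[off + i..off + i + 4].try_into().unwrap());
        let (namesz, descsz, ntype) = (get(0) as usize, get(4) as usize, get(8));
        let name_start = off + 12;
        let desc_start = name_start + align4(namesz); // name padded to 4
        let next = desc_start + align4(descsz);       // desc padded to 4
        if next > seg.len() {
            return None; // malformed: sizes run past the segment
        }
        if &seg[name_start..name_start + namesz] == name && ntype == typ {
            return Some(&seg[desc_start..desc_start + descsz]);
        }
        off = next;
    }
    None
}

fn main() {
    // One synthetic note: name "GNU\0", type 3 (NT_GNU_BUILD_ID).
    let mut seg = Vec::new();
    seg.extend(&4u32.to_le_bytes()); // namesz, includes the NUL
    seg.extend(&4u32.to_le_bytes()); // descsz
    seg.extend(&3u32.to_le_bytes()); // type
    seg.extend(b"GNU\0");            // already 4-byte aligned
    seg.extend(&[0xde, 0xad, 0xbe, 0xef]); // fake build id
    assert_eq!(find_note(&seg, b"GNU\0", 3), Some(&[0xde, 0xad, 0xbe, 0xef][..]));
}
```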
For the simple case, where you're reading a binary of native endianness, it would be nice if the API allowed doing this without copying memory. @nrc has a very simple elf parser that uses his zero crate to achieve this. The API winds up looking like:
pub fn parse_header<'a>(input: &'a [u8]) -> Header<'a>
Your current implementation is very close, but it does clone the resulting Header:
Line 110 in 7d21ea4
It would be even nicer if the API returned an error for non-native endianness so consumers could fall back to from_fd_endian.
Tracking issue for fuzz-related stuff.
We'll start using a corpus now. In particular, I'd like to see the PE and Mach backends fuzzed extensively; I'm sure they have more bugs.
/cc @sanxiyn
I'm getting panics on this line while trying to parse a particular executable. Unfortunately, this executable is proprietary so I can't share it, and I don't know enough about the PE format to understand what's going on here.
I did however write a script to run through the PE executables on my machine, which found that a random GDAL distribution I had laying around includes a curl.exe that causes the exact same panic. curl is something I can share, so steps to reproduce are:
$ wget -q https://s3.willglynn.com/goblin/curl.exe
$ RUST_BACKTRACE=1 cargo run --example rdr curl.exe
Finished dev [unoptimized + debuginfo] target(s) in 0.0 secs
Running `target/debug/examples/rdr curl.exe`
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', src/libcore/option.rs:335
stack backtrace:
0: std::sys::imp::backtrace::tracing::imp::unwind_backtrace
1: std::panicking::default_hook::{{closure}}
2: std::panicking::default_hook
3: std::panicking::rust_panic_with_hook
4: std::panicking::begin_panic
5: std::panicking::begin_panic_fmt
6: rust_begin_unwind
7: core::panicking::panic_fmt
8: core::panicking::panic
9: <core::option::Option<T>>::unwrap
10: goblin::pe::import::ImportLookupTableEntry::parse
11: goblin::pe::import::SyntheticImportDirectoryEntry::parse
12: goblin::pe::import::ImportData::parse
13: goblin::pe::PE::parse
14: goblin::parse
15: rdr::run
16: rdr::main
17: __rust_maybe_catch_panic
18: std::rt::lang_start
19: main
$ git rev-parse --short HEAD
1595f19
Come to think of it, I bet my original executable statically links libcurl, so even though these executables are from totally different environments, that might be a common thread.
See #14.
When we are building the crate / running tests in Fedora, we use release mode. Do you care about overflow? For example, you can trigger one by replacing the 8 bytes at 0x20 (e_phoff) of an ELF-64 binary with 0xff repeated 8 times.
$ RUST_BACKTRACE=1 cargo run --example rdr -- elf
thread 'main' panicked at 'attempt to add with overflow', /home/thomas/.cargo/registry/src/github.com-1ecc6299db9ec823/scroll-0.5.0/src/greater.rs:140
stack backtrace:
10: <[u8] as scroll::greater::TryOffsetWith<Ctx>>::try_offset
at /home/thomas/.cargo/registry/src/github.com-1ecc6299db9ec823/scroll-0.5.0/src/greater.rs:140
11: scroll::greater::Gread::gread_with
at /home/thomas/.cargo/registry/src/github.com-1ecc6299db9ec823/scroll-0.5.0/src/greater.rs:69
12: goblin::elf::program_header::std::ProgramHeader::parse
at src/elf/program_header.rs:142
13: goblin::elf::impure::Elf::parse
at src/elf/mod.rs:156
14: goblin::parse
at src/lib.rs:273
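The trigger described above can be sketched in memory (the 64-byte buffer is just a fake ELF64-header-sized image for illustration):

```rust
use std::convert::TryInto;

// Sketch of the corruption: stamp 0xff over the 8 bytes of e_phoff
// at offset 0x20 of an ELF64 image.
fn corrupt_phoff(elf: &mut [u8]) {
    elf[0x20..0x28].copy_from_slice(&[0xff; 8]); // e_phoff = u64::MAX
}

fn main() {
    let mut image = vec![0u8; 64]; // just an ELF64-header-sized buffer
    image[..4].copy_from_slice(b"\x7fELF");
    corrupt_phoff(&mut image);
    let e_phoff = u64::from_le_bytes(image[0x20..0x28].try_into().unwrap());
    // Any parser that computes e_phoff + x without checked_add overflows.
    assert_eq!(e_phoff, u64::MAX);
}
```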
Goblin assumes that the integers inside the headers have the same byte order as the current platform. This obviously fails when reading ELF files meant for architectures with different endianness. Do you have any plans to add (or accept PRs for) byte-order-aware reading of headers?
Like incoming mach parser, add "zero-copy" implementation to elf.
E.g.:
https://github.com/m4b/goblin/blob/better_mach/src/mach/mod.rs#L21-L33
This will require removal of the try_from API for taking an owned fd and creating the struct, as well as updating the Strtab's to use the lifetime of the Elf struct, and a couple more optimizations we can perform.
In addition, it might be nice to add Exports, Imports, and relocations for lazy parsing, but that can be a future issue.
e.g., EHDR_SIZE -> SIZEOF_EHDR
Right now some of the debug prints are hard to read because they spit out byte arrays as arrays instead of something more readable. In particular this affects things like section names.
I'm trying to use goblin as the object file loader in the gimli crate. Basically all it needs to do is parse the ELF/Mach-O header and return the data for sections with a given name.
However, I'm having trouble getting Mach-O sections, because Segment::sections() returns sections that have the lifetime of the segment, instead of the data. That is, Segment::sections() is defined as:
impl<'a> Segment<'a> {
    pub fn sections<'b>(&'b self) -> error::Result<Vec<Section<'b>>> {
        ...
    }
}
but I want:
impl<'a> Segment<'a> {
    pub fn sections<'b>(&'b self) -> error::Result<Vec<Section<'a>>> {
        ...
    }
}
Fixing this will probably require changing the scroll::Gread trait, but I'm having too much trouble understanding how that works to be able to fix it myself.
For reference, here's how I'm trying to call it:
fn macho_get_section<'a>(macho: &mach::MachO<'a>, section_name: &str) -> Option<&'a [u8]> {
    let segment_name = "__DWARF";
    let section_name = macho_translate_section_name(section_name);
    for segment in &*macho.segments {
        if let Ok(name) = segment.name() {
            if name == segment_name {
                if let Ok(sections) = segment.sections() {
                    for section in sections {
                        if section_name == parse_section_name(&section.sectname[..]) {
                            return Some(section.data);
                        }
                    }
                }
            }
        }
    }
    None
}
---- iter_symbols stdout ----
thread 'iter_symbols' panicked at 'called `Result::unwrap()` on an `Err` value: Malformed("LoadCommandHeader: LC_UNKNOWN size: 1207959552 has size larger than remainder of binary: 8464")', src/libcore/result.rs:906:4
note: Run with `RUST_BACKTRACE=1` for a backtrace.
---- parse_sections stdout ----
thread 'parse_sections' panicked at 'called `Result::unwrap()` on an `Err` value: Malformed("LoadCommandHeader: LC_UNKNOWN size: 1207959552 has size larger than remainder of binary: 8464")', src/libcore/result.rs:906:4
It can parse them without dying, but the filename delimiters are different (it uses #), so it doesn't really parse them at all.