powerfake's People

Contributors

hedayat

Forkers

crackercat

powerfake's Issues

Need help with linker error

Hi,
I am getting this error when calling bind_fakes and couldn't figure out why:

bind_fakes(test_libcli_helper applib wrap_lib)

[ 67%] Generating test_libcli_helper.powerfake.link_flags, test_libcli_helper.powerfake.link_script
objcopy: --redefine-sym: Symbol "__wrap_cli_loop" is target of more than one redefinition
Exception: Running objcopy failed

Can anyone help me with this?

Regards,

Bandu

How do I fake this method?

Hi,
How do I fake this method?
void cli_print(struct cli_def *cli, const char *format, ...);

I get an error when I do it like this:
auto offk = PowerFake::MakeFake<void (struct cli_def *cli, const char *format, ...)>(cli_print,
[]() { cout << "Fake called for overloaded(float)" << endl; }
);

error message:
In instantiation of ‘PowerFake::FakePtr PowerFake::MakeFake(Signature*, Functor) [with Signature = void(cli_def*, const char*, ...); Functor = C_A_T_C_H_T_E_S_T_6()::<lambda()>; PowerFake::FakePtr = std::unique_ptr<PowerFake::internal::FakeBase>]’:
test_libcli_helper.cpp:59:5: required from here
/usr/local/include/powerfake/powerfake.h:1097:17: error: invalid use of incomplete type ‘struct PowerFake::internal::func_cv_processor<void (*)(cli_def*, const char*, ...)>’
1097 | return std::make_unique<internal::Fake>(
| ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from test_libcli_helper.cpp:6:
/usr/local/include/powerfake/powerfake.h:281:8: note: declaration of ‘struct PowerFake::internal::func_cv_processor<void (*)(cli_def*, const char*, ...)>’
281 | struct func_cv_processor;
| ^~~~~~~~~~~~~~~~~
In file included from test_libcli_helper.cpp:6:
/usr/local/include/powerfake/powerfake.h:1098:55: error: invalid use of incomplete type ‘struct PowerFake::internal::func_cv_processor<void (*)(cli_def*, const char*, ...)>’
1098 | internal::Wrapper::WrapperObject(func_ptr), f);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~

Thank you so much.

Bandu

Wrapping calls in the same compilation unit

A limitation of the --wrap method is that it cannot intercept calls to a function made from inside the compilation unit where that function is defined. This can be solved by moving to-be-intercepted functions into their own .cpp files, but that hurts the readability of the original code, requires changing the original code whenever a wrapped function is added to the testsuite, and makes the wrapping easy to break when the original code is changed. So I'd like to see this limitation lifted. I've been digging into the ld sources to find a way around this, and this issue presents a couple of options. Below is a more generic writeup on wrapping, not necessarily limited to PowerFake usage.

How ld does linking

In a gross oversimplification of the tremendously complex linking process, here's what ld does when it links an executable. I've mostly looked at linking elf object files into elf executables (specifically elf64-x86-64), but I think most of this will be similar for other targets as well (and the elf target is what we're using in almost all cases anyway).

Resolving symbols

  • For each object file in turn, each global (exported) symbol in the file's symbol table is considered (elf_x86_64_relocate_section()) and entered into a global (in-memory) symbol table (that starts out empty). Note that the symbol table also contains undefined symbol entries, for symbols that are referenced but not defined.
  • This starts by looking up any existing symbol by the same name in the global symbol table (_bfd_elf_merge_symbol()). If the name is not in the global table yet, it is simply added. If it is already there, the existing symbol and the new symbol are merged (_bfd_elf_merge_symbol() and _bfd_generic_link_add_one_symbol()). This merging handles cases such as a strong symbol replacing a weak symbol (or a weak symbol being discarded when there is already a strong symbol), a strong symbol replacing a previously undefined symbol, raising an error when trying to merge two strong symbols, etc.
  • In any case, the resulting symbol is associated with the symbol table entry in the current file. This resulting symbol may be the same symbol (typical for strong symbols, or weak symbols that are not defined yet), but it can also point to a different symbol (typical for an undefined symbol entry which was already defined in a previous object file).
  • For example, consider processing an UNDEF foo() entry first (from a file that references but does not define foo()). This creates a new UNDEF foo() entry in the global symbol table, which is associated with the entry in the current file's symbol table. Then, in another file that actually defines foo(), a DEF foo() entry is processed. This looks in the global table, finds the existing UNDEF entry, and merges it with the new DEF entry by overwriting the existing UNDEF entry with the new DEF entry. This also causes the (UNDEF) entry in the first file to be associated with this new, merged DEF symbol, so it can be found later.
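
To make the merge rules above a bit more concrete, here is a toy model of the global symbol table. This is only a sketch of the rules as described in this writeup, not ld's actual data structures: `Kind`, `Symbol`, `GlobalTable` and `add_symbol` are names I made up, and the "first weak definition wins" rule is an assumption of the sketch.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Toy model only: the real linker tracks far more state per symbol.
enum class Kind { Undef, Weak, Strong };

struct Symbol {
    Kind kind;
    std::string defined_in;  // object file providing the definition, empty if undefined
};

using GlobalTable = std::map<std::string, Symbol>;

// Enter one symbol from an object file into the global table, merging it
// with any existing entry of the same name.
void add_symbol(GlobalTable& table, const std::string& name, Kind kind,
                const std::string& file) {
    Symbol incoming{kind, kind == Kind::Undef ? "" : file};
    auto it = table.find(name);
    if (it == table.end()) {  // name not seen before: simply add it
        table.emplace(name, incoming);
        return;
    }
    Symbol& existing = it->second;
    if (incoming.kind == Kind::Undef) return;  // a reference never replaces anything
    if (existing.kind == Kind::Strong) {
        if (incoming.kind == Kind::Strong)
            throw std::runtime_error("multiple definition of " + name);
        return;  // weak definition is discarded, strong one stays
    }
    if (existing.kind == Kind::Weak && incoming.kind == Kind::Weak)
        return;  // assumption of this sketch: the first weak definition wins
    existing = incoming;  // definition replaces UNDEF, or strong replaces weak
}
```

Adding an UNDEF foo from a.o and then a strong foo from b.o leaves a single entry for foo that is defined in b.o, which is the example from the last bullet.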

Resolving relocations

  • When you write a function call in the code, the compiler generates a dummy instruction, e.g. call 0x0. It then also leaves an instruction for the linker, saying "Put the address of the global symbol foo at this address (i.e. after the call)". This instruction is called a relocation.
  • In the relocation, "the address of the global symbol foo" is not so explicit. Instead, a relocation refers to an entry in the file's symbol table.
  • If the compilation unit does not define foo() itself, the symbol table contains an UNDEF foo() entry. After resolving symbols, as described above, the relocation is resolved by looking at the associated file symbol table entry, which has an associated global symbol table entry, which is now a DEF foo() entry that tells the linker where foo() is actually defined.
  • If the compilation unit does define foo() itself, the symbol table contains a DEF foo() entry. In most cases, the associated global symbol table entry will be this same DEF foo() entry, so the linker resolves relocations for foo() to the definition in this file itself. There can be exceptions, e.g. when foo() is weakly defined, then the actual foo() to be used by the relocation can still be in a different file.

Implementing --wrap

To implement the --wrap option, the linker changes the second step in the resolving symbols procedure. When it "looks up any existing symbol by the same name in the global symbol table", and the name to look up was passed to --wrap, it actually does a lookup for __wrap_<name> instead. Similarly, when it has to look up __real_<name>, it looks up <name> instead. Simple, but quite powerful.
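
The name substitution described above fits in a few lines. This is a sketch of the rule only, not code from ld; `wrapped_lookup_name` is a hypothetical helper and `wrapped` stands for the set of names passed to --wrap.

```cpp
#include <set>
#include <string>

// Returns the name that is actually looked up in the global symbol table
// for a reference to `name`, given the names passed to --wrap.
std::string wrapped_lookup_name(const std::string& name,
                                const std::set<std::string>& wrapped) {
    const std::string real_prefix = "__real_";
    if (wrapped.count(name))
        return "__wrap_" + name;  // foo -> __wrap_foo
    if (name.compare(0, real_prefix.size(), real_prefix) == 0) {
        std::string original = name.substr(real_prefix.size());
        if (wrapped.count(original))
            return original;      // __real_foo -> foo
    }
    return name;                  // everything else is looked up unchanged
}
```

With `wrapped = {"foo"}`, a reference to `foo` is looked up as `__wrap_foo`, a reference to `__real_foo` is looked up as `foo`, and `bar` is left alone.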

Except that it can only do this for UNDEF entries in the file's symbol table. Consider what would happen otherwise: There is a DEF foo() entry in one file. The global table lookup uses __wrap_foo instead, and finds an existing entry DEF __wrap_foo(), which is the wrapper to be used. Trying to merge these two entries will fail, since both are strong definitions. If you would instead discard the DEF foo() (as if it were weak) and associate the DEF __wrap_foo() entry with the file symbol table entry, then relocations (calls to foo()) in the same compilation unit would correctly resolve to __wrap_foo(). However, you would have discarded the original foo() entry, so you can no longer access it through __real_foo().

I guess the linker could have also (in addition to looking up __wrap_foo() and associating the result with the current entry) put the original DEF foo() entry in the global symbol table, without associating that entry with any symbol table entry in the current file (but putting it out there to be associated with other entries in other files later), but maybe this didn't seem relevant, or maybe this has a ton of unexpected side effects (the linker is horrendously complex after all, I'm just showing the supersimplified version of it here).

Diverting local calls

This analysis does suggest two possible ways you can still divert a call to a locally defined function to some other function:

  • Make the locally defined function weak, so it can be overridden by something else (without using --wrap)
  • Make the relocations point to an UNDEF foo() symbol that can be wrapped, and have a second DEF foo() symbol with the actual definition. This is essentially what you do when you move the definition of foo() into its own source file, but I think this could be done inside a single .o file as well.

In addition, Greg Carter shows that you can also use e.g. -Wl,--defsym,foo=__wrap_foo as a linker option to forcibly replace the foo function with the wrapper. I haven't fully investigated how this works in the linker internals, but I believe that this approach loses access to the original symbol (unless you duplicate it under another name by modifying the object files, as suggested below), so I haven't investigated this option much further.

How to wrap local calls

So, how can you then actually achieve --wrap for local calls? The above suggests some ingredients, below I'll mix those into a couple of different (but similar) approaches.

A downside of all below approaches is that they require inspecting and modifying the object files in the build. A solution where you would not need to inspect the object files at all (or maybe just the object files that contain the wrappers, to get a list of wrapped functions) and handle everything by just adding compiler and/or linker flags would be ideal, but I haven't been able to figure out a way to allow this.

1. Weakening symbols, without --wrap

  • Define the original symbols as weak, or use objcopy --weaken-symbol to do so after compilation (to prevent having to modify the original source files).
  • Define the wrapper using the same name as the original, so it will replace the original.
  • Do not use --wrap anymore.
  • This is essentially what Peter Huewe proposes here, except they also suggest using --globalize-symbol to ensure the symbol is global. I think that's only needed for non-exported symbols (i.e. globals with the static keyword), and making those global might end up creating conflicts that were not previously present, so this must be done with care.
  • Con: This loses access to the original function, so you cannot really "wrap" an existing symbol, only replace it.
  • Con: Must know which symbol is the original and which is the wrapper (to not modify the wrapper).
  • Con: If the original function has weak versions, the processed .o files can no longer be used to link the original executable, without adding the wrappers (since now all versions of the original name are weak, so a different version might be chosen than before making the one strong version weak).
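
For reference, this is roughly what the source-level variant of this option (defining the original as weak) looks like with the GCC/Clang attribute. It is a sketch only: `read_sensor` and `sample_twice` are made-up names, and the attribute is a compiler extension rather than standard C++.

```cpp
// Original code: the definition is marked weak, so a strong definition of
// read_sensor in another object file would take its place at link time.
extern "C" __attribute__((weak)) int read_sensor() {
    return 42;  // "original" behaviour
}

// A caller in the same compilation unit. Because read_sensor is weak, this
// call can still end up at a strong replacement provided elsewhere.
int sample_twice() {
    return read_sensor() + read_sensor();
}
```

Built on its own, `sample_twice()` uses the weak definition. If a test object with a strong `read_sensor` is added to the link, that one should be used instead and the original is no longer reachable, which is the first con listed above.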

2. Weakening symbols and adding a __real_ version, without --wrap

  • Like above: Use objcopy --weaken-symbol to mark the original as weak
  • Use objcopy --add-symbol to add a new symbol called __real_<name> with the same address (i.e. pointing to the same bit of compiled code). This new symbol should ideally be a perfect copy (same section, size, visibility, flags, weakness, etc.), and the copy should be made in all compilation units that have the function (except where the wrapper is defined), so that if the existing symbol already has multiple copies (e.g. a weak and strong version), it will still resolve as without these changes.
  • Like above: Define the wrapper using the same name as the original, so it will replace the original.
  • Call __real_<name> from the wrapper to access the original.
  • This is essentially what Javier Escalada proposes here, except they do not make a perfect copy of the symbol.
  • Con: Wrapper is named the same as original, which can be confusing.
  • Con: Build process must know where wrappers are defined (to not modify the wrapper).
  • Con: Unsure if objcopy can create a proper identical copy, so might require building a custom command or script to parse and modify the elf file.
  • Con: If the original function has weak versions, the processed .o files can no longer be used to link the original executable, without adding the wrappers (since now all versions of the original name are weak, so a different version might be chosen than before making the one strong version weak).
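
The "second name for the same code" part of this option can be tried at source level with the GCC/Clang alias attribute on ELF targets. This is just an analogue of what objcopy --add-symbol would do after compilation, not the proposed build step itself, and `get_tick` is a made-up function.

```cpp
// Original function (hypothetical example).
extern "C" int get_tick() {
    return 1234;
}

// A second symbol that points at the same compiled code. The alias
// attribute is a GCC/Clang extension and needs the target to be defined in
// this same compilation unit.
extern "C" int __real_get_tick() __attribute__((alias("get_tick")));
```

Running nm on the resulting object file should list both `get_tick` and `__real_get_tick` at the same address. A wrapper in another object file could then call `__real_get_tick()` to reach the original.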

3. Splitting symbols, with --wrap

  • For each definition of the original symbol, add an identical copy under the same name, and modify the original one to be an UNDEF instead. This results in two symbols by the same name, where relocations point to the UNDEF one and the DEF one points to the implementation, allowing --wrap to be used as normal.
  • Con: It does not seem objcopy can do this, so this probably requires building a custom command or script to parse and modify the elf file.
  • Con: Having a duplicate name in the symbol table of an .o file might confuse tools? The base specification for the ELF format does not discuss uniqueness of names in the symbol table at all, it seems.
  • Pro: Wrapper is named differently, making it potentially clearer.
  • Pro: Build process can treat all object files equally, since wrappers can be distinguished from the original by their name.

4. Reimplementing --wrap

  • For each definition of the original symbol, add an identical copy named __real_<name> and replace the original entry with an UNDEF __wrap_<name> (so that existing relocations now point to the wrapper).
  • For each undefined reference to the original symbol, also replace that with an UNDEF __wrap_<name> entry.
  • Con: It does not seem objcopy can do this, so this probably requires building a custom command or script to parse and modify the elf file.
  • Pro: Wrapper is named differently, making it potentially clearer.
  • Pro: Build process can treat all object files equally, since wrappers can be distinguished from the original by their name.
  • Pro: No duplicate names in the .o file symbol table.
  • Pro: No dependency on the linker's --wrap implementation.
  • Con: The processed .o files can no longer be used to link the original executable, without adding the wrappers.
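
As a sketch of what the symbol table rewrite for this option would do, here is a toy version that works on a plain list of entries instead of a real ELF file. `Entry` and `rewrite_for_wrap` are hypothetical names, and relocations are not modelled: in a real tool, the relocations that pointed at the original entry would have to be repointed at the new UNDEF entry.

```cpp
#include <set>
#include <string>
#include <vector>

// One (simplified) symbol table entry of an object file.
struct Entry {
    std::string name;
    bool defined;  // true for DEF, false for UNDEF
};

// Rewrites the symbol table of one object file so that all uses of a wrapped
// name go to __wrap_<name> and the original code is kept as __real_<name>.
std::vector<Entry> rewrite_for_wrap(const std::vector<Entry>& table,
                                    const std::set<std::string>& wrapped) {
    std::vector<Entry> out;
    for (const Entry& e : table) {
        if (!wrapped.count(e.name)) {
            out.push_back(e);  // not wrapped: keep unchanged
            continue;
        }
        if (e.defined)
            out.push_back({"__real_" + e.name, true});  // keep the original code reachable
        out.push_back({"__wrap_" + e.name, false});     // existing references now go here
    }
    return out;
}
```

Since the wrapper itself is called `__wrap_<name>` and only refers to `__real_<name>`, the object file that contains it passes through this rewrite unchanged, which is why all object files can be treated equally.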

I'm inclined to further investigate the last option (if we need to modify .o files with custom tooling anyway, might as well do the entire wrap thing ourselves, which might also simplify things because we no longer need to comply with the linker's requirements on __wrap_ naming). However, the fact that these files are no longer usable as part of a regular build is a bit annoying, and might make option 3 more suitable (a quick manual edit using https://elfy.io suggests this indeed works). Option 1 is not feasible, since it does not allow calling the original function, and option 2 feels a bit fragile when it comes to existing weak functions.

Documentation & improvements

Hey @hedayat, nice codebase you made here :-)

I'm looking to use Powerfake in a project of mine, it looks like an ideal way to do testing (or in my case, also simulation of code that normally runs on a microcontroller) without requiring invasive changes in the original codebase.

Since the project lacks documentation, I dove in a bit to figure out how things work, which could maybe lead to a contribution of documentation. However, while digging, I found some things that could maybe be improved too. I've listed them below, along with my notes about how things work (which are more notes than proper documentation, but might be useful for others already).

Improvement ideas

  • If we're messing with object files and linker flags, do we still need
    -Wl,--wrap? Couldn't some fiddling with --defsym and/or
    --redefine-sym achieve the same thing, but with simpler code? I
    haven't thought this through yet, though.

  • Currently, function calls in the same compilation unit as the
    definition are not wrapped. It seems that objcopy
    --globalize-symbol and --weaken-symbol can be used to work around
    this limitation: https://stackoverflow.com/a/46103766/740048 (though
    this uses weak symbols instead of --wrap, so is essentially
    the same as the previous point). To also keep access to the __real_
    symbol, it might need to be manually renamed:
    https://stackoverflow.com/a/60604263/740048. All this is nice for
    simple examples, but might end up a lot more complicated on a
    real-world project, especially if the original codebase also uses
    weak functions (which, I think, might be handled more gracefully
    using --wrap).

  • I don't really like that the bind_fakes_* tool that is run during
    compilation actually links against the main lib (corelib) code, since
    that might cause application/lib code to be run during the build (i.e.
    global constructors). I guess this is needed to allow including
    wrap.cpp in the link, but maybe:

    • wrap.cpp can be compiled twice, once where it only uses the name
      and type of the wrapped function, without also storing its
      address, so it can be linked into bind_fakes_* without also
      having to link corelib?
    • wrap.o can be inspected externally using e.g. objdump or nm,
      rather than being included in the link? This would mean that the
      "generate linker flags" binary could be a more generic tool (not
      tied to a particular project), which could also clean up the build
      process and make it a lot easier to embed into another project.
      This might need some additional symbols to be generated (i.e.
      global string variables instead of / in addition to the current
      arguments to the Wrapper constructor) to make it easier to
      inspect the object file.
  • Terminology is a bit messy here and there. Terms like "Fake",
    "Wrapper", "Wrapper object", "Prototype" seem to be used slightly
    inconsistently and are not entirely self-explanatory (but maybe that's
    just my lack of understanding). I believe some improvement is possible
    here, though I don't understand things well enough to propose
    specific changes yet.

Notes for documentation

A project using powerfake consists of three parts:

  1. The original code. This remains standalone, without a dependency on
    the other parts.

    In the sample, this is functions.cpp and SampleClass.cpp. In the
    sample, this is called corelib, in the bind_fakes cmake
    function, this is called test_lib.

  2. The wrapper code. This part defines which functions must be wrapped
    and defines appropriate wrapper objects. This depends on the
    original code.

    In the sample, this is wrap.cpp. In the sample, this is called
    wrap_lib, in the bind_fakes cmake function, this is called
    wrapper_funcs_lib.

  3. The testing code. This defines the actual tests, and uses the
    wrapper code to selectively replace functions in the original code
    when needed. This depends on both other parts.

    In the sample, this is faked.cpp. In the
    sample, this is called samples, in the bind_fakes cmake
    function, this is called target_name.

Each of these three parts is initially compiled independently.

Build process

The build produces these:

  • libpowerfake.a (contains powerfake.cpp)
  • libpw_bindfakes.a (contains powerfake.cpp, NMSymbolReader.cpp, SymbolAliasMap.cpp, Reader.cpp and bind_fakes.cpp, with -DBIND_FAKES)
  • libcorelib.a (contains functions.cpp, SampleClass.cpp)
  • libwrap_lib.a (contains wrap.cpp)
  • bind_fakes_samples (contains dummy.cpp, libpw_bindfakes.a, libwrap_lib.a (twice?), libcorelib.a, libpowerfake.a (needed?))
  • powerfake.link_flags, produced by running bind_fakes_samples libcorelib.a libwrap_lib.a
  • samples (contains faked.cpp, libwrap_lib.a, libcorelib.a,
    libpowerfake.a, linked using powerfake.link_flags).

bind_fakes_samples:

  • Reads all symbol names from libcorelib.a
    • This either uses nm -po libcorelib.a, or with --passive reads
      from the passed files directly (which should be nm output rather
      than .a files then).
    • With --leading-underscore, leading _ are stripped from all
      symbol names.
    • This just returns the names unprocessed, so these are the mangled
      names.
  • For each real symbol, it finds a corresponding wrapper. This
    demangles the symbol names from nm and does some more processing on
    them, and then matches them to the names passed to the e.g.
    WRAP_FUNCTION macros (also after some processing). These latter
    names are taken from the map kept by the WrapperBase class, which
    is available since libwrap and corelib are both linked into this
    executable.
  • For each wrapped symbol, it writes a -Wl,--wrap=... option with the
    (mangled) name of the original symbol to powerfake.link_flags.
  • Reads all symbol names from libwrap_lib.a
  • For each real (should be undefined) symbol in there, it renames it to
    __real_... to match what ld expects. Similarly, each wrapper
    (should be defined) symbol in there is renamed to __wrap_.... This
    either uses objcopy --redefine-sym to actually modify the .o file, or adds
    -Wl,--defsym to powerfake.link_flags to let the linker do this at
    link time.
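
The "match symbols and write -Wl,--wrap options" step could be sketched like this. This is not the actual bind_fakes code (which does more processing on the names); `demangled_function_name` and `make_wrap_flags` are made-up helpers, and stripping the parameter list at the first `(` is a simplification. `abi::__cxa_demangle` is the demangler from the GCC/Clang C++ runtime.

```cpp
#include <cxxabi.h>

#include <cstdlib>
#include <set>
#include <string>
#include <vector>

// Demangles an Itanium-ABI symbol name and drops the parameter list, e.g.
// "_Z3fooi" -> "foo". Names that do not look mangled are returned unchanged.
std::string demangled_function_name(const std::string& mangled) {
    if (mangled.compare(0, 2, "_Z") != 0) return mangled;  // plain C symbol
    int status = 0;
    char* text = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) return mangled;
    std::string result(text);
    std::free(text);  // the buffer is allocated with malloc
    return result.substr(0, result.find('('));
}

// For every symbol whose demangled name was requested to be wrapped, emit a
// --wrap option using the mangled name, as the linker expects.
std::vector<std::string> make_wrap_flags(
        const std::vector<std::string>& mangled_symbols,
        const std::set<std::string>& wrapped_names) {
    std::vector<std::string> flags;
    for (const std::string& symbol : mangled_symbols)
        if (wrapped_names.count(demangled_function_name(symbol)))
            flags.push_back("-Wl,--wrap=" + symbol);
    return flags;
}
```

Note that matching on the name alone would wrap all overloads of a function, so a real implementation also has to compare signatures.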

C++ Code

WRAP_FUNCTION(...) defines a "wrapper object" and (when !BIND_FAKES)
a "wrapper function" too (though it seems wrap.cpp is always compiled
without -DBIND_FAKES, so the #ifndef BIND_FAKES in powerfake.h is
probably useless).

The "wrapper object" is an instance of PowerFake::internal::Wrapper,
named PowerFakeWrap_alias_<lineno> (first part should be modified when
using different files).

The type of the function is wrapped in a PowerFake::internal::FuncType
for some reason.

This object essentially just (optionally) stores a function pointer to
replace a particular function. The PowerFake::internal::WrapperBase
additionally keeps a registry of all instantiated Wrapper objects, which
can be looked up based on type and address, which is used by MakeFake to
return a Fake object for the right Wrapper.

WrapperBase also has a "WrappedFunctions" prototype registry, but that
seems to be only used (and filled) when BIND_FAKES (though it is
always declared, for no real reason it seems).

The "wrapper function" is a function to replace the original function
using -Wl,--wrap. It basically just looks up the "wrapper object", if
it is callable / set, calls that, otherwise calls the original function.
This uses a temporary name for the wrapper function and real function,
since the __wrap_ and __real_ prefixes must be applied to the
mangled names, not the original names, so this happens in a
post-processing step later. Also, this wrapper function is declared
inside a class template with two versions of which only one is
instantiated, seemingly to support both regular function and member
functions.
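
Stripped of all template machinery and name mangling tricks, the wrapper object / wrapper function pair boils down to something like the sketch below. None of these names are PowerFake's: `ToyWrapper`, `ToyFake`, `wrap_add` and `real_add` are made up for illustration, and the restore-on-destruction behaviour of `ToyFake` is my assumption about how a fake object is meant to be used.

```cpp
#include <functional>
#include <utility>

// Stand-in for the original function (what __real_<name> would refer to).
int real_add(int a, int b) { return a + b; }

// "Wrapper object": optionally stores a replacement for one function.
struct ToyWrapper {
    std::function<int(int, int)> fake;
};
ToyWrapper add_wrapper;

// "Wrapper function" (what __wrap_<name> would refer to): calls the
// replacement when one is set, otherwise falls through to the original.
int wrap_add(int a, int b) {
    if (add_wrapper.fake) return add_wrapper.fake(a, b);
    return real_add(a, b);
}

// Scoped fake: installs a replacement and removes it again on destruction.
class ToyFake {
public:
    explicit ToyFake(std::function<int(int, int)> f) {
        add_wrapper.fake = std::move(f);
    }
    ~ToyFake() { add_wrapper.fake = nullptr; }
    ToyFake(const ToyFake&) = delete;
    ToyFake& operator=(const ToyFake&) = delete;
};
```

While a `ToyFake` is alive, `wrap_add(2, 3)` returns whatever the replacement returns. Before it is created and after it is destroyed, the call ends up in `real_add`.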

Static wrapping in addition to dynamic wrapping

One thing I think might be interesting to support is static wrapping. Right now, only "dynamic" wrapping is supported: You declare some function as wrapped, and you can replace it at runtime. With "static wrapping", I mean to replace some function with some wrapper function unconditionally (provided the object containing the wrapping is included in the link and the needed linker arguments are given, of course).

I can imagine cases where, when running unit tests, you always want to replace some functions since they'll never do something meaningful in a test environment. You can do this by just replacing them at the top of main, but doing it implicitly allows a more declarative syntax for this.

Another case for this could be when you want to use Powerfake in a lighter mode, where you just need to replace some functions. You do not need the full runtime support, but do want to use Powerfake's machinery for figuring out the right wrapper filenames and linker commandline options.

One challenge here is to support actual wrapping: How does the replacement function know the name to use for the original function? I can imagine passing a pointer to it as an extra argument to the wrapper function, or maybe passing the name to the macro that sets this up (so then the user decides the name, and Powerfake must make sure it's set up as an alias of some sort to the real function).
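
To illustrate the "extra argument" variant: the replacement gets a pointer to the original as its first parameter, and some generated glue passes it in. This is only a sketch of the idea, nothing like this exists in Powerfake right now, and all names (`read_register`, `read_register_replacement`, `read_register_entry`) are made up.

```cpp
// Signature of the (hypothetical) original function.
using ReadRegisterFn = int (*)(int);

// Stand-in for the original function in the code under test.
int read_register(int address) { return address * 2; }

// User-written static replacement: receives the original as first argument,
// so it does not need to know under which name the original is reachable.
int read_register_replacement(ReadRegisterFn original, int address) {
    if (address == 0) return -1;  // never touch "hardware" at address 0
    return original(address);     // everything else goes to the original
}

// Glue that would be generated by the wrapping machinery: this is what calls
// to read_register would be diverted to at link time.
int read_register_entry(int address) {
    return read_register_replacement(&read_register, address);
}
```

The upside of this variant is that the user never has to spell out a __real_ name. The downside is that the replacement no longer has the same signature as the function it replaces.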

This is not anything critical, but something that seemed like an interesting idea to share and not forget :-)
