Git Product home page Git Product logo

Comments (5)

hedayat avatar hedayat commented on June 18, 2024

Thanks! I'll try to go through your post one by one.

BTW, about docs, as you are both interested in using such a tool & improving it, you are talking about both. But it'd be much better to address these separately; so that developer docs/discussion is separate from user docs. And I should confess that I've tried much more to provide a good API for the end user, than to clean up the internal implementation (and so, naming of the internal stuff might be worse than that of user API).

If we're messing with object files and linker flags, do we still need
-Wl,--wrap? Couldn't some fiddling with --defsym and/or
--redefine-sym achieve the same thing, but with simpler code? I
haven't thought this through yet, though.

Well, currently the only object file which is modified (or --defsym is used for) is the one(s) with WRAP_FUNCTION...() calls, usually just wrap.cpp. No other files are touched. And I really prefer to limit its use as much as possible. Ideally, I prefer to do most things in C++ as much as possible.

Actually, one of the things I'm thinking to look into is to see if I can generate a header file which includes final names rather than using --redefine-sym/--defsym; so that we don't bother with object files at this step even more.

... weak symbols..

About weak symbols, well we are discussion in issue #2, but briefly: I don't like it to be used to replace --wrap, but if we can come up with a clean solution to use it for functions in the same TU, it can be looked into seriously.

I don't really like that the bind_fakes_* tool that is run during
compilation actually links against the main lib (corelib) code, since
that might cause application/lib code to be ran during build (i.e.
global constructors). I guess this is needed to allow including
wrap.cpp in the link, but maybe:
...

Hmm... it was certainly needed at some point, and I never thought that it might be a problem. Yeah, it might be possible to fix. I'll try to look into it soon, and if it seems possible, either fix it or create an issue for it. I prefer to not try to inspect it externally; and I hope it is possible.

Terminology is a bit messy here and there. Terms like "Fake",
"Wrapper", "Wrapper object", "Prototype" seem to be used slightly
inconsistent and not entirely self-explanatory (but maybe that's just
my lack of understanding). I believe some improvement is possibly
here, though I don't understand things well enough to propose
specific changes yet.

Yeah, that is very likely. Specially since the code has been gone through a number of changes which affected the naming. But I suggest to discuss user facing API and internal API (most of which is in internal namespace). User API is much more important, and I hope it looks good. But if you have any suggestions, it'd be great.
Internal API, well you are probably right. I'll try to read it again with a fresh eye, but I certainly like to see how someone else thinks and if there are suggestions.

from powerfake.

matthijskooijman avatar matthijskooijman commented on June 18, 2024

But it'd be much better to address these separately; so that developer docs/discussion is separate from user docs. And I should confess that I've tried much more to provide a good API for the end user, than to clean up the internal implementation (and so, naming of the internal stuff might be worse than that of user API).

Yeah, this is a good point, but I'm still in the "trying to understand it all" phase, where anything helps :-p

Well, currently the only object file which is modified (or --defsym is used for) is the one(s) with WRAP_FUNCTION...() calls, usually just wrap.cpp. No other files are touched. And I really prefer to limit its use as much as possible. Ideally, I prefer to do most things in C++ as much as possible.

Good point, I think I'd agree there (except that maybe, to allow same-source-file-calls, all object files need to be modified as suggested in #2, but it would be good if that stays a separate and optional step).

Actually, one of the things I'm thinking to look into is to see if I can generate a header file which includes final names rather than using --redefine-sym/--defsym; so that we don't bother with object files at this step even more.

Oh, that's an interesting thought indeed. So you'd compile wrap.cpp once to gather info, write out a .h file with the needed naming and then compile it again to actually declare the needed stuff. I like that thought.

I prefer to not try to inspect it externally; and I hope it is possible.

Why not? To me, doing things external means you could put things in an external, generic tool, and never have to actually run any "user" code as part of the build process (not even just the wrap.cpp, which could be ok, though).

(PS You're missing some empty lines after quotes in your previous message, so some of your replies are shown as quotes now)

from powerfake.

matthijskooijman avatar matthijskooijman commented on June 18, 2024

Thinking on this for just a little more, I wonder about the generation of names. I suspect that the central challenge here is to convert a name and/or reference to some C++ function into the actual linker filename that it will be using, right?

  • Ideally, you'd just pass a fullly-qualified name (e.g. SampleClass::CallThis to e.g. WRAP_FUNCTION() and then convert that to __wrap__ZNK8FakeTest11SampleClass8CallThisEv so you can define the wrapper function by that name.
  • Maybe this conversion can be implemented using C++ templates and constexpr functions, but you need to name your function using this result, which AFAIK cannot be done in C++. So, you'll need to build the name of this wrapper using the preprocessor, which pretty certainly will not be able to replicate the mangling needed.
  • So, instead, you generate an arbitrary but unique name, store the original qualified name as a string, and execute the generated code to access the original name so you can generate the mangled name and use objcopy or --defsym to rename the symbol to the right name.

Looking at the code, I see that it also inspects the original (corelib) object files for the names in there. Is this needed because it is impossible to fully predict the mangled names based on the original names? So you need to (somewhat fuzzily) cross-match against the actual names to figure out what actual names have been chosen?

If so, then maybe it would be sufficient to just inspect the wrap.o after all, since that actually also contains the referenced symbols as undefined references:

 nm sample/libwrap_lib.a |grep CallThisEi
                 U __real__ZN8FakeTest12SampleClass28CallThisEi
0000000000000000 W __wrap__ZN8FakeTest12SampleClass28CallThisEi
                 U _ZN8FakeTest12SampleClass28CallThisEi

(note that this is post-renaming, so __real_ and __wrap_ are already renamed, but the original U symbol was already present before the renaming).

Or is there something else (other than symbol names) that you're using from the corelib object files?

Note that this approach might somewhat complicate my original suggestion of not linking against the corelib for the command-to-be-executed at build time, since the above only works when it does reference corelib, which then also (I think) requires linking against corelib. So then you'd get three versions of the wrap.o file:

  1. With references to corelib, but with temporary names. Intended to be inspected and modified.
  2. Without references to corelib, still with temporory names. Intended to be ran.
  3. With references to corelib but with the final names. This is 1. with modifications, or 1. recompiled with the header-with-names you suggested.

Using an external tool to inspect wrap.o, rather than executing it, could maybe remove the need for 2., though. Or maybe some making these weak undefined references (I think that's a thing) could allow linking the command-to-be-executed without actually linking to corelib, so 1. can be used instead of 2.? Or maybe some "ignore undefined references" linker option for that command-to-be-executed only (though that's likely to mask other, serious, linker errors).

Actually, one of the things I'm thinking to look into is to see if I can generate a header file which includes final names rather than using --redefine-sym/--defsym;

I wonder if, rather than recompiling wrap.cpp with a generated header, you could somehow add another .o file into the link that handles the renaming. I've read things about "indirect symbols", which seem to be essentially symbols pointing to other symbols, that might serve this purpose, but I'm not sure how to generate these yet (maybe adding a linker script as one of the "objects" to link could work, though that is essentially what --defsym also does, I think, or maybe something with gcc's alias attribute, though that probably requires knowing the exact type of the alias).

If you happen to remember why --defsym didn't always work, that might be useful input here as well :-p

from powerfake.

hedayat avatar hedayat commented on June 18, 2024

Thinking on this for just a little more, I wonder about the generation of names. I suspect that the central challenge here is to convert a name and/or reference to some C++ function into the actual linker filename that it will be using, right?

* Ideally, you'd just pass a fullly-qualified name (e.g. `SampleClass::CallThis` to e.g. `WRAP_FUNCTION()` and then convert that to `__wrap__ZNK8FakeTest11SampleClass8CallThisEv` so you can define the wrapper function by that name.

* Maybe this conversion can be implemented using C++ templates and constexpr functions, but you need to _name_ your function using this result, which AFAIK cannot be done in C++. So, you'll need to build the _name_ of this wrapper using the preprocessor, which pretty certainly will not be able to replicate the mangling needed.

* So, instead, you generate an arbitrary but unique name, store the original qualified name as a string, and execute the generated code to access the original name so you can generate the mangled name and use `objcopy` or `--defsym` to rename the symbol to the right name.

Yes, that's right (Sorry for no documentation :( ). If there were a method to generate mangled names at compile time, all things would be much simpler. The whole bind_fakes process is to match mangled names of functions to actual functions and provide their names to wrap.o. Or if ld supported C++ directly so we didn't need mangled names...

Looking at the code, I see that it also inspects the original (corelib) object files for the names in there. Is this needed because it is impossible to fully predict the mangled names based on the original names? So you need to (somewhat fuzzily) cross-match against the actual names to figure out what actual names have been chosen?

If so, then maybe it would be sufficient to just inspect the wrap.o after all, since that actually also contains the referenced symbols as undefined references:
...

Or is there something else (other than symbol names) that you're using from the corelib object files?

Well, my poor documentation again :( even at the commit level. I should dig to see if I've said something in commit messages, but what I remember is that while wrap.o includes most of the symbols, sometimes it doesn't include all of them. I'm pretty sure that I've tried inspecting just wrap.o before. I must document even my failed attempts :(

Maybe it was needed for mingw. I've taken some steps to support mingw too, but it not complete IIRC! For example, the --leading-underscore option is specifically for how mingw mangles function names. I'm not sure this part is related to mingw though. Well, probably it should be tested again. I've a number of projects in which I've used powerfake, so I'll try them soon and see if I can see when wrap.o doesn't provide needed symbols.

Note that this approach might somewhat complicate my original suggestion of not linking against the corelib for the command-to-be-executed at build time, since the above only works when it does reference corelib, which then also (I think) requires linking against corelib. So then you'd get three versions of the wrap.o file:

1. With references to corelib, but with temporary names. Intended to be inspected and modified.

2. Without references to corelib, still with temporory names. Intended to be ran.

3. With references to corelib but with the final names. This is 1. with modifications, or 1. recompiled with the header-with-names you suggested.

Using an external tool to inspect wrap.o, rather than executing it, could maybe remove the need for 2., though. Or maybe some making these weak undefined references (I think that's a thing) could allow linking the command-to-be-executed without actually linking to corelib, so 1. can be used instead of 2.? Or maybe some "ignore undefined references" linker option for that command-to-be-executed only (though that's likely to mask other, serious, linker errors).

I can't comment on this right now. Should spend some time for it. I'm thinking it might be possible for wrap.cpp to not actually reference functions in other libraries when compiled for bind_fakes (Needs two separate compilation of wrap.cpp, but I think that is OK). But Im not sure if a completely generic binary is possible. Because we should match temporary names with actual names and mangled names. Should look into it, maybe we can find some solution. BTW, I'm mainly thinking about not linking with corelib, rather than being generic; although a generic tool might be better specially if it doesn't increase dependency on external tools very much.

Actually, one of the things I'm thinking to look into is to see if I can generate a header file which includes final names rather than using --redefine-sym/--defsym;

I wonder if, rather than recompiling wrap.cpp with a generated header, you could somehow add another .o file into the link that handles the renaming. I've read things about "indirect symbols", which seem to be essentially symbols pointing to other symbols, that might serve this purpose, but I'm not sure how to generate these yet (maybe adding a linker script as one of the "objects" to link could work, though that is essentially what --defsym also does, I think, or maybe something with gcc's alias attribute, though that probably requires knowing the exact type of the alias).

If you happen to remember why --defsym didn't always work, that might be useful input here as well :-p

That'd be great too. Yeah, I also thought about link script, but when --defsym didn't work, I also get unhappy. (Well, I also remember experimenting with link scripts... but not documented as expected :D )
About --defsym, well I don't know why it didn't work! What I don't remember is when it didn't work. I'll try to experiment again with it and document the results for future :)

from powerfake.

hedayat avatar hedayat commented on June 18, 2024

In the latest version, it is now possible to build bind_fakes as an generic & independent tool when using "standalone mode". However, it is still experimental, and in some situations it might need users help to be able to match symbols with actual functions, since the name available at compile time doesn't always match the demangled name of a type.

from powerfake.

Related Issues (3)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.