Git Product home page Git Product logo

Comments (17)

wjwwood avatar wjwwood commented on July 18, 2024

To be changed: ros2/ros1_bridge@b41b472#diff-d1066de058a30ec5a5edd4febc16451cR23

from rclcpp.

dirk-thomas avatar dirk-thomas commented on July 18, 2024

Also to be changed: ros2/ros1_bridge@587ec6e#diff-af3b638bc2a3e6c650974192a53c7291R98

from rclcpp.

wjwwood avatar wjwwood commented on July 18, 2024

I'm going to work on this, since my prototyping is now stalled on being able to build things into separate object files and then combine them later into a library or executable.

from rclcpp.

wjwwood avatar wjwwood commented on July 18, 2024

Ok, I've been iterating on things in a simplified example in https://github.com/wjwwood/weak_linking.git.

I have one pr (wjwwood/weak_linking#3) which is (as near as I can tell) functionally the same as what we have now with the implementation moved into a cpp file from the header. It has Linux passing but OS X and Windows failing.

The crux of the issue is that the cpp files in rclcpp will use symbols declared in the headers in rmw, which are defined in an rmw implementation, but the implementation is neither available when building librclcpp nor could we pick which one to use if they were. On Linux this is ok, because the default behavior when linking is to ignore missing symbols. This means that the librclcpp library is incomplete, because it references symbols which have not been resolved yet and instead will try to at runtime. The default on OS X is to require all symbols be resolved at link time by way of providing the libraries which define them as link arguments. So by default this fails with OS X, however, this can be changed.

The next pr (wjwwood/weak_linking#2) tells the OS X linker to handle undefined symbols by looking them up dynamically at runtime. This allows Linux and OS X to pass, but Windows is still lagging behind.

I've tried a lot of things to get this to work on Windows. I'll try to summarize them:

  • Use the /FORCE option for link.exe.
    • This allows librclcpp to link with "warnings" about the missing symbols and for the user's executables to compile and link.
    • When the user executable is run though I get an access violation when trying to call a function from the rmw implementation library.
    • Looking at the executable with Dependency Walker it is linked against both librclcpp and librmw_impl{1 or 2}.
    • I also tried swapping the library link orders.
    • To get this to link against the implementation library I have to also pass /INCLUDE:foo to the link arguments for the user's executable.
    • From https://msdn.microsoft.com/en-us/library/2s3hwbhs.aspx: "This feature is useful for including a library object that otherwise would not be linked to the program."
  • Use the /DELAYLOAD option.
    • I couldn't get this to work, but it could possibly work if we produced a generic librmw_impl which was a copy of what ever specific implementation we wanted to use, since we have to have a dll name to delay loading.
  • I also played with the incremental linking options, but I think that's a dead-end too.
  • I considered true "weak linking" (not dynamic symbol lookup which is what the other pr's are actually doing).

So, as usual Windows is the stick in the mud.


Unless we figure this problem out on Windows, we'll have to restructure how we build the executables and how/when we select the rmw implementation to be used.

One extreme action is that we could restructure our code so that rclcpp depends on rmw_implemenation, which directly depends on all of the default implementations, but only passes one along. Which one it passes along is controlled by a CMake variable and/or by disabling packages which provide alternative implementations. That way librclcpp will be built after the rmw implementation library and can be linked against it.

Pros:

  • Works around the need for dynamic symbol lookup, allowing us to make rclcpp a library.
  • Simpler dependency layout and easier to understand.

Cons:

  • Moves us farther away from ABI compatible switching of implementations (choosing an implementation at runtime rather than build time).
  • Removes the ability to have user code generate binaries for each implementation.
  • Removes the ability to easily automate cross vendor testing.

While we're still testing different DDS and RTPS vendors, the lost features are sort of a deal breaker for right now. So I don't think this is a good solution, but in the long term, when we've settled on one implementation and most people use one at a time, this might be ok.


A slightly different solution might be to make rclcpp depend on rmw_implementations, but instead of picking an implementation before building librclpp, build a version of librclcpp for each implementation and link one of them against the user's executable, similar to how our message package libraries are currently linked.

Pros:

  • Should let us make rclcpp into a library.

Cons:

  • Moves us further away from ABI swappable implementations since we'll have to rely on a symbol in the user's code to help the linker pick the right rclcpp library (same as for the message packages).
  • Generates extra libraries, which increases clutter and build times. (though having rclcpp as a library may decrease the overall build time)

Another idea I have is to use template functions in rclcpp as an indirection to calling the rmw_* functions. This would delay the use of any rmw_* functions until someone uses the rclcpp headers. However, this would not work for intermediate libraries which use rclcpp but are created before an implementation has been chosen. It would also add a lot of clutter to the rclcpp code base with many functions that only call other functions. It might also require us to invent duplicate types to avoid using symbols which are in the rmw implementation, but I'm not sure about that. This is my least baked idea.

from rclcpp.

dirk-thomas avatar dirk-thomas commented on July 18, 2024

Can you please add a list of the symbols which are not available and currently need to be linked weak.

from rclcpp.

wjwwood avatar wjwwood commented on July 18, 2024

In ros2? All rmw_* symbols which would be used in the cpp file of librclcpp. I don't see the point in making a list.

from rclcpp.

wjwwood avatar wjwwood commented on July 18, 2024

To be more specific, it is any symbol used in a cpp file of librclcpp which is declared in a header or is forward declared, but is defined in the implementation specific rmw library or message library.

Off the top of my head rmw_init, rmw_create_node, and rmw_destroy_node were in the list, since the constructor of rclcpp::node::Node was the first thing I tried to put in the rclcpp library. But there are many, which is why I made the simpler example setup in my git repo.

from rclcpp.

dirk-thomas avatar dirk-thomas commented on July 18, 2024

Let's assume we would follow our idea originally from the mindmup to create a dynamic rmw implementation which loads a specific rmw implementation at runtime using dlopen. Then all rmw_* symbols would be available at link time of the user land code while the specific rmw implementation will be selected at runtime e.g. based on an environment variable.

I assume there will still be other symbols in this case which are problematic. I think we should consider those first since they seem to be the stronger constraint.

from rclcpp.

wjwwood avatar wjwwood commented on July 18, 2024

Let's assume we would follow our idea originally from the mindmup to create a dynamic rmw implementation which loads a specific rmw implementation at runtime using dlopen.

Well there are pros and cons to that approach as well, which I think we should consider before assuming that is the way to go. For example, this strategy basically excludes the possibility of a statically linked executable. It also adds a lot of complexity to our code, which is probably why we didn't start out doing it that way. I'm not saying that we shouldn't do it, but it's not clear to me that it's the obvious choice.

I assume there will still be other symbols in this case which are problematic. I think we should consider those first since they seem to be the stronger constraint.

It's hard to know which symbols will be the problems until we do the dlopen implementation. The only ones I can think of off the top of my head are the type support symbols which are used when ever rclcpp uses a message like ParameterEvent or IntraProcessMessage.

from rclcpp.

dirk-thomas avatar dirk-thomas commented on July 18, 2024

I am not suggesting the dlopen approach as the only way to move forward. But I expect the remaining symbols to be even more difficult to address and think we should investigate them first (to know what further constraints exist). It should be easy to figure out this list by just linking against an empty rmw implementation.

from rclcpp.

wjwwood avatar wjwwood commented on July 18, 2024

Rather than make a fake library for rmw, I just tested out having rclcpp depend on rmw_implementation and use the default implementation to build librclcpp. These are the additional missing symbols:

Undefined symbols for architecture x86_64:
  "rosidl_message_type_support_t const* rosidl_generator_cpp::get_message_type_support_handle<rcl_interfaces::msg::ParameterEvent_<std::__1::allocator<void> > >()", referenced from:
      std::__1::shared_ptr<rclcpp::publisher::Publisher> rclcpp::node::Node::create_publisher<rcl_interfaces::msg::ParameterEvent_<std::__1::allocator<void> > >(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, rmw_qos_profile_t const&) in node.cpp.o
  "rosidl_message_type_support_t const* rosidl_generator_cpp::get_message_type_support_handle<rcl_interfaces::msg::IntraProcessMessage_<std::__1::allocator<void> > >()", referenced from:
      ___cxx_global_var_init6 in node.cpp.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [librclcpp.dylib] Error 1
make[1]: *** [CMakeFiles/rclcpp.dir/all] Error 2
make: *** [all] Error 2

Which are symbols generated by message packages for the two messages currently used in rclcpp.

from rclcpp.

wjwwood avatar wjwwood commented on July 18, 2024

I should clarify that these are the only missing symbols so far. I haven't converted all of rclcpp into a library yet. I'll continue working towards that looking for more issues.

from rclcpp.

dirk-thomas avatar dirk-thomas commented on July 18, 2024

That also explains why my class_loader / pluginlib example branch was working before. At that time there was neither intra process comm nor parameter events.

from rclcpp.

wjwwood avatar wjwwood commented on July 18, 2024

Summarizing our meeting.


We have two main use cases:

  • Single node as an executable.
    • This node executable chooses an rmw implementation at link time, by linking to exactly one rmw implementation.
    • It should an option to statically link everything into this executable (at least on Linux and OS X).
  • Single node as a shared library, loaded dynamically into a container with other nodes.
    • This node shared library does not pick an rmw implementation at link time, and instead the container executable picks the rmw implementation.
    • The container executable would be an example of the single node as an executable.

The single node as an executable is what we do now. The decision of what rmw implementation to use is deferred until you link the executable. This works now because we don't have any intermediate libraries, i.e. there are no libraries between the rmw interface and the user's code, there are only headers. In order to turn rclcpp into a library, or in order to build intermediate libraries, we need to be able to build an "incomplete" shared library with unresolved symbols (rmw symbols) which are resolved when the user's executable is built and linked against an rmw implementation. This has to be balanced with the ability to build statically linked binaries (to support building images for embedded devices), but is probably only needed on Linux and maybe OS X.

The intermediate libraries, like librclcpp, are in the same category as the second use case since a node shared library is just another intermediate library. Both will be useless without being later linked against an rmw implementation and both will defer that until runtime by ignoring unresolved symbols at link time. The difference between them is that the node shared library is additionally designed to be loaded dynamically at run time. The missing symbols will be resolved by linking the program which dynamically loads the node against an rmw implementation. That way the missing rmw symbols are implicitly resolved since the process which loads the node shared library has already selected an implementation.

Other alternatives and design related decisions that we discussed in the meeting:

  • We considered having rmw implementations specific intermediate libraries (similar to how we can create one executable for each implementation), but we decided that required too much duplication and CMake infrastructure.
  • We considered simply not supporting multi vendor support in one workspace on Windows (since it lacks the linker functionality we use to accomplish this on OS X and Linux), but we'll save that as a last resort since cross-vendor testing is still useful on Windows.
  • We need to test out building a static library.
    • For this we'll likely need to be able to get rmw implementation specific versions of the message libraries through CMake. (currently they pass all implementation libraries and the linker picks the right one based on the rmw implementation library which is linked)

First we need to prototype the two possible solutions on Windows:

  • Implement an rmw implementation which is empty, but at runtime dynamically loads one of the rmw implementations (a la dlopen) and redirects library calls to the dynamically loaded library.
    • Link this against rclcpp and other intermediate libraries to make Windows happy.
    • This could be useful on Linux and OS X too.
    • Need to link against the correct message libraries, a similar shim maybe required for each message package too.
  • Implement a solution where each rmw implementation generates a library named librmw_implementation.dll, which get installed to name spaced folders, instead of each of them creating a name spaced library, e.g. librmw_opensplice_cpp.
    • Intermediate libraries would be linked against an empty copy of librmw_implementation and at runtime the actual implementation would be picked by swapping out the version of librmw_implementation or by adjusting the PATH so that the right dll is found.
    • This either requires a change in how we distribute and run our executables or it will mean that Windows will be asymmetric to our Linux and OS X installation layout.

After we prototype those to options and make sure they're possible, we'll have to pick one and move forward.

from rclcpp.

wjwwood avatar wjwwood commented on July 18, 2024

@codebot pointed me to http://alain.frisch.fr/flexdll.html which looks promising. I'm still working on a prototype that uses this to accomplish what we want on Windows.

from rclcpp.

wjwwood avatar wjwwood commented on July 18, 2024

Ok, I've been working a couple different options since my last comment. I've also brought in @gerkey and @codebot to help figure out what the right path forward is.


First, the flexdll path had some issues. I think the technical solution might work, but integrating it into the CMake linking step and convincing msbuild to do something other than link.exe is proving to be very difficult and messy. Also @gerkey is concerned about relying on an alternative linker implementation (flexdll is an alternative to link.exe, implemented in OCaml), and I tend to agree with him since I ran into one issue already with using it in combination with VS2015 and I'm not convinced I could fix a blocking problem if it came up. Basically it's not something we want to depend on nor something that we want to be in the position of maintaining/fixing ourselves in the future.

We also will avoid doing the solution where we link against a truly empty implementation on Windows and change which implementation gets loaded using the PATH. We did this because it would make the workflow and output different on Windows from *nix platforms.


So we talked about other options and came up with three paths forward:

  • Dynamically Loading rmw (and rosidl_typesupport) Implementations

I mentioned this before in this thread (#48 (comment)). In this solution we would make a new rmw implementation and a corresponding rosidl_typesupport implementation which only passes function calls through to a dynamically loaded rmw and a corresponding rosidl_typesupport implementation. This would mean a new rmw shared library and a new shared library for each message package. We could enable them only on Windows and on Windows we would take these are additional link libraries in places where we would not normally link an rmw implementation, like in rclcpp or a user's node shared library.

The trick is communicating what rmw and rosidl_typesupport libraries to load to this dynamic shim implementation. This is where the most risk is at and where the most work would be required. We could pass this information in an extern string, which is defined in an rmw implementation library. Or we could convey it in environment variables or some other mechanism.

The end result would be that it looks and behaves very similar to everything on Linux/OS X and doesn't move us away from the goal of swapping implementations at runtime (it might actually get us a little closer). The down side is that it will take the most time and presents to most risk moving forward.

  • Restructure dependencies and generate copies of everything for each vendor

In this strategy we would restructure our dependencies (on all platforms) so that the decision of which implementation is used is never deferred, i.e. all rmw implementations are built before anything that uses rmw is built. Therefore anytime an executable or library needs to be linked against an rmw implementation, the executable or library is forked into a version for each rmw implementation. For example, rclcpp would generate a shared library for each vendor (opensplice, connext, etc) and provide infrastructure to downstream projects so they can get the right library. Similarly user's nodes would generate a shared library for each implementation as well. We can keep the cmake interfaces clean by providing libraries for the default implementation through the "normal" find_package call.

The idea behind this is that we only need to generate binaries for multiple implementations for software on which we want to do cross-vendor testing. Doing cross-vendor testing of higher level software would require that software and all the software it depends on to check for and generate multiple binaries, which means higher level software would probably not do it.

This approach is basically the brute force approach and resolves the problem but at a higher overhead. It also moves us further away from the goal of ABI switchable middleware's on Linux and OS X, where our current approach works.

  • Work around the issue on Windows

In this approach, we leave Linux and OS X as is (with any fixes to get it working, e.g. -undefined dynamic linker flag on OS X) and just work around the issue on Windows. This means we'll abandon (for now) cross vendor testing in a single workspace on Windows.

This would be accomplished by having rclcpp (and/or any other packages which use rmw headers directly) also find and use rmw_implementation (or something like it but without the direct dependencies on the default rmw implementations), but only do this on Windows. rclcpp (and/or others) will then get the default rmw implementation, link against it to resolve its needed symbols, and pass it along as a transitive dependency.

For this to work we need the default rmw implementation to be built before rclcpp, so we can either enforce that by depending on rmw_implementation directly or implicitly. The downside to having the direct dependency is that it affects the build order even on Linux and OS X. I don't think this is a problem, but we could also say that on Windows you have to ensure that the rmw implementation you're building for is built before everything else, either with workspace chaining or using the build tool's selective building options. This way the dependency is implicit, and the benefit is that it really doesn't affect Linux or OS X at all, but the downside is that building Windows from source becomes more tricky because you need to make sure you build the rmw implementation first. That's easy to do on the farm, but harder to remember when doing it manually.

Finally we would make rmw_implementation fail if more than one rmw implementation option is available. That way packages which can build a binary for each implementation don't try to do so.


So after talking with the group (@gereky, @codebot, @tfoote, @jacquelinekay, and @esteve) on the whiteboard we decided to take the work around on Windows for now so that we can unblock other tasks waiting for rclcpp/rcl as a library and make our imminent release deadline.

Later we can continue to work towards finding a way to bring Windows back to cross vendor testing, with either one of the other options. We just concluded that both would take more time than we have at the moment.

I'll also point out that before deciding to work around this linking issue I spent a few days to get class_loader, it's tests, and a more complete "integration" test working on Windows to ensure this problem would not impact our ability to plugins. It does not and I'm almost done fixing up class_loader so it can work on all the platforms in ros/class_loader#27.


So next for me is to:

  • Finish ros/class_loader#27
  • Refactor cmake login in rmw_implementation into a separate package (rmw_implementation_cmake?).
  • Restructure dependencies such that rmw_implementation depends on rmw_implementation_cmake and the default rmw_* implementation packages.
  • Add a dependency on rmw_implementation_cmake from rclcpp
    • If we decided to enforce the dependency on the implementations, add a dependency on rmw_implementation from rclcpp.
  • In rclcpp (everything is conditioned on if (WIN32):
    • Find rmw_implementation_cmake, and use it to find the default rmw implementation.
    • Find the default rmw implementation.
    • Link against the default implementation.
    • Pass link flags for default implementation along transitively for downstream libraries and executables.
  • Convert rclcpp into a library and test.

Open questions:

  • Do we add the explicit dependency from rclcpp to rmw_implementation (all of the rmw implementations)?
  • Should put the logic of finding and passing along the default implementation within rmw instead of rclcpp?

from rclcpp.

dirk-thomas avatar dirk-thomas commented on July 18, 2024

I understand why the easy path was chosen but I disagree with the reasoning behind it. This is not about adding a little bit more of technical debt which we can easily pick up later (likely very much later). This is affecting a core part of our software architecture. Imo delaying it will inflict a significantly higher cost later.

Looking at our Windows CI job which has an ever increasing number of failing tests (more then 70 by now) I think it is the wrong direction to cut off cross-vendor testing on Windows.

If the majority thinks we should proceed with this path I can only highly recommend to schedule a work item for a sustainable solution on the roadmap for one of the very next alphas.

from rclcpp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.