ros2 / ros2_tracing Goto Github PK
View Code? Open in Web Editor NEWTracing tools for ROS 2.
Home Page: https://docs.ros.org/en/rolling/
License: Apache License 2.0
Tracing tools for ROS 2.
Home Page: https://docs.ros.org/en/rolling/
License: Apache License 2.0
As first discussed in safe-ros/ros2_profiling#1
The idea would be to read raw CTF traces into some intermediate time-series data that is well suited for analysis tasks. Further high-level APIs could be built to ingest the intermediate data.
General requirements:
tracetools_read
is currently using Pandas Dataframes
Proposed alternatives:
CC: @iluetkeb
Investigate using babeltrace2
for ingesting trace files from disk: https://github.com/efficios/babeltrace
(Issue moved from GitLab, see: https://gitlab.com/ros-tracing/ros2_tracing/-/issues/45)
Hello, I want to know if ros2_tracing support to trace rclpy? Lttng tools is too weak for python.
(Issue moved from GitLab, see: https://gitlab.com/ros-tracing/ros2_tracing/-/issues/133)
Ubuntu 22.04 (for ROS 2 Humble in May 2022 and later) has LTTng-UST 2.13. Althought it's still backwards-compatible, macros and functions were renamed to add a LTTNG_UST_
/lttng_ust_
prefix, e.g.:
tracepoint()
-> lttng_ust_tracepoint()
TRACEPOINT_EVENT()
-> LTTNG_UST_TRACEPOINT_EVENT()
see:
(Issue moved from GitLab, see: https://gitlab.com/ros-tracing/ros2_tracing/-/issues/142)
(Issue moved from GitLab, see: https://gitlab.com/ros-tracing/ros2_tracing/-/issues/139)
I'm trying to run tracing for my ROS2 node. I was able to get the tracetools_analysis
working and I can get the callback durations. I would also like to get further detail about my node. Could you answer the following questions? Thanks!
ros2_tracing
support tracing my own non-ROS2 functions?ros2_tracing
in a tool like tracecompass
?tracecompass
only has a list of events and event densitySo far I've just used this launch file. Do I need to add tracepoints within my code or something like that?
import os
import launch
from ament_index_python.packages import get_package_share_directory
from launch_ros.actions import Node
from tracetools_launch.action import Trace
def generate_launch_description():
"""Generate launch description."""
return launch.LaunchDescription(
[
Trace(
session_name='my_session',
events_kernel=['sched_switch'],
),
Node(
package="my_package",
executable="my_executable",
name="some_name",
)
]
)
https://github.com/christophebedard/ros-world-2021-demo
https://gitlab.com/ros-tracing/ros2_tracing/-/raw/master/doc/2021-10-20_ROS_World_2021_-_Tracing_ROS_2_with_ros2_tracing.pdf
In tracetools_trace
:
15:13:39 tracetools_trace/tools/lttng_impl.py:18
15:13:39 /tmp/ws/src/ros2_tracing/tracetools_trace/tracetools_trace/tools/lttng_impl.py:18: DeprecationWarning: The distutils package is deprecated and slated for removal in Python 3.12. Use setuptools or check PEP 632 for potential alternatives
15:13:39 from distutils.version import StrictVersion
Looks like there are a couple options.
As mentioned in #40 (and in the README), we need to start tracing before the application launches to collect initialization information/metadata.
We current workaround is to start tracing before the application launches, then stop after 30-60 s, and then later on (when needed) start tracing again. This still requires us to actually set up tracing before the application launches.
One possible solution would be to store the metadata and re-emit it at regular intervals. Then we'd simply need to record one metadata "burst."
Since #31 (I think it's this one) this no longer builds on macos:
% colcon build --symlink-install --merge-install --cmake-args -DPython3_EXECUTABLE=`which python3` -DPYTHON_EXECUTABLE=`which python3` --mixin compile-commands debug --event-handlers console_direct+ console_stderr- --packages-select tracetools
Starting >>> tracetools
Not searching for unused variables given on the command line.
-- Found ament_cmake_ros: 0.11.2 (/Users/william/2011_ws/install/share/ament_cmake_ros/cmake)
-- Override CMake install command with custom implementation using symlinks instead of copying resources
-- Checking for module 'lttng-ust'
-- No package 'lttng-ust' found
CMake Error at /opt/homebrew/Cellar/cmake/3.23.2/share/cmake/Modules/FindPkgConfig.cmake:605 (message):
A required package was not found
Call Stack (most recent call first):
/opt/homebrew/Cellar/cmake/3.23.2/share/cmake/Modules/FindPkgConfig.cmake:827 (_pkg_check_modules_internal)
CMakeLists.txt:36 (pkg_check_modules)
-- Configuring incomplete, errors occurred!
See also "/Users/william/2011_ws/build/tracetools/CMakeFiles/CMakeOutput.log".
See also "/Users/william/2011_ws/build/tracetools/CMakeFiles/CMakeError.log".
Failed <<< tracetools [0.45s, exited with code 1]
Summary: 0 packages finished [1.09s]
1 package failed: tracetools
1 package had stderr output: tracetools
Reverting to e664e5b let me build cleanly (with some compiler warnings):
% colcon build --symlink-install --merge-install --cmake-args -DPython3_EXECUTABLE=`which python3` -DPYTHON_EXECUTABLE=`which python3` --mixin compile-commands debug --event-handlers console_direct+ console_stderr- --packages-select tracetools
Starting >>> tracetools
Not searching for unused variables given on the command line.
-- Found ament_cmake_ros: 0.11.2 (/Users/william/2011_ws/install/share/ament_cmake_ros/cmake)
-- Override CMake install command with custom implementation using symlinks instead of copying resources
-- Checking for module 'lttng-ust'
-- No package 'lttng-ust' found
-- Found ament_lint_auto: 0.14.0 (/Users/william/2011_ws/install/share/ament_lint_auto/cmake)
-- Found gtest sources under '/Users/william/2011_ws/install/src/gtest_vendor': C++ tests using 'Google Test' will be built
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Added test 'copyright' to check source files copyright and LICENSE
-- Added test 'cppcheck' to perform static code analysis on C / C++ code
-- Configured cppcheck include dirs:
-- Configured cppcheck exclude dirs and/or files:
-- Added test 'cpplint' to check C / C++ code against the Google style
-- Configured cpplint exclude dirs and/or files:
-- Added test 'lint_cmake' to check CMake code style
-- Added test 'uncrustify' to check C / C++ code style
-- Configured uncrustify additional arguments:
-- Added test 'xmllint' to check XML markup files
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/william/2011_ws/build/tracetools
[ 23%] Building CXX object gtest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
[ 23%] Building C object CMakeFiles/tracetools.dir/src/tracetools.c.o
[ 23%] Building CXX object CMakeFiles/tracetools.dir/src/utils.cpp.o
...
I know macos isn't a tiered platform anymore, but it would be nice to have it compile still. I briefly glance at the pr, but I wasn't able to see what should change quickly, so I decided to just open an issue for now.
Looking at the nightly builds, it looks like the new lttngpy
package fails to build on Windows Debug: https://ci.ros2.org/view/packaging/job/packaging_windows_debug/2070/
In particular, the end of the trace says:
--- output: lttngpy
Not searching for unused variables given on the command line.
-- Selecting Windows SDK version 10.0.19041.0 to target Windows 10.0.20348.
-- The C compiler identification is MSVC 19.29.30153.0
-- The CXX compiler identification is MSVC 19.29.30153.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/BuildTools/VC/Tools/MSVC/14.29.30133/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found ament_cmake: 2.3.2 (C:/ci/ws/install/share/ament_cmake/cmake)
-- Found Python3: C:/Python38/python.exe (found version "3.8.3") found components: Interpreter
-- Found Python3: C:/Python38/python.exe (found version "3.8.3") found components: Interpreter Development Development.Module Development.Embed
-- Found pybind11_vendor: 3.1.1 (C:/ci/ws/install/share/pybind11_vendor/cmake)
-- Performing Test HAS_MSVC_GL_LTCG
-- Performing Test HAS_MSVC_GL_LTCG - Success
-- Found pybind11: C:/ci/ws/install/opt/pybind11_vendor/include (found version "2.9.1")
-- Configuring done
-- Generating done
-- Build files have been written to: C:/ci/ws/build/lttngpy
Microsoft (R) Build Engine version 16.11.2+f32259642 for .NET Framework
Copyright (C) Microsoft Corporation. All rights reserved.
Checking Build System
Building Custom Rule C:/ci/ws/src/ros2/ros2_tracing/lttngpy/CMakeLists.txt
running egg_info
creating lttngpy.egg-info
writing lttngpy.egg-info\PKG-INFO
writing dependency_links to lttngpy.egg-info\dependency_links.txt
writing top-level names to lttngpy.egg-info\top_level.txt
writing manifest file 'lttngpy.egg-info\SOURCES.txt'
reading manifest file 'lttngpy.egg-info\SOURCES.txt'
writing manifest file 'lttngpy.egg-info\SOURCES.txt'
Building Custom Rule C:/ci/ws/src/ros2/ros2_tracing/lttngpy/CMakeLists.txt
Building Custom Rule C:/ci/ws/src/ros2/ros2_tracing/lttngpy/CMakeLists.txt
_lttngpy_pybind11.cpp
status.cpp
LINK : fatal error LNK1104: cannot open file 'python38.lib' [C:\ci\ws\build\lttngpy\_lttngpy_pybind11.vcxproj]
---
Failed <<< lttngpy [41.5s, exited with code 1]
@christophebedard can you take a look?
(Issue moved from GitLab, see: https://gitlab.com/ros-tracing/ros2_tracing/-/issues/65)
For an implementation with babeltrace2, see safe-ros/ros2_profiling#25
(Issue moved from GitLab, see: https://gitlab.com/ros-tracing/ros2_tracing/-/issues/134)
Here and in all callers:
ros2_tracing/tracetools_trace/tracetools_trace/tools/lttng_impl.py
Lines 100 to 103 in 0ecf505
Search for ":param context_names: DEPRECATED, use context_fields instead" and see d196944
(Issue moved from GitLab, see: https://gitlab.com/ros-tracing/ros2_tracing/-/issues/129)
Related with this build https://ci.ros2.org/view/nightly/job/nightly_linux_clang_libcxx/1846/clang-tidy/new/folder.106785574/
must specify at least one argument for '...' parameter of variadic macro
According with the documentation the warning is expected.
Warning is generated here
(Issue moved from GitLab, see: https://gitlab.com/ros-tracing/ros2_tracing/-/issues/138)
ros2 trace
creates a tracing session, then tries to enable userspace & kernel events and add context field types. If something in there fails, then the tracing session still exists. Having something linger on in case of a failure is annoying.
For example, when enabling a context type that isn't recognized:
$ ros2 trace -l -c 'unknown' -s test
UST tracing enabled (52 events)
...
kernel tracing disabled
context (1 names)
unknown
writing tracing session to: /home/$USER/.ros/tracing/test
press enter to start...
failed to find context type: unknown
$ lttng list
Available recording sessions:
1) test [inactive]
Trace output: /home/$USER/.ros/tracing/test
Use lttng list <session_name> for more details
The user must then destroy the tracing session:
$ lttng destroy test
Session test destroyed
The solution could be to simply destroy the tracing session if an exception is thrown.
When creating the tracing session, we do destroy it if it already exists (see _create_session()
), so not destroying the session will not cause any issues if the user tries running ros2 trace
with the same session name, but it's cleaner to clean up and destroy the session after a failure.
We should check if this fix also applies to the Trace
launch action; I think it should, since this is all in the setup()
function.
Finally, a test would be great here, e.g., set up a tracing session so that it fails and assert that the tracing session doesn't exist.
I just tried building Jazzy from source on Fedora 40. I'm getting the following error in tracetools.
--- stderr: tracetools In file included from /usr/include/lttng/tracepoint-event.h:69, from /src/ros2_jazzy/build/tracetools/include/tracetools/tp_call.h:514, from /src/ros2_jazzy/src/ros2/ros2_tracing/tracetools/src/tp_call.c:18: /src/ros2_jazzy/build/tracetools/include/tracetools/tp_call.h: In function ‘lttng_ust__event_probe__ros2___rclcpp_ring_buffer_enqueue’: /src/ros2_jazzy/build/tracetools/include/tracetools/tp_call.h:480:5: error: initialization of ‘const uint64_t *’ {aka ‘const long unsigned int *’} from ‘uint64_t’ {aka ‘long unsigned int’} makes pointer from integer without a cast [-Wint-conversion] 480 | ctf_integer(const uint64_t *, index, index_arg) | ^~~~~~~~~~~ gmake[2]: *** [CMakeFiles/tracetools.dir/build.make:104: CMakeFiles/tracetools.dir/src/tp_call.c.o] Error 1 gmake[1]: *** [CMakeFiles/Makefile2:163: CMakeFiles/tracetools.dir/all] Error 2 gmake[1]: *** Waiting for unfinished jobs.... gmake: *** [Makefile:146: all] Error 2
(Issue moved from GitLab, see: https://gitlab.com/ros-tracing/ros2_tracing/-/issues/71)
ros2profile needs a few tracepoints which are currently located in the DDS implementations itself, e.g. here in cyclonedds: https://github.com/christophebedard/cyclonedds/blob/instrumentation-lttng-update/src/core/ddsc/src/tracing_lttng.h
The tracepoints in question are:
AFAIK none of these tracepoints made it to the master branches of the respective DDS implementation. However for commercial products like RTI ConnextDDS it is probably almost impossible to introduce these tracepoints in their code base.
Additionally the current approach imposes a massive maintainance burden.
I had a look at rmw_cyclonedds and rmw_fastrtps and found that it should be possible to add the respective tracepoints there as well. I had added tracepoints to rmw_connextdds before, too:
Please note that
Based on these findings I propose to:
Adding the source_timestamp to ros2:rmw_publish has the nice effect, that it would mirror ros2:rmw_take quite nicely then. In fact for ros2_profiling, if one only want's to analyse a system with lot's of nodes, the ros2:rmw_publish and ros2:rmw_take are needed for connexting different nodes; ros2:rcl_publish, ros2:rcl_take, ros2:rclcpp_publish and ros2:rclcpp_take (and dds:write and dds:read for that matter) would become optional in that case. In cases where the overhead of the tracing significantly impacts the system performance, this can be really valuable.
Migrating ros2_tracing and ros2_profiling to other OSs like QNX would also be much simpler with these changes applied, as all required tracepoints would be defined in a single place.
I created this issue in order to ask for some feedback. Is anyone seeing any impediments or issues with the proposed changes?
My team is developing a robotic platform that runs as a hybrid ROS 1 (Noetic) and ROS 2 (Foxy) system, and I have instrumented the ROS 2 side of our system with ros2_tracing
off of the foxy
branch. When everything runs properly, the outputs from ros2_tracing are great, and I've been able to do some insightful analysis from the data with help from the starter scripts in tracetools_analysis
.
One problem I've been running into is that approximately ~50% of the time the outputted metadata file is empty, ~25% of the time the outputted metadata file is half-filled, and ~25% of the time it seems fully populated. Since the metadata file has to be fully populated in order to analyze the trace, most of our traces have to be discarded.
I'm wondering if anyone has thoughts on what could be causing this? From the README, it states:
Note: tracing must be started before the application is launched.
Metadata is recorded during the initialization phase of the application.
This metadata is needed to understand the rest of the trace data, so if tracing is started after the application started executing, then the trace data might be unusable.
Since the Trace
launch action is added at the beginning of our system's main ROS 2 launch.py
launch file, I feel like we should be satisfying this requirement, but I might be wrong. I did notice that the metadata (generated from lttng?) file is populated with data when shutting down our system, rather than during the beginning/middle of operation, so perhaps everything is shutting down before lttng has an opportunity to output the data?
Any thoughts from others on this would be appreciated! Please let me know if you'd like any additional information from my side, and thanks for creating this great tool!
In tracetools_launch
:
tracetools_launch/actions/ld_preload.py:23:1: error: Module "launch" has no attribute "logging" [attr-defined]
tracetools_launch/action.py:28:1: error: Module "launch" has no attribute "logging" [attr-defined]
test/tracetools_launch/test_ld_preload_action.py:21:1: error: Module "launch" has no attribute "LaunchDescription" [attr-defined]
test/tracetools_launch/test_ld_preload_action.py:22:1: error: Module "launch" has no attribute "LaunchService" [attr-defined]
launch/example.launch.py:17:1: error: Module "launch" has no attribute "LaunchDescription" [attr-defined]
Found 5 errors in 4 files (checked 13 source files)
and tracetools_test
:
tracetools_test/case.py: note: In member "assertMatchingField" of class "TraceTestCase":
tracetools_test/case.py:318:36: error: Incompatible default for argument "matching_event_name" (default has type "None", argument has type "str") [assignment]
tracetools_test/case.py:318:36: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
tracetools_test/case.py:318:36: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
tracetools_test/case.py:319:35: error: Incompatible default for argument "events" (default has type "None", argument has type "List[Any]") [assignment]
tracetools_test/case.py:319:35: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
tracetools_test/case.py:319:35: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
tracetools_test/case.py: note: In member "get_events_with_name" of class "TraceTestCase":
tracetools_test/case.py:410:35: error: Incompatible default for argument "events" (default has type "None", argument has type "List[Any]") [assignment]
tracetools_test/case.py:410:35: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
tracetools_test/case.py:410:35: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
tracetools_test/case.py: note: In member "get_events_with_procname" of class "TraceTestCase":
tracetools_test/case.py:426:35: error: Incompatible default for argument "events" (default has type "None", argument has type "List[Any]") [assignment]
tracetools_test/case.py:426:35: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
tracetools_test/case.py:426:35: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
tracetools_test/case.py: note: In member "get_events_with_field_value" of class "TraceTestCase":
tracetools_test/case.py:446:35: error: Incompatible default for argument "events" (default has type "None", argument has type "List[Any]") [assignment]
tracetools_test/case.py:446:35: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
tracetools_test/case.py:446:35: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
tracetools_test/case.py: note: In member "get_events_with_field_not_value" of class "TraceTestCase":
tracetools_test/case.py:466:35: error: Incompatible default for argument "events" (default has type "None", argument has type "List[Any]") [assignment]
tracetools_test/case.py:466:35: note: PEP 484 prohibits implicit Optional. Accordingly, mypy has changed its default to no_implicit_optional=True
tracetools_test/case.py:466:35: note: Use https://github.com/hauntsaninja/no_implicit_optional to automatically upgrade your codebase
Found 6 errors in 1 file (checked 10 source files)
See: https://github.com/ros2/ros2_tracing/actions/runs/3529381626/jobs/5920365209#step:5:26473
When using the ros2 trace
command to configure LTTng, there's no warning if the trace directory (that the user is about to write to) already exists. LTTng itself simply appends the new trace data to the existing trace data, which can lead to confusing results: weird time jump in the middle of the combined trace, weird trace analysis results since two traces are unknowingly combined, etc.
It would be good if the ros2 trace
command could check if the trace directory already exists, and, if so, print a warning.
This new warning could also be added to the Trace
launch action.
We've only instrumented rclcpp
(and below), since the C++ client library is the obvious choice for applications for which performance is important. However, instrumenting rclpy
would still be valuable; it has been requested many many times.
Since part of the ROS 2 instrumentation is in rcl
and rmw
, we only need to port some of the instrumentation (i.e., what is in rclcpp
) to rclpy
, so this should not be a lot of work, especially if the rclcpp
and rclpy
are similar enough.
Some things to figure out:
rclcpp
over to rclpy
? In other words, does each rclcpp
tracepoint have a corresponding location in rclpy
? If not, what parts are the same, and what parts are different?ros2:rclcpp_*
to ros2:rclpy_*
?tracetools
: https://lttng.org/docs/v2.13/#doc-python-applicationOtherwise, the events in the initialization phase aren't recorded, which makes the rest of the trace data unusable. This isn't really obvious to people, e.g.: https://gitlab.com/ros-tracing/tracetools_analysis/-/issues/54#note_1195588968
This warning/note could perhaps go somewhere in this section: https://github.com/ros2/ros2_tracing#tracing
Currently, ros2 trace
cannot be used in scripts or in non-interactive environments. After running the ros2 trace
command, the user needs to press start to start tracing and then enter again to stop tracing. We could provide start
and stop
sub-commands to do that instead. Also, sub-commands like pause
and resume
could help with #44:
$ ros2 trace start ... # Same args as 'ros2 trace' except for a mandatory explicit session name, but starts tracing and returns immediately
# ... wait until initialization is done
$ ros2 trace pause <session_name> # Pauses tracing and returns immediately
# ... some time later
$ ros2 trace resume <session_name> # Starts tracing again and returns immediately
$ ros2 trace stop <session_name> # Stops tracing and returns immediately
Here is a comparison with the lttng
commands:
ros2 trace start ...
: equivalent to configuring the tracing session (session name, events, etc.) and lttng start
ros2 trace pause <session_name>
: equivalent to lttng stop
ros2 trace resume <session_name>
: equivalent to lttng start
ros2 trace stop <session_name>
: equivalent to [lttng stop &&] lttng destroy
We could offer a ros2 trace create ...
, which would configure the tracing session without starting to trace, i.e., ros2 trace start ...
without lttng start
. However, since there's currently no point in configuring the tracing session without also starting to trace, this wouldn't really be useful. We can always add it later on anyway.
The implementation wouldn't be too bad. We already have separate setup()
, start()
, stop()
, and destroy()
functions in tracetools_trace/tracetools_trace/tools/lttng_impl.py
.
Finally, note that lttng
's start
/stop
/destroy
don't require a session name argument, since the "current" session name is written to ~/.lttngrc
: https://lttng.org/man/7/lttng-concepts/v2.13/#doc-_current_recording_session. This isn't the case when tracing using the LTTng Python bindings, but we could re-implement this functionality (e.g., write the current session name to ~/.ros/tracing/.rc
).
Hi,
Using ros2_tracing, I am able to generate a ust file which provides me with all the trace calls while the system was running. However, is there a way to get these calls while the system is running in real-time?
Nightly GitHub CI sometimes hangs and fails. The failure always happens during tests for test_tracetools_launch
(note the timestamps):
2023-07-15T06:06:32.2792732Z Finished <<< test_ros2trace [22.1s]
2023-07-15T06:06:32.2801940Z Starting >>> test_tracetools_launch
2023-07-15T11:06:48.5788313Z ##[error]The operation was canceled.
test_tracetools_launch
only has one test file, test_trace_action.py
. There's a few tests in that file and the raw test logs don't include information about which specific test was running before hanging, so it's hard to tell which one hangs.
Things to do to investigate:
--event-handlers console_direct+
and --pytest-args -sv
with colcon test
to print test output to the console directly
LaunchService
with debug=True
:
--retest-until-fail N
(combined with --pytest-args ... -k 'test_trace_action'
) to see if it can be reproduced(Issue moved from GitLab, see: https://gitlab.com/ros-tracing/ros2_tracing/-/issues/136)
tracetools::get_symbol()
calls tracetools::detail::demangle_symbol()
, which calls abi::__cxa_demangle()
. The returned pointer should be freed by the caller, but it isn't.
tracetools::get_symbol()
is only called when the rclcpp_callback_register
tracepoint is called, so this only happens during initialization, and the size of the leaked memory is proportional to the number of callbacks (timers, subscriptions, services). This should still be fixed before ros2/ros2#1177, though.
Working on a fix.
This is a follow-up to #94.
#95 changed the tracing tests to be able to run them in parallel with normal tests and tracing tests. However, there was still an issue when test_ros2trace
and test_tracetools_launch
were running tests in parallel in ci.ros2.org, see: #95 (comment). This issue does not seem to happen locally. If test_ros2trace
and test_tracetools_launch
happen to not run tests simultaneously, then the tests usually pass, but that is mostly based on luck.
To temporarily work around it, we made test_ros2trace
depend on test_tracetools_launch
so that they wouldn't run tests in parallel, see: #95 (comment).
Now, we need to fix the issue and remove the workaround.
(Issue moved from GitLab, see: https://gitlab.com/ros-tracing/ros2_tracing/-/issues/131)
Parallel testing was enabled in ci.ros2.org in ros2/ci#723. Since then, many of the tracing-related tests in ros2_tracing
have been failing. For example, see the failed tests for March 6th here: osrf/buildfarm-tools#31 (comment).
There are 2 issues:
I changed the tracing test code to provide more information when a tracing test fails due to incorrect/unexpected trace data in #93. We can see the result in the March 6th nightlies, which confirms issue 1:
23:01:14 6: =================================== FAILURES ===================================
23:01:14 6: __________________________ TestLifecycleNode.test_all __________________________
23:01:14 6:
23:01:14 6: self = <test_lifecycle_node.TestLifecycleNode testMethod=test_all>
23:01:14 6:
23:01:14 6: def test_all(self):
23:01:14 6: # Check events as set
23:01:14 6: self.assertEventsSet(self._events_ros)
23:01:14 6:
23:01:14 6: # Check that the lifecycle node has an init event just like a normal node
23:01:14 6: rcl_node_init_events = self.get_events_with_name(tp.rcl_node_init)
23:01:14 6: lifecycle_node_matches = self.get_events_with_field_value(
23:01:14 6: 'node_name',
23:01:14 6: 'test_lifecycle_node',
23:01:14 6: rcl_node_init_events,
23:01:14 6: )
23:01:14 6: > self.assertNumEventsEqual(lifecycle_node_matches, 1)
23:01:14 6:
23:01:14 6: test/test_lifecycle_node.py:49:
23:01:14 6: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
23:01:14 6: ../../../../install/tracetools_test/lib/python3.10/site-packages/tracetools_test/case.py:327: in assertNumEventsEqual
23:01:14 6: self.assertEqual(len(events), expected_number, msg)
23:01:14 6: E AssertionError: 2 != 1 : unexpected number of events: [{'_name': 'ros2:rcl_node_init', '_timestamp': 1709708441732661082, 'vpid': 319439, 'rmw_handle': 94411761626944, 'vtid': 319439, 'namespace': '/', 'cpu_id': 1, 'node_handle': 94411760046688, 'procname': 'test_lifecycle_', 'node_name': 'test_lifecycle_node'}, {'_name': 'ros2:rcl_node_init', '_timestamp': 1709708447683372035, 'vpid': 321945, 'rmw_handle': 94716619878512, 'vtid': 321945, 'namespace': '/', 'cpu_id': 2, 'node_handle': 94716619735216, 'procname': 'simple_lifecycl', 'node_name': 'test_lifecycle_node'}]
The test expects 1 event but gets 2 events. We can see that the PIDs (vpid
) are different, and that the procname of the second event is simple_lifecycl
with a test_lifecycle_node
node name, which corresponds to this ros2lifecycle
test: https://github.com/ros2/ros2cli/blob/d0c8c35f7255f826b542f0d9283d9b706cb2b522/ros2lifecycle/test/test_cli.py#L137-L150.
We need to let other ROS 2 tests/applications run while the tracing test runs & traces its test application(s), so we need to either only trace the relevant applications or only consider tracing data from the relevant applications. I can think of a few solutions:
lttng enable-event -k event_name --filter '$ctx.procname == "string"'
procname
value, but there could be collisions, since the same test executables are used by many different tests and procname
is limited to 15 characterslttngpy
Python bindings, so we would need to add support for itprocname
of spawned processes through Node(exec_name='...')
and only consider events with that procname
procname
to correspond to the executable namelttng_ust_tracef
API, and have the test code filter out trace data from any process that hasn't recorded that unique ID
test_tracetools
package)test_tracetools
package, but not all tracing tests are done with tracetools_test
for test_tracetools
; I need probably need to make this filtering logic to other tests (mainly test_ros2trace
?)This will then allow us to remove the constraint that led to issue 2.
I am using Ubuntu 22.04 and ros2 humble.
I have finished installing ros2_traing by these commands:
sudo apt-get update
sudo apt-get install lttng-tools liblttng-ust-dev
sudo apt-get install python3-babeltrace python3-lttng
cd src/
git clone https://gitlab.com/ros-tracing/ros2_tracing.git
cd ../
colcon build --packages-up-to tracetools ros2trace
And the output in terminal of command ros2 run tracetools status
is correct:
Tracing enabled
But when I tried to build a simple rclcpp
project, I received an error message:
ylyt@ylyt-virtual-machine:~/ros2_ws$ colcon build --packages-up-to example_cpp --cmake-args -DTRACETOOLS_DISABLED=ON
Starting >>> example_cpp
--- stderr: example_cpp
CMake Warning:
Manually-specified variables were not used by the project:
TRACETOOLS_DISABLED
/usr/bin/ld: /opt/ros/humble/lib/librclcpp.so: undefined reference to `tracetools::detail::get_symbol_funcptr(void*)'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/node_01.dir/build.make:151: node_01] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:137: CMakeFiles/node_01.dir/all] Error 2
gmake: *** [Makefile:146: all] Error 2
---
Failed <<< example_cpp [1.04s, exited with code 2]
Summary: 0 packages finished [1.24s]
1 package failed: example_cpp
The code of the cpp
file is as following:
#include "rclcpp/rclcpp.hpp"
int main(int argc, char **argv)
{
/* Initialize rclcpp */
rclcpp::init(argc, argv);
/*produce a node named node_01*/
auto node = std::make_shared<rclcpp::Node>("node_01");
// Print string
RCLCPP_INFO(node->get_logger(), "node_01 is started");
rclcpp::spin(node);
rclcpp::shutdown();
return 0;
}
Similarly, when I tried to build test_tracetools
, I got these errors:
ylyt@ylyt-virtual-machine:~/ros2_ws$ colcon build --packages-up-to test_tracetools
[0.440s] WARNING:colcon.colcon_core.package_selection:Some selected packages are already built in one or more underlay workspaces:
'tracetools' is in: /home/ylyt/ros2_ws/install/tracetools, /opt/ros/humble
If a package in a merged underlay workspace is overridden and it installs headers, then all packages in the overlay must sort their include directories by workspace order. Failure to do so may result in build failures or undefined behavior at run time.
If the overridden package is used by another package in any underlay, then the overriding package in the overlay must be API and ABI compatible or undefined behavior at run time may occur.
If you understand the risks and want to override a package anyways, add the following to the command line:
--allow-overriding tracetools
This may be promoted to an error in a future release of colcon-override-check.
Starting >>> tracetools_trace
Starting >>> tracetools_read
Starting >>> tracetools
Finished <<< tracetools [0.95s]
--- stderr: tracetools_read
/usr/lib/python3/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
---
Finished <<< tracetools_read [1.34s]
--- stderr: tracetools_trace
/usr/lib/python3/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
---
Finished <<< tracetools_trace [1.37s]
Starting >>> tracetools_launch
--- stderr: tracetools_launch
/usr/lib/python3/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
---
Finished <<< tracetools_launch [0.90s]
Starting >>> tracetools_test
--- stderr: tracetools_test
/usr/lib/python3/dist-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
---
Finished <<< tracetools_test [0.91s]
Starting >>> test_tracetools
--- stderr: test_tracetools
/usr/bin/ld: /opt/ros/humble/lib/librclcpp.so: undefined reference to `tracetools::detail::get_symbol_funcptr(void*)'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/test_timer.dir/build.make:151: test_timer] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:329: CMakeFiles/test_timer.dir/all] Error 2
gmake[1]: *** Waiting for unfinished jobs....
/usr/bin/ld: /opt/ros/humble/lib/librclcpp_lifecycle.so: undefined reference to `tracetools::detail::get_symbol_funcptr(void*)'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/test_lifecycle_node.dir/build.make:161: test_lifecycle_node] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:225: CMakeFiles/test_lifecycle_node.dir/all] Error 2
/usr/bin/ld: /opt/ros/humble/lib/librclcpp.so: undefined reference to `tracetools::detail::get_symbol_funcptr(void*)'
/usr/bin/ld: /opt/ros/humble/lib/librclcpp.so: undefined reference to `tracetools::detail::get_symbol_funcptr(void*)'
collect2: error: ld returned 1 exit status
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/test_service_ping.dir/build.make:159: test_service_ping] Error 1
gmake[2]: *** [CMakeFiles/test_publisher.dir/build.make:159: test_publisher] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:355: CMakeFiles/test_service_ping.dir/all] Error 2
gmake[1]: *** [CMakeFiles/Makefile2:173: CMakeFiles/test_publisher.dir/all] Error 2
/usr/bin/ld: /opt/ros/humble/lib/librclcpp.so: undefined reference to `tracetools::detail::get_symbol_funcptr(void*)'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/test_lifecycle_client.dir/build.make:159: test_lifecycle_client] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:251: CMakeFiles/test_lifecycle_client.dir/all] Error 2
/usr/bin/ld: /opt/ros/humble/lib/librclcpp.so: undefined reference to `tracetools::detail::get_symbol_funcptr(void*)'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/test_ping.dir/build.make:159: test_ping] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:277: CMakeFiles/test_ping.dir/all] Error 2
/usr/bin/ld: /opt/ros/humble/lib/librclcpp.so: undefined reference to `tracetools::detail::get_symbol_funcptr(void*)'
/usr/bin/ld: /opt/ros/humble/lib/librclcpp.so: undefined reference to `tracetools::detail::get_symbol_funcptr(void*)'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/test_pong.dir/build.make:159: test_pong] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:303: CMakeFiles/test_pong.dir/all] Error 2
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/test_intra.dir/build.make:159: test_intra] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:199: CMakeFiles/test_intra.dir/all] Error 2
gmake: *** [Makefile:146: all] Error 2
---
Failed <<< test_tracetools [0.93s, exited with code 2]
Summary: 5 packages finished [4.32s]
1 package failed: test_tracetools
5 packages had stderr output: test_tracetools tracetools_launch tracetools_read tracetools_test tracetools_trace
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.