Comments (32)
No, I didn't rebuild Python and all of that (which that document suggests). I fired up the test program from the first comment, and after about 30 seconds of running, I'm now up to 3g resident memory (confirmed using top
). I killed it at that point; something is clearly leaking a lot of memory pretty quickly. If I run the same program, but leave timeout_sec as None (the default), no memory is leaked at all.
from rclpy.
With all the referenced PRs merged I can run the infinite spin_once
loop as well as the large publisher (and a corresponding subscriber) with constant memory usage. 🎉
@ArkturusMengsk @IanTheEngineer @srsidd @BrannonKing please give the current state a try and report back how your experience is. I will close this ticket for now but if there are still reproducible examples left which need fixing it can be reopened. Thanks.
from rclpy.
I spent some time investigating this leak today, figuring it would be a good chance to dip my toe into ros2
. It does seem there is a memory leak here, and as far as I can tell there is something funny happening beneath the rmw_wait()
function whenever you pass in timeout_sec=0.0
into rclpy.spin_once()
, as you did in your example code, @ArkturusMengsk.
As a workaround, it seems you can make the timeout anything greater than 0.0
, and the memory will at least leak slowly (possibly not at all?).
The middleware seems to exit without releasing the zero-initialized wait_set memory, guard_condition, and timer_ready_list that it allocates. This issue is exacerbated by the while loop continually spinning without a pause.
Instrumented Script:
#! /usr/bin/env python3
import rclpy
import time
import tracemalloc
rclpy.init()
node = rclpy.create_node('test_spin_once_mem')
period = 0.0 # 0.1, 0.01, 0.001
count = 0
begin_time = time.time()
tracemalloc.start()
while rclpy.ok() and time.time() - begin_time < 2.0:
rclpy.spin_once(node, timeout_sec=period)
count = count + 1
#time.sleep(0.01) # Uncomment for debugging loop speed
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')
print("[Loop Count: {} ]".format(count))
for stat in top_stats[:10]:
print(stat)
rclpy.shutdown()
Output with #time.sleep(0.01)
commented in the loop:
$ ./mem_test_spin_once.py
[Loop Count: 205663 ]
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:44: size=17.3 MiB, count=205663, average=88 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:54: size=16.5 MiB, count=616990, average=28 B
./mem_test_spin_once.py:14: size=680 B, count=1, average=680 B
/home/imcmahon/devinstall_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:106: size=528 B, count=2, average=264 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:96: size=528 B, count=2, average=264 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:90: size=528 B, count=2, average=264 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:84: size=528 B, count=2, average=264 B
./mem_test_spin_once.py:17: size=448 B, count=1, average=448 B
./mem_test_spin_once.py:13: size=416 B, count=1, average=416 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:52: size=96 B, count=1, average=96 B
Output with time.sleep(0.01)
un-commented in the loop:
$ ./mem_test_spin_once.py
[Loop Count: 196 ]
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:44: size=16.8 KiB, count=196, average=88 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:54: size=16.1 KiB, count=589, average=28 B
./mem_test_spin_once.py:14: size=680 B, count=1, average=680 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:106: size=528 B, count=2, average=264 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:96: size=528 B, count=2, average=264 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:90: size=528 B, count=2, average=264 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:84: size=528 B, count=2, average=264 B
./mem_test_spin_once.py:17: size=448 B, count=1, average=448 B
./mem_test_spin_once.py:13: size=416 B, count=1, average=416 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:52: size=96 B, count=1, average=96 B
All this really shows is the sources of memory allocation, and that the faster the spin_once
function is called, the more memory is lost. I was hoping to have a fix to accompany this write up, but I've yet to uncover the underlying issue. Hope this is helpful to those debugging this :)
from rclpy.
I don't know if this is related, but I found a huge memory leak when I publish large amounts of data. It doesn't matter what the second argument of spin_once
is.
I first stumbled upon this while publishing and subscribing to images (~250kB). I then made a simple publisher with integers of similar size and end up with the same result.
import rclpy
from random import choice
from std_msgs.msg import String
from rclpy.qos import qos_profile_default, qos_profile_sensor_data
class MinimalPublisher:
def __init__(self, node):
# Define all publishers
self.str_publisher_ = node.create_publisher(String, 'boo', qos_profile=qos_profile_sensor_data)
timer_period = 1 # seconds
# Define a new timer for each pub
self.str_timer = node.create_timer(timer_period, self.str_timer_callback)
def str_timer_callback(self):
str_msg = String()
str_msg.data = ''.join(choice(ascii_uppercase) for i in range(250000))
self.str_publisher_.publish(str_msg)
print('Publishing: ', len(str_msg.data))
def main(args=None):
rclpy.init(args=args)
node = rclpy.create_node('minimal_publisher')
minimal_publisher = MinimalPublisher(node)
minimal_publisher # prevent unused variable warning
while rclpy.ok():
rclpy.spin_once(node)
node.destroy_node()
rclpy.shutdown()
if __name__ == '__main__':
main()
I tried using Valgrind, based on someone's recommendation but this is my first time using it so I couldn't make much sense of what I saw. But the conclusions kind of confirmed a memory leak. Here are snippets from the valgrind run -
==1626== LEAK SUMMARY:
==1626== definitely lost: 1,501,126 bytes in 22 blocks
==1626== indirectly lost: 1,501,966 bytes in 48 blocks
==1626== possibly lost: 922,843 bytes in 267 blocks
==1626== still reachable: 13,676,704 bytes in 13,043 blocks
==1626== suppressed: 0 bytes in 0 blocks
and
==1626== 500,002 bytes in 2 blocks are possibly lost in loss record 1,954 of 1,962
==1626== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1626== by 0xAFA59FD: rosidl_generator_c__String__assignn (in /home/sidd/ros2_ws/build/rosidl_generator_c/librosidl_generator_c.so)
==1626== by 0xAFA5AC0: rosidl_generator_c__String__assign (in /home/sidd/ros2_ws/build/rosidl_generator_c/librosidl_generator_c.so)
==1626== by 0xF40400B: std_msgs_string__convert_from_py (in /home/sidd/ros2_ws/build/std_msgs/rosidl_generator_py/std_msgs/std_msgs_s__rosidl_typesupport_c.cpython-35m-x86_64-linux-gnu.so)
==1626== by 0x9D77A18: rclpy_publish (in /home/sidd/ros2_ws/build/rclpy/rclpy/_rclpy.cpython-35m-x86_64-linux-gnu.so)
==1626== by 0x4E9B9E: PyCFunction_Call (in /usr/bin/python3.5)
==1626== by 0x524413: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626== by 0x528813: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626== by 0x528813: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626== by 0x52D82E: ??? (in /usr/bin/python3.5)
==1626== by 0x529331: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626== by 0x52D2E2: ??? (in /usr/bin/python3.5)
and
==1626== 1,500,198 (192 direct, 1,500,006 indirect) bytes in 8 blocks are definitely lost in loss record 1,961 of 1,962
==1626== at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1626== by 0xF824502: std_msgs__msg__String__create (in /home/sidd/ros2_ws/build/std_msgs/libstd_msgs__rosidl_generator_c.so)
==1626== by 0xF403A01: std_msgs_string__convert_from_py (in /home/sidd/ros2_ws/build/std_msgs/rosidl_generator_py/std_msgs/std_msgs_s__rosidl_typesupport_c.cpython-35m-x86_64-linux-gnu.so)
==1626== by 0x9D77A18: rclpy_publish (in /home/sidd/ros2_ws/build/rclpy/rclpy/_rclpy.cpython-35m-x86_64-linux-gnu.so)
==1626== by 0x4E9B9E: PyCFunction_Call (in /usr/bin/python3.5)
==1626== by 0x524413: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626== by 0x528813: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626== by 0x528813: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626== by 0x52D82E: ??? (in /usr/bin/python3.5)
==1626== by 0x529331: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626== by 0x52D2E2: ??? (in /usr/bin/python3.5)
==1626== by 0x528EED: PyEval_EvalFrameEx (in /usr/bin/python3.5)
Not sure what this means exactly but if someone can help me out with this, it'd be great!
from rclpy.
Okay I was able to do some more debugging on this, and it seems that the PR from #79 and #80 are not the only problems. Even after integrating the PR's I was still losing large amounts of memory. Running valgrind on this gave the following output -
==14986== 115,984 bytes in 1,318 blocks are definitely lost in loss record 1,991 of 2,009
==14986== at 0x4C2DB4F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14986== by 0x9D718FB: rclpy_get_zero_initialized_wait_set (_rclpy.c:976)
==14986== by 0x51EE73: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==14986== by 0x52426E: ??? (in /usr/bin/python3.5)
==14986== by 0x51FD82: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==14986== by 0x523D28: ??? (in /usr/bin/python3.5)
==14986== by 0x51F93E: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==14986== by 0x523D28: ??? (in /usr/bin/python3.5)
==14986== by 0x5249EE: PyEval_EvalCode (in /usr/bin/python3.5)
==14986== by 0x601401: ??? (in /usr/bin/python3.5)
==14986== by 0x603939: PyRun_FileExFlags (in /usr/bin/python3.5)
==14986== by 0x603B2B: PyRun_SimpleFileExFlags (in /usr/bin/python3.5)
==14986== LEAK SUMMARY:
==14986== definitely lost: 2,866,571 bytes in 1,331 blocks
==14986== indirectly lost: 0 bytes in 0 blocks
==14986== possibly lost: 180,288 bytes in 278 blocks
==14986== still reachable: 14,342,588 bytes in 18,886 blocks
==14986== suppressed: 0 bytes in 0 blocks
It seems that the middle ware exits without releasing the zero-initialized wait_set memory
as pointed by @IanTheEngineer . After looking at the c implementation, it seems that rcl_wait_set_fini()
function (Member of rcl/src/rcl/wait.c
) destroys the wait_set?
I tried adding a method in _rclpy.c
and call this method at the end of spin_once
in rclpy/__init__.py
. But that didn't really help....
Method added to _rclpy.c
-
static PyObject *
rclpy_del_wait_set(PyObject * Py_UNUSED(self), PyObject * args)
{
PyObject * pywait_set;
if (!PyArg_ParseTuple(args, "O", &pywait_set)) {
return NULL;
}
rcl_wait_set_t * wait_set = (rcl_wait_set_t *)PyCapsule_GetPointer(pywait_set, NULL);
rcl_ret_t ret;
ret = rcl_wait_set_fini(wait_set);
if (ret != RCL_RET_OK) {
PyErr_Format(PyExc_RuntimeError,
"Failed to delete wait set: %s", rcl_get_error_string_safe());
return NULL;
}
Py_RETURN_NONE;
}
Can someone comment on whether this is correct?
from rclpy.
Your added function looks right to me, and I think it should be necessary (through the current rclpy implementation is fast and loose with "on shutdown" cleanup). I don't know why it would not have the effect you wanted. Are you sure it is getting called from Python?
from rclpy.
Also, I'm not sure if where you said you're calling it from (at the end of spin_once()
) is the right place. I'd have to read deeper into the code. Maybe @mikaelarguedas has an idea off-hand.
from rclpy.
Looks to be the right place given that IIRC, spin_once is the function creating and populating the waitset before using it and... keeping it apparently...
from rclpy.
Yes it is getting called from Python. Maybe that's not the fix then?
I had a question - Why does spin_once
create the wait_set every time it's called. Looking at the rclcpp
implementation, the wait set is initialized once and hence forth used as a member of the Executor
class.
I felt the memory leak is because waitset is created every time spin once is called, hence added the rcl_wait_set_fini
method....
from rclpy.
@srsidd it probably should. The rclpy
implementation has only been implemented well in places. We've cut corners in lots of others because we simply haven't had time. I think there might be some issues about these shortfalls already:
I thought there was one for adding a proper spin function to rclpy
too, but I can't find that one.
from rclpy.
Right, I'd assume the best way forward would be to create a class for node within the __init__.py
script and use the existing api to make calls to the node class?
I can try that and see if it helps.
from rclpy.
After experimenting a bunch more, I found that invoking the garbage collector fixes the issue (I think). I call the gc.collect()
method at the end of spin_node
. Valgrind still shows that there's a memory leak, but monitoring the ram on htop shows that the leak is plugged. It maybe that Valgrind isn't smart enough to recognize the garbage collector in python. If this seems the right thing to do, I can create a PR with this change..
from rclpy.
It maybe that Valgrind isn't smart enough to recognize the garbage collector in python.
That's very likely imo. It could also be that Python is not garbage collecting when it knows it is shutting down, and just lets the OS handling reclaiming the memory. Sort of like http://en.cppreference.com/w/cpp/utility/program/quick_exit.
If this seems the right thing to do, I can create a PR with this change.
That would be great, it does sound like we should be destroying the waitset in the spin_once()
. We can look at it closer when reviewing your pr if you open one.
Either way thanks for looking into it.
from rclpy.
Just a cursory look at the Internet seems to confirm our theories about CPython and valgrind:
We see already valgrind is reporting memory has been leaked. This is normal, as there are a variety of small structures allocated by CPython at startup that it never frees, usually because doing so would be too difficult.
from rclpy.
Yes, there is also a README for using python with valgrind - http://svn.python.org/projects/python/trunk/Misc/README.valgrind. There is a way to disable to PyMalloc errors although I haven't tried it myself -
Valgrind may show some unexpected results when PyMalloc is used.
Starting with Python 2.3, PyMalloc is used by default. You can disable
PyMalloc when configuring python by adding the --without-pymalloc option.
from rclpy.
Doing a quick test here, I still see the memory leak from the original report. Marking this as a bug.
from rclpy.
Question: Does this test use the disabling of PyMalloc suggested in #74 (comment) ? If yes, can you post the invocation here? that will be useful for further benchmarking of python extensions
from rclpy.
So, I did a bit more research into this problem.
First, part of the reason that rclpy.spin_once(node, timeout_sec=None)
never leaks memory is that it blocks forever. That is, since this is a simple test program, it just sits inside spin_once() forever, and since nothing is happening, it never does anything further.
Second, if I change the timeout_sec to a value other than 0 (in this case, I tried 0.01), the program still leaks memory, just at a slower rate. This suggests that we have something more fundamental going on.
I think at least part of the problem stems from _rclpy.c using PyMem_Malloc()
without ever calling PyMem_Free()
. This page implies that memory allocated with PyMem_Malloc()
must be explicitly freed, and I see no calls in _rclpy.c to do that (please correct me if I'm wrong, here). I think we are going to do a large refactoring of rclpy in the near future; what that happens, we should explicitly make sure to try and free this memory.
from rclpy.
This is a major issue -- please focus on it! At ASI we've had a lot of issues with large messages causing leaked (er, gushing) memory both in CPP and Python. We also have big issues with CPU performance problems in the middleware wait methods -- a place where I would never expect high CPU usage. It's difficult to track because the profilers show the problem to be in the middleware, but it may not be the middleware's fault if the cleanup methods aren't getting called.
from rclpy.
Can you please report this in the ros2 repository or the appropriate middleware repository? Based on your comment it looks like the problem is not only related to Python.
Could you provide a reproducible and independent examples for the CPU consumption and the memory leak ?
We fixed several memory leaks recently but are welcoming any contribution to patch the remaining ones.
Thanks!
from rclpy.
I did some more digging on the memory leaks and high CPU I was seeing. The high CPU issue in Python3 was actually cause by a pygame Alsa bug. Disabling the pygame audio fixed it. The issue with large messages causing a memory leak (and running slow) appears to be a CoreDX-only thing. I couldn't reproduce the problem with the FastRTPS middleware. I've sent TwinOaks some information on the problem.
from rclpy.
@BrannonKing thanks for following up on it.
from rclpy.
Can you please try the patch from #109 which should fix the leaking of the waitset
in the spin_once
call.
I tried this with the following example:
import rclpy
rclpy.init()
node = rclpy.create_node('test_node')
i = 0
while rclpy.ok() and i < 100:
rclpy.spin_once(node, timeout_sec=0.1)
i += 1
from rclpy.
Given the low number of iteration it's not seeable, can you try the same test with another 3 orders of magnitude ?
while rclpy.ok() and i < 100000:
rclpy.spin_once(node, timeout_sec=0.0001)
i += 1
from rclpy.
Given the low number of iteration it's not seeable, can you try the same test with another 3 orders of magnitude ?
Still not leaking for me. The process does have an increasing memory usage but once it finishes valgrind
does report no incremental leaks for me (only three tiny leaks of a few kb).
from rclpy.
Unfortunately, the fix doesn't seem to help my program. The "increasing memory" you mention is just too much. The program below leaks massive amounts of memory even with the fix. I've never had good luck with Valgrind for Python memory leaks. I think we need a different way to track these. You can put a gc.collect()
call into the while loop; the memory still increases but the rate is less.
#!/usr/bin/env python3
import rclpy
def main():
rclpy.init()
node = rclpy.create_node('spin_n_leak')
while rclpy.ok():
rclpy.spin_once(node, timeout_sec=0.0)
if __name__ == '__main__':
main()
from rclpy.
@BrannonKing what you are seeing (the increasing memory usage) is not a memory leak. You can confirm that by running the script with valgrind --leak-check=full
which python3 your_script.py
. Instead of running forever you should add an upper bound to the loop as in my example above which makes the script terminate after a deterministic number of iterations. When you run the script without this patch it shows a memory leak in the order of iterations of the loop referencing the code in _rclpy.c
which creates and initializes the wait set. With this patch the memory leak is gone.
Since the memory leak is being fixed by #109 I will go ahead an merge that PR. There will also be a follow up PR (#110) to fix the memory leaks coming from the other entities (beside the wait set). After that the increasing memory usage still needs to be investigated and addressed.
from rclpy.
Since the memory leak has been addressed I will change the title of this ticket.
from rclpy.
Since the memory leak and memory usage increase has been fixed in #112 and #113 I'm going to close this.
Follow-up PRs fixing memory leaks outside of spin_once
are pending.
@BrannonKing feel free to comment here if this is not fixed for you or to open another ticket if you face issues unrelated to spin_once.
thanks @ArkturusMengsk @IanTheEngineer and @srsidd for the initial report and investigation
from rclpy.
This is not yet resolved with #114 still pending. So I reopened it for now.
from rclpy.
Reporting back: I don't see the leak any more. Good jorb.
from rclpy.
Reporting back: I don't see the leak any more. Good jorb.
Thank you for checking!
from rclpy.
Related Issues (20)
- Exception when instantiating ActionServer as part of ROS package tests HOT 2
- CMake Error (Non-existent Path) When Building From Source HOT 4
- Cannot log in ROS2 node after building with cython HOT 6
- ReadtheDocs page is missing a lot of documentation for API. HOT 5
- SIGINT signal on parent not terminated Turtlesim GUI application HOT 1
- ros2 topic HZ option is not predictable for report freq values when using default rmw_fastrtps_cpp rmw implementation HOT 3
- MultiThreadExecutor may throw exception after destroying subscription with non-default callback group
- Failed to send request: cannot publish data HOT 5
- Lifecycle node `Transition is not registered` exception kills node HOT 1
- debian12 humble source build: invalid use of incomplete type ‘PyFrameObject’
- failed to create domain error when spawning many python nodes at once from launch file with cyclonedds HOT 3
- Only hold weak internal reference to callbacks HOT 1
- Opinion: MultiThreadedExecutor should not raise exceptions if the Task's exception was retrieved HOT 3
- xmlrpc.client.Fault: <Fault 1: "<class 'RuntimeError'>:!rclpy.ok()"> HOT 5
- [design question] get_parameter exception choice HOT 5
- Timer hanging and high CPU load when using MultiThreadedExecutor HOT 18
- rclpy._rclpy_pybind11 HOT 1
- Feature Request for MultiProcessExecutor HOT 2
- async wait/sleep implementation HOT 4
- KeyError in ActionServer._execute_goal HOT 5
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from rclpy.