Comments (32)

clalancette commented on July 4, 2024

No, I didn't rebuild Python and all of that (which that document suggests). I fired up the test program from the first comment, and after about 30 seconds of running it was up to 3 GB of resident memory (confirmed using top). I killed it at that point; something is clearly leaking a lot of memory pretty quickly. If I run the same program but leave timeout_sec as None (the default), no memory is leaked at all.
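
For reference, a minimal loop of the kind described (a sketch for illustration; not necessarily the exact test program from the first comment):

#!/usr/bin/env python3
import rclpy

rclpy.init()
node = rclpy.create_node('spin_once_leak_repro')
while rclpy.ok():
    # With timeout_sec=0.0 spin_once returns immediately and the loop spins hot,
    # and resident memory grows quickly; with the default timeout_sec=None the
    # call simply blocks and no growth is observed.
    rclpy.spin_once(node, timeout_sec=0.0)
rclpy.shutdown()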

dirk-thomas commented on July 4, 2024

With all the referenced PRs merged I can run the infinite spin_once loop as well as the large publisher (and a corresponding subscriber) with constant memory usage. 🎉

@ArkturusMengsk @IanTheEngineer @srsidd @BrannonKing please give the current state a try and report back on your experience. I will close this ticket for now, but if there are still reproducible examples left which need fixing it can be reopened. Thanks.

IanTheEngineer commented on July 4, 2024

I spent some time investigating this leak today, figuring it would be a good chance to dip my toe into ros2. It does seem there is a memory leak here, and as far as I can tell there is something funny happening beneath the rmw_wait() function whenever you pass timeout_sec=0.0 into rclpy.spin_once(), as you did in your example code, @ArkturusMengsk.

As a workaround, it seems you can make the timeout anything greater than 0.0, and the memory will at least leak more slowly (possibly not at all?).

The middleware seems to exit without releasing the zero-initialized wait_set memory, guard_condition, and timer_ready_list that it allocates. This issue is exacerbated by the while loop continually spinning without a pause.

Instrumented Script:

#! /usr/bin/env python3

import rclpy
import time
import tracemalloc

rclpy.init()
node = rclpy.create_node('test_spin_once_mem')
period = 0.0 # 0.1, 0.01, 0.001
count = 0
begin_time = time.time()
tracemalloc.start()
while rclpy.ok() and time.time() - begin_time < 2.0:
    rclpy.spin_once(node, timeout_sec=period)
    count = count + 1
    #time.sleep(0.01)                # Uncomment for debugging loop speed
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics('lineno')

print("[Loop Count: {} ]".format(count))
for stat in top_stats[:10]:
    print(stat)
rclpy.shutdown()

Output with #time.sleep(0.01) commented out in the loop:

$ ./mem_test_spin_once.py 
[Loop Count: 205663 ]
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:44: size=17.3 MiB, count=205663, average=88 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:54: size=16.5 MiB, count=616990, average=28 B
./mem_test_spin_once.py:14: size=680 B, count=1, average=680 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:106: size=528 B, count=2, average=264 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:96: size=528 B, count=2, average=264 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:90: size=528 B, count=2, average=264 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:84: size=528 B, count=2, average=264 B
./mem_test_spin_once.py:17: size=448 B, count=1, average=448 B
./mem_test_spin_once.py:13: size=416 B, count=1, average=416 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:52: size=96 B, count=1, average=96 B

Output with time.sleep(0.01) un-commented in the loop:

$ ./mem_test_spin_once.py 
[Loop Count: 196 ]
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:44: size=16.8 KiB, count=196, average=88 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:54: size=16.1 KiB, count=589, average=28 B
./mem_test_spin_once.py:14: size=680 B, count=1, average=680 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:106: size=528 B, count=2, average=264 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:96: size=528 B, count=2, average=264 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:90: size=528 B, count=2, average=264 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:84: size=528 B, count=2, average=264 B
./mem_test_spin_once.py:17: size=448 B, count=1, average=448 B
./mem_test_spin_once.py:13: size=416 B, count=1, average=416 B
/home/imcmahon/dev/install_isolated/rclpy/lib/python3.5/site-packages/rclpy/__init__.py:52: size=96 B, count=1, average=96 B

All this really shows is the sources of memory allocation, and that the faster the spin_once function is called, the more memory is lost. I was hoping to have a fix to accompany this write up, but I've yet to uncover the underlying issue. Hope this is helpful to those debugging this :)

srsidd commented on July 4, 2024

I don't know if this is related, but I found a huge memory leak when I publish large amounts of data. It doesn't matter what the second argument of spin_once is.

I first stumbled upon this while publishing and subscribing to images (~250 kB). I then made a simple publisher with messages of similar size and ended up with the same result.

import rclpy
from random import choice
from string import ascii_uppercase

from std_msgs.msg import String
from rclpy.qos import qos_profile_default, qos_profile_sensor_data

class MinimalPublisher:

    def __init__(self, node):
        # Define all publishers
        self.str_publisher_ = node.create_publisher(String, 'boo', qos_profile=qos_profile_sensor_data)
        timer_period = 1  # seconds

        # Define a new timer for each pub
        self.str_timer = node.create_timer(timer_period, self.str_timer_callback)

    def str_timer_callback(self):
        str_msg = String()
        str_msg.data = ''.join(choice(ascii_uppercase) for i in range(250000))
        self.str_publisher_.publish(str_msg)
        print('Publishing: ', len(str_msg.data))

def main(args=None):
    rclpy.init(args=args)

    node = rclpy.create_node('minimal_publisher')

    minimal_publisher = MinimalPublisher(node)
    minimal_publisher  # prevent unused variable warning

    while rclpy.ok():
        rclpy.spin_once(node)

    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

I tried using Valgrind based on someone's recommendation, but this is my first time using it, so I couldn't make much sense of what I saw. The summary does seem to confirm a memory leak. Here are snippets from the valgrind run -

==1626== LEAK SUMMARY:
==1626==    definitely lost: 1,501,126 bytes in 22 blocks
==1626==    indirectly lost: 1,501,966 bytes in 48 blocks
==1626==      possibly lost: 922,843 bytes in 267 blocks
==1626==    still reachable: 13,676,704 bytes in 13,043 blocks
==1626==         suppressed: 0 bytes in 0 blocks

and

==1626== 500,002 bytes in 2 blocks are possibly lost in loss record 1,954 of 1,962
==1626==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1626==    by 0xAFA59FD: rosidl_generator_c__String__assignn (in /home/sidd/ros2_ws/build/rosidl_generator_c/librosidl_generator_c.so)
==1626==    by 0xAFA5AC0: rosidl_generator_c__String__assign (in /home/sidd/ros2_ws/build/rosidl_generator_c/librosidl_generator_c.so)
==1626==    by 0xF40400B: std_msgs_string__convert_from_py (in /home/sidd/ros2_ws/build/std_msgs/rosidl_generator_py/std_msgs/std_msgs_s__rosidl_typesupport_c.cpython-35m-x86_64-linux-gnu.so)
==1626==    by 0x9D77A18: rclpy_publish (in /home/sidd/ros2_ws/build/rclpy/rclpy/_rclpy.cpython-35m-x86_64-linux-gnu.so)
==1626==    by 0x4E9B9E: PyCFunction_Call (in /usr/bin/python3.5)
==1626==    by 0x524413: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626==    by 0x528813: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626==    by 0x528813: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626==    by 0x52D82E: ??? (in /usr/bin/python3.5)
==1626==    by 0x529331: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626==    by 0x52D2E2: ??? (in /usr/bin/python3.5)

and

==1626== 1,500,198 (192 direct, 1,500,006 indirect) bytes in 8 blocks are definitely lost in loss record 1,961 of 1,962
==1626==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==1626==    by 0xF824502: std_msgs__msg__String__create (in /home/sidd/ros2_ws/build/std_msgs/libstd_msgs__rosidl_generator_c.so)
==1626==    by 0xF403A01: std_msgs_string__convert_from_py (in /home/sidd/ros2_ws/build/std_msgs/rosidl_generator_py/std_msgs/std_msgs_s__rosidl_typesupport_c.cpython-35m-x86_64-linux-gnu.so)
==1626==    by 0x9D77A18: rclpy_publish (in /home/sidd/ros2_ws/build/rclpy/rclpy/_rclpy.cpython-35m-x86_64-linux-gnu.so)
==1626==    by 0x4E9B9E: PyCFunction_Call (in /usr/bin/python3.5)
==1626==    by 0x524413: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626==    by 0x528813: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626==    by 0x528813: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626==    by 0x52D82E: ??? (in /usr/bin/python3.5)
==1626==    by 0x529331: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==1626==    by 0x52D2E2: ??? (in /usr/bin/python3.5)
==1626==    by 0x528EED: PyEval_EvalFrameEx (in /usr/bin/python3.5)

Not sure what this means exactly but if someone can help me out with this, it'd be great!

srsidd commented on July 4, 2024

Okay, I was able to do some more debugging on this, and it seems that the PRs #79 and #80 are not the only fixes needed. Even after integrating the PRs I was still losing large amounts of memory. Running valgrind on this gave the following output -

==14986== 115,984 bytes in 1,318 blocks are definitely lost in loss record 1,991 of 2,009
==14986==    at 0x4C2DB4F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14986==    by 0x9D718FB: rclpy_get_zero_initialized_wait_set (_rclpy.c:976)
==14986==    by 0x51EE73: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==14986==    by 0x52426E: ??? (in /usr/bin/python3.5)
==14986==    by 0x51FD82: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==14986==    by 0x523D28: ??? (in /usr/bin/python3.5)
==14986==    by 0x51F93E: PyEval_EvalFrameEx (in /usr/bin/python3.5)
==14986==    by 0x523D28: ??? (in /usr/bin/python3.5)
==14986==    by 0x5249EE: PyEval_EvalCode (in /usr/bin/python3.5)
==14986==    by 0x601401: ??? (in /usr/bin/python3.5)
==14986==    by 0x603939: PyRun_FileExFlags (in /usr/bin/python3.5)
==14986==    by 0x603B2B: PyRun_SimpleFileExFlags (in /usr/bin/python3.5)
==14986== LEAK SUMMARY:
==14986==    definitely lost: 2,866,571 bytes in 1,331 blocks
==14986==    indirectly lost: 0 bytes in 0 blocks
==14986==      possibly lost: 180,288 bytes in 278 blocks
==14986==    still reachable: 14,342,588 bytes in 18,886 blocks
==14986==         suppressed: 0 bytes in 0 blocks

It seems that the middleware exits without releasing the zero-initialized wait_set memory, as pointed out by @IanTheEngineer. After looking at the C implementation, it seems that the rcl_wait_set_fini() function (defined in rcl/src/rcl/wait.c) destroys the wait_set?

I tried adding a method to _rclpy.c and calling it at the end of spin_once in rclpy/__init__.py, but that didn't really help...

Method added to _rclpy.c -

static PyObject *
rclpy_del_wait_set(PyObject * Py_UNUSED(self), PyObject * args)
{
  PyObject * pywait_set;

  if (!PyArg_ParseTuple(args, "O", &pywait_set)) {
    return NULL;
  }
  rcl_wait_set_t * wait_set = (rcl_wait_set_t *)PyCapsule_GetPointer(pywait_set, NULL);
  rcl_ret_t ret;
  ret = rcl_wait_set_fini(wait_set);
  if (ret != RCL_RET_OK) {
    PyErr_Format(PyExc_RuntimeError,
      "Failed to delete wait set: %s", rcl_get_error_string_safe());
    return NULL;
  }
  Py_RETURN_NONE;
}

Can someone comment on whether this is correct?
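
For reference, a rough sketch of how spin_once in rclpy/__init__.py might use this binding (written as if inside __init__.py, where the compiled extension is already imported as _rclpy; the actual body of spin_once is simplified to a placeholder here):

def spin_once(node, timeout_sec=None):
    # Existing per-call allocation (this function appears in the valgrind trace above).
    wait_set = _rclpy.rclpy_get_zero_initialized_wait_set()
    try:
        pass  # existing code: init the wait set, add the node's entities, rcl_wait, dispatch
    finally:
        # Proposed cleanup using the binding above, so the rcl_wait_set_t is freed each call.
        _rclpy.rclpy_del_wait_set(wait_set)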

wjwwood commented on July 4, 2024

Your added function looks right to me, and I think it should be necessary (though the current rclpy implementation is fast and loose with "on shutdown" cleanup). I don't know why it would not have the effect you wanted. Are you sure it is getting called from Python?

wjwwood commented on July 4, 2024

Also, I'm not sure that the place you said you're calling it from (at the end of spin_once()) is the right one. I'd have to read deeper into the code. Maybe @mikaelarguedas has an idea off-hand.

mikaelarguedas commented on July 4, 2024

It looks to be the right place given that, IIRC, spin_once is the function creating and populating the wait set before using it and... keeping it, apparently...

srsidd commented on July 4, 2024

Yes it is getting called from Python. Maybe that's not the fix then?

I had a question: why does spin_once create the wait_set every time it's called? Looking at the rclcpp implementation, the wait set is initialized once and henceforth used as a member of the Executor class.
I suspected the memory leak is because the wait set is created every time spin_once is called, hence I added the rcl_wait_set_fini call...
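
For comparison, the rclcpp-style pattern amounts to "allocate once, reuse on every spin, finalize once" rather than "allocate per call"; a rough sketch in the same hypothetical terms as the binding proposed above (the loop body is only a placeholder):

# Hypothetical executor-style usage: the wait set lives for the life of the loop.
wait_set = _rclpy.rclpy_get_zero_initialized_wait_set()  # allocate once
while rclpy.ok():
    pass  # re-populate wait_set with the node's entities and rcl_wait on it each iteration
_rclpy.rclpy_del_wait_set(wait_set)  # finalize once, at shutdown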

wjwwood commented on July 4, 2024

@srsidd it probably should. The rclpy implementation has only been done well in places; we've cut corners in lots of others because we simply haven't had time. I think there might already be some issues about these shortfalls:

I thought there was one for adding a proper spin function to rclpy too, but I can't find that one.

srsidd commented on July 4, 2024

Right, I'd assume the best way forward would be to create a node class within the __init__.py script and use the existing API to make calls to it?
I can try that and see if it helps.

srsidd commented on July 4, 2024

After experimenting a bunch more, I found that invoking the garbage collector fixes the issue (I think). I call the gc.collect() method at the end of spin_once. Valgrind still shows that there's a memory leak, but monitoring the RAM in htop shows that the leak is plugged. It may be that Valgrind isn't smart enough to recognize the garbage collector in Python. If this seems the right thing to do, I can create a PR with this change.
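
A sketch of the same workaround applied from user code, forcing a collection on every iteration of the spin loop:

import gc

import rclpy

rclpy.init()
node = rclpy.create_node('gc_workaround')
while rclpy.ok():
    rclpy.spin_once(node, timeout_sec=0.0)
    gc.collect()  # stop-gap: let the collector reclaim the per-spin garbage promptly
rclpy.shutdown()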

wjwwood commented on July 4, 2024

It may be that Valgrind isn't smart enough to recognize the garbage collector in Python.

That's very likely imo. It could also be that Python is not garbage collecting when it knows it is shutting down, and just lets the OS handle reclaiming the memory. Sort of like http://en.cppreference.com/w/cpp/utility/program/quick_exit.

If this seems the right thing to do, I can create a PR with this change.

That would be great; it does sound like we should be destroying the wait set in spin_once(). We can look at it more closely when reviewing your PR if you open one.

Either way thanks for looking into it.

wjwwood commented on July 4, 2024

Just a cursory look at the Internet seems to confirm our theories about CPython and valgrind:

We see already valgrind is reporting memory has been leaked. This is normal, as there are a variety of small structures allocated by CPython at startup that it never frees, usually because doing so would be too difficult.

-- http://pythonsweetness.tumblr.com/post/151938194807/debugging-c-extension-memory-safety-with-valgrind

srsidd commented on July 4, 2024

Yes, there is also a README for using Python with valgrind - http://svn.python.org/projects/python/trunk/Misc/README.valgrind. There is a way to avoid the PyMalloc-related errors, although I haven't tried it myself -

Valgrind may show some unexpected results when PyMalloc is used.
Starting with Python 2.3, PyMalloc is used by default. You can disable
PyMalloc when configuring python by adding the --without-pymalloc option.

clalancette commented on July 4, 2024

Doing a quick test here, I still see the memory leak from the original report. Marking this as a bug.

mikaelarguedas commented on July 4, 2024

Question: does this test use the disabling of PyMalloc suggested in #74 (comment)? If yes, can you post the invocation here? That will be useful for further benchmarking of Python extensions.

clalancette commented on July 4, 2024

So, I did a bit more research into this problem.

First, part of the reason that rclpy.spin_once(node, timeout_sec=None) never leaks memory is that it blocks forever. That is, since this is a simple test program, it just sits inside spin_once() forever, and since nothing is happening, it never does anything further.

Second, if I change the timeout_sec to a value other than 0 (in this case, I tried 0.01), the program still leaks memory, just at a slower rate. This suggests that we have something more fundamental going on.

I think at least part of the problem stems from _rclpy.c using PyMem_Malloc() without ever calling PyMem_Free(). This page implies that memory allocated with PyMem_Malloc() must be explicitly freed, and I see no calls in _rclpy.c to do that (please correct me if I'm wrong here). I think we are going to do a large refactoring of rclpy in the near future; when that happens, we should explicitly make sure to free this memory.

BrannonKing commented on July 4, 2024

This is a major issue -- please focus on it! At ASI we've had a lot of issues with large messages causing leaked (er, gushing) memory both in CPP and Python. We also have big issues with CPU performance problems in the middleware wait methods -- a place where I would never expect high CPU usage. It's difficult to track because the profilers show the problem to be in the middleware, but it may not be the middleware's fault if the cleanup methods aren't getting called.

mikaelarguedas commented on July 4, 2024

Can you please report this in the ros2 repository or the appropriate middleware repository? Based on your comment it looks like the problem is not only related to Python.
Could you provide reproducible and independent examples for the CPU consumption and the memory leak?
We fixed several memory leaks recently but welcome any contribution to patch the remaining ones.

Thanks!

BrannonKing commented on July 4, 2024

I did some more digging on the memory leaks and high CPU I was seeing. The high CPU issue in Python 3 was actually caused by a pygame ALSA bug; disabling the pygame audio fixed it. The issue with large messages causing a memory leak (and running slowly) appears to be a CoreDX-only thing. I couldn't reproduce the problem with the FastRTPS middleware. I've sent TwinOaks some information on the problem.

wjwwood commented on July 4, 2024

@BrannonKing thanks for following up on it.

dirk-thomas commented on July 4, 2024

Can you please try the patch from #109, which should fix the leaking of the wait set in the spin_once call?

I tried this with the following example:

import rclpy

rclpy.init()
node = rclpy.create_node('test_node')
i = 0
while rclpy.ok() and i < 100:
    rclpy.spin_once(node, timeout_sec=0.1)
    i += 1

mikaelarguedas commented on July 4, 2024

Given the low number of iterations it's hard to see anything; can you try the same test with another 3 orders of magnitude?

while rclpy.ok() and i < 100000:
    rclpy.spin_once(node, timeout_sec=0.0001)
    i += 1
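
One way to watch the growth during such a run without valgrind (a sketch using only the standard library; not part of the original report) is to sample the process's peak RSS periodically:

#!/usr/bin/env python3
import resource

import rclpy

rclpy.init()
node = rclpy.create_node('test_node')
i = 0
while rclpy.ok() and i < 100000:
    rclpy.spin_once(node, timeout_sec=0.0001)
    i += 1
    if i % 10000 == 0:
        # ru_maxrss is the peak resident set size, reported in kilobytes on Linux
        print(i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss, 'kB')
rclpy.shutdown()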

dirk-thomas commented on July 4, 2024

Given the low number of iterations it's hard to see anything; can you try the same test with another 3 orders of magnitude?

Still not leaking for me. The process does have increasing memory usage, but once it finishes valgrind reports no incremental leaks for me (only three tiny leaks of a few kB).

BrannonKing commented on July 4, 2024

Unfortunately, the fix doesn't seem to help my program. The "increasing memory" you mention is just too much. The program below leaks massive amounts of memory even with the fix. I've never had good luck with Valgrind for Python memory leaks; I think we need a different way to track these. You can put a gc.collect() call into the while loop; the memory still increases, but at a lower rate.

#!/usr/bin/env python3
import rclpy


def main():
    rclpy.init()
    node = rclpy.create_node('spin_n_leak')
    while rclpy.ok():
        rclpy.spin_once(node, timeout_sec=0.0)

if __name__ == '__main__':
    main()

dirk-thomas commented on July 4, 2024

@BrannonKing what you are seeing (the increasing memory usage) is not a memory leak. You can confirm that by running the script with valgrind --leak-check=full $(which python3) your_script.py. Instead of running forever, you should add an upper bound to the loop, as in my example above, which makes the script terminate after a deterministic number of iterations. When you run the script without this patch, it shows a memory leak on the order of the number of loop iterations, referencing the code in _rclpy.c which creates and initializes the wait set. With this patch the memory leak is gone.

Since the memory leak is being fixed by #109 I will go ahead and merge that PR. There will also be a follow-up PR (#110) to fix the memory leaks coming from the other entities (besides the wait set). After that, the increasing memory usage still needs to be investigated and addressed.

dirk-thomas commented on July 4, 2024

Since the memory leak has been addressed I will change the title of this ticket.

mikaelarguedas commented on July 4, 2024

Since the memory leak and the memory usage increase have been fixed in #112 and #113, I'm going to close this.
Follow-up PRs fixing memory leaks outside of spin_once are pending.
@BrannonKing feel free to comment here if this is not fixed for you, or to open another ticket if you face issues unrelated to spin_once.

Thanks @ArkturusMengsk, @IanTheEngineer, and @srsidd for the initial report and investigation.

dirk-thomas commented on July 4, 2024

This is not yet fully resolved, with #114 still pending, so I reopened it for now.

BrannonKing commented on July 4, 2024

Reporting back: I don't see the leak any more. Good jorb.

dirk-thomas commented on July 4, 2024

Reporting back: I don't see the leak any more. Good jorb.

Thank you for checking!
