I tried to run the code on various platforms, including a PC, a laptop, and an embedded system.


Cannot run on low-profile GPU about pilco HOT 10 CLOSED

nrontsis commented on July 25, 2024

Cannot run on low-profile GPU

from pilco.

Comments (10)

nrontsis commented on July 25, 2024

Thanks for opening the issue. Could you provide a minimal example so that we can investigate?


jaztsong commented on July 25, 2024

I'm simply using the example PILCO/tests/test_predictions.py.

I tested both the CPU and GPU versions of TensorFlow 1.13.1 and 1.14.0. The CPU versions ran without problems, but the GPU versions failed. My guess is that the problem occurs when the function predict_given_factorization passes through GPflow's autoflow.


jaztsong commented on July 25, 2024

BTW, if possible, could you implement a GPyTorch version? It looks like GPyTorch outperforms GPflow on large datasets.


nrontsis commented on July 25, 2024

Can you try running ~~PILCO/tests/test_predictions.py~~ CORRECTION: examples/inverted_pendulum.py with the following changes?

controller = LinearController(state_dim=state_dim, control_dim=control_dim)

instead of

controller = RbfController(state_dim=state_dim, control_dim=control_dim, num_basis_functions=5)

and

pilco = PILCO(X, Y, controller=controller, horizon=10)

instead of

pilco = PILCO(X, Y, controller=controller, horizon=40)

The RL algorithm will probably not converge to any useful policy, but I am curious to see whether this is a memory or an architecture issue.


nrontsis commented on July 25, 2024

When we wrote the repo, GPyTorch was not as mature as GPflow. I guess switching to PyTorch would be great, but unfortunately I might not have the time for such a big change (at least in the foreseeable future...).


jaztsong commented on July 25, 2024

I cannot try examples/inverted_pendulum.py on the Jetson TX2, since installing the MuJoCo environment (I'm actually using Roboschool) is excruciating. But I can confirm that memory is not the issue, as memory usage didn't spike while the program was running.

After a two-day investigation, my impression so far is that the operations in the function predict_given_factorization cause CUDA register usage to exceed the available resources.


kyr-pol commented on July 25, 2024

Hi @jaztsong,

I have run into resource allocation errors before, but in my case they were memory issues. They usually arise in the backward pass (when calculating gradients), which involves large matrices of order N^2 D^2, where N is the number of data points and D the number of dimensions.

You could try reducing the number of data points used in the test (it's 100 and hard-coded, so you'd have to change it in a few places), just to make sure the failure isn't caused by the algorithm's demands. If that is ruled out, it would point to an issue with how GPflow, CUDA, or TensorFlow handles resources on your machine, but I wouldn't really know how to address that.
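To get a feel for the N^2 D^2 scaling mentioned above, here is a back-of-the-envelope sketch. It assumes float64 entries (8 bytes) and counts a single dense tensor with N^2 D^2 entries; the real backward pass may keep several such intermediates alive, and the D values below are illustrative guesses, not taken from test_predictions.py. The helper name n2d2_tensor_bytes is hypothetical.

```python
def n2d2_tensor_bytes(n_points: int, n_dims: int, bytes_per_element: int = 8) -> int:
    """Bytes for one dense tensor with N^2 * D^2 entries (float64 by default)."""
    return n_points ** 2 * n_dims ** 2 * bytes_per_element


# Illustrative (N, D) pairs; the D values are assumptions, not from the thread.
for n, d in [(100, 4), (100, 10), (40, 30), (3, 4)]:
    mib = n2d2_tensor_bytes(n, d) / 2**20
    print(f"N={n:>3}, D={d:>2}: {mib:8.2f} MiB per tensor")
```

Because the count is quadratic in both N and D, doubling N alone quadruples the size of each such tensor, and so does doubling D.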


jaztsong commented on July 25, 2024

@kyr-pol I actually reduced the number of data points to 3, and it still broke. Bummer.


jaztsong commented on July 25, 2024

After debugging the code line by line for a few days, I finally solved the problem. It turned out that the culprit was the matrix determinant operation in mgpr.py, which caused a memory issue on the GPU. My workaround is to compute an LU decomposition first and take the product of the diagonal entries of the U factor as the determinant.
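The LU-based workaround can be illustrated with a framework-free sketch in plain Python. It does not use TensorFlow, so it says nothing about GPU behavior and is not the actual change made to mgpr.py; it only shows the math: factor with partial pivoting, multiply the diagonal of U, and flip the sign once per row swap. The sign correction is the standard rule for pivoted LU (det(A) = sign(P) * prod(diag(U))); the thread does not say whether the original workaround needed it. The function name lu_determinant is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def lu_determinant(a: Matrix) -> float:
    """Determinant via LU decomposition with partial pivoting.

    Gaussian elimination turns a copy of `a` into U; the elimination
    multipliers (which would form L, with a unit diagonal) are not stored.
    det(A) = sign(P) * prod(diag(U)), where sign(P) flips on every row swap.
    """
    n = len(a)
    u = [[float(x) for x in row] for row in a]  # working copy that becomes U
    sign = 1.0
    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        pivot = max(range(k, n), key=lambda i: abs(u[i][k]))
        if u[pivot][k] == 0.0:
            return 0.0  # the whole column is zero, so the matrix is singular
        if pivot != k:
            u[k], u[pivot] = u[pivot], u[k]
            sign = -sign
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            factor = u[i][k] / u[k][k]
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]
    det = sign
    for k in range(n):
        det *= u[k][k]
    return det
```

For example, `lu_determinant([[4, 3], [6, 3]])` should give -6 up to floating-point rounding. For large matrices, working with the sum of log|U_kk| instead of the raw product avoids overflow and underflow.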

With this modification, the code now runs without errors. However, prediction becomes unacceptably slow as the number of data points and the output dimension grow, e.g., N = 40, D_output = 30.

Anyway, thank you for everyone's input.


nrontsis commented on July 25, 2024

Thanks a lot @jaztsong for the investigation! I believe there is a lot of room for improvement in mgpr.py. Hopefully I will find time at some point to optimize it...
