rntop's People

Contributors

ekinkarabulut, razrotenberg

rntop's Issues

rntop runs as root contrary to best practices

Is there any reason that the Docker image for rntop leaves the user as root? This is contrary to security best practices.

From a quick reading of the README and the source, I understand that rntop SSHes into remote machines and runs nvidia-smi there. This means that root is not needed on the local machine, on the remote machine, or within the container.

Failed to read private key: /root/.ssh/id_rsa

$ sudo docker run -it --rm -v ~/.ssh:/root/.ssh runai/rntop [email protected]
Error authenticating client side: Failed to read private key: /root/.ssh/id_rsa
terminate called after throwing an instance of 'std::exception'
what(): std::exception

Any idea how I can resolve this error?
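
One way to narrow this down is to inspect the key file that the container sees. The sketch below is a hypothetical diagnostic helper and not part of rntop: it reports whether the file exists, whether the current user can read it, and which header it starts with. That a missing, unreadable, or unexpectedly formatted key can produce the message above is an assumption; the issue does not confirm the cause.

```python
import os


def describe_private_key(path):
    """Return a short diagnosis string for a private key file (hypothetical helper)."""
    if not os.path.exists(path):
        return "missing"
    if not os.access(path, os.R_OK):
        return "unreadable"
    # Only the first line is needed to tell the common key formats apart.
    with open(path, "r", errors="replace") as f:
        first_line = f.readline().strip()
    if first_line == "-----BEGIN OPENSSH PRIVATE KEY-----":
        return "openssh-format"
    if first_line.startswith("-----BEGIN") and first_line.endswith("PRIVATE KEY-----"):
        return "pem-format"
    return "unknown-format"
```

Calling describe_private_key("/root/.ssh/id_rsa") from a Python interpreter that sees the same mounted directory gives a first hint about where to look next.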

Unable to run rntop

Hi, I have an issue running rntop.

This is my setup:

  1. I have an Ubuntu 22.04 VM that runs the rntop Docker container.
  2. I have a host that simulates the GPU machine that I want to monitor.
  • I can ping the host from my VM and SSH from the VM to the host.
  • I am able to run docker run -it --rm -v $HOME/.ssh:/root/.ssh --entrypoint bash runai/rntop -c "ssh user@machine nvidia-smi". According to the README, this means that the container can connect to the machine and that it is the rntop application itself that cannot.
  • When I proceed by adding --ssh to the rntop command, i.e. sudo docker run -it --rm -v $HOME/.ssh:/root/.ssh --entrypoint bash runai/rntop:latest user@machine, it fails with the error "GPUs wmove() failed. Terminate called after throwing an instance of 'std::expression'". The printed output also contains no cluster or node info.

I am not sure how to troubleshoot this further. Any advice? Thanks.

GPU names in the list?

Is there any option to show GPU names in the list?
I mean not the short names from the default nvidia-smi view, but the longer versions from nvidia-smi -L (e.g. there are 3+ Titan cards, such as TITAN X (Pascal) and GeForce GTX TITAN X, and the latter is shortened to GeForce GTX TIT...)
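
To illustrate what the request involves, here is a minimal sketch that extracts full GPU names from nvidia-smi -L output. It assumes the usual line shape "GPU <index>: <name> (UUID: <uuid>)"; the function name and the sample UUIDs are hypothetical, and this is not how rntop itself reads GPU information.

```python
import re

# Matches lines such as "GPU 0: TITAN X (Pascal) (UUID: GPU-...)".
# The greedy name group backtracks to the last " (UUID: " on the line,
# so names that contain parentheses are kept whole.
GPU_LINE = re.compile(r"^GPU (\d+): (.+) \(UUID: ([^)]+)\)\s*$")


def parse_gpu_names(listing):
    """Map GPU index to full name from `nvidia-smi -L` text; skip other lines."""
    names = {}
    for line in listing.splitlines():
        match = GPU_LINE.match(line.strip())
        if match:
            names[int(match.group(1))] = match.group(2)
    return names


# Illustrative input only; the UUIDs are made up.
sample = (
    "GPU 0: TITAN X (Pascal) (UUID: GPU-00000000-aaaa)\n"
    "GPU 1: GeForce GTX TITAN X (UUID: GPU-11111111-bbbb)\n"
)
print(parse_gpu_names(sample))
```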

Error Handling for When the nvidia-smi fails on some nodes

Hi,

Thanks for the great tool. I'm running it against 9 machines, and inevitably nvidia-smi is sometimes down on some of them. For example:

$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch

Then the

sudo docker run -it --rm -v $HOME/.ssh:/root/.ssh runai/rntop ...

will result in

terminate called after throwing an instance of 'std::out_of_range'
  what():  vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)

Is this the expected behavior, and any plans to fix it? Thanks!
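
The std::out_of_range message suggests that a parsed result is indexed without a length check, although the issue does not confirm where this happens. As a sketch of the more tolerant behavior being asked for, the hypothetical function below parses the output of nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits and returns an empty list when the output is an error message, so a caller could mark the node as unavailable instead of terminating. The function name and field selection are assumptions, not rntop's implementation.

```python
def parse_gpu_rows(output):
    """Parse CSV rows from nvidia-smi; return [] if no row is well formed."""
    rows = []
    for line in output.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 5:
            # Error text such as "Failed to initialize NVML: ..." lands here.
            continue
        index, name, util, mem_used, mem_total = fields
        try:
            rows.append({
                "index": int(index),
                "name": name,
                "utilization": int(util),
                "memory_used": int(mem_used),
                "memory_total": int(mem_total),
            })
        except ValueError:
            # Non-numeric values (for example unsupported metrics) are skipped.
            continue
    return rows


broken = "Failed to initialize NVML: Driver/library version mismatch"
print(parse_gpu_rows(broken))  # prints []
```

With this shape, a failing node yields an empty GPU list that the display code can render as "unavailable" while the remaining nodes are still shown.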
