
set_gpu_fans_public's Introduction

set_gpu_fans_public

Controlling the fan speed of an NVIDIA GPU on a headless Linux system requires spoofing a display. Running the fans faster can yield a few percent of additional performance, at the cost of increased noise. For installation and usage, read the comments in cool_gpu.

On a multi-GPU system, the temperature of each GPU is read individually and its fan is adjusted individually:

  liuk@acgpu1 ~ $ nvidia-smi 
  Sat May 12 02:17:41 2018       
  +-----------------------------------------------------------------------------+
  | NVIDIA-SMI 396.24                 Driver Version: 396.24                    |
  |-------------------------------+----------------------+----------------------+
  | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
  | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
  |===============================+======================+======================|
  |   0  GeForce GTX 108...  On   | 00000000:18:00.0 Off |                  N/A |
  | 90%   72C    P2   202W / 250W |   2724MiB / 11178MiB |    100%      Default |
  +-------------------------------+----------------------+----------------------+
  |   1  GeForce GTX 108...  On   | 00000000:3B:00.0 Off |                  N/A |
  | 90%   72C    P2   196W / 250W |   2724MiB / 11178MiB |    100%      Default |
  +-------------------------------+----------------------+----------------------+
  |   2  TITAN V             On   | 00000000:86:00.0 Off |                  N/A |
  | 80%   63C    P2   142W / 250W |   2983MiB / 12066MiB |    100%      Default |
  +-------------------------------+----------------------+----------------------+
  |   3  TITAN V             On   | 00000000:AF:00.0  On |                  N/A |
  | 85%   66C    P2   151W / 250W |   2983MiB / 12066MiB |    100%      Default |
  +-------------------------------+----------------------+----------------------+
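
As a rough illustration of per-GPU adjustment, the sketch below maps each GPU's temperature to its own fan speed and prints (rather than runs) one nvidia-settings command per GPU. The linear curve and the helper names `fan_speed_for_temp` and `plan_fan_commands` are assumptions made for this example and are not taken from cool_gpu; it also assumes that fan N belongs to GPU N, which does not hold for cards with more than one fan.

```shell
#!/bin/sh
# Hypothetical linear fan curve (not taken from cool_gpu):
# 40% at or below 40 C, 100% at or above 80 C, linear in between.
fan_speed_for_temp() (
    temp=$1
    if [ "$temp" -le 40 ]; then
        echo 40
    elif [ "$temp" -ge 80 ]; then
        echo 100
    else
        echo $(( 40 + (temp - 40) * 60 / 40 ))
    fi
)

# Read "index, temperature" lines on stdin and print (not run) one
# nvidia-settings command per GPU. Assumes fan N belongs to GPU N.
plan_fan_commands() {
    while IFS=', ' read -r idx temp; do
        [ -n "$idx" ] || continue
        speed=$(fan_speed_for_temp "$temp")
        echo "nvidia-settings -a '[gpu:$idx]/GPUFanControlState=1' -a '[fan:$idx]/GPUTargetFanSpeed=$speed'"
    done
}

# Example input using two of the temperatures from the table above.
printf '0, 72\n2, 63\n' | plan_fan_commands
```

On a real system the input could come from `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits`, and the printed commands would still need a (spoofed) display to take effect.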

set_gpu_fans_public's People

Contributors

boris-dimitrov, liu-kan

set_gpu_fans_public's Issues

libEGL warnings

While the program is running (4x 2080 Ti), the log shows the following warnings here and there:

libEGL warning: DRI2: failed to authenticate
libEGL warning: DRI2: failed to create any config
libEGL warning: DRI2: failed to create any config

I suppose they appear simply because the fake display does not fully support EGL, so these warnings can be ignored, right?

sed

First off, thanks for the hack, it works fine on a headless GTX 1070.

The only problem I ran into is that, when the script is run as-is, the sed step prints the following:

sed: couldn't open [xorg.conf full path]: No such file or directory

I "fixed" it by putting the command on a single line. I don't know whether there is some shell incompatibility in how spaces in the sed command are handled, or something like that.

It might be specific to my environment, and it's only a minor point; I'm just reporting it in case anyone else runs into it.

watt cannot increase

Hi, I'm using both Alex's version and yours. I have a GTX 1070, and the fan speed increases to what I want. But during computation the power draw cannot go beyond 25 W. Does this happen on your server?

target not set properly when using stop option in nvscmd

I'm not super familiar with tcsh, and I translated the tcsh code to bash for my purposes, but I noticed that the stop option may not work properly (it did not in my bash version) because the setting of target to -1 at this line is overwritten by the else statement in the next check, and target gets set to the string "stop".
The fix is to reverse the order of the two if statements:

if ("x$1" == "x" || "x$1" == "xstart" || "x$1" == "x-display") then
    # this means "use heuristic"
    @ target = -3
else
    @ target = $1
endif

if ("x$1" == "xstop") then
    @ target = -1
endif
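
For readers who, like the poster, translate the script to bash or sh, the same argument handling can be written so that the ordering problem cannot occur. The sketch below uses a `case` statement instead of two `if` blocks; this is a swapped-in technique, not the original tcsh code, and `resolve_target` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical sh equivalent of the tcsh argument handling above.
# Each argument matches exactly one branch, so "stop" can never be
# overwritten by the fallback branch.
resolve_target() {
    case "${1:-}" in
        ""|start|-display) echo -3 ;;    # this means "use heuristic"
        stop)              echo -1 ;;
        *)                 echo "$1" ;;  # explicit target passed by the user
    esac
}

resolve_target stop
resolve_target 85
```

Because the branches are mutually exclusive, there is no later check that could replace -1 with the string "stop".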

/opt/set-gpu-fans/nvscmd: 84: /opt/set-gpu-fans/nvscmd: Syntax error: "(" unexpected (expecting "fi")

[1] 20166
Persistence mode is already Enabled for GPU 0000:02:00.0.
Persistence mode is already Enabled for GPU 0000:81:00.0.
All done.

X.Org X Server 1.18.4
Release Date: 2016-07-19
X Protocol Version 11, Revision 0
Build Operating System: Linux 4.4.0-83-generic x86_64 Ubuntu
Current Operating System: Linux WAIR-P8000 4.4.0-83-generic #106-Ubuntu SMP Mon Jun 26 17:54:43 UTC 2017 x86_64
Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-83-generic root=UUID=512494ce-8ef7-4b2f-8996-d81639985410 ro text
Build Date: 17 July 2017 05:05:12PM
xorg-server 2:1.18.4-0ubuntu0.3 (For technical support please see http://www.ubuntu.com/support)
Current version of pixman: 0.33.6
Before reporting problems, check http://wiki.x.org
to make sure that you have the latest version.
Markers: (--) probed, (**) from config file, (==) default setting,
(++) from command line, (!!) notice, (II) informational,
(WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Wed Jul 26 17:16:09 2017
(++) Using config file: "/tmp/xorg-aryyQphn.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
/opt/set-gpu-fans/nvscmd: 84: /opt/set-gpu-fans/nvscmd: Syntax error: "(" unexpected (expecting "fi")
xinit: connection to X server lost

waiting for X server to shut down (II) Server terminated successfully (0). Closing log file.

I'm wondering if there is a fix for this error?
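
The message format above (script name, line number, then `Syntax error: "(" unexpected`) is the style printed by a POSIX sh such as dash, which suggests that nvscmd was interpreted by sh rather than tcsh. That is an inference from the log, not something confirmed in this thread. A minimal check, with `interpreter_of` as a hypothetical helper, could look like this:

```shell
#!/bin/sh
# Print the interpreter named on a script's first line
# (prints nothing if the file has no "#!" line).
interpreter_of() {
    head -n 1 "$1" | sed -n 's/^#![[:space:]]*//p'
}

# Report whether tcsh is installed at all.
if command -v tcsh >/dev/null 2>&1; then
    echo "tcsh found at $(command -v tcsh)"
else
    echo "tcsh not found; tcsh scripts cannot run as written"
fi
```

Running `interpreter_of /opt/set-gpu-fans/nvscmd` (the path from the log) would show which interpreter the script asks for, if any.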

xinit: connection to X server lost

We are testing the script on a headless system.

NVRM version: NVIDIA UNIX x86_64 Kernel Module 381.22 Thu May 4 00:55:03 PDT 2017

The test setup is a non-Titan GTX card, and the nvidia-smi command used internally by the script,
/usr/bin/nvidia-smi dmon -s p -c 1
results in

Not supported on the device(s)
Failed to process command line

The solution would be to simplify the script: eliminate the nvidia-smi call and use the script only to run nvidia-settings, without GPU monitoring:

/usr/bin/nvidia-settings -a '[gpu:0]/GPUFanControlState=1' -a '[fan:0]/GPUTargetFanSpeed=100'

Steps we followed:

  • download into ${HOME}/set-gpu-fans
  • create a symlink: ln -sf ${HOME}/set-gpu-fans /opt/set-gpu-fans
  • make sure no X server is running
  • enable persistence mode: sudo nvidia-smi -pm 1
  • execute:
cd /opt/set-gpu-fans
sudo tcsh
./cool_gpu >& controller.log &
tail -f controller.log

We get the following output:

        (++) from command line, (!!) notice, (II) informational,
        (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: "/var/log/Xorg.0.log", Time: Sun May 21 20:43:25 2017
(++) Using config file: "/tmp/xorg-HLKicaMP.conf"
(==) Using system config directory "/usr/share/X11/xorg.conf.d"
@: Expression Syntax.
xinit: connection to X server lost

waiting for X server to shut down (II) Server terminated successfully (0). Closing log file.

two gpus with three fans

I have to say this is wonderful work, and I can run it on my system. However, I found a new issue. My computer has two cards: gpu0, a TITAN RTX (two fans), and gpu1, an RTX 2080 Ti (one fan). After I run this script to control the GPU fans, it seems that only one fan's speed is adjusted on the TITAN RTX, leading to high temperatures. In addition, the fans of both GPUs are set to the same speed, even though the second GPU runs 15 degrees cooler than the first. I want to control them separately. How should I modify this script to do that?
Thanks.
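
One possible direction, sketched under assumptions: give each GPU its own target speed and apply it to every fan that belongs to that GPU. The fan-to-GPU layout below (fans 0 and 1 on GPU 0, fan 2 on GPU 1) is a guess based on the description above and should be checked with `nvidia-settings -q fans`; `fan_commands_for_gpu` is a hypothetical helper, and the commands are only printed, not executed.

```shell
#!/bin/sh
# Print one nvidia-settings command for a GPU and every fan assumed
# to belong to it.
# Usage: fan_commands_for_gpu GPU_INDEX SPEED_PERCENT FAN_INDEX...
fan_commands_for_gpu() (
    gpu=$1; speed=$2; shift 2
    args="-a '[gpu:$gpu]/GPUFanControlState=1'"
    for fan in "$@"; do
        args="$args -a '[fan:$fan]/GPUTargetFanSpeed=$speed'"
    done
    echo "nvidia-settings $args"
)

# Assumed layout: GPU 0 has fans 0 and 1, GPU 1 has fan 2.
# The speeds are example values, not recommendations.
fan_commands_for_gpu 0 90 0 1
fan_commands_for_gpu 1 70 2
```

In the actual script, the speed passed for each GPU would come from that GPU's own temperature reading instead of a fixed value.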
