slurm_tools's People

Contributors

colindaven, ggwena, martijnkruiten, oleholmnielsen, wpoely86, xmontagut

slurm_tools's Issues

listing available gpus

Very useful software. How can we list the available versus used GRES for GPUs?

For instance, if I do:

pestat -G

This is partially good, as I can see the GRES being used. But it doesn't show the GRES available.

For CPUs, you get to see used/total (in my case 0/48). How can I get similar output for GPUs?

awk error using slurmusersettings and not defined group/account

Hi,
I'm facing an error when using slurmusersettings to add my users while ignoring the primary group (export ignore_primary_groups=1).

My overall idea is to select users to add to the cluster using only the secondary groups (as all my users have the primary group USERGROUP, which is not meaningful in this context).

My accounts.conf is defined like this (and SlurmDBD is consistent):

institute::1:all users
dep1:institute:1:Dep1
dep2:institute:1:Dep 2

When I run slurmusersettings and the script handles a USERNAME with primary GID USERGROUP, this error occurs (debug=1):

...
User USERNAME group USERGROUP
### User USERNAME exists under account= fairshare=
### NOTICE: Group USERGROUP has no settings, assuming default values for user USERNAME
gawk: cmd. line:344: (FILENAME=- FNR=40) fatal: attempt to use a scalar value as array
...

The USERGROUP account is intentionally not defined in SlurmDBD, and USERNAME does not belong to any other defined group/account.

I fixed (for me?) this issue with commit 7e42c73

The modified version returns a command like

/usr/bin/sacctmgr -i create user name=USERNAME

which then fails (no defaultaccount defined). For me this is acceptable behavior (I'm also working on a second patch to not exclude users based on the primary group, and I hope this behavior will be fixed there).

Thank you for your scripts and your documentation! They're very useful, nicely written and your documentation is precious!
In a cluster with LDAP users/groups (obtained via getent) we are evaluating slurmaccounts and slurmusersettings to manage accounting because of their simplicity (plain shell/awk scripting rulez!).

Cheers, Paolo

Flag '-F' showing magenta nodes

Hi Ole,

Using the flag -f was giving me too many results, so I tried -F. However this is still showing magenta nodes:

$ pestat -F -u alice                                                                              
Print only nodes that are flagged by * (RED nodes)
Select a single Slurm user: alice
Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  Joblist
                            State Use/Tot  (15min)     (MB)     (MB)  JobID(JobArrayID) User ...
    c103           main*      mix  14  32   14.32    192000    89277* 7683123 alice 7677623(7677209_27) bob 7677209(7677209_28) bob 7678231 carol *
    c104           main*      mix  13  32   13.32    192000    74430* 7683130 alice 7689301(7688525_740) dave 7689290(7688525_729) dave 7678231 carol *

Obviously you can't see the colour above, but everything in the Freemem and Joblist columns is magenta. There is nothing in red.

What am I doing wrong?

Useraccounts: Partition-wise configuration and secondary groups

Dear Ole,

thank you for providing your scripts. They will be very helpful for me.

But I have two questions:

  1. I can't define partition-wise values. But it should be possible to maintain these directly via sacctmgr?
  2. The script doesn't evaluate secondary group memberships?

Kind regards,
Stefan

Need some mentoring re slurm admin in HPC

Hi, we want to build AI HPC cluster infrastructure to provide compute grants to researchers running open-source projects in the AI field.

I am responsible for building the cluster(s) and setting up the delivery of compute grants to recipients. So far we have been able to script the cluster launch (using the AWS cloud), and we now have to dive into Slurm, accounting, quotas, QOS and user management.

Looking at this repo, I understand that all of this is already well optimized on your end. We would like to achieve something similar in the near future, and I was wondering what kind of help we could get (from basic interactions here up to mentoring or even contracted services; I am not in charge of purchasing, but if that is the right way I could ask for it).

Please let me know what our chances are. Thanks

minor path suggestion

This is great--thanks!

It would be slightly easier to set up if it just used the 'awk' and SLURM commands already in $PATH, which is likely the usual case.
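
A minimal sketch of the kind of lookup suggested here, assuming a POSIX shell. The helper name find_cmd and its fallback argument are hypothetical and are not taken from the slurm_tools scripts. It prefers whatever is found in $PATH and only falls back to a fixed location if nothing is found there.

```shell
#!/bin/sh
# Hypothetical helper (not part of slurm_tools): resolve a command via $PATH,
# falling back to a fixed path only if the command is not found there.
find_cmd() {
    name=$1
    fallback=$2
    if resolved=$(command -v "$name" 2>/dev/null); then
        printf '%s\n' "$resolved"
    elif [ -x "$fallback" ]; then
        printf '%s\n' "$fallback"
    else
        echo "find_cmd: $name not found in PATH or at $fallback" >&2
        return 1
    fi
}

# Example: pick up sinfo from $PATH, else try the location used by RPM installs.
# If Slurm is not installed at all, the variable is left empty.
sinfo=$(find_cmd sinfo /usr/bin/sinfo) || sinfo=""
```

With this approach a site that installs Slurm under a different prefix only needs that prefix in $PATH, and no script has to be edited.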

memory presented is using "free" memory vs "available memory"

A note that the current "free" memory column reports "freemem" rather than "available" memory, which misrepresents how much memory is actually available for jobs landing on nodes.

Example:
Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  Joblist
    n799       partition   maint*   0 112    0.06    515466    22191*

But that node has 497G of RAM available:
n799 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:         515466       16546       22183         365      476736      497180

Note: This actually comes from Slurm itself, as it does not track "available" memory. If you agree that this should be changed, the bug ID is 15077.

side note: super neat tools you have here! Thank you for spending the time on them!

Using a default group for per user groups

Hello,
first thank you for your useful tools.

I had to come up with a quick solution on my system for per-user groups. It is on a "works for me" basis, but I hope it might be useful for someone else too. The way I chose to do it was to replace the per-user groups with a default group.
Here is the diff of the slurmusersettings file:

diff slurmusersettings.orig slurmusersettings.edited

33a34,35
> export defaultgroup="slurmuser"
> 
55c57
< export sacctmgr=/usr/bin/sacctmgr
---
> export sacctmgr=$(which sacctmgr)
148a151
>       defaultgroup    = ENVIRON["defaultgroup"]
348,350c351,355
<               print "### Ignoring users with per-user group: User=" u " group=" g
<               userinformation[u] = "IGNORE"
<               next
---
>               print "### user with per-user group: User=" u " group=" g
>               print "### setting default group to " defaultgroup
>               #userinformation[u] = "IGNORE"
>               #next
>               g = defaultgroup
467a473
>       if(isarray(setting[u][config]) && isarray(setting[u][current])) {
476c482,484
< 
---
> } else {
>       print "### config or current not arrays"
> }

Hard coded paths

Hi Ole,

It looks like your code has hard-coded paths to the various Slurm utilities, which likely won't work for sites that don't use RPMs.

$ ./pestat 
./pestat: line 273: /usr/bin/sinfo: No such file or directory
sh: /usr/bin/squeue: No such file or directory
Hostname       Partition     Node Num_CPU  CPUload  Memsize  Freemem  Joblist
                            State Use/Tot              (MB)     (MB)  JobId User ...

It's best to let them be found by the $PATH I think (or alternatively have a config file that says what directory they are in).

All the best,
Chris

How to install the tools if I am not the superuser?

I think this is a very useful tool, but I am just a normal user on a cluster. How can I install the tools by myself?

I cannot use the sudo command.
I cannot copy files to /etc/ or to /usr/bin/ because of permission restrictions.

I solved this problem by myself.
I edited the pestat file directly to set the path,
and then I put this pestat file in my home directory,
and then I put this line in my bashrc file: PATH=/home/myusername/bin:"$PATH"
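
The steps described above can be sketched roughly as follows, assuming a POSIX shell and a writable home directory. The helper name install_user_script is hypothetical, and the default target directory $HOME/bin is an assumption based on the PATH line quoted above.

```shell
#!/bin/sh
# Hypothetical helper: copy a script into a per-user bin directory
# (default: $HOME/bin) and make it executable for the owner.
install_user_script() {
    src=$1
    bindir=${2:-"$HOME/bin"}
    mkdir -p "$bindir" &&
    cp "$src" "$bindir/" &&
    chmod u+x "$bindir/$(basename "$src")"
}

# Example, run from a directory containing the edited pestat script:
#   install_user_script ./pestat
# Then add this line to ~/.bashrc so the shell finds it:
#   export PATH="$HOME/bin:$PATH"
```

No root access is needed, since nothing outside the home directory is touched.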

free vs. available memory

Is there any way for pestat to report on what the free command calls "available" rather than "free"? Since "free" includes usage for file caching, I don't find it the most useful for figuring out if a node is actually overloaded.
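
For checking a single node by hand, the "available" column of free corresponds to the MemAvailable field in /proc/meminfo on Linux kernels that provide it. The sketch below is not something pestat does; the helper name mem_available_mb is hypothetical, and it only reads meminfo-style text from standard input.

```shell
#!/bin/sh
# Hypothetical helper: print the MemAvailable value in MB, given
# /proc/meminfo-style input (values in kB) on standard input.
mem_available_mb() {
    awk '$1 == "MemAvailable:" { printf "%d\n", $2 / 1024 }'
}

# On a Linux compute node:
#   mem_available_mb < /proc/meminfo
```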

total gpu usage with slurmacct

Hi,

I was wondering if you have a flag to get the total CPU and GPU usage with the slurmacct tool? The goal is to get the total CPU and GPU hours per month per partition.

Thank you

Some issues with Scripts.....

Hello everyone,

I have a problem with your Slurm tools and haven't been able to find the mistake so far. I get the following output when I call the script (showuserjobs):

awk: not an option: --version
awk: line 11: regular expression compile failed (missing operand)
*
Batch job status for cluster influxus-physicus at Thu 30 Sep 2021 12:32:40 PM CEST

Job summary: 0 jobs total (max=10000) in all partitions.

Username/ Runnin Limit Pendin
Totals Account Jobs CPUs CPUs Jobs CPUs Further info
=========== ========== ====== ====== ====== ====== ====== =============================
awk: line 37: syntax error at or near [
awk: line 59: syntax error at or near [
awk: line 60: syntax error at or near [
awk: line 61: syntax error at or near [
awk: line 64: syntax error at or near [

I think this is very simple for you, but I have been searching for a while for why it doesn't work.
Thank you in advance.

Z. Matthias
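
One possible cause, which is an assumption and not confirmed in this thread: the message "not an option: --version" is what a non-GNU awk such as mawk prints, and the syntax errors near "[" would be consistent with gawk-only array syntax. A minimal sketch for checking which awk is in use follows; the helper name is_gawk is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given command identifies itself as GNU awk.
is_gawk() {
    "$1" --version 2>/dev/null | grep -q 'GNU Awk'
}

# Example: report whether the default awk is GNU awk.
if is_gawk awk; then
    echo "awk is GNU awk"
else
    echo "awk is not GNU awk (or does not support --version)"
fi
```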

slurmusersettings: suppression of users with non-login shell

Would it be possible to add a flag to allow the suppression of the output regarding users with a non-login shell?

Our use-case is that, as part of our user life-cycle management, we set the login shells of users who haven't logged on for a certain period or whom we have not been able to contact to /sbin/nologin. We would like to keep them in the SlurmDB but just prevent them from logging in. This potentially affects quite a large number of users, so it would be nice to be able to suppress the corresponding output.

WDYT?
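
A minimal sketch of how such users could be identified outside the script, assuming passwd-format lines such as those printed by getent passwd. The helper name nologin_users is hypothetical, and treating shells named nologin or false as non-login shells is an assumption; this is not how slurmusersettings behaves.

```shell
#!/bin/sh
# Hypothetical helper: read passwd-format lines on standard input and print
# the usernames whose login shell (field 7) ends in /nologin or /false.
nologin_users() {
    awk -F: '$7 ~ /\/(nologin|false)$/ { print $1 }'
}

# Example against the system user database:
#   getent passwd | nologin_users
```

The resulting list could then be used to filter the script's output, for example with grep -v -f.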

Do not create accounts for users not in certain group

Hi,
this repository looks very nice and would solve a lot of problems for our Slurm cluster. Our infrastructure manages login via LDAP. I set up the creation of Slurm users as you described and it works nicely. But we have domain accounts for which we do not want to create Slurm accounts, e.g. mailboxes and service users. We manage access to our Slurm cluster by assigning the group slurm-user; accounts that do not have this group should be ignored. Is that somehow possible with your code? I had a look but did not find an easy place to add it. I would have used the homedir checking, but our users need to log in for the home directory to be created, and user data is saved on different storage systems based on their association. Thanks!
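
A minimal sketch of the membership test described here, assuming a group-format line such as the one printed by getent group slurm-user. The helper name in_group_line is hypothetical and is not part of slurm_tools. Note that such a line lists only explicitly listed members, so users who have the group only as their primary group would not be matched.

```shell
#!/bin/sh
# Hypothetical helper: read one group-format line ("name:x:gid:user1,user2")
# on standard input and succeed if the given user is in the member list.
in_group_line() {
    awk -F: -v user="$1" '
        { n = split($4, members, ",")
          for (i = 1; i <= n; i++) if (members[i] == user) found = 1 }
        END { exit(found ? 0 : 1) }'
}

# Example (alice is a placeholder username):
#   getent group slurm-user | in_group_line alice && echo "create Slurm user"
```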

Thanks, very useful

I particularly like the CPU and usage details in pestat and the monthly reporting commands. Thanks for sharing, this will help many of us at my institution.

awk: cmd. line:261: (FILENAME=- FNR=1) fatal: division by zero attempted

Hi, when I run this script I get the error output below. Could you please have a look and let me know how to fix it? Thank you.


Notice: Longest hostname length is truncated to 20
Hostname Partition Node Num_CPU CPUload Memsize Freemem Joblist
State Use/Tot (15min) (MB) (MB) JobID User ...
awk: cmd. line:261: (FILENAME=- FNR=1) fatal: division by zero attempted

pestat command doesn't list the job list

Hi OleHolmNielsen, I am using the pestat command, but it doesn't show anything in the job list. Could you help me?
For example, only in one node:

./pestat -n xula2503
Select only nodes in hostlist=xula2503
Hostname Partition Node Num_CPU CPUload Memsize Freemem Joblist
State Use/Tot (15min) (MB) (MB) JobID User ...
xula2503 xula2 idle 0 36 0.01 191000 186211

slurm MaxSubmitJobPerUser help needed !!!

Is there a way to restrict a user to a limited number of submitted jobs for a particular partition in Slurm, or to restrict users with MaxJobs and MaxSubmitJobs per account using sacctmgr?

Incompatibilities between Dash and Bash

Hi.

I have trouble from time to time running your scripts on Debian/Ubuntu machines.

The problem is related to the (in)famous switch of the /bin/sh symlink to /bin/dash instead of /bin/bash.

References:

I guess you are running the 'Slurm Tools' scripts on RHEL/CentOS, where /bin/sh
still points to Bash.

Would it be possible for you to consider switching the shebang line from #!/bin/sh to #!/usr/bin/env bash?
