STUBL - SLURM Tools and UBiLities

STUBL is a collection of supplemental tools and utility scripts for SLURM.

REQUIREMENTS

A few commands included in STUBL require clush. On RHEL-based systems, clush is available from EPEL. On Debian-based systems, run apt-get install clush.

INSTALL

  • To install STUBL, download the latest release and unpack it:

    $ tar xvf stubl-0.x.x.tar.gz
    $ cd stubl-0.x.x
    
  • Copy the sample config file and edit to taste:

    $ cp conf/stubl.sample conf/stubl
    $ vim conf/stubl
    
  • Create an empty NodeInfo.log file:

    $ cp log/NodeInfo.log.sample log/NodeInfo.log
    
  • (Optional) Build the slurmbf source:

    $ cd src/slurmbf
    $ make
    
  • (Optional) Build the slogs_helpers source:

    $ cd bin/slogs_helpers
    $ g++ -c *.c *.cpp
    $ g++ -o tins tins.o
    $ g++ -o slogplus slogplus.o

  • (Optional) Populate the slurmbf NodeInfo.log with the cluster's node information (RAM, disk size, scratch space, etc.). Running the following command creates the file log/NodeInfo.log and can take a while:

    $ ./bin/GetNodeInfo.sh
    
  • Ensure stubl is in your path:

    $ export STUBL_HOME=/path/to/install/dir/stubl-0.x.x
    $ export PATH=$STUBL_HOME/bin:$PATH
    

Summary of STUBL SLURM Commands

  • fisbatch

    Friendly Interactive SBATCH. A customized version of sbatch that provides a user-friendly interface to an interactive job with X11 forwarding enabled. It is analogous to the PBS "qsub -I -X" command. This code was adapted from srun.x11. (requires clush)

  • pbs2sbatch

    Converts PBS directives to equivalent SLURM SBATCH directives. Accommodates old UB CCR-specific PBS tags like IB1, IB2, etc.

  • pbs2slurm

    A script that attempts to convert PBS scripts into corresponding SBATCH scripts. It will convert PBS directives as well as PBS environment variables and will insert bash code to create a SLURM_NODEFILE that is consistent with the PBS_NODEFILE.

  • sausage

    Retrieves accounting/billing information for a user or account over some period of time.

  • scounts

    Computes the number of jobs completed by a user, group, or account.

  • sgetscr

    Retrieves the SLURM/SBATCH script and environment files for a job that is queued or running.

  • sjeff

    Determines the efficiency of one or more running jobs. Inefficient jobs are highlighted in red text (requires clush).

  • slimits

    Retrieves SLURM account limits (e.g. max number of jobs) for the specified user.

  • slist

    Retrieves SLURM accounting and node information for a running or completed job (requires clush).

  • slogs

    Retrieves resource usage and accounting information for a user or list of users. For each job that was run after the given start date, the following information is gathered from the SLURM accounting logs:

    • Number of CPUs
    • Start Time
    • Elapsed Time
    • Amount of RAM Requested
    • Average RAM Used
    • Max RAM Used

  • slurmbf

    Analogous to the PBS "showbf -S" command.

  • snacct

    Retrieves SLURM accounting information for a given node and for a given period of time.

  • snodes

    A customized version of sinfo. Displays node information in an easy-to-interpret format. Filters can be applied to view (1) specific nodes, (2) nodes in a specific partition, or (3) nodes in a specific state.

  • spinfo

    Shows partition information for one or more clusters.

  • sqelp

    A customized version of squeue that prints a double-quote (ditto mark) in place of any column value that repeats the value in the row above. Some users find this type of formatting easier to visually digest.

  • sqstat

    A customized version of squeue that produces output analogous to the PBS qstat and xqstat commands (requires clush).

  • sranks

    Lists the overall priorities and associated priority components of queued jobs in ascending order. Top-ranked jobs will be given priority by the scheduler, but lower-ranked jobs may get slotted in first if they fit into the scheduler's backfill window.

  • stimes

    Retrieves estimated starting times for queued jobs. All user-provided arguments are passed along to the squeue command.

  • suacct

    Retrieves SLURM accounting information for a given user's jobs for a given period of time.

  • sueff

    Determines the overall efficiency of the running jobs of one or more users. Users that are inefficient are highlighted in red text (requires clush).

  • yasqr

    Yet Another Squeue Replacement. Fixes squeue bugs in earlier versions of SLURM.
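
To illustrate the kind of directive rewriting that pbs2sbatch and pbs2slurm are described as doing, here is a minimal sketch. It is not the STUBL implementation: the function name pbs_to_sbatch is hypothetical, and only three commonly used directive pairs are covered (-N to --job-name, -q to --partition, and -l walltime= to --time). The real tools handle many more cases.

```shell
#!/bin/sh
# Illustrative only: NOT the pbs2sbatch implementation.
# pbs_to_sbatch is a hypothetical name. It rewrites three common PBS
# directives read from stdin and passes every other line through unchanged.
pbs_to_sbatch() {
    sed -E \
        -e 's/^#PBS[[:space:]]+-N[[:space:]]+(.+)$/#SBATCH --job-name=\1/' \
        -e 's/^#PBS[[:space:]]+-q[[:space:]]+(.+)$/#SBATCH --partition=\1/' \
        -e 's/^#PBS[[:space:]]+-l[[:space:]]+walltime=(.+)$/#SBATCH --time=\1/'
}

# Example: convert a tiny PBS header; the last line is left as-is.
printf '%s\n' '#PBS -N myjob' '#PBS -q debug' '#PBS -l walltime=01:00:00' 'echo hello' \
    | pbs_to_sbatch
```

Lines that do not match one of the three patterns are emitted untouched, so unsupported directives stay visible rather than being silently dropped.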

License

STUBL is released under the GNU General Public License ("GPL") Version 3.0. See the LICENSE file.

Contributors

aebruno, lsmatott, votreeshwaran, jbednasz, tonykew, pneerincx, dsajdak

stubl's Issues

sranks

The sranks command ranks the priority of jobs in the queue from highest to lowest. Its output has a section that lists the components of the total priority:

RANK USER JOBID PRIORITY COMPONENTS_OF_TOTAL_PRIORITY_
==== ======== ====== ======== AGE FSHARE JOBSIZ PARTITION QOS

The TRES priority (%T) is not listed as a component, but is included in the total priority.

sqstat logic inconsistent

When calculating what is currently in use, sqstat is incorrectly listing the number of running jobs and total number of jobs. However, the current core usage and current node usage numbers are correct. Example:

Partition   : Summary of current jobs
======================================================
part1       :    1 jobs (    0 running ,    1 queued )

Partition   : Summary of current core usage
===============================================================
part1      :   56 cores (   29 in use,   27 idle,    0 other )

Partition   : Summary of current node usage
===================================================
part1      :    1 nodes (    1 in use,    0 idle/down )

Slurm reports:

$ squeue -M faculty -p part1
CLUSTER: faculty
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          13959167     part1 Analysis      user1  PD       0:00      1 (Resources)
          13948810     part1 EMMA mod  user2  R 3-01:49:43      1 cpn-m24-13
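
For comparison, the job counts implied by the squeue listing above can be tallied with a few lines of awk. This is a minimal sketch and not sqstat's actual logic; the function name summarize_states is hypothetical, and it assumes the input is one compact state code per line, as squeue prints with the %t format field.

```shell
#!/bin/sh
# Illustrative only: NOT sqstat's actual logic.
# summarize_states (hypothetical name) reads one compact job state per line
# and prints the total, running (R) and queued (PD) counts.
summarize_states() {
    awk '
        NF == 0 || $1 == "CLUSTER:" { next }   # skip blank lines and cluster banners
        { total++ }
        $1 == "R"  { running++ }
        $1 == "PD" { queued++ }
        END { printf "%d jobs ( %d running , %d queued )\n", total, running, queued }
    '
}

# The two states from the squeue listing above (PD and R):
printf 'PD\nR\n' | summarize_states
# prints: 2 jobs ( 1 running , 1 queued )

# On a live system, something like:
#   squeue -h -M faculty -p part1 -o "%t" | summarize_states
```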

sueff truncates long usernames

Usernames longer than 10 characters get cut off by Slurm. The Slurm command output needs to be widened so that sueff sees the full username.

snodes not found, partition not found

May I ask why the partition is not detected by fisbatch?


[root@rocks7 stubl]# ./bin/fisbatch
./bin/fisbatch: line 98: snodes: command not found
./bin/fisbatch: line 99: snodes: command not found
There are no partitions named RUBY in the jupiter cluster!
[root@rocks7 stubl]# scontrol show partition RUBY
PartitionName=RUBY
   AllowGroups=ALL AllowAccounts=y8 AllowQos=ALL
   AllocNodes=compute-0-[0-6],rocks7 Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=compute-0-[5-6]
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=112 TotalNodes=2 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
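
The "command not found" lines suggest that fisbatch calls snodes by name and that the STUBL bin directory is not on PATH when the script is started as ./bin/fisbatch. That is a reading of the output above, not a confirmed diagnosis. A minimal sketch of a pre-flight check follows; require_cmd is a hypothetical helper that is not part of fisbatch, and STUBL_HOME is assumed to point at the install directory as in the INSTALL steps.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not part of fisbatch.
# require_cmd succeeds if the named command resolves on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# STUBL_HOME is assumed to point at the STUBL install directory.
if ! require_cmd snodes; then
    echo "snodes not found; try: export PATH=\$STUBL_HOME/bin:\$PATH" >&2
fi
```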


Bug in slogs

There's a bug in the parameter checking code of slogs.

That code checks whether the user passed in a date and a user list or just a user list. The issue is that when the user provides a date in a format such as 030119, the value gets passed to the id command. If a user with a UID of 30119 exists on the system, id succeeds, and the date is mistakenly taken as a user name and passed to sacct.

For example, suppose a user exists on the system:

$ id 030119
uid=30119(janedoe) gid=12345(janedoe)

This call to slogs will fail:

$ slogs 030119 testuser -X --accounts=testaccount
Retrieving accounting data for user 030119 ...
Invalid time specification (pos=0): testuser

The short term workaround is to just use a different date format like so:

$ slogs '2019-03-01' testuser -X --accounts=testaccount
Retrieving accounting data for user testuser ...
               JobID      User      NCPUS               Start    Elapsed     ReqMem     AveRSS     MaxRSS 
-------------------- --------- ---------- ------------------- ---------- ---------- ---------- ----------

But the code should be fixed to better test input parameters and be insensitive to date formats.
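
One possible shape for a stricter check is sketched below. It is not taken from slogs: looks_like_date and looks_like_user are hypothetical helpers, and the set of date shapes recognized (MMDDYY, MM/DD/YY, YYYY-MM-DD) is an assumption chosen for the example, not the full list that sacct accepts. The idea is to test for a date shape first, and to refuse purely numeric arguments as user names so that id can never match them as a UID.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from slogs.

# Succeeds if $1 has one of a few date shapes. The set of shapes is an
# assumption for this sketch: MMDDYY, MM/DD/YY and YYYY-MM-DD.
looks_like_date() {
    printf '%s\n' "$1" |
        grep -Eq '^([0-9]{6}|[0-9]{2}/[0-9]{2}/[0-9]{2}|[0-9]{4}-[0-9]{2}-[0-9]{2})$'
}

# Succeeds only if $1 is non-empty, not date-shaped, not purely numeric,
# and resolves to an account. The numeric guard keeps `id` from matching a UID.
looks_like_user() {
    [ -n "$1" ] || return 1
    looks_like_date "$1" && return 1
    printf '%s\n' "$1" | grep -Eq '^[0-9]+$' && return 1
    id -u -- "$1" >/dev/null 2>&1
}

# Classify the arguments from the failing call above.
for arg in 030119 2019-03-01 testuser; do
    if looks_like_date "$arg"; then
        echo "$arg: date"
    elif looks_like_user "$arg"; then
        echo "$arg: user"
    else
        echo "$arg: neither a known date shape nor an existing user"
    fi
done
```

With this ordering, 030119 is classified as a date even when a UID of 30119 exists, because id is never consulted for date-shaped or all-digit arguments.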
