
bih-cluster's Introduction

BIH HPC Cluster Documentation

You can find the built documentation here: https://bihealth.github.io/bih-cluster/

Building the Documentation Locally

Prerequisites

host:~$ sudo pip install pipenv  # maybe pip3 install or python -m pip install pipenv

Clone

host:~$ git clone xxx
host:~$ cd bih-cluster
host:bih-cluster$ pipenv install
host:bih-cluster$ pipenv shell
(bih-cluster) host:bih-cluster$ cd bih-cluster
(bih-cluster) host:bih-cluster$ mkdocs serve

bih-cluster's People

Contributors

akifoss, berntpopp, beuled, bjudkewitz, bobermayer, holtgrewe, imlew, january3, jlvahldiek, max-aignx, maximilianhoffmann, mbenary, messersc, mkuhring, nicolai-vkuegelgen, ninathiessen, ningfei, raikotto, roshanrane, samgijsen, sellth, serosko, stefanpeidli, stolpeo, tabeak, terrycojones, visze

bih-cluster's Issues

Conda init required after conda installation

Hello,

As a first-time BIH cluster user, I was following the tutorial on how to install conda (https://bihealth.github.io/bih-cluster/best-practice/software-installation-with-conda/). I set up a new environment and tried to activate it (conda activate <environment_name>), but got an error message:

CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
To initialize your shell, run

    $ conda init <SHELL_NAME>

Currently supported shells are:
  - bash
  - fish
  - tcsh
  - xonsh
  - zsh
  - powershell

See 'conda init --help' for more information and options.

IMPORTANT: You may need to close and restart your shell after running 'conda init'.

Indeed, running conda init and then closing and restarting the shell fixed this, and I can now activate the conda environment. However, perhaps this is something to add to the software installation section?
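
A minimal sketch of what such an added step could look like, assuming a bash login shell (the environment name is a placeholder):

$ conda init bash                     # writes conda's shell hook into ~/.bashrc
$ exec bash                           # or close and reopen the shell (or: source ~/.bashrc)
$ conda activate <environment_name>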

update nodes names in docs

In a lot of examples, nodes have old hostnames such as med201 or similar.

Those should be updated to reflect the current naming scheme.

Charité VPN Zusatzantrag B - recommended or required?

Currently, the docs say at one point that Zusatzantrag B is recommended, but in the troubleshooting section it is listed as a requirement.

I don't understand when Zusatzantrag B would not be necessary. Wouldn't it be better if everyone just got Zusatzantrag B for cluster access?

tagging @stolpeo

Getting BEAST2 to see the BEAGLE libraries and GPU

Hi. I would like to be able to run BEAST2 with the BEAGLE libraries on GPU nodes. Are you aware of anyone who has done this successfully?

BEAST (and BEAST2) aren't terribly helpful in explaining how to do this. I have the BEAGLE libraries installed (via conda). If I put the directory of the libraries into the environment variables BEAGLE_EXTRA_LIBS, LD_LIBRARY_PATH, and LIBRARY_PATH, it doesn't help. That's the approach I've used before (on a different Linux cluster).
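
A sketch of that approach (the conda environment name "beagle" and the input file name are placeholders; the variables simply point at the conda environment's lib directory):

$ conda activate beagle
$ export BEAGLE_EXTRA_LIBS=$CONDA_PREFIX/lib
$ export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
$ export LIBRARY_PATH=$CONDA_PREFIX/lib:$LIBRARY_PATH
$ beast -beagle -beagle_GPU -beagle_info input.xml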

The last time I solved this I had to compile BEAST2 myself with a particular version of gcc and also build BEAGLE with the same version. There were also some modules I loaded (but I think these just set things like LD_LIBRARY_PATH and LIBRARY_PATH).

You can run BEAST2 as beast -beagle -beagle_GPU -beagle_info (and maybe -beagle_order ...?), and if it manages to find the GPU, that info will appear at the end of the output. But when I try the above, all I see is the regular CPU.

I am in fact doing this on a GPU node (allocated by SLURM) :-)

Thanks for any help / thoughts. I've done this a couple of times in the past and it's always a hassle... but running BEAST2 on the GPU easily makes up for it. Maybe someone else in the BIH community is already doing this - would you know?

Terry

singularity has no command bash

In bih-cluster/bih-cluster/docs/how-to/software/singularity.md, singularity bash is often written, but singularity has no bash command. I think it should be shell, but I am not an expert.

singularity bash docker://godlovedc/lolcow
Usage:
  singularity [global options...] <command>

Available Commands:
  build       Build a Singularity image
  cache       Manage the local cache
  capability  Manage Linux capabilities for users and groups
  config      Manage various singularity configuration (root user only)
  delete      Deletes requested image from the library
  exec        Run a command within a container
  inspect     Show metadata for an image
  instance    Manage containers running as services
  key         Manage OpenPGP keys
  oci         Manage OCI containers
  plugin      Manage Singularity plugins
  pull        Pull an image from a URI
  push        Upload image to the provided URI
  remote      Manage singularity remote endpoints
  run         Run the user-defined default command within a container
  run-help    Show the user-defined help for an image
  search      Search a Container Library for images
  shell       Run a shell within a container
  sif         siftool is a program for Singularity Image Format (SIF) file manipulation
  sign        Attach a cryptographic signature to an image
  test        Run the user-defined tests within a container
  verify      Verify cryptographic signatures attached to an image
  version     Show the version for Singularity
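
Presumably the command the docs mean is shell, e.g. (a sketch using the image from the example above):

$ singularity shell docker://godlovedc/lolcow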

Document mamba

  • generally where software/conda is explained
  • in the tutorial
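
A minimal sketch of the commands such a section could show, assuming mamba is installed into the base environment from conda-forge (environment and package names are placeholders):

$ conda install -n base -c conda-forge mamba
$ mamba create -n myproject -c conda-forge -c bioconda snakemake
$ conda activate myproject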

iRODS access & downloading collections

Hello, and huge thanks for your amazing work!

I would like to access a specific iRODS collection and download the data for further usage.
However, all my attempts have failed so far, and I am not sure what the reason for the errors could be.
(I am also not very experienced with iRODS, so probably some mistakes are on my side...)

  1. First of all, I have recently been added to 2 projects on the SODAR system, and now I want to access the data.
    However, using icd apparently does not work. Every time I use it, I end up in my home directory again.
    Here's one example with the example project:
[kuechleo_c@hpc-transfer-1 2023-10-10]$ ils -l /sodarZone/projects/3d/3da44802-6151-4bc5-9f32-fa24d53896ad/sample_data/study_e1e52282-56e7-4b2d-ab4c-eb49da855c3a/assay_61dbfeb0-fa5f-49f6-a1cf-d35866b4d380/
/sodarZone/projects/3d/3da44802-6151-4bc5-9f32-fa24d53896ad/sample_data/study_e1e52282-56e7-4b2d-ab4c-eb49da855c3a/assay_61dbfeb0-fa5f-49f6-a1cf-d35866b4d380:

[kuechleo_c@hpc-transfer-1 2023-10-10]$ icd /sodarZone/projects/3d/3da44802-6151-4bc5-9f32-fa24d53896ad/sample_data/study_e1e52282-56e7-4b2d-ab4c-eb49da855c3a/assay_61dbfeb0-fa5f-49f6-a1cf-d35866b4d380/

[kuechleo_c@hpc-transfer-1 2023-10-10]$ ipwd
/sodarZone/home/kuechleo@CHARITE
  2. When I try to download a collection, I get an error saying that the file does not exist. However, the collection exists, as shown above.
[kuechleo_c@hpc-transfer-1 2023-10-10]$ ibun -fc test.tar /sodarZone/projects/3d/3da44802-6151-4bc5-9f32-fa24d53896ad/sample_data/study_e1e52282-56e7-4b2d-ab4c-eb49da855c3a/assay_61dbfeb0-fa5f-49f6-a1cf-d35866b4d380/

remote addresses: 172.16.96.169 ERROR: bunUtil: opr error for /sodarZone/projects/3d/3da44802-6151-4bc5-9f32-fa24d53896ad/sample_data/study_e1e52282-56e7-4b2d-ab4c-eb49da855c3a/assay_61dbfeb0-fa5f-49f6-a1cf-d35866b4d380, 
status = -520002 status = -520002 UNIX_FILE_MKDIR_ERR, No such file or directory
  3. When I try to download a bigger folder via iget, the process gets killed. (I guess one is only allowed to transfer TAR files directly when they get too big?)
[kuechleo_c@hpc-transfer-1 scratch]$ iget -Kr /sodarZone/projects/4c/4c6bee14-f5d3-4734-aba4-a7cfc13276c4/sample_data/study_2a9d8ca5-b4c5-465d-a9ec-25244d0c8889/assay_51f85bf7-dca5-4bad-889e-70e972f8fba8/
/usr/bin/iget: line 3: 4055561 Killed                  apptainer run /opt/irods/singularity-images/irods-icommands-4.2.11.sif iget $@
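
For reference, a sketch of the kind of download invocation in question, with an explicit local target directory (the scratch path is a placeholder and <collection> stands for the full collection path above):

$ mkdir -p /data/scratch/$USER/irods-download
$ iget -K -r -v <collection> /data/scratch/$USER/irods-download/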

Did I get something wrong with the setup of iRODS on the BIH cluster?
Or is there some other reason why I couldn't make it work so far?

Can I specify a set of cluster nodes?

Hi,
I am compiling C++ code against a heavily CPU-optimized (SIMD-vectorized) API. Therefore, my binary is not guaranteed to work on all cluster nodes due to varying CPU architectures. Is there a way for me to restrict slurm/snakemake/drmaa to only use a given set of cluster nodes?
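
For context, plain Slurm offers several ways to restrict where a job runs (a sketch; the node names, the avx2 feature tag, and job.sh are placeholders that depend on how the cluster is configured):

$ sbatch --nodelist=med0201,med0202 job.sh    # pin to an explicit list of nodes
$ sbatch --exclude=med0101 job.sh             # or exclude incompatible nodes
$ sbatch --constraint=avx2 job.sh             # or select by node feature, if features are defined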

Add how-to tunnel SFTP through the jail node

The easiest way to handle multi-hop SSH connections is to set up a config:

$ cat ~/.ssh/config
Host mdc-login
     User USER
     HostName ssh1.mdc-berlin.de
     IdentityFile ~/.ssh/id_rsa
     Port 22

Host bih_transfer
     User USER_M
     HostName transfer-1.research.hpc.bihealth.org
     IdentityFile   ~/.ssh/id_rsa
     ProxyCommand ssh mdc-login -W %h:%p
     Port 22

and then you can just do:
$ scp file bih_transfer:
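
The same config should cover SFTP through the jump host as well (an untested sketch; on newer OpenSSH versions, a ProxyJump mdc-login line can replace the ProxyCommand):

$ sftp bih_transfer
sftp> put local_file
sftp> get remote_file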

Properly document --export behaviour

  1. The default is ALL, also in slurm-drmaa.
  2. This deviates from the SGE behaviour in that srun will also keep your environment including $HOSTNAME (1).

... more?

We need to add a note to the tutorial, Slurm quickstart, and the SGE migration pages in the docs.
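
A sketch of the kind of example such a note could include (standard Slurm flags; the job script name is a placeholder):

$ sbatch --export=ALL job.sh              # default: the submitting shell's environment is copied into the job
$ sbatch --export=NONE job.sh             # start the job with a clean environment instead
$ sbatch --export=ALL,MYVAR=value job.sh  # keep the environment and add/override a variable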

Errors (or lack of explanation?) in tutorial section 4

Hi,
I (and others) think that there is something wrong with section 4 of the first-steps tutorial (https://bihealth.github.io/bih-cluster/first-steps/episode-4/):

The snakemake -j 2 parameter tells Snakemake to use 2 cores, but the drmaa -n 8 tells Slurm to give 8 cores to the run. Why would you do that if you tell Snakemake to only use 2 cores? The tutorial further states that the job will use a total of 40 GB. But according to the command line, mem-per-core is 1000 MB and n is set to 8, so Slurm would reserve 1000 MB * 8 (mem-per-core * n) = 8 GB, not 40 GB.
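
For comparison, a sketch of the arithmetic (assuming "mem-per-core" corresponds to Slurm's --mem-per-cpu; job.sh is a placeholder):

# memory reserved by Slurm = mem-per-cpu x number of cores
#   1000 MB x 8 = 8000 MB, i.e. about 8 GB (not 40 GB)
$ sbatch --ntasks=8 --mem-per-cpu=5000 job.sh   # 5000 MB x 8 = 40 GB would match the stated total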
