Comments (6)
Can confirm that #1661 introduced a regression with accelerator-optimized machines (e.g. a2). We are looking into a fix.
from hpc-toolkit.
I think it may have been introduced in #1661
This looks like it's a problem beyond just #1661 - the regression has made it into the base image projects/schedmd-slurm-public/global/images/slurm-gcp-5-7-debian-11-1691089583 (latest for image family slurm-gcp-5-7-debian-11), so it appears even when deploying from an older version of ghpc (e.g. 1.19.1).
The previous image in the family, slurm-gcp-5-7-debian-11-1686755499, does not have this issue. It also has none of the fixes released with #1661.
Thank you for the base image update - that has definitely mitigated the issue for us.
Closing the issue, feel free to reopen though.