Comments (7)
Hi Federico,
As far as we know, this is a Filestore issue. I will investigate and get back to you shortly.
from hpc-toolkit.
Hi Federico,
I created a simple blueprint to try to reproduce the error, and in my test the instances seem to be created in parallel.
I also had the impression of seeing this sequential creation before. Could it have been circumstantial (depending on how many resources had already been created)? Could you try again with `-parallelism=20` on `terraform apply`?
FYI, this is the YAML I used for the test:
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
---
blueprint_name: filestore-test
vars:
  project_id: ## Set GCP Project ID Here ##
  deployment_name: filestore-test
  region: us-central1
  zone: us-central1-c
deployment_groups:
- group: primary
  modules:
  - id: network1
    source: modules/network/vpc
  - id: homefs
    source: modules/file-system/filestore
    use: [network1]
    settings:
      local_mount: /home
  - id: homefs2
    source: modules/file-system/filestore
    use: [network1]
    settings:
      local_mount: /home2
  - id: homefs3
    source: modules/file-system/filestore
    use: [network1]
    settings:
      local_mount: /home3
  - id: homefs4
    source: modules/file-system/filestore
    use: [network1]
    settings:
      local_mount: /home4
  - id: compute
    source: modules/compute/vm-instance
    kind: terraform
    use: [network1]
    settings:
      name_prefix: vm
      instance_count: 1
The Terraform output suggests the instances are being created in parallel.
What do you think?
Hey Carlos,
The output is definitely parallel, but on GCP the Filestore instances come up one at a time.
This test was just run on 1.4.1.
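One way to separate "parallel output" from "parallel creation" is to compare the elapsed times Terraform reports as each Filestore instance finishes. If every create starts at the same moment but the service handles them one at a time, the k-th instance should report roughly k times the duration of the first. The sketch below is illustrative, not part of the toolkit. It assumes Terraform's default apply lines such as `...: Creation complete after 4m9s [id=...]`, run with `-no-color` so ANSI codes do not interfere. The helper names and the tolerance are hypothetical, and the ratio test is only a heuristic.

```python
import re
from typing import Dict

# One finished create in Terraform's default (-no-color) apply output, e.g.
# "module.homefs.google_filestore_instance.filestore_instance: Creation complete after 4m9s [id=...]"
_COMPLETE = re.compile(r"^(?P<addr>\S+): Creation complete after (?P<dur>[0-9hms]+)")
_PART = re.compile(r"(\d+)([hms])")
_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> int:
    """Convert a Terraform duration such as '1h2m3s', '4m9s' or '45s' to seconds."""
    return sum(int(n) * _SECONDS[unit] for n, unit in _PART.findall(text))


def filestore_create_times(log: str) -> Dict[str, int]:
    """Map each Filestore resource address to the number of seconds its create took."""
    times: Dict[str, int] = {}
    for line in log.splitlines():
        match = _COMPLETE.match(line.strip())
        # Skip other resources (VPC, VMs, ...) and keep only Filestore instances.
        if match and "google_filestore_instance" in match.group("addr"):
            times[match.group("addr")] = parse_duration(match.group("dur"))
    return times


def looks_serialized(times: Dict[str, int], tolerance: float = 0.5) -> bool:
    """Heuristic: creates that start together but run one at a time finish
    after roughly 1x, 2x, 3x, ... the duration of the fastest one."""
    durations = sorted(times.values())
    if len(durations) < 2 or durations[0] == 0:
        return False
    base = durations[0]
    return all(abs(d / base - (k + 1)) <= tolerance for k, d in enumerate(durations))
```

For example, take four instances that each need about four minutes. Durations near 4, 8, 12 and 16 minutes would point to serialization. Four durations near 4 minutes would indicate genuinely parallel creation.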
We are still working to determine whether the observed behavior comes from the Terraform provider implementation or from the Google Cloud implementation.
I found that the root cause of this serialization is described in hashicorp/terraform-provider-google#9007.
I am following up with both the Terraform team and the internal Filestore team to see whether we can revert the implementation in the provider and fix the underlying issue. This will take time.
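The symptom discussed here is that all creates are issued together while only one instance is built at a time. The toy model below, which is not the provider's code, shows how that can happen. It assumes, as a simplification, that the serialization behaves like a single shared lock held for the whole duration of each create call. The linked issue describes the actual mechanism. `create_instances` and its parameters are hypothetical. `time.sleep` stands in for waiting on the long-running Filestore operation.

```python
import contextlib
import threading
import time
from typing import Optional


def create_instances(n: int, create_seconds: float,
                     shared_lock: Optional[threading.Lock]) -> int:
    """Start n simulated creates in parallel and return how many ran at once at peak.

    If shared_lock is given, each create holds it for its whole duration,
    modelling a provider-level lock that serializes the calls.
    """
    active = 0
    peak = 0
    counter_lock = threading.Lock()  # protects the active/peak counters only

    def create() -> None:
        nonlocal active, peak
        # With no shared lock, use a no-op context manager so creates overlap freely.
        guard = shared_lock if shared_lock is not None else contextlib.nullcontext()
        with guard:
            with counter_lock:
                active += 1
                peak = max(peak, active)
            time.sleep(create_seconds)  # stand-in for the long-running create operation
            with counter_lock:
                active -= 1

    # All threads are started together, like parallel "Creating..." lines in Terraform.
    threads = [threading.Thread(target=create) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With the shared lock, `create_instances(4, 0.05, threading.Lock())` reports a peak of 1, and total time grows linearly with the number of instances. Without the lock, the simulated creates typically all overlap.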
GoogleCloudPlatform/magic-modules#6672 should address the issue.
Release v4.41.0 of the Terraform provider for Google Cloud removes the constraint that forced Filestore instances to be created serially. Any new blueprint deployment should automatically pick up this release of the provider. You can upgrade an existing environment by running `terraform init -upgrade` in its existing Terraform folder.
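After running `terraform init -upgrade`, you can confirm which provider release was selected by reading the dependency lock file that Terraform writes, `.terraform.lock.hcl`. The sketch below assumes the standard lock-file layout. It uses a regular expression rather than a real HCL parser, so treat it as a quick check only. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches the hashicorp/google block that `terraform init` writes to
# .terraform.lock.hcl and captures its selected version, e.g.
#   provider "registry.terraform.io/hashicorp/google" {
#     version     = "4.41.0"
# The closing quote after "google" keeps the google-beta block from matching.
_GOOGLE_BLOCK = re.compile(
    r'provider\s+"registry\.terraform\.io/hashicorp/google"\s*\{[^}]*?version\s*=\s*"([^"]+)"'
)

# First provider release that, per this thread, no longer serializes Filestore creates.
MIN_VERSION = (4, 41, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.41.0' into (4, 41, 0), dropping any pre-release suffix like '-beta1'."""
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def locked_google_version(lock_text: str) -> Optional[str]:
    """Return the locked hashicorp/google version, or None if the block is absent."""
    match = _GOOGLE_BLOCK.search(lock_text)
    return match.group(1) if match else None


def has_filestore_fix(lock_text: str) -> bool:
    """True if the locked provider is v4.41.0 or newer."""
    version = locked_google_version(lock_text)
    return version is not None and parse_version(version) >= MIN_VERSION


def check_lock_file(path: str = ".terraform.lock.hcl") -> bool:
    """Convenience wrapper that reads the lock file from disk."""
    return has_filestore_fix(Path(path).read_text())
```

Run `check_lock_file()` from the deployment group's Terraform folder. It should return `True` if the upgrade picked up v4.41.0 or later.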