Comments (4)

mr0re1 commented on July 4, 2024

Hi @maxveliaminov, could you please share your blueprint (with any sensitive information removed)?
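One way to strip identifying values before posting a blueprint is sketched below. It is only an illustration: the helper name `redact_blueprint` is hypothetical, and the choice of keys to redact (`project_id`, `region`, `zone`) is an assumption about what counts as sensitive here. Review the output by hand before sharing, since other fields (bucket URLs, cluster names) may also need masking.

```shell
# Hypothetical helper: print a copy of a blueprint with selected values
# replaced by placeholders. The original file is not modified.
redact_blueprint() {
  sed -E \
    -e 's/^([[:space:]]*project_id:).*/\1 <PROJECT_ID>/' \
    -e 's/^([[:space:]]*region:).*/\1 <REGION>/' \
    -e 's/^([[:space:]]*zone:).*/\1 <ZONE>/' \
    "$1"
}
```

Usage, assuming the blueprint is saved as `blueprint.yaml`: `redact_blueprint blueprint.yaml > blueprint-redacted.yaml`.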

from hpc-toolkit.

maxveliaminov commented on July 4, 2024

Hi @mr0re1, here is one:

blueprint_name: palm-model

vars:
  project_id: <PROJECT_ID>
  deployment_name: <PROJECT_ID>
  region: <REGION>
  zone: <ZONE>
  machine_type: <MACHINE_TYPE>
  node_count_dynamic_max: <NODE_COUNT_DYNAMIC_MAX>
  slurm_cluster_name: palm1
  disable_public_ips: true
  enable_shielded_vm: true

deployment_groups:
  - group: primary
    modules:
      - id: network1
        source: modules/network/vpc
        kind: terraform
      - id: appsfs
        source: community/modules/file-system/nfs-server
        kind: terraform
        use:
          - network1
        settings:
          machine_type: n2-standard-2
          auto_delete_disk: true
          local_mounts: ['/apps']
      - id: spack
        source: community/modules/scripts/spack-install
        settings:
          install_dir: /apps/spack
          spack_url: https://github.com/spack/spack
          spack_ref: v0.19.1
          log_file: /apps/spack.log
          spack_cache_url:
            - mirror_name: <SPACK_CACHE_NAME>
              mirror_url: <SPACK_CACHE_URL>
          configs:
            - type: file
              scope: defaults
              content: |
                modules:
                  default:
                    tcl:
                      hash_length: 0
                      all:
                        conflict:
                          - '{name}'
                      projections:
                        all: '{name}/{version}-{compiler.name}-{compiler.version}'
          compilers:
            - [email protected]%[email protected] target=x86_64
          environments:
            - name: palm
              content: |
                spack:
                  definitions:
                  - compilers:
                    - [email protected]
                  - mpis:
                    - [email protected]
                  - python:
                    - [email protected]
                  - python_packages:
                    - [email protected]
                    - [email protected]
                    - [email protected]
                    - [email protected]
                  - packages:
                    - [email protected]
                    - [email protected]
                    - [email protected]
                    - [email protected]
                    - [email protected]
                  - mpi_packages:
                    - [email protected]
                    - [email protected]
                    - [email protected]
                    - [email protected]
                  specs:
                  - matrix:
                    - - $packages
                    - - $%compilers
                  - matrix:
                    - - $python
                    - - $%compilers
                  - matrix:
                    - - $python_packages
                    - - $%compilers
                    - - $^python
                  - matrix:
                    - - $mpis
                    - - $%compilers
                  - matrix:
                    - - $mpi_packages
                    - - $%compilers
                    - - $^mpis

      - id: spack_startup
        source: modules/scripts/startup-script
        kind: terraform
        use:
          - network1
        settings:
          runners:
            - $(appsfs.mount_runner)
            - $(spack.install_spack_deps_runner)
            - $(spack.install_spack_runner)
            - type: data
              destination: /apps/palm/palm-install.yaml
              content: |
                123
            - type: data
              destination: /apps/spack/activate-palm-env.sh
              content: |
                456
            - type: data
              destination: /apps/palm/palm-install.sh
              content: |
                789
            - type: shell
              content: sudo chmod -R 777 /apps
              destination: chmod-apps-dir.sh
            - type: shell
              content: 'shutdown -h now'
              destination: shutdown.sh

      - id: spack_builder
        source: modules/compute/vm-instance
        kind: terraform
        use:
          - network1
          - appsfs
          - spack_startup
        settings:
          name_prefix: spack-builder
      - id: homefs
        source: community/modules/file-system/nfs-server
        kind: terraform
        use:
          - network1
        settings:
          machine_type: n2-standard-2
          auto_delete_disk: true
          local_mounts: ['/home']
      - id: debug_node_group
        source: community/modules/compute/schedmd-slurm-gcp-v5-node-group
        use:
          - network1
          - homefs
          - appsfs
        settings:
          node_count_dynamic_max: <DEBUG_MAX_NODE_COUNT>

      - source: community/modules/compute/schedmd-slurm-gcp-v5-partition
        kind: terraform
        id: debug_partition
        use:
          - network1
          - homefs
          - appsfs
          - debug_node_group
        settings:
          is_default: true
          enable_shielded_vm: null
          machine_type: null
          node_count_dynamic_max: null
          partition_name: debug

      - id: compute_node_group
        source: community/modules/compute/schedmd-slurm-gcp-v5-node-group
        use:
          - network1
          - homefs
          - appsfs

      - source: community/modules/compute/schedmd-slurm-gcp-v5-partition
        kind: terraform
        id: compute_partition
        use:
          - network1
          - homefs
          - appsfs
          - compute_node_group
        settings:
          enable_shielded_vm: null
          machine_type: null
          node_count_dynamic_max: null
          partition_name: compute

      - source: community/modules/scheduler/schedmd-slurm-gcp-v5-controller
        kind: terraform
        id: slurm_controller
        use:
          - network1
          - debug_partition
          - compute_partition
          - homefs
          - appsfs
        settings:
          machine_type: n2-standard-8

      - source: community/modules/scheduler/schedmd-slurm-gcp-v5-login
        kind: terraform
        id: slurm_login
        use:
          - network1
          - homefs
          - appsfs
          - slurm_controller
        settings:
          machine_type: n2-standard-8
          disable_login_public_ips: true

mr0re1 commented on July 4, 2024

@maxveliaminov, we found the root cause and are working on a fix. Expect it to land in the develop branch early next week.

mr0re1 commented on July 4, 2024

@maxveliaminov, #1406 contains a fix for this problem. Could you please build ghpc from the develop branch and confirm whether it fixes your problem? Please re-open the issue if needed.
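Building ghpc from the develop branch could look roughly like the sketch below. The repository URL, the clone directory, and the helper names `run` and `build_ghpc_from_develop` are assumptions not stated in this thread; the build also assumes `git`, `make`, and a Go toolchain are installed. By default the function only prints the commands (dry run); set `DRY_RUN=0` to execute them.

```shell
# Assumed repository URL and clone directory; neither appears in this thread.
REPO_URL="${REPO_URL:-https://github.com/GoogleCloudPlatform/hpc-toolkit.git}"
CLONE_DIR="${CLONE_DIR:-hpc-toolkit}"

# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Hypothetical helper: clone the develop branch and build the ghpc binary.
build_ghpc_from_develop() {
  run git clone --branch develop "$REPO_URL" "$CLONE_DIR"
  run make -C "$CLONE_DIR"
  run "$CLONE_DIR/ghpc" --version
}
```

Running `DRY_RUN=0 build_ghpc_from_develop` performs the build. The blueprint can then be re-processed with the freshly built binary, for example `./hpc-toolkit/ghpc create <your-blueprint>.yaml`, to check whether the original problem still occurs.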
