
Comments (7)

nick-stroud commented on July 24, 2024

I have added retries that will hopefully prevent this failure in the future. I have also added an integration test for chrome-remote-desktop, which will help keep this installation robust to changes. I am going to consider this bug fixed. Please re-open if you feel the fix does not address the bug.

from hpc-toolkit.

proppy commented on July 24, 2024

This seems like a bug in the setup process, since installing the chromoting tool seems to be part of the startup scripts of the module:

- name: Download and configure CRD
  ansible.builtin.get_url:
    url: https://dl.google.com/linux/direct/chrome-remote-desktop_current_amd64.deb
    dest: /tmp/chrome-remote-desktop_current_amd64.deb
    mode: "0755"
- name: Install CRD
  ansible.builtin.apt:
    deb: /tmp/chrome-remote-desktop_current_amd64.deb
  environment:
    DEBIAN_FRONTEND: noninteractive


proppy commented on July 24, 2024

There seems to be a conflict over apt/dpkg locking in the startup script:

Mar 13 14:03:27 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: Mon Mar 13 14:03:27 +0000 2023 Info [1778]: === start executing runner: configure-grid-drivers.yml ===
Mar 13 14:03:27 radlab-remote-desktop-0 systemd[1]: Started Daemon for generating UUIDs.
Mar 13 14:03:28 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script:
Mar 13 14:03:28 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: PLAY [Ensure nvidia grid drivers and other binaries are installed] *************
Mar 13 14:03:28 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script:
Mar 13 14:03:28 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: TASK [Gathering Facts] *********************************************************
Mar 13 14:03:28 radlab-remote-desktop-0 dbus-daemon[635]: [system] Reloaded configuration
Mar 13 14:03:28 radlab-remote-desktop-0 dbus-daemon[635]: message repeated 4 times: [ [system] Reloaded configuration]
Mar 13 14:03:29 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: ok: [localhost]
Mar 13 14:03:29 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script:
Mar 13 14:03:29 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: TASK [Get kernel release] ******************************************************
Mar 13 14:03:29 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: ok: [localhost]
Mar 13 14:03:29 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script:
Mar 13 14:03:29 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: TASK [Install binaries for GRID drivers] ***************************************
Mar 13 14:03:31 radlab-remote-desktop-0 systemd[1]: Starting Update APT News...
Mar 13 14:03:31 radlab-remote-desktop-0 systemd[1]: Starting Update the local ESM caches...
Mar 13 14:03:31 radlab-remote-desktop-0 systemd[1]: apt-news.service: Deactivated successfully.
Mar 13 14:03:31 radlab-remote-desktop-0 systemd[1]: Finished Update APT News.
Mar 13 14:03:31 radlab-remote-desktop-0 systemd[1]: esm-cache.service: Deactivated successfully.
Mar 13 14:03:31 radlab-remote-desktop-0 systemd[1]: Finished Update the local ESM caches.
Mar 13 14:03:32 radlab-remote-desktop-0 dbus-daemon[635]: [system] Reloaded configuration
Mar 13 14:03:32 radlab-remote-desktop-0 dbus-daemon[635]: message repeated 2 times: [ [system] Reloaded configuration]
Mar 13 14:03:32 radlab-remote-desktop-0 dbus-daemon[635]: Unknown username "rtkit" in message bus configuration file
Mar 13 14:03:32 radlab-remote-desktop-0 dbus-daemon[635]: [system] Reloaded configuration
Mar 13 14:03:32 radlab-remote-desktop-0 dbus-daemon[635]: Unknown username "rtkit" in message bus configuration file
Mar 13 14:03:32 radlab-remote-desktop-0 dbus-daemon[635]: [system] Reloaded configuration
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: fatal: [localhost]: FAILED! => {"cache_update_time": 1678716211, "cache_updated": true, "changed": false, "msg": "'/usr/bin/apt-get -y -o \"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"       install 'gdebi-core' 'mesa-utils' 'gdm3'' failed: E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 3339 (apt-get)\nE: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?\n", "rc": 100, "stderr": "E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 3339 (apt-get)\nE: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?\n", "stderr_lines": ["E: Could not get lock /var/lib/dpkg/lock-frontend. It is held by process 3339 (apt-get)", "E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), is another process using it?"], "stdout": "", "stdout_lines": []}
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script:
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: PLAY RECAP *********************************************************************
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: localhost                  : ok=2    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script:
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: Mon Mar 13 14:03:33 +0000 2023 Info [1778]: === configure-grid-drivers.yml finished with exit_code=2 ===
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: Mon Mar 13 14:03:33 +0000 2023 Error [1778]: === execution of configure-grid-drivers.yml failed, exiting ===
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: Mon Mar 13 14:03:33 +0000 2023 Info [1576]: === passed_startup_script.sh finished with exit_code=2 ===
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script: Mon Mar 13 14:03:33 +0000 2023 Error [1576]: === execution of passed_startup_script.sh failed, exiting ===
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: startup-script exit status 2
Mar 13 14:03:33 radlab-remote-desktop-0 google_metadata_script_runner[1570]: Finished running startup scripts.


proppy commented on July 24, 2024

Maybe there is an option to have Ansible try to acquire the dpkg lock before running the recipe?
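
For reference, a minimal sketch of one such option: the `ansible.builtin.apt` module accepts a `lock_timeout` parameter (in ansible-core 2.12 and later) that makes the task wait for the apt lock instead of failing immediately. The package list is taken from the failing command in the log above; the timeout value is an assumption chosen for illustration, and whether this setting would have covered this exact failure is not established in this thread.

```yaml
# Sketch only: wait for the apt lock rather than failing right away.
# Requires ansible-core 2.12+; 300 seconds is an illustrative value.
- name: Install binaries for GRID drivers
  ansible.builtin.apt:
    name:
      - gdebi-core
      - mesa-utils
      - gdm3
    update_cache: true
    lock_timeout: 300
```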


proppy commented on July 24, 2024

/cc @nick-stroud


nick-stroud commented on July 24, 2024

I suspect this comes from a conflict with unattended-upgrades holding the lock. We have seen similar issues before with startup scripts on Debian-based images. Historically, our approach has been to add retries.
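
A minimal sketch of what task-level retries can look like in Ansible, using the standard `register` / `until` / `retries` / `delay` keywords. This is an illustration, not the change that was actually made: the task name and package list are taken from the log above, while the variable name `apt_result` and the retry count and delay are assumptions.

```yaml
# Sketch only: retry the install until apt succeeds, for example while
# another process such as unattended-upgrades still holds the dpkg lock.
- name: Install binaries for GRID drivers
  ansible.builtin.apt:
    name:
      - gdebi-core
      - mesa-utils
      - gdm3
    update_cache: true
  register: apt_result              # hypothetical variable name
  until: apt_result is success      # stop retrying once the task succeeds
  retries: 10                       # illustrative value
  delay: 30                         # seconds to wait between attempts
```

With these values the task would keep retrying for roughly five minutes before the playbook fails, which trades a longer startup for tolerance of a temporarily held lock.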


nick-stroud commented on July 24, 2024

Released in v1.16.0.

