Comments (4)

misdoro commented on June 26, 2024

@misdoro does this reproduce? Could you please share which interfaces are available after boot and which drivers are loaded?

For us it is 100% reproducible on an AMI we build internally, when launched on t3-class AWS instances.

The only non-lo network adapter is managed by the ena driver, and the kernel recognizes it a few seconds after cloud-init-local starts.

For the moment we've implemented a workaround that delays cloud-init-local until the network adapter is recognized,
but I'm wondering whether cloud-init could have a more official way to handle network adapters that appear late in the boot process.

The workaround in question:
/etc/systemd/system/cloud-init-local.service.d/10-wait-for-net-device.conf

# cloud-init-local must wait for at least one network interface device to exist
# before attempting to download EC2 instance metadata.
#
# These systemd unit directives implement this policy along with
# /etc/udev/rules.d/10-ec2imds.rules

[Unit]
Requires=dev-ec2imds.device
After=dev-ec2imds.device

/etc/udev/rules.d/10-ec2imds.rules

# cloud-init-local must wait for at least one network interface device to exist
# before attempting to download EC2 instance metadata.
#
# These udev rules implement this policy along with
# /etc/systemd/system/cloud-init-local.service.d/10-wait-for-net-device.conf

ACTION!="remove", SUBSYSTEM=="net", KERNEL!="lo", DRIVERS=="ena|vif", TAG+="systemd", ENV{SYSTEMD_ALIAS}+="/dev/ec2imds"

from cloud-init.

holmanb commented on June 26, 2024

@misdoro does this reproduce? Could you please share which interfaces are available after boot and which drivers are loaded?
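
One way to gather the interface and driver information asked for above is to read sysfs directly. This is a minimal sketch, not part of cloud-init: the helper name `list_interfaces_with_drivers` is hypothetical, and it assumes a Linux system where each entry in `/sys/class/net` may have a `device/driver` symlink pointing at the bound driver.

```python
import os


def list_interfaces_with_drivers(sysfs_net="/sys/class/net"):
    """Return {interface_name: driver_name or None} as seen in sysfs.

    Hypothetical helper for illustration only. Interfaces without a
    device/driver symlink (lo, bridges, other virtual devices) map to None.
    """
    result = {}
    for name in sorted(os.listdir(sysfs_net)):
        iface_dir = os.path.join(sysfs_net, name)
        if not os.path.isdir(iface_dir):
            # Skip plain files such as bonding_masters.
            continue
        driver_link = os.path.join(iface_dir, "device", "driver")
        if os.path.islink(driver_link):
            # The symlink target ends in the driver name, e.g. ".../drivers/ena".
            result[name] = os.path.basename(os.readlink(driver_link))
        else:
            result[name] = None
    return result
```

Calling `list_interfaces_with_drivers()` after boot and pasting the resulting mapping into the issue would answer both questions at once.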


holmanb commented on June 26, 2024

For us it is 100% reproducible on an AMI we build internally, when launched on t3-class AWS instances.

The only non-lo network adapter is managed by the ena driver, and the kernel recognizes it a few seconds after cloud-init-local starts.

Good to know, thank you. How would one reproduce this? How can you ensure that only an ena interface is available?

For the moment we've implemented a workaround that delays cloud-init-local until the network adapter is recognized,
but I'm wondering whether cloud-init could have a more official way to handle network adapters that appear late in the boot process.

Cloud-init should handle this better. Can you please share more of the log? The whole cloud-init.log would be best. If you feel the need to redact it, it would still be good to know what any line like the following says:

2024-05-14 14:50:46,817 - stages.py[DEBUG]: applying net config names for {'version': 1, 'config': [{'type': 'physical', 'name': 'enp5s0', 'subnets': [{'type': 'dhcp', 'control': 'auto'}]}]}
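
If sharing the whole log is not an option, lines like the one above can be pulled out with a few lines of Python. This is only a sketch: the function name `find_net_config_lines` is hypothetical, and the default path `/var/log/cloud-init.log` is an assumption about where the log lives on the image.

```python
def find_net_config_lines(log_path="/var/log/cloud-init.log"):
    """Return every log line that reports the applied network config.

    Hypothetical helper; it matches on the message text shown in the
    example line, "applying net config names for".
    """
    marker = "applying net config names for"
    with open(log_path, encoding="utf-8", errors="replace") as log:
        return [line.rstrip("\n") for line in log if marker in line]
```

Printing the result of `find_net_config_lines()` gives just the lines worth pasting, which makes redaction easier to review.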


holmanb commented on June 26, 2024

@misdoro Thanks again for reporting. If you can share any additional data about your image (more complete logs, reproducer), that would be extremely helpful.

For EC2, and probably other datasources as well, cloud-init-local.service needs to wait until at least one interface is available prior to proceeding into ephemeral network setup.

Current state

Cloud-init already does something similar, but with a different intent and outcome. When a network configuration is available, cloud-init currently polls for the configured interfaces and waits for them to exist. Once they are available, cloud-init renames the interfaces itself.

Problems

  1. Interface rename shouldn't actually be required in many cases (netplan, systemd, and friends are capable of doing rename). This logic predates current network backends.

  2. The Local service doesn't wait for physical devices to exist before attempting to bring up an ephemeral interface. This seems to work when the kernel drivers are loaded from the initramfs as modules or are built into the kernel.

Proposed fix

  1. short term: add a poll for a single interface[1][2]

  2. long term: only do interface rename in renderers which require it (possibly eni, ifconfig, sysconfig?). Initially we should retain current functionality for untested renderers and potentially add an opt-out flag to allow testing the different network backends for working rename support.

[1] not wanted for LXD, None, NoCloud, and any other datasources which do not require an interface to be available in Local stage
[2] udevadm settle causes unnecessary waiting. Polling at some frequency would probably be more appropriate.
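
The short-term idea (poll for a single interface at some frequency rather than calling udevadm settle) could look roughly like the sketch below. It is not cloud-init's implementation: the function name `wait_for_physical_interface`, the timeout and interval defaults, and the heuristic that a "physical" interface is any non-lo entry in `/sys/class/net` with a `device` link are all assumptions made for illustration.

```python
import os
import time


def wait_for_physical_interface(sysfs_net="/sys/class/net", timeout=30.0, interval=0.1):
    """Poll sysfs until one non-lo interface backed by a device appears.

    Hypothetical sketch. Returns the interface name, or None if nothing
    shows up before the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            names = os.listdir(sysfs_net)
        except FileNotFoundError:
            names = []
        for name in sorted(names):
            # Assumed heuristic: virtual devices such as lo have no "device" link.
            if name != "lo" and os.path.exists(os.path.join(sysfs_net, name, "device")):
                return name
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

Per footnote [1], a caller would only invoke something like this for datasources that need an interface in the Local stage, and would decide for itself what to do when the function returns None.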
