What happened? I have a playground cluster set up with QEMU/KVM. W


Time Sync problems after restore VM/Node from snapshot about cri-o HOT 14 CLOSED

Jeansen commented on September 28, 2024

Time Sync problems after restore VM/Node from snapshot

from cri-o.

Comments (14)

sohankunkerkar commented on September 28, 2024

I see the problem here. I think this behavior is expected because containers inherit their initial time from the host system when they are created. However, once running, they maintain their own internal clock, which may drift from the host's time over time. When you restore a snapshot, the host's time gets reset, but the container's internal clock doesn't automatically reset. One thing we can do here is to add a watcher when CRI-O starts a container and look for changes to the host system time, and adjust the container's time accordingly.
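
The watcher idea above could be sketched roughly as follows. This is not CRI-O code: it is a minimal illustration, assuming that a host time change shows up as the wall clock moving by more than the monotonic clock accounts for. The names `wall_clock_jump` and `watch`, the 5-second threshold, and the `max_polls` parameter are all hypothetical.

```python
import time


def wall_clock_jump(prev_wall, prev_mono, wall, mono):
    """Return how far the wall clock moved beyond the elapsed monotonic time."""
    return (wall - prev_wall) - (mono - prev_mono)


def watch(threshold=5.0, interval=1.0, on_jump=print, max_polls=None):
    """Poll both clocks and call on_jump(seconds) when the wall clock is stepped.

    threshold, interval and max_polls are illustrative values, not settings
    taken from CRI-O or any other runtime.
    """
    prev_wall, prev_mono = time.time(), time.monotonic()
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        wall, mono = time.time(), time.monotonic()
        jump = wall_clock_jump(prev_wall, prev_mono, wall, mono)
        if abs(jump) > threshold:
            on_jump(jump)
        prev_wall, prev_mono = wall, mono
        polls += 1
```

A caller would pass its own `on_jump` callback; what to do once a jump is detected is the open question discussed in the rest of this thread.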


Jeansen commented on September 28, 2024

I see. So containers create their own time namespace... Thanks for the list of references :-)


sohankunkerkar commented on September 28, 2024

@Jeansen Thanks for reporting this issue. I was wondering whether setting the timezone in the CRI-O config would help in this case.


Jeansen commented on September 28, 2024

@sohankunkerkar I don't think so. First, if I kill the Pods, the new ones pick up the right time. Second, as the documentation for the timezone setting says, it picks up the host's settings if not otherwise specified. To elaborate a bit more: I reset my VMs on the 12th, and a couple of days later cert-manager complained about invalid certificates with a time difference of 4 days! After I refreshed the deployment, all was well again. So, no timezone issue here.


kwilczynski commented on September 28, 2024

This is a known issue that has been affecting any container users who suspend or hibernate their machines, mainly desktop and workstation users, whether they use Linux, macOS, or Windows.

Most of our users run CRI-O on hosts that are almost never suspended or hibernated. I think Podman and Docker, in their desktop-targeting applications, solved this for their users; at least Docker did in the past.

A fix, or at least a workaround for the time being, can be applied as follows:

Ensure chrony or ntpd updates time on the virtual machine host
Ensure chrony or ntpd updates time within the virtual machine (that is, the container host)
Run hwclock -s inside the container, assuming that the binary is installed

You can turn this into a one-liner in the shell that iterates over running containers and runs crictl exec (with --sync) inside. It has to be run as a privileged user.
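
The loop described above might look roughly like this. It is a sketch written in Python rather than as a shell one-liner, so that per-container failures can be collected. It assumes `crictl` is on the PATH, is run as a privileged user, and that the image ships `hwclock`. The function name `sync_container_clocks` and its `run` and `crictl` parameters are hypothetical, added only so the loop can be exercised without a real runtime.

```python
import subprocess


def sync_container_clocks(run=subprocess.run, crictl="crictl"):
    """Run `hwclock -s` in every running container.

    Returns a dict mapping container ID to True (exit code 0) or False.
    """
    # `crictl ps -q` prints only the IDs of running containers.
    listing = run([crictl, "ps", "-q"], capture_output=True, text=True, check=True)
    results = {}
    for container_id in listing.stdout.split():
        # --sync runs the command synchronously, as suggested in the comment.
        proc = run(
            [crictl, "exec", "--sync", container_id, "hwclock", "-s"],
            capture_output=True,
            text=True,
        )
        results[container_id] = proc.returncode == 0
    return results
```

Containers without an `hwclock` binary simply show up as `False` in the result rather than aborting the loop.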

There are also smaller services to synchronise time, like the bespoke offering from linuxkit per:

host-timesync-daemon

There is also the systemd way and projects that use HTTP as time sources, etc.

Do let me know if hwclock -s worked for you.

In the long term, as @sohankunkerkar said, we would need to take care of these within CRI-O.

An interesting problem that has eluded us for a long time, indeed. 😄


Jeansen commented on September 28, 2024

Guys, thanks for the quick replies. Much appreciated! :-) Actually, I wasn't aware that only the initial time was set in a container. I thought it was taken from the host every time (since containers share the kernel...). But obviously this is not the case.
I agree, this is some sort of a "special" case for home-lab guys like myself. On the other hand, at work we do have all Nodes running in VMs, and those VMs have snapshots, too. The only difference is that the snapshots there do not keep the memory state, as mine do.
I also thought of simply draining each Node and rebooting it. Anyway, I'll try your tips, @kwilczynski. Should I close the issue then?


Jeansen commented on September 28, 2024

So, hwclock -s is not working for me, especially not for cert-manager Pods. I think I'll just iterate through the Nodes, drain them and reboot. But at some point in the future I'd definitely be happy to have some sort of a watch available in CRI-O.


kwilczynski commented on September 28, 2024

@Jeansen, not working as in the time hasn't been updated, or there was an error?


Jeansen commented on September 28, 2024

@kwilczynski Not working as in there is no shell in the Container (probably distroless) and no hwclock binary.


kwilczynski commented on September 28, 2024

@Jeansen, ah yes. Good point! These have nothing inside aside from the binary and possibly time zone data or certificates.

Also, I think this approach wouldn't work for CRI-O. Back to the drawing board...

But yes, the problem is the lack of the clock being updated once the container is started.


kwilczynski commented on September 28, 2024

Some bits for reference:


kwilczynski commented on September 28, 2024

@Jeansen, it's more a matter of CRI-O not using time namespaces as of yet, even though crun and runc added support some time ago. We have never seen this as needed or as a priority. That said, I am not sure if containerd also added support.


kwilczynski commented on September 28, 2024

@Jeansen, looking into this a little more in-depth.

Sadly, the new time namespace's limitations make this not really feasible. It probably would explain why there has yet to be a wider adoption of this new namespace.

Per the man page (time_namespaces(7)):

In a new time namespace that has had no member processes, the
clock offsets can be modified by writing newline-terminated
records of the same form to the timens_offsets file. The file
can be written to multiple times, but after the first process has
been created in or has entered the namespace, write(2)s on this
file fail with the error EACCES.

This means that updating the time-related offsets would not be possible with the processes running within a namespace.
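
To make the limitation concrete, here is a small sketch of what writing an offset looks like. It assumes the record format described in time_namespaces(7), `<clock-id> <offset-secs> <offset-nanosecs>`, where the clock is `monotonic` or `boottime` (the man page describes offsets only for these two clocks). The helper names `format_offset_record` and `try_set_offset` are hypothetical.

```python
import errno


def format_offset_record(clock, seconds, nanoseconds=0):
    """Build one newline-terminated timens_offsets record."""
    if clock not in ("monotonic", "boottime"):
        raise ValueError(f"no time namespace offset for clock: {clock}")
    return f"{clock} {seconds} {nanoseconds}\n"


def try_set_offset(pid, clock, seconds, nanoseconds=0):
    """Try to write an offset for the time namespace of `pid`.

    Returns True on success and False when the kernel rejects the write
    with EACCES, which the man page says happens once a process has been
    created in or has entered the namespace.
    """
    path = f"/proc/{pid}/timens_offsets"
    try:
        with open(path, "w") as offsets:
            offsets.write(format_offset_record(clock, seconds, nanoseconds))
    except OSError as exc:
        if exc.errno == errno.EACCES:
            return False
        raise
    return True
```

For a container that is already running, the namespace already has member processes, so by the man page's description a call like `try_set_offset` would be expected to return `False`.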

As such, there isn't a better option now than restarting Pods.


Jeansen commented on September 28, 2024

@kwilczynski Oh, too bad. Thank you very much for the details and your invested time. I'll simply have to be a bit more patient then, when I reset my cluster and do rolling restarts in addition ...
