Comments (15)

kcwitt commented on July 22, 2024

@ahjohannessen I don't know whether ZFS installed this way would survive a CoreOS update. I suspect it sometimes would (i.e. if the kernel is not updated), but if the kernel is updated it may not. I need ZFS support, so for now I have CoreOS automatic updates turned off.

I looked into how to get this script to handle CoreOS updates directly, but my conclusion (although I may be wrong) is that it creates too many "moving parts", and the whole point of CoreOS is that it doesn't have moving parts. I started down the path of trying to handle updates and quickly realized the logical extension of that would be using CoreOS as the core for yet another Linux distribution.

I recently had to update the script because the original version installed the CoreOS development overlay from binaries, but when I tried to install it on 1520 the binaries weren't available, so I had to update the script to install the CoreOS development overlay from source instead. This further convinced me that anything other than getting ZFS baked directly into CoreOS would not be reliable (since it would not be included in whatever pre-release testing is done).

My thinking right now is that the only viable path forward is to get CoreOS to bake this in.

There is a page at https://coreos.com/os/docs/latest/sdk-modifying-coreos.html about contributing to CoreOS, and I looked into it a little, but quickly felt that I was falling down another rabbit hole.

Having said that, if anyone has actually successfully submitted a patch to CoreOS and knows what is required, I am happy to work with them to try to get this baked in. I don't want to spend a lot of time on this only to find out that CoreOS won't accept the patch because, for instance, there is some validation procedure the upstream ZFS software must pass before being included. Also, ZFS ships with a set of unit tests (this script installs them so they can be run manually, but does not run them itself). I assume each CoreOS release goes through some sort of continuous integration testing before it is issued, and these unit tests should be part of that.
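
For anyone who wants to run those tests by hand, here is a minimal sketch, assuming the script installs the ZFS on Linux test runner at its usual path (/usr/share/zfs/zfs-tests.sh; adjust if corezfs puts it elsewhere):

# Confirm the modules are loaded first
lsmod | grep zfs

# Run the ZFS on Linux test suite (-v for verbose output). Note the
# runner historically refuses to run as root and needs ksh plus several
# GB of scratch space, so expect some setup first.
/usr/share/zfs/zfs-tests.sh -v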

kcwitt commented on July 22, 2024

I looked into trying to automate the update, but ultimately decided it was not a good use of time and that it would be better to lobby CoreOS to include ZFS directly.

My logic is that CoreOS is intentionally designed to be "locked-down", and externally adding ZFS completely breaks that and creates a non-standard installation. It is also possible that CoreOS makes an upstream change that breaks the script.

So, my strategy is to make ZFS available for CoreOS so people could try it in the hope that it would gain enough attention that CoreOS would bake it in (then updates won't be a problem at all), and to demonstrate that it shouldn't be hard for CoreOS to bake it in.

Having said that, if you have ideas for how to manage the updates I am keen to pursue them (but I still think the ultimate goal should be for CoreOS to "bake it in").

I am especially interested in ZFS in the wake of the recent ransomware outbreaks, because I think (not 100% sure) that with a copy-on-write system like ZFS that takes regular snapshots, it would be trivial to revert to the last good snapshot after an infection (AFAIK malware couldn't touch the snapshots unless it specifically targeted ZFS filesystems, since snapshots are read-only).
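
To illustrate the recovery path (tank/data is a hypothetical pool/dataset):

# Snapshot periodically, e.g. from a systemd timer or cron job
zfs snapshot tank/data@$(date +%Y%m%d-%H%M)

# List what is available to roll back to
zfs list -t snapshot -r tank/data

# After an infection, roll back to the last good snapshot
# (-r also destroys any snapshots taken after it)
zfs rollback -r tank/data@20170801-0300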

CoRfr commented on July 22, 2024

Hi,
I'm guilty of not looking too closely into the way this script works, but have you looked into https://github.com/coreos/torcx?

It's currently available on stable and works quite well; it is what provides Docker.
For instance, I used it to move Docker to 17.06 (https://github.com/CoRfr/torcx-docker-binaries).
I didn't have much success providing Xen on CoreOS (I had to build my own version, as it brings in quite a lot of dependencies such as Python or Perl...).
However, I wonder if it could be used for ZFS.

edude03 commented on July 22, 2024

Good thinking @CoRfr - it looks like doing torcx would be pretty straightforward: it's simply a mapping of what files you have and where you want them. I imagine you'd just add the JSON and the binaries via Ignition or cloud-init and you'd be off to the races. That said, the challenge seems to be actually building the files; it seems we'd need a build farm to produce a build for each CoreOS release.
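
For reference, enabling a custom torcx addon looked roughly like this at the time (the store/profile paths and the profile-manifest-v0 format are from the torcx README; the zfs image name and version are hypothetical):

# Drop the addon image where torcx looks for it
mkdir -p /var/lib/torcx/store /etc/torcx/profiles
cp zfs:1.0.torcx.tgz /var/lib/torcx/store/

# Declare a profile that references the image
cat > /etc/torcx/profiles/zfs.json <<'EOF'
{
  "kind": "profile-manifest-v0",
  "value": {
    "images": [
      { "name": "zfs", "reference": "1.0" }
    ]
  }
}
EOF

# Select the profile for the next boot
echo zfs > /etc/torcx/next-profile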

CoRfr commented on July 22, 2024

Oh right, I guess this script creates the package directly on the host, but that certainly adds some overhead on first boot or version change, and if you were to distribute it outside of your host it might be a headache.

I wonder how hard it would be to leverage free tools to create some delivery pipeline without paying a cent ...
I'm thinking about:

ahjohannessen commented on July 22, 2024

I think that built-in support for ZFS should be in Container Linux. Where is the best place to tell the CoreOS people about this? The forums or GitHub?

ahjohannessen commented on July 22, 2024

I posted my thoughts about ZFS support here: https://groups.google.com/forum/m/#!topic/coreos-user/ImR-LCkOOKI

Really great work @kcwitt 👍

How do you test that Container Linux updates work with ZFS installed? A virtual environment similar to production?

Also, how do you go about updating ZFS using this approach?

ahjohannessen commented on July 22, 2024

@kcwitt I understand your point of view on this. I think the first step toward getting ZFS baked into Container Linux is approval from someone who makes those calls. If that's a no-go, I think setting up a CI to build binaries for, at least, stable Container Linux is the next step. WDYT?

Turning auto-update off is a bit against the philosophy of Container Linux :) Some sort of hook into upgrades that ensures ZFS binaries compatible with each Container Linux release would be sensible.
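
A CI job along those lines could, in principle, rebuild the modules against each stable release inside the matching CoreOS developer container. A rough sketch (the download URL pattern follows the CoreOS developer-container docs; the version list is illustrative and build.sh is a placeholder for whatever this script's build step becomes):

#!/bin/bash -e
# Rebuild ZFS for each Container Linux release we care about
for VERSION in 1520.8.0 1520.9.0; do
  URL="https://stable.release.core-os.net/amd64-usr/${VERSION}/coreos_developer_container.bin.bz2"
  curl -fL "$URL" | bzip2 -d > "coreos_developer_container_${VERSION}.bin"
  # Run the corezfs build inside the matching developer container
  sudo systemd-nspawn --image="coreos_developer_container_${VERSION}.bin" \
    --bind="$PWD:/corezfs" /corezfs/build.sh
done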

ahjohannessen commented on July 22, 2024

All of you that wish for native ZFS support in Container Linux, please share your thoughts here https://groups.google.com/forum/m/#!topic/coreos-user/ImR-LCkOOKI

kcwitt commented on July 22, 2024

@ahjohannessen So the idea is that auto-updates would work like this (@matjohn2 actually proposed it a while ago):

  1. update-engine.service checks and confirms an update is available
  2. update-engine.service downloads the update to the passive partition
  3. update-engine.service triggers some sort of "update finished" hook
  4. this script builds zfs and copies it to the passive partition
  5. update-engine.service reboots server swapping active and passive partitions
  6. if the reboot is unsuccessful (due to zfs or anything else), revert to the original active partition

I suppose in theory this is possible. I don't know how hard it would be to find a proper hook, but the updates to the script would be almost trivial. It also has the added benefit that the zfs overlay wouldn't be needed (since I think we could write directly to the /usr dir in the passive partition).
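
Something like this might serve as the hook in steps 3-4, polling update_engine until the update has been written and then rebuilding ZFS into the passive partition (update_engine_client and its UPDATE_STATUS values are real on CoreOS; the passive-partition discovery and build step are hypothetical placeholders):

#!/bin/bash -e
# Step 3: wait for update_engine to finish writing the passive partition
until update_engine_client -status 2>/dev/null |
    grep -q UPDATE_STATUS_UPDATED_NEED_REBOOT; do
  sleep 60
done

# Step 4 (hypothetical): build zfs into the passive /usr partition;
# discovering which partition is currently passive is left to the script
PASSIVE_USR=/mnt/passive-usr                        # placeholder mount point
/opt/corezfs/build-zfs.sh --target "$PASSIVE_USR"   # placeholder build step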

It feels kludgy to have an overlay on initial install and no overlay on updates. It also feels like CoreOS is evolving so fast that this would take a lot of maintenance (note the changes I had to make between 14XX and 15XX).

I suppose the initial install could copy the active partition to the passive partition, install corezfs into the passive partition, and then reboot into it. But needing that reboot isn't so good for automatic provisioning.

The biggest problem is that we would only find out the script needed fixing after zfs had already broken, so we would need to be extra careful to cancel the update if zfs is not installed correctly (and what if it builds correctly but doesn't work after the reboot for some reason? We would need to force CoreOS to roll back to the previous installation, which, at least in the past, has not always been easy).

We could set up a CI pipeline as @CoRfr proposed, but at that point it starts to feel like we are turning CoreOS into a customizable Linux distro.

This is starting to sound disproportionately complicated.

One of the reasons I haven't tried to make a pull request myself to incorporate this into CoreOS is that I find the CoreOS documentation very confusing. Most of it doesn't make sense to me until after hours and hours of trial and error (although once I start to understand what they have in mind, I find it remarkably concise and useful as a reference). It would be great if someone could write the "CoreOS Missing Manual" to help us get to the point where the reference documentation makes sense.

But when I look at the customizing-CoreOS page I can't tell whether they are talking about contributing to CoreOS proper or building custom in-house images. I am half thinking about delving into building custom images so I can make CoreOS exactly what I want, but that seems like a lonely path, and I think the current state of computing is to work at a level that is generally applicable enough to build a community, yet focused enough to do one thing and do it well.

Only time will tell whether the CoreOS philosophy works out, but to me that philosophy is "it works as advertised on the tin (nothing more and nothing less), and the only bit of provisioning that should be required is an ignition file". It is the beauty of this simplicity that makes CoreOS such a good choice as a server host OS. Once we start bolting things on, the user community (and the related testing, feedback, etc.) drops from hundreds of thousands (or millions) of instances to hundreds.

I have no doubt that ZFS will be native to CoreOS at some point; and if we don't communicate with the CoreOS developers eventually our script will be overwriting their native implementation (including any customizations they make to better integrate it).

I did search earlier for a contact at CoreOS to ask how to contribute (including checking on IRC), but I couldn't find one. I also tried searching the online documents for how to contribute to CoreOS, but was confused. Of course it is possible that this information is clearly posted and I just missed it. I think CoreOS is completely open source and hosted on GitHub, but I can't even tell which of the hundreds of CoreOS projects on GitHub is the one that would need to be updated to add ZFS support.

I was really hoping that posting this script on GitHub would prove the concept and attract enough attention that somebody from the CoreOS team would get involved and help us get this solved cleanly once and for all. And if we could get it into the native CoreOS distribution, I am sure we would also be able to run the ZFS test suite as part of the CoreOS CI, which the script doesn't currently do; I didn't think that was worth the effort since [I hope] this is just a stop-gap solution.

I really hope someone from the CoreOS team will see this thread soon and point us to the location to add the 10 lines of ebuild code and how to make a pull request to get it added natively.

kcwitt commented on July 22, 2024

In the event that we ever do get the attention of a CoreOS developer, I also really want to push for them to make CoreOS (or at least a stable fork) completely self-sufficient.

For exactly the same reasons I believe ZFS needs to be baked into CoreOS, I also believe that everything CoreOS is advertised to do, it must do without any external dependencies or downloads. For instance, the last time I checked, it pulled an image from the internet on the first launch of etcd.

This is not good for disaster recovery. I am not talking about a local disaster, I am talking about bootstrapping servers when the internet is down, or some portion of it (for various reasons) becomes arbitrarily blocked either intentionally or as an unintended consequence of something else.

I don't think anyone is concerned with the size of the CoreOS image, but if they are, I suspect they are working on hardware (such as routers or IoT devices), not servers.

For those of us working on servers, I think the most important thing is confidence that we have a stable image we know will work, without some critical (but infrequently used) service failing because our incredibly tight firewall won't let us pull images from repositories we are unaware of (and may not necessarily even trust).

I know I am repeating myself (again), but I want to drive home the point that the beauty of CoreOS is the certainty that as long as I can get access to a CoreOS image and have an ignition file I can get my server up and running again (even if I only have partial internet access).

I know that I can spin up a vanilla CoreOS image, customize it, and upload the image back to my cloud provider, but I believe part of disaster recovery is not being tied to a specific cloud provider. Most cloud providers will have at least a relatively recent vanilla CoreOS image. To me, having to customize that image beyond an ignition file (or having it pull things from the internet on first use because it is not complete) moves CoreOS from the category of "rock solid reliable" to "something that probably works most of the time, but probably won't if anything unusual happens (which is exactly when it most needs to be rock solid reliable)".

Even the corezfs script works by downloading development containers and source code from external sources. None of us has any reliable way to know the robustness or security of those servers. The first several versions of the corezfs script didn't even check the gpg signing keys because I was just trying to get the thing to work.

But we (forgive me if I am using the term "we" too inclusively) trust CoreOS (for now) because it is open source, and we presume it has a large enough user base that, although nobody can guarantee it is secure, at least there are a lot of people sharing that concern and providing oversight.

If we get ZFS baked into CoreOS, the CoreOS user base will grow by the people who have been putting off checking it out because they need ZFS.

If we create a side-stream ZFS implementation, all of us trying to keep the wheels on it will be distracted, and we won't draw as much of the ZFS-using community into the wider CoreOS community.

ZFS is awesome in its own way, and by baking it into CoreOS I am confident a lot of people who have never considered it will check it out. The possibilities that snapshots and clones open up are limitless. And once you get a taste of using zfs send/receive to do remote backups of hundreds of GB of data, it becomes really hard to go back to tools such as rsync (hint: zfs does it without scanning the filesystem for changes on either the source or the target).
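
For anyone who hasn't seen it, an incremental remote backup looks roughly like this (tank/data and backup-host are hypothetical names):

# One-time full send to seed the backup host
zfs snapshot tank/data@monday
zfs send tank/data@monday | ssh backup-host zfs receive backup/data

# Thereafter send only the blocks that changed between two snapshots;
# neither side scans the filesystem
zfs snapshot tank/data@tuesday
zfs send -i tank/data@monday tank/data@tuesday | ssh backup-host zfs receive backup/data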

CoreOS is such a good candidate to become the de-facto host OS for virtually any server use case, but the trick will be including just the right balance of features; and I am convinced that finding the right balance requires community feedback (which is why I posted this project: to try to grow enough of a community that we could be heard).

I also implore CoreOS to stop trying to build Kubernetes into CoreOS and instead build it on top of CoreOS (exactly the opposite of my argument for zfs).

So now I am proposing three forks to CoreOS:

  1. minimal image that pulls things on demand for hardware developers (who will only pull once and then just copy to each device in the factory) to keep the image file small
  2. standard all inclusive image without Kubernetes (for when we don't need Kubernetes)
  3. standard all inclusive image with Kubernetes (for when we want Kubernetes)

But now with all these forks it is starting to get complex again. I am not even sure this is the right direction, but I would love to see a healthy public discussion about it.

What an exciting time to be involved in software.

@ahjohannessen I appreciate you setting up the Google group, but I am in a location where Google is blocked, which is why I am still posting here (and also why I am so focused on these reliability and disaster recovery issues).

ahjohannessen commented on July 22, 2024

@kcwitt One thing I learned is that when zfs-overlay is enabled and active, update_engine does not work and complains:

update_engine[1059]: unable to find match
update_engine[1059]: W1026 10:14:46.335265  1059 utils.cc:427] rootdev found a device name with no device node
update_engine[1059]: E1026 10:14:46.335278  1059 omaha_response_handler_action.cc:105] utils::StringHasPrefix(boot_dev, "/dev/") failed.
update_engine[1059]: E1026 10:14:46.335285  1059 omaha_response_handler_action.cc:68] GetInstallDev( install_plan_.old_partition_path, &install_plan_.partition_path) failed.
update_engine[1059]: I1026 10:14:46.335295  1059 action_processor.cc:68] ActionProcessor::ActionComplete: OmahaResponseHandlerAction action failed. Aborting processing.
update_engine[1059]: I1026 10:14:46.335304  1059 action_processor.cc:73] ActionProcessor::ActionComplete: finished last action of type OmahaResponseHandlerAction
update_engine[1059]: I1026 10:14:46.335314  1059 update_attempter.cc:290] Processing Done.
update_engine[1059]: I1026 10:14:46.335355  1059 update_attempter.cc:326] No update.

If I disable zfs-overlay, update_engine does what it is supposed to do. So I guess it has something to do with the /usr overlay.

kcwitt commented on July 22, 2024

@ahjohannessen This is purely a guess, but I think it is because the zfs overlay is mounted directly over the /usr directory (bearing in mind that the "updatable" part of CoreOS is 100% contained in /usr).

The command mount | grep /usr shows:

/dev/mapper/usr on /usr type ext4 (ro,relatime,seclabel,block_validity,delalloc,barrier,user_xattr,acl)
overlay on /usr type overlay (ro,relatime,lowerdir=/opt/corezfs/usr:/usr)

It seems that update-engine is upset that the overlay (which it does not consider a "device") is the highest layer mounted on /usr, instead of /dev/mapper/usr.

The only explanation I can think of for why this would matter is that update-engine looks at the existing /usr mount to see what version it is upgrading from, and gets confused when it sees /usr as something other than /dev/mapper/usr (since the topmost layer of /usr is an overlay).
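
Judging from that mount output, the overlay is presumably created with something like the first command below; if so, dropping the topmost /usr mount before checking for updates might be a crude manual workaround (a sketch, untested):

# How the corezfs overlay appears to be mounted (matches the output above)
mount -t overlay overlay -o lowerdir=/opt/corezfs/usr:/usr /usr

# Possible workaround: unmount the overlay so update_engine sees
# /dev/mapper/usr again, then trigger an update check
sudo umount /usr
update_engine_client -check_for_update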

On a separate note, I still can't figure out how to get the overlay to mount at the right time (i.e. I'm still trying to tweak the zfs-overlay.service file). The problem is that I want it mounted by the time local-fs.target is reached, but certain portions of local-fs.target must complete first so that the root fs is mounted before the overlay is applied. All of my data is stored on zfs, but the related services (such as nginx) are failing at boot because they start before zfs is fully set up. I am confused because I made zfs-overlay.service RequiredBy=local-fs.target and Before=local-fs.target, which I thought meant I could just put After=local-fs.target into all of the other unit files that need zfs, but in practice it is not working that way and the services fail at boot (because they start before zfs is set up). The zfs mounts are being set up, just too late.
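
For what it's worth, this is the shape of unit I would expect to work, written as a sketch (mount-overlay.sh is a hypothetical helper, and zfs-mount.service is the stock ZFS on Linux unit, which may differ from what corezfs installs):

# Sketch: run the overlay mount early, before local-fs.target completes
cat > /etc/systemd/system/zfs-overlay.service <<'EOF'
[Unit]
Description=Mount the corezfs /usr overlay early in boot
DefaultDependencies=no
# after the root fs is writable, but before local-fs.target is reached
After=systemd-remount-fs.service
Before=local-fs.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/opt/corezfs/mount-overlay.sh

[Install]
RequiredBy=local-fs.target
EOF
systemctl enable zfs-overlay.service

# Services that need zfs datasets would then order themselves after the
# units that actually mount them, not after local-fs.target, e.g. nginx:
#   [Unit]
#   Requires=zfs-mount.service
#   After=zfs-mount.service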

kcwitt commented on July 22, 2024

I have abandoned this project in favor of another project at https://github.com/varasys/paczfs, which sets up a CoreOS-style minimalist Arch Linux server image using zfs as the root filesystem. I think there are some interesting possibilities for overall cluster OS root/container filesystem management based on sending/receiving/verifying zfs differential root filesystem snapshots, plus some limited disaster recovery opportunities, such as snapshotting the rootfs on each successful boot (and then trying the snapshots in reverse order to recover). Also, Arch Linux is just a joy to use (once you start to understand it). I would love to hear your comments.
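
The snapshot-on-every-successful-boot idea is easy to sketch with a small unit (zroot/ROOT is a hypothetical root dataset name; note the %% escaping systemd requires for date format specifiers):

cat > /etc/systemd/system/boot-snapshot.service <<'EOF'
[Unit]
Description=Snapshot the root dataset on each successful boot
After=multi-user.target

[Service]
Type=oneshot
ExecStart=/bin/sh -c 'zfs snapshot "zroot/ROOT@boot-$(date +%%Y%%m%%d-%%H%%M%%S)"'

[Install]
WantedBy=multi-user.target
EOF
systemctl enable boot-snapshot.service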

Since I don't use CoreOS anymore, I am happy to turn this over to anyone who wants it (I have some unfinished, unposted code that is somewhat cleaner and more reliable than what is here).

johnmmcgee commented on July 22, 2024

Is this still working?
