Comments (37)

egernst avatar egernst commented on August 20, 2024 5
  1. For networking, I would call out support for all* CNI and CNM plugins (* - some of these won't be feasible, but the key ones need to be compatible and regularly tested)
  2. For networking, I would call out support for vhost-user (ie: DPDK) and SRIOV support
  3. For networking (longer term?), we should support IPv6.
  4. For advanced I/O - generic device handling via vhost-user sockets (SCSI, block device, for example)
  5. For advanced I/O - generic device passthrough (for devices such as SRIOV, RDMA, QAT, graphics, etc.).

from runtime.

tallclair avatar tallclair commented on August 20, 2024 1

Can you clarify the section on CRI support? It sounds like it's not saying that the runtime should be a complete implementation of the CRI, but rather that an implementation should be possible using it? Is OCI not sufficient for that?

jodh-intel avatar jodh-intel commented on August 20, 2024 1

Thanks @sameo.

Regarding logging, ideally the runtime would use structured logging (as provided by https://godoc.org/github.com/sirupsen/logrus, for example), such that one of the log fields would specify "runtime" or "kata-runtime" to allow consumers of the system log to determine that an error was being generated by the runtime (as opposed to the shim/proxy/agent/hypervisor).

Also, every time an OCI command is called, it would be extremely useful if the runtime could also log its version and the commit it was built with. That shouldn't necessarily be required for every log call but since an OCI command delineates a block of log calls, that should be sufficient.

In summary: a quick sudo journalctl -a | grep level=error should be sufficient to establish:

a) if there were any errors.
b) if errors occurred, which component first detected them.
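
The logging idea above can be sketched roughly as follows. This is only an illustration, not the runtime's actual code: it uses Go's standard library log/slog instead of logrus so that it runs without extra dependencies, and the field name "source", the version and commit values, and the helper names are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"io"
	"log/slog"
	"os"
	"strings"
)

// Hypothetical build-time values; a real build would normally inject these.
var (
	version = "0.0.0-example"
	commit  = "unknown"
)

// newComponentLogger returns a logger whose every record carries a
// "source" field naming the component that emitted it.
func newComponentLogger(w io.Writer, source string) *slog.Logger {
	opts := &slog.HandlerOptions{
		// Lowercase the level so records match `grep level=error`.
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.LevelKey {
				a.Value = slog.StringValue(strings.ToLower(a.Value.String()))
			}
			return a
		},
	}
	return slog.New(slog.NewTextHandler(w, opts)).With("source", source)
}

// logCommand logs the start of one OCI command together with the runtime
// version and commit, optionally logs a failure, and returns the log text.
func logCommand(command, failure string) string {
	var buf bytes.Buffer
	log := newComponentLogger(&buf, "runtime")
	log.Info("command start", "command", command, "version", version, "commit", commit)
	if failure != "" {
		log.Error(failure, "command", command)
	}
	return buf.String()
}

// firstErrorSource scans key=value log lines and returns the "source" of
// the first record logged at error level, or "" if there is none.
func firstErrorSource(logs string) string {
	for _, line := range strings.Split(logs, "\n") {
		if !strings.Contains(line, "level=error") {
			continue
		}
		for _, field := range strings.Fields(line) {
			if v, ok := strings.CutPrefix(field, "source="); ok {
				return v
			}
		}
	}
	return ""
}

func main() {
	logs := logCommand("create", "failed to start hypervisor")
	os.Stdout.WriteString(logs)
	os.Stdout.WriteString("first error came from: " + firstErrorSource(logs) + "\n")
}
```

Because the version and commit are logged once per command rather than on every record, each block of log lines can still be tied to a specific build, which is the trade-off suggested above.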

sameo avatar sameo commented on August 20, 2024 1

@gnawux It seems that dockershim is indeed moving out, but afaik CRI is now mandatory for kubelet so the legacy Docker runtime is no longer supported.
So I'm fine with supporting dockershim, but not Docker itself.

gnawux avatar gnawux commented on August 20, 2024

I added @tallclair @WeiZhang555 @sameo in the doc with edit permission, and you can invite others.

sameo avatar sameo commented on August 20, 2024

@gnawux Maybe it's a good idea to start discussing those requirements here instead?

Here is my proposal:

[DRAFT v5] [See changelog at the bottom of this page]

Mandatory requirements

The Kata Containers runtime MUST fulfill all requirements below:

OCI compatibility

The Kata Containers runtime MUST implement the OCI runtime specification and support all the OCI runtime operations.

runc CLI compatibility

In theory, being OCI compatible should be enough. In practice, the Kata Containers runtime should comply with the latest stable runc CLI. In particular, it MUST implement the following runc commands:

  • create
  • delete
  • exec
  • kill
  • list
  • pause
  • ps
  • state
  • start
  • version

and the following command line options:

  • --console-socket
  • --pid-file
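
As a rough illustration of how the two lists above could be checked mechanically, here is a small sketch. The lists are copied from the draft; the function name and the "implemented" sets are made up for the example and do not describe any existing runtime.

```go
package main

import "fmt"

// mandatoryCommands mirrors the command list in the draft above.
var mandatoryCommands = []string{
	"create", "delete", "exec", "kill", "list",
	"pause", "ps", "state", "start", "version",
}

// mandatoryOptions mirrors the command line options in the draft above.
var mandatoryOptions = []string{"--console-socket", "--pid-file"}

// missing returns every required name that is absent from implemented,
// in the order the requirements list them.
func missing(required, implemented []string) []string {
	have := make(map[string]bool, len(implemented))
	for _, name := range implemented {
		have[name] = true
	}
	var out []string
	for _, name := range required {
		if !have[name] {
			out = append(out, name)
		}
	}
	return out
}

func main() {
	// Hypothetical runtime that lacks the state command.
	implemented := []string{"create", "delete", "exec", "kill", "list", "pause", "ps", "start", "version"}
	fmt.Println("missing commands:", missing(mandatoryCommands, implemented))
	fmt.Println("missing options:", missing(mandatoryOptions, []string{"--pid-file"}))
}
```

A check like this could run as part of the functional tests, so that a command dropped from the CLI is caught before a release.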

CRI and Kubernetes support

The Kata Containers project MUST provide 2 interfaces for CRI shims to be able to manage hardware virtualization based Kubernetes pods and containers:

  • An OCI and runc compatible command line interface, as described in the previous section. This interface is used by e.g. the CRI-O and cri-containerd CRI implementations.
  • A hardware virtualization runtime library API for CRI shims to consume and provide a more CRI native implementation. The frakti CRI shim is one example of such a consumer.

Multiple hardware architectures support

The Kata Containers runtime MUST NOT be architecture specific. It should be able to support multiple hardware architectures and provide a pluggable and flexible design for adding support for additional ones.

Multiple hypervisor support

The Kata Containers runtime MUST NOT be tied to any specific hardware virtualization technology, hypervisor or virtual machine monitor implementation.
It should support multiple hypervisors and provide a pluggable and flexible design for adding support for additional ones.
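
One common way to get the "pluggable" property asked for here (and in the architecture requirement before it) is a small interface plus a registry of implementations. The sketch below is only that, a sketch: the Hypervisor interface, its two methods, and the mock implementation are hypothetical and are not taken from any existing Kata, cc-runtime or runV code.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypervisor is a hypothetical, deliberately tiny contract. A real one
// would need many more operations (stop, hotplug, and so on).
type Hypervisor interface {
	Name() string
	StartVM(id string) error
}

// registry maps a hypervisor name to a constructor.
var registry = map[string]func() Hypervisor{}

// Register makes a hypervisor implementation selectable by name.
func Register(name string, ctor func() Hypervisor) { registry[name] = ctor }

// Registered lists the known hypervisor names in sorted order.
func Registered() []string {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// New builds the hypervisor selected by name, e.g. from a config file.
func New(name string) (Hypervisor, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown hypervisor %q (registered: %v)", name, Registered())
	}
	return ctor(), nil
}

// mockHypervisor is a stand-in used only for this example.
type mockHypervisor struct{ started []string }

func (m *mockHypervisor) Name() string { return "mock" }

func (m *mockHypervisor) StartVM(id string) error {
	m.started = append(m.started, id)
	return nil
}

func init() {
	Register("mock", func() Hypervisor { return &mockHypervisor{} })
}

func main() {
	h, err := New("mock")
	if err != nil {
		panic(err)
	}
	if err := h.StartVM("sandbox-1"); err != nil {
		panic(err)
	}
	fmt.Println("started sandbox-1 with", h.Name())
}
```

With this shape, adding a hypervisor means adding one file that calls Register, and the rest of the runtime only depends on the interface.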

Nesting

The Kata Containers runtime MUST support nested virtualization environments.

Networking

  • The Kata Containers runtime MUST be able to support any CNI plugin.
  • The Kata Containers runtime MUST be able to support both legacy and IPv6 networks.

I/O

Devices direct assignment

In order for containers to directly consume host hardware resources, the Kata Containers runtime MUST provide containers with secure passthrough to generic devices such as GPUs, SRIOV, RDMA and QAT, by leveraging I/O virtualization technologies (IOMMU, interrupt remapping, etc.).

Acceleration

The Kata Containers runtime MUST support accelerated and user space based I/O operations for networking (e.g. DPDK) and storage through vhost-user sockets.

Scalability

The Kata Containers runtime MUST support scalable I/O through the SRIOV technology.

Virtualization overhead reduction

One of the compelling aspects of containers is their minimal overhead compared to bare metal applications. A container runtime should strive to keep that overhead to a minimum in order to provide the expected user experience.
As a consequence the Kata Containers runtime implementation should be optimized for:

  • Minimal workload boot and shutdown times
  • Minimal workload memory footprint
  • Maximal networking throughput
  • Minimal networking latency

Testing and debugging

Continuous Integration

Each Kata Containers runtime pull request MUST pass a set of container related tests:

  • Unit tests: (runtime unit tests coverage ?)
  • Functional tests: The entire runtime CLI and APIs
  • Integration tests: Docker and Kubernetes.

Debugging

The Kata Containers runtime implementation MUST use structured logging in order to namespace log messages and facilitate debugging.

Optional Requirements

TBD

ChangeLog

v1 -> v2

  • Changed the Runtime API section wording as suggested by @bergwolf
  • Split requirements into 2 sections: mandatory and optional
  • Made the CRI support section a little clearer

v2 -> v3

  • Merged the Runtime API section into the CRI support one

v3 -> v4

  • Added the omitted state runc command
  • Added runc cli options

v4 -> v5

  • Added a networking section: CNI + IPv6
  • Added an I/O section: Acceleration and scalability
  • Renamed the Performance optimization section to Virtualization overhead reduction
  • Added a logging subsection

sameo avatar sameo commented on August 20, 2024

cc @jessfraz @sboeuf @jodh-intel @egernst @devimc @amshinde @mcastelino @grahamwhaley

gnawux avatar gnawux commented on August 20, 2024

@sameo you could just edit the Google doc; it is not easy to collaborate on writing a document in an issue.

sameo avatar sameo commented on August 20, 2024

@gnawux Your suggestion about opening a PR for adding this as a document would be good. Folks can then comment on it and it can be rebased to take input into account.

gnawux avatar gnawux commented on August 20, 2024

@sameo I feel it is a bit early to open a PR; others may add content directly to the Google doc.

However, if you think it is the time, I could file a PR and merge both our drafts into it.

sameo avatar sameo commented on August 20, 2024

@gnawux Since this is going to be part of our documentation, using a google doc for this sounds like an avoidable additional step. If this document would for sure not be merged to our documentation then a google doc would make sense, but here a PR makes more sense imho.

gnawux avatar gnawux commented on August 20, 2024

@sameo Though I think it is better to make the doc look ready before converting it to a PR, it's not a big problem. I will submit the PR first for discussion.

gnawux avatar gnawux commented on August 20, 2024

@sameo and all

Do you guys think the requirements doc should be put into the docs directory of this repo, or into the documentation repo? If the latter, do we have a directory hierarchy for that repo yet?

jessfraz avatar jessfraz commented on August 20, 2024

I kinda think the last part of that doc is very accurate, can we remove as many shim layers as possible and try to keep it simple :)

egernst avatar egernst commented on August 20, 2024

I think the document is trending towards describing architecture rather than defining requirements (ie, OCI compliance, provide an API/library suitable for a CRI like frakti, device hotplug, etc.). If/when I get permissions for editing I can help make some edits/add clarity.

gnawux avatar gnawux commented on August 20, 2024

@jessfraz yes, this is what we do in production, and we should keep the scenario working in kata.

@egernst 👍 permission granted.

sameo avatar sameo commented on August 20, 2024

@tallclair Yes, that's what I was trying to say indeed. OCI is enough for CRI implementations like CRI-O or cri-containerd, but e.g. Frakti does not rely on the OCI interface but rather on the runV API. The current dockershim also does not rely on OCI.
So for implementing a CRI server, OCI may be sufficient but it's not necessary.

bergwolf avatar bergwolf commented on August 20, 2024

@sameo @tallclair while OCI can be used to implement CRI support, its runtime spec may not be efficient enough for Kata Containers due to the lack of a proper storage description. When Kata Containers supports cri-containerd/CRI-O via OCI (aka the runc CLI interface), it relies on 9pfs (which itself is slow and problematic; we even have to hack the 9pfs kernel module to reach POSIX compliance) to map local storage to the guest, and there is no description for remote storage in the spec.

OTOH, the runV library API is more native for VM-based containers and favors CRI by design. In fact it was designed together with CRI. In that sense, runV is compatible with the OCI spec and provides extended APIs to better suit the needs of CRI and VM-based containers. With the runV API, frakti is able to use both local block storage and remote storage more efficiently.

So to amend the last paragraph of @sameo 's requirements list, I would suggest we change the requirement of Runtime API from

Runtime API
Some CRI implementations (e.g. Frakti) may rely on the runtime API instead of the CLI. The Kata Containers should provide a runtime API definition and a runtime library to support those cases.

To
"

Runtime Library API

While CRI-O and cri-containerd rely on runc compatible CLI, some CRI implementations like frakti rely on the runtime library API instead. The Kata Containers MUST provide a runtime library API favoring CRI design to support them.
"

And it should be moved up right after the CRI support section, instead of being put at the end of the list, which might give the impression that it is a minor requirement that can be dismissed.

resouer avatar resouer commented on August 20, 2024

@tallclair since we mentioned CRI, I would like to present detailed research on potential integration options for Kata and CRI shims at the next sig-node meeting.

Let's put all these problems on the table and see how they can be fixed.

sameo avatar sameo commented on August 20, 2024

@bergwolf

When Kata Containers supports cri-containerd/CRI-O via OCI (aka the runc CLI interface), it relies on 9pfs (which itself is slow and problematic; we even have to hack the 9pfs kernel module to reach POSIX compliance) to map local storage to the guest.

That is not entirely correct I'm afraid. With cc-runtime we do hotplug block based devices as local storage when the container overlay filesystem allows for it. So 9pfs is a fallback, not the default.

there is no description for remote storage in the spec.

Yes, there is no such description in the OCI spec. But out of curiosity, is that specified in the current CRI spec? I don't see it but I may very well be missing something.

So to amend the last paragraph of @sameo 's requirements list, I would suggest we change the requirement of Runtime API from

Done, thanks for the input. I did not change the order because I think all those requirements are mandatory. Instead I created an Optional requirements section to explicitly state which ones are mandatory, and the runtime API is part of the mandatory ones.

bergwolf avatar bergwolf commented on August 20, 2024

@sameo, thanks for updating the list! I still think we can put all CRI related requirements together instead of scattering them all over the doc. But that can be done in a future update.

That is not entirely correct I'm afraid. With cc-runtime we do hotplug block based devices as local storage when the container overlay filesystem allows for it. So 9pfs is a fallback, not the default.

Well, it's true that you can hotplug block devices if they are specified in the Spec.Linux.Devices section of the OCI spec. That is true for any pluggable device. And what you present to the container process is the device itself rather than any file system directory.

The problem is that with an OCI spec, the rootfs and volumes are specified in Spec.Root and Spec.Mounts. So you do not know whether the rootfs and volumes are block based devices or not, let alone which device you need to hotplug to the guest.

But out of curiosity, is that specified in the current CRI spec? I don't see it but I may very well be missing something.

That is not defined by the CRI spec but can be supported via flexvolume. And there is ongoing change to support it in the CSI spec.

sameo avatar sameo commented on August 20, 2024

@bergwolf

I still think we can put all CRI related requirements together instead of scattering them all over the doc.

Yes, that makes sense. I've merged the runtime api into the CRI support section. Please let me know how it looks now.

Well, it's true that you can hotplug block devices if they are specified in the Spec.Linux.Devices section of the OCI spec. That is true for any pluggable device. And what you present to the container process is the device itself rather than any file system directory.

The problem is that with an OCI spec, the rootfs and volumes are specified in Spec.Root and Spec.Mounts. So you do not know whether the rootfs and volumes are block based devices or not, let alone which device you need to hotplug to the guest.

Sorry for the confusion, I should not have mentioned the hotplugging side of things here. We do VMM hotplug, but only for efficiency reasons. But cc-runtime and virtcontainers dynamically detect whether a rootfs is a block based device or not, and present it as a disk inside the VM or as a 9pfs mount point respectively. As you pointed out, the performance and POSIX compatibility are significantly different between the two. So my point was that you can do proper block based I/O with or without help from the current OCI spec.
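
To make the "dynamically detect" step a little more concrete, here is one way such a check could look. It is a simplified sketch and not the cc-runtime or virtcontainers implementation: it parses text in the /proc/self/mountinfo format and treats a mount source under /dev/ as block backed, whereas a real implementation would inspect the device itself. The sample data and function name are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// sampleMountinfo is made-up data in /proc/self/mountinfo format:
// ID, parent ID, major:minor, root, mount point, options, optional
// fields, a lone "-", then filesystem type, mount source, super options.
const sampleMountinfo = `36 25 253:3 / /run/c1/rootfs rw,relatime shared:1 - ext4 /dev/mapper/thin-c1 rw
37 25 0:45 / /run/c2/rootfs rw,relatime - overlay overlay rw,lowerdir=/l,upperdir=/u,workdir=/w`

// rootfsBacking finds the mount whose mount point is rootfs and reports
// its filesystem type and source, plus whether the source looks like a
// block device node (assumption: anything under /dev/).
func rootfsBacking(mountinfo, rootfs string) (fstype, source string, block bool) {
	for _, line := range strings.Split(mountinfo, "\n") {
		// The optional fields end at " - "; fstype and source follow it.
		pre, post, found := strings.Cut(line, " - ")
		if !found {
			continue
		}
		left, right := strings.Fields(pre), strings.Fields(post)
		if len(left) < 5 || len(right) < 2 || left[4] != rootfs {
			continue
		}
		return right[0], right[1], strings.HasPrefix(right[1], "/dev/")
	}
	return "", "", false
}

func main() {
	for _, rootfs := range []string{"/run/c1/rootfs", "/run/c2/rootfs"} {
		fstype, source, block := rootfsBacking(sampleMountinfo, rootfs)
		if block {
			fmt.Printf("%s: %s on %s, pass the device to the VM as a disk\n", rootfs, fstype, source)
		} else {
			fmt.Printf("%s: %s, fall back to a 9pfs share\n", rootfs, fstype)
		}
	}
}
```

The point of the sketch is that the decision is made from the host's view of the mount, without any extra information in the OCI spec.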

sameo avatar sameo commented on August 20, 2024

@egernst @tallclair I've updated the CRI support section. Hopefully it reads better now.

WeiZhang555 avatar WeiZhang555 commented on August 20, 2024

I read this whole thread and I think there are some requirements none of you mentioned.

  1. The first big part is "docker API" support. Although the "docker API" also goes through the "runc compatible CLI" API, there are still some tricks needed to make "kata-runtime" work from the docker command line. One example is "how to compose a POD": CRI uses specific labels to classify a POD or a container, but docker does not. And "K8S->docker->kata" is still an important scenario, though we already have CRI-O as a replacement.

  2. K8S Ecosystem support:
    2.1 CNI network support
    2.2 Monitoring: how to use cAdvisor to monitor kata container resource usage.
    2.3 Logs: K8S monitors container logs. As far as I know, it uses a volume as the log transfer channel; if so, kata-runtime should support this natively, but I can't be sure.

These are also important for working with K8S, and I know both cc and runv have experience with these. It would be good to add some illustrations for them.
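
To illustrate the "how to compose a POD" point, the sketch below classifies a container request by the hints a CRI shim attaches to it. Treat the details as assumptions to verify: the two keys are the annotation names that CRI-O and the containerd CRI plugin are understood to set on the OCI spec, and the function and return values are invented for the example.

```go
package main

import "fmt"

// containerTypeKeys lists annotation keys assumed to be set by CRI shims
// (CRI-O and the containerd CRI plugin). Verify against the shim in use.
var containerTypeKeys = []string{
	"io.kubernetes.cri-o.ContainerType",
	"io.kubernetes.cri.container-type",
}

// role classifies a request as "sandbox" (create a new pod VM),
// "container" (join an existing pod), or "standalone" when no CRI hint
// is present, which is what a plain docker invocation would look like.
func role(annotations map[string]string) string {
	for _, key := range containerTypeKeys {
		switch annotations[key] {
		case "sandbox":
			return "sandbox"
		case "container":
			return "container"
		}
	}
	return "standalone"
}

func main() {
	fromCRI := map[string]string{"io.kubernetes.cri-o.ContainerType": "sandbox"}
	fromDocker := map[string]string{}
	fmt.Println("CRI request:", role(fromCRI))
	fmt.Println("docker request:", role(fromDocker))
}
```

The "standalone" case is where the docker tricks mentioned above come in: without such a hint, the runtime has to decide by itself that the container is its own pod.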

sameo avatar sameo commented on August 20, 2024

@WeiZhang555

And "K8S->docker->kata" is still an important scenario

Do you mean we want to support the dockershim CRI implementation, i.e. "K8S->dockershim->kata" ?

jodh-intel avatar jodh-intel commented on August 20, 2024

Looks good. I'm a little unclear about the OCI compat / runc compat sections, specifically why those particular runc commands are listed but not some others. For example, delete isn't mentioned, nor is state. There are other runc commands, but they do not form part of the OCI runtime spec.

Another point - runc has particular options (like --console-socket and --pid-file) which are not part of the OCI spec, so should those also be listed?

Also, how about adding something about logging / problem determination / debugging? Specifically the ability to determine easily that a problem originates in the runtime rather than one of the other components.

We might benefit from some input from @chavafg for thoughts on Testing requirements.

WeiZhang555 avatar WeiZhang555 commented on August 20, 2024

@sameo dockershim could be enough, as it will be the formal implementation in K8S.

@jodh-intel

Also, how about adding something about logging / problem determination / debugging? Specifically the ability to determine easily that a problem originates in the runtime rather than one of the other components.

We might benefit from some input from @chavafg for thoughts on Testing requirements.

Good point! This is also an important part that I really care about. It can be a strong guarantee of quality and give more confidence to kata container developers and users.

sameo avatar sameo commented on August 20, 2024

@jodh-intel delete was already there, and state was just an unintentional omission. I added it now.
I also added the 2 options you mentioned.

Could you elaborate a little more on the logging/debugging ?

gnawux avatar gnawux commented on August 20, 2024

@WeiZhang555 dockershim is moving out of kubelet, but I think Docker support is still a valid requirement.

gnawux avatar gnawux commented on August 20, 2024

@sameo what's the difference between dockershim and docker from kata's view?

One more thing, I think one of the most significant differences between different clients could be the networking part. @WeiZhang555 could you provide some input on the networking related requirements?

gnawux avatar gnawux commented on August 20, 2024

@egernst yes, at the very least we need CNI first, and items 2-5 look fine.

WeiZhang555 avatar WeiZhang555 commented on August 20, 2024

@egernst This looks fine.

CNI should be more important for K8S integration.
From the command line interface point of view, I saw that every "smart" or "auto-detect" implementation for CNI failed in cc and runv; it's hard to support both CNM and CNI with one implementation.
So CNI first. As a suggestion, we can use a more direct way to support CNI: just provide something like a kata-runtime interface/route command to manipulate network interfaces and routes for a lite VM, and provide a new CNI plugin binary as an example in kata. This has been proven to be efficient in my tests.

We can discuss more about networking part later, to find a good way for doing this.

gnawux avatar gnawux commented on August 20, 2024

@WeiZhang555 yes, for Kubernetes-facing scenarios, we don't need CNM. And all the existing CNM implementations for runV or CC could be summarized as "workarounds" so far.

sameo avatar sameo commented on August 20, 2024

@gnawux

@sameo what's the difference between dockershim and docker from kata's view?

From kata's view it's almost identical as the kata runtime ends up being called as a Docker runtime. But from a development/integration perspective it's different: dockershim is a CRI implementation and by adding hardware virtualization awareness to the spec, we can have dockershim taking it into account and eventually calling Docker with e.g. the right annotations.

sameo avatar sameo commented on August 20, 2024

@egernst @WeiZhang555 @jodh-intel I think I captured all your input, please let me know if that's not the case yet.

jodh-intel avatar jodh-intel commented on August 20, 2024

Thanks @sameo - lgtm

egernst avatar egernst commented on August 20, 2024

Adding to the documentation repo; further review should happen there. Please see kata-containers/documentation#17
