nolar / kopf Goto Github PK

A Python framework to write Kubernetes operators in just a few lines of code

License: MIT License

Python 99.86% Shell 0.14%

kubernetes kubernetes-operator kubernetes-operators python python3 framework asyncio operator operators python-framework

kopf's Introduction

Kubernetes Operator Pythonic Framework (Kopf)

Kopf —Kubernetes Operator Pythonic Framework— is a framework and a library to make Kubernetes operators development easier, just in a few lines of Python code.

The main goal is to bring the Domain-Driven Design to the infrastructure level, with Kubernetes being an orchestrator/database of the domain objects (custom resources), and the operators containing the domain logic (with no or minimal infrastructure logic).

The project was originally started as zalando-incubator/kopf in March 2019, and then forked as nolar/kopf in August 2020: but it is the same codebase, the same packages, the same developer(s).

As of now, the project is in maintenance mode since approximately mid-2021: Python, Kubernetes, CI tooling, dependencies are upgraded, new bugs are fixed, new versions are released from time to time, but no new big features are added — there is nothing to add to this project without exploding its scope beyond the "operator framework" definition (ideas are welcome!).

Documentation

https://kopf.readthedocs.io/

Features

Simple, but powerful:
- A full-featured operator in just 2 files: a Dockerfile + a Python file (*).
- Handling functions registered via decorators with a declarative approach.
- No infrastructure boilerplate code with K8s API communication.
- Both sync and async handlers, with sync ones being threaded under the hood.
- Detailed documentation with examples.
Intuitive mapping of Python concepts to Kubernetes concepts and back:
- Marshalling of resources' data to the handlers' kwargs.
- Marshalling of handlers' results to the resources' statuses.
- Publishing of logging messages as Kubernetes events linked to the resources.
Support anything that exists in K8s:
- Custom K8s resources.
- Builtin K8s resources (pods, namespaces, etc).
- Multiple resource types in one operator.
- Both cluster and namespaced operators.
All the ways of handling that a developer can wish for:
- Low-level handlers for events received from K8s APIs "as is" (an equivalent of informers).
- High-level handlers for detected causes of changes (creation, updates with diffs, deletion).
- Handling of selected fields only instead of the whole objects (if needed).
- Dynamically generated or conditional sub-handlers (an advanced feature).
- Timers that tick as long as the resource exists, optionally with a delay since the last change.
- Daemons that run as long as the resource exists (in threads or asyncio-tasks).
- Validating and mutating admission webhook (with dev-mode tunneling).
- Live in-memory indexing of resources or their excerpts.
- Filtering with stealth mode (no logging): by arbitrary filtering functions, by labels/annotations with values, presence/absence, or dynamic callbacks.
- In-memory all-purpose containers to store non-serializable objects for individual resources.
Eventual consistency of handling:
- Retrying the handlers in case of arbitrary errors until they succeed.
- Special exceptions to request a special retry or to never retry again.
- Custom limits for the number of attempts or the time.
- Implicit persistence of the progress that survives the operator restarts.
- Tolerance to restarts and lengthy downtimes: handles the changes afterwards.
Awareness of other Kopf-based operators:
- Configurable identities for different Kopf-based operators for the same resource kinds.
- Avoiding double-processing due to cross-pod awareness of the same operator ("peering").
- Pausing of a deployed operator when a dev-mode operator runs outside of the cluster.
Extra toolkits and integrations:
- Some limited support for object hierarchies with name/labels propagation.
- Friendly to any K8s client libraries (and is client agnostic).
- Startup/cleanup operator-level handlers.
- Liveness probing endpoints and rudimentary metrics exports.
- Basic testing toolkit for in-memory per-test operator running.
- Embeddable into other Python applications.
Highly configurable (to some reasonable extent).

(*) Small font: two files of the operator itself, plus some amount of deployment files like RBAC roles, bindings, service accounts, network policies — everything needed to deploy an application in your specific infrastructure.

Examples

See examples for the examples of the typical use-cases.

A minimalistic operator can look like this:

import kopf

@kopf.on.create('kopfexamples')
def create_fn(spec, name, meta, status, **kwargs):
    print(f"And here we are! Created {name} with spec: {spec}")

Numerous kwargs are available, such as body, meta, spec, status, name, namespace, retry, diff, old, new, logger, etc: see Arguments

To run a never-exiting function for every resource as long as it exists:

import time
import kopf

@kopf.daemon('kopfexamples')
def my_daemon(spec, stopped, **kwargs):
    while not stopped:
        print(f"Object's spec: {spec}")
        time.sleep(1)

Or the same with the timers:

import kopf

@kopf.timer('kopfexamples', interval=1)
def my_timer(spec, **kwargs):
    print(f"Object's spec: {spec}")

That easy! For more features, see the documentation.

Usage

Python 3.8+ is required: CPython and PyPy are officially supported and tested; other Python implementations can work too.

We assume that when the operator is executed in the cluster, it must be packaged into a docker image with a CI/CD tool of your preference.

FROM python:3.12
ADD . /src
RUN pip install kopf
CMD kopf run /src/handlers.py --verbose

Where handlers.py is your Python script with the handlers (see examples/*/example.py for the examples).

See kopf run --help for other ways of attaching the handlers.

Contributing

Please read CONTRIBUTING.md for details on our process for submitting pull requests to us, and please ensure you follow the CODE_OF_CONDUCT.md.

To install the environment for the local development, read DEVELOPMENT.md.

Versioning

We use SemVer for versioning. For the versions available, see the releases on this repository.

License

This project is licensed under the MIT License — see the LICENSE file for details.

Acknowledgments

Thanks to Zalando for starting this project in Zalando's Open-Source Incubator in the first place.
Thanks to @side8 and their k8s-operator for inspiration.

kopf's People

Contributors

Stargazers

Watchers

Forkers

jscaltreto cliffburdick markush lack psontag tinyzimmer turbaszek pacificdynamics enixdark pjcafonso nemethf dkul108 doug-fish blomquistr yangy2000 mareckii nobusugi246 clive-jevons aland-zhang tmcsantos samgiles rmegens fabiofi numerology ztaylor54 amadev leaddevops zer0n1 alexeyzam chenxuejian rahul1804 brennerm powerexploit cwyang sergeytsaplin nangal lefthander doytsujin nnazeer pshchelo cjbaar tavaresrodrigo caffeinism baijum paxbit syllogy dibisis sobolevn leroyshirto mbeacom chamcca samj1912 nashant jkupferer casualuser kp-forks jpoley dinesh14nish alexrogalskiy alexdyas bradfair gregsterin vad1mo hanssto lcardonag guanyan1996 modulo1982 dlax showjason bobh66 gang01 rubyisbeautiful muralidharranga akainocat hramezani theonlykingpin m-javadheydarpour 521hellogithub mu-l jackwiy sanyuesiyuewuyue xiuyuanjun antonizhubar kspine jeff4747 erikengerd xielonglong744 sirain w0028long fsismondi wjsi askainet bignuoli mboutet mikespub-org edwardchenchen alitfada codewest thoreg lgtm-migrator

kopf's Issues

[PR] Fix the docs for the RBAC examples, and add "get" permission on peering

A pull request by nolar at 2019-04-29 10:02:58+00:00
Original URL: zalando-incubator/kopf#50
Merged by nolar at 2019-04-29 14:57:52+00:00

Issue : #12, #49

The docs were missing the cluster-vs-namespaced object separation.

And there were no "GET" permission on the peering object by its name. Now, it is added to address our incident (see #49 background).

These docs will be later affected by #36.

Configurable field for the status storage

An issue by nolar at 2019-04-06 06:51:22+00:00
Original URL: zalando-incubator/kopf#23

Actual Behavior

Currently, Kopf stores the internal status of the handlers in status.kopf (hard-coded). It is used for exchanging the information cross the event cycles.

This is done in the assumption that there is only one operator/controller per resource kind.

If two or more different Kopf-based operators/controllers are handling the resource, especially the reusable resource, such as Pods, they can collide.

Expected Behavior

The field of the internal status must be configurable. For example, in the handler declaration:

@kopf.on.update('', 'v1', 'pods', status='status.kopf.some-other-field')
def pod_updated(**_):
    pass

The controlling parts of the custom resource A can have a convention to use its own resource kinds as the field:

@kopf.on.update('', 'v1', 'pods', status='status.kopf-for-resource-a')
def pod_updated(**_):
    pass

If explicitly set to None, the status is not persisted. Which implies that a different flow should be used (all-at-once lifecycle, errors ignored):

@kopf.on.create('', 'v1', 'pods', status=None)
@kopf.on.update('', 'v1', 'pods', status=None)
@kopf.on.delete('', 'v1', 'pods', status=None)
def pod_event(**_):
    pass

PS: There is also the metadata.annotations.last-seen-state. It should be turned off when the status is turned off. It makes no sense to store the last-seen-state and to calculate the diff, since with the status not persisted, there will be no multiple handler calls.

To-dos

An issue by perploug at 2019-03-26 13:43:42+00:00
Original URL: zalando-incubator/kopf#1

CONTRIBUTING.md updated
CONTRIBUTORS.md updated with names of external contributors
CODEOWNERS updated with usernames of who review which PRs
MAINTAINERS updated with team member contact info
CODE_OF_CONDUCT.md reviewed
SECURITY.md reviewed
Pull request template reviewed
Issue template reviewed
Readme updated

Operator freezes while exiting after an error in the watching/queueing cycle

An issue by nolar at 2019-04-08 12:13:15+00:00
Original URL: zalando-incubator/kopf#25

Expected Behavior

When an error happens in the watching/queueing coroutines, the process exits, and the pod is restarted by Kubernetes (or it just exists if executed locally).

Actual Behavior

In some cases, the process freezes after the exception, and no new events are handled, nothing is logged.

Steps to Reproduce the Problem

Uncertain, but:

Simulate an error in the watching cycle, e.g. such as #10

Commented by nolar at 2019-04-16 14:59:09+00:00

The issue is presumably fixed in #27.

"Presumably" means that it was one certain way of freezing the operator forever with no reaction, and the simulated symptoms match with the observed symptoms. This way is now fixed.

But it is unclear how this way could be triggered and activated: the kubernetes.watch.Watch().stream() call never ends normally, since it has while True inside.

Which, in turn, means that there could be other reasons and ways of freezing. We need to catch them first, and investigate if it happens again.

Auto-guessing the peering mode

An issue by nolar at 2019-04-21 19:37:09+00:00
Original URL: zalando-incubator/kopf#33

Current Behaviour

Currently, the peering object is needed by default, unless --standalone option is used, which disables the peering completely.

This causes the confusion for the first intro and following the tutorial — in case the cluster is not configured yet (no peering objects created). See: #31.

If standalone mode is made the default, there is a negative side-effect: if somebody runs 2+ operators —e.g. one in-cluster, another in the dev-mode on the external workstations— these operators will collide and compete for the objects without knowing this. The peering was invented exactly for the purpose of not hitting this issue in the dev-mode, and gracefully "suppressing" other operators.

Expected Behaviour

The peering should be considered as a side-feature for extra safety, it should not be a showstopper for the quick-start guides or tutorials.

It would be better to have 3 modes:

with ~~--peering or~~ --peering=something, the peering is enforced, the operator fails to start if peering is not accessible (as it is now).
with --standalone, the peering is ignored (as it is now).
with no options (the new default), the auto-detection mode is used: if the "metadata.name: default" peering object is found, use it (either cluster-scoped or namespace-scoped, depending on --namespace=); if not found, log a big-letter warning of possible conflicts and collisions, and continue as if in the standalone mode.

Relevant: #32.

Todos:

Documentation:
- CLI options.
- Peering page.
Tests.

Commented by psycho-ir at 2019-04-22 10:26:50+00:00

nolar I think it's better to not support --peering as flag. It might bring ambiguity in cases like this:
run ../examples/01-minimal/example.py --peering --verbose
It can be interpreted as a peering with name --verbose.
IMO it's safe to switch to auto-mode if --peering does not exist and just show the warning to the user.
wdyt?

Commented by nolar at 2019-04-22 21:25:28+00:00

I think it's better to not support --peering as flag.

psycho-ir Agree. Also, it is unclear what is meant if it is just a flag: which peering object to use.

I've fixed the issue text.

Use --standalone option in the readme of the examples

An issue by psycho-ir at 2019-04-21 17:14:00+00:00
Original URL: zalando-incubator/kopf#31

Hi,

First of all thanks a lot for initiating this project, I think it will be a very useful project for the community.

So about the issue, I had some difficulties to run the examples as I didn't setup the peering before jumping to the examples.

Firstly I think it would be better to use --standalone parameter in the readme of the examples, so that users who haven't setup the peering yet won't get error in first run.

Secondly I guess returning a more explicit error when the peering is not setup and referring to the relevant doc makes sense, I spent some time to debug the code and find the reason and fixed it. then I noticed it's documented here :D
https://kopf.readthedocs.io/en/latest/install/

wdyt? I woud be more than happy to contribute to it if you think these suggestions are sensible.

Commented by nolar at 2019-04-21 19:37:13+00:00

psycho-ir Thanks for your feedback. I am glad that you like the idea of such a framework.

I think, your suggestion totally makes sense. If your suggestion is to fix the the docs & ./examples/*/README.md files only, feel free to do it — I welcome your contribution.

If you want to change the framework behaviour, please take a look at #32 & #33 first — I have just written down some of my thoughts on this topic.

The only thing that stops me from quickly implementing this, is the absence of tests. The framework has just recently left the proof-of-concept stage, and I cover it with the tests component-by-component, module-by-module (not fast enough, sorry).

Commented by psycho-ir at 2019-04-21 19:47:54+00:00

I just read them, yep they make totally sense to me.
I think by implementing #33 we don't even to change the doc and adding the --standalone to the docs.

I can work on it this week, or you would rather it's implemented after #32? to have support of the namespaced peerings in the first run.

Commented by nolar at 2019-04-21 21:46:51+00:00

psycho-ir I would say they are independent (unless there are some tricky details which I do not foresee).

You can just keep the always-cluster-scoped concept for now, and implement the logic for the present/absent peering object.

Commented by nolar at 2019-04-22 02:03:16+00:00

psycho-ir Please, take a look at #36 & #37 — regarding the #32 (namespace isolation of the served objects and of the peering objects). Seems working as expected — I will double-check tomorrow, with fresh mind.

You can send your implementation for #33 as is — I will rebase&resolve these two PRs later (they are less important than a working tutorial).

Commented by nolar at 2019-04-26 08:51:45+00:00

Indirectly fixed by #33 & #38 (auto-detection of the peering mode) — the tutorial does not break if the cluster is not configured with the peering object.

Set up Travis

An issue by samurang87 at 2019-03-27 10:08:03+00:00
Original URL: zalando-incubator/kopf#5

Set up the CI for this project

Commented by nolar at 2019-04-08 12:34:23+00:00

Done.

[PR] [GH-33] implement auto detection mode for peering.

A pull request by psycho-ir at 2019-04-22 10:39:04+00:00
Original URL: zalando-incubator/kopf#38
Merged by nolar at 2019-04-24 15:56:01+00:00

Issue : #33 (only if appropriate)

nolar I will write the tests and document it once we agreed on the implementation.

Commented by nolar at 2019-04-22 19:24:22+00:00

psycho-ir PS: Also, feel free to remove the clutter from the PR body if you want. (I think, the PR template must be reduced to only an issue reference (it is mandatory), with no implicit structure. — Will do later.)

Commented by psycho-ir at 2019-04-22 20:05:40+00:00

I've tried it locally. There is a little bug with the ...cls(peering=None,... — crashes on start (see comments). Once fixed, it works nice in all modes, exactly as intended.

Would you like to extend the docs (docs/peering.rst → https://kopf.readthedocs.io/en/latest/peering/) in this PR? Or I can do this in the following PRs, together with other peering doc changes.

Sure, I will extend the docs tomorrow and try to add some tests for it.

Commented by nolar at 2019-04-24 16:11:36+00:00

psycho-ir So, it is merged and released as kopf==0.9. Congratulation! And big thanks for your contribution!

Commented by psycho-ir at 2019-04-24 16:46:41+00:00

nolar yaaaaaay. what is the next milestone? Would be happy to contribute more in this project.

Commented by nolar at 2019-04-26 10:06:18+00:00

psycho-ir

Currently, the milestone 1 is this:

Tests, tests, tests — to bring the repo to a healthy state, so that I am not afraid to introduce new changes without breaking things. 90% of them are done, just in the PRs or in my local branches waiting for some PRs to be merged.
Silent spies on the events (see #30 ) — to react to the events in pods, persistent volume claims, etc, without storing the handler status. Already implemented in my local branch, waiting for the tests.
Finish the tutorial in the docs, so that the kind: EphemeralVolumeClaim becomes a real example operator in its own repo, uploaded to the DockerHub, etc. Partially drafted in the docs (in pieces), though not actually tested in real cluster. Waiting for the missing feature of the silent spies.

Since then, the framework is sufficiently feature-rich for the first stage (it is now, actually, just the docs do not feel complete), and can be advertised in public: meetups, blog posts, so on.

Based on the real-world feedback, the next milestones can be defined.

The real-world usage is the most important goal now. I.e., getting the operators implemented with this framework (and preferably shared).

Meanwhile, I write down all the ideas that come to my mind as the issues. If you have some suggestions, feel free to create the issues too. Examples: #44, #45, #46.

[PR] Test the callback invocation protocol

A pull request by nolar at 2019-04-21 21:17:17+00:00
Original URL: zalando-incubator/kopf#34
Merged by nolar at 2019-04-29 15:53:34+00:00

The handlers, both sync and async, so as the lifecycle callbacks are called via the same protocol. This PR extracts this callback invocation protocol into a separate module (to reduce the complexity of kopf.reactor.handling), and add some tests for the invocation protocol.

Issue : #13

The "invocation protocol" means that all expected kwargs (a dozen of them) will be indeed passed to the handlers, and that none of them will ever be accidentally "lost", thus breaking the operators — the framework's public promise (https://kopf.readthedocs.io/en/latest/handlers/#arguments).

In addition, sync & async callbacks are tested to behave as promised in the documentation (https://kopf.readthedocs.io/en/latest/async/):

The async callbacks are expected to be called in the same asyncio loop, which means in the same thread and the full stack is available to the debuggers and exceptions.
The sync callbacks are called from the default executor (usually a thread pool) to prevent blocking of the asyncio loop, so they do not see the stack.

The partials and wrappers are unfolded to their real functions if possible. These ones are actually used and are really needed (especially the decorated wrappers).

The lambdas are also supported, but limited. The lambdas are not yet used in the operators or docs or examples, but it is better to not be surprised and not to fail when and if they are used (as they are, technically, also the callables).

The module extracted is performed "as is", no logic is changed.

[PR] Replace built-in StopIteration with custom StopStreaming for API calls

A pull request by nolar at 2019-04-12 08:11:24+00:00
Original URL: zalando-incubator/kopf#27
Merged by nolar at 2019-04-16 11:53:26+00:00

Prevent the operator from freezing on the end of the watch-event stream.

Issue : #25

Python3 has issues with StopIteration exception from inside the async coroutines. This issue is addressed in PEP-479.

Briefly, this code works and raises a RuntimeError as per PEP-479:

import asyncio

async def fn(src):
    print(next(src))
    print(next(src))  # raises StopIteration

loop = asyncio.get_event_loop()
coro = fn(iter([100]))
loop.run_until_complete(coro)

RuntimeError: coroutine raised StopIteration

But this code prints an exception and hangs forever (here, we execute synchronous next in a thread pool aka asyncio "executor"):

import asyncio

async def fn(src):
    print(await loop.run_in_executor(None, next, src))
    print(await loop.run_in_executor(None, next, src))  # raises StopIteration

loop = asyncio.get_event_loop()
coro = fn(iter([100]))
loop.run_until_complete(coro)

TypeError: StopIteration interacts badly with generators and cannot be raised into a Future

After which it hangs forever. Therefore, using this next-in-executor technique is not safe and should be avoided.

This PR replaces the StopIteration with custom StopStreaming, and handles it accordingly.

[PR] Restrict the API calls in namespaced mode

A pull request by nolar at 2019-04-22 01:54:43+00:00
Original URL: zalando-incubator/kopf#37
Merged by nolar at 2019-05-14 16:01:58+00:00

Issue : #32, related #31

Reimplement the namespace filtering of --namespace option to use the appropriate API calls instead of object field checking.

First, this will remove the unnecessary receiving and silent ignoring of the irrelevant objects from other namespaces.

Second, this can simplify the RBAC configs for the per-namespace operators (no cluster calls are made, thus no cluster roles are needed — see docs preview: https://kopf.readthedocs.io/en/peering-scopes/deployment/#rbac).

The namespace separation of regular objects is tested both manually and with limited set of unit-tests. More tests will be added when the whole watching-queueing-handling subsystem will be covered with tests (as part of #13).

See also #36 for the namespace isolation of the peering objects.

Documentation

An issue by nolar at 2019-04-01 17:25:35+00:00
Original URL: zalando-incubator/kopf#12

The project is now in the proof-of-concept stage, and lacks the documentation (beside the docstrings and examples).

It needs some normal hosted HTML docs (e.g. on ReadTheDocs).

Hosting configured.
Builds configured.

Specifically, some topics to not forget:

[PR] Import the project as it is right now, with the squashed history

A pull request by nolar at 2019-03-27 12:37:12+00:00
Original URL: zalando-incubator/kopf#7
Merged by nolar at 2019-03-27 12:54:01+00:00

Initial import of the squashed repository as it is.

Issues: #1

Log handling from pods

An issue by nolar at 2019-04-26 09:53:10+00:00
Original URL: zalando-incubator/kopf#46

Having the silent handlers (spies) on the built-in Kubernetes objects (#30), the next step would be to silently watch over the pod's logs.

An example use-case: monitor the logs for specific lines (by pattern), and extract the KPIs of the process in them, or their status, which can then be put on the Kubernetes object's status:

import kopf

@kopf.on.log('', 'v1', 'pods',
             regex=r'model accuracy is (\d+\.\d+)%')
def accuracy_log(namespace, meta, patch, log, match, **kwargs):
    model_name = meta.get('labels', {}).get('model')
    accuracy = float(match.group(1))
    accuracy_str = f'{accuracy:2f}%'

    api = kubernetes.client.CustomObjectsApi()
    api.patch_namespaced_custom_object(
        group='zalando.org', 
        version='v1',
        plural='trainingjobs',
        namespace=namespace,
        name=model_name, 
        body={'status': {'accuracy': accuracy_str}},
    )

@kopf.on.log('', 'v1', 'pods',
             regex=r'Traceback (most recent call last):')
def error_log(namespace, meta, patch, log, match, **kwargs):
    model_name = meta.get('labels', {}).get('model')
    api = kubernetes.client.CustomObjectsApi()
    api.patch_namespaced_custom_object(
        group='zalando.org', 
        version='v1',
        plural='trainingjobs',
        namespace=namespace,
        name=model_name, 
        body={'status': {'training': 'FAILED'}},
    )

Important: Perhaps, some filtering by the labels is needed, so that we do not watch over all the pods (there can be a lot of them), but only those of our interest. E.g., by the presence of model label in the examples above, so that only the model-pods are taken into account. See #45.

Such a TrainingJob custom resource can the be defined as follows:

spec:
  ………
  additionalPrinterColumns:
    - name: Accuracy
      type: string
      priority: 0
      JSONPath: .spec.accuracy

When listed, the objects will print their accuracy:

$ kubectl get TrainingJob
NAME             ACCURACY
model-1          87.23%

[PR] Test the CLI invocation, including logging in & preloading

A pull request by nolar at 2019-04-22 11:40:56+00:00
Original URL: zalando-incubator/kopf#39
Merged by nolar at 2019-04-30 18:27:01+00:00

Issue : #13

It is straightforward: no explanations needed. Just the tests and few non-critical fixes detected during the tests.

Commented by nolar at 2019-04-30 10:09:48+00:00

samurang87 And the last time please, with all the merge conflicts with the new features resolved (and just rebased on master) ;-)

Kopf-based operator fails with KeyError ['uid']

An issue by nolar at 2019-03-27 13:35:55+00:00
Original URL: zalando-incubator/kopf#10

Actual Behavior

There is a stacktrace in the operator written with Kopf:

Traceback (most recent call last):
  ………
  File "/usr/local/lib/python3.7/dist-packages/kopf/cli.py", line 19, in wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/kopf/cli.py", line 50, in run
    peering=peering,
  File "/usr/local/lib/python3.7/dist-packages/kopf/reactor/queueing.py", line 248, in run
    task.result()
  File "/usr/local/lib/python3.7/dist-packages/kopf/reactor/queueing.py", line 83, in watcher
    key = (resource, event['object']['metadata']['uid'])
KeyError: 'uid'

Expected Behaviour

No sporadic errors.

Steps to Reproduce the Problem

(unknown)

Specifications

Version: 0.5

[PR] Handle only the latest event with the most up to date state

A pull request by nolar at 2019-04-26 08:41:51+00:00
Original URL: zalando-incubator/kopf#43
Merged by nolar at 2019-04-26 15:15:01+00:00

Issue : #42

With this change, if the queue contains a batch of events, react only to the latest one — to follow the "eventual consistency" principle of Kubernetes.

The batch is time-framed to 0.1s, so that it is fast enough for all normal cases when only one event arrives, but slow enough for the initial object listing when multiple events arrive.

Still, some minimal time-frame is needed, as the streaming and parsing of the events inside of the kubernetes library is not immediate.

Commented by samurang87 at 2019-04-26 12:58:44+00:00

So basically if the fetching manages to get any events, you restart it to make sure there is only one event fetched? 🤔

Commented by nolar at 2019-04-26 14:23:39+00:00

samurang87 Yes. And getting all the events till nothing is left. All events except the last one are therefore ignored — as intended.

Commented by parking52 at 2019-04-26 14:31:19+00:00

Why will the event we are interested in be fetched rapidly ? And why will the number of event decrease ?

Commented by nolar at 2019-04-26 14:38:54+00:00

parking52 samurang87 I didn't get your last comments. What number of events decreased? What exactly is fetched rapidly?

Commented by nolar at 2019-04-26 14:39:49+00:00

parking52 samurang87 See the examples in #42 — there are few objects with few events (e.g. "mycrd-expr1"), of which only the latest is of our interest.

Commented by nolar at 2019-04-26 15:14:04+00:00

parking52 samurang87 This is efficiently an equivalent of this logic:

while queue.qsize() > 0:
    event = await queue.get()

Except that:

threading.Queue().qsize() on Mac OS (it is not supported), so I used to avoid qsize() and just to not think about it. It is promised to work normally in asyncio.Queue though.
If the queue is empty (qsize == 0), we should wait for 5.0 seconds until something appears. So, while queue.qsize() > 0: is not a suitable criterion here, unless first_time is introduced. But it makes it same as not nice as double tries. In that case, it would look like this:

event = None
first_time = True
try:
    while first_time or queue.qsize() > 0:
        first_time = False
        event = await asyncio.wait_for(queue.get(), timeout=5.0)
except asyncio.TimeoutError:
    if event is None:
        break

# when (event is not None) or (no timeout on the last event), continue.

Add the RBAC examples for deployments

An issue by nolar at 2019-04-02 09:40:44+00:00
Original URL: zalando-incubator/kopf#17

Kopf is just a framework, but the Kopf-based operators must be deployed to the cluster. For that, they would need the RBAC (role-based access control) templates and examples.

Add and document some common templates with the RBAC objects (roles, rolebindings, etc).

Commented by nolar at 2019-04-29 09:45:17+00:00

The RBAC generation idea is extracted to #49. This issue was originally about the docs only, and it was done some time ago: https://kopf.readthedocs.io/en/stable/deployment/#rbac

Call handlers by time

An issue by nolar at 2019-04-02 09:48:30+00:00
Original URL: zalando-incubator/kopf#19

Expected Behavior

Some of the handlers must be called regularly on the schedule (e.g. cron-like).

For example, to monitor the actual state of the system (as changed out of scope of the operator) with the declared state (as in the yaml files).

Actual Behavior

Only the objects changes cause the handlers execution.

The objects are not always changed when something else (unmonitored) happens in the cluster.

Commented by nolar at 2020-04-01 11:46:59+00:00

Docs:

Field-handler should receive field-diffs, not the object-diffs

An issue by nolar at 2019-04-04 21:42:42+00:00
Original URL: zalando-incubator/kopf#21

Expected Behavior

The field-handler (@kopf.on.field) receives the diff object with the diffs specific for that field, same as the old & new values. And the field paths should be relative to the handled field.

Actual Behavior

The diff object contains the diffs for the whole object (all fields), and relative to the object's root.

Detected while writing the documentation (though was marked with a TODO in the source code).

JSON logging

An issue by nolar at 2019-04-26 09:08:55+00:00
Original URL: zalando-incubator/kopf#44

Currently, Kopf logs in text mode, one line per event. In the multi-line events (e.g. with the data dumps), it is intentionally flattened to remain in one line — to be friendly with the logging systems such as Scalyr. This makes the log reading difficult.

Instead, Kopf should log the JSON objects per logging event, so that they could be consumed by the logging systems such as Scalyr, and delivered to the logging discovery tools with all these fields, where they are searchable/filterable.

kopf run --log-json ...

The fields needed:

Logging message.
Logging level as a number and as a name.
All other built-in fields of logging (timestamp, etc).
All extras, such as the namespace and name in the per-object loggers.
New: the uid and kind of the object.
New: the id of the operator (ourselves in peering).

When in the JSON logging mode, the data dumps of the objects should be made multi-line pretty-printed, so that they are readable in the logging tools (e.g. Scalyr).

[PR] Test diffs calculation and manipulation

A pull request by nolar at 2019-04-22 23:32:57+00:00
Original URL: zalando-incubator/kopf#40
Merged by nolar at 2019-04-26 08:05:22+00:00

Issue : #13

Diffs are used in the kopf.on.update handlers, and then in the per-field handlers. They are very minimalistic and very simple structures, easy to test.

[PR] Persist the progress status of the last handled event/cause

A pull request by nolar at 2019-03-27 13:21:41+00:00
Original URL: zalando-incubator/kopf#9

DO NOT MERGE: The idea of persistent status is questionable. Need some feedback from the early usage in our team projects.

The status is seen by kubectl describe ObjType obj-id or kubectl describe -f obj.yaml (in a fancy reformatted capitalised form), or by kubectl get -f obj.yaml -o yaml (in the original yaml form).

Unlike the k8s events, which are garbage-collected after some time (~1 hour?), the status is persisted on the object for the whole object lifecycle. Moreover, the status can be used in the additional CRD "printer columns", as shown by kubectl get ObjType.

Originally, the handlers status (also called the "progress") was only stored during the handling cycle (there could be multiple handlers/subhandlers, provoking multiple patches & watch-events), and the presence of the progress meant that the handling cycle is ongoing. Once the handling cycle was done, the progress was purged, and the kopf-specific status was removed.

This made the debugging harder, as it required to read the verbose log messages on the handlers/subhandlers progression, rather than checking the object itself.

Now, with this change, the progress of the last handling cycle will be persisted on the object. It will, however, be replaced by the new progress, if the new change happens (e.g. the object is edited or deleted), and the new cycle begins.

As a downside of this change, the object remains polluted with the internal states of the framework, which can look not so nice, since it exposes the internal logic and abstractions to the user — i.e. the classical leaky abstraction anti-pattern. On the other hand, this leakage of the abstractions can be helpful to the developers of the operators or of the framework.

To achieve that goal, the handling logic was changed and complicated: e.g., the "hash-digest of the current state" was introduced to distinguish the real object changes from the internally provoked changes (previously was distinguished by the existence/non-existence of the progress status); and an additional patching cycle was introduced to store the hash-digest on the beginning of each handling cycle. These additional complications also do not feel nice.

This is why it goes as a separate PR — to see, if this approach helps at all, or does not help, and should be avoided.

Commented by nolar at 2019-05-15 13:32:06+00:00

No decision after some time, so I'm closing it — also to delete the branch from the main repo (I keep it in my fork).

[PR] Test registry and matching of handlers by causes

A pull request by nolar at 2019-04-14 23:31:11+00:00
Original URL: zalando-incubator/kopf#29
Merged by nolar at 2019-04-18 12:07:25+00:00

These tests "freeze" the public interface of the library regarding the registries and matching of handlers by causes (i.e. resources-events-fields) — to prevent accidental regressions.

Issue : #13

Code linting on build

An issue by nolar at 2019-04-26 09:59:58+00:00
Original URL: zalando-incubator/kopf#47

Code should be automatically linted on every push, as part of the building process.

The coding guidelines should be some defaults of pylint, flake8, or both. One exception:

Line length is 100 chars.

No linting scripts, no CLI options, just the standard CLI tool should already take all this into account — i.e. auto-detectable configs of these tools should be used (same as by the IDE).

[PR] Pass the field diff to the field handlers, not the obj diff

A pull request by nolar at 2019-04-04 21:43:28+00:00
Original URL: zalando-incubator/kopf#22
Merged by nolar at 2019-04-12 13:53:14+00:00

The diff field for the @kopf.on.field handler contains the whole obj diff, but should contain the field diff: i.e. only the records related to the handled field, with the field names relative to the field's root, not to the object's root.

Issue : #21

[PR] Cleanup the boilerplate files

A pull request by nolar at 2019-03-26 16:09:37+00:00
Original URL: zalando-incubator/kopf#3
Merged by nolar at 2019-03-26 16:40:23+00:00

Update the initial files of the git repo.

Issue : #1

Ignore the events from the past

An issue by nolar at 2019-04-26 08:19:25+00:00
Original URL: zalando-incubator/kopf#42

Actual Behaviour

Kopf-based operator reacts to the object creation events in some cases. In more details, it is described in kubernetes-client/python#819.

Briefly: it is caused by how a kubernetes client library is implemented: it remembers the last seen resource version among all objects as they are listed on the initial call. Kubernetes lists them in arbitrary order, so the old ones can be the latest in the list. Then, the client library uses that old resource version to re-establish the watch connection, which replays all the old events since that moment in time when this resource version was the latest. This also includes the creation, modification, and even the deletion events for the objects that do not exist anymore.

In practice, it means that the operator will call the handlers, which can potentially create the children objects and do some other side effects. In our case, it happened every day when some cluster events were executed.; but it could happen any time the existing watch connection is re-established.

Expected Behaviour

The operator framework should follow the "eventual consistency" principle, which means that only the last state (the latest resource version, the latest event) should be handled.

Since the events are streaming, the "batch of events" can be defined as a time-window of e.g. 0.1s — fast enough to not delay the reaction in normal cases, but slow enough to process all events happening in a row.

Steps to Reproduce the Problem

Create some amount (10-20) of objects.

Example for my custom resource kind:

In [59]: kubernetes.config.load_kube_config()  # developer's config files
In [60]: api = kubernetes.client.CustomObjectsApi()
In [61]: api_fn = api.list_cluster_custom_object
In [62]: w = kubernetes.watch.Watch()
In [63]: stream = w.stream(api_fn, 'example.com', 'v1', 'mycrds')
In [64]: for ev in stream: print((ev['type'], ev['object'].get('metadata', {}).get('name'), ev['object'].get('metadata', {}).get('resourceVersion'), ev['object'] if ev['type'] == 'ERROR' else None))

('ADDED', 'mycrd-20190328073027', '213646032', None)
('ADDED', 'mycrd-20190404073027', '222002640', None)
('ADDED', 'mycrd-20190408065731', '222002770', None)
('ADDED', 'mycrd-20190409073007', '222002799', None)
('ADDED', 'mycrd-20190410073012', '222070110', None)
('ADDED', 'mycrd-20190412073005', '223458915', None)
('ADDED', 'mycrd-20190416073028', '226128256', None)
('ADDED', 'mycrd-20190314165455', '233262799', None)
('ADDED', 'mycrd-20190315073002', '205552290', None)
('ADDED', 'mycrd-20190321073022', '209509389', None)
('ADDED', 'mycrd-20190322073027', '209915543', None)
('ADDED', 'mycrd-20190326073030', '212318823', None)
('ADDED', 'mycrd-20190402073005', '222002561', None)
('ADDED', 'mycrd-20190415154942', '225660142', None)
('ADDED', 'mycrd-20190419073010', '228579290', None)
('ADDED', 'mycrd-20190423073032', '232894099', None)
('ADDED', 'mycrd-20190424073015', '232894129', None)
('ADDED', 'mycrd-20190319073031', '207954735', None)
('ADDED', 'mycrd-20190403073019', '222002615', None)
('ADDED', 'mycrd-20190405073040', '222002719', None)
('ADDED', 'mycrd-20190415070301', '225374502', None)
('ADDED', 'mycrd-20190417073005', '226917625', None)
('ADDED', 'mycrd-20190418073023', '227736631', None)
('ADDED', 'mycrd-20190327073030', '212984265', None)
('ADDED', 'mycrd-20190422061326', '230661413', None)
('ADDED', 'mycrd-20190318070654', '207313230', None)
('ADDED', 'mycrd-20190401101414', '216222726', None)
('ADDED', 'mycrd-20190320073041', '208884644', None)
('ADDED', 'mycrd-20190326165718', '212611027', None)
('ADDED', 'mycrd-20190329073007', '214304201', None)
('ADDED', 'mycrd-20190325095839', '211712843', None)
('ADDED', 'mycrd-20190411073018', '223394843', None)
^C

Please note the random order of resource_versions. Depending on your luck and current state of the cluster, you can get either the new enough, or the oldest resource in the last line.

Let's use the latest resource_version 223394843 with a new watch object:

In [76]: w = kubernetes.watch.Watch()
In [79]: stream = w.stream(api_fn, 'example.com', 'v1', 'mycrds', resource_version='223394843')
In [80]: for ev in stream: print((ev['type'], ev['object'].get('metadata', {}).get('name'), ev['object'].get('metadata', {}).get('resourceVersion'), ev['object'] if ev['type'] == 'ERROR' else None))

('ERROR', None, None, {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 223394843 (226210031)', 'reason': 'Gone', 'code': 410})
('ERROR', None, None, {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 223394843 (226210031)', 'reason': 'Gone', 'code': 410})
('ERROR', None, None, {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 223394843 (226210031)', 'reason': 'Gone', 'code': 410})
('ERROR', None, None, {'kind': 'Status', 'apiVersion': 'v1', 'metadata': {}, 'status': 'Failure', 'message': 'too old resource version: 223394843 (226210031)', 'reason': 'Gone', 'code': 410})

……… repeated infinitely ………

Well, okay, let's try the recommended resource_version, which is at least known to the API:

In [83]: w = kubernetes.watch.Watch()
In [84]: stream = w.stream(api_fn, 'example.com', 'v1', 'mycrds', resource_version='226210031')
In [85]: for ev in stream: print((ev['type'], ev['object'].get('metadata', {}).get('name'), ev['object'].get('metadata', {}).get('resourceVersion'), ev['object'] if ev['type'] == 'ERROR' else None))

('ADDED', 'mycrd-expr1', '226370109', None)
('MODIFIED', 'mycrd-expr1', '226370111', None)
('MODIFIED', 'mycrd-expr1', '226370116', None)
('MODIFIED', 'mycrd-expr1', '226370127', None)
('MODIFIED', 'mycrd-expr1', '226370549', None)
('DELETED', 'mycrd-expr1', '226370553', None)
('ADDED', 'mycrd-20190417073005', '226917595', None)
('MODIFIED', 'mycrd-20190417073005', '226917597', None)
('MODIFIED', 'mycrd-20190417073005', '226917605', None)
('MODIFIED', 'mycrd-20190417073005', '226917614', None)
('MODIFIED', 'mycrd-20190417073005', '226917625', None)
('ADDED', 'mycrd-20190418073023', '227736612', None)
('MODIFIED', 'mycrd-20190418073023', '227736613', None)
('MODIFIED', 'mycrd-20190418073023', '227736618', None)
('MODIFIED', 'mycrd-20190418073023', '227736629', None)
('MODIFIED', 'mycrd-20190418073023', '227736631', None)
('ADDED', 'mycrd-20190419073010', '228579268', None)
('MODIFIED', 'mycrd-20190419073010', '228579269', None)
('MODIFIED', 'mycrd-20190419073010', '228579276', None)
('MODIFIED', 'mycrd-20190419073010', '228579286', None)
('MODIFIED', 'mycrd-20190419073010', '228579290', None)
('ADDED', 'mycrd-20190422061326', '230661394', None)
('MODIFIED', 'mycrd-20190422061326', '230661395', None)
('MODIFIED', 'mycrd-20190422061326', '230661399', None)
('MODIFIED', 'mycrd-20190422061326', '230661411', None)
('MODIFIED', 'mycrd-20190422061326', '230661413', None)
('ADDED', 'mycrd-20190423073032', '231459008', None)
('MODIFIED', 'mycrd-20190423073032', '231459009', None)
('MODIFIED', 'mycrd-20190423073032', '231459013', None)
('MODIFIED', 'mycrd-20190423073032', '231459025', None)
('MODIFIED', 'mycrd-20190423073032', '231459027', None)
('MODIFIED', 'mycrd-20190423073032', '232128498', None)
('MODIFIED', 'mycrd-20190423073032', '232128514', None)
('MODIFIED', 'mycrd-20190423073032', '232128518', None)
('ADDED', 'mycrd-20190424073015', '232198227', None)
('MODIFIED', 'mycrd-20190424073015', '232198228', None)
('MODIFIED', 'mycrd-20190424073015', '232198235', None)
('MODIFIED', 'mycrd-20190424073015', '232198247', None)
('MODIFIED', 'mycrd-20190424073015', '232198249', None)
('MODIFIED', 'mycrd-20190423073032', '232894049', None)
('MODIFIED', 'mycrd-20190423073032', '232894089', None)
('MODIFIED', 'mycrd-20190424073015', '232894093', None)
('MODIFIED', 'mycrd-20190423073032', '232894099', None)
('MODIFIED', 'mycrd-20190424073015', '232894119', None)
('MODIFIED', 'mycrd-20190424073015', '232894129', None)
('ADDED', 'mycrd-20190425073032', '232973618', None)
('MODIFIED', 'mycrd-20190425073032', '232973619', None)
('MODIFIED', 'mycrd-20190425073032', '232973624', None)
('MODIFIED', 'mycrd-20190425073032', '232973635', None)
('MODIFIED', 'mycrd-20190425073032', '232973638', None)
('MODIFIED', 'mycrd-20190314165455', '233190859', None)
('MODIFIED', 'mycrd-20190314165455', '233190861', None)
('MODIFIED', 'mycrd-20190314165455', '233254055', None)
('MODIFIED', 'mycrd-20190314165455', '233254057', None)
('MODIFIED', 'mycrd-20190314165455', '233262797', None)
('MODIFIED', 'mycrd-20190314165455', '233262799', None)
^C

All this is dumped immediately, nothing happens in the cluster during these operations. All these changes are old, i.e. not expected, as they were processed before doing list...().

Please note that even the deleted non-existing resource are yielded ("expr1").

Specifications

Kubernetes version:

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.7", GitCommit:"6f482974b76db3f1e0f5d24605a9d1d38fad9a2b", GitTreeState:"clean", BuildDate:"2019-03-25T02:52:13Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.7", GitCommit:"6f482974b76db3f1e0f5d24605a9d1d38fad9a2b", GitTreeState:"clean", BuildDate:"2019-03-25T02:41:57Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}

Python version:

Python 3.6.4

Python packages installed: (use pip freeze --all)

kubernetes==9.0.0
kopf==0.7

Commented by nolar at 2019-04-26 17:08:17+00:00

Released as 0.10

[PR] Test the lifecycles

A pull request by nolar at 2019-04-21 21:31:36+00:00
Original URL: zalando-incubator/kopf#35
Merged by nolar at 2019-05-03 14:53:34+00:00

Issue : #13

Test and fix the lifecycle callbacks. They are used to select the handlers to be executed on each handling cycle.

By default, asap is used, and it was slightly broken due to hard-coded structure of the status field (the change was done before importing to GitHub, but not reflected in this line). Now, it uses the provided function to get the retry count — from the same module where this value is set/updated.

Add health checks

An issue by nolar at 2019-04-02 09:45:43+00:00
Original URL: zalando-incubator/kopf#18

Expected Behavior

The operator is restarted by Kubernetes if it becomes irresponsive.

Actual Behavior

The operator can stuck for any reason (e.g. bugs), and nobody will notice — except as by no reaction to the added/deleted objects.

Steps to Reproduce the Problem

Put a synchronous time.sleep(300) anywhere in the async handler (async def ...).
Let it run.
Observe how the operator is blocked for 5 mins for all objects.

[PR] Prohibit PyPI uploads for the local versions (not the releases)

A pull request by nolar at 2019-03-27 11:24:57+00:00
Original URL: zalando-incubator/kopf#6
Merged by nolar at 2019-03-27 12:03:34+00:00

Following #4, disable the local versions (0.1.dev2+g123456) to be uploaded to PyPI, as PyPI does not accept them (log):

Uploading kopf-0.1.dev9+g21a7230-py3-none-any.whl
HTTPError: 400 Client Error: '0.1.dev9+g21a7230' is an invalid value for Version. Error: Can't use PEP 440 local versions. See https://packaging.python.org/specifications/core-metadata for url: https://upload.pypi.org/legacy/

Sadly, the PYPI uploads can only be tested on the master branch, i.e. after the merges.

[PR] Initial documentation

A pull request by nolar at 2019-04-01 18:37:53+00:00
Original URL: zalando-incubator/kopf#14
Merged by nolar at 2019-04-18 13:40:05+00:00

Issue : #12

Add some documentation to start with, including the concepts and Kopf's self-positioning in the ecosystem.

More pages will follow to this PR as soon as they are written.

This PR can be previewed at https://kopf.readthedocs.io/en/docs/

Commented by nolar at 2019-04-04 22:03:53+00:00

Finally, some consistent documentation with the examples and story-telling is ready.

There are also few additional pages & sections prepared locally — "as I would like to have it" style — but those ones require few extra functions in the code, and I prefer to do this after the tests are added.

Commented by nolar at 2019-04-05 12:34:31+00:00

samurang87 Thanks. All has been fixed (I hope). See the last commit (or a diff).

Tests automation

An issue by nolar at 2019-04-01 17:26:46+00:00
Original URL: zalando-incubator/kopf#13

The project is now in the proof-of-concept stage, and lacks the tests (beside the examples, manually executed during the development).

It needs some good unit tests to freeze its current state, so that we could continue to add new features safely.

Separate the tests into the internal tests and the external promises of the interface.

Add the coverage measurements.

An estimation of topics to cover with the tests:

[PR] adding documentation tests to test_diff

A pull request by parking52 at 2019-04-26 10:18:04+00:00
Original URL: zalando-incubator/kopf#48
Merged by parking52 at 2019-04-26 12:10:29+00:00

Signed-off-by: Melchior Fracas [email protected]

One-line summary

Issue : #13 (only if appropriate)

Description

This Commit aims at adding tests which describe likely usages of the diff library.
See discussion.

Types of Changes

What types of changes does your code introduce? Keep the ones that apply:

Refactor/improvements

Commented by nolar at 2019-04-26 10:38:53+00:00

Related: #40

[PR] Add README to the PyPI page, plus some other metainfo

A pull request by nolar at 2019-03-27 14:22:17+00:00
Original URL: zalando-incubator/kopf#11
Merged by nolar at 2019-03-28 13:12:40+00:00

Issue : #1

Add more information to the PyPI page (https://pypi.org/project/kopf/). Mostly the long_description from README.

[PR] Tests for object hierarchies

A pull request by nolar at 2019-04-14 19:08:05+00:00
Original URL: zalando-incubator/kopf#28
Merged by nolar at 2019-04-29 16:39:55+00:00

These tests "freeze" the public interface of the library regarding the hierarchy management — to prevent accidental regressions.

Issue : #13

Few fixes are added just to pass the tests. They slightly extend the behavior: instead of purely list & tuple classes, any iterables are accepted.

Mostly for the purpose of testing, but also as a possibility, the public interface was also extended by the namespace adjustments (move objects to the requested namespace if it is not yet set) and name harmonizing (setting either the names or the name prefixes for the objects if they are not yet set).

These two additions are directly used in the object "adoption" (i.e. assignment as a children of another parent object, and aligning all of its properties: labels, names, namespaces, etc).

Commented by parking52 at 2019-04-17 14:12:27+00:00

Hello

Automated RBAC generation and verification

An issue by nolar at 2019-04-29 09:44:00+00:00
Original URL: zalando-incubator/kopf#49

Background

With kopf>=0.9, the operators fail to start in the clusters with RBAC configured according to the docs. Introduced by #38, where GET is used on a specific peering object (not just on a list).

The deployment docs were not updated to reflect that. And, even if updated, that would lead to these incidents anyway, as the RBAC yaml file is not auto-validated and not auto-updated in our case, so we would not notice the change.

Suggestion: RBAC verification

Kopf should allow to verify if the RBAC yaml file matches the framework's and operator's expectations, and explain what is missing:

kopf rbac verify script1.py script2.py -f rbac.yaml

This verification step could be optionally used either in CI/CD testing stage, or in the docker build stage, and to fail the build if the source-code RBAC yaml file lacks some necessary permissions.

If no -f option is specified (OR: if --cluster is explicitly specified — TDB), then verify against the real currently authenticated cluster:

kopf rbac verify script1.py script2.py --cluster

The output should explain what is missing:

# Kopf's internals:
KopfPeering get permission: ❌absent
KopfPeering list permission: ✅present
KopfPeering watch permission: ✅present
KopfPeering patch permission: ✅present

# Used at script1.py::create_fn():
KopfExample list permission: ✅present
KopfExample watch permission: ✅present
KopfExample patch permission: ✅present

Some permissions are missing. The operator will fail to work.
Read more at https://kopf.readthedocs.io/en/stable/deployment/
Or use `kopf rbac generate --help`

Exit status should be 0 (all is okay) or 1 (something is missing), so that it could be used in CI/CD.

Suggestion: RBAC generation

Since Kopf would already contain the RBAC parsing & analysis logic, it is also fine to generate the RBAC yaml files from the codebase of the operator — based on which resources/events/causes are registered for handling (same CLI semantics as in kopf run: -m module or file.py).

kopf rbac generate script1.py script2.py > rbac.yaml
kubectl apply -f rbac.yaml

Extra: children objects introspection

As a challenge, some introspection might be needed into the internals of the handlers on which children objects they manipulate from the handlers (e.g. pod creation) — this must also be part of the RBAC docs. Or an additional decorator to declare these objects on the handler functions.

Acceptance Criteria

Implementation:
- RBAC generation to stdout.
- RBAC generation to file (-o, --output).
- RBAC verification of stdin.
- RBAC verification of file (-f, --file).
- RBAC verification of cluster (--cluster).
- Explanation of present/absent permissions.
- Exit status on verification.
Documentation.
- https://kopf.readthedocs.io/en/stable/deployment/
Tests:
- CLI tests.
- RBAC parsing tests.
- RBAC verification tests.

Commented by rosscdh at 2019-05-31 13:44:02+00:00

docs from readthe docs seem to be missing for both role and clusterrole

  - apiGroups: [apiextensions.k8s.io]
    resources: [customresourcedefinitions]
    verbs: [list, watch, patch, get]

Commented by rosscdh at 2019-05-31 21:12:07+00:00

also the events access was missing from both clusterrole and role

Commented by nolar at 2019-06-02 03:23:32+00:00

The events' RBAC is fixed in #89 — the events API was changed from v1beta1 to core v1, but the docs were not in sync with that.

For the customresourcedefinitions — thanks for pointing out. Fixed in #95 — both the docs, and the code for the case when cluster-scoped RBAC is not possible.

rosscdh Speaking of which, what do you mean by "missing from both clusterroles and role"? Events are namespaced, aren't they? What is the purpose of the cluster-scoped events RBAC? Or is it only for the purpose of creating the events globally, without the individual per-namespace role?

Commented by rosscdh at 2019-06-02 05:45:20+00:00

That's a damned good point.. I'll review the scoping as you pointed out.. I
think at the point that I got it working I was keysmashing.. so most
possible it needs a cleanup...
https://github.com/rosscdh/crd-route53/blob/master/kustomize/base/rbac.yml

Mate I must thank you for your efforts on this project itd made getting
into operators several shades easier thanks to your efforts.. I owe you at
least a beer.

On Sun, 2 Jun 2019, 05:23 Sergey Vasilyev, [email protected] wrote:

The events' RBAC is fixed in #89
#89 — the events API was
changed from v1beta1 to core v1, but the docs were not in sync with that.

For the customresourcedefinitions — thanks for pointing out. Fixed in #95
#95 — both the docs, and
the code for the case when cluster-scoped RBAC is not possible.

rosscdh https://github.com/rosscdh Speaking of which, what do you mean
by "missing from both clusterroles and role"? Events are namespaced,
aren't they? What is the purpose of the cluster-scoped events RBAC? Or is
it only for the purpose of creating the events globally, without the
individual per-namespace role?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/nolar/kopf/issues/49?email_source=notifications&email_token=AADA6MTGFQL2TUOPC3PSEVLPYM4LLA5CNFSM4HJBZFB2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWXM4UQ#issuecomment-497995346,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AADA6MQI6YH7A6BN46AJXKTPYM4LLANCNFSM4HJBZFBQ
.

Commented by rosscdh at 2019-06-02 05:50:38+00:00

P.S. I was thinking of using something like marshmallow to validate the
spec. Is this a strategy you would endorse? or do you have something else
in mind?

On Sun, 2 Jun 2019, 07:52 Ross, [email protected] wrote:

That's a damned good point.. I'll review the scoping as you pointed out..
I think at the point that I got it working I was keysmashing.. so most
possible it needs a cleanup...
https://github.com/rosscdh/crd-route53/blob/master/kustomize/base/rbac.yml

Mate I must thank you for your efforts on this project itd made getting
into operators several shades easier thanks to your efforts.. I owe you at
least a beer.

On Sun, 2 Jun 2019, 05:23 Sergey Vasilyev, [email protected]
wrote:

The events' RBAC is fixed in #89
#89 — the events API was
changed from v1beta1 to core v1, but the docs were not in sync with that.

For the customresourcedefinitions — thanks for pointing out. Fixed in #95
#95 — both the docs, and
the code for the case when cluster-scoped RBAC is not possible.

rosscdh https://github.com/rosscdh Speaking of which, what do you
mean by "missing from both clusterroles and role"? Events are
namespaced, aren't they? What is the purpose of the cluster-scoped events
RBAC? Or is it only for the purpose of creating the events globally,
without the individual per-namespace role?

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/nolar/kopf/issues/49?email_source=notifications&email_token=AADA6MTGFQL2TUOPC3PSEVLPYM4LLA5CNFSM4HJBZFB2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWXM4UQ#issuecomment-497995346,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AADA6MQI6YH7A6BN46AJXKTPYM4LLANCNFSM4HJBZFBQ
.

Commented by nolar at 2019-06-02 13:16:02+00:00

rosscdh You are welcome!

Do you mean validation for the purpose of this issue? I never used marshmallow, so I can say nothing about its usage. But I see that it is lightweight, does not bring a lot of dependencies (actually, none), which is good — we can try.

If the size or amount of dependencies will become a problem in the future, I will move all non-runtime dependencies into extras, and make them installable via pip install 'kopf[sdk]', and ensure that all such imports are optional/conditional. But that can be ignored for now.

PS: Do you work on this issue? If so, I prefer to assign it to you, so that nobody else starts working on it in parallel.

Commented by rosscdh at 2019-06-02 18:38:01+00:00

I was thinking something like either another decorator or appending a
spec_validator to the current decorator
which if present would then validate just pre to being injected into the
create|delete|etc|_fn(spec)
tho problems cloud arise if there are version changes to an existing spec
whos schema then gets updated.. maybe spec versioning?
needs a bit of thought.

Maybe we make another ticket out of it?

On Sun, Jun 2, 2019 at 3:16 PM Sergey Vasilyev [email protected]
wrote:

rosscdh https://github.com/rosscdh You are welcome!

Do you mean validation for the purpose of this issue? I never used
marshmallow, so I can say nothing about its usage. But I see that it is
lightweight, does not bring a lot of dependencies (actually, none), which
is good — we can try.

If the size or amount of dependencies will become a problem in the future,
I will move all non-runtime dependencies into extras, and make them
installable via pip install 'kopf[sdk]', and ensure that all such imports
are optional/conditional. But that can be ignored for now.

PS: Do you work on this issue? If so, I prefer to assign it to you, so
that nobody else starts working on it in parallel.

—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/nolar/kopf/issues/49?email_source=notifications&email_token=AADA6MX2MDGIAMXFIVTVKDLPYPBZJA5CNFSM4HJBZFB2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWXVN4Y#issuecomment-498030323,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AADA6MTSY36MKQJ3EAJOGITPYPBZJANCNFSM4HJBZFBQ
.

Commented by nolar at 2019-06-04 07:36:25+00:00

rosscdh Hm. Sorry, I do not get how it is connected to RBAC validation/generation. Can you show an example how it could look like in the code? Or are you talking about #55 (arbitrary field validation)?

Commented by rosscdh at 2019-06-04 11:19:26+00:00

Sorry Nolar, I have combined a question with a ticket.

Separate question, how to ensure that the spec provided by the CRD is the expected spec.. is there a prescribed manner?

i.e in Go the specs are validated and contolled by structs, but in kopf? with dynamic python?

Regards
Ross

Commented by nolar at 2019-06-04 11:55:24+00:00

rosscdh Yes, that is a question for #55. I've moved this discussion there.

[PR] Enable Travis with an empty library

A pull request by nolar at 2019-03-27 09:12:34+00:00
Original URL: zalando-incubator/kopf#4
Merged by nolar at 2019-03-27 11:15:55+00:00

Issue #5

Configure Travis CI and prepare a dummy library and versioning schema before the initial import.

Currently, under my personal account (nolar), with the permissions granted to other maintainers as needed.

Tasks after the merge:

Put a 0.0 tag in the git repo to mark a versioning baseline.

Call handlers for cluster events

An issue by nolar at 2019-04-02 09:49:57+00:00
Original URL: zalando-incubator/kopf#20

Expected Behavior

Some handlers should be called when the cluster events happen, such as the cluster upgrades.

Actual Behavior

Only the objects changes cause the handlers execution.

The objects are not always changed when something else (unmonitored) happens in the cluster or with the cluster.

[PR] Update .zappr.yml

A pull request by perploug at 2019-03-26 13:46:09+00:00
Original URL: zalando-incubator/kopf#2
Merged by nolar at 2019-03-26 14:04:36+00:00

Change type to tools

Issue : #1

[PR] React on the errors and unknown events from k8s API

A pull request by nolar at 2019-03-27 13:07:02+00:00
Original URL: zalando-incubator/kopf#8
Merged by nolar at 2019-03-27 13:58:13+00:00

Issues: #10

There is an exception sporadically happening for the objects:

Traceback (most recent call last):
  ………
  File "/usr/local/lib/python3.7/dist-packages/kopf/reactor/queueing.py", line 83, in watcher
    key = (resource, event['object']['metadata']['uid'])
KeyError: 'uid'

It is, in turn, is caused by the unexpected event types coming from the Kubernetes API, specifically from the watch call (note that the object has the metadata, but no uid or standard k8s-object fields):

{'object': {'apiVersion': 'v1',
            'code': 410,
            'kind': 'Status',
            'message': 'too old resource version: 190491269 (208223535)',
            'metadata': {},
            'reason': 'Gone',
            'status': 'Failure'},
 'raw_object': {'apiVersion': 'v1',
                'code': 410,
                'kind': 'Status',
                'message': 'too old resource version: 190491269 (208223535)',
                'metadata': {},
                'reason': 'Gone',
                'status': 'Failure'},
 'type': 'ERROR'}

This ERROR event, in turn, is caused by how the watch-calls to the API are implemented in the Kubernetes library (be that Python or any other language):

A GET call is made to get a list of objects, with the ?watch=true argument.
The response for the watch-calls is the JSON-stream (one JSON per line) of events.
The library parses ("unmarshalls") the events and yields them to the caller.
The resourceVersion is remembered from the latest event (technically, for every event, but the latest one overrides).
When one GET call is terminated by the server (usually in few second), a new one is made, and this continues forever.
The latest known resourceVersion is passed to the next GET-call, so that the stream continues from that point only.

However, due nothing is happening for few minutes, the resourceVersion somehow becomes "too old", i.e. not remembered by Kubernetes already. This behaviour is documented in the k8s docs (https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes):

A given Kubernetes server will only preserve a historical list of changes for a limited time. Clusters using etcd3 preserve changes in the last 5 minutes by default. When the requested watch operations fail because the historical version of that resource is not available, clients must handle the case by recognizing the status code 410 Gone, clearing their local cache, performing a list operation, and starting the watch from the resourceVersion returned by that new list operation.

So, the Kubernetes API returns and yields these ERROR events. In theory, it should die with an exception (I would expect that instead of the "normal" ERROR events). As observed, these "too old resource version" errors are streamed very fast, non-stop (also strange).

The only valid way here is to restart the watch call from scratch, i.e. no resourceVersion provided.

This is an equivalent of the operator restart: every object is listed again, and goes through the handling cycle (usually a do-nothing handling). But the restart is "soft": the queues, the ascyncio tasks, and generally the state of the operator is not lost, and the time is not wasted (for the pod allocation).

This PR does these 3 things:

Soft-restarts the watching cycle on the "too old resource version" errors.
Fails on the ERROR event types for the unknown errors.
Warns about the unknown event types, which can appear in the future, and ignores them.

Filter by labels/annotations

An issue by nolar at 2019-04-26 09:52:24+00:00
Original URL: zalando-incubator/kopf#45

Currently, all objects are watched (either globally or by namespace, see #32), and all objects are handled. This is a normal case for the operator that "owns" the handled objects.

For the cases when an operator spies on the objects tat it does not "own", such as pods (#30), or the log (#46 ), it should be able to filter out all objects definitely not in the scope of interest.

The easiest way is by labels, as it is supported by the Kubernetes API, and can be put into a query to be performed server-side. Also, filtering by annotations is probably possible — via the field-selectors.

Example usage:

import kopf

@kopf.on.event('', 'v1', 'pods', labels={'model': None, 'model-id': '123abc')
def model_pod_changed(**kwargs):
    pass

The same is for create/update/delete/field handlers, and, when implemented, for the event & log handlers.

Additionally, the label filtering should be accepted on the command line (same semantics as kubectl):

kopf run handlers.py -l experiment=expr1

That can be useful for development and debugging purposes, when it is not desired to put the labels to the code permanently.

Commented by dlmiddlecote at 2019-06-19 18:39:58+00:00

Hey nolar,

I’d like to take this one on if possible.

I have a few questions about this:

You say that labels “can be put into a query to be performed server-side”, is that true? What happens if there are different filters for the same resource on different handlers, would we have to make 2 queries to the Kubernetes API? Im then thinking that this should then be handled in the code itself, do you agree? (similar for annotations).
What is the semantic around the {‘model’: None} label shown above; is that “the model key exists but with any value”, or “the model key exists with the value null”?
What should happen if labels are specified on the command line, and in the handler, is it a join of the two? Only the one in the handler, or command line wins?

Thanks!

Commented by nolar at 2019-06-20 13:56:07+00:00

dlmiddlecote

Hm. Probably, that little note was done when only the command-line filtering was in mind, not per-handler — i.e. global. This makes no sense with per-handler filtering, of course. The logic of label-matching must be inside of Kopf then, not in the API queries and server-side.

{'model': None} meant that the model label should be there, but it does not matter with which value. An equivalent of kubectl get pod -l mylabel vs kubectl get pod -l mylabel=myvalue. I think, it is impossible to have a label with value null (never tried though).

Keep in mind: this syntax snippets are just suggestions. They can be varied as needed if some problematic details pop up. E.g., a special object kopf.ANY can be introduced instead of None for label values — the same way as in mock library (when matching the call() arguments).

It is an interesting question. Maybe, the command-line label filtering must be removed at all, leaving only the per-handler filtering.

Initially, I would say that from the user point of view, this must be an intersection of the labels (not the union) — i.e. AND, not OR — i.e. it must match both, or be ignored.

The command-line filtering can be used when restricting an operator's scope on deployment time (e.g. -l experiment=expr1), while the per-handler labels can be used to express the object relations (e.g. {"parent-run-id": ANY/None} for pods).

However, if started with both, this causes problems and confusion: according to this logic, that should be a pod with parent-run-id present AND restricted to experiment=expr1. However, there is no place where this experiment label is put on the created pod, unless a developer explicitly implemented that.

And so, the internal logic of the operator code (handlers' code) is interacting with the outer logic of deployment (CLI filters).

If we go that way, Kopf must also implicitly add the command-line-specified labels on all created children objects (e.g. in kopf.adopt(), in the assumption that they all go through it). Which is not so thought-through territory, so I would recommend to avoid it for now.

Just per-handler filtering is enough. If the developers want it, they can define experiment as an env var, and add these labels themselves to the handler declarations as a global filter.

Commented by nolar at 2019-06-20 13:59:50+00:00

(reserved for dlmiddlecote)

Commented by nolar at 2019-07-24 09:34:52+00:00

Released as kopf==0.20

Docs: https://kopf.readthedocs.io/en/latest/handlers/#filtering
Announcement: https://twitter.com/nolar/status/1153971321560129543

Do not show the stacktraces on the retry/fatal exceptions

An issue by nolar at 2019-04-02 09:32:05+00:00
Original URL: zalando-incubator/kopf#16

The retry-/fatal-exceptions are the signals for the framework to stop handling or to retry the handler. They are not the regular "unexpected" errors, since they are expected.

As such, the stacktraces should not be printed. Yet the messages of the exceptions should be printed/logged.

For other regular exceptions, which are by design "unexpected", the stacktraces should be printed/logged as usually (similar as if they happen inside of a thread/greenlet, and the main process continues).

[PR] Peering scopes

A pull request by nolar at 2019-04-22 01:51:13+00:00
Original URL: zalando-incubator/kopf#36
Merged by nolar at 2019-05-17 11:22:59+00:00

Issue : #32, related #31, #17

For the purpose of strict namespace isolation of the operators, the peering objects must be cluster- or namespace-scoped (previously: always cluster-scoped).

For this, split the old cluster-wide KopfPeering resource into the new ClusterKopfPeering & namespaced KopfPeering, and use one of them depending on the --namespace option.

The testing of peering is performed manually — it works. More tests will be added when the whole peering subsystem will be covered with tests (as part of #13).

Docs preview:

See also: #37 for the namespace isolation of the listing/watching API calls.

TODO:

Manually test how do the operators react to changes in the kind: KopfPeering when re-created from the cluster to the namespaced scope — in different running modes (with/without --peering, with/without --namespace). Ensure that the operators actually behave as before and as expected.
Fallback to the old CRD KopfPeering if that CRD is cluster-scoped (but not if namespaced).
Document the upgrade scenario to code first, CRDs second, never vice versa.

Commented by nolar at 2019-05-15 15:30:50+00:00

So far, as tested with Minikube, the behaviour is this:

Baseline: Old code, old peerings (KopfPeering is cluster-scoped, ClusterKopfPeering does not exist):
- No CLI options: Auto-detection mode, cluster-wide peering was used.
- --namespace=default: Uses the cluster-wide peering.
- --peering=default: Uses the cluster-wide peering.
- --namespace=default --peering=default: Uses the cluster-wide peering.
Mainline: New code, old peerings (KopfPeering is cluster-scoped, ClusterKopfPeering does not exist):
- No CLI options: Auto-switches to standalone mode with a warning (mismatches❗️).
- --namespace=default: Uses the cluster-wide peering (as expected ✅ ).
- --peering=default: Fails with "The peering was not found (mismatches❗️).
- --namespace=default --peering=default: Uses the cluster-wide peering (as expected ✅ ).
Edgecase: Old code, new peerings (KopfPeering is namespaced, ClusterKopfPeering is cluster-scoped).
- Fails with "Namespace parameter required" (mismatches❗️).

Both cases of (new code + old peerings) fail because they try to use the ClusterKopfPeering CRD, which does not exist there — due to if namespace is None condition. They work fine if the CRD is the old one, KopfPeering.

All cases of (old code + new peerings) fail since they use the now-namespaced CRD (KopfPeering) with a cluster-scoped method (get_cluster_custom_object()).

This is not solvable, since it can only be done with a new version which supports the fallback to ClusterKopfPeering — and if one needs to upgrade, why not to upgrade to the new code with both old & new peerings supported; i.e. the transitional releases make no sense.

Commented by nolar at 2019-05-15 16:21:04+00:00

Rebased on master, and added the fallback scenario for the legacy mode (cluster-scoped KopfPeering). As tested manually, the new code works as expected both in the legacy mode (of the cluster) and in the new mode (as documented).

samurang87 aweller It can be reviewed and merged to master now.

[PR] Enable asyncio tests & mocks

A pull request by nolar at 2019-04-10 11:13:15+00:00
Original URL: zalando-incubator/kopf#26
Merged by nolar at 2019-04-15 11:47:05+00:00

Issue : #13 (needed for #25)

Since Kopf is mostly asyncio-based, there will be a lot of async/await & asyncio tests, which also do the asyncio mocks. They behave differently than the regular (sync) tests & mocks, so some little magic is needed to enable them.

Here, we configure the asyncio environment to transparently run all async def test-functions as the asyncio tests, and the mocker fixture to support the async-compatible CoroutineMock.

These changes do not belong to any other task-specific PR, as they are generic.

Consider pykube-ng?

An issue by nolar at 2019-04-02 09:30:39+00:00
Original URL: zalando-incubator/kopf#15

Originally by hjacobs :

I see that you are currently using the "official" Kubernetes client (Swagger Codegen).
I forked the old pykube to https://github.com/hjacobs/pykube as I'm rather unhappy about the complexity and size of the official Kubernetes client.
See also hjacobs/pykube#12
Not sure if this would even work or whether you use something specific of the Kubernetes Python client.

Commented by nolar at 2019-06-02 12:21:57+00:00

After #71, almost all Kubernetes-related code is consolidated in one package: kopf.k8s, where it is easy to be replaced by any other implementation. The only thing outside of this package is authentication (kopf.config.login()).

However, the whole codebase of Kopf assumes that the objects manipulated as dicts. Not even the Kubernetes client's "models". A lot of .get('metadata', {}).get('something') and .setdefault('status', {}).setdefault('kopf', {}) and similar lines are all around. It will be difficult to change that and to support both dicts & client's classes. It is better not to do so.

In addition, Kopf promises to hide implementation details from the user ⟹ which means that the user should not know which Kubernetes client is used under the hood ⟹ which means that the internal models/classes must not be exposed ⟹ which means they have to be converted to dicts on arrival.

Another tricky part will be watch-streaming. The official Kubernetes client does that in the while True cycle, and reconnects all the time. Pykube-ng exits after the first disconnection, which happens roughly ~5s, or as specified by a timeout query arg. The connection cannot stay forever, so either the client, or Kopf should handle the reconnections. Some partial workarounds can be found in #96 (pre-listing and resourceVersion usage).

Commented by nolar at 2019-06-02 19:48:55+00:00

So far, a list of issues detected while trying to switch to pykube:

pykube.HTTPClient object is needed on every call, there is no implicit config (as in the official client). Has to be stored globally and reused all over the code.
pykube.HTTPClient has a default timeout of 10s for any connection, including the watching. Can be overridden explicitly with timeout=None, but requires a separate pykube.HTTPClient instance for that.
pykube.HTTPClient raises exceptions from requests on timeouts, not its own.
Watch-call terminates after the connection is lost for any reason, no internal reconnection or while True. Has to be caught and repeated. In the official K8s client, it is done internally: the watch is eternal.
object_factory() prints a list of discovered resources to stdout, this is a visual garbage.
object_factory() assumes that the resource always exists, and fails on resource['namespaced'] when resource is None.
object_factory() requires a kind, and not plural; would be better if plural, singular, kind, and all aliases are accepted.
apiextensions.k8s.io/v1beta1/customresourcedefinitions does not exist in the cluster in a listing of resources, though is accessible. Pykube should assume that the developer knows what they are doing, and create the classes properly (but: only with plural name).
Patching is implemented as obj.update(), where the whole body of the object is used for a patch. And this involves the resourceVersion checking for non-conflicting patches. We need the partial patches on status field only (or finalizers, or annotations), not on the whole body. And we need no conflict resolution.

On a good side:

It was able to handle a custom resource KopfExample and a built-in resource Pod via the same code, no {resource=>classes+methods} mapping was needed. Both event-spy-handler and regular cause-handlers worked on pods.

Basically, the whole trick is achieved by this snippet (undoable in the official K8s client library):

        version = kwargs.pop("version", "v1")
        if version == "v1":
            base = kwargs.pop("base", "/api")
        elif "/" in version:
            base = kwargs.pop("base", "/apis")
        else:

This alone justifies the effort to continue switching.

Preview branch: https://github.com/nolar/kopf/tree/pykube (based on "resume-handlers" not yet merged branch).
Diff: wip/master/20190606...pykube

Commented by nolar at 2019-06-11 17:42:41+00:00

So far so good. The switch to pykube-ng is now fully implemented. The legacy kubernetes official client library is supported optionally (if installed) for auto-authentication, but is not used anywhere else — and I consider removing it completely.

The missing pykube-ng's parts are simulated inside kopf.k8s.classes (e.g. obj.patch() method), and eventually should move into pykube-ng itself.

The codebase seems functional. And clean. Arbitrary k8s resources (custom and builtin) are supported transparently, as it was prototyped above. The k8s-events are sent, all is fine.

What is left: all preceding PRs, on which it is based (all are pending for a review); and some cleanup in general plus the remaining TODOs marks (to be sure nothing is forgotten); and maybe a test-drive for few days in our testing infrastructure with real tasks.

Diff (still the same): wip/master/20190606...pykube — The diff is huge mostly because of tests (massive changes).

Configurable/optional finalizers

An issue by nolar at 2019-04-06 07:10:32+00:00
Original URL: zalando-incubator/kopf#24

Actual Behavior

Finalizers are marks that are stored as a list of arbitrary strings in metadata.finalizers. When the object is requested to be deleted, its deletion timestamp is set only. Until such marks exist on the object, it will not be deleted, and the deletion request will wait until the marks are removed by the operators/controllers (which should react to the deletion timestamp appearance). Only when all the finalizers are removed, the object is actually deleted.

Currently, when a resource is handled by the operator, the finalizers are always added on the object on its first appearance (before the handlers), and remove when it is marked for deletion (after the handlers).

If such an operator is stopped, the objects cannot be deleted, and the deletion command freezes — while there is no actual need to wait and to notify the operator (it will do nothing).

Expected Behavior

The finalisers should be optional. If there are no deletion handlers, the finalizers are not needed. If there are deletion handlers, the finalizers should be added.

Some deletion handlers can be explicitly marked as optional, thus ignored for the decision whether the finalizers are needed. The default assumption is that if the deletion handler exists, it is required.

@kopf.on.delete('', 'v1', 'pods', optional=True)
def pod_deleted(**_):
    pass

Two special cases:

If the object was created when there were no deletion handlers, and the finalizers were not added, but then a new operator version is started with the deletion handlers — the finalizers must be auto-added.
If the object was created when there were some deletion handlers, and the finalizers were added, but then a new operator version is started with no deletion handlers — the finalizers must be auto-removed.

Commented by dlmiddlecote at 2019-06-02 20:06:52+00:00

Hey nolar

I’d like to have a go at this issue, I should have some time over the coming week.

I have a preliminary question, if that’s ok.
In kopf.reactor.causation.detect_cause the finalizers are used to check whether the event is NEW or not. I think by not adding the finalizers sometimes we will break that check, and so we’ll probably need some other way to do the check.

Do you have any ideas what we could do? Or do you think things will still work fine?

Commented by nolar at 2019-06-04 07:24:53+00:00

dlmiddlecote Okay, I will find a way to assign it to you (it didn't work that easily as I thought).

Regarding the solution, there is one important criterion: The finalizers are mandatory if the operator contains the deletion handlers, and they are not optional. The reasons are simple: if the finalizers are absent, then

(1) the deletion handlers can be called when the object is already gone, thus breaking the handler's logic (e.g. 404 errors), and the after-handling patching (also 404);

(2) the deletion handlers are not guaranteed to execute — the finalizers block the object until all the handlers have succeeded (unlike the @kopf.on.event handlers, which are executed as "now or never").

I see two ways of solving it:

A: Signal to detect_cause() that you are fine with no NEW cause, and then detect it as CREATE — naturally happens if if ...: return NEW block is skipped. The signal comes based on registry.has_mandatory_deletion_handlers() or registry.has_blocking_handlers() (to be made; the names are just for example), similar to registry.has_cause_handlers() (exists and used) — somewhere in kopf.reactor.handling.custom_object_handler().

B: Always detect the NEW cause as it does now, but then, depending on registry.has_deletion_handlers(), treat it as CREATE. This implies that the NEW causes will trigger the creation handlers.

To my personal belief, the way A is more clean, and easier to test (there are no different reactions to the same cause depending on external factors, as the cause already represents those external factors (conceptually)).

In both cases, the unit-tests must be updated accordingly.

Commented by nolar at 2019-06-04 13:30:50+00:00

dlmiddlecote Okay, I've assigned it to myself to just have it assigned (not free). Consider this one as assigned to you — until I find a way to do this properly. Seems this feature is absent in GitHub.

Commented by dlmiddlecote at 2019-07-04 21:41:46+00:00

nolar - should this be closed?

Namespace-scoped operators

An issue by nolar at 2019-04-21 19:12:41+00:00
Original URL: zalando-incubator/kopf#32

Current Behaviour

Currently, --namespace= option is used only to limit the watch-event to the objects belonging to that namespace, and ignoring objects in other namespaces.

However, the framework continues to use the cluster-wide watching calls, and cluster-wide peering objects. The peering objects are now cluster-scoped by definition (in CRD), and there are no namespace-scoped peering objects at all.

Expected behaviour

As an operator developer, if I provide --namespace=something, I expect that the operator limits all its activities to that namespace only, and does not even request for the cluster-wide objects/queries — as they can be e.g. restricted by the permissions.

If I provide --namespace=something --peering=somepeering, I expect that the namespace-scoped peering object kind is used, not the cluster-scoped one.

Use-cases

The intended use-case 1: If an operator is a part of the application's deployment, and there are few instances of the same application deployed with different versions, but isolated by the namespaces. As a particular example: running a production, staging, and maybe experimental operators of the same application in different namespaces.

The intended use-case 2: Running in a cluster with strict RBAC rules, with no access to the cluster objects, restricted to one namespace only.

Steps to Reproduce the Problem

Create a RBAC service account with only the namespace permissions.
Deploy any of the example operators with --namespace=default (or any other namespace).
Observe how it fails on api.list_cluster_custom_object() (in queueing.py/watching.py).
- for kopfexamples
- for kopfpeerings

Acceptance Criteria

Silent handlers (spies)

An issue by nolar at 2019-04-18 14:56:04+00:00
Original URL: zalando-incubator/kopf#30

Currently, handlers are fully orchestrated by the framework, and their progress is stored in the object's status.kopf field. This limits the handlers only to the objects designed specially for this operator.

Sometimes, it is needed to watch after objects of different kind, or maybe even the built-in objects, such as pods, jobs, so on — e.g. created as the children of the main object. Putting the Kopf's progress field on these objects causes few problems:

Unnecessary changes & watch-events on those objects, as noticed by other controllers and operators.
Multiple Kopf-based operators will conflict with each other, as they use the same status subfield (see #23).

Kopf should provide a way to watch for the built-in objects silently:

import kopf

@kopf.on.event('', 'v1', 'pods')
def pod_changed(body, **kwargs):
    pass

This induced few limitations:

Progress is reported only in the logs, maybe in the k8s-events, but not in the status fields.
If the handler fails, there will be no retries as with the normal handlers. It can even miss the actual change and needed reaction in that case until the next event happens.
No cause detection will be provided (i.e. field diffs detection), only the raw events as they are sent by k8s.

This functionality is already present in the Kopf's reactor (the stream between the queueing & handling modules), so it makes sense to expose it as a feature.

Also, once done, add the missing docs for the sample problem: it should track when and if the PVCs are bound and finally unbound — so that the tutorial is indeed full and complete.

Silent handlers implemented.
Tests.
Docs on the feature in the "Handlers" section.
Tutorial extended for the PVC monitoring for being bound, and activating the deletion afterwards.

[PR] Adapt the README for PyPI

A pull request by nolar at 2019-04-23 11:02:42+00:00
Original URL: zalando-incubator/kopf#41
Merged by nolar at 2019-04-25 09:22:11+00:00

Issue : #12

PyPI now renders the README (long_description) improperly. It needs some hinting on the content type.

The links must be absolute, as they are shown not only in GitHub anymore.

The "Documentation" link on PyPI was pointing to README, but we now have the full docs.

nolar / kopf Goto Github PK

kopf's Introduction

Kubernetes Operator Pythonic Framework (Kopf)

Documentation

Features

Examples

Usage

Contributing

Versioning

License

Acknowledgments

kopf's People

Contributors

Stargazers

Watchers

Forkers

kopf's Issues

Actual Behavior

Expected Behavior

Expected Behavior

Actual Behavior

Steps to Reproduce the Problem

Current Behaviour

Expected Behaviour

Todos:

Actual Behavior

Expected Behaviour

Steps to Reproduce the Problem

Specifications

Expected Behavior

Actual Behavior

Expected Behavior

Actual Behavior

Actual Behaviour

Expected Behaviour

Steps to Reproduce the Problem

Specifications

Expected Behavior

Actual Behavior

Steps to Reproduce the Problem

One-line summary

Description

Types of Changes

Background

Suggestion: RBAC verification

Suggestion: RBAC generation

Extra: children objects introspection

Acceptance Criteria

Expected Behavior

Actual Behavior

Actual Behavior

Expected Behavior

Current Behaviour

Expected behaviour

Use-cases

Steps to Reproduce the Problem

Acceptance Criteria

Recommend Projects

Recommend Topics

Recommend Org