
opa-operator's People

Contributors

adwk67, backstreetkiwi, bors[bot], dependabot[bot], fhennig, lfrancke, maltesander, nightkr, razvan, renovate-bot, renovate[bot], sbernauer, soenkeliebau, stackable-bot


Forkers

pipern, tomverelst

opa-operator's Issues

RUSTSEC-2020-0159: Potential segfault in `localtime_r` invocations

Potential segfault in localtime_r invocations

Details
Package: chrono
Version: 0.4.19
URL: chronotope/chrono#499
Date: 2020-11-10

Impact

Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.

Workarounds

No workarounds are known.

References

See advisory page for additional details.

Research: Do we need resource limits for the bundle builder container if we use config maps/pvcs?

PR #347 implements resource limits and requests for the opa container. The second container, opa-bundle-builder, currently does not have any limits.

The opa-bundle-builder basically reads all provided OPA Rego ConfigMaps and puts their content into a tar bundle. According to the OPA docs, this bundle can grow quite large.

Currently, the size of the data in ConfigMaps cannot exceed 1MB (etcd limit).

This can become a problem if there are many ConfigMaps with rules / data.
The OpaBundleBuilder uses the tar crate, whose documentation states that not all of the content has to be held in memory at once (see the sketch below).

If rules are adapted dynamically, repeatedly repacking the bundle.tar.gz could also put quite a load on the CPU (though we may never reach that many / that big rules).
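
The tar crate supports exactly this streaming style. A minimal sketch (rule path and bundle name are hypothetical, and using flate2 for gzip compression is an assumption about how the bundle is compressed):

use flate2::{write::GzEncoder, Compression};
use std::fs::File;
use tar::Builder;

fn build_bundle() -> std::io::Result<()> {
    // Stream the archive straight into a gzip encoder writing to disk.
    let gz = GzEncoder::new(File::create("bundle.tar.gz")?, Compression::default());
    let mut builder = Builder::new(gz);
    // append_path copies the file's bytes through a small internal buffer,
    // so the rule contents never have to be fully resident in memory.
    builder.append_path("rules/example.rego")?;
    // Finish the tar archive, then flush the gzip stream.
    builder.into_inner()?.finish()?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    build_bundle()
}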

This is done when:

  • The tar crate is tested for memory consumption
  • The tar crate is tested for CPU utilization
  • We have some tests / estimates of whether we need more resources than the default ones for the opa-bundle-builder (e.g. with 1,000 to 10,000 ConfigMaps of ~1 MB each)

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: Cannot find preset's package (github>whitesource/merge-confidence:beta)

Decision logging currently disabled

In the OPA server config.yaml we can enable logging of decisions to the console (see https://www.openpolicyagent.org/docs/latest/configuration/#decision-logs), which would in turn be picked up by Vector.

services:
  - name: stackable
    url: http://localhost:3030/opa/v1

bundles:
  stackable:
    service: stackable
    resource: opa/bundle.tar.gz
    persist: true
    polling:
      min_delay_seconds: 10
      max_delay_seconds: 20

decision_logs:
    console: true

I tried this when implementing logging for OPA. The problem is that it leads to a JSON response like the following when querying the data API:

{"decision_id": "123123123",  "result": true}

Our self-written Java OPA authorizers (Druid, Trino) are not able to deserialize that response via Jackson, since they only expect the result field to be set.

If we want decision logging to be enabled, we need to touch all opa-authorizer versions and make the Response object more robust (the response can contain even more fields if required).

See
stackabletech/druid-opa-authorizer#72
stackabletech/trino-opa-authorizer#24
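
On the Java side the fix amounts to tolerating unknown fields (e.g. Jackson's @JsonIgnoreProperties(ignoreUnknown = true)). For illustration, here is the same forward-compatible pattern as a minimal Rust sketch, assuming serde with its derive feature plus serde_json (type name hypothetical); serde skips unknown fields such as decision_id by default:

use serde::Deserialize;

#[derive(Deserialize)]
struct OpaResponse {
    // Only the field we rely on; unknown fields like decision_id are skipped.
    result: bool,
}

fn main() {
    let body = r#"{"decision_id": "123123123", "result": true}"#;
    let resp: OpaResponse = serde_json::from_str(body).expect("valid response");
    println!("allowed: {}", resp.result);
}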

RUSTSEC-2020-0071: Potential segfault in the time crate

Potential segfault in the time crate

Details
Package: time
Version: 0.1.44
URL: time-rs/time#293
Date: 2020-11-18
Patched versions: >=0.2.23
Unaffected versions: 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6

Impact

Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.

The affected functions from time 0.2.7 through 0.2.22 are:

  • time::UtcOffset::local_offset_at
  • time::UtcOffset::try_local_offset_at
  • time::UtcOffset::current_local_offset
  • time::UtcOffset::try_current_local_offset
  • time::OffsetDateTime::now_local
  • time::OffsetDateTime::try_now_local

The affected functions in time 0.1 (all versions) are:

  • at
  • at_utc

Non-Unix targets (including Windows and wasm) are unaffected.

Patches

Pending a proper fix, the internal method that determines the local offset has been modified to always return None on the affected operating systems. This has the effect of returning an Err on the try_* methods and UTC on the non-try_* methods.

Users and library authors with time in their dependency tree should perform cargo update, which will pull in the updated, unaffected code.

Users of time 0.1 do not have a patch and should upgrade to an unaffected version: time 0.2.23 or greater, or the 0.3 series.

Workarounds

No workarounds are known.

References

time-rs/time#293

See advisory page for additional details.

Implement resource requests and limits for OPA pods

Part of this epic stackabletech/issues#241

Acceptance criteria

  • Resource requests and limits are configurable in the CRD using the common structs from operator-rs (see the sketch after this list)
  • Resource requests and limits are configured for Kubernetes pods
  • Resource requests and limits are configured in the product (e.g. "-Xmx" etc. for Java-based images)
  • Adapt/Add integration tests to specify and test correct amount of resources
  • Adapt/Add examples
  • Adapt documentation: New section in usage.adoc with product specific information and link to common shared resources concept
  • Optional: Use sensible defaults for each role (if reasonable and applicable) and document accordingly in usage.adoc
  • Code contains useful comments
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)
  • Helm chart can be installed and deployed operator works (or not applicable)
  • Feature Tracker has been updated
  • Followup tickets have been created if needed (e.g. to update demos)
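
As referenced in the first criterion, here is a sketch of what the resource configuration could look like in an OpaCluster spec. This assumes the common resource structs from operator-rs; role and field names may differ in the final implementation:

apiVersion: opa.stackable.tech/v1alpha1
kind: OpaCluster
metadata:
  name: simple-opa
spec:
  servers:
    roleGroups:
      default:
        config:
          resources:
            cpu:
              min: 250m
              max: "1"
            memory:
              limit: 512Mi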

[NEW VERSION] 0.45.x

Which new version of OpenPolicyAgent should we support?

0.45.0

Additional information

Changelog

Changes required

No breaking changes.

Implementation checklist

Please don't change anything in this list.
Not all of these steps are necessary for all versions.

  • Update the Docker image
  • Update documentation to include supported version(s)
  • Update operator to support the new version (if needed)
  • Update integration tests to test the new versions (in addition to, or replacing, old versions)
  • Update examples to use new versions

Refactor structure of operator

  • move server, operator & crd to rust
  • change paths in workspace toml
  • rename server directory
  • copy packaging folder to top level
  • rename binary crate in cargo.toml
  • add bin section to toml
  • add ../ to assets in toml
  • remove -server in assets
  • rename src/main.rs
  • add ../ to build.rs
  • add rust flags to rust.yml
  • copy 2 publish scripts over & change product name
  • copy build_rpm and copy_assets.py
  • rename packaging/debian/service to stackable-product-operator.service
  • change execstart -> remove -server
  • rename packaging/rpm/specs/ -> remove server from name
  • change execstart -> remove -server
  • rename packaging/rpm/...../.service -> remove -server
  • rename packaging/rpm/SOURCES/ -> remove server from subdir name
  • change -server to -binary in deny.toml
  • change first -server to -binary and remove second server in Dockerfile
  • change packaging/rpm/specs -> change env var for _name

[NEW VERSION] OPA 0.51.0

Which new version of OpenPolicyAgent should we support?

Please specify the version, version range or version numbers to support; please also add these to the issue title.

Additional information

If possible, provide a link to release notes/changelog

Changes required

Are there any upstream changes that we need to support?
e.g. new features, changed features, deprecated features etc.

Implementation checklist

Please don't change anything in this list.
Not all of these steps are necessary for all versions.

Bump operator-rs to 0.27.1

Update operator-rs to 0.27.1.

  • operator-rs bumped
  • operator has no dependency on kube or serde_yaml (use stackabletech/operator-rs#450 instead)
  • Use Fragment wherever possible
  • Orphaned resources mechanism is used
  • Use the parser for the k8s Quantity instead of parsing memory values yourself

Product image selection will be tracked later on by stackabletech/issues#305, but should be pretty easy compared to the changes in this issue.

Handle stale information/clean up stale resources

Currently our operators do not act on information removed from the CR in some/most/all cases.

One example:
The HBase operator has three roles (master, regionServer, restServer). If I create an HBase cluster CR with a restServer component and later remove it entirely (not just setting replicas to 0), our operator will not clean up the STS that belongs to this role.

This is done when all stale STSs (and other resources that are no longer needed) are cleaned up.

NOTE: This is part of an epic (stackabletech/issues#203) and might not apply to this operator. If that is the case, please comment on this issue and just close it. This issue was created as part of a special bulk creation operation.

Document supported config overrides

Most settings can already be overridden today, even though they are not exposed as fields in the CRD.
This functionality, however, is hidden and not documented.

Please document all of the files, CLI options and environment variables (if applicable) that can be overridden.

You can take a look at the Druid PR: stackabletech/druid-operator#154
But please feel free to improve on it if you find anything.

Support specifying a namespace to watch

Currently this operator watches resources in all namespaces.
I'd like this to be configurable so I can specify which namespace to watch.

This should be a clap argument (which can then be provided on the command line or via an env var) called --watch-namespace; a sketch follows at the end of this issue.
It is okay to only take a single namespace for now.

  • Implement the above description
  • Document the default behavior and the new parameter

See stackabletech/issues#162 for the overarching epic
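
A minimal sketch of the proposed argument, assuming clap's derive API with its env feature enabled (struct name hypothetical):

use clap::Parser;

#[derive(Parser)]
struct Opts {
    /// Namespace to watch; all namespaces are watched when unset.
    #[arg(long, env = "WATCH_NAMESPACE")]
    watch_namespace: Option<String>,
}

fn main() {
    match Opts::parse().watch_namespace {
        Some(ns) => println!("watching namespace {ns}"),
        None => println!("watching all namespaces"),
    }
}

Passing --watch-namespace prod or setting WATCH_NAMESPACE=prod would then both restrict the operator to a single namespace.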

`built` information is printed before CLI handling

This is a bug; the information should only be printed after the CLI handling is done.

We could argue for an additional CLI flag to also print this, but that's a different matter. The current implementation makes the printing of the CRDs useless, as the output includes the diagnostic information.

Document service discovery

As a user of services deployed by this operator, I'd like to know how to discover their connection details.

It's done when

  • documentation is available on all the objects (e.g. ConfigMaps) created by this operator and the circumstances under which they are created and
  • documentation on the contents of the created objects is available.

NOTE: This ticket is part of an epic and was autocreated for all our operators. It might not apply to this operator in particular; in that case, please comment and close.
stackabletech/documentation#86

CLI handling should print help message when CRD subcommand has no arguments

Currently, when one calls an operator with one of the crd subcommands but no actual parameter, the operator just starts.

We'd like to print the help message instead.

Currently:
stackable-operator crd restart -> no help message, operator starts
stackable-operator crd restart -p -> CRD is printed, operator exits

Intended:
stackable-operator crd restart -> help message is printed, operator exits
stackable-operator crd restart -p -> CRD is printed, operator exits
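
One way to get the intended behavior with clap's derive API is arg_required_else_help; a simplified sketch (the real CLI nests a further subcommand under crd, and all names here are hypothetical):

use clap::{Parser, Subcommand};

#[derive(Parser)]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// With arg_required_else_help, a bare `crd` prints this help text and
    /// exits instead of falling through to starting the operator.
    #[command(arg_required_else_help = true)]
    Crd {
        /// Print the CRD to stdout.
        #[arg(short, long)]
        print: bool,
    },
}

fn main() {
    match Cli::parse().command {
        Command::Crd { print } => {
            if print {
                println!("<CRD YAML>");
            }
        }
    }
}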

LDAP user attributes (groups) in OPA

As a user/admin, I want to make policy decisions based on user attributes, such as group membership, that I define in LDAP. I want to use these user attributes in my Rego rules in OPA.

This is done when

  • we have a solution design (ADR?) that solves the issue described above of getting group memberships from LDAP in a scalable way (i.e. not every OPA should contact LDAP every time to get all group information)
  • the solution should evaluate push vs. pull as well

(Note: This could be SSSD, but it's only one option)


Original below

A couple of points from our discussion:

  • We want the group handling inside OPA so that we do not have to split it across all products, so (1) is out.
  • We also do not want to make lots of LDAP requests constantly, so (2c) is also out.
  • We want to do either (2a) or (2b).

We are still figuring out how we will do that exactly.


Previous issue text for context:

As a user/admin, I want to make policy decisions based on user attributes, such as group membership, that I define in LDAP. I want to use these user attributes in my Rego rules in OPA.

OPA already has documentation on how to do this, there are 2 (4) variants:

  1. put the user attributes into the policy decision request
  2. OPA gets the attributes directly from LDAP
    a. periodically pull from LDAP bundle API
    b. push LDAP into OPA
    c. pull LDAP info on demand

Since not every product might support forwarding LDAP information to OPA, and since we probably have access to the LDAP server anyway, I'm inclined to go with option 2. Between 2a, 2b and 2c I am unsure. It's worth reading the linked docs for the detailed trade-offs; there's also a summary table at the end.
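
As an illustration of option 2a, the OPA config could poll a second bundle from a service that periodically packages LDAP group data. Everything below (service name, URL, polling intervals) is a hypothetical sketch following the bundle config shown earlier on this page:

services:
  - name: ldap-groups
    url: http://ldap-bundler:3031

bundles:
  ldap:
    service: ldap-groups
    resource: groups/bundle.tar.gz
    polling:
      min_delay_seconds: 60
      max_delay_seconds: 120

The group data would then sit in OPA's in-memory data document (under whatever roots the bundle defines), so policy evaluation needs no per-decision LDAP round trip.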

Some things to consider are:

  • The number of network requests and potential lag when chaining/forwarding/depending on other requests
  • Staleness of cached LDAP info
  • Size of the bundles being transferred and stored
  • Duplication across OPA instances

Documentation: Improve landing page & split usage page

part of: stackabletech/documentation#408

example of an improved landing page: https://docs.stackable.tech/home/stable/druid/index.html

The new landing page should feature:

  • meta tags and a meta description
  • an introduction of the tool
  • a section about getting started
  • a description and diagram of the resources it creates
  • a description of its dependencies or other operators that interact with it
  • a link to demos that use the operator
  • the versions (as already present)

To split the usage guide, turn it into a section of its own. For pages that also exist in Druid/other operators, try to keep the same ordering as in those operators.

RUSTSEC-2022-0048: xml-rs is Unmaintained

xml-rs is Unmaintained

Details
Status: unmaintained
Package: xml-rs
Version: 0.8.4
URL: https://github.com/netvl/xml-rs/issues
Date: 2022-01-26

xml-rs is an XML parser that has open issues around parsing, including integer overflows / panics, which may or may not be a problem with untrusted data.

Together with these open issues and its unmaintained status, xml-rs may or may not be suited to parsing untrusted data.

Alternatives

See advisory page for additional details.

OPA: Enable monitoring

The problem

The OPA service is currently not scraped by the monitoring service (Prometheus).

Proposed solution

Label OPA pods upon creation so that they are found by the monitoring service.
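
A sketch of the pod metadata the operator could set; the exact label key is an assumption (conventions such as prometheus.io/scrape are common, but the monitoring service's discovery rules determine what is actually required):

metadata:
  labels:
    prometheus.io/scrape: "true"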

Acceptance criteria

  • Docs are updated.
  • Testable in the test-dev-cluster
  • Assets are bundled in operator packages (deb+rpm)

Automatically migrate container name 23.1 -> 23.4

Affected version

23.4.0

Current and expected behavior

In 23.1 the bundle-builder container within the OPA DaemonSet was called opa-bundle-builder.
In #420 it was renamed to bundle-builder.
When upgrading an OPA cluster from 23.1 to 23.4, the opa-operator patches the DaemonSet, so we end up with both containers.
The bundle-builder container fails with:
thread 'main' panicked at 'error binding to 0.0.0.0:3030: error creating server listener: Address already in use (os error 98)', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/warp-0.3.3/src/server.rs:213:27

Possible solution

No response

Additional context

No response

Environment

No response

Would you like to work on fixing this bug?

yes

Convert Operator to K8S Architecture

These Acceptance Criteria need to be met:

  • Pods are changed to use Docker images instead of the current "packages"
  • hostNetwork is used (for now)
  • Volumes and VolumeMounts are added (for hostPath/local volumes)
  • command line templates are fixed

This depends on stackabletech/docker-images#6 which provides initial Docker images but might require further changes to the images.

Support offline mode

Current status

In the OPA operator repository we build two Docker images: the operator itself and the bundle builder.
The OPA operator then references the OPA product image (built from the docker-images repository) as well as the bundle builder (built from the operator repository).
Both end up in our docker.stackable.tech/stackable repository.

Problems

The OPA product image is configurable via the new ProductImage mechanism (#385).
The bundle builder image is still hardcoded to the Stackable Docker repository.

This will only work offline if the mirrored repository is named exactly like the Stackable repository.

Questions

  • Should we separate the OPA operator and the bundle builder?
  • How do we version the bundle builder? (currently versioned like the operator)
  • How do we get the bundle builder into the OPA image? (curl the executable, copy from a Docker image, ...)

Goal

The OPA operator should work offline and independently of the Stackable repository.

Solutions

1) Extract the bundle builder into its own repository (like the *-opa-authorizers)

The executable will be stored in the Nexus package repository and fetched via curl during the OPA product image Docker build.

  • The OPA operator and bundle builder do not run in the same pod but are coupled together in the operator repository; making a new repository would decouple this.
  • Different OPA operator and bundle builder versions become possible (the bundle builder will probably not change much).
  • Multi-arch concerns (more complex actions).
  • Extra repository, extra versioning required.
  • Changes to config settings in the operator / bundle builder may require new releases of both.

2) Keep the build structure as is and copy the bundle builder from its Docker image into the OPA product image

  • No actions required.
  • Probably no problems with multi-arch.
  • Circular dependencies in the release process, e.g. the OPA operator and bundle builder must be built first so they can be referenced in the OPA product image.

3) Extend the OPA CRD with an explicit product image for the bundle builder

  • Fast implementation and lightweight solution (the Docker image already exists and only needs to be mirrored for offline mode)
  • CRD becomes more complex [breaking]

Outcome

Decided on solution 1: extract the bundle builder from the operator and use it in the OPA image.

TODO

  • Create a new repo opa-bundle-builder and tag it
  • Adapt the OPA Docker image to use and build from the opa-bundle-builder tag
  • Create new tags for all OPA versions
  • Remove the hardcoded Stackable repository from the init container and bundle builder and replace it with the OPA image
  • Adapt unit tests if available
  • Adapt integration tests to the new OPA image tags
  • Adapt documentation to the new OPA image tags

Docs: Getting started guide

epic: stackabletech/documentation#237

Acceptance criteria:

  • a new module getting_started exists, with an index.adoc, an installation.adoc and a first_steps.adoc
  • The old installation.adoc is removed.
  • All the YAML and shell snippets are in the examples/code directory and can be executed as a script to test the documentation
  • Versions to install are templated using the template_docs.sh script and the templating_vars.yaml file.
  • Any example files that are incorporated into the getting started guide are removed from the examples directory (there they are not tested; with the new script they will be).

Document the BundleBuilder and Rego rule ConfigMaps

We currently do not explain the concept of the bundle builder (apart from two sentences in https://docs.stackable.tech/opa/stable/implementation-notes.html).
A Rego ConfigMap is shown in the getting started guide but not explained further. This should be documented better (in combination with the bundle builder).

Acceptance

  • Implementation Notes / Concepts are extended to explain the bundle builder
    • Extra container with a local webserver, reading Rego ConfigMaps and providing a bundle.tar requested by the OPA container
    • Pros (and cons):
      • Pro: Bundles can be dynamically extended / reduced by adding or removing Rego ConfigMaps
      • Potential con: Depending on the number of rules, compressing them can lead to high CPU consumption
  • Implementation Notes or Concepts are extended to explain the Rego ConfigMaps
    • Example Rego rule ConfigMaps are shown and explained (a sketch follows after this list)
    • How this can be extended / dynamically adapted
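
For reference, a sketch of such a Rego rule ConfigMap. The label key the bundle builder watches is an assumption here and should be taken from the actual implementation:

apiVersion: v1
kind: ConfigMap
metadata:
  name: simple-rego-rule
  labels:
    opa.stackable.tech/bundle: "true"
data:
  simple.rego: |
    package test

    allow := true

Adding or removing such ConfigMaps then dynamically grows or shrinks the bundle, which is exactly the behavior the docs should explain.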

RUSTSEC-2021-0139: ansi_term is Unmaintained

ansi_term is Unmaintained

Details
Status: unmaintained
Package: ansi_term
Version: 0.12.1
URL: ogham/rust-ansi-term#72
Date: 2021-08-18

The maintainer has advised that this crate is deprecated and will not receive any maintenance.

The crate does not seem to have many dependencies and may or may not be OK to use as-is.

The last release appears to have been three years ago.

Possible Alternative(s)

The list below has not been vetted in any way and may or may not contain alternatives.

See advisory page for additional details.

Operator does not respect "replicas"

I noticed that the operator ignores the "replicas" setting on a role group. That is unexpected. I know that it uses a DaemonSet and that the replicas setting therefore probably doesn't make sense, but maybe we should remove the setting in that case.

Not very high priority, though, I believe.

docs test unpredictable

The docs tests do not wait for an OPA update to test whether the ConfigMap was loaded. This should be fixed somehow. Ideally we would know once the bundle is loaded, but it looks like we cannot know that; in that case we'd need to use a "sleep".

RUSTSEC-2020-0036: failure is officially deprecated/unmaintained

failure is officially deprecated/unmaintained

Details
Status: unmaintained
Package: failure
Version: 0.1.8
URL: rust-lang-deprecated/failure#347
Date: 2020-05-02

The failure crate is officially end-of-life: it has been marked as deprecated
by the former maintainer, who has announced that there will be no updates or
maintenance work on it going forward.

The following are some suggested actively developed alternatives to switch to:

See advisory page for additional details.

RUSTSEC-2021-0124: Data race when sending and receiving after closing a `oneshot` channel

Data race when sending and receiving after closing a oneshot channel

Details
Package: tokio
Version: 0.1.22
URL: tokio-rs/tokio#4225
Date: 2021-11-16
Patched versions: >=1.8.4, <1.9.0 and >=1.13.1
Unaffected versions: <0.1.14

If a tokio::sync::oneshot channel is closed (via the oneshot::Receiver::close method), a data race may occur if the oneshot::Sender::send method is called while the corresponding oneshot::Receiver is awaited or has try_recv called on it.

When these methods are called concurrently on a closed channel, the two halves
of the channel can concurrently access a shared memory location, resulting in a
data race. This has been observed to cause memory corruption.

Note that the race only occurs when both halves of the channel are used
after the Receiver half has called close. Code where close is not used, or where the
Receiver is not awaited and try_recv is not called after calling close,
is not affected.

See tokio#4225 for more details.

See advisory page for additional details.

opa 0.40.0

Which new version of OpenPolicyAgent should we support?

v0.40.0

Additional information

https://github.com/open-policy-agent/opa/releases/tag/v0.40.0

Changes required

Are there any upstream changes that we need to support?
e.g. new features, changed features, deprecated features etc.

Implementation checklist

Please don't change anything in this list.
Not all of these steps are necessary for all versions.

  • Update the Docker image
  • Update documentation to include supported version(s)
  • Update operator to support the new version (if needed)
  • Update integration tests to test the new versions (in addition to, or replacing, old versions)
  • Update examples to use new versions
