
opa-operator's People

Contributors

adwk67, backstreetkiwi, bors[bot], dependabot[bot], fhennig, lfrancke, maltesander, nightkr, razvan, renovate-bot, renovate[bot], sbernauer, soenkeliebau, stackable-bot


Forkers

pipern, tomverelst

opa-operator's Issues

RUSTSEC-2020-0159: Potential segfault in `localtime_r` invocations

Potential segfault in localtime_r invocations

Details
Package: chrono
Version: 0.4.19
URL: chronotope/chrono#499
Date: 2020-11-10

Impact

Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.

Workarounds

No workarounds are known.

References

See advisory page for additional details.

Research: Do we need resource limits for the bundle builder container if we use config maps/pvcs?

PR #347 implements resource limits and requests for the opa container. The second container, opa-bundle-builder, currently does not have any limits.

The opa-bundle-builder basically reads all provided OPA Rego ConfigMaps and puts their content into a tar bundle. According to the OPA docs, this bundle can grow quite large.

Currently, the size of the data in ConfigMaps cannot exceed 1MB (etcd limit).

This can become a problem if there are many ConfigMaps with rules / data.
The OpaBundleBuilder uses the tar crate, whose documentation states that not all of the content has to be held in memory at once (see the sketch below).

If rules are adapted dynamically, repeatedly repacking the bundle.tar.gz could also put quite a load on the CPU (though we may never reach that many / that big rules).
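
The tar crate supports exactly this streaming style. A minimal sketch (rule path and bundle name are hypothetical, and using flate2 for gzip compression is an assumption about how the bundle is compressed):

use flate2::{write::GzEncoder, Compression};
use std::fs::File;
use tar::Builder;

fn build_bundle() -> std::io::Result<()> {
    // Stream the archive straight into a gzip encoder writing to disk.
    let gz = GzEncoder::new(File::create("bundle.tar.gz")?, Compression::default());
    let mut builder = Builder::new(gz);
    // append_path copies the file's bytes through a small internal buffer,
    // so the rule contents never have to be fully resident in memory.
    builder.append_path("rules/example.rego")?;
    // Finish the tar archive, then flush the gzip stream.
    builder.into_inner()?.finish()?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    build_bundle()
}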

This is done when:

  • The tar crate is tested for memory consumption
  • The tar crate is tested for CPU utilization
  • We have some tests / estimates of whether we need more resources than the default ones for the opa-bundle-builder (e.g. with 1,000 to 10,000 ConfigMaps of ~1 MB each)

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: Cannot find preset's package (github>whitesource/merge-confidence:beta)

Decision logging currently disabled

In the OPA server config.yaml we can enable logging of decisions to the console (see https://www.openpolicyagent.org/docs/latest/configuration/#decision-logs), which would in turn be picked up by Vector.

services:
  - name: stackable
    url: http://localhost:3030/opa/v1

bundles:
  stackable:
    service: stackable
    resource: opa/bundle.tar.gz
    persist: true
    polling:
      min_delay_seconds: 10
      max_delay_seconds: 20

decision_logs:
    console: true

I tried this when implementing logging for OPA. The problem is that it leads to a JSON response like the following when querying the data API:

{"decision_id": "123123123",  "result": true}

Our self-written Java OPA authorizers (Druid, Trino) are not able to deserialize that response via Jackson, since they only expect the result field to be set.

If we want decision logging to be enabled, we need to touch all opa-authorizer versions and make the Response object more robust (the response can contain even more fields if required).

See
stackabletech/druid-opa-authorizer#72
stackabletech/trino-opa-authorizer#24
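
On the Java side the fix amounts to tolerating unknown fields (e.g. Jackson's @JsonIgnoreProperties(ignoreUnknown = true)). For illustration, here is the same forward-compatible pattern as a minimal Rust sketch, assuming serde with its derive feature plus serde_json (type name hypothetical); serde skips unknown fields such as decision_id by default:

use serde::Deserialize;

#[derive(Deserialize)]
struct OpaResponse {
    // Only the field we rely on; unknown fields like decision_id are skipped.
    result: bool,
}

fn main() {
    let body = r#"{"decision_id": "123123123", "result": true}"#;
    let resp: OpaResponse = serde_json::from_str(body).expect("valid response");
    println!("allowed: {}", resp.result);
}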

RUSTSEC-2020-0071: Potential segfault in the time crate

Potential segfault in the time crate

Details
Package: time
Version: 0.1.44
URL: time-rs/time#293
Date: 2020-11-18
Patched versions: >=0.2.23
Unaffected versions: 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6

Impact

Unix-like operating systems may segfault due to dereferencing a dangling pointer in specific circumstances. This requires an environment variable to be set in a different thread than the affected functions. This may occur without the user's knowledge, notably in a third-party library.

The affected functions from time 0.2.7 through 0.2.22 are:

  • time::UtcOffset::local_offset_at
  • time::UtcOffset::try_local_offset_at
  • time::UtcOffset::current_local_offset
  • time::UtcOffset::try_current_local_offset
  • time::OffsetDateTime::now_local
  • time::OffsetDateTime::try_now_local

The affected functions in time 0.1 (all versions) are:

  • at
  • at_utc

Non-Unix targets (including Windows and wasm) are unaffected.

Patches

Pending a proper fix, the internal method that determines the local offset has been modified to always return None on the affected operating systems. This has the effect of returning an Err on the try_* methods and UTC on the non-try_* methods.

Users and library authors with time in their dependency tree should perform cargo update, which will pull in the updated, unaffected code.

Users of time 0.1 do not have a patch and should upgrade to an unaffected version: time 0.2.23 or greater, or the 0.3 series.

Workarounds

No workarounds are known.

References

time-rs/time#293

See advisory page for additional details.

Implement resource requests and limits for OPA pods

Part of this epic stackabletech/issues#241

Acceptance criteria

  • Resource requests and limits are configurable in the CRD using the common structs from operator-rs (see the sketch after this list)
  • Resource requests and limits are configured for Kubernetes pods
  • Resource requests and limits are configured in the product (e.g. "-Xmx" etc. for Java-based images)
  • Adapt/Add integration tests to specify and test correct amount of resources
  • Adapt/Add examples
  • Adapt documentation: New section in usage.adoc with product specific information and link to common shared resources concept
  • Optional: Use sensible defaults for each role (if reasonable and applicable) and document accordingly in usage.adoc
  • Code contains useful comments
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)
  • Helm chart can be installed and deployed operator works (or not applicable)
  • Feature Tracker has been updated
  • Followup tickets have been created if needed (e.g. to update demos)
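
As referenced in the first criterion, here is a sketch of what the resource configuration could look like in an OpaCluster spec. This assumes the common resource structs from operator-rs; role and field names may differ in the final implementation:

apiVersion: opa.stackable.tech/v1alpha1
kind: OpaCluster
metadata:
  name: simple-opa
spec:
  servers:
    roleGroups:
      default:
        config:
          resources:
            cpu:
              min: 250m
              max: "1"
            memory:
              limit: 512Mi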

[NEW VERSION] 0.45.x

Which new version of OpenPolicyAgent should we support?

0.45.0

Additional information

Changelog

Changes required

No breaking changes.

Implementation checklist

Please don't change anything in this list.
Not all of these steps are necessary for all versions.

  • Update the Docker image
  • Update documentation to include supported version(s)
  • Update operator to support the new version (if needed)
  • Update integration tests to test the new versions (in addition to, or replacing, old versions)
  • Update examples to use new versions

Refactor structure of operator

  • move server, operator & crd to rust
  • change paths in workspace toml
  • rename server directory
  • copy packaging folder to top level
  • rename binary crate in cargo.toml
  • add bin section to toml
  • add ../ to assets in toml
  • remove -server in assets
  • rename src/main.rs
  • add ../ to build.rs
  • add rust flags to rust.yml
  • copy 2 publish scripts over & change product name
  • copy build_rpm and copy_assets.py
  • rename packaging/debian/service to stackable-product-operator.service
  • change execstart -> remove -server
  • rename packaging/rpm/specs/ -> remove server from name
  • change execstart -> remove -server
  • rename packaging/rpm/...../.service -> remove -server
  • rename packaging/rpm/SOURCES/ -> remove server from subdir name
  • change -server to -binary in deny.toml
  • change first -server to -binary and remove second server in Dockerfile
  • change packaging/rpm/specs -> change env var for _name

[NEW VERSION] OPA 0.51.0

Which new version of OpenPolicyAgent should we support?

Please specify the version, version range or version numbers to support; please also add these to the issue title.

Additional information

If possible, provide a link to release notes/changelog

Changes required

Are there any upstream changes that we need to support?
e.g. new features, changed features, deprecated features etc.

Implementation checklist

Please don't change anything in this list.
Not all of these steps are necessary for all versions.

Bump operator-rs to 0.27.1

Update operator-rs to 0.27.1.

  • operator-rs bumped
  • operator has no dependency on kube or serde_yaml (use stackabletech/operator-rs#450 instead)
  • Use Fragment wherever possible
  • Orphaned resources mechanism is used
  • Use the parser for the k8s Quantity instead of parsing memory values yourself

Product image selection will be tracked later on by stackabletech/issues#305, but should be pretty easy compared to the changes in this issue.

Handle stale information/clean up stale resources

Currently our operators do not act on information removed from the CR in some/most/all cases.

One example:
The HBase operator has three roles (master, regionServer, restServer). If I create an HBase cluster CR with a restServer component and later remove it entirely (not just setting replicas to 0), our operator will not clean up the STS that belongs to this role.

This is done when all stale STSs (and other resources that are no longer needed) are cleaned up.

NOTE: This is part of an epic (stackabletech/issues#203) and might not apply to this operator. If that is the case, please comment on this issue and just close it. This issue was created as part of a special bulk creation operation.

Document supported config overrides

Most settings can already be overridden today, even though they are not exposed as fields in the CRD.
This functionality, however, is hidden and not documented.

Please document all of the files, CLI options and environment variables (if applicable) that can be overridden.

You can take a look at the Druid PR: stackabletech/druid-operator#154
But please feel free to improve on it if you find anything.

Support specifying a namespace to watch

Currently this operator watches resources in all namespaces.
I'd like this to be configurable so I can specify which namespace to watch.

This should be a clap argument (which can then be provided on the command line or via an env var) called --watch-namespace; a sketch follows at the end of this issue.
It is okay to only take a single namespace for now.

  • Implement the above description
  • Document the default behavior and the new parameter

See stackabletech/issues#162 for the overarching epic
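
A minimal sketch of the proposed argument, assuming clap's derive API with its env feature enabled (struct name hypothetical):

use clap::Parser;

#[derive(Parser)]
struct Opts {
    /// Namespace to watch; all namespaces are watched when unset.
    #[arg(long, env = "WATCH_NAMESPACE")]
    watch_namespace: Option<String>,
}

fn main() {
    match Opts::parse().watch_namespace {
        Some(ns) => println!("watching namespace {ns}"),
        None => println!("watching all namespaces"),
    }
}

Passing --watch-namespace prod or setting WATCH_NAMESPACE=prod would then both restrict the operator to a single namespace.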

`built` information is printed before CLI handling

This is a bug; the information should only be printed after the CLI handling is done.

We could argue for an additional CLI flag to also print this, but that's a different matter. The current implementation makes the printing of the CRDs useless, as the output includes the diagnostic information.

Document service discovery

As a user of services deployed by this operator, I'd like to know how to discover their connection details.

It's done when

  • documentation is available on all the objects (e.g. ConfigMaps) created by this operator and the circumstances under which they are created and
  • documentation on the contents of the created objects is available.

NOTE: This ticket is part of an epic and was autocreated for all our operators. It might not apply to this operator in particular; in that case, please comment and close.
stackabletech/documentation#86

CLI handling should print help message when CRD subcommand has no arguments

Currently, when one calls an operator with one of the crd subcommands but no actual parameter, the operator just starts.

We'd like to print the help message instead.

Currently:
stackable-operator crd restart -> no help message, operator starts
stackable-operator crd restart -p -> CRD is printed, operator exits

Intended:
stackable-operator crd restart -> help message is printed, operator exits
stackable-operator crd restart -p -> CRD is printed, operator exits
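
One way to get the intended behavior with clap's derive API is arg_required_else_help; a simplified sketch (the real CLI nests a further subcommand under crd, and all names here are hypothetical):

use clap::{Parser, Subcommand};

#[derive(Parser)]
struct Cli {
    #[command(subcommand)]
    command: Command,
}

#[derive(Subcommand)]
enum Command {
    /// With arg_required_else_help, a bare `crd` prints this help text and
    /// exits instead of falling through to starting the operator.
    #[command(arg_required_else_help = true)]
    Crd {
        /// Print the CRD to stdout.
        #[arg(short, long)]
        print: bool,
    },
}

fn main() {
    match Cli::parse().command {
        Command::Crd { print } => {
            if print {
                println!("<CRD YAML>");
            }
        }
    }
}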

LDAP user attributes (groups) in OPA

As a user/admin, I want to make policy decisions based on user attributes, such as group membership, that I define in LDAP. I want to use these user attributes in my Rego rules in OPA.

This is done when

  • we have a solution design (ADR?) that solves the issue described above of getting group memberships from LDAP in a scalable way (i.e. not every OPA should contact LDAP every time to get all group information)
  • the solution should evaluate push vs. pull as well

(Note: This could be SSSD, but it's only one option)


Original below

A couple of points from our discussion:

  • We want the group handling inside OPA so that we do not have to split it across all products, so (1) is out.
  • We also do not want to make lots of LDAP requests constantly, so (2c) is also out.
  • We want to do either (2a) or (2b).

We are still figuring out how we will do that exactly.


Previous issue text for context:

As a user/admin, I want to make policy decisions based on user attributes, such as group membership, that I define in LDAP. I want to use these user attributes in my Rego rules in OPA.

OPA already has documentation on how to do this, there are 2 (4) variants:

  1. put the user attributes into the policy decision request
  2. OPA gets the attributes directly from LDAP
    a. periodically pull from LDAP bundle API
    b. push LDAP into OPA
    c. pull LDAP info on demand

Since not every product might support forwarding LDAP information to OPA, and since we probably have access to the LDAP server anyway, I'm inclined to go with option 2. Between 2a, 2b and 2c I am unsure. It's worth reading the linked docs for the detailed trade-offs; there's also a summary table at the end.
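
As an illustration of option 2a, the OPA config could poll a second bundle from a service that periodically packages LDAP group data. Everything below (service name, URL, polling intervals) is a hypothetical sketch following the bundle config shown earlier on this page:

services:
  - name: ldap-groups
    url: http://ldap-bundler:3031

bundles:
  ldap:
    service: ldap-groups
    resource: groups/bundle.tar.gz
    polling:
      min_delay_seconds: 60
      max_delay_seconds: 120

The group data would then sit in OPA's in-memory data document (under whatever roots the bundle defines), so policy evaluation needs no per-decision LDAP round trip.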

Some things to consider are:

  • The number of network requests and potential lag when chaining/forwarding/depending on other requests
  • Staleness of cached LDAP info
  • Size of the bundles being transferred and stored
  • Duplication across OPA instances

Documentation: Improve landing page & split usage page

part of: stackabletech/documentation#408

example of an improved landing page: https://docs.stackable.tech/home/stable/druid/index.html

The new landing page should feature:

  • meta tags and a meta description
  • an introduction of the tool
  • a section about getting started
  • a description and diagram of the resources it creates
  • a description of its dependencies or other operators that interact with it
  • a link to demos that use the operator
  • the versions (as already present)

To split the usage guide, turn it into a section of its own. For pages that also exist in Druid/other operators, try to keep the same ordering as in those operators.

RUSTSEC-2022-0048: xml-rs is Unmaintained

xml-rs is Unmaintained

Details
Status: unmaintained
Package: xml-rs
Version: 0.8.4
URL: https://github.com/netvl/xml-rs/issues
Date: 2022-01-26

xml-rs is an XML parser that has open issues around parsing, including integer overflows / panics, which may or may not be a problem with untrusted data.

Together with these open issues and its unmaintained status, xml-rs may or may not be suited to parsing untrusted data.

Alternatives

See advisory page for additional details.

OPA: Enable monitoring

The problem

The OPA service is currently not scraped by the monitoring service (Prometheus).

Proposed solution

Label OPA pods upon creation so that they are found by the monitoring service.
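
A sketch of the pod metadata the operator could set; the exact label key is an assumption (conventions such as prometheus.io/scrape are common, but the monitoring service's discovery rules determine what is actually required):

metadata:
  labels:
    prometheus.io/scrape: "true"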

Acceptance criteria

  • Docs are updated.
  • Testable in the test-dev-cluster
  • Assets are bundled in operator packages (deb+rpm)

Automatically migrate container name 23.1 -> 23.4

Affected version

23.4.0

Current and expected behavior

In 23.1 the bundle-builder container within the OPA DaemonSet was called opa-bundle-builder.
In #420 it was renamed to bundle-builder.
When upgrading an OPA cluster from 23.1 to 23.4, the opa-operator patches the DaemonSet, so we end up with both containers.
The bundle-builder container fails with:
thread 'main' panicked at 'error binding to 0.0.0.0:3030: error creating server listener: Address already in use (os error 98)', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/warp-0.3.3/src/server.rs:213:27

Possible solution

No response

Additional context

No response

Environment

No response

Would you like to work on fixing this bug?

yes

Convert Operator to K8S Architecture

These Acceptance Criteria need to be met:

  • Pods are changed to use Docker images instead of the current "packages"
  • hostNetwork is used (for now)
  • Volumes and VolumeMounts are added (for hostPath/local volumes)
  • command line templates are fixed

This depends on stackabletech/docker-images#6 which provides initial Docker images but might require further changes to the images.

Support offline mode

Current status

In the OPA operator repository we build two Docker images: the operator itself and the bundle builder.
The OPA operator then references the OPA product image (built from the docker-images repository) as well as the bundle builder (built from the operator repository).
Both end up in our docker.stackable.tech/stackable repository.

Problems

The OPA product image is configurable via the new ProductImage mechanism (#385).
The bundle builder image is still hardcoded to the Stackable Docker repository.

This will only work offline if the mirrored repository is named exactly like the Stackable repository.

Questions

  • Should we separate the OPA operator and the bundle builder?
  • How do we version the bundle builder? (currently versioned like the operator)
  • How do we get the bundle builder into the OPA image? (curl the executable, copy from a Docker image, ...)

Goal

The OPA operator should work offline and independently of the Stackable repository.

Solutions

1) Extract the bundle builder into its own repository (like the *-opa-authorizers)

The executable will be stored in the Nexus package repository and fetched via curl during the OPA product image Docker build.

  • The OPA operator and bundle builder do not run in the same pod but are coupled together in the operator repository; making a new repository would decouple this.
  • Different OPA operator and bundle builder versions become possible (the bundle builder will probably not change much).
  • Multi-arch concerns (more complex actions).
  • Extra repository, extra versioning required.
  • Changes to config settings in the operator / bundle builder may require new releases of both.

2) Keep the build structure as is and copy the bundle builder from its Docker image into the OPA product image

  • No actions required.
  • Probably no problems with multi-arch.
  • Circular dependencies in the release process, e.g. the OPA operator and bundle builder must be built first so they can be referenced in the OPA product image.

3) Extend the OPA CRD with an explicit product image for the bundle builder

  • Fast implementation and lightweight solution (the Docker image already exists and only needs to be mirrored for offline mode)
  • CRD becomes more complex [breaking]

Outcome

Decided on solution 1: extract the bundle builder from the operator and use it in the OPA image.

TODO

  • Create a new repo opa-bundle-builder and tag it
  • Adapt the OPA Docker image to use and build from the opa-bundle-builder tag
  • Create new tags for all OPA versions
  • Remove the hardcoded Stackable repository from the init container and bundle builder and replace it with the OPA image
  • Adapt unit tests if available
  • Adapt integration tests to the new OPA image tags
  • Adapt documentation to the new OPA image tags

Docs: Getting started guide

epic: stackabletech/documentation#237

Acceptance criteria:

  • a new module getting_started exists, with an index.adoc, an installation.adoc and a first_steps.adoc
  • The old installation.adoc is removed.
  • All the YAML and shell snippets are in the examples/code directory and can be executed as a script to test the documentation
  • Versions to install are templated using the template_docs.sh script and the templating_vars.yaml file.
  • Any example files that are incorporated into the getting started guide are removed from the examples directory (there they are not tested; with the new script they will be).

Document the BundleBuilder and Rego rule ConfigMaps

We currently do not explain the concept of the bundle builder (apart from two sentences in https://docs.stackable.tech/opa/stable/implementation-notes.html).
A Rego ConfigMap is shown in the getting started guide but not explained further. This should be documented better (in combination with the bundle builder).

Acceptance

  • Implementation Notes / Concepts are extended to explain the bundle builder
    • Extra container with a local webserver, reading Rego ConfigMaps and providing a bundle.tar requested by the OPA container
    • Pros (and cons):
      • Pro: Bundles can be dynamically extended / reduced by adding or removing Rego ConfigMaps
      • Potential con: Depending on the number of rules, compressing them can lead to high CPU consumption
  • Implementation Notes or Concepts are extended to explain the Rego ConfigMaps
    • Example Rego rule ConfigMaps are shown and explained (a sketch follows after this list)
    • How this can be extended / dynamically adapted
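
For reference, a sketch of such a Rego rule ConfigMap. The label key the bundle builder watches is an assumption here and should be taken from the actual implementation:

apiVersion: v1
kind: ConfigMap
metadata:
  name: simple-rego-rule
  labels:
    opa.stackable.tech/bundle: "true"
data:
  simple.rego: |
    package test

    allow := true

Adding or removing such ConfigMaps then dynamically grows or shrinks the bundle, which is exactly the behavior the docs should explain.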

RUSTSEC-2021-0139: ansi_term is Unmaintained

ansi_term is Unmaintained

Details
Status: unmaintained
Package: ansi_term
Version: 0.12.1
URL: ogham/rust-ansi-term#72
Date: 2021-08-18

The maintainer has advised that this crate is deprecated and will not receive any maintenance.

The crate does not seem to have many dependencies and may or may not be OK to use as-is.

The last release appears to have been three years ago.

Possible Alternative(s)

The list below has not been vetted in any way and may or may not contain alternatives.

See advisory page for additional details.

Operator does not respect "replicas"

I noticed that the operator ignores the "replicas" setting on a role group. That is unexpected. I know that it uses a DaemonSet and that the replicas setting therefore probably doesn't make sense, but maybe we should remove the setting in that case.

Not very high priority, though, I believe.

docs test unpredictable

The docs tests do not wait for an OPA update to test whether the ConfigMap was loaded. This should be fixed somehow. Ideally we would know once the bundle is loaded, but it looks like we cannot know that; in that case we'd need to use a "sleep".

RUSTSEC-2020-0036: failure is officially deprecated/unmaintained

failure is officially deprecated/unmaintained

Details
Status: unmaintained
Package: failure
Version: 0.1.8
URL: rust-lang-deprecated/failure#347
Date: 2020-05-02

The failure crate is officially end-of-life: it has been marked as deprecated
by the former maintainer, who has announced that there will be no updates or
maintenance work on it going forward.

The following are some suggested actively developed alternatives to switch to:

See advisory page for additional details.

RUSTSEC-2021-0124: Data race when sending and receiving after closing a `oneshot` channel

Data race when sending and receiving after closing a oneshot channel

Details
Package: tokio
Version: 0.1.22
URL: tokio-rs/tokio#4225
Date: 2021-11-16
Patched versions: >=1.8.4, <1.9.0 and >=1.13.1
Unaffected versions: <0.1.14

If a tokio::sync::oneshot channel is closed (via the oneshot::Receiver::close method), a data race may occur if the oneshot::Sender::send method is called while the corresponding oneshot::Receiver is awaited or has try_recv called on it.

When these methods are called concurrently on a closed channel, the two halves
of the channel can concurrently access a shared memory location, resulting in a
data race. This has been observed to cause memory corruption.

Note that the race only occurs when both halves of the channel are used
after the Receiver half has called close. Code where close is not used, or where the
Receiver is not awaited and try_recv is not called after calling close,
is not affected.

See tokio#4225 for more details.

See advisory page for additional details.

opa 0.40.0

Which new version of OpenPolicyAgent should we support?

v0.40.0

Additional information

https://github.com/open-policy-agent/opa/releases/tag/v0.40.0

Changes required

Are there any upstream changes that we need to support?
e.g. new features, changed features, deprecated features etc.

Implementation checklist

Please don't change anything in this list.
Not all of these steps are necessary for all versions.

  • Update the Docker image
  • Update documentation to include supported version(s)
  • Update operator to support the new version (if needed)
  • Update integration tests to test the new versions (in addition to, or replacing, old versions)
  • Update examples to use new versions
