jetstack-secure's Issues

Split "exporters" into "matchers" and "exporters"

The current "exporters" put together evaluation results and metadata and then render the result in a certain format.

The process of putting together results and metadata should be decoupled from rendering that "pack" into different formats. This is where the "matcher" + "exporter" pattern comes into play.

As part of this change, the interface for the exporters needs to change to accept a Report struct containing both the result and the metadata, instead of receiving the policy manifest and a result collection as it does now.
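
A minimal sketch of what the decoupled types could look like (all type and method names here are illustrative, not the final API):

package exporter

// PolicyManifest stands in for the existing manifest type.
type PolicyManifest struct {
    ID string
}

// RuleResult holds the outcome of evaluating a single rule.
type RuleResult struct {
    ID      string
    Success bool
}

// Report bundles evaluation results with their metadata.
type Report struct {
    Metadata map[string]string
    Results  []RuleResult
}

// Matcher assembles results and metadata into a Report.
type Matcher interface {
    Match(manifest PolicyManifest, results []RuleResult) (Report, error)
}

// Exporter renders an assembled Report into one output format,
// instead of receiving the manifest and raw results directly.
type Exporter interface {
    Export(report Report) ([]byte, error)
}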

Document the running of the agent

We need to document the following:

  • how to run the agent command (from source for now)
  • what the different values in the agent config file do

The target audience for this documentation is a developer. It might also be good to show how to use the echo server for testing.

Development version instead of actual version in docker image

The binary in the docker image does not display the actual version information; it falls back to the development defaults.

What happened

$ docker run quay.io/jetstack/preflight:v0.1.9 version -v
Preflight version:  development

Commit:
Built:
Go:

What should happen

$ docker run quay.io/jetstack/preflight:v0.1.9 version -v
Preflight version:  v0.1.9

Commit: <commit hash>
Built: <build timestamp>
Go: <Go version>
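
The usual Go pattern for this is to compile placeholder defaults into a version package and override them with linker flags at release time; the symptom above suggests the Docker build isn't passing those flags. A minimal sketch, assuming a hypothetical version package and variable names (not the repo's actual layout):

package version

// Defaults, seen when the linker flags below are not passed
// ("Preflight version: development").
var (
    PreflightVersion = "development"
    Commit           = ""
    BuildDate        = ""
    GoVersion        = ""
)

// Overridden at build time with something like:
//   go build -ldflags "-X <module>/pkg/version.PreflightVersion=v0.1.9 \
//     -X <module>/pkg/version.Commit=$(git rev-parse HEAD)"
// The fix would be to make sure the Dockerfile's build step passes
// the same flags as the regular release build.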

Scrape whitelisted namespaces to allow more restricted RBAC

Acceptance Criteria:


  • it is possible to run preflight with permissions to scrape a single namespace without knowing the other namespaces in the cluster to exclude

Assumptions:

  • Some users will not be able to use cluster-wide viewer permissions to run preflight.

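A sketch of why this unlocks tighter RBAC: with client-go, listing pods in a named namespace needs only a namespaced Role/RoleBinding, whereas listing across the cluster needs a ClusterRole. The namespace name here is illustrative:

package main

import (
    "context"
    "fmt"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
)

func main() {
    cfg, err := rest.InClusterConfig()
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(cfg)
    if err != nil {
        panic(err)
    }
    // Listing within a single namespace only requires namespace-scoped
    // RBAC; listing with "" (all namespaces) requires cluster-wide rights.
    pods, err := clientset.CoreV1().Pods("team-a").List(context.Background(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    fmt.Printf("gathered %d pods\n", len(pods.Items))
}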

Create manifests to deploy preflight agent

In order to help users install Preflight in their clusters, we should provide a guide and some manifests to do this.

Sub-tasks:

  • Create a readme explaining the installation of the agent
  • Explain the authorization requirements of data gatherers, perhaps linking to updated data gatherer docs for things like k8s RBAC.
  • Create a deployment yaml to run the agent (1 replica)

Acceptance Criteria:

  • There is a clear process outlined to install preflight with the k8s/pods and gke data gatherers configured.

Assumptions:

  • we're only worrying about k8s/pods and gke for now.
  • users want to run preflight in a k8s cluster

This task relates to this milestone

Limit namespaces used in k8s datagatherer

This is a user request to be able to ignore pods in the kube-system namespace, as these are often not under the control of users in managed clusters.

Sub-tasks:

  • support excluding namespaces in a k8s datagatherer config

Acceptance Criteria:

  • I should be able to run the pods datagatherer without sending the pod data from the kube-system namespace

Assumptions:

  • this is a good first port of call, and label selectors and other more advanced filters are not needed at this time.

This task relates to this milestone
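
A sketch of the filtering the sub-task describes, applied after gathering (function name and config shape are assumptions, not the real datagatherer code):

package gatherer

import corev1 "k8s.io/api/core/v1"

// filterNamespaces drops pods whose namespace is in the exclude list,
// e.g. exclude = []string{"kube-system"}.
func filterNamespaces(pods []corev1.Pod, exclude []string) []corev1.Pod {
    excluded := make(map[string]bool, len(exclude))
    for _, ns := range exclude {
        excluded[ns] = true
    }
    kept := make([]corev1.Pod, 0, len(pods))
    for _, p := range pods {
        if !excluded[p.Namespace] {
            kept = append(kept, p)
        }
    }
    return kept
}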

Implement a k8s secret gatherer

We have recently implemented a generic k8s data gatherer #99. This is being used to fetch resources for the private certmanager package. To make this package ready for real use, we need to make sure that when secrets are gathered, only their metadata is sent.

Acceptance Criteria:

  • when I run a k8s/secrets.v1 datagatherer - only the metadata is sent
  • it is not possible to configure the agent to send secret data to the backend

Risks:

  • people may have secret data in resources other than k8s Secrets

Assumptions:

  • we're not going to be running policy on secret content

This task relates to this milestone
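
A minimal sketch of the redaction this implies, assuming the gatherer works with typed Secret objects (the function name is illustrative):

package gatherer

import corev1 "k8s.io/api/core/v1"

// redactSecret strips everything but metadata before the secret leaves
// the cluster. Note that metadata itself can leak values: the
// kubectl.kubernetes.io/last-applied-configuration annotation may
// contain a full copy of the secret, so it is scrubbed too.
func redactSecret(s *corev1.Secret) {
    s.Data = nil
    s.StringData = nil
    delete(s.Annotations, "kubectl.kubernetes.io/last-applied-configuration")
}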

A flag for setting the log level and more verbose log output

I am stuck with a non-working preflight agent on my cluster. Before going further (i.e., using mitmproxy to see what is going on with the HTTP request being made to platform.jetstack.io), I wondered: is there a "debug" mode that would make the logs a bit more verbose?

By verbose, I would probably expect to see some of the request being made: see the 200 OK and so on. And maybe also some of the payload and the HTTP headers on the request and response.

The agent does not seem to have any --level or -v flag though:

% preflight agent -h
The agent will periodically gather data for the configured data
	gatherers and send it to a remote backend for evaluation

Usage:
  preflight agent [flags]
  preflight agent [command]

Available Commands:
  info        print several internal parameters of the agent

Flags:
  -c, --agent-config-file agent.yaml   Config file location, default is agent.yaml in the current working directory. (default "./agent.yaml")
      --backoff-max-time duration      Max time for retrying failed data gatherers (given as XhYmZs). (default 10m0s)
  -k, --credentials-file string        Location of the credentials file. For OAuth2 based authentication.
  -h, --help                           help for agent
      --input-path string              Input file path, if used, it will read data from a local file instead of gathering data from clusters
      --one-shot                       Runs agent a single time if true, or continously if false
      --output-path string             Output file path, if used, it will write data to a local file instead of uploading to the preflight server
  -p, --period duration                Override time between scans in the configuration file (given as XhYmZs).
      --strict                         Runs agent in strict mode. No retry attempts will be made for a missing data gatherer's data.

Use "preflight agent [command] --help" for more information about a command.

Consider removing public packages

In this commit 1bab9a1 we remove testing of packages in this repo.

We are considering removing the packages from this repository, since they are no longer really relevant to the agent's operation.

Index report bucket contents

Currently, Preflight will export data to a bucket in the following 'directory' structure:

clusterA:
  timestampA:
    packageA.json
    packageB.json
  timestampB:
    ...
clusterB:
  ... 

This bucket is queried when the data is loaded for display in the frontend by the (closed source) preflight backend.

The issue with this format is that all the reports must be listed to find the latest one (since we can't order the results from object storage).

Let's change the format to make it possible to find the latest report in constant time.

Goals:

  • minimize the complexity in the writes to object storage
  • create a structure which helps minimize the complexity in the reads of the same data
  • create a structure which allows the latest report for each cluster to be found and loaded quickly
  • avoid duplicated data in the bucket
  • never delete any data from the bucket

Non goals:

  • pagination of results for a single cluster's reports (this can happen at a later date)
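
One sketch that meets these goals is a tiny pointer object per cluster, updated on every write; readers fetch the pointer, then the report, in constant time, with no listing and no duplicated report data. The key layout and GCS client usage here are illustrative, not a final design:

package bucket

import (
    "context"

    "cloud.google.com/go/storage"
)

// writeReport stores the report under its timestamped key, then
// overwrites a small "latest" object holding only the newest timestamp.
// Readers load <cluster>/latest, then <cluster>/<timestamp>/<pkg>.json.
func writeReport(ctx context.Context, bkt *storage.BucketHandle, cluster, timestamp, pkg string, report []byte) error {
    w := bkt.Object(cluster + "/" + timestamp + "/" + pkg + ".json").NewWriter(ctx)
    if _, err := w.Write(report); err != nil {
        w.Close()
        return err
    }
    if err := w.Close(); err != nil {
        return err
    }
    // The pointer is tiny, so overwriting it does not duplicate or
    // delete any report data.
    ptr := bkt.Object(cluster + "/latest").NewWriter(ctx)
    if _, err := ptr.Write([]byte(timestamp)); err != nil {
        ptr.Close()
        return err
    }
    return ptr.Close()
}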

Do not report rules marked as manual

In some packages, we have rules marked as manual: true.

That was an attempt to have rules that cannot be checked automatically, but require a human to check them manually (see #73).

The problem is that generated reports mark those rules as missing: true, which is true but can be confusing.

We should discuss ways to improve that.

Investigate the best way to authenticate on Azure

Right now, we use a short-lived token for the AKS data-gatherer, and storage account plus secret key for the blob storage (via environment variables).

The short-lived token is a problem for obvious reasons; we need something that lasts longer.

The storage account and key are fine, but we would like to see if there is something like a credentials file to match what we have for GCS.

As a user I'd like the installation of the agent to be simpler

Currently there are instructions on how to deploy the Preflight agent to a cluster in the README.md. These include inline Kubernetes manifests which users can copy, edit, and deploy.

Ideally the deployment process should be even easier and more automated. This should improve the user experience, make it less likely that mistakes are introduced while copying manifests, and make casual users more likely to deploy the agent and try Preflight.

The only part of the deployment that the user must edit is the agent configuration. Most of this configuration can have sensible defaults, except the user token. This means it should all be achievable with a few commands, which users can copy from the README.md and run. We probably can't quite have 'one click' installation but can reduce the required steps to a few commands for a minimal deployment.

Ideas

  • The most minimal option would be for users to just curl a file with several Kubernetes manifests, edit the config or use sed to inject a token value from an environment variable, and kubectl apply the file.

  • A tidier version of this would be to use Kustomize for deployment. Users could fetch a default config file and edit as required. Kustomize Secret/ConfigMap generation would then get this into the cluster, inject values using patches, and deploy the other default manifests. Users can make their own overlays to modify the rest of the deployment.

  • Another option is a Helm chart. This seems overkill but is how a lot of people expect to install applications into their cluster so is worth considering.

  • Manifests could be generated by the backend, so when a user signs up and a token is generated a set of manifests is also generated which they can just download and apply, or apply directly from an endpoint we provide.

Acceptance Criteria:


  • There is a minimal deployment process
  • The deployment process has automated testing

Risks:

  • Poor deployment automation could make the process more painful for users
    • This can be mitigated by ensuring the deployment process is tested
  • Obfuscating the deployment manifests could put users off as they're not sure what they're putting in their cluster
    • This can be mitigated by ensuring that manifests are still visible in the repo and are well documented
  • Overcomplicating the deployment process can make it harder for developers to maintain
    • This requires us to agree a balance of automation and maintainability.

Assumptions:

  • Users want easier deployment
  • Users might make mistakes copying manifests
  • Casual users will be put off by having to copy and edit multiple YAML files.

Dependencies:

  • This will introduce a dependency on whatever process we use to test the deployment automation
    • For example we may introduce kind as a dependency for testing
  • This may also introduce a deployment tool as a dependency
    • For example Kustomize or Helm

Sub-tasks:

  • Agree a deployment approach
  • Implement the deployment approach
  • Implement testing for the deployment
  • Get a user to try the deployment and give feedback

Config file schema

I feel that it would be better if we defined a Go struct for the Preflight config and un-marshalled YAML into this, rather than using viper.GetString throughout the code.

This will make it clearer what the structure of the config should be, and make it clearer when it changes. It also means that the config can be checked more before Preflight actually runs anything.

This would also allow us to add a command to check a config file, if we felt this was useful.
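
A minimal sketch of the typed schema this proposes; the exact field set is a guess based on the options discussed elsewhere in this tracker:

package config

import "gopkg.in/yaml.v2"

// Config is a typed schema for the agent config file.
type Config struct {
    Schedule      string         `yaml:"schedule"`
    Endpoint      string         `yaml:"endpoint"`
    Token         string         `yaml:"token"`
    DataGatherers []DataGatherer `yaml:"data-gatherers"`
}

// DataGatherer identifies one configured data gatherer.
type DataGatherer struct {
    Kind string `yaml:"kind"`
    Name string `yaml:"name"`
}

// Parse unmarshals the whole file into the struct up front, replacing
// scattered viper.GetString lookups and enabling early validation
// (and a hypothetical "check config" command).
func Parse(data []byte) (*Config, error) {
    var c Config
    if err := yaml.Unmarshal(data, &c); err != nil {
        return nil, err
    }
    return &c, nil
}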

Allow disabling of checks in a package

We have packages of checks that preflight can run. While these are intended to be used in full, sometimes checks will need to be disabled.

Preflight yaml changes

It makes sense to me to configure this in one of the following ways:

Namespaced List of Excludes

enabled-packages:
  - "examples.jetstack.io/gke_basic"

ignored-checks:
  - "examples.jetstack.io/gke_basic/networking/private_cluster"

This is simple and likely the easiest to implement, but I don't like how the package names and namespaces are duplicated.

Checks listed under package

enabled-packages:
- name: "examples.jetstack.io/gke_basic"
  disabled-checks:
  - private_cluster

This appears to be more concise but requires a change in the data format for packages, making it marginally more complex to implement.

Whitelist/Blacklist

enabled-packages:
- name: "examples.jetstack.io/gke_basic"
  enabled-checks: # whitelist
  - private_cluster
  disabled-checks: # blacklist
  - master_auth

Both fields are optional; if both are present, the whitelist is used. This is marginally more complex to implement but makes it easier to use a single check from a package (using the whitelist).
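
A sketch of the resolution logic this implies (names are illustrative):

package checks

// enabledChecks resolves which checks from a package run: if a
// whitelist is present it wins, otherwise every check not on the
// blacklist runs.
func enabledChecks(all, whitelist, blacklist []string) []string {
    if len(whitelist) > 0 {
        return whitelist
    }
    blocked := make(map[string]bool, len(blacklist))
    for _, c := range blacklist {
        blocked[c] = true
    }
    var enabled []string
    for _, c := range all {
        if !blocked[c] {
            enabled = append(enabled, c)
        }
    }
    return enabled
}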

Changes to reports

The excluded checks are not only absent from the report but are also not run. This is important to avoid running checks where the results are not important to the user.

wdyt @j-fuentes @wwwil ?

Remove all references to check

We are now almost at the point where we are ready to start onboarding public beta users to the new tool.

As part of clearing up the messaging around the product, we should make the public repo all about the agent.

  • remove the check documentation
  • remove the check command code

[agent] Implement minimal config file loading for agent

Sub-tasks

  • implement a means of loading config - much the same as preflight core
  • configure schedule variable
  • configure endpoint variable
  • configure identity token variable
  • configure loaded data gatherer variables

This is just to load the vars from the file - nothing else.

Users of the agent need this to be able to control how the agent works when it's running in their clusters. If we one day have a generated yaml, this might form the content of a configmap to be loaded into the running agent.

Acceptance Criteria:

  • I can run the agent with different config
  • for now, the agent just prints the config and exits

Internal: Created from: https://github.com/jetstack/preflight-private/issues/260 Needs: https://github.com/jetstack/preflight-private/issues/263 Enables: https://github.com/jetstack/preflight-private/issues/265

Specify workload identity as requirement

When following the guide to install Preflight in a cluster on GKE, having workload identity enabled on the cluster should be listed as a requirement. Without it, the second Terraform project cannot be applied.

[agent] Make agent share identity with backend

Users of the agent need to be able to prove their identity so that they have access to all the features of the backend. When uploading data, the agent should share its identity.

Sub-tasks

  • share the configured token as a bearer token header or similar
  • handle 401 errors gracefully with a good error message

Acceptance Criteria:

  • Agent token shows in request header (which would allow a backend to determine its ID)
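
A sketch of the two sub-tasks above, attaching the configured token as a bearer credential and turning a 401 into a clear error; endpoint handling and error text are illustrative, not the agent's real code:

package client

import (
    "bytes"
    "fmt"
    "net/http"
)

// uploadReadings posts gathered data with the agent's token in the
// Authorization header, and handles rejection gracefully.
func uploadReadings(c *http.Client, endpoint, token string, body []byte) error {
    req, err := http.NewRequest(http.MethodPost, endpoint, bytes.NewReader(body))
    if err != nil {
        return err
    }
    req.Header.Set("Authorization", "Bearer "+token)
    req.Header.Set("Content-Type", "application/json")
    resp, err := c.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    if resp.StatusCode == http.StatusUnauthorized {
        return fmt.Errorf("backend rejected the agent token (401): check the token in the agent config")
    }
    return nil
}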

Notes

The use of token auth doesn't scale in the long term as we add many agent tokens to the backend configuration. In the future it's likely we'll offload authentication to some third party; I use Auth0 as an example.

In this version of the future, agents would obtain an access token from Auth0 to talk to the backend using 'machine to machine' auth. We'd create client credentials to give to installers of agents automatically, perhaps using an API endpoint like this.

Our use of simple static auth doesn't make this option harder to reach in future; to the person installing the agent, it looks almost identical. It'd also be easy to run both Auth0 and static token auth at the same time if needed.


Internal: Created from: https://github.com/jetstack/preflight-private/issues/260 Pre-req: https://github.com/jetstack/preflight-private/issues/266

As an agent user, I'd like to be able to use the agent via an http proxy

This is a user request being written up as a story.

It's common in certain companies to route all traffic via an HTTP proxy; we should support this to allow those in such environments to use preflight.

Assumptions:

  • we're talking about the http_proxy environment variable here

Sub-tasks:

  • implement a means of detecting the use of http_proxy in the environment
  • route via the configured proxy if it's working (e.g. if it can reach jetstack.io?)
  • if the proxy is not working, crash the agent with an error

Acceptance Criteria:

Acceptance Criteria is a set of test scenarios specific to this particular issue. It should capture functional and non-functional requirements.

  • it's possible to send data from dgs to an external backend via an http proxy
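
Go's default transport already honours http_proxy / https_proxy / no_proxy via http.ProxyFromEnvironment, so a sketch of wiring it explicitly is small; the crash-early behaviour in the sub-tasks would need an extra startup probe through the proxy, which is not shown here:

package client

import "net/http"

// newProxyAwareClient routes requests via the proxy configured in the
// http_proxy / https_proxy / no_proxy environment variables, if set.
func newProxyAwareClient() *http.Client {
    return &http.Client{
        Transport: &http.Transport{
            Proxy: http.ProxyFromEnvironment,
        },
    }
}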

Remove the artificial prefix `preflight_` in the rego rules

Preflight assumes there is some conversion between the rule IDs in the Policy Manifest and the Rego files.

This conversion is needed because rule IDs in Rego have certain limitations.

But also, we have introduced the preflight_ prefix as part of this convention. In my opinion, this is very artificial and not necessary at all. I think this was introduced in the early times of Preflight to distinguish which rules were actually Preflight rules and which ones were just helpers. This is no longer necessary, as the Policy Manifest is the source of truth for what is a Preflight rule.

Create generator for Preflight Packages

Add a command (e.g. preflight package generate) that generates the skeleton of a package so it is easier for users to get started writing their own packages.

Document gathering custom data with the local datagatherer

Currently Preflight has a set of built-in datagatherers. This is a limitation, because if someone wants to create a new package but doesn't have a datagatherer that provides the information for it, the only option is to write Go code and create a new datagatherer. This might not be ideal for some people.

Ideally, there should be a generic way to plug external data, provided by an arbitrary process or file, into Preflight.

Support GKE regional clusters

At the moment the Preflight GKE data gatherer can only be configured for a single-zone cluster.

Add support for regional clusters.

Github workflows fail to tag images after push

The release-master and release-tag workflows are failing to re-tag the image in quay.io:

docker buildx imagetools create quay.io/jetstack/preflight:b2384c5add501ec97516fc906ceab6967ff868a8 --tag quay.io/jetstack/preflight:latest
error: failed commit on ref "index-sha256:fcc90749fc55b6d51f932b3178a710be08cfe89b70beb57aa1d0fbe4ef2e3c66": unexpected status: 401 Unauthorized

The image was pushed correctly with the commit tag: https://quay.io/repository/jetstack/preflight/manifest/sha256:fcc90749fc55b6d51f932b3178a710be08cfe89b70beb57aa1d0fbe4ef2e3c66

It seems to be an issue with buildkit: docker/buildx#327 open-policy-agent/gatekeeper#665

Allow loading data to mock several data gatherers from a single local file

#180 adds a nice feature to output the data gathered to a file. That file contains data from all the data-gatherers.

It would be great to have an option that allows the agent to run and use that file as the source of data to mock all the data gatherers.

We already have the option local in the configuration for a data-gatherer, but that loads an input file for each data-gatherer.

The feature that is described here should work by loading a file with the exact format of the file that the flag in #180 outputs.

Easier end-to-end testing

We should make it easier to do end-to-end testing of Preflight. While there are Go tests for components, the only way to test the whole application currently is against a real cluster. As Preflight is designed to work with various platforms (GKE, AKS, etc.), testing support for all of them requires many clusters, which is not convenient.

Make gathering of data from many data sources more reliable

Right now, if we fail to gather data, that data is missing from the readings. This means that as we increase the number of requests made to gather data, we dramatically increase the likelihood that the whole data gather stage partially fails.

Ideally, we retry gatherers when they fail so that the agent is more reliable.

They should back off exponentially.
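
A sketch of exponential backoff around one data gatherer, independent of whatever #185 actually implements: retry on failure, doubling the delay, until a maximum time is exhausted (this mirrors the --backoff-max-time flag visible in the agent's help output elsewhere on this page):

package gatherer

import (
    "fmt"
    "time"
)

// retryGather retries fetch with exponentially growing delays until it
// succeeds or maxTime would be exceeded.
func retryGather(fetch func() error, maxTime time.Duration) error {
    delay := time.Second
    deadline := time.Now().Add(maxTime)
    for {
        err := fetch()
        if err == nil {
            return nil
        }
        if time.Now().Add(delay).After(deadline) {
            return fmt.Errorf("giving up after backoff: %w", err)
        }
        time.Sleep(delay)
        delay *= 2 // exponential backoff
    }
}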

PR: #185

Split deployment manifests into groups

Currently, we have just a Kustomize 'base' for Preflight that contains manifests for both RBAC-related resources and the CronJob, which means that it all gets applied with one kubectl apply -k ...

The problem

Typically within an organization, some administrator is going to create the Namespace, ServiceAccount, etc., and then another system is going to deploy the workload (the CronJob in this case).

The way these manifests are provided at the moment makes it hard to apply them in two steps.

Make it possible to run the agent as a one-off task

During one-off reviews of clusters, it's helpful to run the agent only once rather than as a long-running process.

To support this use case it should be possible to set a flag on the agent that means the data gathering step only happens once.

It should also not sleep when only running once, so that the command exits as fast as possible.
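
A minimal sketch of the main loop this describes (names are illustrative):

package agent

import "time"

// run gathers once and exits immediately in one-shot mode, or keeps
// gathering on the configured period otherwise.
func run(oneShot bool, period time.Duration, gatherAndUpload func()) {
    for {
        gatherAndUpload()
        if oneShot {
            return // no final sleep: exit as fast as possible
        }
        time.Sleep(period)
    }
}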

Make package lint descend into subdirectories

When testing packages using the Preflight CLI tool it is possible to test multiple packages together like so:

$ preflight package test ./preflight-packages
...
2020/01/20 14:04:59 All packages tests passed :)

Here Preflight recursively descends into subdirectories to find packages. However this does not work when linting packages. For example:

$ preflight package lint preflight-packages
2020/01/20 14:08:44 Linting package preflight-packages
2020/01/20 14:08:44 Lint: preflight-packages - Unable to read manifest path
2020/01/20 14:08:44 Encountered lint errors
exit status 1

The ability to descend into subdirectories should be added to the lint command.
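
A sketch of making lint recurse the way package test does: walk the tree and lint every directory containing a policy manifest. The manifest filename and the lintPackage helper are assumptions, not the real CLI code:

package lint

import (
    "os"
    "path/filepath"
)

// lintAll walks root and lints each directory holding a manifest file.
func lintAll(root string, lintPackage func(dir string) error) error {
    return filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return err
        }
        if !info.IsDir() && info.Name() == "policy-manifest.yaml" {
            return lintPackage(filepath.Dir(path))
        }
        return nil
    })
}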

[agent] Create a new agent binary/command

We're going to cut back the scope of the cluster component to a simpler 'agent'.

This agent binary will have a reduced set of responsibilities:

  • gathering data on a schedule
  • uploading data for evaluation to some backend

This story is to just build the binary that we can start adding other features to afterwards.

This is being done to simplify the installation of preflight and support the remote evaluation of packages.

Acceptance Criteria:

  • It is possible to build an agent binary from the makefile in the project

Internal Use: Created from https://github.com/jetstack/preflight-private/issues/260 Enables: https://github.com/jetstack/preflight-private/issues/264

Support for testing packages

Currently, users need to use OPA's CLI to perform tests on the Rego files.

Preflight CLI should have a way to execute the tests of a package with preflight package test <package> without the user having to install OPA's CLI.

Update datagatherer documentation with example data they send

It's important that we have examples of the data sent by each data gatherer to the backend (when running the agent).

Update each datagatherer to show an example JSON payload, perhaps in the readme of the datagatherer.

This is useful for reviewing the data sent but also when making new packages.

Expose context information about why a check is failing

Screenshot from 2020-01-08 12-18-49

This is a screenshot from https://preflight.jetstack.io/

It shows the result of running our basic pod checks. We can see that some pods are missing requests and limits - oh dear.

This is based on the following report: https://github.com/jetstack/preflight/blob/master/examples/reports/pods.json#L7

All that's exposed in the report is the success/fail status, not the reason.

When using OPA as something that backs a k8s webhook the idiomatic way to do this is to return a message if the rule fails, otherwise return nothing.

In our pod example, I think I'd return the pod name & namespace in a single string, one for each violation.

I guess we might also remove the success attribute and replace it with a violations list - if there are violations then it failed?
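
A sketch of what that could look like in the report types; failure is then implied by a non-empty list. Names are illustrative, not the current report schema:

package report

// CheckResult replaces the bare success flag with a violations list.
type CheckResult struct {
    ID         string   `json:"id"`
    Violations []string `json:"violations"` // e.g. "pod default/foo is missing resource limits"
}

// Failed reports whether the check produced any violations.
func (r CheckResult) Failed() bool {
    return len(r.Violations) > 0
}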

AKS data-gatherer missing some information

Currently the AKS data gatherer collects information about the configuration of an AKS cluster from the Azure API.

https://docs.microsoft.com/en-us/rest/api/aks/managedclusters/get

The information returned includes a list of node pools, referred to as agent pools. However it does not give details of each of these pools. This needs to be fetched separately. We should get the configuration of each node pool so we can make the checks performed in the AKS package more comprehensive.

https://docs.microsoft.com/en-us/rest/api/aks/agentpools/get

These will both return separate JSON documents; in fact, there will be a JSON document for each node pool. We need to work out how this will be handled in Preflight. We could put them all in a list in a master JSON document to evaluate with Rego. Alternatively, we could make a separate AKS node pool data gatherer, but this would require support for multiple instances of the same data gatherer type to fetch multiple node pools, and seems like more work for users.

I had misunderstood the problem here. Using the az aks show --resource-group preflight --name preflight-test-wil command I can see all the required information, as described in the API spec: https://docs.microsoft.com/en-us/rest/api/aks/managedclusters/get

However when using HTTP GET requests, as the data gatherer does, some information is missing. This also occurs when doing the same thing manually with the curl command.

Related to #30

[agent] Implement data upload from agent

Agents are not able to evaluate packages, only gather data. They need to be able to upload to a backend.

Needs: https://github.com/jetstack/preflight-private/issues/265 (or it can upload an empty body for now?)

Enables: https://github.com/jetstack/preflight-private/issues/267

Sub-tasks

  • implement an agent command that gathers data and posts it to an endpoint on a given path
  • configure the endpoint and path in the config file

Acceptance Criteria:

  • there is an agent command I can run that gathers data and uploads it to the supplied endpoint

https://webhook.site can be used to test that the data is sent correctly.


Internal: Created from: https://github.com/jetstack/preflight-private/issues/260

[agent] Have agent run data gathers in its config

Users of the agent need the agent to collect data. This should be configured based on the config file contents and work much the same as it does in preflight core.

Sub-tasks

  • use preflight-core implementation to build similar data gather functionality in the agent
  • run the data gatherers configured in the config file

Acceptance Criteria:

  • for now, the agent prints the gathered data to the console in JSON format.
  • only data gatherers in the config are run

Internal: Created from: https://github.com/jetstack/preflight-private/issues/260 Needs: https://github.com/jetstack/preflight-private/issues/264
