helm-charts's People

Contributors

allcontributors[bot], burakince, ryshoooo, subramaniam20jan, xadrianzetx

helm-charts's Issues

(psycopg2.OperationalError) SCRAM authentication requires libpq version 10 or above

Describe the bug a clear and concise description of what the bug is.

I am trying to deploy MLflow in a local Kubernetes cluster.

First, I created an empty database in Postgres 15 at postgresql://postgres-service.hm-postgres.svc:5432/hm_mlflow_db.

However, deploying MLflow with databaseMigration: true failed. I ran:

helm upgrade \
  mlflow \
  mlflow \
  --install \
  --repo=https://community-charts.github.io/helm-charts \
  --namespace=hm-mlflow \
  --create-namespace \
  --values=my-values.yaml

my-values.yaml:

backendStore:
  databaseMigration: true
  databaseConnectionCheck: true
  postgres:
    enabled: true
    host: postgres-service.hm-postgres.svc
    port: 5432
    database: hm_mlflow_db
    user: admin
    password: passw0rd

Any guidance would be appreciated, thanks! 😃

What's your helm version?

version.BuildInfo{Version:"v3.12.0", GitCommit:"c9f554d75773799f72ceef38c51210f1842a1dea", GitTreeState:"clean", GoVersion:"go1.20.3"}

What's your kubectl version?

Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.2", GitCommit:"5835544ca568b757a8ecae5c153f317e5736700e", GitTreeState:"clean", BuildDate:"2022-09-21T14:33:49Z", GoVersion:"go1.19.1", Compiler:"gc", Platform:"darwin/amd64"} Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.3+k3s1", GitCommit:"01ea3ff27be0b04f945179171cec5a8e11a14f7b", GitTreeState:"clean", BuildDate:"2023-03-27T22:04:57Z", GoVersion:"go1.19.7", Compiler:"gc", Platform:"linux/arm64"}

Which chart?

mlflow

What's the chart version?

0.7.19

What happened?

dbchecker instance shows

[INFO] Waiting for Database to become ready...
[INFO] Database OK ✓

However, mlflow-db-migration shows this and keeps retrying:

2023/07/03 23:39:31 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(psycopg2.OperationalError) SCRAM authentication requires libpq version 10 or above

(Background on this error at: https://sqlalche.me/e/14/e3q8)
Operation will be retried in 0.1 seconds
2023/07/03 23:39:31 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(psycopg2.OperationalError) SCRAM authentication requires libpq version 10 or above

(Background on this error at: https://sqlalche.me/e/14/e3q8)
Operation will be retried in 0.3 seconds
2023/07/03 23:39:31 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(psycopg2.OperationalError) SCRAM authentication requires libpq version 10 or above

(Background on this error at: https://sqlalche.me/e/14/e3q8)
Operation will be retried in 0.7 seconds
2023/07/03 23:39:32 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(psycopg2.OperationalError) SCRAM authentication requires libpq version 10 or above

(Background on this error at: https://sqlalche.me/e/14/e3q8)
Operation will be retried in 1.5 seconds
2023/07/03 23:39:33 WARNING mlflow.store.db.utils: SQLAlchemy engine could not be created. The following exception is caught.
(psycopg2.OperationalError) SCRAM authentication requires libpq version 10 or above

[mlflow] Use port name instead of port number in ServiceMonitor

Describe the bug a clear and concise description of what the bug is.

First of all, thanks to everyone creating this Helm Chart as it is really good and easy to use.

However, I ran into a problem when enabling the ServiceMonitor and Prometheus metrics alongside the Deployment. The ServiceMonitor created for MLflow looks correct, yet in its current form it does not work for me.
I use the latest Prometheus deployed with the official Helm chart. The MLflow metrics did not show up under Targets; the ServiceMonitor was visible in the Service Discovery panel of the Prometheus dashboard, but with 0/1 active targets.

After a couple of hours of debugging, I manually changed targetPort: 80 to port: http in the deployed ServiceMonitor manifest. It worked straight away!

What I propose is a simple fix:
According to the official Prometheus troubleshooting docs, the port specified in a ServiceMonitor should be referenced by name instead of by number (Link to docs).
The fix would be to change targetPort: 80 to port: http in templates/servicemonitor.yaml. The port name http is already hardcoded, so it can be used directly, or a new parameter could be introduced to let users choose the port name.
I am aware that a port number of type Integer should also work...
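
As a rough illustration of the proposed change, a ServiceMonitor that refers to the Service port by name might look like the sketch below. The metadata name, selector label, path, and interval are assumptions for illustration and are not taken from the chart's templates; only port: http reflects the change described above.

```yaml
# Sketch only: names, labels, path and interval are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mlflow                          # hypothetical name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: mlflow    # assumed Service label
  endpoints:
    - port: http                        # name of the Service port, replacing targetPort: 80
      path: /metrics                    # assumed metrics path
      interval: 30s                     # assumed scrape interval
```

The port field matches the name of a port on the selected Service, so the ServiceMonitor keeps working if the numeric port changes.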

What's your helm version?

3.6.0

What's your kubectl version?

1.19

Which chart?

mlflow

What's the chart version?

0.2.21

What happened?

No response

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

helm install --namespace mlflow mlflow-tracking-server community-charts/mlflow --set serviceMonitor.enabled=true

Anything else we need to know?

No response

[mlflow] Make it possible to provide postgres secret as predefined

Is your feature request related to a problem ?

Right now, the way the user and password are exposed in the chart requires putting both into the repository as plain text. It would be better to allow a predefined secret. I can think of various ways to achieve this.

Describe the solution you'd like.

One way this could work:

  1. Add "Values.backendStore.secret" as a value
  2. Do not set "backend-store-uri" if Values.backendStore.secret is set
  3. Use "MLFLOW_BACKEND_STORE_URI" and reference it from the provided secret ("Provide a default for --backend-store-uri")
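
The three steps above could be sketched roughly as follows. The backendStore.secret value is the hypothetical key proposed in this request, not an existing chart value, and the Secret name and key (mlflow-backend-store, uri) are made up for illustration.

```yaml
# values.yaml (hypothetical key proposed in this request)
backendStore:
  secret: mlflow-backend-store    # name of a Secret created outside the chart
---
# Fragment of the rendered container spec: the URI comes from the Secret,
# so no --backend-store-uri argument is passed on the command line.
env:
  - name: MLFLOW_BACKEND_STORE_URI
    valueFrom:
      secretKeyRef:
        name: mlflow-backend-store  # assumed Secret name
        key: uri                    # assumed key holding the full database URI
```

With this layout the credentials never appear in the values file, only the name of the Secret does.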

Describe alternatives you've considered.

Cannot think of many alternatives right now

Additional context.

No response

[mlflow] Migration Job should run before upgrade

Describe the bug a clear and concise description of what the bug is.

I'm installing the mlflow chart to migrate from an old mlflow version to a new one, using the value backendStore.databaseMigration: true. However, the mlflow pod failed to start with this error:

mlflow.exceptions.MlflowException: Detected out-of-date database schema (found version c48cb773bb87, but expected cc1f77228345). Take a backup of your database, then run 'mlflow db upgrade <database_uri>' to migrate your database to the latest schema. NOTE: schema migration may result in database downtime - please consult your database's documentation for more detail.

It looks like the migration Job should have pre-install,pre-upgrade hooks instead of post-install,post-upgrade, but I could be wrong here.

Running the Job from the chart manually with kubectl fixed the issue for me, but it will probably reappear with the next release.
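
For reference, a Job that Helm runs before the release resources are installed or upgraded could be annotated roughly as in the sketch below. The Job name, image, and Secret reference are placeholders rather than the chart's actual template; the helm.sh/hook annotations are the standard Helm hook mechanism.

```yaml
# Sketch only: name, image and Secret are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: mlflow-db-migration
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 3
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: db-migration
          image: example.invalid/mlflow:placeholder   # placeholder image
          # Kubernetes expands $(VAR) from the container's env entries.
          command: ["mlflow", "db", "upgrade", "$(MLFLOW_BACKEND_STORE_URI)"]
          env:
            - name: MLFLOW_BACKEND_STORE_URI
              valueFrom:
                secretKeyRef:
                  name: mlflow-backend-store            # assumed pre-existing Secret
                  key: uri
```

One caveat with this design: pre-install hooks run before the chart's regular resources are created, so anything the Job depends on (such as the Secret above) has to exist already or be created by an earlier hook.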

Thanks!

What's your helm version?

v3.9.3

What's your kubectl version?

v1.24.3

Which chart?

mlflow

What's the chart version?

0.6.0

What happened?

No response

What you expected to happen?

DB migration job should run before mlflow pod upgrade.

How to reproduce it?

  1. Install mlflow with old DB schema (1.23.1)
  2. Try to upgrade with 0.6.0 helm chart

Enter the changed values of values.yaml?

mlflow:
  nodeSelector:
    redacted: Shared
  
  ingress:
    enabled: true
  
  artifactRoot:
    s3:
      enabled: true
      bucket: "redacted"
      awsAccessKeyId: ""
      awsSecretAccessKey: ""
  
  extraEnvVars:
    AWS_DEFAULT_REGION: eu-central-1
    MLFLOW_S3_ENDPOINT_URL: https://bucket.redacted.s3.eu-central-1.vpce.amazonaws.com
  
  backendStore:
    databaseMigration: true
    databaseConnectionCheck: true
    mysql:
      enabled: true
      host: "redacted.eu-central-1.rds.amazonaws.com"
      database: "mlflow"
      user: ""
      password: ""

Enter the command that you execute and failing/misfunctioning.

helm upgrade --install --values override.yaml --wait --create-namespace --atomic --timeout 15m0s -f secrets://secrets.yaml shared-services ./shared-services

Anything else we need to know?

Chart was installed as a part of another umbrella chart

[mlflow] Extra args broken

Describe the bug a clear and concise description of what the bug is.

The new staticPrefix argument being under extraArgs breaks the chart for users that need to use the extraArgs

What's your helm version?

version.BuildInfo{Version:"v3.8.1", GitCommit:"5cb9af4b1b271d11d7a97a71df3ac337dd94ad37", GitTreeState:"clean", GoVersion:"go1.17.8"}

What's your kubectl version?

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/arm64"} Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:19:12Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/arm64"}

Which chart?

mlflow

What's the chart version?

0.2.7

What happened?

The newly added staticPrefix parameter under extraArgs breaks the chart when used, because it tries to add an extra argument to the mlflow server command that doesn't exist.

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

helm install -f mlflow/values.yaml mlflow ./mlflow/

Anything else we need to know?

I am creating a pull request that addresses this in a slightly different way, and I haven't tested it yet. I just wanted to open this issue to highlight a solution.

You could also handle staticPrefix as a separate argument in the extraEnv when starting the mlflow server, which would be smoother for the end user, but this solution should work as well.

[kserve] No Cluster Role or Cluster Role Binding for kserve default Service Account

Describe the bug a clear and concise description of what the bug is.

unable to get deploy config.","error":"configmaps \"inferenceservice-config\" is forbidden: User \"system:serviceaccount:kserve:default\" cannot get resource \"configmaps\" in API group \"\" in the namespace \"kserve\"",

What's your helm version?

3.8.0

What's your kubectl version?

1.22.5

Which chart?

kserve

What's the chart version?

1.0.1

What happened?

After installing the current kserve Helm chart, the kserve deployment could never deploy successfully because the kserve default Service Account has neither a cluster role nor a cluster role binding, so it couldn't access the configmap it needed.

Once I created a cluster role binding to a cluster role that had the proper API group permissions, the deployment was able to access the inferenceservice-config configmap and continue its process.
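
The workaround described above might look roughly like the following. The role name is hypothetical, and the rule covers only the configmaps permission named in the error message; the full set of permissions kserve needs may well be broader.

```yaml
# Sketch only: grants read access to ConfigMaps to the default ServiceAccount in kserve.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kserve-configmap-reader       # hypothetical name
rules:
  - apiGroups: [""]                   # core API group, as in the error message
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kserve-configmap-reader       # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kserve-configmap-reader
subjects:
  - kind: ServiceAccount
    name: default
    namespace: kserve
```

Since the error concerns a single namespace, a namespaced Role and RoleBinding in kserve would be a narrower alternative to the cluster-wide binding.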

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

helm install [RELEASE_NAME] community-charts/kserve

Anything else we need to know?

No response

[mlflow] MySQL Setting Support

Is your feature request related to a problem ?

Currently, it's not easy to define MySQL DB settings in the MLflow Helm chart.

Describe the solution you'd like.

The MySQL driver was already added to the mlflow Docker image in version 1.27.0.36. We need the following settings in the values.yaml file.

backendStore:
  mysql:
    # -- Specifies if you want to use mysql backend storage
    enabled: false
    # -- MySQL host address. e.g. your Amazon RDS for MySQL
    host: "" # required
    # -- MySQL service port
    port: 3306 # required
    # -- mlflow database name created before in the mysql instance
    database: "" # required
    # -- mysql database user name which can access to mlflow database
    user: "" # required
    # -- mysql database user password which can access to mlflow database
    password: "" # required

Describe alternatives you've considered.

NONE

Additional context.

No response

[mlflow] The readiness probe and liveness probe need to be configurable

Is your feature request related to a problem ?

When you run mlflow behind a proxy, you often do not want to serve it at the root path. In that case you configure mlflow with --static-prefix so that it is served under the given prefix.

As the chart is designed right now, the mlflow server can be started with this extra argument, but the readiness and liveness probes aren't configurable, so they cannot use the prefix added to the mlflow server.

Describe the solution you'd like.

Parameterize the readiness probe and liveness probe path in the deployment to ensure it can be configured by users of the chart.
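
To make the request concrete, the rendered probes for a server started with --static-prefix=/mlflow might need to look something like this. The prefix, the /health path, and port 5000 are assumptions for illustration; the chart's actual probe definition is not shown in this issue.

```yaml
# Sketch only: fragment of a container spec with prefix-aware probe paths.
livenessProbe:
  httpGet:
    path: /mlflow/health    # assumed: static prefix + health endpoint
    port: 5000              # assumed server port
readinessProbe:
  httpGet:
    path: /mlflow/health
    port: 5000
  initialDelaySeconds: 10   # assumed timing
```

Exposing the probe path as a chart value would let users keep it in sync with whatever prefix they pass to the server.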

Describe alternatives you've considered.

NONE

Additional context.

No response

[kserve] Update kserve helm chart to the latest version

Is your feature request related to a problem ?

Yes, the old version of kserve has some bugs, so we need to update the kserve Helm chart to the latest version.

Describe the solution you'd like.

Update kserve helm chart to the latest version

Describe alternatives you've considered.

NA

Additional context.

No response

[mlflow] model artifacts not saved in remote s3 artifact store

Describe the bug a clear and concise description of what the bug is.

I have a local minikube cluster. I installed the Helm chart with some changed settings; see below for the changed values. Everything else matches the default values.yaml file. For the DB backend I am using bitnami/postgresql, and for S3 storage a MinIO instance. I also created an initial bucket named "mlflow" in MinIO.

Then I created a simple k8s pod to run the simple training example from the mlflow docs. This pod has the environment variable MLFLOW_TRACKING_URI=http://mlflow.airflow.svc.cluster.local:5000 set. Here is the link to that code. I can see the metadata about the model in the UI; however, the artifact section in the UI is empty, and the bucket is empty as well.

What's your helm version?

version.BuildInfo{Version:"v3.9.0", GitCommit:"7ceeda6c585217a19a1131663d8cd1f7d641b2a7", GitTreeState:"clean", GoVersion:"go1.17.5"}

What's your kubectl version?

Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:19:12Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}

Which chart?

mlflow

What's the chart version?

latest

What happened?

No response

What you expected to happen?

I would expect the artifacts in minio bucket.

How to reproduce it?

Install the Helm chart with the MinIO and PostgreSQL config. Run a simple example from the docs.

Enter the changed values of values.yaml?

backendStore:
    databaseMigration: true
    databaseConnectionCheck: true
    postgres:
      enabled: true
      host: mlflow-postgres-postgresql.airflow.svc.cluster.local
      database: mlflow_db
      user: mlflow
      password: mlflow
artifactRoot:
  proxiedArtifactStorage: true
  s3:
    enabled: true
    bucket: mlflow
    awsAccessKeyId: {{ requiredEnv "MINIO_USERNAME" }}
    awsSecretAccessKey: {{ requiredEnv "MINIO_PASSWORD" }}
extraEnvVars:
  MLFLOW_S3_ENDPOINT_URL: minio.airflow.svc.cluster.local
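
One thing that may be worth checking in the values above: the endpoint is given as a bare hostname, while S3 clients built on boto3 expect an endpoint URL that includes a scheme. A variant with an explicit scheme and port is sketched below; plain HTTP and port 9000 are assumptions about this MinIO setup, not something stated in the report.

```yaml
# Sketch only: scheme and port are assumptions about the MinIO deployment.
extraEnvVars:
  MLFLOW_S3_ENDPOINT_URL: http://minio.airflow.svc.cluster.local:9000
```

This is only a possible cause; the report does not include server logs that would confirm it.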

Enter the command that you execute and failing/misfunctioning.

helm install mlflow-release community-charts/mlflow --values values.yaml

Anything else we need to know?

No response

[kserve] Kserve fails in the kind cluster test step

Describe the bug a clear and concise description of what the bug is.

When we run the pipeline for the kserve chart, the kind tests fail because cert-manager is not installed in the cluster.

What's your helm version?

v3.9.0

What's your kubectl version?

v1.24.2

Which chart?

kserve

What's the chart version?

0.0.2

What happened?

The kserve controller waits forever and reports the following error.

Warning  FailedMount  19s (x8 over 83s)  kubelet            MountVolume.SetUp failed for volume "cert" : secret "kserve-webhook-server-cert" not found

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

helm install mlflow community-charts/kserve

Anything else we need to know?

No response

[mlflow] Add option to create a kubernetes secret as part of the chart

Is your feature request related to a problem ?

In case one would create an ingress resource with annotations referring to a secret it would be nice to be able to deploy that dependency along with the chart.

Describe the solution you'd like.

Additional helm value to add secret with a list of key-value pairs which represents the data of the secret

Describe alternatives you've considered.

Additional context.

No response

[mlflow] Run chart-testing (lint) step returns Error validating maintainer 404 Not Found error

Describe the bug a clear and concise description of what the bug is.

When we open a pull request, the chart-testing (lint) step in the release.yaml file fails with the following error.

Error: Error linting charts: Error processing charts
------------------------------------------------------------------------------------------------------------------------
 ✖ mlflow => (version: "0.1.47", path: "charts/mlflow") > Error validating maintainer 'Burak Ince': 404 Not Found
------------------------------------------------------------------------------------------------------------------------

This happens because the maintainer name checked by the ct lint command must be a GitHub username rather than a real name.
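
A Chart.yaml fragment that would pass this check might look like the sketch below. It assumes the maintainer's GitHub username is burakince, the handle shown in the contributors list; adjust it if the actual account differs.

```yaml
# charts/mlflow/Chart.yaml (fragment); the username is an assumption
maintainers:
  - name: burakince    # GitHub username instead of the real name 'Burak Ince'
```

Alternatively, chart-testing can skip this check when validate-maintainers: false is set in its config file, at the cost of no longer verifying maintainer entries.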

What's your helm version?

v3.9.0

What's your kubectl version?

v1.24.2

Which chart?

mlflow

What's the chart version?

0.1.47

What happened?

No response

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

ct lint --debug --config ./.github/configs/ct-lint.yaml --lint-conf ./.github/configs/lintconf.yaml

Anything else we need to know?

No response

[mlflow] blob storage and DB support for on-premises systems

Is your feature request related to a problem ?

When we install MLflow on on-premises systems such as a Raspberry Pi-based Kubernetes cluster, a SaaS solution for blob storage and the DB may not be available.

Describe the solution you'd like.

Maybe we can have one flag for the Postgres DB and another flag for the Minio installation. Both could be false by default, and each could be a sub-chart of our Helm chart.

Describe alternatives you've considered.

  • For any production usage, our users must not have a DB container on their Kubernetes cluster. They must use a DB from a VM or a SaaS solution.
  • People can use official charts to install Minio or Postgres DB to their system.

Additional context.

No response

[all charts] Readme Changes Fails the Pipeline if there is no new chart version defined

Describe the bug a clear and concise description of what the bug is.

If an automated task updates readme files, the pipeline fails because no new Helm chart version exists. Changes to readme files must be excluded from the Helm chart release process.

What's your helm version?

NONE

What's your kubectl version?

NONE

Which chart?

all charts

What's the chart version?

NONE

What happened?

No response

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

NONE

Anything else we need to know?

No response

[mlflow] Postgresql as an optional dependency

Is your feature request related to a problem ?

When I deploy the mlflow chart into the cluster, I don't have any postgres database running externally. So I have to deploy the postgres helm chart first and then set the pointers to the database manually. It'd be preferable if the mlflow chart had postgresql dependency available.

Describe the solution you'd like.

Postgresql helm chart as an optional dependency.
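
In Helm, an optional dependency of this kind is usually declared with a condition in Chart.yaml. The sketch below assumes the Bitnami postgresql chart and a postgresql.enabled toggle; the version is a placeholder, and none of this is taken from the mlflow chart itself.

```yaml
# Chart.yaml (fragment); chart source and version are assumptions
dependencies:
  - name: postgresql
    repository: https://charts.bitnami.com/bitnami
    version: "x.y.z"                # placeholder, pin a real version here
    condition: postgresql.enabled   # sub-chart is rendered only when this value is true
---
# values.yaml (fragment)
postgresql:
  enabled: false                    # off by default, users opt in
```

Keeping the toggle off by default leaves existing installations that use an external database unaffected.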

Describe alternatives you've considered.

Installing postgresql separately or using a cloud-provided database. None of which I'm very happy about.

Additional context.

No response

[mlflow] Default sqlite3 is not in-memory because of typo

Describe the bug a clear and concise description of what the bug is.

Apparently, the intention of the default database setting was to use an in-memory SQLite database.

To do this, the special filename ":memory:" is used (see https://www.sqlite.org/inmemorydb.html).

However, in the deployment file the ending colon is missing, which means that SQLite is creating a file /mlflow/:memory.

While I think it would also be perfectly fine to persist the SQLite3 database (for which a PersistentVolumeClaim would be needed), the malformed ":memory" creates a strangely named file and does NOT serve the purpose of an in-memory database.

What's your helm version?

Irrelevant

What's your kubectl version?

Irrelevant

Which chart?

mlflow

What's the chart version?

0.7.19

What happened?

I found a weird-looking file ":memory" within /mlflow in the mlflow container.

Reading the output of "helm template ...", I see that a URI ending with ":memory", without the trailing colon, is used. I checked the source, and indeed the source has a typo.

What you expected to happen?

I would expect the URI to end with ":memory:".

How to reproduce it?

Whenever you install the chart without specifying a PostgreSQL or MySQL database.

Enter the changed values of values.yaml?

None

Enter the command that you execute and failing/misfunctioning.

A normal install without values.

Anything else we need to know?

I understand the error might have arisen because YAML will not let you end a string literal with a colon unless you quote the string. At some point the trailing colon may have been removed without considering its meaning.
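
A quoted form keeps the trailing colon intact. The sketch below assumes the URI is passed as a container argument; the exact layout of the chart's deployment template may differ.

```yaml
# Sketch only: quoting preserves the trailing colon of the special name :memory:
args:
  - "--backend-store-uri=sqlite:///:memory:"
# Unquoted, a list item ending in ':' is typically parsed as a mapping key
# with an empty value rather than as a plain string.
```

Quoting the whole argument avoids both the parsing problem and the temptation to drop the final colon.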

[mlflow] templates/deployment.yaml backend-store-uri did not render properly when using postgresql

Describe the bug a clear and concise description of what the bug is.

- --backend-store-uri=postgresql{{ $dbConnectionDriver }}://

backend-store-uri did not render properly when using postgresql
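
For comparison, a correctly rendered argument would follow the SQLAlchemy URL form dialect[+driver]://user:password@host:port/database. The sketch below uses placeholder credentials and host, and assumes that $dbConnectionDriver is either empty or carries the +driver suffix; the report does not show the actual rendered output.

```yaml
# Sketch only: user, password, host and database are placeholders.
args:
  # Driver left empty, so SQLAlchemy uses its default PostgreSQL driver:
  - "--backend-store-uri=postgresql://mlflow_user:CHANGE_ME@postgres.example.svc:5432/mlflow_db"
  # With an explicit driver the same URI would start with postgresql+psycopg2://
```

Comparing the output of helm template against this shape should show which part of the URI is missing or malformed.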

What's your helm version?

3.12.2

What's your kubectl version?

1.28.1

Which chart?

mlflow

What's the chart version?

0.7.19

What happened?

No response

What you expected to happen?

No response

How to reproduce it?

No response

Enter the changed values of values.yaml?

No response

Enter the command that you execute and failing/misfunctioning.

helm install mlflow community-charts/mlflow --values values.yaml

Anything else we need to know?

No response
