kubeflow-aws's People

Contributors

eterna2, isavcic, lzuwei, padarn

kubeflow-aws's Issues

proxy-agent crashing because it is relying on GCP metadata server

$ kubectl -n kubeflow logs pod/proxy-agent-7bd656ff7d-b6qlm
+++ dirname /opt/proxy/attempt-register-vm-on-proxy.sh
++ cd /opt/proxy
++ pwd
+ DIR=/opt/proxy
+ kubectl get configmap inverse-proxy-config
Error from server (NotFound): configmaps "inverse-proxy-config" not found
+ [[ ! -z '' ]]
++ curl http://metadata.google.internal/computeMetadata/v1/instance/zone -H 'Metadata-Flavor: Google'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (6) Could not resolve host: metadata.google.internal
+ INSTANCE_ZONE=/

Deployment of ml-pipeline not working with IAM roles (kube2iam)

I used the IAM overlay to configure my Kubeflow deployment to use my S3 bucket. After configuring only the bucket, prefix, region, and role, I get the following logs from the ml-pipeline deployment:

$ kubectl logs -n kubeflow ml-pipeline-7cd7f6678d-hm89c -f
I0115 14:54:17.589476       8 client_manager.go:127] Initializing client manager
[mysql] 2020/01/15 14:54:18 packets.go:427: busy buffer
[mysql] 2020/01/15 14:54:18 packets.go:408: busy buffer
E0115 14:54:18.517982       8 default_experiment_store.go:73] Failed to commit transaction to initialize default experiment table
[mysql] 2020/01/15 14:54:18 packets.go:427: busy buffer
[mysql] 2020/01/15 14:54:18 packets.go:408: busy buffer
E0115 14:54:18.519865       8 db_status_store.go:71] Failed to commit transaction to initialize database status table
[mysql] 2020/01/15 14:54:18 packets.go:427: busy buffer
[mysql] 2020/01/15 14:54:18 packets.go:408: busy buffer
E0115 14:54:18.521396       8 default_experiment_store.go:73] Failed to commit transaction to initialize default experiment table
F0115 14:55:02.668289       8 client_manager.go:311] Failed to create Minio bucket. Error: Get http://s3.amazonaws.com:443/my-bucket/?location=: net/http: HTTP/1.x transport connection broken: malformed HTTP response "\x15\x00\x00\x00\x02\x01\x00"

I also tried changing the port to 80, but the following error appeared:

I0115 16:14:22.433511       8 client_manager.go:127] Initializing client manager
[mysql] 2020/01/15 16:14:22 packets.go:427: busy buffer
[mysql] 2020/01/15 16:14:22 packets.go:408: busy buffer
E0115 16:14:22.672748       8 default_experiment_store.go:73] Failed to commit transaction to initialize default experiment table
[mysql] 2020/01/15 16:14:22 packets.go:427: busy buffer
[mysql] 2020/01/15 16:14:22 packets.go:408: busy buffer
E0115 16:14:22.677105       8 db_status_store.go:71] Failed to commit transaction to initialize database status table
[mysql] 2020/01/15 16:14:22 packets.go:427: busy buffer
[mysql] 2020/01/15 16:14:22 packets.go:408: busy buffer
E0115 16:14:22.680739       8 default_experiment_store.go:73] Failed to commit transaction to initialize default experiment table
F0115 16:14:22.699166       8 client_manager.go:311] Failed to create Minio bucket. Error: The AWS Access Key Id you provided does not exist in our records.
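
One possible reading of the first fatal error (port 443) is that the client spoke plain HTTP to a TLS endpoint and received a TLS record instead of an HTTP response. This is an assumption; the issue does not confirm it. The sketch below decodes the bytes from the log using the standard TLS record layout (1-byte content type, 2-byte version, 2-byte length). The function name parse_tls_record and the lookup tables are illustrative only.

```python
import struct

# Partial lookup tables for the TLS record layer; only a few common values are listed.
CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert", 22: "handshake", 23: "application_data"}
ALERT_LEVELS = {1: "warning", 2: "fatal"}
ALERT_DESCRIPTIONS = {0: "close_notify", 10: "unexpected_message", 40: "handshake_failure"}


def parse_tls_record(data: bytes) -> dict:
    """Decode a TLS record header and, for alert records, the 2-byte alert body."""
    if len(data) < 5:
        raise ValueError("need at least 5 bytes for a TLS record header")
    # Header: content type (1 byte), version major/minor (2 bytes), length (2 bytes, big-endian).
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    record = {
        "content_type": CONTENT_TYPES.get(content_type, f"unknown({content_type})"),
        "version": (major, minor),
        "length": length,
    }
    body = data[5:5 + length]
    if content_type == 21 and len(body) == 2:
        level, description = body
        record["alert_level"] = ALERT_LEVELS.get(level, f"unknown({level})")
        record["alert_description"] = ALERT_DESCRIPTIONS.get(description, f"unknown({description})")
    return record


if __name__ == "__main__":
    # Bytes quoted in the "malformed HTTP response" error above.
    print(parse_tls_record(b"\x15\x00\x00\x00\x02\x01\x00"))
```

If that reading is right, the bytes are an alert record rather than HTTP. That would also fit the port 80 attempt reaching S3 and failing only on the access key.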

The default configuration seems to work without problems. My guess is that ml-pipeline doesn't support IAM auth. I was led to believe this because it instantiates the MinIO client by explicitly passing the accessKey and the secretKey:

https://github.com/kubeflow/pipelines/blob/dc34a3568d79dd96c908703869596dcf6514bf52/backend/src/apiserver/client/minio.go#L29-L30

Do you know if ml-pipeline supports IAM? If so, how did you achieve IAM authentication in your cluster?
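
To illustrate the difference between static keys and role-based credentials, here is a minimal sketch of a credential fallback: use the access key and secret key when both are set, otherwise ask a role provider. It is not how ml-pipeline behaves. The names Credentials, resolve_credentials, and fake_role_fetcher are hypothetical, and the fetcher stands in for a call to the EC2 instance metadata endpoint that kube2iam intercepts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Credentials:
    access_key: str
    secret_key: str
    session_token: Optional[str] = None
    source: str = "static"


def resolve_credentials(
    access_key: str,
    secret_key: str,
    fetch_role_credentials: Callable[[], Credentials],
) -> Credentials:
    """Prefer explicit static keys; fall back to role credentials when they are empty."""
    if access_key and secret_key:
        return Credentials(access_key, secret_key)
    return fetch_role_credentials()


def fake_role_fetcher() -> Credentials:
    # Hypothetical stand-in: a real implementation would query the metadata
    # endpoint and return temporary credentials for the pod's role.
    return Credentials("EXAMPLE-KEY-ID", "example-secret", "example-token", source="iam-role")


if __name__ == "__main__":
    print(resolve_credentials("", "", fake_role_fetcher).source)
    print(resolve_credentials("my-key", "my-secret", fake_role_fetcher).source)
```

A client that always passes the configured key pair never reaches the fallback branch. That would match the "Access Key Id you provided does not exist" error when the keys are placeholders.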

gosu missing in mysql image

$ kubectl -n kubeflow logs pod/mysql-6995764585-tntqj
2020-02-19 13:57:28+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.6.47-1debian9 started.
2020-02-19 13:57:28+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
/usr/local/bin/docker-entrypoint.sh: line 354: exec: gosu: not found

Is there a reason this custom su is used?

Deployment of ml-pipeline with Kubeflow pipeline version 1.0.3 with IAM role is not working.

When I run the command:
kubectl kustomize overlay/${VARIANT} > kubeflow-pipelines-aws.yaml

I get a config merge error:

I1027 20:05:10.994651 40204 log.go:172] well-defined vars that were never replaced: mlmdDb,pipelinePath,awsIAMAnnotationKey,dbHost,dbPort,kfp-app-name,kfp-app-version,artifactRepositoryKeyPrefix,groupConcatMaxLen,kfp-namespace,mysqlStorageType,cacheDb,kfp-container-runtime-executor,mysqlPvcName,mysqlStorageSize,pipelineDb,artifactRepositoryBucket,awsIAMRole,awsRegion,kfp-artifact-bucket-name,mysqlStorageClass
Error: 80 is expected to be string
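
The issue does not show which file contains the offending value. One plausible cause (an assumption) is a port written as an unquoted number in a field that must be a string, since YAML parses an unquoted 80 as an integer. The sketch below shows a check for that on an already-parsed mapping. The function names and keys are hypothetical.

```python
def find_non_string_values(data: dict) -> list:
    """Return the keys whose values are not strings, in sorted order."""
    return sorted(key for key, value in data.items() if not isinstance(value, str))


def stringify_values(data: dict) -> dict:
    """Return a copy with every value converted to a string (the equivalent of quoting it in YAML)."""
    return {key: str(value) for key, value in data.items()}


if __name__ == "__main__":
    # Hypothetical parsed config: `port: 80` in YAML yields an int, `port: "80"` yields a str.
    parsed = {"bucket": "my-bucket", "port": 80}
    print(find_non_string_values(parsed))
    print(find_non_string_values(stringify_values(parsed)))
```

Under that assumption, quoting the value in the overlay (for example "80") is the change that corresponds to stringify_values here.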

1.5.0 support and collaboration on ArgoCD-based AWS rollout

Firstly, thanks a lot for this community effort! We are currently running several KF 1.2 distributions on AWS, where we integrate as tightly as possible with AWS managed services, and your solution for S3/RDS in Pipelines is an important part of that. We offload on-cluster services to AWS as far as possible. This means in particular:

  • Using Cognito for user pool management
  • Using RDS for all metadata storage (pipelines, metadb, cachedb, katibdb) instead of an on-cluster MySQL db
  • Using S3 for all pipeline artifact storage instead of using Minio for on-cluster storage
  • Using Secrets Manager in combination with https://github.com/external-secrets/kubernetes-external-secrets to manage secrets (for example, RDS database credentials)
  • Using IRSA to manage granular pod-level access to AWS resources

In addition, we use ArgoCD as our GitOps operator and try to avoid any middleware such as kfctl / Kubeflow Operator when rolling out. We have started a new community effort that aims to do that for Kubeflow 1.3 here: https://github.com/argoflow/argoflow-aws (currently still under construction). We plan to integrate the solution that you have developed here as well, and would also welcome any direct contributions from you!

Most important for us right now, though, is that KF 1.3 uses Pipelines 1.5.0. Your latest version is for Pipelines 1.4.1. Are you aware of any changes that are necessary to upgrade to 1.5.0, and do you intend to upgrade soon?

mysql-related Kubernetes resource errors

The Deployment "mysql" is invalid: spec.template.metadata.labels: Invalid value: map[string]string(nil): `selector` does not match template `labels`

and

error validating data: ValidationError(Deployment.spec.template.spec.containers[0].env[1]): unknown field "secretKeyRef" in io.k8s.api.core.v1.EnvVar; if you choose to ignore these errors, turn validation off with --validate=false

Fixed by #5
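
Both errors can be reproduced as simple structural checks on a Deployment manifest. The sketch below assumes the manifest is already parsed into a dict. It checks that every selector label appears in the pod template's labels, and that each env entry uses only the fields name, value, and valueFrom (secretKeyRef belongs under valueFrom). The function names and the sample manifest are hypothetical.

```python
ALLOWED_ENV_FIELDS = {"name", "value", "valueFrom"}


def selector_matches_template(deployment: dict) -> bool:
    """True if every spec.selector.matchLabels pair is present in the template labels."""
    spec = deployment.get("spec") or {}
    match_labels = (spec.get("selector") or {}).get("matchLabels") or {}
    template_meta = (spec.get("template") or {}).get("metadata") or {}
    template_labels = template_meta.get("labels") or {}
    return all(template_labels.get(k) == v for k, v in match_labels.items())


def misplaced_env_fields(deployment: dict) -> list:
    """List env fields that are not valid directly on an EnvVar entry."""
    pod_spec = ((deployment.get("spec") or {}).get("template") or {}).get("spec") or {}
    problems = []
    for ci, container in enumerate(pod_spec.get("containers") or []):
        for ei, env in enumerate(container.get("env") or []):
            for field in sorted(set(env) - ALLOWED_ENV_FIELDS):
                problems.append(f"containers[{ci}].env[{ei}].{field}")
    return problems


if __name__ == "__main__":
    # Hypothetical manifest with both problems: no template labels, and
    # secretKeyRef placed directly on the env entry instead of under valueFrom.
    broken = {
        "spec": {
            "selector": {"matchLabels": {"app": "mysql"}},
            "template": {
                "metadata": {},
                "spec": {"containers": [{"name": "mysql", "env": [
                    {"name": "A", "value": "x"},
                    {"name": "B", "secretKeyRef": {"name": "example-secret", "key": "password"}},
                ]}]},
            },
        }
    }
    print(selector_matches_template(broken))
    print(misplaced_env_fields(broken))
```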

`kustomize overlay/accesskey` returns error due to mismatched apiversion

When I run:

kubectl kustomize overlay/accesskey

I get:

Error: failed to find an object with apps_v1beta2_Deployment|ml-pipeline-ui to apply the patch

I was able to fix this problem by updating the ml-pipeline-ui and ml-pipeline definitions in accesskey/aws-configurations-patch.yaml to use apiVersion apps/v1.
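
The error message suggests the patch is looked up by an identifier built from group, version, kind, and name. The sketch below assumes strict matching on that identifier, which simplifies what kustomize actually does, and shows why a patch declaring apps/v1beta2 would not find a base resource declared as apps/v1. The function names are hypothetical; the identifier format is copied from the error above.

```python
def resource_id(obj: dict) -> str:
    """Build an identifier in the form shown by the error: group_version_Kind|name."""
    group, _, version = obj["apiVersion"].rpartition("/")
    return f"{group}_{version}_{obj['kind']}|{obj['metadata']['name']}"


def unmatched_patches(resources: list, patches: list) -> list:
    """Return the identifiers of patches that have no base resource with the same identifier."""
    known = {resource_id(r) for r in resources}
    return [resource_id(p) for p in patches if resource_id(p) not in known]


if __name__ == "__main__":
    base = [{"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "ml-pipeline-ui"}}]
    old_patch = [{"apiVersion": "apps/v1beta2", "kind": "Deployment", "metadata": {"name": "ml-pipeline-ui"}}]
    new_patch = [{"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "ml-pipeline-ui"}}]
    print(unmatched_patches(base, old_patch))
    print(unmatched_patches(base, new_patch))
```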
