
greenplum-for-kubernetes's Introduction

Greenplum for Kubernetes

This repo houses the Helm charts as well as code and resources that go into the Docker images for a Greenplum on Kubernetes release.

git clone git@github.com:greenplum-db/greenplum-for-kubernetes.git


Run Unit and Lint Tests

# runs only unit tests
make unit

# runs only linting
make lint

# runs unit and lint
make check

Deploy Greenplum for Kubernetes on Minikube

Setup

Install Minikube and its prerequisites by following the Minikube installation instructions. (If you run into problems, run minikube delete to delete the current minikube cluster and then recreate it.)

On macOS, use the HyperKit driver with minikube; it is much faster:

brew install docker-machine-driver-hyperkit
# Make sure to follow the `sudo chown` commands that brew outputs
minikube config set vm-driver hyperkit

Start minikube as shown below. You can lower the resources if needed, but we recommend these settings for better performance and to allow larger clusters.

minikube start --memory 8192 --cpus 8 --kubernetes-version=v1.16.7 --disk-size 80g

Confirm kubectl can access the minikube cluster. Example output (your AGE and VERSION values will differ):

kubectl get nodes

NAME       STATUS    ROLES     AGE       VERSION
minikube   Ready     master    21d       v1.10.0

Authenticate with gcloud to pull the required images from gcr.io:

gcloud auth login
gcloud auth configure-docker

Build

Run the following command to point your local Docker client at minikube's Docker daemon. You only need to run it once per shell:

eval $(minikube docker-env)

(Note: to undo this Docker setting in the current shell, run eval "$(minikube docker-env -u)", or eval "$(docker-machine env -u)" if you have docker-machine installed.)
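Under the hood, eval $(minikube docker-env) exports Docker client variables such as DOCKER_HOST into the current shell, which is why the effect lasts only for that shell. The Go sketch below parses that bash-style output into a map, for example so a script can check which daemon it is about to build against. It assumes the `export KEY="VALUE"` line format; the sample values in `main` are made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseDockerEnv collects KEY -> VALUE pairs from bash-style
// `export KEY="VALUE"` lines, skipping blank lines and # comments.
// The input format is an assumption about what `minikube docker-env` prints.
func parseDockerEnv(output string) map[string]string {
	vars := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		line = strings.TrimPrefix(line, "export ")
		key, value, ok := strings.Cut(line, "=")
		if !ok {
			continue // not an assignment; ignore it
		}
		vars[key] = strings.Trim(value, `"`)
	}
	return vars
}

func main() {
	// Hypothetical sample; real values depend on your minikube VM.
	sample := `export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.64.2:2376"
# comment lines are ignored`
	env := parseDockerEnv(sample)
	fmt.Println(env["DOCKER_HOST"]) // tcp://192.168.64.2:2376
}
```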

Then build the Greenplum images and upload them to minikube's Docker registry:

make docker

The Greenplum image should now be tagged greenplum-for-kubernetes:<version>, as shown below:

$ docker images | grep greenplum
greenplum-operator                                  container-structure-test-in-docker                  2de1e0fa79b1        8 minutes ago       928MB
greenplum-operator                                  latest                                              24ca53fb90aa        8 minutes ago       111MB
greenplum-operator                                  v2.0.0-alpha.2.dev.7.gee6a7552                      24ca53fb90aa        8 minutes ago       111MB
greenplum-instance                                  container-structure-test-in-docker                  b90a82a6a746        10 minutes ago      928MB
greenplum-for-kubernetes                            latest                                              a2076cf10832        10 minutes ago      2.13GB
greenplum-for-kubernetes                            v2.0.0-alpha.2.dev.7.gee6a7552                      a2076cf10832        10 minutes ago      2.13GB
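The version tag in this sample output (v2.0.0-alpha.2.dev.7.gee6a7552) resembles `git describe` output: a base version, a commit count, and a g-prefixed short commit SHA. Assuming that shape holds for other builds (an inference from this one example, not a documented format), a small Go sketch can split such a tag into its parts:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// devTag matches tags shaped like <base>.dev.<commits>.g<short sha>.
// The shape is inferred from a single sample tag; treat it as an assumption.
var devTag = regexp.MustCompile(`^(.+)\.dev\.(\d+)\.g([0-9a-f]+)$`)

type devVersion struct {
	Base    string // e.g. v2.0.0-alpha.2
	Commits int    // commits since the base version
	SHA     string // abbreviated commit hash
}

// parseDevTag reports false for tags such as "latest" that do not match.
func parseDevTag(tag string) (devVersion, bool) {
	m := devTag.FindStringSubmatch(tag)
	if m == nil {
		return devVersion{}, false
	}
	n, err := strconv.Atoi(m[2])
	if err != nil {
		return devVersion{}, false
	}
	return devVersion{Base: m[1], Commits: n, SHA: m[3]}, true
}

func main() {
	v, ok := parseDevTag("v2.0.0-alpha.2.dev.7.gee6a7552")
	fmt.Println(ok, v.Base, v.Commits, v.SHA) // true v2.0.0-alpha.2 7 ee6a7552
}
```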

To clean up images:

# clean files and images directly created by our Makefile
make clean

# clean all the dangling docker images
make docker-clean

Deploy

Our make deploy target creates a regsecret that the pods use to pull images. This requires a GCP service account key: either place the key in a file named key.json inside the operator directory, or set the GCP_SVC_ACCT_KEY environment variable.

To install a Greenplum cluster in the minikube environment, run:

make deploy

You can access the Greenplum cluster with psql running in master-0:

kubectl exec -it master-0 -- bash -c "source /usr/local/greenplum-db/greenplum_path.sh; psql"

If you want to access the Greenplum service from outside minikube and you have a compatible psql executable in your PATH, run:

PGPORT=$(minikube service --format "{{.Port}}" --url greenplum) \
  PGHOST=$(minikube service --format "{{.IP}}" --url greenplum) \
  bash -c 'psql -U gpadmin -h $PGHOST -p $PGPORT'
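The same connection can be expressed as a single libpq connection URI, which psql accepts in place of separate -h/-p/-U flags. This Go sketch builds one from the host and port that the minikube service commands above print; the values in `main` are placeholders.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
)

// greenplumURI builds a libpq connection URI for the gpadmin user.
// net.JoinHostPort adds brackets around IPv6 hosts automatically.
func greenplumURI(host, port string) string {
	u := url.URL{
		Scheme: "postgresql",
		User:   url.User("gpadmin"),
		Host:   net.JoinHostPort(host, port),
	}
	return u.String()
}

func main() {
	// Hypothetical values; use what minikube prints on your machine.
	fmt.Println(greenplumURI("192.168.64.2", "31000")) // postgresql://gpadmin@192.168.64.2:31000
}
```

The resulting string can then be passed directly, as in psql "postgresql://gpadmin@<host>:<port>".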

To remove the Greenplum deployment:

make deploy-clean

Integration Test

The integration tests work by deploying Greenplum for Kubernetes and running checks against the cluster. Make sure you have performed the Build step before attempting to run the integration tests.

To run integration tests on minikube, execute the following command:

make integration

To run a specific integration test suite (from greenplum-operator/integration/), run:

make -C greenplum-operator integration-<suite> # e.g. `integration-ha`

Deploy Greenplum for Kubernetes on GKE

Setup

Create a cluster in GKE either with the web console or on the command line.

Run the following command to set the GKE cluster as your kubectl context:

gcloud container clusters get-credentials <your-cluster-name> --project <your-project> --zone <cluster-zone>

Confirm kubectl can access the GKE cluster.

kubectl get nodes

NAME                                  STATUS   ROLES    AGE   VERSION
gke-test-default-pool-dba9314c-6xkg   Ready    <none>   20h   v1.15.9-gke.24
gke-test-default-pool-dba9314c-gjxc   Ready    <none>   20h   v1.15.9-gke.24
gke-test-default-pool-dba9314c-s83j   Ready    <none>   20h   v1.15.9-gke.24
gke-test-default-pool-dba9314c-tz4v   Ready    <none>   20h   v1.15.9-gke.24
gke-test-default-pool-dba9314c-wwc6   Ready    <none>   20h   v1.15.9-gke.24
gke-test-default-pool-dba9314c-x1cq   Ready    <none>   20h   v1.15.9-gke.24
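Beyond eyeballing the table, a script can check that every node reports Ready. This Go sketch parses the default kubectl get nodes table shown above; it treats the first line as the header and the second column as STATUS. For anything beyond a quick check, kubectl's structured output (for example -o json) is more robust than parsing columns.

```go
package main

import (
	"fmt"
	"strings"
)

// notReadyNodes parses the default table printed by `kubectl get nodes`
// and returns the names of nodes whose STATUS is not exactly "Ready".
// Cordoned nodes ("Ready,SchedulingDisabled") are reported too.
func notReadyNodes(output string) []string {
	var bad []string
	lines := strings.Split(strings.TrimSpace(output), "\n")
	for i, line := range lines {
		fields := strings.Fields(line)
		if i == 0 || len(fields) < 2 {
			continue // skip the header row and blank lines
		}
		if fields[1] != "Ready" {
			bad = append(bad, fields[0])
		}
	}
	return bad
}

func main() {
	out := `NAME                                  STATUS   ROLES    AGE   VERSION
gke-test-default-pool-dba9314c-6xkg   Ready    <none>   20h   v1.15.9-gke.24
gke-test-default-pool-dba9314c-gjxc   Ready    <none>   20h   v1.15.9-gke.24`
	fmt.Println(len(notReadyNodes(out)) == 0) // true
}
```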

Authenticate with gcloud to pull the required images from gcr.io:

gcloud auth login
gcloud auth configure-docker

Build

The following command builds the Greenplum Docker images locally and then uploads them to GCR so they can be used on your GKE cluster:

make gke-docker

The images should now be tagged locally as greenplum-for-kubernetes:<version>, with the GCR copies tagged dev-<image ID>, as shown below:

$ docker images | grep greenplum
greenplum-operator                                  container-structure-test-in-docker                  2de1e0fa79b1        8 minutes ago       928MB
greenplum-operator                                  latest                                              24ca53fb90aa        8 minutes ago       111MB
greenplum-operator                                  v2.0.0-alpha.2.dev.7.gee6a7552                      24ca53fb90aa        8 minutes ago       111MB
gcr.io/gp-kubernetes/greenplum-operator             dev-24ca53fb90aa                                    24ca53fb90aa        8 minutes ago       111MB
greenplum-instance                                  container-structure-test-in-docker                  b90a82a6a746        10 minutes ago      928MB
greenplum-for-kubernetes                            latest                                              a2076cf10832        10 minutes ago      2.13GB
greenplum-for-kubernetes                            v2.0.0-alpha.2.dev.7.gee6a7552                      a2076cf10832        10 minutes ago      2.13GB
gcr.io/gp-kubernetes/greenplum-for-kubernetes       dev-a2076cf10832                                    a2076cf10832        10 minutes ago      2.13GB

Running make gke-docker also creates a file at /tmp/.gp-kubernetes_gke_image_tags containing the tags of the built images. The make gke-deploy target uses this file to reference the correct images.
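In the sample output above, each GCR copy is tagged dev-<image ID>, using the same 12-character short ID that docker images shows. Assuming that pattern (observed in this output, not documented), the Go sketch below reconstructs the GCR reference for an image. It does not read /tmp/.gp-kubernetes_gke_image_tags, whose format is not described here.

```go
package main

import (
	"fmt"
	"strings"
)

// gcrDevRef builds gcr.io/gp-kubernetes/<name>:dev-<short id>.
// Assumption: the tag uses the 12-character short image ID, as in the
// sample output; the Makefile may compute tags differently.
func gcrDevRef(name, imageID string) string {
	id := strings.TrimPrefix(imageID, "sha256:")
	if len(id) > 12 {
		id = id[:12]
	}
	return fmt.Sprintf("gcr.io/gp-kubernetes/%s:dev-%s", name, id)
}

func main() {
	fmt.Println(gcrDevRef("greenplum-operator", "24ca53fb90aa"))
	// gcr.io/gp-kubernetes/greenplum-operator:dev-24ca53fb90aa
}
```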

To clean up images:

# clean files and images directly created by our Makefile
make clean

# clean all the dangling docker images
make docker-clean

Deploy

Our make gke-deploy target creates a regsecret that the pods use to pull images. This requires a GCP service account key: either place the key in a file named key.json inside the operator directory, or set the GCP_SVC_ACCT_KEY environment variable.

To install a Greenplum cluster in the GKE environment, run the command below. It configures the operator to use the Docker images built by the gke-docker target.

make gke-deploy

You can access the Greenplum cluster with psql directly through the master pod with:

kubectl exec -it master-0 -- bash -c "source /usr/local/greenplum-db/greenplum_path.sh; psql"

To remove the Greenplum deployment:

make gke-deploy-clean

Integration Test

The integration tests work by deploying Greenplum for Kubernetes and running checks against the cluster. Make sure you have performed the Build step before attempting to run the integration tests.

To run integration tests on GKE, run the following command:

make gke-integration

To run a specific integration test suite (from greenplum-operator/integration/), run:

make -C greenplum-operator gke-integration-<suite> # e.g. `gke-integration-ha`

greenplum-for-kubernetes's People

Contributors

dependabot[bot], khuddlefish

greenplum-for-kubernetes's Issues

How's it going?

I am very interested in this project, but it seems that it is no longer being updated.

How can I find the Docker images?

docker pull gcr.io/gp-kubernetes/greenplum-operator:latest
Error response from daemon: manifest for gcr.io/gp-kubernetes/greenplum-operator:latest not found: manifest unknown: Failed to fetch "latest" from request "/v2/gp-kubernetes/greenplum-operator/manifests/latest".

Docker build is failing

I added a GitHub Actions workflow to run the Docker build directly. It runs on a fork, hence the PlaidCloud name in the paths.

The main culprit seems to be this error: not enough arguments in call to k8scsr.WaitForCertificate

The build process progresses to a point but the following error is raised:

#15 55.45 # github.com/pivotal/greenplum-for-kubernetes/greenplum-operator/pkg/admission
#15 55.45 ../../pkg/admission/certificate.go:96:108: not enough arguments in call to k8scsr.WaitForCertificate
#15 55.45 	have (context.Context, "k8s.io/client-go/kubernetes/typed/certificates/v1beta1".CertificateSigningRequestInterface, *"k8s.io/api/certificates/v1beta1".CertificateSigningRequest)
#15 55.45 	want (context.Context, kubernetes.Interface, string, types.UID)
#15 ERROR: executor failed running [/bin/sh -c cd /greenplum-for-kubernetes/greenplum-operator/cmd/greenplumOperator && go build]: exit code: 2
------
 > [build-in-docker 6/6] RUN --mount=type=cache,target=/root/.cache/go-build     --mount=type=cache,target=/go/pkg/mod     cd /greenplum-for-kubernetes/greenplum-operator/cmd/greenplumOperator && go build:
#15 3.854 go: downloading github.com/go-openapi/jsonreference v0.19.6
#15 3.860 go: downloading github.com/mailru/easyjson v0.7.6
#15 3.882 go: downloading github.com/PuerkitoBio/purell v1.1.1
#15 3.882 go: downloading github.com/go-openapi/jsonpointer v0.19.5
#15 3.894 go: downloading github.com/josharian/intern v1.0.0
#15 3.895 go: downloading github.com/PuerkitoBio/urlesc v0.0.0-20170810143723-de5bf2ad4578
#15 55.45 # github.com/pivotal/greenplum-for-kubernetes/greenplum-operator/pkg/admission
#15 55.45 ../../pkg/admission/certificate.go:96:108: not enough arguments in call to k8scsr.WaitForCertificate
#15 55.45 	have (context.Context, "k8s.io/client-go/kubernetes/typed/certificates/v1beta1".CertificateSigningRequestInterface, *"k8s.io/api/certificates/v1beta1".CertificateSigningRequest)
#15 55.45 	want (context.Context, kubernetes.Interface, string, types.UID)
------
ERROR: failed to solve: executor failed running [/bin/sh -c cd /greenplum-for-kubernetes/greenplum-operator/cmd/greenplumOperator && go build]: exit code: 2
Error: buildx failed with: ERROR: failed to solve: executor failed running [/bin/sh -c cd /greenplum-for-kubernetes/greenplum-operator/cmd/greenplumOperator && go build]: exit code: 2

Performance Issues

We hadn't done real-world testing until recently because we thought we had a disk performance issue, so we put a replicated local-disk solution (Piraeus) in place. Unfortunately, even with faster disks, our GP4K clusters are nowhere near as fast as our stand-alone Greenplum cluster running on VMs, even on identical hardware. Networking speed could be a factor, but our gpcheckperf numbers are close between the two clusters. We have tried various gpconfig settings and segment counts. Those may make a difference on much larger test sets, but for our relatively small 3.4 million row x 63 column table (AO columnar), nothing seems to make it faster.

I've compared the configuration values between the two clusters and adjusted a few settings in GP4K, with minimal improvement.

We narrowed the slow performance down to a fairly simple representative query. It includes a subquery, but removing the subquery does not noticeably improve speed.

SELECT 
CAST(t."CURRENCY__LOCAL" AS TEXT) AS "CURRENCY__LOCAL", 
CAST(count(*) AS BIGINT) AS cnt, 
CAST(
	(
		SELECT count(distinct(st."CURRENCY__LOCAL")) AS count_1 
		FROM my_schema.my_table as st
	) AS BIGINT
) AS unique_values 
FROM my_schema.my_table as t
GROUP BY t."CURRENCY__LOCAL" 
ORDER BY CAST(count(*) AS BIGINT) DESC, CAST(t."CURRENCY__LOCAL" AS TEXT) ASC  
LIMIT 1000;

Here is a sample of our test results (averages over 5 runs):

  • Greenplum cluster (CentOS 7 with Greenplum v6.11.2 w/ Mirrors)
    • Optimizer OFF - 284ms
    • Optimizer ON - 197ms
  • GP4K (Ubuntu 22.04 with Greenplum 7 Beta 0 w/o Mirrors)
    • Optimizer OFF - 1933ms
    • Optimizer ON - 1624ms

Network and disk speeds are very close because both clusters run on the same GCP hardware and read from local disks. We are using GKE Dataplane V2, so networking uses the Cilium eBPF data path. After two weeks of testing, GP4K being roughly 7 to 8 times slower (6.8x with the optimizer off, 8.2x with it on) remains a complete mystery. We can't find a root cause, but it is consistently much slower regardless of settings, segment counts, physical hardware, or whether we use local disks or Google SSD PVs.
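Expressed as ratios, the averages above work out to GP4K taking about 6.8x as long with the optimizer off and about 8.2x as long with it on. A minimal Go check using only the figures reported in this issue:

```go
package main

import "fmt"

// slowdown returns how many times longer the GP4K query took than the
// stand-alone query.
func slowdown(standaloneMs, gp4kMs float64) float64 {
	return gp4kMs / standaloneMs
}

func main() {
	// Average query times in milliseconds, as reported above.
	fmt.Printf("optimizer off: %.2fx\n", slowdown(284, 1933)) // optimizer off: 6.81x
	fmt.Printf("optimizer on:  %.2fx\n", slowdown(197, 1624)) // optimizer on:  8.24x
}
```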

We also confirmed there was no excessive skew that would make one segment do much more work. VACUUM ANALYZE of the table also has no discernible impact, since the table hasn't had any DDL operations after the initial load.

The nodes used for the testing have no other workloads running on them outside of the Piraeus (local disk manager) and GP4K to ensure consistent availability of resources for the tests.

For reference, our gpcheckperf results are below.

Stand-alone Greenplum cluster

Per host transfer rates
gps01 Tx rate: 3725.75
gps03 Tx rate: 3720.34
gps02 Tx rate: 3681.55
gps04 Tx rate: 3557.49

Per host receive rates
gps01 Rx rate: 3274.11
gps03 Rx rate: 2728.79
gps02 Rx rate: 3880.88
gps04 Rx rate: 4801.35


Summary:
sum = 14685.13 MB/sec
min = 617.46 MB/sec
max = 1914.68 MB/sec
avg = 1223.76 MB/sec
median = 1231.51 MB/sec

GP4K Cluster

Per host transfer rates
segment-a-3 Tx rate: 4885.97
segment-a-0 Tx rate: 5511.73
segment-a-2 Tx rate: 5653.75
segment-a-1 Tx rate: 3526.87

Per host receive rates
b'segment-a-0' Rx rate: 5419.50
b'segment-a-1' Rx rate: 3433.28
b'segment-a-2' Rx rate: 5669.18
b'segment-a-3' Rx rate: 5056.36


Summary:
sum = 19578.32 MB/sec
min = 535.32 MB/sec
max = 3945.47 MB/sec
avg = 1223.64 MB/sec
median = 593.41 MB/sec

Disk performance of the two systems (measured with dd):

  • Stand-alone Greenplum = 620MB/s per host (RAID0)
  • GP4K = 603MB/s per host (LVM Pool)

make unit error on macOS ARM

make unit
go install -modfile tools/go.mod github.com/golangci/golangci-lint/cmd/golangci-lint
make -C greenplum-operator controller-gen
go get -modfile ../tools/go.mod sigs.k8s.io/controller-tools/cmd/[email protected]
ginkgo -p -r -skipPackage=integration ./...
make: ginkgo: No such file or directory
make: *** [unit] Error 1

build greenplum-instance image

Hi, I'm trying to build the greenplum-instance image. Where can I find the image gcr.io/gp-kubernetes/ubuntu-gpdb-ent?
