Spanner Autoscaler

Spanner Autoscaler is a Kubernetes Operator that automatically scales Google Cloud Spanner based on the CPU utilization of a Cloud Spanner instance, much as Horizontal Pod Autoscaler does for Pods.

Overview

Cloud Spanner is scalable. When CPU utilization becomes high, we can reduce it by increasing compute capacity.

Spanner Autoscaler reconciles Cloud Spanner compute capacity in the same way as Horizontal Pod Autoscaler: you configure a compute capacity range and a targetCPUUtilization.

When CPU Utilization (High Priority) is above (or below) targetCPUUtilization, Spanner Autoscaler tries to bring it back to the target by calculating the desired compute capacity and then increasing (or decreasing) compute capacity accordingly.
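
The calculation described above can be sketched as follows. This is a minimal illustration that assumes a proportional, Horizontal Pod Autoscaler style formula; the function name and the rounding step are assumptions made for illustration and are not taken from the controller's source code. The rounding reflects Cloud Spanner's rule that compute capacity is a multiple of 100 PU below 1000 PU and a multiple of 1000 PU from 1000 PU upward.

```python
import math


def desired_processing_units(current_pu, current_cpu, target_cpu, min_pu, max_pu):
    """HPA-style estimate of the capacity needed to reach target_cpu (hypothetical helper)."""
    # Scale capacity in proportion to how far utilization is from the target.
    raw = current_pu * current_cpu / target_cpu
    # Round up to a capacity that Cloud Spanner accepts:
    # steps of 100 PU below 1000 PU, steps of 1000 PU otherwise.
    granularity = 100 if raw < 1000 else 1000
    rounded = math.ceil(raw / granularity) * granularity
    # Keep the result inside the configured range.
    return max(min_pu, min(max_pu, rounded))


if __name__ == "__main__":
    # 2000 PU at 90% CPU with a 60% target, range 1000..4000 PU
    print(desired_processing_units(2000, 90, 60, 1000, 4000))
```

Under this sketch, an instance at 2000 PU running at 90% with a target of 60 and a range of 1000 to 4000 PU would be scaled to 3000 PU.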

Cloud Spanner pricing bills any provisioned compute capacity for a minimum of one hour, so Spanner Autoscaler maintains increased compute capacity for about an hour. Spanner Autoscaler has a --scale-down-interval flag (default: 55min) for achieving this.

While scaling down, removing large amounts of compute capacity at once (like 10000 PU -> 1000 PU) can cause a latency increase. Therefore, Spanner Autoscaler decreases the compute capacity in steps to avoid such large disruptions. This step size can be provided with the scaledownStepSize parameter (default: 2000 PU).
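
The two scale-down safeguards above (the interval and the step size) can be sketched together as below. This is an illustration only: the function name is hypothetical, and measuring the interval from the last scale-up is an assumption rather than a statement about the controller's internals. The defaults mirror the documented values of 55 minutes and 2000 PU.

```python
from datetime import datetime, timedelta


def next_processing_units(current_pu, desired_pu, last_scale_up, now,
                          step_size=2000, interval=timedelta(minutes=55)):
    """Pick the capacity to apply now, limiting how fast capacity is removed."""
    if desired_pu >= current_pu:
        # Scaling up (or staying put) is applied without delay.
        return desired_pu
    if now - last_scale_up < interval:
        # Assumed rule: hold capacity until the interval has elapsed.
        return current_pu
    # Remove at most step_size PU in a single scale-down.
    return max(desired_pu, current_pu - step_size)


if __name__ == "__main__":
    scaled_up_at = datetime(2024, 1, 1, 12, 0)
    one_hour_later = scaled_up_at + timedelta(hours=1)
    print(next_processing_units(10000, 1000, scaled_up_at, one_hour_later))
```

With these defaults, a move from 10000 PU toward 1000 PU is applied as 8000 PU first, and the remaining reduction happens in later reconciliations.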

Scheduled scaling feature

If batch jobs or other compute-intensive tasks run periodically on Cloud Spanner, you can raise the scaling range for a specified duration only. For example, the following SpannerAutoscaleSchedule adds 600 extra Processing Units to the Spanner instance every day at 02:00, for 3 hours:

apiVersion: spanner.mercari.com/v1beta1
kind: SpannerAutoscaleSchedule
metadata:
  name: spannerautoscaleschedule-sample
  namespace: your-namespace
spec:
  targetResource: spannerautoscaler-sample
  additionalProcessingUnits: 600
  schedule:
    cron: "0 2 * * *"
    duration: 3h
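
To make the schedule semantics concrete, the sketch below checks whether a given moment falls inside the daily window defined by the sample above (start at 02:00, duration 3h). It is a simplified model: it handles only daily schedules with a duration under 24 hours, it does not parse cron expressions, and the function name is hypothetical.

```python
from datetime import datetime, timedelta


def is_schedule_active(now, start_hour, start_minute, duration):
    """Return True if `now` is inside a daily window starting at start_hour:start_minute."""
    start_today = now.replace(hour=start_hour, minute=start_minute,
                              second=0, microsecond=0)
    # The relevant window may have started today, or yesterday if it crosses midnight.
    for start in (start_today, start_today - timedelta(days=1)):
        if start <= now < start + duration:
            return True
    return False


if __name__ == "__main__":
    window = timedelta(hours=3)
    # Extra capacity from the sample applies only while the window is active.
    extra = 600 if is_schedule_active(datetime(2024, 1, 1, 3, 30), 2, 0, window) else 0
    print(extra)
```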

Installation

Spanner Autoscaler can be installed using kpt in 2 steps:

  1. Deploy the operator through kpt

    $ kpt pkg get https://github.com/mercari/spanner-autoscaler/config spanner-autoscaler-pkg
    $ kpt live init spanner-autoscaler-pkg/kpt
    $ kpt live install-resource-group
    
    ## Append '--dry-run' to the below line to just
    ## check the resources which will be created
    $ kustomize build spanner-autoscaler-pkg/kpt | kpt live apply -
    
    ## To uninstall, use the following
    $ kustomize build spanner-autoscaler-pkg/kpt | kpt live destroy -

    ℹ️ TIP: Instead of kpt, you can also use kubectl directly to install the resources (use ?ref=master for latest version) as follows:

    $ kustomize build "https://github.com/mercari/spanner-autoscaler.git/config/default?ref=v0.4.1" | kubectl apply -f -

    These resources can then be adopted by kpt by using the --inventory-policy=adopt flag while using kpt live apply command. More info.

  2. Create a Custom Resource for managing a spanner instance

    $ kubectl apply -f spanner-autoscaler-pkg/samples

    Examples of CustomResources can be found below.
    For authentication using a GCP service account JSON key, follow these steps to create a k8s secret with credentials.

CRD reference

Examples

Single Service Account using Workload Identity:

apiVersion: spanner.mercari.com/v1beta1
kind: SpannerAutoscaler
metadata:
  name: spannerautoscaler-sample
  namespace: your-namespace
spec:
  targetInstance:
    projectId: your-gcp-project-id
    instanceId: your-spanner-instance-id
  scaleConfig:
    processingUnits:
      min: 1000
      max: 4000
    scaledownStepSize: 1000
    targetCPUUtilization:
      highPriority: 60

Using Service Account JSON key for each SpannerAutoscaler:

  apiVersion: spanner.mercari.com/v1beta1
  kind: SpannerAutoscaler
  metadata:
    name: spannerautoscaler-sample
    namespace: your-namespace
  spec:
    targetInstance:
      projectId: your-gcp-project-id
      instanceId: your-spanner-instance-id
+   authentication:
+     iamKeySecret:
+       namespace: your-namespace
+       name: spanner-autoscaler-gcp-sa
+       key: service-account
    scaleConfig:
      processingUnits:
        min: 1000
        max: 4000
      scaledownStepSize: 1000
      targetCPUUtilization:
        highPriority: 60

Using Service Accounts with Workload Identity and impersonation:

  apiVersion: spanner.mercari.com/v1beta1
  kind: SpannerAutoscaler
  metadata:
    name: spannerautoscaler-sample
    namespace: your-namespace
  spec:
    targetInstance:
      projectId: your-gcp-project-id
      instanceId: your-spanner-instance-id
+   authentication:
+     impersonateConfig:
+       targetServiceAccount: GSA_SPANNER@TENANT_PROJECT.iam.gserviceaccount.com
    scaleConfig:
      processingUnits:
        min: 1000
        max: 4000
      scaledownStepSize: 1000
      targetCPUUtilization:
        highPriority: 60

GCP Setup

On your GCP project, you will need to enable spanner.googleapis.com and monitoring.googleapis.com APIs.

Create service account

You will need to create at least one GCP service account, which will be used by the spanner-autoscaler controller to authenticate with GCP for modifying compute capacity of a Spanner instance. This service account should have the following roles:

  • roles/spanner.admin (on the Spanner instances)
  • roles/monitoring.viewer (on the project)

For fine-grained access control, you should create one GCP service account per Spanner instance. This way, you will be able to specify a different service account in each of the SpannerAutoscaler resources you create later.

Authenticate with service account JSON key

Generate a JSON key for the GCP service account (created above) and put it in a Kubernetes Secret:

$ kubectl create secret generic spanner-autoscaler-gcp-sa --from-file=service-account=./service-account-key.json -n your-namespace

ℹ️ By default, spanner-autoscaler will have read access to secrets named spanner-autoscaler-gcp-sa in any namespace. If you wish to use a different name for your secret, then you need to explicitly create a Role and a RoleBinding (example) in your namespace. This will provide spanner-autoscaler with read access to any secret of your choice.

You can then refer to this secret in your SpannerAutoscaler resource with the authentication.iamKeySecret field (serviceAccountSecretRef in v1alpha1) [example].

[Optional] Advanced methods for GCP authentication

Following are some other advanced methods which can also be used for GCP authentication:

    Enable Workload Identity

    You can configure the controller (spanner-autoscaler-controller-manager) to use the GKE Workload Identity feature for key-less GCP access. Steps to do this:

    1. Enable Workload Identity on the GKE cluster - Ref.
    2. Let's refer to the Kubernetes service account of the controller (spanner-autoscaler/spanner-autoscaler-controller-manager) as KSA_CONTROLLER and to the GCP service account created above as GSA_CONTROLLER.
      Now configure Workload Identity between KSA_CONTROLLER and GSA_CONTROLLER with the following steps:
      1. Allow KSA_CONTROLLER to impersonate GSA_CONTROLLER by creating an IAM policy binding:
        $ gcloud iam service-accounts add-iam-policy-binding --role roles/iam.workloadIdentityUser --member "serviceAccount:PROJECT_ID.svc.id.goog[spanner-autoscaler/spanner-autoscaler-controller-manager]" GSA_CONTROLLER@PROJECT_ID.iam.gserviceaccount.com
      2. Annotate the Kubernetes service account with the GCP service account:
        $ kubectl annotate serviceaccount --namespace spanner-autoscaler spanner-autoscaler-controller-manager iam.gke.io/gcp-service-account=GSA_CONTROLLER@PROJECT_ID.iam.gserviceaccount.com

    Single service account with Workload Identity

    The Kubernetes service account which is used for running the spanner-autoscaler controller can be bound to the GCP service account (created above) through Workload Identity. If this is done, there is no need to provide the iamKeySecret or impersonateConfig authentication parameters in the spec section of the SpannerAutoscaler resources.

    An example for this is shown here.

    Using service accounts with Workload Identity and Impersonation

    In this method there are 3 service accounts involved (2 GCP service accounts and 1 Kubernetes service account):

    • GSA_SPANNER: The GCP Service Account (created above) which has the correct permissions for modifying Spanner compute capacity
    • GSA_CONTROLLER: The GCP Service Account which is used for Workload Identity with the GKE cluster
    • KSA_CONTROLLER: The Kubernetes Service Account which is used for running the spanner-autoscaler controller pod in the GKE cluster

    After enabling Workload Identity between GSA_CONTROLLER and KSA_CONTROLLER, you can grant GSA_CONTROLLER the roles/iam.serviceAccountTokenCreator role on the GSA_SPANNER service account as follows:

    $ gcloud iam service-accounts add-iam-policy-binding $GSA_SPANNER --member=serviceAccount:$GSA_CONTROLLER --role=roles/iam.serviceAccountTokenCreator

    This will allow KSA_CONTROLLER to use GSA_CONTROLLER and impersonate (act as) GSA_SPANNER for a short period of time (by using a short-lived token). An example for this can be found here.

    TIP: Custom role with minimum permissions

    Instead of predefined roles, you can define and use a custom role with fewer privileges for Spanner Autoscaler. To scale the target Cloud Spanner instance, the least privileged predefined role is roles/spanner.admin. To observe the CPU usage metric in the project of the Spanner instance, the least privileged predefined role is roles/monitoring.viewer.
    The custom role can be created with just the following permissions:

    • spanner.instances.get
    • spanner.instances.update
    • monitoring.timeSeries.list

Development and Contribution

See docs/development.md and CONTRIBUTING.md respectively.

ℹ️ Migration from 0.3.0 to 0.4.0:

The older version 0.3.0 (with apiVersion: spanner.mercari.com/v1alpha1) is now deprecated in favor of 0.4.0 (with apiVersion: spanner.mercari.com/v1beta1).

Version 0.4.0 is backward compatible with 0.3.0, but there is a restructuring of the SpannerAutoscaler resource definition and names of many fields have changed. Thus it is recommended to go through the SpannerAutoscaler CRD reference and replace v1alpha1 resources with v1beta1 spec definition.
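
As a rough aid for rewriting manifests by hand, the sketch below maps a v1alpha1 spec to the v1beta1 layout using only the field names that appear in the examples in this document. It is not the operator's conversion logic: the function name is hypothetical, authentication fields are not covered, and expressing node counts as Processing Units (1 node = 1000 PU) is one possible choice. Check the CRD reference before applying the result.

```python
PU_PER_NODE = 1000  # one Cloud Spanner node corresponds to 1000 processing units


def alpha_to_beta_spec(alpha_spec):
    """Rewrite a v1alpha1 spec dict into the v1beta1 layout (illustrative only)."""
    beta = {
        "targetInstance": dict(alpha_spec["scaleTargetRef"]),
        "scaleConfig": {
            "processingUnits": {
                "min": alpha_spec["minNodes"] * PU_PER_NODE,
                "max": alpha_spec["maxNodes"] * PU_PER_NODE,
            },
            "targetCPUUtilization": dict(alpha_spec["targetCPUUtilization"]),
        },
    }
    # maxScaleDownNodes is optional in v1alpha1.
    if "maxScaleDownNodes" in alpha_spec:
        beta["scaleConfig"]["scaledownStepSize"] = (
            alpha_spec["maxScaleDownNodes"] * PU_PER_NODE
        )
    return beta


if __name__ == "__main__":
    alpha = {
        "scaleTargetRef": {"projectId": "test-project", "instanceId": "test-instance"},
        "minNodes": 1,
        "maxNodes": 4,
        "maxScaleDownNodes": 2,
        "targetCPUUtilization": {"highPriority": 80},
    }
    print(alpha_to_beta_spec(alpha))
```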

License

Spanner Autoscaler is released under the Apache License 2.0.

⚠️ NOTE:

  1. This project is currently in active development phase and there might be some backward incompatible changes in future versions.
  2. Spanner Autoscaler watches High Priority CPU utilization only. It doesn't watch Low Priority CPU utilization or the rolling 24-hour average utilization.
  3. It doesn't check the storage size or the number of databases either. You must monitor these metrics yourself.

ℹ️ More information and background of spanner-autoscaler is available on this blog!

spanner-autoscaler's People

Contributors

110y, apstndb, dependabot[bot], kaustubhhiware, kazuki-hanai, micnncim, rustycl0ck, shuheiktgw, tjun, tkuchiki, w1mvy

spanner-autoscaler's Issues

Formal timezone support in SpannerAutoscaleSchedule

What you want to add

I think there are two options.

  1. Document the current CRON_TZ support which is handled by github.com/robfig/cron/v3.
  2. Implement a timeZone field, as in Kubernetes CronJob.

Why this is needed

Internal users at Mercari and many external users want to define schedules in their local timezone.
For example, the time zone matters when a schedule must fire at the beginning of a calendar month.
e.g. YYYY-MM-01 00:00:00 JST can be any of YYYY-${MM-1}-{28,29,30,31} 15:00:00 UTC, so it is hard to express in UTC.
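
The difficulty can be seen by converting a few month starts from JST to UTC. The sketch below uses only the Python standard library and a fixed UTC+9 offset for JST (Japan does not observe daylight saving time); the helper name is made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9), "JST")


def month_start_in_utc(year, month, tz=JST):
    """Return the UTC instant at which the given month begins in `tz`."""
    return datetime(year, month, 1, tzinfo=tz).astimezone(timezone.utc)


if __name__ == "__main__":
    # The UTC day-of-month differs from month to month (31, 29, 30 here),
    # so no single cron day-of-month value covers every case.
    for month in (2, 3, 5):
        print(month_start_in_utc(2024, month))
```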

Document least privileged custom role

What you want to add

Document definition of the least privileged role.

Why this is needed

Create a GCP service account for Spanner Autoscaler with roles/spanner.admin and roles/monitoring.viewer.

spanner.admin is a very strong role with database contents access.
We should be able to use a least privileged custom role.

Display current Spanner status when executing "kubectl get spannerautoscaler"

Hello. Thank you for the great tool.

What you want to add

I want to display Spanner's status (current nodes / current CPU) when executing kubectl get spannerautoscaler.

Why this is needed

Currently, kubectl get spannerautoscaler displays only the configured values (target CPU, min nodes, max nodes, etc.).
So if you want to know Spanner's status, you need to open the GCP console or similar.

I want to be able to check both the current status and the configured values at once, like kubectl get hpa.

Since I already have a diff for this feature, can I create a PR for it?

$ kubectl get spannerautoscaler
NAME                 PROJECT ID   INSTANCE ID     MIN NODES   MAX NODES   CURRENT NODES   MIN PUS   MAX PUS   CURRENT PUS   TARGET CPU   CURRENT CPU   AGE
spanner-autoscaler   my-project   test-instance                           0               100       300       100           30           4             3d20h

Spanner Autoscaler Operator fails to guess AuthType from authentication config

What happened

I have a SpannerAutoscaler resource below:

apiVersion: spanner.mercari.com/v1beta1
kind: SpannerAutoscaler
metadata:
  labels:
    app: spanner-autoscaler
    version: main
  name: xxx
  namespace: xxx
spec:
  authentication:
    impersonateConfig:
      targetServiceAccount: "[email protected]"
  scaleConfig:
    processingUnits:
      max: 1000
      min: 100
    scaledownStepSize: 2000
    targetCPUUtilization:
      highPriority: 60
  targetInstance:
    instanceId: xxx
    projectId: xxx

Error:

... -dev/spanner-autoscaler","error":"rpc error: code = PermissionDenied desc = Caller is missing IAM permission spanner.instances.get on resource projects/xxx/instances/xxx.","stacktrace":"github.com/mercari/spanner-autoscaler/internal/syncer.(*syncer).Start\n\t/workspace/internal/syncer/syncer.go:209"} ...

What you expected to happen

Spanner Autoscaler Operator should be able to impersonate the provided service account and fetch metrics from GCP.

How to fix

Add type: impersonation to .spec.authentication:

apiVersion: spanner.mercari.com/v1beta1
kind: SpannerAutoscaler
metadata:
  labels:
    app: spanner-autoscaler
    version: main
  name: xxx
  namespace: xxx
spec:
  authentication:
    impersonateConfig:
      targetServiceAccount: "[email protected]"
    type: impersonation
  scaleConfig:
    processingUnits:
      max: 1000
      min: 100
    scaledownStepSize: 2000
    targetCPUUtilization:
      highPriority: 60
  targetInstance:
    instanceId: xxx
    projectId: xxx
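
The inference the reporter expected can be sketched roughly as below: when type is omitted, derive it from whichever authentication block is present. This is an illustration of the expected behavior, not the operator's code. The function name is hypothetical, and the type values "gcp-sa-key" and "adc" are placeholders assumed for this sketch; only "impersonation" appears in this report.

```python
def guess_auth_type(authentication):
    """Infer the authentication type from a spec.authentication dict (illustrative)."""
    auth = authentication or {}
    # An explicit type always wins.
    if auth.get("type"):
        return auth["type"]
    has_impersonation = "impersonateConfig" in auth
    has_key_secret = "iamKeySecret" in auth
    if has_impersonation and has_key_secret:
        raise ValueError("impersonateConfig and iamKeySecret are mutually exclusive")
    if has_impersonation:
        return "impersonation"
    if has_key_secret:
        return "gcp-sa-key"  # assumed placeholder value
    # Nothing configured: fall back to Application Default Credentials.
    return "adc"  # assumed placeholder value


if __name__ == "__main__":
    spec_auth = {"impersonateConfig": {"targetServiceAccount": "GSA_SPANNER"}}
    print(guess_auth_type(spec_auth))
```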

Environment

Fails in both:

  • mercari/spanner-autoscaler:v0.4.0
  • mercari/spanner-autoscaler:v0.4.1

Enable linting at CI for PRs

What you want to add

Golang files (and potentially any other files too) should pass lint checks when a PR is created.

Why this is needed

To maintain best practices and common standards.

Create a CI workflow for automated release

What you want to add

Whenever a tag is created, a github release should automatically be created (preferably with a list of commits or PRs since the last tag).

Why this is needed

There is currently no automated way to create a release. Although there is a way to create a tag from PR labels, that does not work for previously merged PRs.
An automated release triggered by tag creation would work with tags created automatically from PRs, and it would also work when we manually create a tag on an earlier commit (after a PR has been merged).

Upgrade and fix broken `kpt` package files

What you want to add

Current state of the kpt package of spanner-autoscaler is broken:

$ kpt pkg get https://github.com/mercari/spanner-autoscaler /tmp/sp-as/
Package "sp-as":
Fetching https://github.com/mercari/spanner-autoscaler@master
From github.com:mercari/spanner-autoscaler
 * branch            master     -> FETCH_HEAD
Adding package "".
Error: Kptfile at "/var/folders/jn/_l5nj45j27vbqvrld35zqs0c0000gp/T/kpt-get-843547213/kpt" has an old version ("v1alpha1") of the Kptfile schema.
Please update the package to the latest format by following https://kpt.dev/installation/migration.

Why this is needed

For keeping the distribution package up to date and working

Error when trying to apply v1alpha1 using single ServiceAccount with Workload Identity.

What happened

Upgraded SpannerAutoscaler from v0.2.1 to v0.4.3. I then applied the v1alpha1 changes and got the following error.

The SpannerAutoscaler "spanner-autoscaler" is invalid: spec.serviceAccountSecretRef: Invalid value: "null": spec.serviceAccountSecretRef in body must be of type object: "null"

The v1alpha1 manifest uses a single ServiceAccount with Workload Identity, as follows (ref):

---
apiVersion: spanner.mercari.com/v1alpha1
kind: SpannerAutoscaler
metadata:
  name: spannerautoscaler-sample-alpha
  namespace: test
spec:
  scaleTargetRef:
    projectId: test-project
    instanceId: test-instance
  minNodes: 1
  maxNodes: 4
  maxScaleDownNodes: 2
  targetCPUUtilization:
    highPriority: 80

What you expected to happen

I expect that even a v1alpha1 resource that uses Workload Identity with a single ServiceAccount will be converted correctly.

I think the error occurs because AuthTypeADC is not set in spec.authentication.type when a v1alpha1 resource using a single ServiceAccount with Workload Identity is converted to v1beta1. I have made the following fix.

Please check it and if it looks OK, I will create a Pull Request.

https://github.com/mercari/spanner-autoscaler/compare/master...w1mvy:fix/convertfrom-v1alpha-when-single-sa?expand=1

How to reproduce it

Apply a v1alpha1 resource that uses a single ServiceAccount with Workload Identity in an environment running spanner-autoscaler v0.4.3.

Environment

  • GKE
  • SpannerAutoscaler: v0.4.3

Support schedule configuration like CronJob

What you want to add

Add a scheduled node count configuration like this:

spec:
  scaleTargetRef:
    projectId: your-gcp-project-id
    instanceId: your-spanner-instance-id
  minNodes: 1
  maxNodes: 4
  cron:
    schedule: "*/1 * * * *"
    period: 1H
    minNodes: 4

Why this is needed

We sometimes want to increase the node count before starting a batch job.

Support Application Default Credentials and Workload Identity

What you want to add

Make spanner-autoscaler able to use the Application Default Credentials of the controller Pod if the serviceAccountSecretRef field is not populated.

Why this is needed

Currently, spanner-autoscaler only supports deployments that use a Service Account JSON key stored in a Secret for each namespace, via the serviceAccountSecretRef field.
That setup is aimed at multi-tenant deployments and is unusual for single-tenant deployments.

Additionally, spanner-autoscaler should support GKE Workload Identity.

Upgrade libraries

What you want to add

Upgrade some libraries (especially kubebuilder and controller-runtime).

Why this is needed

Some libraries are outdated.

Support low cost instances (a.k.a. Processing Units)

What you want to add

Support processing units in addition to today's node count.

Currently, SpannerAutoscaler custom resource has node count related fields.

  • minNodes: Minimum number of Cloud Spanner nodes.
  • maxNodes: Maximum number of Cloud Spanner nodes. It should be higher than minNodes and not over quota.
  • maxScaleDownNodes (optional): Maximum number of nodes to remove in a single scale-down. Default is 2.

and status.currentNodes, status.desiredNodes

They should be enhanced to support Processing Units.

We want to maintain backward compatibility.

Why this is needed

Processing Units for low cost instances have been announced and seem to have launched as a Private Preview.

spanner-autoscaler should support Processing Units.

References

Client library already supports ProcessingUnits.

Support multi-tenancy with Workload Identity

What you want to add

Add Service Account impersonation related fields into the SpannerAutoscaler custom resource.
It should be mutually exclusive with serviceAccountSecretRef field.

Why this is needed

Currently, spanner-autoscaler doesn't support multi-tenancy with Workload Identity. In other words, multi-tenancy requires per-namespace Service Account JSON keys.
spanner-autoscaler should support key-less multi-tenancy.

refs #27
