
Quota Monitoring and Alerting

An easy-to-deploy Looker Studio Dashboard with alerting capabilities, showing usage and quota limits in an organization or folder.

Google Cloud enforces quotas on resource usage for project owners, setting a limit on how much of a particular Google Cloud resource your project can use. Each quota limit represents a specific countable resource, ranging from the number of API requests made per day to the number of load balancers used concurrently by your application.

Quotas are enforced for a variety of reasons:

  • To protect the community of Google Cloud users by preventing unforeseen spikes in usage.
  • To help you manage resources. For example, you can set your own limits on service usage while developing and testing your applications.

We are introducing a new custom quota monitoring and alerting solution for Google Cloud customers.

1. Summary

Quota Monitoring Solution is a stand-alone application providing an easy-to-deploy Looker Studio dashboard with alerting capabilities, showing all usage and quota limits in an organization or folder.

1.1 Four Initial Features

key-features

*The data refresh rate depends on the frequency at which the application is configured to run.

2. Architecture

architecture

The architecture is built using Google Cloud managed services - Cloud Functions, Pub/Sub, Dataflow and BigQuery.

  • The solution is architected to scale using Pub/Sub.
  • Cloud Scheduler is used to trigger Cloud Functions. It is also the user interface for configuring the frequency, parent nodes, alert threshold and email IDs. The parent node can be an organization ID, a folder ID, or a list of organization or folder IDs.
  • Cloud Functions are used to scan quotas across projects for the configured parent node.
  • BigQuery is used to store data.
  • The alert threshold applies across all metrics.
  • Alerts can be received by email, mobile app, PagerDuty, SMS, Slack, webhooks and Pub/Sub. A Cloud Monitoring custom log metric is leveraged to create the alerts.
  • Easy to get started and deploy with the Looker Studio dashboard. In addition to Looker Studio, other visualization tools can be configured.
  • The Looker Studio report can be scheduled to be emailed to the appropriate team for weekly or daily reporting.

3. Deployment Guide


3.1 Prerequisites

  1. Host Project - A project where the BigQuery instance, Cloud Functions and Cloud Scheduler will be deployed. For example, Project A.

  2. Target Node - The organization, folder or project that will be scanned for quota metrics. For example, Org A and Folder A.

  3. Project Owner role on host Project A, and IAM Admin role on target Org A and target Folder A.

  4. Google Cloud SDK installed. Detailed instructions to install the SDK are here. See the Getting Started page for an introduction to using gcloud and Terraform.

  5. Terraform version >= 0.14.6 installed. Instructions to install Terraform are here.

    • Verify the Terraform version after installing:
    terraform -version

    The output should look like:

    Terraform v0.14.6
    + provider registry.terraform.io/hashicorp/google v3.57.0

    Note - The minimum required version is v0.14.6. Lower Terraform versions may not work.

3.2 Initial Setup

  1. On your local workstation, create a new directory to run Terraform and store the credential file

    mkdir <directory name like quota-monitoring-dashboard>
    cd <directory name>
  2. Set the default project in the gcloud config to host Project A

    gcloud config set project <HOST_PROJECT_ID>

    The output should look like:

    Updated property [core/project].
  3. Ensure that all installed gcloud components are up to date on the local workstation.

    gcloud components update
  4. Cloud Scheduler depends on an App Engine application. Create an App Engine application in the host project, replacing the region. The list of regions where App Engine is available can be found here.

    gcloud app create --region=<region>

    Note: Cloud Scheduler (below) needs to be in the same region as App Engine. Use the same region in terraform as mentioned here.

    The output should look like:

    You are creating an app for project [quota-monitoring-project-1].
    WARNING: Creating an App Engine application for a project is irreversible and the region
    cannot be changed. More information about regions is at
    <https://cloud.google.com/appengine/docs/locations>.
    
    Creating App Engine application in project [quota-monitoring-project-1] and region [us-east1]....done.
    
    Success! The app is now created. Please use `gcloud app deploy` to deploy your first app.
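
    Optionally, run the command above with the region captured in a shell variable, so the same value can be reused for Cloud Scheduler and terraform.tfvars in later steps. This mirrors a suggestion from the issues section below; the variable name REGION_ID is illustrative, not part of the official guide.

    # Illustrative: keep the App Engine region in a variable for reuse
    # in later steps (variable name is an assumption).
    export REGION_ID="us-east1"
    gcloud app create --region=$REGION_ID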

3.3 Create Service Account

  1. On your local workstation, set up environment variables. Replace the name of the Service Account in the commands below

    export DEFAULT_PROJECT_ID=$(gcloud config get-value core/project 2> /dev/null)
    export SERVICE_ACCOUNT_ID="sa-"$DEFAULT_PROJECT_ID
    export DISPLAY_NAME="sa-"$DEFAULT_PROJECT_ID
  2. Verify the host project ID.

    echo $DEFAULT_PROJECT_ID
  3. Create Service Account

    gcloud iam service-accounts create $SERVICE_ACCOUNT_ID --description="Service Account to scan quota usage" --display-name=$DISPLAY_NAME

    The output should look like:

    Created service account [sa-quota-monitoring-project-1].
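
    Optionally, confirm the account exists by describing it:

    gcloud iam service-accounts describe \
        $SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com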

3.4 Grant Roles to Service Account

3.4.1 Grant Roles in the Host Project

The following roles need to be added to the Service Account in the host project, i.e. Project A:

  • BigQuery
    • BigQuery Data Editor
    • BigQuery Job User
  • Cloud Functions
    • Cloud Functions Admin
  • Cloud Scheduler
    • Cloud Scheduler Admin
  • Pub/Sub
    • Pub/Sub Admin
  • Run Terraform
    • Service Account User
  • Enable APIs
    • Service Usage Admin
  • Storage Bucket
    • Storage Admin
  • Scan Quotas
    • Cloud Asset Viewer
    • Compute Network Viewer
    • Compute Viewer
  • Monitoring
    • Notification Channel Editor
    • Alert Policy Editor
    • Viewer
    • Metric Writer
  • Logs
    • Logs Configuration Writer
    • Log Writer
  • IAM
    • Security Admin
  1. Run the following commands to assign the roles:

    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/bigquery.dataEditor" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/bigquery.jobUser" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/cloudfunctions.admin" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/cloudscheduler.admin" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/pubsub.admin" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/iam.serviceAccountUser" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/storage.admin" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/serviceusage.serviceUsageAdmin" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/cloudasset.viewer" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/compute.networkViewer" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/compute.viewer" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/monitoring.notificationChannelEditor" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/monitoring.alertPolicyEditor" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/logging.configWriter" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/logging.logWriter" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/monitoring.viewer" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/monitoring.metricWriter" --condition=None
    
    gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/iam.securityAdmin" --condition=None
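
    Equivalently, the same bindings can be granted with a short loop over the roles listed above; this is just a compact form of the commands shown:

    for role in \
        roles/bigquery.dataEditor roles/bigquery.jobUser \
        roles/cloudfunctions.admin roles/cloudscheduler.admin \
        roles/pubsub.admin roles/iam.serviceAccountUser \
        roles/storage.admin roles/serviceusage.serviceUsageAdmin \
        roles/cloudasset.viewer roles/compute.networkViewer roles/compute.viewer \
        roles/monitoring.notificationChannelEditor roles/monitoring.alertPolicyEditor \
        roles/monitoring.viewer roles/monitoring.metricWriter \
        roles/logging.configWriter roles/logging.logWriter \
        roles/iam.securityAdmin; do
      # Same binding as the individual commands above, one role at a time.
      gcloud projects add-iam-policy-binding $DEFAULT_PROJECT_ID \
          --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" \
          --role="$role" --condition=None
    done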

3.4.2 Grant Roles in the Target Folder

SKIP THIS STEP IF THE FOLDER IS NOT THE TARGET TO SCAN QUOTA

If you want to scan projects in the folder, add the following roles to the Service Account created in the previous step, on target Folder A:

  • Cloud Asset Viewer
  • Compute Network Viewer
  • Compute Viewer
  • Folder Viewer
  • Monitoring Viewer
  1. Set the target folder ID

    export TARGET_FOLDER_ID=<target folder id like 38659473572>
  2. Run the following commands to add the roles to the service account

    gcloud alpha resource-manager folders add-iam-policy-binding  $TARGET_FOLDER_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/cloudasset.viewer"
    
    gcloud alpha resource-manager folders add-iam-policy-binding  $TARGET_FOLDER_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/compute.networkViewer"
    
    gcloud alpha resource-manager folders add-iam-policy-binding  $TARGET_FOLDER_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/compute.viewer"
    
    gcloud alpha resource-manager folders add-iam-policy-binding  $TARGET_FOLDER_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/resourcemanager.folderViewer"
    
    gcloud alpha resource-manager folders add-iam-policy-binding  $TARGET_FOLDER_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/monitoring.viewer"

    Note: If this fails, run the commands again.

3.4.3 Grant Roles in the Target Organization

SKIP THIS STEP IF THE ORGANIZATION IS NOT THE TARGET

If you want to scan projects in the organization, add the following roles to the Service Account created in the previous step, on target Org A:

  • Cloud Asset Viewer
  • Compute Network Viewer
  • Compute Viewer
  • Org Viewer
  • Folder Viewer
  • Monitoring Viewer

org-service-account-roles

  1. Set the target organization ID

    export TARGET_ORG_ID=<target org id ex. 38659473572>
  2. Run the following commands to add the roles to the service account

    gcloud organizations add-iam-policy-binding  $TARGET_ORG_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com" --role="roles/cloudasset.viewer" --condition=None
    
    gcloud organizations add-iam-policy-binding  $TARGET_ORG_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com"  --role="roles/compute.networkViewer" --condition=None
    
    gcloud organizations add-iam-policy-binding  $TARGET_ORG_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com"  --role="roles/compute.viewer" --condition=None
    
    gcloud organizations add-iam-policy-binding  $TARGET_ORG_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com"  --role="roles/resourcemanager.folderViewer" --condition=None
    
    gcloud organizations add-iam-policy-binding  $TARGET_ORG_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com"  --role="roles/resourcemanager.organizationViewer" --condition=None
    
    gcloud organizations add-iam-policy-binding  $TARGET_ORG_ID --member="serviceAccount:$SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com"  --role="roles/monitoring.viewer" --condition=None

3.5 Download the Source Code

  1. Clone the Quota Monitoring Solution repo

    git clone https://github.com/google/quota-monitoring-solution.git quota-monitoring-solution
  2. Change directories into the Terraform example

    cd ./quota-monitoring-solution/terraform/example

3.6 Set OAuth Token Using Service Account Impersonation

Impersonate your host project service account and set an environment variable with a temporary token to authenticate Terraform. Make sure your user has the Service Account Token Creator role, which is needed to create short-lived credentials.

gcloud config set auth/impersonate_service_account \
    $SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com

export GOOGLE_OAUTH_ACCESS_TOKEN=$(gcloud auth print-access-token)
  • TIP: If you get an error saying you are unable to impersonate, unset the impersonation, add the role as shown below, then try again.

    # unset impersonation
    gcloud config unset auth/impersonate_service_account
    
    # set your current authenticated user as var
    PROJECT_USER=$(gcloud config get-value core/account)
    
    # grant IAM role serviceAccountTokenCreator
    gcloud iam service-accounts add-iam-policy-binding $SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com \
        --member user:$PROJECT_USER \
        --role roles/iam.serviceAccountTokenCreator \
        --condition=None

3.7 Configure Terraform

  1. Verify that you have these 3 files in your local directory:

    • main.tf
    • variables.tf
    • terraform.tfvars
  2. Open the terraform.tfvars file in your favorite editor and change the values of the variables.

    vi terraform.tfvars
  3. For region, use the same region used for App Engine in the earlier steps.

    The variables source_code_base_url, qms_version, source_code_zip and source_code_notification_zip on the QMS module are used to download the source for the QMS Cloud Functions from the latest GitHub release.

    To deploy the latest unreleased code from a local clone of the QMS repository, set qms_version to main.
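
    For illustration only, a minimal terraform.tfvars might contain entries like the ones below. The variable names project_id, region and qms_version are taken from this guide; the values are placeholders, and variables.tf in the repo is the authoritative variable list, so edit your existing file rather than copying this verbatim.

    project_id  = "quota-monitoring-project-1"   # host project (placeholder)
    region      = "us-east1"                     # same region as App Engine
    qms_version = "main"                         # or a released version tag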

3.8 Run Terraform

  1. Run terraform commands

    • terraform init
    • terraform plan
    • terraform apply
      • At the prompt Enter a value:, type yes
  2. This will:

    • Enable required APIs
    • Create all resources and connect them.

    Note: If Terraform fails, run terraform plan and terraform apply again

  3. Stop impersonating the service account (when finished with Terraform)

    gcloud config unset auth/impersonate_service_account

3.9 Testing

  1. Initiate first job run in Cloud Scheduler.

    Console

    Click 'Run Now' on the Cloud Scheduler job.

    Note: The status of the 'Run Now' button changes to 'Running' for a fraction of a second.

    run-cloud-scheduler

    Terminal

    gcloud scheduler jobs run quota-monitoring-cron-job --location <region>
    gcloud scheduler jobs run quota-monitoring-app-alert-config --location <region>
  2. To verify that the program ran successfully, check the BigQuery table. Loading data into BigQuery might take a few minutes; the execution time depends on the number of projects to scan. A sample BigQuery table will look like this: test-bigquery-table
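
    As a quick terminal spot-check, you can query the table directly. The dataset and table names below assume the defaults referenced by the dashboard query in section 3.10; adjust them if your deployment differs.

    bq query --use_legacy_sql=false \
      "SELECT project_id, region, quota_metric, current_usage, quota_limit
       FROM \`${DEFAULT_PROJECT_ID}.quota_monitoring_dataset.quota_monitoring_table\`
       ORDER BY added_at DESC
       LIMIT 10"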

3.10 Looker Studio Dashboard setup

  1. Go to the Looker Studio dashboard template. A Looker Studio dashboard will look like this: ds-updated-quotas-dashboard

  2. Make a copy of the template using the copy icon in the top bar (top right corner) ds-dropdown-copy

  3. Click the ‘Copy Report’ button without changing the data source options ds-copy-report-fixed-new-data-source

  4. This creates a copy of the report and opens it in Edit mode. If it does not, click the ‘Edit’ button in the top right corner of the copied template: ds-edit-mode-updated

  5. Select any one table; below, ‘Disks Total GB - Quotas’ is selected. On the right panel, in the ‘Data’ tab, click the ‘edit data source’ icon ds_edit_data_source. It will open the data source details: ds_datasource_config_step_1

  6. Replace the BigQuery project ID, dataset ID and table name to match your deployment. Verify the query by running it in the BigQuery editor to make sure it returns the correct results and has no syntax errors:

    SELECT 
        project_id,
        added_at,
        region,
        quota_metric,
        CASE
            WHEN CAST(quota_limit AS STRING) ='9223372036854775807' THEN 'unlimited'
        ELSE
            CAST(quota_limit AS STRING)
        END AS str_quota_limit,
        SUM(current_usage) AS current_usage,
        ROUND((SAFE_DIVIDE(CAST(SUM(current_usage) AS BIGNUMERIC), CAST(quota_limit AS BIGNUMERIC))*100),2) AS current_consumption,
        SUM(max_usage) AS max_usage,
        ROUND((SAFE_DIVIDE(CAST(SUM(max_usage) AS BIGNUMERIC), CAST(quota_limit AS BIGNUMERIC))*100),2) AS max_consumption
    FROM
        (
            SELECT
                *,
                RANK() OVER (PARTITION BY project_id, region, quota_metric ORDER BY added_at DESC) AS latest_row
            FROM
                `[YOUR_PROJECT_ID].quota_monitoring_dataset.quota_monitoring_table`
        ) t
    WHERE
        latest_row=1
        AND current_usage IS NOT NULL
        AND quota_limit IS NOT NULL
        AND current_usage != 0
        AND quota_limit != 0
    GROUP BY
        project_id,
        region,
        quota_metric,
        added_at,
        quota_limit
  7. After making sure the query returns results, replace it in Looker Studio and click the ‘Reconnect’ button in the data source pane. ds_data_source_config_step_3

  8. In the next window, click the ‘Done’ button. ds_data_source_config_step_2

  9. Once the data source is configured, click the ‘View’ button in the top right corner. Note: in ‘Edit’ mode you can make additional layout changes, such as which metrics are displayed on the dashboard, the color shades for the consumption column, and the number of rows for each table. ds-switch-to-view-mode

3.11 Scheduled Reporting

Quota monitoring reports can be scheduled from the Looker Studio dashboard using ‘Schedule email delivery’. A screenshot of the Looker Studio dashboard will be delivered as a PDF report to the configured email IDs.

ds-schedule-email-button

3.12 Alerting

Alerts about services nearing their quota limits can be configured to be sent via email as well as the following external services:

  • Slack
  • PagerDuty
  • SMS
  • Custom Webhooks

3.12.1 Slack Configuration

To configure notifications to be sent to a Slack channel, you must have the Monitoring Notification Channel Editor role on the host project.

3.12.1.1 Create Notification Channel
  1. In the Cloud Console, use the project picker to select your Google Cloud project, and then select Monitoring, or click the link here: Go to Monitoring
  2. In the Monitoring navigation pane, click Alerting.
  3. Click Edit notification channels.
  4. In the Slack section, click Add new. This brings you to the Slack sign-in page:
    • Select your Slack workspace.
    • Click Allow to enable Google Cloud Monitoring access to your Slack workspace. This action takes you back to the Monitoring configuration page for your notification channel.
    • Enter the name of the Slack channel you want to use for notifications.
    • Enter a display name for the notification channel.
  5. In your Slack workspace:
    • Invite the Monitoring app to the channel by sending the following message in the channel:
    • /invite @Google Cloud Monitoring
    • Be sure you invite the Monitoring app to the channel you specified when creating the notification channel in Monitoring.
3.12.1.2 Configuring the Alerting Policy
  1. In the Alerting section, click on Policies.
  2. Find the Policy named ‘Resource Reaching Quotas’. This policy was created via Terraform code above.
  3. Click Edit.
  4. This opens the Edit Alerting Policy page. Leave the current condition metric as is, and click Next.
  5. Under Notification Options, select the Slack channel that you created above.
  6. Click Save.

You should now receive alerts in your Slack channel whenever a quota reaches the specified threshold limit.
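
To confirm from the terminal that the policy and channel are in place, you can list them with gcloud. The policy display name comes from the Terraform above; note that Monitoring policy and channel management currently sits under the alpha/beta command surfaces.

    gcloud alpha monitoring policies list \
        --filter='displayName="Resource Reaching Quotas"'
    gcloud beta monitoring channels list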

4. Release Note

v4.0.0: Quota Monitoring across GCP services

New

  • The new version provides visibility into Quotas across various GCP services beyond the original GCE (Compute).
  • New Looker Studio Dashboard template reporting metrics across GCP services

Known Limitations

  • The records are grouped by hour. The scheduler should be configured to run preferably at the beginning of the hour.
  • Out of the box, the solution is configured to scan quotas once every day. The SQL query that builds the dashboard uses the current date to filter records. If you change the frequency, update the query so that it correctly reflects the latest data.

v4.4.0

New in v4.4.0

  • The new version includes a fix that converts the data pull process to use the Monitoring Query Language (MQL). This allows QMS to pull the limit and current usage at exactly the same time, so reporting queries can be more tightly scoped, eliminating over-reporting problems.

    To upgrade existing installations:

    • Re-run Terraform to update the Cloud Functions and the scheduled query
    • Update the SQL used in the Looker Studio dashboard according to step 7 of 3.10 Looker Studio Dashboard setup

5. What is Next

  1. Graphs (Quota utilization over a period of time)
  2. Search project, folder, org, region
  3. Threshold configurable for each metric

6. Getting Support

Quota Monitoring Solution is a project based on open source contributions. We'd love for you to report issues, file feature requests, and send pull requests (see Contributing). Quota Monitoring Solution is not officially covered by Google Cloud product support.

7. Contributing

Contributors

anuradha-bajpai-google, bgood, dependabot[bot], mikesparr, quota-monitoring-solution-bot, shadowshot-x, ypenn21


Issues

Real time Quota Monitoring

Currently, whenever the program runs, it fetches the quota usage and limit data. The frequency could be every hour or every day. This requirement is to make quota usage and limits available in real time, primarily for alerting.

Cloud Functions error logging

Customer noticed that the Cloud Functions quotaMonitoringListProjects and quotaMonitoringScanProjects generate plenty (thousands) of unnecessary logs.

Can we disable it? Is there any reason for those functions to run? For the customer, these logs are not needed and generate too much noise. Examples:

Failed to find a usable hardware address from the network interfaces; using random bytes: ba:61:09:1a:36:5a:d6:16.

"Function invocation was interrupted. Error: function terminated. Recommended action: inspect logs for termination reason. Additional troubleshooting documentation can be found at https://cloud.google.com/functions/docs/troubleshooting#logging "
"Compute Engine API has not been used in project 698303822038 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/compute.googleapis.com/overview?project=698303822038 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry."
"The request was aborted because there was no available instance. Additional troubleshooting documentation can be found at: https://cloud.google.com/functions/docs/troubleshooting#scalability"

Possibility to set time window when quota data is collected

Customer would like the ability to set the time frame in which quota data is collected, to help with monitoring closer to real time.

Many of their workloads are spiky and, with the existing quota collection, hard to monitor. Additionally, they would like to be able to set the collection frequency to lower values, such as 1 minute.

Track the maximum usage as well as current usage against quota limits

Sometimes usage against a quota will fluctuate within the QMS polling window, which could cause usage to exceed the limit. This is particularly the case for rate quotas, since they tend to fluctuate more often, which can result in quota errors that are more spurious in nature. As such, QMS should also track the maximum usage in addition to the current usage.

For this enhancement, QMS will introduce a longer look-back period (7 days) in the quota query and use a new data model for BigQuery.

Proposed BQ table schema:

| Column Name | Type | Description |
| --- | --- | --- |
| project_id | String | Project ID the quota metric applies to |
| added_at | Timestamp | Time at which the quota data was retrieved |
| region | String | Region the quota metric applies to |
| quota_metric | String | Quota metric |
| limit_name | String | Name of the limit |
| current_usage | Integer | Current usage against the quota |
| max_usage | Integer | Maximum usage against the quota in the query window |
| quota_limit | Integer | Quota limit for the metric |
| threshold | Integer | Alerting threshold for the quota metric |

Proposed improvement to the deployment guide

Feature request

Proposed improvement to the deployment guide

Context

The deployment guide is easy to follow in general terms, but I would like to propose a couple of improvements.

During the guide several shell variables are defined, as follows:

DEFAULT_PROJECT_ID
SERVICE_ACCOUNT_ID
DISPLAY_NAME
SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com
TARGET_ORG_ID
TARGET_FOLDER_ID

Proposed improvement:
Define a shell variable for the region:
export REGION_ID="region" # example export REGION_ID="us-east1"

Use the region variable in the App Engine command:
gcloud app create --region=$REGION_ID

Add a step that lists all the previously defined variables with printf, so it generates an easy-to-follow list that will help when modifying the Terraform file terraform/terraform.tfvars.

For example, we can guide people to run:

printf "\nProjectID: $DEFAULT_PROJECT_ID \n
ServiceAccountID: $SERVICE_ACCOUNT_ID \n
ServiceAccountDisplayName: $DISPLAY_NAME \n
serviceAccount: $SERVICE_ACCOUNT_ID@$DEFAULT_PROJECT_ID.iam.gserviceaccount.com \n
TargetOrganizationID: $TARGET_ORG_ID \n
TargetFolderID: $TARGET_FOLDER_ID \n
RegionID: $REGION_ID\n"

This will generate output like:
ProjectID: shared-infra-project-310517
ServiceAccountID: sa-shared-infra-project-310517
ServiceAccountDisplayName: sa-shared-infra-project-310517
serviceAccount: sa-shared-infra-project-310517@shared-infra-project-310517.iam.gserviceaccount.com
TargetOrganizationID: 188152774216
TargetFolderID: 218655808095
RegionID: us-east1

This output is an easy guide for setting the correct values in terraform/terraform.tfvars.

Impact

Improve ease of deployment

Improved error handling and descriptions for Cloud Functions

The functions quotaMonitoringListProjects and quotaMonitoringScanProjects often generate errors which are not very descriptive; examples below:

Despite those errors, the Terraform script finishes successfully. Any chance to improve those?

Logging bucket location setting as a parameter

Customer would like the ability to set the logging bucket location as a parameter during configuration. Currently it defaults to US. This is important for them, as they have an org policy restriction allowing specific regions only.

Migrated issue: Iterating on Paged Response objects after their "client" has been closed #865

Issue migrated from: GoogleCloudPlatform/professional-services#865

The issue is that PagedResponseIterators cannot be used once their associated service client is closed. This appears to be a pattern in this repo, so I am opening this issue.

I debugged this for a customer case and wanted to post it as an issue here to be addressed. The example below is a specific one; however, I see this pattern across this repo and it should be addressed.

The function loadBigQueryTable(...) iterates over its second argument, timeSeriesList [1]. This argument is created by a call to getQuota [2], and is a paged-response wrapper over the MetricService.ListTimeSeries(...) API. This API returns paged responses, so when there are enough time series results, there will be multiple API calls. In getQuotas(...), a MetricServiceClient is created [3] and then "closed" within getQuotas(...). However, getQuotas(...) returns projectQuotas, which holds an internal reference to metricServiceClient. When the iterator goes to use that internal reference, the RejectedExecutionException is thrown.

The easiest fix is for getQuotas(...) to iterate through all of projectQuotas and return a List or some other in-memory copy of the results which does not rely on a future remote call.

[1] Iterating on the timeSeriesList argument: https://github.com/GoogleCloudPlatform/professional-services/blob/main/tools/quota-monitoring-alerting/java/quota-scan/src/main/java/functions/ScanProjectQuotasHelper.java#L136

[2] Call to getQuota which creates the paged response: professional-services/tools/quota-monitoring-alerting/java/quota-scan/src/main/java/functions/ScanProjectQuotas.java, line 126 at f96748b:

ListTimeSeriesPagedResponse projectQuotas = getQuota(gcpProject.getProjectName(), filter);

[3] Creation of MetricServiceClient: professional-services/tools/quota-monitoring-alerting/java/quota-scan/src/main/java/functions/ScanProjectQuotasHelper.java, line 94 at f96748b:

try (MetricServiceClient metricServiceClient = MetricServiceClient.create()) {

Omit zones when listing quotas per region

Customer's feedback is that we are logging zones as regions, which generates unnecessary data in the database. In their case there are 10k+ entries in the database which could be skipped.

Any plans to stop logging these entries?

Avoid downloading service account keys file for better security

Instead of requiring the download of service account key files, perhaps impersonate the service account and use a short-lived access token for running the Terraform code.

Example:

# impersonate service account to generate short-lived access token
gcloud config set auth/impersonate_service_account \
    $HOST_PROJECT_SA_EMAIL

# set ENV var for terraform provider access via token
export GOOGLE_OAUTH_ACCESS_TOKEN=$(gcloud auth print-access-token)

# run terraform
terraform init
terraform plan
terraform apply

# stop impersonating service account
gcloud config unset auth/impersonate_service_account

The current implementation, however, requires ${var.creds_file} unless you comment out line 26 in the terraform/example/main.tf file, like the following:

provider "google" {
  # credentials = file("${var.creds_file}")
  project     = var.project_id
  region      = var.region
}

Although this will likely be run in a local and trusted environment, demonstrating the security best practice of using short-lived tokens and not downloading long-lived keys (which may also be restricted by an org policy) could be beneficial.

Terraform depends_on for the downloaded file

resource "google_storage_bucket_object" "source_code_object" {
name = "${var.qms_version}-${var.source_code_zip}"
bucket = google_storage_bucket.bucket_gcf_source.name
source = var.source_code_zip
}

https://github.com/google/quota-monitoring-solution/blob/cca469a27e3d5e9b6a700b0bb6f793e804dd8e31/terraform/modules/qms/main.tf#L192-197

I believe the above resource needs to depend on the local-exec provisioner having run before this step is attempted.

Additional info:
I'm using Terraform Cloud workspaces as my environment. The execution of resources is not guaranteed to be linear (top-down), so the resource should include a depends_on meta-argument as defined in these docs.

Separate out the service account used for deployment and ongoing operation

As a best practice we should use different service accounts for deploying QMS and the normal operation of collecting and reporting on quotas.

Here is a prospective list of the normal operation permissions.

  • Resources in the project (Project A, where QMS will be deployed) will be provisioned using a CI/CD pipeline, and the Terraform service account will have permission to create resources.
  • The following predefined roles need to be granted to run the application:
    • At Project level (Project A):
      • For Quota Scanning and reporting on Dashboard
        • roles/bigquery.jobUser
        • roles/cloudfunctions.developer
        • roles/cloudscheduler.jobRunner
        • roles/pubsub.publisher
        • roles/pubsub.subscriber
      • For Alerting
        • roles/monitoring.notificationChannelEditor
        • roles/monitoring.alertPolicyEditor
        • roles/monitoring.metricWriter
        • roles/logging.configWriter
        • roles/logging.logWriter
    • At Folder level - if the target node is a folder and projects in the folder need to be scanned:
      • For Quota Scanning and reporting on Dashboard
        • roles/cloudasset.viewer
        • Custom role with the following permissions (see the sketch after this list):
          • monitoring.timeSeries.list
          • resourcemanager.folders.get
          • resourcemanager.projects.get
    • At Org level - if the target node is an org and projects in the org need to be scanned:
      • For Quota Scanning and reporting on Dashboard
        • roles/cloudasset.viewer
        • Custom role with the following permissions:
          • monitoring.timeSeries.list
          • resourcemanager.organizations.get
          • resourcemanager.projects.get
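
A sketch of creating the folder-level custom role named in the list above; the role ID qmsFolderScanner and its title are assumptions, while the permissions are the ones listed:

# Hypothetical custom role (created at org level, to be granted on folders).
gcloud iam roles create qmsFolderScanner \
    --organization=$TARGET_ORG_ID \
    --title="QMS Folder Scanner" \
    --permissions=monitoring.timeSeries.list,resourcemanager.folders.get,resourcemanager.projects.get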

Include a Metric Description

Users may not always be familiar with what a particular metric measures. Providing help text or a link could help.

Change GCS Bucket Location in Terraform

The location of the GCS bucket in the Terraform script is hard-coded to global, which causes issues when the client has a resource location constraint as an organization policy.

Resolution - Change the default location in the main.tf file to the correct location.

cc: @pallavraj04

Exclude disabled APIs from scanning

Customer would like the ability to exclude APIs that are not enabled from quota scanning. For example, they have projects in which the Compute API is not enabled; since QMS scans all APIs, they get errors like the one below:

"Compute Engine API has not been used in project x before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/compute.googleapis.com/overview?project=x then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry."

Historical graphs per quota

For each quota, display a small graph that shows utilization over a defined time window and allows identification of spikes.

Cloud Functions timing out in APAC region

If you deploy the solution in an APAC region, the Terraform script fails while creating the Cloud Functions with a timeout error after approximately 10 minutes, because the Cloud Function build tries to download libraries from a US region and hits the default timeout.

cc: @pallavraj04

Add way to easily navigate to the quota increase screen

Once a user has identified a quota that needs to be modified, they need to navigate away from the dashboards to a different screen to request an update to the quota. Can we provide a link next to the metric on the dashboard to raise a quota increase request?

QMS_app_alerting.csv

Hi, I have faced a few issues and I'd really appreciate any help with them.

  • The job quota-monitoring-app-alert-config triggers the configAppAlerts function, which throws an error because QMS_app_alerting.csv is not present in the bucket PROJECT_NAME-gcf-source.
    From reading the source code I realized that the file is used to initialize a BQ table.
    Is that file required?

  • What is the range for the threshold variable?

  • I haven't received any meaningful alerts, but occasionally I receive alerts via email with odd descriptions like this:

data: Quota metric usage alert details ## 96 quota metric usages above threshold |ProjectId | Scope | Metric | Consumption(%) | |:--------|:--------|:---------|:---------| ... AND this continues...

Improve documentation for best practices when deploying into outside of North America

When deploying into regions outside of North America, as a best practice the Cloud Functions should be configured to use a Maven repository located closer to the Google Cloud region being used. For example, if deploying into a region in Asia, the build process should pull from a Maven repository in Asia.

The solution is to add the following files to the root directory of the Java Cloud Functions.

.mvn/extensions.xml contents:

<extensions xmlns="http://maven.apache.org/EXTENSIONS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/EXTENSIONS/1.0.0 http://maven.apache.org/xsd/core-extensions-1.0.0.xsd">
  <extension>
    <groupId>com.github.gzm55.maven</groupId>
    <artifactId>project-settings-extension</artifactId>
    <version>0.1.1</version>
  </extension>
</extensions>

.mvn/settings.xml contents:

<settings>
  <mirrors>
    <mirror>
      <id>google-maven-central</id>
      <name>GCS Maven Central mirror Asia Pacific</name>
      <url>https://maven-central-asia.storage-download.googleapis.com/maven2/</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>

To deploy the functions, take the following steps (see the shell sketch after this list):

  • Manually download the source zips
    • gsutil cp gs://quota-monitoring-solution-source/v4.2/quota-monitoring-solution-v4.2.zip .
    • gsutil cp gs://quota-monitoring-solution-source/v4.2/quota-monitoring-notification-v4.2.zip .
  • Add the .mvn files to each zip
  • Upload the zips to a GCS bucket in the target project
  • Set the Terraform variables
    • source_code_bucket_name
    • source_code_zip
    • source_code_notification_zip
  • Run Terraform
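
As a shell sketch of these steps (the zip names and source bucket are the ones given above; the destination bucket is a placeholder):

# Download the released source zips.
gsutil cp gs://quota-monitoring-solution-source/v4.2/quota-monitoring-solution-v4.2.zip .
gsutil cp gs://quota-monitoring-solution-source/v4.2/quota-monitoring-notification-v4.2.zip .

# Append the .mvn files to each zip (assumes ./.mvn contains the files above).
zip -r quota-monitoring-solution-v4.2.zip .mvn
zip -r quota-monitoring-notification-v4.2.zip .mvn

# Upload to a GCS bucket in the target project (bucket name is a placeholder).
gsutil cp ./*.zip gs://YOUR_SOURCE_BUCKET/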

Include hidden quotas

Hidden quotas are not accessible using cloud monitoring APIs. How can we make these quota metrics available on QMS?

Ability to select which GCP regions quotas should be scanned for

Customer would like the ability to pre-select which GCP regions are scanned for quotas. Currently they use only a few of the 30+ regions, and limiting scanning to selected regions (or regions detected as used) would save a lot of data and scanning time. It would also make it possible to run scans more frequently.

Support VPC SC

VPC Service Controls are required by many users, so we should add more direct support for VPC SC.

Add organization id

Currently, QMS doesn't show the organization ID. Adding the organization ID to the data will help with data clarity.

Fix Dashboard query to accommodate reserved words in projectId

In the file main.tf, at line 309, the query SELECT metric,usage,q_limit... is missing backticks around the table reference after the last FROM. It should be: ... FROM `${var.project_id}.${google_bigquery_dataset.dataset.dataset_id}.${google_bigquery_table.default.table_id}` GROUP BY ...

If backticks are not used, the query fails for some project names. For instance, we had a project name containing the value "-to-" and this query was failing; after adding the backticks it worked. Backticks are used in the rest of the code as well.

Workflows aren't running

There is an error in the name of the directory storing the GitHub workflows, which is causing them not to run.

Current directory name:
.github/workflow

Expected directory name:
.github/workflows
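
The fix, assuming a local clone of the repo:

# Rename the directory so GitHub Actions picks up the workflow files.
git mv .github/workflow .github/workflows
git commit -m "Move workflows to the expected .github/workflows directory"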

Make alerts configurable by project

Users would like to configure alerts per project.

For example, for project ID 'xyz', send alerts to '[email protected]'. The background of this requirement is that QMS is currently deployed centrally; if any team subscribes to alerting, they receive alerts for all projects, and teams would like to receive alerts only for the projects that belong to them.
