Export Firebase Crashlytics BigQuery logs to Datadog using Dataflow

This document discusses how to export Firebase Crashlytics logs from BigQuery tables to Datadog.

Firebase Crashlytics is a lightweight, realtime crash reporter that helps you track, prioritize, and fix stability issues that erode your app quality. Crashlytics saves you troubleshooting time by intelligently grouping crashes and highlighting the circumstances that lead up to them.

Firebase Crashlytics provides BigQuery exports to enable further analysis in BigQuery, which lets you combine the Crashlytics data with your Cloud Logging exports or your own first-party data. You can then use a Data Studio dashboard to visualize this data.

Datadog is a log monitoring platform and Google Cloud partner that provides application and infrastructure monitoring services, with native integration with Google Cloud.

This document is intended for a technical audience whose responsibilities include log management or data analytics. It assumes that you're familiar with Dataflow, have some familiarity with Bash scripts, and have basic knowledge of Google Cloud.

Architecture

Architecture Diagram

The batch Dataflow pipeline processes the Crashlytics logs in BigQuery as follows:

  1. Read the BigQuery table (or partition).
  2. Transform each BigQuery TableRow into a JSON string and wrap it in the Datadog log entry format.
  3. Apply two optimizations (a request sketch follows this list):
    • Batch log messages into 5 MB (or 1,000-entry) batches to reduce the number of API calls.
    • Gzip the requests to reduce network bandwidth.
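
For reference, the following sketch shows roughly what one batched, gzipped request to the Datadog Send Logs API looks like. The endpoint, headers, and field names match the pipeline defaults listed in the options table later in this document, but the payload itself is illustrative and not output captured from the pipeline:

# Illustrative only: build a tiny batch of Datadog log entries and post it
# gzip-compressed to the v1 logs intake endpoint the pipeline uses by default.
cat > batch.json <<'EOF'
[
  {
    "ddsource": "crashlytics-bigquery",
    "ddtags": "user:crashlytics-pipeline",
    "hostname": "crashlytics",
    "message": "{\"issue_id\":\"example\",\"event_timestamp\":\"2021-01-01 00:00:00 UTC\"}"
  }
]
EOF

# Compress the batch and send it with the API key header.
gzip -c batch.json > batch.json.gz
curl --request POST "https://http-intake.logs.datadoghq.com/v1/input" \
--header "DD-API-KEY: ${DATADOG_API_KEY}" \
--header "Content-Type: application/json" \
--header "Content-Encoding: gzip" \
--data-binary @batch.json.gz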

Objectives

  • Create a service account with limited access.
  • Create a Dataflow Flex Template pipeline to send Crashlytics logs to Datadog using the Send Logs API.
  • Verify that Datadog imported all the Crashlytics logs.

Costs

This tutorial uses billable components of Google Cloud, including Compute Engine, Cloud Storage, Dataflow, BigQuery, and Cloud Build.

Use the pricing calculator to generate a cost estimate based on your projected usage.

Before you begin

For this tutorial, you need a Google Cloud project. To make cleanup easiest at the end of the tutorial, we recommend that you create a new project.

  1. Create a Google Cloud project.

  2. Make sure that billing is enabled for your Google Cloud project.

  3. Open Cloud Shell.

    At the bottom of the Cloud Console, a Cloud Shell session opens and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.

  4. Enable APIs for the Compute Engine, Cloud Storage, Dataflow, BigQuery, and Cloud Build services:

    gcloud services enable \
    compute.googleapis.com \
    storage.googleapis.com \
    dataflow.googleapis.com \
    bigquery.googleapis.com \
    cloudbuild.googleapis.com
    

    Alternatively, you can enable these APIs from the Cloud Console.
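
    Optionally, you can confirm the APIs are enabled before continuing; this check is not part of the original steps:

    gcloud services list --enabled | \
    grep -E 'compute|storage|dataflow|bigquery|cloudbuild'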

Setting up your environment

  1. In Cloud Shell, clone the source repository and go to the directory for this tutorial:

    git clone https://github.com/GoogleCloudPlatform/crashlytics-logs-to-datadog.git
    cd crashlytics-logs-to-datadog/
    
  2. Use a text editor to modify the set_environment.sh file to set the required environment variables:

    # The Google Cloud project to use for this tutorial
    export PROJECT_ID="<your-project-id>"
    
    # The Compute Engine region to use for running Dataflow jobs
    export REGION_ID="<compute-engine-region>"
    
    # define the GCS bucket to use for Dataflow templates and temporary location.
    export GCS_BUCKET="<name-of-the-bucket>"
    
    # Name of the service account to use (not the email address)
    export PIPELINE_SERVICE_ACCOUNT_NAME="<service-account-name-for-runner>"
    
    # The API Key created in Datadog for making API calls
    # https://app.datadoghq.com/account/settings#api
    export DATADOG_API_KEY="<your-datadog-api-key>"
    
  3. Run the script to set the environment variables:

    source set_environment.sh
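
    The commands in the following sections also use PIPELINE_SERVICE_ACCOUNT_EMAIL. If your copy of set_environment.sh does not already derive it (check the file; this derivation is an assumption based on the standard service account naming convention), you can set it from the variables above:

    # Assumed derivation: <name>@<project-id>.iam.gserviceaccount.com
    export PIPELINE_SERVICE_ACCOUNT_EMAIL="${PIPELINE_SERVICE_ACCOUNT_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"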
    

Creating resources

The tutorial uses the following resources:

  • A service account to run Dataflow pipelines, enabling fine-grained access control
  • A Cloud Storage bucket for temporary data storage and test data

Create service accounts

We recommend that you run pipelines with fine-grained access control to improve access partitioning, by provisioning each service account with only the permissions it requires. If your project doesn't have a user-created service account, create one using the following instructions.

  1. Create a service account to use as the user-managed controller service account for Dataflow:

    gcloud iam service-accounts create "${PIPELINE_SERVICE_ACCOUNT_NAME}" \
    --project="${PROJECT_ID}" \
    --description="Service Account for Datadog export pipelines." \
    --display-name="Datadog logs exporter"
    
  2. Create a custom role with the permissions the pipeline requires (defined in datadog_sender_permissions.yaml):

    export DATADOG_SENDER_ROLE_NAME="datadog_sender"
    
    gcloud iam roles create "${DATADOG_SENDER_ROLE_NAME}" \
    --project="${PROJECT_ID}" \
    --file=datadog_sender_permissions.yaml
    
  3. Apply the custom role to the service account:

    gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
    --role="projects/${PROJECT_ID}/roles/${DATADOG_SENDER_ROLE_NAME}"
    
  4. Assign the dataflow.worker role to allow Dataflow workers to run with the service account's credentials:

    gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
    --member="serviceAccount:${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
    --role="roles/dataflow.worker"
    

Create the Cloud Storage bucket

Create a Cloud Storage bucket for storing test data and for the Dataflow staging location:

gsutil mb -p "${PROJECT_ID}" -l "${REGION_ID}" "gs://${GCS_BUCKET}"

Build and launch Dataflow pipeline

Test and compile the pipeline code:

./gradlew clean build shadowJar

The pipeline supports the following options:

  • sourceBigQueryTableId (default: none): A fully qualified BigQuery table ID in the form <projectId>:<datasetId>.<tableId>[:$<partition_date>].
  • bigQuerySqlQuery (default: none): A BigQuery SQL query whose results are sent to Datadog.
  • shardCount (default: 10): The number of parallel senders to Datadog. Using a higher number can overload the Datadog API.
  • preserveNulls (default: false): Allow null values from the BigQuery source to be serialized.
  • datadogApiKey (default: none): The API key from the Datadog console.
  • datadogEndpoint (default: https://http-intake.logs.datadoghq.com/v1/input): Refer to the Datadog logging endpoints.
  • datadogSource (default: crashlytics-bigquery): Refer to the Datadog log entry structure; you can customize this parameter to suit your needs.
  • datadogTags (default: user:crashlytics-pipeline): Refer to the Datadog log entry structure; you can customize this parameter to suit your needs.
  • datadogLogHostname (default: crashlytics): Refer to the Datadog log entry structure; you can customize this parameter to suit your needs.

Use only one of sourceBigQueryTableId or bigQuerySqlQuery.

Define the Crashlytics exported BigQuery table name:

export CRASHLYTICS_BIGQUERY_TABLE="<projectId>:<datasetId>.<tableId>"

Note: Make sure the service account has access to this BigQuery table.
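
If the service account does not yet have read access, one way to grant it is a table-level binding with the bq tool. This is an illustrative sketch; adjust it to your own access model, and note that the custom role created earlier must still allow the pipeline to run BigQuery jobs:

bq add-iam-policy-binding \
--member="serviceAccount:${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
--role="roles/bigquery.dataViewer" \
"${CRASHLYTICS_BIGQUERY_TABLE}"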

You can directly launch the pipeline from the shell using the following command:

bq_2_datadog_pipeline \
--project="${PROJECT_ID}" \
--region="${REGION_ID}" \
--runner="DataflowRunner" \
--serviceAccount="${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
--gcpTempLocation="gs://${GCS_BUCKET}/temp" \
--stagingLocation="gs://${GCS_BUCKET}/staging" \
--tempLocation="gs://${GCS_BUCKET}/bqtemp" \
--datadogApiKey="${DATADOG_API_KEY}" \
--sourceBigQueryTableId="${CRASHLYTICS_BIGQUERY_TABLE}"
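
Alternatively, to send only a subset of the data, you can pass a query through bigQuerySqlQuery instead of sourceBigQueryTableId (remember to use only one of the two). The query below is illustrative; replace the placeholders with your table reference:

bq_2_datadog_pipeline \
--project="${PROJECT_ID}" \
--region="${REGION_ID}" \
--runner="DataflowRunner" \
--serviceAccount="${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
--gcpTempLocation="gs://${GCS_BUCKET}/temp" \
--stagingLocation="gs://${GCS_BUCKET}/staging" \
--tempLocation="gs://${GCS_BUCKET}/bqtemp" \
--datadogApiKey="${DATADOG_API_KEY}" \
--bigQuerySqlQuery="SELECT * FROM \`<projectId>.<datasetId>.<tableId>\` LIMIT 1000"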

You can monitor the Dataflow job in the Cloud Console. The pipeline DAG looks as follows:

Pipeline DAG

Create Dataflow flex template

Dataflow templates allow you to use the Google Cloud Console, the gcloud command-line tool, or REST API calls to set up and run your pipelines on Google Cloud. Classic templates are staged as execution graphs on Cloud Storage, while Flex Templates bundle the pipeline as a container image in your project's Container Registry. This allows you to decouple building and running pipelines, and to integrate with orchestration systems for scheduled execution. You can learn more about the differences between classic and Flex templates.

To build a Flex Template, define the location of the template spec file, which contains all the information necessary to run the job:

export TEMPLATE_PATH="gs://${GCS_BUCKET}/dataflow/templates/bigquery-to-datadog.json"
export TEMPLATE_IMAGE="us.gcr.io/${PROJECT_ID}/dataflow/bigquery-to-datadog:latest"

Build the Dataflow Flex Template:

gcloud dataflow flex-template build "${TEMPLATE_PATH}" \
--image-gcr-path="${TEMPLATE_IMAGE}" \
--sdk-language="JAVA" \
--flex-template-base-image=JAVA11 \
--metadata-file="bigquery-to-datadog-pipeline-metadata.json" \
--service-account-email="${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
--jar="build/libs/crashlytics-logs-to-datadog-all.jar" \
--env="FLEX_TEMPLATE_JAVA_MAIN_CLASS=\"com.google.cloud.solutions.bqtodatadog.BigQueryToDatadogPipeline\""    

Launch pipeline using flex-template

Launch the pipeline using the flex template created in the previous step.

gcloud dataflow flex-template run "bigquery-to-datadog-`date +%Y%m%d-%H%M%S`" \
--region "${REGION_ID}" \
--template-file-gcs-location "${TEMPLATE_PATH}" \
--service-account-email "${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
--parameters sourceBigQueryTableId="${CRASHLYTICS_BIGQUERY_TABLE}" \
--parameters datadogApiKey="${DATADOG_API_KEY}"
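
You can also watch the job from the command line as well as in the Cloud Console; this is an optional check:

gcloud dataflow jobs list \
--project="${PROJECT_ID}" \
--region="${REGION_ID}" \
--status=active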

Verify logs in Datadog console

Visit the Datadog logs viewer to verify that the logs are available in Datadog.

Datadog screenshot
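
If you prefer a scripted check, you can also query the Datadog Logs Search API for entries with the pipeline's default source tag. This sketch assumes the default datadoghq.com site and a DATADOG_APP_KEY variable holding a Datadog application key, which is not part of the tutorial's environment variables:

# Search the last hour of logs for the pipeline's default source tag.
# DATADOG_APP_KEY is an assumed, additional application key.
curl --request POST "https://api.datadoghq.com/api/v2/logs/events/search" \
--header "DD-API-KEY: ${DATADOG_API_KEY}" \
--header "DD-APPLICATION-KEY: ${DATADOG_APP_KEY}" \
--header "Content-Type: application/json" \
--data '{"filter": {"query": "source:crashlytics-bigquery", "from": "now-1h", "to": "now"}, "page": {"limit": 5}}'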

Cleaning up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, you can delete the project.

Deleting a project has the following consequences:

  • If you used an existing project, you'll also delete any other work that you've done in the project.
  • You can't reuse the project ID of a deleted project. If you created a custom project ID that you plan to use in the future, delete the resources inside the project instead. This ensures that URLs that use the project ID, such as an appspot.com URL, remain available.

To delete a project, do the following:

  1. In the Cloud Console, go to the Projects page.
  2. In the project list, select the project you want to delete and click Delete.
  3. In the dialog, type the project ID, and then click Shut down to delete the project.

What's next
