Export Firebase Crashlytics BigQuery logs to Datadog using Dataflow
This document discusses how to export Firebase Crashlytics logs from BigQuery tables to Datadog.
Firebase Crashlytics is a lightweight, realtime crash reporter that helps you track, prioritize, and fix stability issues that erode your app quality. Crashlytics saves you troubleshooting time by intelligently grouping crashes and highlighting the circumstances that lead up to them.
Firebase Crashlytics can export its data to BigQuery for further analysis, where you can combine it with your Cloud Logging exports or your own first-party data. You can then use a Data Studio dashboard to visualize this data.
Datadog is a log monitoring platform and Google Cloud partner that provides application and infrastructure monitoring services and integrates natively with Google Cloud.
This document is intended for a technical audience whose responsibilities include log management or data analytics. It assumes that you're familiar with Dataflow and that you have some familiarity with Bash scripts and basic knowledge of Google Cloud.
Architecture
The batch Dataflow pipeline processes the Crashlytics logs in BigQuery as follows:
- Read the BigQuery table (or partition).
- Transform each BigQuery TableRow into a JSON string and wrap it in the Datadog log entry format.
- Send the entries to Datadog using two optimizations:
  - Batch log messages into batches of up to 5 MB (or 1000 entries) to reduce the number of API calls.
  - Gzip each request to reduce network bandwidth.
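The two optimizations can be illustrated outside of Dataflow with standard shell tools. The following is a minimal sketch, not the pipeline's Java implementation: the function name `batch_and_gzip` and the output file layout are hypothetical, and the sketch applies only the 1000-entry limit, not the 5 MB size limit.

```shell
# Illustrative sketch only; the real pipeline does this inside Dataflow.
# batch_and_gzip is a hypothetical helper: it reads newline-delimited JSON
# log entries from stdin, writes them into files of at most MAX_ENTRIES
# lines under OUT_DIR, and gzips each file the way one request body
# would be compressed.
batch_and_gzip() {
  local max_entries="$1"
  local out_dir="$2"
  mkdir -p "${out_dir}"
  # split names the chunks batch-aa, batch-ab, ... in input order.
  split -l "${max_entries}" - "${out_dir}/batch-"
  # Compress every chunk in place, producing batch-aa.gz, batch-ab.gz, ...
  gzip -f "${out_dir}"/batch-*
}
```

For example, piping 2500 entries through `batch_and_gzip 1000 ./batches` into a fresh directory produces three compressed files, so three API calls would replace 2500 individual ones.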
Objectives
- Create a service account with limited access.
- Create a Dataflow Flex Template pipeline to send Crashlytics logs to Datadog using the Datadog Send logs API.
- Verify that Datadog imported all Crashlytics logs.
Costs
This tutorial uses billable components of Google Cloud, including Dataflow, Cloud Storage, and BigQuery.
Use the pricing calculator to generate a cost estimate based on your projected usage.
Before you begin
For this tutorial, you need a Google Cloud project. To make cleanup easiest at the end of the tutorial, we recommend that you create a new project for this tutorial.
- Make sure that billing is enabled for your Google Cloud project.
- In the Cloud Console, activate Cloud Shell. At the bottom of the Cloud Console, a Cloud Shell session opens and displays a command-line prompt. Cloud Shell is a shell environment with the Cloud SDK already installed, including the gcloud command-line tool, and with values already set for your current project. It can take a few seconds for the session to initialize.
- Enable the Compute Engine, Cloud Storage, Dataflow, BigQuery, and Cloud Build APIs:
gcloud services enable \
compute.googleapis.com \
storage.googleapis.com \
dataflow.googleapis.com \
bigquery.googleapis.com \
cloudbuild.googleapis.com
Alternatively, enable these APIs from your browser in the Cloud Console.
Setting up your environment
- In Cloud Shell, clone the source repository and go to the directory for this tutorial:
git clone https://github.com/GoogleCloudPlatform/crashlytics-logs-to-datadog.git
cd crashlytics-logs-to-datadog/
- Use a text editor to modify the set_environment.sh file to set the required environment variables:
# The Google Cloud project to use for this tutorial
export PROJECT_ID="<your-project-id>"
# The Compute Engine region to use for running Dataflow jobs
export REGION_ID="<compute-engine-region>"
# The Cloud Storage bucket to use for Dataflow templates and temporary location
export GCS_BUCKET="<name-of-the-bucket>"
# Name of the service account to use (not the email address)
export PIPELINE_SERVICE_ACCOUNT_NAME="<service-account-name-for-runner>"
# The API key created in Datadog for making API calls
# https://app.datadoghq.com/account/settings#api
export DATADOG_API_KEY="<your-datadog-api-key>"
- Run the script to set the environment variables:
source set_environment.sh
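Later commands in this tutorial reference PIPELINE_SERVICE_ACCOUNT_EMAIL, which does not appear among the variables listed above. The following sketch shows one way to fail early on unset variables and to derive that email address, assuming your copy of set_environment.sh does not already export it. Both function names are hypothetical and not part of the repository; the address format NAME@PROJECT_ID.iam.gserviceaccount.com is the standard one for user-managed service accounts.

```shell
# Hypothetical helper: return non-zero if any named variable is unset or empty.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    # Look up the variable whose name is stored in ${name}.
    eval "value=\${${name}:-}"
    if [ -z "${value}" ]; then
      echo "Missing environment variable: ${name}" >&2
      missing=1
    fi
  done
  return "${missing}"
}

# Hypothetical helper: build a service account email from its name and project ID.
service_account_email() {
  printf '%s@%s.iam.gserviceaccount.com\n' "$1" "$2"
}

# Possible usage after sourcing set_environment.sh:
#   require_vars PROJECT_ID REGION_ID GCS_BUCKET \
#     PIPELINE_SERVICE_ACCOUNT_NAME DATADOG_API_KEY
#   export PIPELINE_SERVICE_ACCOUNT_EMAIL="$(service_account_email \
#     "${PIPELINE_SERVICE_ACCOUNT_NAME}" "${PROJECT_ID}")"
```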
Creating resources
The tutorial uses the following resources:
- A service account to run Dataflow pipelines, enabling fine-grained access control
- A Cloud Storage bucket for temporary data storage and test data
Create service accounts
We recommend that you run pipelines with fine-grained access control to improve access partitioning, by granting each service account only the permissions that it requires. If your project doesn't have a user-created service account, create one using the following instructions.
- Create a service account to use as the user-managed controller service account for Dataflow:
gcloud iam service-accounts create "${PIPELINE_SERVICE_ACCOUNT_NAME}" \
--project="${PROJECT_ID}" \
--description="Service Account for Datadog export pipelines." \
--display-name="Datadog logs exporter"
- Create a custom role with the permissions that the pipeline requires, as defined in datadog_sender_permissions.yaml:
export DATADOG_SENDER_ROLE_NAME="datadog_sender"
gcloud iam roles create "${DATADOG_SENDER_ROLE_NAME}" \
--project="${PROJECT_ID}" \
--file=datadog_sender_permissions.yaml
- Apply the custom role to the service account:
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
--member="serviceAccount:${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
--role="projects/${PROJECT_ID}/roles/${DATADOG_SENDER_ROLE_NAME}"
- Assign the dataflow.worker role to allow a Dataflow worker to run with the service-account credentials:
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
--member="serviceAccount:${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
--role="roles/dataflow.worker"
Create the Cloud Storage bucket
Create a Cloud Storage bucket to store test data and to serve as the Dataflow staging location:
gsutil mb -p "${PROJECT_ID}" -l "${REGION_ID}" "gs://${GCS_BUCKET}"
Build and launch Dataflow pipeline
Test and compile the pipeline code.
./gradlew clean build shadowJar
The pipeline supports the following options:
Parameter | Default value | Description |
---|---|---|
sourceBigQueryTableId | - | A fully qualified BigQuery table ID in the form <projectId>:<datasetId>.<tableId>[:$<partition_date>] |
bigQuerySqlQuery | - | A BigQuery SQL query whose results are sent to Datadog. |
shardCount | 10 | The number of parallel processes that send to Datadog. A higher number can overload the Datadog API. |
preserveNulls | false | Allow null values from the BigQuery source to be serialized. |
datadogApiKey | - | The API key from the Datadog console. |
datadogEndpoint | https://http-intake.logs.datadoghq.com/v1/input | See the Datadog logging endpoints. |
datadogSource | crashlytics-bigquery | See the Datadog log entry structure. You can customize this parameter to suit your needs. |
datadogTags | user:crashlytics-pipeline | See the Datadog log entry structure. You can customize this parameter to suit your needs. |
datadogLogHostname | crashlytics | See the Datadog log entry structure. You can customize this parameter to suit your needs. |
Use only one of sourceBigQueryTableId or bigQuerySqlQuery.
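Because the two source options are mutually exclusive, a wrapper script can check them before it launches a job. This is a minimal sketch under the assumption that you pass both values as arguments; `check_source_options` is a hypothetical name, and the pipeline's own validation may behave differently.

```shell
# Hypothetical pre-flight check: succeed only when exactly one of the two
# source options (table ID as $1, SQL query as $2) is non-empty.
check_source_options() {
  local table_id="$1"
  local sql_query="$2"
  if [ -n "${table_id}" ] && [ -n "${sql_query}" ]; then
    echo "Set only one of sourceBigQueryTableId or bigQuerySqlQuery." >&2
    return 1
  fi
  if [ -z "${table_id}" ] && [ -z "${sql_query}" ]; then
    echo "Set one of sourceBigQueryTableId or bigQuerySqlQuery." >&2
    return 1
  fi
  return 0
}
```

Calling `check_source_options "${CRASHLYTICS_BIGQUERY_TABLE:-}" "${SQL_QUERY:-}"` before the launch command stops the script before a misconfigured job is submitted; SQL_QUERY here is an illustrative variable name, not one that the tutorial defines.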
Define the name of the BigQuery table that contains the Crashlytics export:
export CRASHLYTICS_BIGQUERY_TABLE="<projectId>:<datasetId>.<tableId>"
Note: Make sure that the service account has access to this BigQuery table.
You can launch the pipeline directly from the shell using the following command:
bq_2_datadog_pipeline \
--project="${PROJECT_ID}" \
--region="${REGION_ID}" \
--runner="DataflowRunner" \
--serviceAccount="${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
--gcpTempLocation="gs://${GCS_BUCKET}/temp" \
--stagingLocation="gs://${GCS_BUCKET}/staging" \
--tempLocation="gs://${GCS_BUCKET}/bqtemp" \
--datadogApiKey="${DATADOG_API_KEY}" \
--sourceBigQueryTableId="${CRASHLYTICS_BIGQUERY_TABLE}"
You can monitor the Dataflow job in the Cloud Console, where the job page shows the pipeline's execution graph (DAG).
Create Dataflow flex template
Dataflow templates allow you to use the Google Cloud Console, the gcloud command-line tool, or REST API calls to set up your pipelines on Google Cloud and run them. Classic templates are staged as execution graphs on Cloud Storage, while Flex Templates bundle the pipeline as a container image in your project’s Container Registry. This allows you to decouple building and running pipelines, and to integrate with orchestration systems for daily execution. For more information, see the differences between classic and Flex Templates.
To build a Flex Template, define the location to store the template spec file, which contains all the information needed to run the job:
export TEMPLATE_PATH="gs://${GCS_BUCKET}/dataflow/templates/bigquery-to-datadog.json"
export TEMPLATE_IMAGE="us.gcr.io/${PROJECT_ID}/dataflow/bigquery-to-datadog:latest"
Build the Dataflow Flex Template:
gcloud dataflow flex-template build "${TEMPLATE_PATH}" \
--image-gcr-path="${TEMPLATE_IMAGE}" \
--sdk-language="JAVA" \
--flex-template-base-image=JAVA11 \
--metadata-file="bigquery-to-datadog-pipeline-metadata.json" \
--service-account-email="${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
--jar="build/libs/crashlytics-logs-to-datadog-all.jar" \
--env="FLEX_TEMPLATE_JAVA_MAIN_CLASS=\"com.google.cloud.solutions.bqtodatadog.BigQueryToDatadogPipeline\""
Launch pipeline using flex-template
Launch the pipeline using the flex template created in the previous step.
gcloud dataflow flex-template run "bigquery-to-datadog-`date +%Y%m%d-%H%M%S`" \
--region "${REGION_ID}" \
--template-file-gcs-location "${TEMPLATE_PATH}" \
--service-account-email "${PIPELINE_SERVICE_ACCOUNT_EMAIL}" \
--parameters sourceBigQueryTableId="${CRASHLYTICS_BIGQUERY_TABLE}" \
--parameters datadogApiKey="${DATADOG_API_KEY}"
Verify logs in Datadog console
Visit the Datadog logs viewer to verify that the logs are available in Datadog.
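To check that every row arrived, you can compare the number of rows in the source table with the number of log entries that Datadog shows for the pipeline's source (crashlytics-bigquery by default). The sketch below builds the row-count query; `count_query_for_table` is a hypothetical helper that assumes a table ID of the form project:dataset.table without a partition decorator.

```shell
# Hypothetical helper: turn a table ID of the form project:dataset.table
# into a standard SQL row-count query.
count_query_for_table() {
  local table_ref
  # Standard SQL separates the project from the dataset with '.', not ':'.
  table_ref="$(printf '%s' "$1" | sed 's/:/./')"
  printf 'SELECT COUNT(*) AS row_count FROM `%s`\n' "${table_ref}"
}

# Possible usage with the bq command-line tool:
#   bq query --use_legacy_sql=false \
#     "$(count_query_for_table "${CRASHLYTICS_BIGQUERY_TABLE}")"
```

If the two counts differ, check the Dataflow job logs for failed requests before you rerun the pipeline, because a rerun sends every row again.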
Cleaning up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, you can delete the project.
Deleting a project has the following consequences:
- If you used an existing project, you'll also delete any other work that you've done in the project.
- You can't reuse the project ID of a deleted project. If you created a custom project ID that you plan to use in the future, delete the resources inside the project instead. This ensures that URLs that use the project ID, such as an appspot.com URL, remain available.
To delete a project, do the following:
- In the Cloud Console, go to the Projects page.
- In the project list, select the project you want to delete and click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
What's next
- Learn more about Firebase Crashlytics
- Learn how Cloud Logging and Cloud Monitoring can help meet your needs.
- Learn more about AI on Google Cloud.
- Learn more about Cloud developer tools.
- Learn more about deploying Datadog on Google Cloud
- Try out other Google Cloud features for yourself. Have a look at our tutorials.