
argon's Introduction

Argon

DEPRECATED: Offline Report export to BigQuery is now integrated into the product. You can read more about this in the documentation for CM360 and DV360.


Please note: this is not an officially supported Google product.

This middleware automates the import of both Campaign Manager 360 (CM360) and Display & Video 360 (DV360) Offline Reporting files into BigQuery. This allows you to maintain a robust long-term view of your reporting data. It can be deployed to Cloud Functions, Cloud Run, or any other Cloud Provider that supports Docker or Serverless Functions. You can trigger jobs by issuing POST calls with configured JSON, which allows for use with tools like Cloud Scheduler. Argon always checks schemas, uploads all values as string type, and appends a File ID column to track ingestions.

Setup

Google Cloud Project

  • Use a Google Cloud project where you are the Owner.

  • Create a BigQuery dataset - tables will be created automatically per report, and appended to for every new report file.

  • Create a new IAM Service Account (e.g. [email protected]) and grant it these roles:

    • BigQuery Admin (bigquery.admin)
    • Cloud Scheduler Job Runner (cloudscheduler.jobRunner)
    • For Google Cloud Functions:
      • Cloud Functions Invoker (cloudfunctions.invoker)
    • For Google Cloud Run:
      • Cloud Run Invoker (run.invoker)
  • Add your own account as a principal for the new Service Account, and grant these roles:

    • Service Account User (iam.serviceAccountUser)
    • Service Account Token Creator (iam.serviceAccountTokenCreator)
  • Enable the necessary APIs in the API Explorer, or via gcloud services enable (a gcloud sketch follows this list):

    • DV: DoubleClick Bid Manager API (doubleclickbidmanager.googleapis.com)
    • CM: DCM/DFA Reporting And Trafficking API (dfareporting.googleapis.com)
    • Cloud Build API (cloudbuild.googleapis.com)
    • For Google Cloud Functions:
      • Cloud Functions API (cloudfunctions.googleapis.com)
    • For Google Cloud Run:
      • Cloud Run Admin API (run.googleapis.com)
      • Artifact Registry API (artifactregistry.googleapis.com)
  • Deploy Argon to Cloud Functions or Cloud Run (see the Commands section below).

  • Note down your deployed Endpoint's URL.
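For reference, here is a minimal gcloud sketch of the API and service account setup described above (Cloud Functions variant); the project ID, service account name, and your user email are placeholders to adjust:

# Set the active project (placeholder project ID)
gcloud config set project PROJECT-ID

# Enable the required APIs
gcloud services enable \
  doubleclickbidmanager.googleapis.com \
  dfareporting.googleapis.com \
  cloudbuild.googleapis.com \
  cloudfunctions.googleapis.com

# Create the service account
# (its email will be argon@PROJECT-ID.iam.gserviceaccount.com)
gcloud iam service-accounts create argon --display-name "argon"

# Grant the project-level roles listed above
# (repeat for roles/cloudscheduler.jobRunner and
#  roles/cloudfunctions.invoker or roles/run.invoker)
gcloud projects add-iam-policy-binding PROJECT-ID \
  --member "serviceAccount:argon@PROJECT-ID.iam.gserviceaccount.com" \
  --role "roles/bigquery.admin"

# Let your own account act as the service account
# (repeat with roles/iam.serviceAccountTokenCreator)
gcloud iam service-accounts add-iam-policy-binding \
  argon@PROJECT-ID.iam.gserviceaccount.com \
  --member "user:YOUR-EMAIL" \
  --role "roles/iam.serviceAccountUser"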

Google Marketing Platform

Accounts

  • Ensure that the CM Account has the following Permissions:

    • Properties > Enable account for API access
    • Reporting > View all generated reports
  • Create a CM / DV User Profile for the Service Account's email address, with the respective role:

    • DV: Reporting only
    • CM: Advanced Agency Admin, with permissions:
      • View all generated files
      • View all saved reports

Report

Warning: Argon does not support pre-existing reports, as they can cause hard-to-debug issues. Kindly create a new report as detailed below, and do not change the Dimension/Metric/Event selections once Argon has started ingesting files. Always create a new report if you want to change the report template. All columns are string type, and date-like fields will be transformed to suit BigQuery date parsing. Argon will also append an additional column (file_id) to keep track of ingested files. If you change the schema in BigQuery, Argon's schema check will fail.

  • Choose the necessary report template in "Offline Reporting".

  • Choose the "CSV" File type.

  • Select the required Dimensions, Metrics, and Rich Media Events.

  • Add the service account's email address to the "Share with > +add people", and use the "Link" option.

  • If you want historical data to be backfilled initially, select the appropriate backfill Date Range with "Custom".

  • If this range is significant, break it up into much smaller chunks; otherwise ingestion timeouts will result in partial uploads.

  • Save and run the report for each chunk, if necessary.

  • Now, edit the report again, and select a Date Range of "Yesterday".

  • Activate the Schedule to repeat "Daily" every "1 day", and choose an "Expiry" date far in the future.

  • Save (and do not run) the report.

Google Cloud Scheduler

  • Create a Scheduler Job (via the Cloud Console, or with the gcloud sketch after this list) with:

    • Frequency: 0 */12 * * * (repeating every 12 hours)

    • Target type: HTTP

    • URL: Deployed Argon URL (Cloud Function or Cloud Run endpoint)

    • HTTP Method: POST

    • Auth header: Add OIDC token

    • Service account: Previously created Service Account

    • Audience: Deployed Argon URL

    • Body:

      {
        "product": "[PRODUCT]", // required: CM or DV
        "reportId": [REPORT_ID],
        "profileId": [PROFILE_ID], // required: for CM reports
        "datasetName": "[DATASET_NAME]",
        "projectId": "[BIGQUERY_PROJECT]", // default: current cloud project
        "single": [SINGLE_FILE_MODE], // default: true
        "ignore": [IGNORE_FILE_IDS], // default: []
        "newest": [ORDERING_MODE], // default: false
        "replace": [REPLACE_TABLE_MODE], // default: false, append only
        "email": "[EMAIL_ADDRESS]" // default: no impersonation
      }
  • Notes:

    • Use projectId if the output BigQuery dataset lives outside the currently deployed cloud project.

    • Set single to false, to process more than one file per run. Beware: with files that are multiple GBs in size, the Cloud Function will time out after 540s, resulting in partial ingestion or corrupted data.

    • Set ignore to a list of Report File IDs, to skip wrongly generated or unnecessary report files.

    • Set newest to true, to order report files by most recent first, instead of ordering by oldest first.

    • Set replace to true, to replace the BigQuery table on running, instead of appending to it.

    • Set email to a Service Account email address, to impersonate it for local development or testing purposes.

  • Save the job and run it once to ingest any initially generated historical data files. Alternatively, you can run Argon on your local machine to ingest larger files (see "Ingest large report files" below).

  • If it fails, check the logs for error messages and ensure all the above steps have been appropriately followed, with the correct permissions.

  • Moving forward, Cloud Scheduler will trigger Argon for regular ingestion.

  • Argon will always attempt to ingest the oldest file that is not present in the BigQuery table and not ignored in your config body.

  • Warning: All failed file ingestions will be logged. To force Argon to re-ingest them on future runs, manually drop the rows with the corresponding File IDs, or use ignore to skip them.
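If you prefer the command line over the Cloud Console, here is a rough gcloud sketch of the same Scheduler job; the job name, location, endpoint URL, and a config.json file holding the JSON body shown above are assumptions to adapt to your deployment:

# config.json contains the JSON body shown above
gcloud scheduler jobs create http argon-ingest \
  --location "us-central1" \
  --schedule "0 */12 * * *" \
  --uri "https://YOUR-ARGON-ENDPOINT" \
  --http-method POST \
  --headers "Content-Type=application/json" \
  --message-body-from-file config.json \
  --oidc-service-account-email "argon@PROJECT-ID.iam.gserviceaccount.com" \
  --oidc-token-audience "https://YOUR-ARGON-ENDPOINT"

# Trigger the first run manually to ingest historical files
gcloud scheduler jobs run argon-ingest --location "us-central1"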

Commands

Install the following on your local machine: git, Node.js (with npm), and the gcloud CLI.

Alternatively, you can use the Google Cloud Shell which comes with all of these tools pre-installed.

# Clone the source code
git clone https://github.com/google/argon.git
cd argon

# Install dependencies
npm install

# Authenticate with GCP
gcloud auth login

# Build from source, outputs to ./dist/
npm run build

Deploy to Google Cloud Platform

Using local source:

# Deploy to Cloud Functions
gcloud functions deploy argon \
  --trigger-http \
  --source ./dist/ \
  --runtime nodejs16 \
  --memory 512M \
  --timeout 540s \
  --service-account "[email protected]"

# Deploy to Cloud Run
gcloud run deploy argon \
    --source ./dist/ \
    --memory 512M \
    --timeout 3600s \
    --service-account "[email protected]"

Using pre-built Docker image:

# Choose your GCP image destination
GCP_CONTAINER_URL="gcr.io/PROJECT-ID/argon:latest"
# OR
GCP_CONTAINER_URL="LOCATION-docker.pkg.dev/PROJECT-ID/argon/argon:latest"

# Pull pre-built image from GitHub Container Registry
docker pull ghcr.io/google/argon:latest

# Tag image locally
docker tag ghcr.io/google/argon:latest $GCP_CONTAINER_URL

# Push image to GCP
docker push $GCP_CONTAINER_URL

# Deploy to Cloud Run
gcloud run deploy argon \
    --image $GCP_CONTAINER_URL \
    --memory 512M \
    --timeout 3600s \
    --service-account "[email protected]"
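To note down the deployed endpoint's URL (as mentioned in Setup), you can usually retrieve it with gcloud; a quick sketch, assuming the function/service name "argon" used above:

# Cloud Functions (1st gen)
gcloud functions describe argon --format "value(httpsTrigger.url)"

# Cloud Run (you may need to pass --region)
gcloud run services describe argon --format "value(status.url)"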

Ingest large report files

# Run a local server, default PORT=8080
npm run watch

# Send a POST from a separate shell terminal
# or using any REST API client
# config.json contains your desired Argon config
curl \
  -H "Content-Type: application/json" \
  --data @config.json \
  localhost:8080
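For reference, config.json follows the same shape as the Scheduler request body above; a minimal illustrative example, where the IDs, dataset name, and service account email are placeholders:

{
  "product": "DV",                // or "CM" (then also set "profileId")
  "reportId": 12345678,           // placeholder report ID
  "datasetName": "argon_dataset", // placeholder dataset
  "single": false,                // process all pending files in one run
  "email": "argon@PROJECT-ID.iam.gserviceaccount.com" // optional impersonation
}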

Local Development

# Lint your changes against the guidelines
npm run lint

# Apply formatting rules to your changes
npm run format

Docker

Argon can be containerized using Pack.

# Build & Run a Docker image, from your local source

pack build argon \
  --path ./dist/ \
  --builder gcr.io/buildpacks/builder:v1 \
  --env GOOGLE_FUNCTION_SIGNATURE_TYPE=http \
  --env GOOGLE_FUNCTION_TARGET=argon

# Run a local server, default PORT=8080
docker run --rm -p 8080:8080 argon

argon's People

Contributors

achembarpu, ceoloide


argon's Issues

Error caused by Gaxios

From time to time (about twice a week) I get a Gaxios "Service is unavailable" error in my logs.
As Argon is running multiple times a day, my CM reports get imported eventually.
Is there something I can do to prevent this error, or is it simply caused by some 3rd-party service that is not reachable from time to time?


500 error on API connection

Long time watcher, first time caller!

This project would ease my workload a lot in setting up our internal reporting, so I'm hoping to get some pointers for getting it working. All help is appreciated :)

I've tried to set up Argon based on the setup guide, but am hitting a wall when running the function (500 error on API). My goal is to set up some automated DV360 reports to BQ, for use in Data Studio.

Things I've done:

  • I've created the service account, set it up with our DV360 account (reporting only), created a report, and shared it with the service account (link).
  • Enabled the DBM and DCM/DFA APIs in the GCP project (the API console reports requests to the DBM API)
  • Set up the dataset
  • Deployed the cloud function without errors.
  • Set up a scheduler based on the instructions.
  • Tried to trigger the scheduler, but it failed.
  • Tried to repeat everything a couple of times.

The result I get
So I went into the testing tab on the cloud function and tried to use the same triggering event to trace what could be wrong. The code seems to go through the steps OK until it hits the API client, where it gets a 500 error. See the attached images.

What could I be missing?


Reach Reports

The POST works and a table is created with the correct header/files but there is no data inserted into the table.

Have tried this on a few Reach reports with the same issue and confirmed there is indeed data with both single and multiple file flag enabled.

Cannot read property 'isValid' of undefined

There's a report file (in CSV) available; however, data.items is always empty (seen after putting some logging in place), and it throws this error:
Cannot read property 'isValid' of undefined.
It happens in the very first condition of the for loop.

for (const item of data.items) {
  if (item.status.match(REPORT_AVAIL_PATTERN)) {
    const reportDate = item.dateRange.endDate;
    if (reportDate in lookbackDates) {
      reportFiles[item.id] = item;
      delete lookbackDates[reportDate]; // remove found dates
    }
    latestDate = DateTime.fromFormat(reportDate, DATE_FORMAT);
  }
}

What are we missing here?

missing node in script part of package.json

Hi,
I had to add node in the scripts section, otherwise after deploying, App Engine would always say
/bin/sh: 1: exec: app.js: not found
So the final working package.json looked like the one below.

{
  "name": "argon",
  "version": "0.0.1",
  "description": "DCM Reporting to BigQuery connector",
  "main": "app.js",
  "engines": {
    "node": ">=10"
  },
  "scripts": {
    "start": "node app.js",
    "start-dev": "nodemon app.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Google Inc.",
  "license": "Apache-2.0",
  "dependencies": {
    "@google-cloud/bigquery": "^2.0.6",
    "google-auth-library": "^3.0.1",
    "hapi": "^18.0.0",
    "luxon": "^1.10.0"
  },
  "devDependencies": {
    "eslint": "^5.12.1",
    "eslint-config-google": "^0.11.0",
    "nodemon": "^1.18.9"
  }
}

Unable to deploy in Cloud Function due to missing required package

I have been trying to redeploy for the last 3 days, without success. The Cloud Function logs show this:

Provided module can't be loaded.
Did you list all required modules in the package.json dependencies?
Detailed stack trace: Error: Cannot find module 'split2'
Require stack:
 - /workspace/argon.js
 - /workspace/index.js
 - /layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/loader.js
 - /layers/google.nodejs.functions-framework/functions-framework/node_modules/@google-cloud/functions-framework/build/src/main.js
at Function.Module._resolveFilename (internal/modules/cjs/loader.js:815:15)
at Function.Module._load (internal/modules/cjs/loader.js:667:27)
at Module.require (internal/modules/cjs/loader.js:887:19)
at require (internal/modules/cjs/helpers.js:74:18)
at Object.<anonymous> (/workspace/argon.js:19:15)
at Module._compile (internal/modules/cjs/loader.js:999:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1027:10)
at Module.load (internal/modules/cjs/loader.js:863:32)
at Function.Module._load (internal/modules/cjs/loader.js:708:14)
at Module.require (internal/modules/cjs/loader.js:887:19)
Could not load the function, shutting down.

This is the first time I've seen this issue; I have successfully deployed Argon in the past. The source of the issue seems to be 'split2', imported in https://github.com/google/argon/blob/104181130a3f12a0de2bf5e82935d0419a847992/package.json#L28

Cloud Functions

Hi All,

Before deploying the function, what should the index.js file contain?
Here is what I put in the package.json:

  "name": "argon",
  "version": "0.0.1",
  "description": "DCM Reporting to BigQuery connector",
  "main": "app.js",
  "engines": {
    "node": ">=10"
  },
  "scripts": {
    "start": "node app.js",
    "start-dev": "nodemon app.js",
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Google Inc.",
  "license": "Apache-2.0",
  "dependencies": {
    "@google-cloud/bigquery": "^2.0.6",
    "google-auth-library": "^3.0.1",
    "hapi": "^18.0.0",
    "luxon": "^1.10.0"
  },
  "devDependencies": {
    "eslint": "^5.12.1",
    "eslint-config-google": "^0.11.0",
    "nodemon": "^1.18.9"
  }
}
But I don't know what to write in index.js.
Do you have an idea?

Thanks a lot !

ERROR: Provide DCM Profile ID and Account Email ID.

I have deployed the app; however, calls to the service from Cloud Scheduler keep failing with the following error:
ERROR: Provide DCM Profile ID and Account Email ID.
I put some logging in place to see what payload is actually received in App Engine, and it shows null (mind you, I do send the request with a body as mentioned in the instructions).
I am pretty certain that all the requirements for this app to run are in place. Has anybody else got this app to work?

Matching data between CM and DV

Hi,
Thanks for sharing!

Do you know which SQL query would work in order to match CM and DV data?
For example, assuming the CM Placement Name and the DV Line Item are the same, so that the data can be matched.
Thanks

Bidmanager API v1 will get deprecated by April 15th - are there plans to update Argon?

Hi there,

As Argon is currently still using v1 of the API, which will be replaced by v1.1 on April 15th, 2021, will Argon get updated?
Please see the message we received below.

Thank you! :)


As previously announced, we will be sunsetting these deprecated services of the API on April 15, 2021. Requests to these services will no longer work after this date. To avoid an interruption in service, you must migrate to either a newer version of the DBM API or the Display & Video 360 (DV360) API, depending on the services you currently use.

If you are using the DBM API version 1 Reporting service, you must migrate to the DBM API version 1.1 Reporting service. If you are using any version of the DBM API SDF Download or Line Item service, you must migrate to the DV360 API.
To learn about changes between versions and get tips for migrating, visit the API developer site and read the Google Ads Developer Blog post about this upcoming sunset. Also consider subscribing to the Developer Blog to stay up to date about new releases, upcoming sunsets, and changes to the DBM API or DV360 API.
If you have technical questions regarding new versions of the API, please reach out via the support contact form.
Sincerely,
The DoubleClick Bid Manager API Team

Reach Reports

I've seen there was an issue related to this, but it was closed with no answer given, and I'd like to reopen the discussion because I'm having the same problem.

_"The POST works and a table is created with the correct header/files but there is no data inserted into the table.

Have tried this on a few Reach reports with the same issue and confirmed there is indeed data with both single and multiple file flag enabled."_

I can confirm the service account has access to the report, and I can see the File IDs being processed in the logs. I guess the problem has to do with the last couple of lines of the report, where it says "Grand Total", and the 2 following lines, which make the function "crash" when it compares the schema to the one in BQ, and therefore it doesn't ingest any data.

Please see attached a screenshot of the Reach report's last lines and the error in the CF logs.

