[Bug] adapter response return incorrect `data_scanned_in_bytes` when incremental model is running (dbt-athena, open issue)

jvyoralek commented on June 12, 2024

[Bug] adapter response return incorrect `data_scanned_in_bytes` when incremental model is running

from dbt-athena.

Comments (3)

nicor88 commented on June 12, 2024

This is a known issue, and it's particularly tricky because the plain implementation of what was done in #353 doesn't work anymore, due to how we handle the partition limitation, which was introduced by #360.

Specifically, for tables with more than 100 partitions, as you noticed, there will be a CTAS plus many batch inserts. Accumulating the statistics of every operation in the run and then returning the final sum is quite an effort, so when we implemented this in #375 we preferred simplicity over accuracy.
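A minimal sketch of the accumulation idea mentioned above, assuming each statement (the CTAS and every batch insert) reports its own statistics. `QueryStats` and `total_stats` are hypothetical names used for illustration only; they are not part of dbt-athena.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class QueryStats:
    """Hypothetical per-statement statistics (not a dbt-athena type)."""
    rows_affected: int = 0
    data_scanned_in_bytes: int = 0

    def __add__(self, other: "QueryStats") -> "QueryStats":
        # Field-wise sum, so that a run can be folded into a single total.
        return QueryStats(
            self.rows_affected + other.rows_affected,
            self.data_scanned_in_bytes + other.data_scanned_in_bytes,
        )


def total_stats(per_statement: Iterable[QueryStats]) -> QueryStats:
    """Sum the statistics of every statement executed for one model."""
    total = QueryStats()
    for stats in per_statement:
        total = total + stats
    return total


# Example: one CTAS followed by two batch inserts (made-up numbers).
statements = [QueryStats(10, 1000), QueryStats(5, 200), QueryStats(3, 50)]
run_total = total_stats(statements)
```

With this shape, reporting only the last statement would give 50 bytes, while the summed total is 1250 bytes, which is the gap described in this issue.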

Could you please clarify why such a feature would be relevant for you? What use cases do you have?

jvyoralek commented on June 12, 2024

The use case involves monitoring the AWS Athena cost of model population. We have a bunch of models defined in dbt, using AWS Athena for storage. These models can be populated automatically or manually from Dagster, which is the UI for model orchestration.

We conceived an idea to incorporate the AWS Athena cost (the number of bytes scanned during model population) into the model metadata within Dagster for each run. This addition could help us identify models that are problematic from a cost perspective.

Example of how the metadata could look in Dagster:

{ 
  "unique_id": "model.project.model1",
  "invocation_id": "c8814bf2-e82a-412b-95b3-8df55b7b0bf1",
  "execution_type": "incremental",
  "execution_duration_seconds": 1708,
  "rows_affected": 313,
  "total_data_scanned_mb": 122942,
  "total_spent_usd": 0.59
}

... populated from dbt run_results.json file and Dagster internal variables.
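A rough sketch of how such metadata might be derived from `run_results.json`, assuming the adapter response of each result carries `rows_affected` and `data_scanned_in_bytes`. The price of 5 USD per TB and the binary units (1 TB = 1024^4 bytes) are assumptions, so check the pricing for your region; with them, 122942 MB works out to the 0.59 USD shown above. `scan_cost_usd` and `model_metadata` are hypothetical helper names, and `execution_type` is left out because it is not part of `run_results.json`.

```python
USD_PER_TB = 5.0            # assumed Athena list price, varies by region
BYTES_PER_MB = 1024 ** 2    # assumption: binary units
BYTES_PER_TB = 1024 ** 4


def scan_cost_usd(data_scanned_in_bytes: int) -> float:
    """Estimate the query cost from the number of bytes scanned."""
    return round(data_scanned_in_bytes / BYTES_PER_TB * USD_PER_TB, 2)


def model_metadata(run_results: dict) -> list[dict]:
    """Build one metadata record per result in a parsed run_results.json."""
    invocation_id = run_results["metadata"]["invocation_id"]
    records = []
    for result in run_results["results"]:
        response = result.get("adapter_response") or {}
        scanned = response.get("data_scanned_in_bytes") or 0
        records.append({
            "unique_id": result["unique_id"],
            "invocation_id": invocation_id,
            "execution_duration_seconds": round(result["execution_time"]),
            "rows_affected": response.get("rows_affected"),
            "total_data_scanned_mb": scanned // BYTES_PER_MB,
            "total_spent_usd": scan_cost_usd(scanned),
        })
    return records
```

In practice the input would come from `json.load` on `target/run_results.json`. Note that the result is only as accurate as the `data_scanned_in_bytes` value the adapter reports.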

However, for incremental models, for example, this approach is problematic in the current version. The adapter response reports only the last part of the population, which could be just a small portion of the real 'price'.

Does it make sense?

nicor88 commented on June 12, 2024

Thanks, that makes sense. It would be neat to have what you requested, but I'm not sure how much effort the change requires. As we are community based, we rely a lot on OSS contributions, so feel free to take a stab at it, and we can guide/review what you propose.
