Comments (3)
This is a known issue, and it's particularly tricky because the plain implementation done in #353 doesn't work anymore, due to how we handle the partition limitation (introduced by #360).
Specifically, for tables with more than 100 partitions, as you noticed, there will be a CTAS plus many batch inserts. Accumulating the statistics of every operation in a run and then returning the final sum is quite an effort, so when we implemented #375 we preferred simplicity over accuracy.
Could you please clarify why such a feature would be relevant for you? What use cases do you have?
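To make the accumulation idea concrete, here is a minimal sketch of summing per-operation statistics across the initial table creation and the batch inserts. It is not how dbt-athena is implemented: `QueryStats` and `accumulate` are hypothetical names introduced only for illustration, and the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class QueryStats:
    """Statistics for one Athena operation (hypothetical container, not a dbt-athena type)."""

    rows_affected: int
    data_scanned_in_bytes: int

    def __add__(self, other: "QueryStats") -> "QueryStats":
        # Field-wise sum of two operations' statistics.
        return QueryStats(
            self.rows_affected + other.rows_affected,
            self.data_scanned_in_bytes + other.data_scanned_in_bytes,
        )


def accumulate(stats: Iterable[QueryStats]) -> QueryStats:
    """Return the total over every operation executed for one model run."""
    total = QueryStats(0, 0)
    for item in stats:
        total = total + item
    return total


# Made-up numbers: one table creation followed by two batch inserts.
run = [QueryStats(0, 1000), QueryStats(100, 2000), QueryStats(50, 500)]
print(accumulate(run))
```

The hard part described above is not the sum itself but collecting the statistics of every operation at the point where the final response is built.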
from dbt-athena.
The use case involves monitoring the AWS Athena cost of model population. We have a bunch of models defined in dbt, using AWS Athena for storage. These models can be populated automatically or manually from Dagster, which is the UI for model orchestration.
We conceived an idea to incorporate the AWS Athena cost (the number of bytes scanned during model population) into the model metadata within Dagster for each run. This addition could help us identify models that are problematic from a cost perspective.
Example of how the metadata could look in Dagster:
{
"unique_id": "model.project.model1",
"invocation_id": "c8814bf2-e82a-412b-95b3-8df55b7b0bf1",
"execution_type": "incremental",
"execution_duration_seconds": 1708,
"rows_affected": 313,
"total_data_scanned_mb": 122942,
"total_spent_usd": 0.59
}
... populated from the dbt run_results.json file and Dagster internal variables.
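A rough sketch of how such metadata might be derived from run_results.json is shown below. It assumes that each result's `adapter_response` carries a `data_scanned_in_bytes` field, and it assumes a price of 5 USD per TB scanned; both should be checked against your adapter version and the current Athena pricing for your region. The function names are hypothetical.

```python
import json
from pathlib import Path

BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4
USD_PER_TB_SCANNED = 5.0  # assumption: verify against current Athena pricing


def load_run_results(path: str) -> dict:
    """Read a dbt run_results.json file into a dict."""
    return json.loads(Path(path).read_text())


def scan_cost_metadata(run_results: dict) -> list[dict]:
    """Build one metadata record per model result."""
    records = []
    for result in run_results.get("results", []):
        response = result.get("adapter_response") or {}
        # Assumed field name; treated as 0 when the adapter does not report it.
        scanned = response.get("data_scanned_in_bytes") or 0
        records.append(
            {
                "unique_id": result.get("unique_id"),
                "execution_duration_seconds": result.get("execution_time"),
                "rows_affected": response.get("rows_affected"),
                "total_data_scanned_mb": round(scanned / BYTES_PER_MB),
                "total_spent_usd": round(scanned / BYTES_PER_TB * USD_PER_TB_SCANNED, 2),
            }
        )
    return records
```

In practice this would be called as `scan_cost_metadata(load_run_results("target/run_results.json"))`, where the path is only an example. The result is only as complete as the statistics the adapter reports for each model.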
However, for incremental models, for example, this approach is problematic in the current version: it returns only the statistics of the last step of the population, which could be just a small portion of the real 'price'.
Does it make sense?
Thanks, that makes sense. It would be neat to have what you requested, but I'm not sure how much effort the changes would require. As we are community-based, we rely a lot on OSS contributions, so feel free to take a stab at it, and we can guide and review what you propose.