
Comments (3)

nicor88 avatar nicor88 commented on June 12, 2024

This is a known issue, and it's particularly tricky because the plain implementation from #353 doesn't work anymore, due to how we handle the partition limitation since #360.

Specifically, for tables with more than 100 partitions, as you noticed, there will be a CTAS plus many batch inserts. Accumulating the statistics of every operation in a run and then returning the final sum is quite an effort, so when we implemented #375 we preferred simplicity over accuracy.
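To make the accumulation idea concrete, here is a minimal sketch of summing the data scanned over every query of one model run instead of reporting only the last one. `QueryStats` and `total_data_scanned_bytes` are hypothetical names used for illustration and are not part of dbt-athena; the byte counts are made up.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class QueryStats:
    """Hypothetical per-query record; not a dbt-athena type."""
    query_id: str
    data_scanned_bytes: int


def total_data_scanned_bytes(stats: Iterable[QueryStats]) -> int:
    """Sum the bytes scanned by every query that belongs to one model run."""
    return sum(s.data_scanned_bytes for s in stats)


# Illustrative run: one CTAS followed by two batch inserts (made-up numbers).
runs = [
    QueryStats("ctas", 900),
    QueryStats("insert-1", 50),
    QueryStats("insert-2", 50),
]

last_only = runs[-1].data_scanned_bytes  # what a "last query only" report would show
total = total_data_scanned_bytes(runs)   # what an accumulated report would show
```

In this made-up run the last query accounts for 50 of 1000 bytes, which shows how far a last-query figure can be from the real total.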

Could you please clarify why such a feature would be relevant for you? What use cases do you have?

from dbt-athena.

jvyoralek avatar jvyoralek commented on June 12, 2024

The use case involves monitoring the AWS Athena cost of model population. We have a bunch of models defined in dbt, using AWS Athena for storage. These models can be populated automatically or manually from Dagster, which is the UI for model orchestration.

We conceived an idea to incorporate the AWS Athena cost (the number of bytes scanned during model population) into the model metadata within Dagster for each run. This addition could help us identify models that are problematic from a cost perspective.

Example of how the metadata could look in Dagster:

{ 
  "unique_id": "model.project.model1",
  "invocation_id": "c8814bf2-e82a-412b-95b3-8df55b7b0bf1",
  "execution_type": "incremental",
  "execution_duration_seconds": 1708,
  "rows_affected": 313,
  "total_data_scanned_mb": 122942,
  "total_spent_usd": 0.59
}

... populated from dbt run_results.json file and Dagster internal variables.
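As a rough sketch of how a field like `total_spent_usd` could be derived from `total_data_scanned_mb`: the code below assumes a price of 5 USD per TiB scanned and binary units (1 TiB = 1024 * 1024 MiB). Both are assumptions rather than something stated in this thread, although they do reproduce the 0.59 shown in the example above. The function name `athena_cost_usd` is hypothetical, and any per-query minimum charge is not modeled.

```python
# Assumption: 5 USD per TiB scanned; check the current Athena pricing for your region.
USD_PER_TIB = 5.0
MIB_PER_TIB = 1024 * 1024


def athena_cost_usd(data_scanned_mb: float, usd_per_tib: float = USD_PER_TIB) -> float:
    """Estimate the query cost in USD from the amount of data scanned in MiB."""
    return round(data_scanned_mb / MIB_PER_TIB * usd_per_tib, 2)


# Value taken from the metadata example: 122942 MB scanned.
cost = athena_cost_usd(122942)
```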

However, for incremental models, for example, this approach is problematic in the current version: it returns only the statistics for the last part of the population, which could be just a small portion of the real 'price'.

Does it make sense?


nicor88 avatar nicor88 commented on June 12, 2024

Thanks, that makes sense. It would be neat to have what you requested, but I'm not sure how much effort the change requires. As we are community-based, we rely a lot on OSS contributions, so feel free to take a stab at it, and we can guide/review what you propose.

