Comments (3)
This is expected behavior: Python models are integrated into the rest of your dbt project using SQL (for example, on an incremental model, the merge is performed in SQL), and that SQL is executed on the all-purpose (AP) cluster. We are investigating ways to make Python model behavior more 'Spark-like', but for now I would call this an enhancement request rather than a bug, as it is consistent with the structure imposed by dbt-core.
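For illustration, a minimal sketch of an incremental dbt Python model, assuming a hypothetical upstream model named `upstream_events` with hypothetical `id` and `updated_at` columns. The Python function only returns a DataFrame; the merge into the existing table is not written here, which is the part dbt carries out in SQL:

```python
def model(dbt, session):
    # Hypothetical config: "id" is an assumed unique key, used only for illustration.
    dbt.config(materialized="incremental", unique_key="id")

    # "upstream_events" is a hypothetical upstream model name.
    df = dbt.ref("upstream_events")

    if dbt.is_incremental:
        # On incremental runs, keep only rows newer than what the target table holds.
        max_ts = session.sql(
            f"select max(updated_at) from {dbt.this}"
        ).collect()[0][0]
        df = df.filter(df.updated_at > max_ts)

    # dbt takes the returned DataFrame and handles materialization itself.
    return df
```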
from dbt-databricks.
Thanks, Benc. That clears up my doubts.
@benc-db Would it be possible to use a simpler approach when running a Python model inside a job cluster, like the following:
- dbt creates a new notebook for the Python model
- the new notebook is executed within dbt, in its own process, using the Python command `dbutils.notebook.run("....")` (see "Run a Databricks notebook from another notebook")
I am not sure, but it looks to me as though the strict separation between dbt's execution (the dbt Python code) and the model's execution (putting the model into an isolated space) is excessive on Databricks job clusters, because the job will run on Spark on the master node anyway. But maybe I am not getting the full picture of this issue...
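A rough sketch of the second step of this proposal, assuming `dbutils` is the object Databricks provides inside a notebook or job. The helper name `run_model_notebook`, the default timeout, and any notebook path passed to it are hypothetical, and this is not how dbt-databricks currently submits Python models:

```python
def run_model_notebook(dbutils, notebook_path, timeout_seconds=3600, arguments=None):
    """Run a notebook on the current cluster and return its exit value.

    `dbutils` is passed in explicitly so the helper can be exercised outside
    Databricks with a stand-in object. The default timeout is an assumption.
    """
    # dbutils.notebook.run(path, timeout_seconds, arguments) returns the string
    # that the called notebook passes to dbutils.notebook.exit().
    return dbutils.notebook.run(notebook_path, timeout_seconds, arguments or {})
```

Inside a Databricks notebook this could be called as `run_model_notebook(dbutils, "....")`, with the notebook path left elided as in the comment above.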