Comments (14)
@Avinash-1394 do you think this could be the issue? https://docs.aws.amazon.com/athena/latest/ug/notebooks-spark-troubleshooting-tables.html#notebooks-spark-troubleshooting-tables-illegal-argument-exception
My guess is that the database was created initially by an Athena SQL model, which didn't set the location field at the database level. Then the user tries to write to the same database with a Python model, and it throws the error.
from dbt-athena.
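To verify that guess, the Glue database's `LocationUri` can be inspected with boto3. A minimal sketch, not part of the adapter — the database name `analytics` and the helper names are hypothetical:

```python
def has_location(get_database_response: dict) -> bool:
    """Return True if a Glue get_database response carries a LocationUri."""
    return bool(get_database_response.get("Database", {}).get("LocationUri"))


def check_database_location(name: str) -> bool:
    # Lazy import so has_location stays usable without AWS credentials.
    import boto3

    glue = boto3.client("glue")
    return has_location(glue.get_database(Name=name))
```

If `check_database_location("analytics")` returns `False`, the database has no location set and Python models writing to it would hit the IllegalArgumentException described in the linked troubleshooting page.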
@cuff-links Let me get this right: without a location set on the database, this won't work? Also, I really recommend handling database creation outside dbt; this applies to whatever system you use. I'm not sure, then, that we need to fix the issue here.
@Avinash-1394 maybe you have some clue here, since you implemented the Python models feature?
I think the get_spark_df function needs to be slightly modified for sources. By default dbt uses the same logic for both source and ref, but somehow that does not seem to hold here.
I will try to recreate this issue.
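A rough way to picture the symmetry being discussed: both `dbt.ref()` and `dbt.source()` should ultimately resolve to the same kind of Spark table identifier. The helper below is purely illustrative — it is not the adapter's actual `get_spark_df`, and the schema/table names are placeholders:

```python
def spark_table_identifier(schema: str, table: str) -> str:
    # Illustrative only: build the backtick-quoted identifier a Python model
    # would pass to spark.table(); not the adapter's real implementation.
    return f"`{schema}`.`{table}`"


# A Python model would then read either kind of relation the same way, e.g.
# spark.table(spark_table_identifier("raw", "events"))  # hypothetical names
```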
@iamrobo That does make sense. I didn't test the source feature when I added Python models, so I will have to recreate this to fully understand what is happening. I assumed it would be very similar to ref, but it looks like there are some differences I didn't foresee.
> @Avinash-1394 do you think this could be the issue? https://docs.aws.amazon.com/athena/latest/ug/notebooks-spark-troubleshooting-tables.html#notebooks-spark-troubleshooting-tables-illegal-argument-exception
> my guess is that the database was created initially by a Athena SQL model, which didn't set the location field at the database level.
This is correct! I went in and modified the database directly, added a location, and it worked.
@cuff-links Glad you found a resolution so quickly. Do you mind providing the steps you took to fix it, so that it is documented in the GH issue itself? I will let @nicor88 decide whether the issue should be closed or kept open.
@Avinash-1394 if no action is needed in terms of code changes, we can close it.
To me it seems that the issue has been fixed and there are no changes to be made in the adapter. If this is correct, we can close it.
> @cuff-links Glad you found a resolution so quickly. Do you mind providing some steps you took to fix so that it is documented in the GH issue itself? I will let @nicor88 decide if the issue should be closed or kept open.
@nicor88 I would say it's more of a workaround; the issue still persists. I was following the dialog here and saw the comments about the location field on the database. It was empty, so I filled it in in AWS, reran the model, and it worked. But I think there should still be a fix, since this would take manual intervention if the same situation happened again. I think the two kinds of models should be usable interchangeably.
@nicor88, would it make sense to add the location to the database creation in the SQL materialization macros, to add the location if it doesn't already exist in the Python materialization macros, or just to throw a better exception?
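For reference, the first option would amount to emitting a `LOCATION` clause in the schema-creation DDL. A hypothetical example — the bucket and schema names are placeholders, not anything from the adapter:

```sql
-- Hypothetical DDL: give the schema an explicit S3 location at creation time,
-- so Spark/Python models can later write tables under it.
CREATE DATABASE IF NOT EXISTS my_schema
LOCATION 's3://my-bucket/my_schema/'
```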
I believe we should throw a better error, and then add a section about Python models to the documentation somewhere (in the README), specifying that the location must be set.
I'm unsure now whether we should add the location in the SQL materialization macro. If the db is created from scratch it could make sense, but if the db already exists we shouldn't touch it.
@cuff-links in your case the source db already existed, right?
> I believe that we should throw a better error, and then we need to add to the documentation somewhere (in the readme) a section about python models, where we need to specify that the location must be set.
> I'm unsure now that we should add the location in the SQL macro materialization. If the db is created from scratch could make sense, but if the db already exists we shouldn't touch it.
> @cuff-links in your case the source db was already existing right?
Yes, that is correct. I agree that a better error and some documentation would be good for this.
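The "better error" idea could be as simple as a guard that fails fast with an actionable message instead of surfacing Athena's generic IllegalArgumentException. A sketch under those assumptions — the function and message are hypothetical, not the adapter's code:

```python
from typing import Optional


def ensure_database_location(location_uri: Optional[str], database: str) -> None:
    # Hypothetical guard: raise a clear, actionable error when the Glue
    # database has no LocationUri, before handing anything to Spark.
    if not location_uri:
        raise ValueError(
            f"Glue database '{database}' has no LocationUri set. "
            "Python models need a database location; set one in the AWS "
            "console or recreate the schema with an explicit LOCATION."
        )


ensure_database_location("s3://my-bucket/analytics/", "analytics")  # passes silently
```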
Should this still be open?
@cuff-links yes, none of the actions above were taken yet.
Feel free to pick this up, at least the documentation part.
Related Issues (20)
- [Bug] `truncate()` partition transformation does not work when it includes more than 100 partitions
- [Bug] Hitting `ThrottlingException` on `GetWorkGroup` with threads turned up
- [Bug] Iceberg table materialization shouldn't use s3_data_naming=table
- [Bug] Adapter error when FIPS mode is enabled
- [Bug] Resolution failure for `create_table_as` macro when upgrading to 1.7.2
- Upgrade to support dbt-core v1.8.0
- [Feature] Control glue database/schema for tmp tables generated by incremental models
- [Bug] force_batch deletes data from model_tmp_not_partitioned before copying to the final table
- [Feature] Rename unique_key to unique_columns or merge_on_columns
- [Feature] Support configurable management of Table Optimisers for Iceberg tables
- [Feature] Custom strategy for incremental models when table type is iceberg
- [Bug] dbt source freshness expected a timestamp but received a string
- [Feature] Athena dbt-external-tables impl as independent package
- [Bug] Clone materialization raises an error when cloning Python models
- TABLE_NOT_FOUND error during unit testing in dbt-athena 1.8 due to Jinja macro dependency
- Hive vs Iceberg timestamps in unit tests
- [Bug] TABLE_NOT_FOUND {{tmp_relation}} when there are zero batches to process in incremental model
- [Feature] Allow to define a different schema for tmp tables created during table materialization
- [Lake Formation] Allow lf_tags_config.tags to set multiple values