Comments (10)
Any update on this?
from databricks-sql-python.
- I think it does depend on the row count. I ran the problematic query with limits of 100 / 1,000 / 10,000 rows and it worked;
- `cursor.fetchall()` itself is empty;
- `cursor.description` contains info about the columns;
- passing the flag `use_cloud_fetch=False` seems to work. From what I know, it should work better with CloudFetch on, given the amount of rows, correct?
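For reference, disabling CloudFetch is just a keyword argument to `databricks.sql.connect`. A minimal sketch, assuming an ordinary token-based connection; the `connect_kwargs` helper and all credential values are hypothetical placeholders, while `use_cloud_fetch` is the real parameter discussed in this thread:

```python
# Hypothetical helper: collect the keyword arguments for databricks.sql.connect,
# with CloudFetch disabled as the workaround described above.
def connect_kwargs(hostname, http_path, token, cloud_fetch=False):
    return {
        "server_hostname": hostname,   # e.g. your workspace hostname
        "http_path": http_path,        # SQL warehouse HTTP path
        "access_token": token,         # personal access token
        "use_cloud_fetch": cloud_fetch,  # False bypasses the CloudFetch code path
    }

# Usage (requires a real workspace; placeholders only):
# from databricks import sql
# with sql.connect(**connect_kwargs("adb-<id>.azuredatabricks.net",
#                                   "/sql/1.0/warehouses/<id>", "<token>")) as conn:
#     with conn.cursor() as cur:
#         cur.execute("SELECT * FROM big_table")
#         rows = cur.fetchall()
```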
Thank you @pimonteiro! Yes, CloudFetch should improve handling of very large results. I asked you to disable it to narrow down the scope of the issue. If you're able to get your data with `use_cloud_fetch=False` (and considering that everything is fine with smaller results), then there's probably a bug in the CloudFetch-related code. I need to poke around, and when I have more questions I'll get back to you.
P.S. I see you provided the library version and warehouse details, which is very nice, thank you! Can you please also tell us whether you're using an AWS or Azure workspace. Thanks!
Sounds good. In the meantime we'll try to reduce the query result size and optimize it so we can continue development. I assume there's nothing wrong with disabling `use_cloud_fetch` for now?
And lastly, I'm using an Azure workspace. That should have been in the opening line of the ticket, I do apologize :)
Yes, you can just disable CloudFetch and go on. You'll still be able to get large results, just maybe less efficiently than with CloudFetch enabled, that's it.
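As a sketch of that workaround: with CloudFetch off you can still pull a large result in bounded memory by paging with the standard DB-API `fetchmany()` instead of `fetchall()`. The `iter_rows` helper below is hypothetical, not part of the library; it works with any DB-API 2.0 cursor:

```python
# Hypothetical helper: stream rows from a DB-API cursor in fixed-size batches,
# so a large result never has to be materialized all at once.
def iter_rows(cursor, batch_size=10_000):
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # empty batch signals the result set is exhausted
            break
        yield from batch
```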
@andrefurlan-db thoughts?
- If you run other queries, do you see the same behavior? Do you think it may depend on the row count?
- If you run the same query but limit the row count explicitly (say, to 10 rows or so), does the behavior change?
- Did you check whether the result of `cursor.fetchall()` itself is empty, or did you only check the pandas DataFrame?
- If `cursor.fetchall()` actually returns data but the DataFrame is empty, have you checked what's in `cursor.description`?
- If you pass `use_cloud_fetch=False` to `adb_sql.connect(...)`, does it change anything?
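The checks above can be folded into one snippet. A hedged sketch (the `diagnose` helper is hypothetical and works with any DB-API 2.0 cursor): if it reports columns but no rows, as in this issue, the data is being lost inside the driver rather than in the pandas conversion step.

```python
# Hypothetical diagnostic: inspect the raw cursor result before any pandas
# conversion, so an empty DataFrame can be attributed to the right layer.
def diagnose(cur):
    rows = cur.fetchall()
    # cursor.description is a sequence of 7-item column descriptors (PEP 249);
    # the first item of each descriptor is the column name.
    columns = [d[0] for d in cur.description] if cur.description else []
    return {
        "row_count": len(rows),
        "columns": columns,
        "driver_returned_data": bool(rows),
    }
```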
Hi, we're facing the same issue. Any update on this?
@pimonteiro @akshay-s-ciq sorry, not many updates for now. I'm still trying to figure this out. However, we recently got a very similar bug report, but for the Go driver. We're still not sure what's going on.
Also, considering all of the above, may I ask you to run the same query using the Node.js connector? If either of you volunteers to help, I can prepare a test project for you.