Comments (10)
The pin was added in:
To fix the issue described in:
...but that just avoids the problem while causing another: this library can't be used with the latest pandas
:/
from databricks-sql-python.
I'm opening this issue to track any progress towards compatibility with the latest pandas
version.
Bump! I would like to upgrade to the latest version but am stuck on 3.0.1 because of this pin 😔
Does 3.0.1 work with latest pandas? That would be an interesting data point.
> Does 3.0.1 work with latest pandas? That would be an interesting data point.
I've been using 3.0.1 in combination with pandas 2.2.2 with no issues:
❯ pip list | rg 'pandas|databricks'
databricks-connect 14.3.1
databricks-sdk 0.20.0
databricks-sql-connector 3.0.1
pandas 2.2.2
...but that's apparently because I hadn't queried an all-integer result set.
Running:
with engine.connect() as conn:
    res = conn.execute(sa.text("select 1")).scalar_one()
gives:
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
It seems like it doesn't like assigning a None into an integer array:
> /opt/python/envs/dev310/lib/python3.10/site-packages/pandas/core/internals/managers.py(1703)as_array()
1701 pass
1702 else:
-> 1703 arr[isna(arr)] = na_value
1704
1705 return arr.transpose()
ipdb> arr
array([[1]], dtype=int32)
ipdb> isna(arr)
array([[False]])
ipdb> na_value
ipdb> na_value is None
True
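The failure can be reproduced with NumPy alone (a minimal sketch, assuming nothing beyond numpy; `arr` and `mask` stand in for pandas' backing array and the `isna(arr)` result from the trace):

```python
import numpy as np

# Mirrors pandas' as_array(): `arr[isna(arr)] = na_value` on an int32 array.
arr = np.array([[1]], dtype=np.int32)
mask = np.array([[False]])  # nothing is actually missing

# NumPy converts the assigned value to the array's dtype before applying
# the mask, so None fails even though the mask selects zero elements:
try:
    arr[mask] = None
    raised = False
except TypeError:
    raised = True
print(raised)
```

This is why the error fires even for a query like `select 1` that returns no NULLs at all.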
If we go up the stack we can see we get type errors if we try to assign anything other than an integer:
> /opt/python/envs/dev310/lib/python3.10/site-packages/databricks/sql/client.py(1149)_convert_arrow_table()
1147 )
1148
-> 1149 res = df.to_numpy(na_value=None)
1150 return [ResultRow(*v) for v in res]
1151
ipdb> df.to_numpy(na_value=None)
*** TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'
ipdb> df.to_numpy(na_value=float('NaN'))
*** ValueError: cannot convert float NaN to integer
ipdb> df.to_numpy(na_value=-99)
array([[1]], dtype=int32)
Casting to object before assigning does seem to work:
ipdb> df.astype(object).to_numpy(na_value=None)
array([[1]], dtype=object)
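The same contrast shows up directly in pandas (a minimal sketch, assuming only pandas; the column name `c` is made up):

```python
import pandas as pd

# An int32 frame mirrors the all-integer result set from the issue.
df = pd.DataFrame({"c": [1]}, dtype="int32")

# df.to_numpy(na_value=None) raises TypeError on the int dtype, but
# casting to object first yields an array that can hold None:
out = df.astype(object).to_numpy(na_value=None)
print(out.dtype, out[0, 0])
```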
The problematic function:
databricks-sql-python/src/databricks/sql/client.py
Lines 1130 to 1166 in a6e9b11
I can work around the issue by disabling pandas:
with engine.connect() as conn:
    cursor = conn.connection.cursor()
    cursor.connection.disable_pandas = True
    res = cursor.execute("select 1").fetchall()
>>> res
[Row(1=1)]
...but obviously the casting to numpy needs to be fixed.
Casting to object before assigning a None value is probably the right fix.
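Sketched against the snippet quoted above, the fix could look like this (hypothetical helper name; the real `_convert_arrow_table` wraps each row in `ResultRow` rather than a plain tuple):

```python
import pandas as pd

def convert_frame(df: pd.DataFrame):
    # Cast every column to object first so to_numpy can use None as the
    # missing-value sentinel regardless of the original dtypes.
    res = df.astype(object).to_numpy(na_value=None)
    return [tuple(v) for v in res]

print(convert_frame(pd.DataFrame({"a": [1]}, dtype="int32")))
```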
I second this. I cannot use pd.read_sql_query() because of this pin.
Also, it would be good to drop the distutils dependency.