Comments (9)
Oh wow! Fantastic find. That would also explain some of the other instabilities we observed with Python 3.12. Yep, it will be easy for us to disable --coverage for Python 3.12!
from databricks-sql-python.
Looks good so far - I have not seen the side effects in main for Python 3.12 after several builds. Enabling databricks back now in apache/airflow#38207, and we will know for sure once we merge, because it had been failing almost always before.
Analysis
After eluding me for several months, I've finally tracked down the issue. The good news is that it's not an issue with databricks-sql-connector. The bad news is that the only workaround at present is to run your tests without the --cov flag.
tl;dr There is a bug in the coverage library when running on Python 3.12 (nedbat/coveragepy#1665). To work around this, remove the --cov flag from your pytest invocation.
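As an illustrative sketch (not from this thread), the workaround can be applied selectively: keep the coverage flags on older interpreters and drop them only where the slowdown bites. The file name here is the reproduction's `test_file.py`; the version gate is an assumption based on the bug being 3.12-specific.

```python
# Sketch: only pass --cov to pytest on interpreters below 3.12, where
# coverage's tracer does not trigger the pathological slowdown.
import sys

pytest_args = ["-m", "pytest", "test_file.py"]
if sys.version_info < (3, 12):
    # coverage is safe to enable here
    pytest_args.append("--cov")

print(pytest_args)
```

The same conditional can live in a CI script or a tox/nox configuration instead of Python code.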
I created a minimum reproducible example.
- From a fresh virtual environment run
pip install databricks-sql-connector pytest pytest-cov
- Create a file called test_file.py and give it these contents:
# test_file.py
import time


def test_downloader_import_time():
    start = time.time()
    from databricks.sql.cloudfetch import downloader
    end = time.time()
    difference = end - start
    assert difference < 1, f"It took {difference} seconds to import the downloader"
- Invoke pytest without coverage enabled and the test will pass:
$ python -m pytest test_file.py
================ 1 passed in 0.07s ===============
- Invoke pytest --cov and the test will fail after about 200 seconds.
$ python -m pytest test_file.py --cov
FAILED test_file.py::test_downloader_import_time - AssertionError: It took 207.19054412841797 seconds to import the downloader
Workarounds
From what I can tell, there is no way to avoid this completely until coverage is updated to support PEP 669, which is currently only experimentally supported. I can make the downloader package import more quickly by pushing the TSparkArrowResultLink import into an if typing.TYPE_CHECKING conditional. But the connector simply cannot run without importing ttypes.py eventually, so this only pushes the long import time further into the package initialization process.
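The typing.TYPE_CHECKING pattern mentioned above looks roughly like this. This is a minimal sketch using a stand-in module, not the connector's actual code; the real change would guard the TSparkArrowResultLink import the same way.

```python
# Sketch of deferring an expensive import behind typing.TYPE_CHECKING.
# `decimal.Decimal` stands in for the slow thrift-generated type.
from __future__ import annotations

import typing

if typing.TYPE_CHECKING:
    # Evaluated by static type checkers only, never by the interpreter,
    # so the slow import is skipped when this module is loaded at runtime.
    from decimal import Decimal


def handle(link: Decimal) -> str:
    # With `from __future__ import annotations`, annotations are stored as
    # strings and never evaluated, so no runtime import happens here.
    return type(link).__name__
```

Static checkers still see the precise type, while the interpreter defers the import until (and unless) it is actually needed.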
cc: @benc-db
PR here: apache/airflow#38194 - once I see it working I will likely be able to remove the databricks provider exclusion for 3.12 :) . We are already using a version of coverage that should have PEP 669 support, so I just enabled it for coverage tests and 🤞
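For reference, recent coverage.py releases (7.4+, to my understanding) let you opt into the experimental PEP 669 core via an environment variable. This is a hedged sketch; verify against your installed coverage version before relying on it.

```shell
# Assumption: coverage.py >= 7.4 on Python 3.12.
# COVERAGE_CORE=sysmon selects the experimental sys.monitoring (PEP 669) core.
export COVERAGE_CORE=sysmon
echo "using core: $COVERAGE_CORE"
# then run the suite as before, e.g.:
#   python -m pytest test_file.py --cov
```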
We are already using the version of coverage that should have PEP 669 support so I just enabled it for coverage tests and 🤞
Very curious to see how this goes for you. I tried using a newer version of coverage for my reproduction above but didn't see any noticeable difference (perhaps 5% faster).
I will let you know :)
Generally speaking, what we saw in 3.12 builds WITH coverage enabled was that the tests sometimes took far longer to complete (30%-50% slower) - but not always. What really puzzled me was that I saw it in our canary builds (main) and did not see it in regular PRs.
However, the coverage theory fits perfectly: our PRs usually run a subset of tests - those relevant to the change coming in the PR - so we do not run coverage there. We only use coverage in the canary (main) builds that run the full suite of tests.
It happened frequently enough (multiple times a day in canary builds) to see the result of that change rather quickly.
So far, so good.
Eventually I just disabled coverage for Python 3.12. The tests with coverage on Python 3.12 took a long time and some of them even timed out inside coverage's sysmon.