Comments (28)
> the fact that we haven't been able to reproduce it locally (by running just this test) may be indicative of test pollution, which could be checked by running only this test in CI too.

I just confirmed this over in #4786:
- running just `pytest yt/visualization/volume_rendering/tests/test_scene.py::test_annotations` passed in CI
- expanding to run all the visualization tests, `pytest yt/visualization/`, passed in CI
---
I gotta run now, thank you for your help so far!!
---
Ok, we have an answer! And it's not pyarrow related after all.
It turns out that when you import firefly, it modifies the default matplotlib `figure.dpi` value:

```python
import matplotlib as mpl

rcparams0 = mpl.rcParams.copy()
import firefly.data_reader

rcparams1 = mpl.rcParams.copy()
print(rcparams0['figure.dpi'])
print(rcparams1['figure.dpi'])
```

prints

```
100
120
```

In yt, `Scene._show_mpl()` has a `dpi=100` argument that is used to set the dimensions of the figure, but it is not explicitly passed to the matplotlib `Figure` on instantiation. So after firefly gets imported, the default dpi changes and the yt figure changes size when saved. The fix on the yt side is to just pass the `dpi` argument along to `Figure` as well (which maybe we should have been doing all along). I'm not sure why this was only exposed now...
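The fix described above can be sketched roughly like this (a simplified stand-in for `Scene._show_mpl`, not yt's actual code): pass `dpi` to `Figure` explicitly, so a polluted `rcParams` default cannot change the rendered size.

```python
import matplotlib
from matplotlib.figure import Figure

def make_fixed_size_figure(shape=(512, 512), dpi=100):
    # Passing dpi explicitly to Figure overrides rcParams["figure.dpi"],
    # so an import that mutates the global default can't resize our output.
    return Figure(figsize=(shape[0] / dpi, shape[1] / dpi), dpi=dpi)

matplotlib.rcParams["figure.dpi"] = 120  # simulate firefly's import side effect
fig = make_fixed_size_figure()
width_px = fig.get_size_inches()[0] * fig.dpi
print(width_px)  # -> 512.0, regardless of the polluted default
```

Without the explicit `dpi=dpi`, the figure would pick up the mutated default of 120 and render at 614 pixels, which is exactly the size mismatch seen in the failing assertion.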
---
Notes:
- the test still passes in minimal env tests
- the fact that we haven't been able to reproduce it locally (by running just this test) may be indicative of test pollution, which could be checked by running only this test in CI too.
---
was trying to re-run the failed test with `pytest --lf` as well to see if it'd fail on the second go, but I couldn't get it working in the github action...
might also be worth doing a run that randomizes test order?
---
> might also be worth doing a run that randomizes test order?

Ideally yes, though I suspect that would be very unstable if we did it now: pollution in our tests is possibly high, and it's likely that some of them are only passing because of other tests' side effects (I've put a lid on this for a while...).
---
ok, well I've narrowed down the subset of tests you need to run to get the failure: just the tests in `yt/data_objects/tests` plus the failing test:

```
pytest --color=yes yt/data_objects/tests yt/visualization/volume_rendering/tests/test_scene.py::test_annotations
```

no indication of why yet. Also, when re-running the failed test by itself, it does pass on the second run.
---
very close to an answer! I used the https://github.com/asottile/detect-test-pollution pytest plugin to find that `yt/data_objects/tests/test_firefly.py::test_firefly_JSON_string` is the source of the issue. Running just these two tests reproduces the failure:

```
pytest --color=yes \
    yt/data_objects/tests/test_firefly.py::test_firefly_JSON_string \
    yt/visualization/volume_rendering/tests/test_scene.py
```

when reversing the order, the tests pass no problem. Looking at `test_firefly_JSON_string`, there's nothing obvious... but those firefly reader objects seem to do a lot with writing to disk, symlinks, and default data directories, so maybe there's something there?
---
Great job! I may be able to give a second read to those tests as early as tomorrow. Meanwhile: friendly ping @agurvich
---
OK, I think I'm done for the day, but one extra note: I still get the failure when I change `test_firefly_JSON_string` to not actually call `writeToDisk`, so it's something in the initial creation of the firefly object in `ad.create_firefly_object` (could be a yt thing, a firefly thing, a shared library between the two, or something else very strange? really still no good idea...)
---
> but those firefly reader objects seem to do a lot with writing to disk and symlinks and default data directories so maybe there's something there?

don't talk about my trash like that! hahaha. I don't have a working `yt` development environment set up, but it sounds like I might need to sniff something out?
The situation with the symlinks is that the firefly data needs to be written out to disk in order to be read in by the browser (unless it's passed through a websocket using flask). However, because of browser security features, websites (or webapps) can't read files just anywhere on disk; they have to be in subdirectories of the app itself. So when you're hosting this app locally on your computer, the data has to be visible from a path that's a subdirectory of the app files (which, in the case of being installed via pip, is hidden in your `site-packages` directory). My solution at the time was to let users specify where they wanted to write their data and then symlink from there to Firefly's data directory. So what `yt` should be doing here is creating a Firefly object, passing some array data to it, using the Firefly object's built-in `writeToDisk` to write the data to a named temporary file, and then making a symlink from there to Firefly's data directory. I don't totally remember how I set up the unit test, and I'm not sure what is actually failing here (is there a failing test or is there an incomprehensible error?). Where can I see the CI output (without building and running locally myself)? Or will I have to build `yt`?
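To illustrate the scheme described above (hypothetical file names and paths, not Firefly's actual layout or API): the data is written wherever the user likes, then symlinked into the app's own data directory so the browser is allowed to fetch it.

```python
import json
import os
import tempfile

# user-chosen output location: can be anywhere on disk
user_dir = tempfile.mkdtemp(prefix="user_data_")
data_file = os.path.join(user_dir, "particles.json")
with open(data_file, "w") as f:
    json.dump({"x": [0.0, 1.0, 2.0]}, f)

# the app's data directory (in reality a subdirectory of the installed
# package, e.g. somewhere under site-packages)
app_data_dir = tempfile.mkdtemp(prefix="app_data_")
link = os.path.join(app_data_dir, "particles.json")
os.symlink(data_file, link)

# the webapp reads the data through the symlink
with open(link) as f:
    loaded = json.load(f)
```

(POSIX only; `os.symlink` needs extra privileges on Windows.)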
---
> don't talk about my trash like that! hahaha.

haha, many apologies :D I don't actually think that the symlinks are the culprit, I was just grasping at straws because I'm rather baffled by this failure.
Here's a quick summary of the issue:
- a test unrelated to firefly started failing recently (`yt/visualization/volume_rendering/tests/test_scene.py::test_annotations`). Neither @neutrinoceros nor I have been able to reproduce this issue locally; it only fails in our CI pipeline on github with python 3.12.
- we figured out it's a test pollution issue: re-running the failed test by itself passes just fine
- I bisected down to the point where I can re-create the failure with just two tests: a yt test that creates a firefly object (`yt/data_objects/tests/test_firefly.py::test_firefly_JSON_string`) and our `test_annotations` test. It only fails when the firefly-related test is run before the `test_annotations` test; reversing the order passes. So some global state somewhere is somehow being changed? maybe??

here's an example of the failing test that reads back in an image that was written to disk.
---
```
E   DeprecationWarning:
E   Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
E   (to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
E   but was not found to be installed on your system.
E   If this would cause problems for you,
E   please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466

../../../miniconda3/envs/oss/lib/python3.12/site-packages/pandas/__init__.py:221: DeprecationWarning
==================================== short test summary info ====================================
FAILED yt/data_objects/tests/test_firefly.py::test_firefly_JSON_string - DeprecationWarning:
```
So here is what I get when I build locally, which seems consistent with ageller/Firefly#177.
The solution there was to remove the dependency on `pandas`. This was the objectively correct thing to do; using pandas to write a JSON string was silly, but it happened because we were already using pandas elsewhere and knew it would work. It looks like what's going on is that `pandas` is being imported in a different file (in this case, `firefly/data_reader/reader.py`, where the "Firefly Object" aka `Reader` class instance is defined). There, we use it to read a csv. We could swap to `numpy` and use `genfromtxt` or something, but I like how `pandas` is able to load tabular data in a pretty generic/bulletproof way (at least, it doesn't die, it just returns nonsense, which depending on your philosophy is better/worse).
My hypothesis is that, depending on the order/whether the test has been run, `Pyarrow` is being loaded into the environment after this test is run, and perhaps it's not happening locally for either of you because you have `Pyarrow` installed(?). I'll be honest, I don't totally understand the point of this deprecation warning. The current version of `pandas` doesn't require `Pyarrow`. If I ever want to upgrade to the next version, whatever software distribution system I'm using (conda or pip) will install `Pyarrow` for me.
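For what it's worth, the `numpy` swap mentioned above could look something like this for a simple numeric table (toy data, not the firefly reader's actual format):

```python
import io

import numpy as np

csv = io.StringIO("x,y,z\n1.0,2.0,3.0\n4.0,5.0,6.0\n")

# genfromtxt reads the header into field names and tolerates missing
# values, and it avoids importing pandas (and the pyarrow warning) entirely
table = np.genfromtxt(csv, delimiter=",", names=True)
print(table["x"])  # -> [1. 4.]
```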
---
@chrishavlin I was so close to finishing my message before you posted, haha! What do you think of my hypothesis? What is the best way to proceed?
---
Ah... looking at the error you posted above, I would be very surprised if the `Pyarrow` thing were the actual culprit:

```
E       assert (614, 614, 4) == (512, 512, 4)
E         At index 0 diff: 614 != 512
E         Full diff:
E         - (512, 512, 4)
E         + (614, 614, 4)

/home/runner/work/yt/yt/yt/visualization/volume_rendering/tests/test_scene.py:104: AssertionError
=========================== short test summary info ============================
FAILED yt/visualization/volume_rendering/tests/test_scene.py::test_annotations - assert (614, 614, 4) == (512, 512, 4)
  At index 0 diff: 614 != 512
  Full diff:
  - (512, 512, 4)
  + (614, 614, 4)
========================= 1 failed, 2 passed in 6.63s ==========================
Error: Process completed with exit code 1.
```

How could something like this fail because of whether or not something is imported into the environment...? Maybe the `Pyarrow` thing is just a red herring and a totally separate failure in my local environment.
---
and it doesn't look like Pyarrow is being installed in the test run either (but I liked the idea :) )
---
Not sure if it's progress, but I'm getting the same error locally now that I've installed `Pyarrow`...
however, running a second time does not make the test pass
---
ooOOoo, I'd say it is progress. maybe pyarrow is being installed and I didn't search the logs properly
---
and I just reproduced it as well after installing PyArrow!
---
3rd time was not the charm, still getting the above assertion error in `test_annotations` when I do

```
pytest --color=yes \
    yt/data_objects/tests/test_firefly.py::test_firefly_JSON_string \
    yt/visualization/volume_rendering/tests/test_scene.py
```

seemingly no matter how many times I do it (which seems unlike the situation in the CI environment?).
---
Hey!!! I'll call that progress!!!!
---
Well, it passes if you change the order so that `test_scene` runs first:

```
pytest --color=yes \
    yt/visualization/volume_rendering/tests/test_scene.py \
    yt/data_objects/tests/test_firefly.py::test_firefly_JSON_string
```
---
Ahhh I see, yes, can confirm, that passed for me.
---
NP. At this point I'm gonna leave it to you and @neutrinoceros now that you can get your hands dirty locally, but let me know if you want me in the trenches as well.
---
Thank you both for working on this. There seems to be some confusion around `pyarrow`, so let me try to clarify the situation:
- In yt's CI, we don't install pyarrow yet, but we promote warnings to errors
- `pandas` 2.2.0 started emitting a `DeprecationWarning` to warn that `pyarrow` was to become a dependency of `pandas` 3.0
- there are three ways to get around this warning:
  - tell pytest to ignore it (this is what I did in #4784)
  - install pyarrow (pandas assumes that you're fine with having to install pyarrow in the future if you already have it)
  - downgrade pandas
- this actually has nothing to do with ageller/Firefly#177: what happened there was that Firefly was using private pandas API which was moved in pandas 2.2. I fixed it by using public API instead. It so happens that said public API lives in the standard lib, but it wouldn't have been a problem to keep using pandas if it provided such public API.

So, I don't think pyarrow is necessary to reproduce locally; it just helps dodge pandas' deprecation warning, but I think downgrading pandas would have the same effect. To help reduce confusion I'd suggest merging #4784 😇
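For reference, the "tell pytest to ignore it" option is usually a `filterwarnings` entry in the pytest config; something along these lines (the message pattern here is illustrative, not necessarily what #4784 uses):

```ini
# pytest.ini (or the equivalent [tool.pytest.ini_options] in pyproject.toml)
[pytest]
filterwarnings =
    error
    ignore:(?s).*Pyarrow will become a required dependency of pandas.*:DeprecationWarning
```

The `error` line is what promotes warnings to failures in CI; the `ignore` line after it carves out an exception for this specific pandas warning.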
---
and @agurvich thanks for your help yesterday!
---
I think I found where this change happens:
https://github.com/agurvich/abg_python/blob/408f0daffb2f84130b4df096b72b77a5affdd733/src/abg_python/plot_utils.py#L26
abg_python is a dependency of Firefly, and it leaks Matplotlib configuration on import.
The reason we're only finding out now is that the culprit commit is from Nov 2022, but there was no release of `abg_python` between April 2022 and 3 days ago, when I came knocking on Alex's door (agurvich/abg_python#4).
so, at the risk of being called names: ping @agurvich 🙈
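One general way to avoid this class of leak upstream (a sketch of the pattern, not necessarily how abg_python will fix it) is to scope styling with `matplotlib.rc_context` instead of assigning to `rcParams` at module import time:

```python
import matplotlib as mpl

default_dpi = mpl.rcParams["figure.dpi"]

# rc settings applied inside the context manager revert when the block
# exits, so importing a plotting helper can't change defaults for everyone
with mpl.rc_context({"figure.dpi": 120}):
    assert mpl.rcParams["figure.dpi"] == 120
    # ... build the styled figures here ...

print(mpl.rcParams["figure.dpi"] == default_dpi)  # -> True
```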
---
moving the discussion upstream as there's nothing left to be done on our side: agurvich/abg_python#5
---