aazuspan / eerepr
Interactive Code Editor-style reprs for Earth Engine objects in a Jupyter notebook
License: MIT License
Python 3.7 support was added for Colab (#15), but Colab now supports modern Python versions so we can drop that.
At the same time, we can simplify the packaging by dropping the redundant setup.py and setup.cfg for a pure pyproject.toml solution.
HTML reprs get big fast and can bloat notebook sizes. I tried minifying in #10 and found it wasn't worth the processing time, but there were some free optimizations that could have comparable effects.
Try shortening CSS class names and simplifying frequently repeated HTML elements to drop repr sizes.
Benchmark:
obj = ee.ImageCollection("COPERNICUS/S2_SR").limit(1000)
rep = obj._repr_html_()
mb = len(rep) / 1e6
mb
# 68.84692
Currently reprs are tested against data pulled from EE. The upside is that ensures tests will fail if there's a breaking change server-side, but it also makes for very slow tests and requires internet and authentication to run tests.
I think breaking changes from EE are unlikely enough that I should just test against local JSON data instead and add a script to generate test data from Earth Engine.
Add a demo notebook and links to run in Binder/Colab.
First off, this is a much needed module that will keep many of us from jumping back and forth between Python and JS for the ease of object inspection. Thank you for your efforts.
I am currently working out of a Jupyter Lab setup built off a modified Docker image. The base structure is complicated enough that I don't want to tinker with it much, but my issue is that the image has python 3.7.1. The eerepr setup.cfg file has a python_requires directive of >=3.8, which made pip installing impossible. I manually cloned the repo from github, modified the file to accept 3.7.1, and ran an install from within my computing environment. It seems to work fine under light casual testing. So my question: Is the >=3.8 directive in place because of known issues with earlier versions of python, or is it possible (though perhaps untested), that your module works fine with <3.8?
Generating HTML for large collections can take a significant amount of time. Not surprising considering that we're iteratively building a string from potentially hundreds of thousands of nested properties, and Python is famously slow at iterating. There are a few different routes I could take to try and reduce processing time, and some of them can be combined.
StringIO buffer or building a huge list of substrings and joining to see if there are any gains there.
html module in Rust and use PyO3 to provide Python bindings. This would add a build dependency on Maturin and generally complicate builds, but it should ensure the fastest possible solution.
Here's a rough performance benchmark from my laptop to serve as a baseline.
import ee
import eerepr
ee.Initialize()
obj = ee.ImageCollection("COPERNICUS/S2_SR").limit(1000)
info = obj.getInfo()
%timeit -r 5 -n 5 eerepr.html.convert_to_html(info)
# 1.35 s ± 45.4 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)
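To compare the string-building options mentioned above, a minimal sketch like the following could serve as a starting point. It does not use eerepr itself: the three builder functions and the `<li>` markup are hypothetical stand-ins for the real conversion code, and actual timings will depend on the machine and Python version.

```python
import io
import timeit


def build_concat(items):
    """Build the output by repeated string concatenation."""
    out = ""
    for item in items:
        out += f"<li>{item}</li>"
    return out


def build_join(items):
    """Collect substrings in a list and join once at the end."""
    return "".join([f"<li>{item}</li>" for item in items])


def build_stringio(items):
    """Write substrings to an in-memory text buffer."""
    buf = io.StringIO()
    for item in items:
        buf.write(f"<li>{item}</li>")
    return buf.getvalue()


if __name__ == "__main__":
    # Time each strategy on the same input; no expected numbers are claimed here.
    items = list(range(100_000))
    for fn in (build_concat, build_join, build_stringio):
        seconds = timeit.timeit(lambda: fn(items), number=5)
        print(f"{fn.__name__}: {seconds:.3f} s")
```

All three functions produce identical output, so any of them could be swapped in without changing the repr, and the choice would rest on the measured timings alone.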
ee.List.shuffle(seed=False) returns non-deterministic results from the same invocation, which means that equality checks on server-side objects incorrectly pass, causing false-positive cache hits. You can easily confirm this with:
l = ee.List.sequence(1, 10)
l.shuffle(seed=False) == l.shuffle(seed=False) # True
l.shuffle(seed=False).getInfo() == l.shuffle(seed=False).getInfo() # False
Repeated calls will generate different results but hit the same cache, displaying the wrong data. This will affect lists and anything generated from shuffled lists, e.g. a FeatureCollection generated from a list of coordinates.
To fix this, I'll need to add a function that parses each Earth Engine object's invocation string (just get the string repr for the object) and prevents caching if List.shuffle is found.
I've tested every other Earth Engine method that takes a random seed, and this is the only one that acts non-deterministically.
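A rough sketch of that idea, assuming the invocation is already available as a plain string. The names `is_nondeterministic`, `cached_repr`, and the dictionary cache are hypothetical and only illustrate skipping the cache when List.shuffle appears:

```python
# Method names treated as non-deterministic; per the notes above, only List.shuffle.
NONDETERMINISTIC_CALLS = ("List.shuffle",)

_cache = {}


def is_nondeterministic(invocation: str) -> bool:
    """Return True if the invocation string mentions a non-deterministic call."""
    return any(name in invocation for name in NONDETERMINISTIC_CALLS)


def cached_repr(invocation: str, build):
    """Return a cached repr, bypassing the cache for non-deterministic objects.

    `build` is a zero-argument callable that produces the repr (hypothetical).
    """
    if is_nondeterministic(invocation):
        # Always rebuild so repeated calls can show different shuffles.
        return build()
    if invocation not in _cache:
        _cache[invocation] = build()
    return _cache[invocation]
```

This simple substring check would also bypass the cache for shuffles that use a fixed seed, which costs some performance but never shows stale data.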
Thanks for the reminder @grovduck!
Hi Aaron, per our discussion, just wanting to document that testing limits might be useful at some point. I suppose the goal would be to find thresholds that optimize mostly for user experience and a bit for ee servers (limit unnecessarily large or unmanageable requests) e.g.,
Here are the ideas we'd listed:
No action needed yet, just wanting to document and have a place to add conversation.
I am trying to integrate eerepr into geemap. Just noticed that eerepr requires Python >=3.8. Any reason why Python 3.7 is not supported? Since Colab still uses Python 3.7, although I can use pip install eerepr --ignore-requires-python to install it, it is not ideal as a dependency for other packages.
Notebook file sizes can get huge if you print HTML reprs for a few big collections. I should experiment with minifying the HTML repr to see how that affects performance and file size. The minify-html project looks promising since it supports minifying HTML, CSS, and JS with no external dependencies.
A minor decrease in file size won't be worth adding a dependency, so I could make this a configurable setting with an optional dependency.
ipywidgets supports async widgets via threading. Rather than waiting for data, turning it into HTML, and directly displaying the HTML, I could return a loading HTML widget and use threading to update the widget contents once data is retrieved from the server and formatted. This would make the experience more similar to the code editor by not blocking the entire kernel.
The main downside would be adding a dependency on ipywidgets, but if most users are using this alongside geemap then that's not a big issue. Other considerations would be:
HTML widget like I currently do, or do I need to handle that differently?
HTML widgets do not run scripts.
HTML widget display correctly when rendered statically like they currently do? That's not a dealbreaker, but it would be nice.
ipywidgets compatibility be? I've had issues in the past with ipywidgets>7, especially in Jupyter Lab, so ideally this would work in version 7 or 8.
_repr_html_ should return an HTML string, not a widget, so I think I would need to use _ipython_display_ or _repr_mimebundle_ instead and return the corresponding method from the associated widget.
Here's a rough implementation idea:
import threading

import ee
import ipywidgets


def _ipython_display_(obj: ee.Element):
    """Display an Earth Engine object in an async HTML widget"""
    html = ipywidgets.HTML("<span>Loading...</span>")
    # Fetch and format in a background thread so the kernel isn't blocked
    threading.Thread(target=_build_repr, args=(obj, html)).start()
    return html._ipython_display_()


def _build_repr(obj: ee.Element, html: ipywidgets.HTML) -> None:
    """Format an HTML repr string and add it to an HTML widget"""
    info = obj.getInfo()
    rep = _format_repr(info)
    html.value = rep
Release as soon as it's relatively stable.
It's easy to accidentally print the repr for a huge collection which can crash a notebook. I should add some kind of configurable setting that prevents displaying reprs above a certain size. Instead, it should fall back to the string repr and give a user warning about the repr size with instructions to adjust the setting.
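One way this guard might look, as a sketch only. The function name `limit_repr`, the `MAX_REPR_MBS` default, and the warning text are hypothetical; size is approximated from the character count, as in the earlier benchmark:

```python
import warnings

MAX_REPR_MBS = 100  # hypothetical default limit, in megabytes


def limit_repr(html_repr: str, fallback: str, max_mbs: float = MAX_REPR_MBS) -> str:
    """Return the HTML repr unless it exceeds max_mbs, else the fallback string."""
    # Approximate size in MB from the number of characters.
    mbs = len(html_repr) / 1e6
    if mbs > max_mbs:
        warnings.warn(
            f"HTML repr size ({mbs:.2f} MB) exceeds the maximum ({max_mbs:.2f} MB); "
            "falling back to the string repr. Raise the limit to display it anyway.",
            stacklevel=2,
        )
        return fallback
    return html_repr
```

Passing the limit as a parameter keeps the check easy to test, while a module-level setting could supply the default for normal use.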
The property order displayed by eerepr doesn't match the Code Editor.
For example, the Code Editor repr for ee.ImageCollection("COPERNICUS/S2_SR").first() is in the order [type, id, version, bands, properties] whereas eerepr displays it as [type, bands, id, version, properties]. This isn't a huge deal, but it seems like Code Editor usually puts scalars before lists/objects which is easier to read. Some objects in the Code Editor are sorted alpha (like an image's properties), but some definitely aren't. It may just be that the order gets mangled when it's passed from the server to Python.
I should try to match the Code Editor order if it's a simple task, and otherwise I should just implement a logical sorting method--probably alpha with scalars before lists and dicts.
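The fallback ordering (alphabetical, with scalars before lists and dicts) could be sketched as below. The function name `sort_scalars_first` is hypothetical, and this ordering does not reproduce the Code Editor's exact order; for the image example it yields id before type.

```python
def sort_scalars_first(obj: dict) -> dict:
    """Sort keys alphabetically, placing scalar values before lists and dicts."""

    def sort_key(item):
        key, value = item
        # False sorts before True, so scalars come first; ties break on the key.
        is_container = isinstance(value, (list, dict))
        return (is_container, key)

    return dict(sorted(obj.items(), key=sort_key))


image = {"type": "Image", "bands": [], "id": "x", "version": 1, "properties": {}}
print(list(sort_scalars_first(image)))
# ['id', 'type', 'version', 'bands', 'properties']
```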
Add black and ruff pre-commit hooks. In the process, we can simplify the dev dependencies to just pre-commit, as everything will run through there.
Collapsing behavior is based on pseudo-random UUIDs generated for each <ul> in an object's repr. Because the repr is cached, printing the same object twice will return exactly the same repr with the same UUIDs. When you try to collapse the second repr, the first repr will (probably) get collapsed instead.
For some reason, collapsing works correctly in Jupyter classic, Colab, and VS Code, but is broken in Jupyter Lab (including Binder) or when a notebook is rendered statically (nbviewer). I'm guessing this is an implementation detail since the UUID issue isn't platform-dependent.
The quick solution is to cache the Earth Engine data by wrapping getInfo instead of caching the repr, but regenerating the repr may be a non-negligible performance hit for huge collections. If performance is too slow I'll need to think about another solution, e.g. caching the repr but re-generating each pair of UUIDs.
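A minimal sketch of the quick solution, assuming the data fetch can be cached by a hashable key. Here `fetch_info` is a hypothetical stand-in for the wrapped getInfo call and `render` for the HTML builder; neither is eerepr's actual code.

```python
import uuid
from functools import lru_cache

FETCH_COUNT = {"n": 0}  # tracks how often the slow fetch actually runs


@lru_cache(maxsize=None)
def fetch_info(key: str) -> tuple:
    """Hypothetical stand-in for a slow getInfo call; cached per key."""
    FETCH_COUNT["n"] += 1
    return ("type: Image", "bands: 3")


def render(key: str) -> str:
    """Build the HTML from cached data, with a fresh ID on every call."""
    list_id = uuid.uuid4().hex
    items = "".join(f"<li>{value}</li>" for value in fetch_info(key))
    return f"<ul id='{list_id}'>{items}</ul>"
```

Calling `render` twice with the same key fetches the data once but returns two reprs with different IDs, so collapsing one should no longer affect the other.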
Whenever a new Earth Engine object is retrieved by tests.test_html.load_info, it is saved in the tests/data/data.json cache. If an old test is removed or an unused object is accidentally committed, the cache could get out of sync with the tests.
It would be easy enough to fix this just by resetting the cache, but the ideal solution for long term maintenance would be to check that each object is actually used by the tests and either warn the user or delete any objects that aren't used. This may be tricky, especially in cases where a user runs a subset of tests, but it's worth poking around to see if there's a simple solution for this.
Map.draw_features[0]
AttributeError Traceback (most recent call last)
File D:\Code_base\anaconda\envs\GEE\lib\site-packages\IPython\core\formatters.py:342, in BaseFormatter.__call__(self, obj)
340 method = get_real_method(obj, self.print_method)
341 if method is not None:
--> 342 return method()
343 return None
344 else:
File D:\Code_base\anaconda\envs\GEE\lib\site-packages\eerepr\repr.py:82, in _ee_repr(obj)
77 if _is_nondeterministic(obj):
78 # We don't want to cache nondeterministic objects, so we'll add add a unique attribute
79 # that causes ee.ComputedObject.__eq__ to return False, preventing a cache hit.
80 setattr(obj, "_eerepr_id", uuid.uuid4())
---> 82 rep = _repr_html_(obj)
83 mbs = len(rep) / 1e6
84 if mbs > options.max_repr_mbs:
File D:\Code_base\anaconda\envs\GEE\lib\site-packages\eerepr\repr.py:62, in _repr_html_(obj)
60 css = _load_css()
61 js = _load_js()
---> 62 body = convert_to_html(info)
64 return (
65 "<div>"
66 f"<style>{css}</style>"
(...)
71 "</div>"
72 )
File D:\Code_base\anaconda\envs\GEE\lib\site-packages\eerepr\html.py:32, in convert_to_html(obj, key)
30 return list_to_html(obj, key)
31 elif isinstance(obj, dict):
---> 32 return dict_to_html(obj, key)
34 key_html = f"<span class='ee-k'>{key}:</span>" if key is not None else ""
35 return (
36 "<li>"
37 f"{key_html}"
38 f"<span class='ee-v'>{obj}</span>"
39 "</li>"
40 )
File D:\Code_base\anaconda\envs\GEE\lib\site-packages\eerepr\html.py:58, in dict_to_html(obj, key)
56 """Convert a Python dictionary to an HTML <li> element."""
57 obj = _sort_dict(obj)
---> 58 label = _build_label(obj)
60 header = f"{key}: " if key is not None else ""
61 header += label
File D:\Code_base\anaconda\envs\GEE\lib\site-packages\eerepr\html.py:249, in _build_label(obj)
246 if obj_type not in labelers:
247 obj_type = "_Typed"
--> 249 return labelers[obj_type](obj)
File D:\Code_base\anaconda\envs\GEE\lib\site-packages\eerepr\html.py:113, in _build_feature_label(obj)
111 def _build_feature_label(obj: dict) -> str:
112 n = len(obj.get("properties", []))
--> 113 geom_type = obj.get("geometry", {}).get("type")
114 type_label = f"{geom_type}, " if geom_type is not None else ""
115 noun = "property" if n == 1 else "properties"
AttributeError: 'NoneType' object has no attribute 'get'
<ee.feature.Feature at 0x23a08257cd0>
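The traceback suggests the feature has a "geometry" key whose value is None, so the `{}` default passed to `.get` is never used. A possible guard is sketched below; the function mirrors the lines shown in the traceback, but the returned label format is an assumption, since the original return statement isn't shown.

```python
def build_feature_label(obj: dict) -> str:
    """Build a label for a feature dict, tolerating a null geometry."""
    # `or {}` covers both a missing key and an explicit None value.
    n = len(obj.get("properties") or {})
    geometry = obj.get("geometry") or {}
    geom_type = geometry.get("type")
    type_label = f"{geom_type}, " if geom_type is not None else ""
    noun = "property" if n == 1 else "properties"
    # Assumed label format; the original return statement is not in the traceback.
    return f"Feature ({type_label}{n} {noun})"
```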