open-eo / openeo-python-client
Python client API for OpenEO
Home Page: https://open-eo.github.io/openeo-python-client/
License: Apache License 2.0
At the openEO hackathon it came out that users were confused by the functions of the imagecollection.
They thought that the method calls return information about the metadata instead of applying a process. Maybe a process class could help separate the two more clearly.
Along the same lines, it also came up that it makes sense to return Process classes from "list_processes" instead of just a description.
Thoughts on creating a time series of a single point:
Because the providers use different (but also similar) authentication methods, it probably makes sense to add an authentication abstraction inside a session.
One possibility:
Hi all,
I downloaded version 0.4.0 of the openEO python client (the zip file), and running "openeo.version()" gave me 0.3.0.
It might be that the version returned by openeo.version() has not been updated...
In general, it does not stop one from working further, but it might cause confusion.
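One common way to avoid this kind of drift is to keep the version string in a single place and have both openeo.version() and setup.py read it. Below is a minimal sketch. It assumes a hypothetical openeo/_version.py that contains only a __version__ assignment. The helper parses that text with a regex, so setup.py does not have to import the package.

```python
import re

# Hypothetical single source of truth, e.g. the contents of openeo/_version.py
VERSION_FILE_CONTENT = '__version__ = "0.4.0"\n'


def read_version(source: str) -> str:
    """Extract the __version__ string from Python source text.

    Parsing the text (instead of importing the package) lets setup.py
    read the version without importing openeo and its dependencies.
    """
    match = re.search(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", source, re.MULTILINE)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def version() -> str:
    # openeo.version() would then report the same string that setup.py uses
    return read_version(VERSION_FILE_CONTENT)


print(version())  # 0.4.0
```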
The notebook that is linked as "Basic concepts and examples" gives the URL of the demo server as:
http://openeo.vgt.vito.be
But it just returns a 404.
Elsewhere, I found the link:
http://openeo.vgt.vito.be/openeo
This URL returns:
OpenEO GeoPyspark backend. /openeo/timeseries
which doesn't make much sense to me, but it's something. Please update the URL.
(Not sure if this should be an issue here or in the geopyspark-driver repo)
I just tried to install the Python client.
The repository is in c:/dev/openeo-python-client. I opened a command prompt (cmd), changed directory to c:/dev/openeo-python-client and ran:
pip install --user -e .
with this result:
Obtaining file:///C:/Dev/openeo-python-client
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Dev\openeo-python-client\setup.py", line 4, in <module>
from sphinx.setup_command import BuildDoc
ModuleNotFoundError: No module named 'sphinx'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Dev\openeo-python-client\
How to proceed?
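The traceback shows that setup.py imports Sphinx at module level, so installation fails whenever Sphinx is missing. Installing a Sphinx release that still ships sphinx.setup_command should get past the error for now. A more robust option is to make the documentation command optional. A minimal sketch of that setup.py excerpt, assuming the command is registered under the conventional name build_sphinx:

```python
# setup.py excerpt (sketch): register the Sphinx docs command only when it can be
# imported, so "pip install -e ." no longer requires Sphinx to be installed.
cmdclass = {}
try:
    # sphinx.setup_command only exists in older Sphinx releases
    from sphinx.setup_command import BuildDoc
except ImportError:  # also covers ModuleNotFoundError
    BuildDoc = None

if BuildDoc is not None:
    cmdclass["build_sphinx"] = BuildDoc

# ... then pass cmdclass=cmdclass to setup(...)
```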
During the hackathon, the current way of adding processes to an imagecollection (by calling functions directly on it) confused some users.
Another approach came up: a module with all processes as static functions, each returning the new combined image collection
(e.g. new_imagecollection = openeo.processes.ndvi(imagecollection.band1, imagecollection.band2)).
This would be more familiar to Python users in the remote sensing area, and it can be implemented alongside the current way of doing it.
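The following sketch shows how a static-function style could coexist with method chaining, with both producing the same immutable graph node. ProcessNode, load_collection and ndvi are hypothetical names, not the client's API. The argument names follow the 0.3-style NDVI graph shown elsewhere on this page.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class ProcessNode:
    """Hypothetical immutable node: a process id plus its arguments."""
    process_id: str
    arguments: Dict[str, Any] = field(default_factory=dict)

    # Method-chaining style, kept alongside the functional style
    def ndvi(self, red: str, nir: str) -> "ProcessNode":
        return ndvi(self, red=red, nir=nir)


def load_collection(name: str) -> ProcessNode:
    return ProcessNode("get_collection", {"name": name})


def ndvi(data: ProcessNode, red: str, nir: str) -> ProcessNode:
    """Functional style: returns a new node and never modifies `data`."""
    return ProcessNode("NDVI", {"imagery": data, "red": red, "nir": nir})


s2 = load_collection("openEO_S2_32632_10m_L2A")
a = ndvi(s2, red="B04", nir="B08")  # functional style
b = s2.ndvi(red="B04", nir="B08")   # method style
print(a == b)  # True
```

Because the method simply delegates to the module function, each process is defined only once.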
There should be more specific error information for the job.download() function.
For example, when a job is sent with a wrong date format, the download function only returns "{}".
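The sketch below turns an error response into a Python exception instead of returning "{}". It assumes the back end sends a JSON body with "code" and "message" fields. OpenEoApiError and check_response are hypothetical names, and the error body in the usage note is made up.

```python
import json


class OpenEoApiError(Exception):
    """Hypothetical exception carrying the back end's error details."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"[{status}] {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def check_response(status: int, body: str) -> str:
    """Return the body for 2xx responses, otherwise raise OpenEoApiError."""
    if 200 <= status < 300:
        return body
    try:
        info = json.loads(body) if body.strip() else {}
    except ValueError:
        info = {"message": body}  # plain-text error body
    if not isinstance(info, dict):
        info = {}
    code = str(info.get("code", "Unknown"))
    message = info.get("message") or "no error details sent by the back end"
    raise OpenEoApiError(status, code, message)
```

Usage: check_response(400, '{"code": "X", "message": "bad date"}') raises an OpenEoApiError whose text is "[400] X: bad date".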
Printing the ImageCollection just returns Python's default object string. It would be nice to return a string that neatly shows the product used and the processes applied.
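A minimal sketch of such a __repr__ follows. ImageCollectionSketch is a hypothetical stand-in that records the product id and the applied process ids as a list. The real class stores a process graph, so a real __repr__ would have to walk that graph instead.

```python
from typing import List, Optional


class ImageCollectionSketch:
    """Hypothetical stand-in for ImageCollection, tracking product and processes."""

    def __init__(self, product_id: str, processes: Optional[List[str]] = None):
        self.product_id = product_id
        self.processes = list(processes or [])

    def with_process(self, process_id: str) -> "ImageCollectionSketch":
        # Return a new object so the original stays unchanged
        return ImageCollectionSketch(self.product_id, self.processes + [process_id])

    def __repr__(self) -> str:
        chain = " -> ".join(self.processes) or "(no processes)"
        return f"<ImageCollection product={self.product_id!r} processes: {chain}>"


coll = ImageCollectionSketch("CGS_SENTINEL2_RADIOMETRY_V101")
print(coll.with_process("filter_daterange").with_process("max_time"))
# <ImageCollection product='CGS_SENTINEL2_RADIOMETRY_V101' processes: filter_daterange -> max_time>
```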
The client could parse timeseries data directly into a Pandas DataFrame, instead of requiring the user to do so manually.
This relates to Open-EO/openeo-api#46 - more metadata is needed to know that the response is in fact something that can be parsed into one.
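The sketch below covers the parsing step. It assumes a hypothetical response shape of {date: [band values]} for a single point, with the band names known from the collection metadata. It flattens the response into one row per date using only the standard library, and the pandas step is left to the caller.

```python
from typing import Dict, List, Optional


def timeseries_to_records(
    data: Dict[str, List[Optional[float]]], band_names: List[str]
) -> List[Dict[str, object]]:
    """Flatten {date: [band values]} into one row per date, sorted by date.

    Sorting the keys as strings is correct for ISO 8601 dates in one format.
    """
    records = []
    for date in sorted(data):
        values = data[date] or [None] * len(band_names)  # empty list means no data
        if len(values) != len(band_names):
            raise ValueError(f"{date}: expected {len(band_names)} values, got {len(values)}")
        row: Dict[str, object] = {"date": date}
        row.update(zip(band_names, values))
        records.append(row)
    return records
```

With pandas installed, pandas.DataFrame.from_records(rows, index="date") turns the rows into a DataFrame indexed by date.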
Currently, "ImageCollection" is the term used to describe process graph objects, whose output is not necessarily an image collection (case in point: timeseries). Perhaps another term would be more suitable.
Since back ends may support different sets of processes, which can be retrieved from the GET /processes endpoint, it would be a major improvement to generate the process functions dynamically once a back end provider is chosen.
e.g.: https://stackoverflow.com/questions/23812760/dynamic-functions-creation-from-json-python
It is at least something I want to look into.
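A minimal sketch of the idea, without using exec. It assumes a 0.4-style GET /processes listing where each process has an "id" and a "parameters" object mapping names to specs with an optional "required" flag. make_process_function and build_namespace are hypothetical helpers, and the listing below is an invented excerpt.

```python
from typing import Any, Callable, Dict, List


def make_process_function(spec: Dict[str, Any]) -> Callable[..., Dict[str, Any]]:
    """Build a function that turns keyword arguments into a process graph node."""
    process_id = spec["id"]
    params = spec.get("parameters", {})
    required = [name for name, p in params.items() if p.get("required", False)]

    def process_function(**kwargs: Any) -> Dict[str, Any]:
        unknown = set(kwargs) - set(params)
        if unknown:
            raise TypeError(f"{process_id}() got unexpected arguments: {sorted(unknown)}")
        missing = [name for name in required if name not in kwargs]
        if missing:
            raise TypeError(f"{process_id}() missing required arguments: {missing}")
        return {"process_id": process_id, "arguments": kwargs}

    process_function.__name__ = process_id
    process_function.__doc__ = spec.get("description", "")
    return process_function


def build_namespace(processes: List[Dict[str, Any]]) -> Dict[str, Callable[..., Dict[str, Any]]]:
    """Map each process id reported by the back end to a generated function."""
    return {spec["id"]: make_process_function(spec) for spec in processes}


# Hypothetical excerpt of a GET /processes response
listing = {"processes": [
    {"id": "filter_bbox", "description": "Limits the data by a bounding box.",
     "parameters": {"data": {"required": True}, "extent": {"required": True}}},
]}
procs = build_namespace(listing["processes"])
node = procs["filter_bbox"](data={"from_node": "getcollection1"}, extent={"west": 8.71})
```

Generating closures keeps each function's docstring and name taken from the back end's own description, which also makes help() useful.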
Currently, the python client gets band names (to be used for band_filter() and band()) from a non-standard bands path in the collection metadata.
The spec, however, defines where band names (and common names) should be specified: properties/eo:bands
related: #76 , Open-EO/openeo-api#208
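A sketch of reading band information from the standard location. It assumes collection metadata shaped like {"properties": {"eo:bands": [{"name": ..., "common_name": ...}]}}. The helper names are hypothetical.

```python
from typing import Any, Dict, List, Optional


def band_names(metadata: Dict[str, Any]) -> List[str]:
    """Band names as listed under properties/eo:bands."""
    bands = metadata.get("properties", {}).get("eo:bands", [])
    return [b["name"] for b in bands]


def common_names(metadata: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Map band name to common name (e.g. 'red'), None where none is given."""
    bands = metadata.get("properties", {}).get("eo:bands", [])
    return {b["name"]: b.get("common_name") for b in bands}
```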
Users think max_time/min_time returns information (the maximum/minimum time), but it actually adds a process...
It is also unclear whether it looks for the maximum time or the maximum value.
The python client is currently quite confusing concerning band handling/filtering:
ImageCollection.band_filter(bands) expects a list of band names according to the docs. However, the only way I actually get this function to work is with a single integer 0. When specifying a list of strings I get java.lang.String cannot be cast to java.lang.Integer, when specifying a single string I get error ('band must be an int, tuple, or list. Recieved', <class 'str'>, 'instead.'), and when specifying an int other than 0 I get Band 1 does not exist.
ImageCollection.band(name) expects a band name (string) and works properly.
The API process reference, on the other hand, names the process filter_bands (not band_filter) and defines three possible arguments for band selection: bands, common_names and wavelengths.
https://open-eo.github.io/openeo-api/processreference/#filter_bands
I think it's best to bring these things more in harmony.
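One way to harmonize the behaviour is to accept band names and indices on the client side and validate them before anything is sent to the back end. A minimal sketch follows. resolve_bands is a hypothetical helper, and the assumption that the back end ultimately wants indices comes from the errors above.

```python
from typing import List, Sequence, Union

BandRef = Union[str, int]


def resolve_bands(bands: Union[BandRef, Sequence[BandRef]], available: Sequence[str]) -> List[int]:
    """Turn band names and/or indices into a validated list of indices."""
    if isinstance(bands, (str, int)):
        bands = [bands]  # accept a single band as well as a list
    indices = []
    for band in bands:
        if isinstance(band, bool):  # bool is a subclass of int; reject it explicitly
            raise TypeError("band must be a name (str) or an index (int), got bool")
        if isinstance(band, str):
            if band not in available:
                raise ValueError(f"unknown band {band!r}, available: {list(available)}")
            indices.append(available.index(band))
        elif isinstance(band, int):
            if not 0 <= band < len(available):
                raise ValueError(f"band index {band} out of range 0..{len(available) - 1}")
            indices.append(band)
        else:
            raise TypeError(f"band must be a name (str) or an index (int), got {type(band).__name__}")
    return indices
```

Usage: resolve_bands(["B08", 0], ["B04", "B08"]) gives [1, 0], and an unknown name fails with a message listing the available bands.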
The methods image() and imagecollection() of the session should be merged into one, using the key element of the core API (now: "product_id").
So it should be checked whether all current examples and notebooks work without the 'collection_id' key.
If not, the function should be marked as deprecated.
When running setup.py with pip install --user -e ., I get 'unknown option: test_requirements'.
I am on macOS High Sierra, working in a Python 3.7 virtual environment.
See the complete error message below. setuptools is installed.
Installing collected packages: openeo-api
Running setup.py develop for openeo-api
Complete output from command /.virtualenvs/myvenv/bin/python3.7 -c "import setuptools, tokenize;file='/Documents/Notebooks/bed_services/openeo/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" develop --no-deps --user --prefix=:
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/distutils/dist.py:274: UserWarning: Unknown distribution option: 'test_requirements'
warnings.warn(msg)
usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
or: -c --help [cmd1 cmd2 ...]
or: -c --help-commands
or: -c cmd --help
error: option --user not recognized
----------------------------------------
Command "/.virtualenvs/myvenv/bin/python3.7 -c "import setuptools, tokenize;file='/Documents/Notebooks/bed_services/openeo/setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" develop --no-deps --user --prefix=" failed with error code 1 in /Documents/Notebooks/bed_services/openeo/
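The warning shows that test_requirements is not an option setuptools knows. The actual failure, "option --user not recognized", is likely caused by passing --user inside a virtualenv, where user site-packages are disabled, so a plain pip install -e . should be used there. A sketch of declaring test dependencies with the standard extras_require keyword follows. The package lists are placeholders, not the project's actual requirements.

```python
# setup.py (sketch): declare test dependencies with a keyword setuptools understands.
# The dependency lists below are placeholders, not the project's actual requirements.
SETUP_KWARGS = dict(
    name="openeo-api",  # distribution name as shown in the pip output above
    install_requires=["requests"],
    extras_require={
        # installable with: pip install -e .[tests]
        "tests": ["pytest"],
    },
)

if __name__ == "__main__":
    from setuptools import setup

    setup(**SETUP_KWARGS)
```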
session.imagecollection("CGS_SENTINEL2_RADIOMETRY_V101") in the "Open-EO Proof Of Concept: Compositing" notebook returns an error: KeyError: 'content-type'
During the OpenEO hackathon, users wanted to know at least how much data is left after a process like daterange_filter has been added. So the user should somehow be able to see whether any data gets returned.
I don't know if this issue should be moved to the API.
How should asynchronous batch jobs (see the core API doc) be handled in the python client?
Workflow: commit job -> queue job -> when the job has finished: get the job's data
Some possibilities in my opinion:
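Whichever API shape is chosen, the "when the job has finished" step of that workflow comes down to a polling loop. A minimal sketch follows. It assumes a hypothetical get_status callable (for example one wrapping GET /jobs/{job_id}) and the job status values of openEO API 0.4 (submitted, queued, running, canceled, finished, error).

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a batch job until it leaves the submitted/queued/running states."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status not in ("submitted", "queued", "running"):
            return status  # e.g. "finished", "error" or "canceled"
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} s")
        sleep(poll_interval)
```

Usage: with statuses = iter(["queued", "running", "finished"]), calling wait_for_job(lambda: next(statuses), sleep=lambda s: None) returns "finished". The injectable sleep keeps the loop testable without real waiting.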
The git/github level name of this project is "openeo-python-client" (https://github.com/Open-EO/openeo-python-client)
When installing/building the package the name "openeo-api" / "openeo_api" is used. e.g.:
$ docker run --rm -it -w/app python bash
# git clone https://github.com/Open-EO/openeo-python-client.git
# cd openeo-python-client/
# pip install -r requirements.txt
....
# pip install -e .
...
Installing collected packages: openeo-api
Running setup.py develop for openeo-api
Successfully installed openeo-api
# pip freeze | grep openeo
-e git+https://github.com/Open-EO/openeo-python-client.git@..#egg=openeo_api
This is pretty confusing, especially because there is a related project with that openeo-api name: https://github.com/Open-EO/openeo-api
Moreover, to use it, one has to import it under the name "openeo":
# python
>>> import openeo
>>> openeo
<module 'openeo' from '/usr/local/lib/python3.7/site-packages/openeo/__init__.py'>
I have a loop that fetches results from e.g. GEE and EURAC, but for some reason it does not work with EURAC.
I am getting the following exception after around a minute of waiting:
requests.exceptions.MissingSchema: Invalid URL '/jobs/781ddbd4-306f-48c2-9a2f-8cd387a18c5b/results': No schema supplied. Perhaps you meant http:///jobs/781ddbd4-306f-48c2-9a2f-8cd387a18c5b/results?
Backend url: http://saocompute.eurac.edu/openEO_0_3_0/openeo
Process graph:
{
"min": -1,
"max": 1,
"imagery": {
"imagery": {
"red": "B04",
"nir": "B08",
"imagery": {
"extent": [
"2018-06-04T00:00:00Z",
"2018-06-04T23:59:59Z"
],
"imagery": {
"extent": {
"west": 8.71,
"south": 47,
"east": 9.8,
"north": 47.8
},
"imagery": {
"process_id": "get_collection",
"name": "openEO_S2_32632_10m_L2A"
},
"process_id": "filter_bbox"
},
"process_id": "filter_daterange"
},
"process_id": "NDVI"
},
"process_id": "min_time"
},
"process_id": "stretch_colors"
}
Using the synchronous way with .download() does not work either, but maybe that is because the function is deprecated?
AttributeError: 'RESTConnection' object has no attribute 'download_job'
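The MissingSchema error shows that the results link /jobs/.../results was requested as-is, without the back end's base URL. A defensive client could resolve such paths before requesting them. A minimal sketch follows. It assumes, as an unverified guess, that the path is meant relative to the API root rather than to the host root. resolve_backend_url is a hypothetical helper.

```python
from urllib.parse import urlparse


def resolve_backend_url(api_root: str, url: str) -> str:
    """Return `url` unchanged if absolute, else prefix it with the API root.

    urllib.parse.urljoin(api_root, "/jobs/...") would instead drop the
    "/openEO_0_3_0/openeo" part, because a leading "/" means the host root.
    """
    if urlparse(url).scheme:
        return url
    return api_root.rstrip("/") + "/" + url.lstrip("/")
```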
Instead of requiring the users to download a raster file and then import it, the client could do that directly and return an imported object that is ready to be visualised etc. That is also useful for the synchronous case, where the raw raster is returned right away.
Hi,
I was using the apply_dimension method of the python client to run a UDF. The method is available, but there is no help (docstring) associated with it.
Could you please add documentation for the apply_dimension method?
Suggestions for needed parameters and their retrieval methods within an EO product:
It seems that currently no token is present, or the auth header is not set correctly, on requests to POST /result or POST /jobs. This causes errors when connecting to the EURAC or GEE back end.
In the solution for the hackathon I found the following code:
import openeo
from openeo.auth.auth_bearer import BearerAuth
endpoint = "http://..."
username = "..."
password = "..."
session = openeo.session(username, endpoint=endpoint)
# The GEE back end uses a Bearer Token for authentication, therefore it has to be imported
session.auth(username, password, BearerAuth)
It is quite confusing that you need to specify the username twice. Why is it needed for openeo.session?
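The following sketch shows a session/authentication abstraction along these lines. Auth, BasicAuth, BearerAuth and Session are hypothetical classes, not the client's API, and BearerAuth here is not the openeo.auth.auth_bearer class used above. The username appears only once, inside the auth object. Every request builds its headers in one place, so POST /result and POST /jobs cannot be sent without the Authorization header.

```python
import base64
from typing import Dict, Optional


class Auth:
    """Hypothetical base class: each scheme knows how to set request headers."""

    def headers(self) -> Dict[str, str]:
        return {}  # anonymous access


class BasicAuth(Auth):
    def __init__(self, username: str, password: str):
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        self._value = f"Basic {token}"

    def headers(self) -> Dict[str, str]:
        return {"Authorization": self._value}


class BearerAuth(Auth):
    def __init__(self, access_token: str):
        self._value = f"Bearer {access_token}"

    def headers(self) -> Dict[str, str]:
        return {"Authorization": self._value}


class Session:
    """Hypothetical session: only the endpoint is needed up front."""

    def __init__(self, endpoint: str):
        self.endpoint = endpoint
        self.auth: Auth = Auth()  # anonymous until authenticate() is called

    def authenticate(self, auth: Auth) -> "Session":
        self.auth = auth
        return self

    def request_headers(self, extra: Optional[Dict[str, str]] = None) -> Dict[str, str]:
        # Every request (GET /collections, POST /result, POST /jobs, ...) gets
        # its headers from here, so no route can forget the auth header.
        headers = {"Accept": "application/json"}
        headers.update(self.auth.headers())
        headers.update(extra or {})
        return headers
```

Usage: Session("http://...").authenticate(BasicAuth("user", "secret")).request_headers() includes "Authorization": "Basic dXNlcjpzZWNyZXQ=".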
Running the following code gives me a 16.8 MB geotiff with 4 bands that does not seem to contain the actual data in any band.
import openeo
session = openeo.session('nobody', endpoint='http://openeo.vgt.vito.be/openeo')
s2_coll = session.image('CGS_SENTINEL2_RADIOMETRY_V101')
udf_code_file = '/home/cpa/workspace/openeo-hackathon/raster_collections_ndvi.py'
with open(udf_code_file) as fp:
udf_str = fp.read()
ndvi = s2_coll.date_range_filter("2017-10-10", "2017-10-30") \
.bbox_filter(left=6.8371137, top=50.5647147, right=6.8566699, bottom=50.560007, srs='EPSG:4326')
nvdi = ndvi.apply_tiles(udf_str).max_time()  # note: bound to "nvdi" (typo), so the "ndvi" downloaded below has no apply_tiles/max_time
job = ndvi.download('ndvi.tiff', 'GTIFF')
A small change in the file, which calls apply_tiles directly, gives the correct result.
import openeo
session = openeo.session('nobody', endpoint='http://openeo.vgt.vito.be/openeo')
s2_coll = session.image('CGS_SENTINEL2_RADIOMETRY_V101')
udf_code_file = '/home/cpa/workspace/openeo-hackathon/raster_collections_ndvi.py'
with open(udf_code_file) as fp:
udf_str = fp.read()
ndvi = s2_coll.date_range_filter("2017-10-10", "2017-10-30") \
.bbox_filter(left=6.8371137, top=50.5647147, right=6.8566699, bottom=50.560007, srs='EPSG:4326').apply_tiles(udf_str).max_time()
job = ndvi.download('ndvi_working.tiff', 'GTIFF')
When running the unit tests from test_bandmath.py on Python 3.5, you randomly get one of:
no failure
failure in test_evi:
self.assertDictEqual(expected_graph,actual_graph)
E AssertionError: {'red[578 chars]ex': 2, 'data': {'from_argument': 'data'}}, 'r[1423 chars]lse}} != {'red[578 chars]ex': 1, 'data': {'from_argument': 'data'}}, 'r[1423 chars]lse}}
failure in test_evi:
tests/test_bandmath.py:50:
> evi_cube = (2.5 * (B08 - B04)) / ((B08 + 6.0 * B04 - 7.5 * B02) + 1.0)
openeo/rest/imagecollectionclient.py:264: in __sub__
return self.subtract(other)
openeo/rest/imagecollectionclient.py:151: in subtract
return self._reduce_bands_binary(operator, other)
> input1_id = list(merged.processes.keys())[list(merged.processes.values()).index(input1)]
E ValueError: {'result': True, 'arguments': {'data': [{'from_node': 'arrayelement2'}, {'from_node': 'product1'}]}, 'process_id': 'sum'} is not in list
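The random outcome suggests an ordering problem. Before Python 3.6, plain dicts did not preserve insertion order, and with hash randomization the iteration order of string keys can change between runs. Ordering is only guaranteed from 3.7 on. A minimal sketch of a graph builder that assigns node ids deterministically and returns them directly, so callers never have to look a node up by value, follows. GraphBuilder is hypothetical. The id style mirrors the arrayelement1/getcollection1 names seen in the graphs here.

```python
from collections import OrderedDict
from typing import Any, Dict


class GraphBuilder:
    """Hypothetical builder assigning node ids in insertion order.

    OrderedDict keeps insertion order on every Python 3 version, unlike a
    plain dict before 3.7, so the same calls always give the same ids.
    """

    def __init__(self) -> None:
        self.nodes: "OrderedDict[str, Dict[str, Any]]" = OrderedDict()
        self._counters: Dict[str, int] = {}

    def add(self, process_id: str, **arguments: Any) -> str:
        count = self._counters.get(process_id, 0) + 1
        self._counters[process_id] = count
        node_id = f"{process_id.replace('_', '')}{count}"
        self.nodes[node_id] = {"process_id": process_id, "arguments": arguments}
        # Return the id so callers never need to search nodes by value
        return node_id


g = GraphBuilder()
b08 = g.add("array_element", data={"from_argument": "data"}, index=1)
b04 = g.add("array_element", data={"from_argument": "data"}, index=0)
diff = g.add("subtract", data=[{"from_node": b08}, {"from_node": b04}])
print(list(g.nodes))  # ['arrayelement1', 'arrayelement2', 'subtract1']
```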
The second part of the first task in openeo-hackathon/test-cases/README.md is:
Make sure that the client is properly working by connecting to one the back-ends and requesting the capabilities that are provided by the back-end.
The API docs linked in this repo's README were my first choice to look up how to do that, but they don't include anything related to a capabilities request.
Indeed, the word "capabilities" doesn't appear anywhere in this repo (full search).
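For reference, in openEO API 0.4 the capabilities document is what a GET request on the back end's root URL returns. A standard-library sketch of requesting it follows, independent of the client's own API. get_capabilities is a hypothetical helper, and the fetch function is injectable so it can be tested offline.

```python
import json
import urllib.request
from typing import Any, Callable, Dict


def _http_get(url: str) -> bytes:
    with urllib.request.urlopen(url) as response:
        return response.read()


def get_capabilities(backend_url: str, fetch: Callable[[str], Any] = _http_get) -> Dict[str, Any]:
    """GET the back end's root URL and parse the JSON capabilities document."""
    url = backend_url.rstrip("/") + "/"
    return json.loads(fetch(url))
```

Usage: caps = get_capabilities("http://openeo.vgt.vito.be/openeo/0.4.0"), then inspect caps.get("api_version") or the paths listed under caps.get("endpoints", []). This assumes the server is reachable and follows API 0.4.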
At the hackathon, the idea came up to let users visualize the process graph nicely, as an actual image of a graph.
Some tools were also mentioned for this, e.g. TensorBoard.
There is already a similar solution in the Dask package.
It would be better to use Python date or datetime objects (e.g. in date_range_filter) instead of strings for dates.
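A sketch of accepting date, datetime or string while still sending strings in the process graph follows. normalize_date is a hypothetical helper. Treating naive datetimes as UTC is an assumption. Strings get a basic format check, so an input like "10/10/2017" fails on the client instead of producing an empty result.

```python
import datetime as dt
from typing import Union

DateLike = Union[str, dt.date, dt.datetime]


def normalize_date(value: DateLike) -> str:
    """Accept date, datetime or string; return an ISO 8601 string for the graph."""
    if isinstance(value, dt.datetime):  # check datetime first: it subclasses date
        if value.tzinfo is None:
            value = value.replace(tzinfo=dt.timezone.utc)  # assumption: naive means UTC
        return value.isoformat().replace("+00:00", "Z")
    if isinstance(value, dt.date):
        return value.isoformat()
    if isinstance(value, str):
        dt.date.fromisoformat(value[:10])  # raises ValueError for e.g. "10/10/2017"
        return value
    raise TypeError(f"expected str, date or datetime, got {type(value).__name__}")
```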
It is confusing that they are not the same: the first one returns only information about the collection, while the second one returns an ImageCollection-based class...
I tried to go through the hackathon tasks in openeo-hackathon/test-cases/README.md and stumbled across a few things. I'll open one issue for each. I'm not a dedicated Python guy either, although I've used it before (at least a little bit).
The very first task was:
Please install one of the openEO clients.
I managed to do so after figuring out that I have to use Python 3. I guessed that using Python 2.7 was the problem because I was getting an invalid syntax error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "openeo/__init__.py", line 17, in <module>
from .catalog import EOProduct
File "openeo/catalog.py", line 11
def collection_identifier(self) -> str:
^
SyntaxError: invalid syntax
Could you please explicitly mention that Python 3 is required? This would make "getting started" easier for users of all levels. A hint that the dependencies have to be installed with pip3 might also be helpful.
After using python3 and pip3 (instead of python and pip), the installation worked fine on my Ubuntu 16.04 machine.
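Besides documenting the requirement, setup.py can declare it with the setuptools keyword python_requires (e.g. python_requires=">=3.5"), so that recent pip versions refuse to install on Python 2. For source checkouts, a guard at the top of openeo/__init__.py, placed before the submodule imports, could replace the SyntaxError with a readable message. A sketch follows. The 3.5 minimum is an assumption based on the test suite running on Python 3.5, and check_python_version is hypothetical.

```python
import sys

MIN_PYTHON = (3, 5)  # assumption: the oldest version the test suite runs on


def check_python_version(version_info=sys.version_info):
    # Deliberately written without annotations or f-strings, so that this check
    # itself still parses under Python 2 and can show a readable message.
    if tuple(version_info[:2]) < MIN_PYTHON:
        raise ImportError(
            "openeo requires Python %d.%d or newer, but this is Python %d.%d; "
            "use python3/pip3." % (MIN_PYTHON + tuple(version_info[:2]))
        )
```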
Open question: is it necessary to keep supporting the 0.3-style API in the python client? I don't really know whether there are actual users who use the python client with 0.3-only back ends.
There are quite a few differences between the 0.3 and 0.4 versions of the API (how process graphs work, terminology, etc.), and supporting both in the python client code makes things tedious and hard to maintain.
Getting rid of all the 0.3-related code and implementation details would benefit both development speed and code quality.
There may be a better name for the function than just timeseries. Maybe something like pixel_timeseries or get_pixel_timeseries?
Some examples use deprecated functions (e.g. PoC_EURAC.ipynb).
The examples should be updated to the newest version (0.3.1).
(session
.imagecollection('BIOPAR_FAPAR_V1_GLOBAL')
.bbox_filter(west=-60, south=-5, east=-50, north=0, crs="EPSG:4326")
# .....
)
gives:
OSError: Received an exception from the server for url: http://openeo.vgt.vito.be/openeo/0.4.0/result and POST message: {"process_graph": {"filterbbox1": {"arguments": {"north": null, "south": -5, "east": -50, "data": {"from_node": "getcollection1"}, "west": -60, "crs": "EPSG:4326"}, "process_id": "filter_bbox", "result": false}, "filtertemporal1": {"arguments": {"to": "2017-02-25", "from": "2017-02-15", "data": {"from_node": "filterbbox1"}}, "process_id": "filter_temporal", "result": false}, "getcollection1": {"arguments": {"name": "BIOPAR_FAPAR_V1_GLOBAL"}, "process_id": "get_collection", "result": false}, "reduce1": {"arguments": {"dimension": "temporal", "reducer": {"callback": {"r1": {"arguments": {"dimension": {"from_argument": "dimension"}, "data": {"from_argument": "dimension_data"}}, "process_id": "min", "result": true}}}, "data": {"from_node": "filtertemporal1"}}, "process_id": "reduce", "result": "true"}}}{"message":"An error occurred while calling o60799.pyramid_seq.\n: java.lang.NullPointerException\n\tat geotrellis.vector.reproject.Reproject$.apply(Reproject.scala:65)\n\tat geotrellis.vector.reproject.Implicits$ReprojectExtent.reproject(Implicits.scala:57)\n\tat geotrellis.vector.ProjectedExtent.reproject(Extent.scala:53)\n\tat org.openeo.geotrellisaccumulo.PyramidFactory.createQuery(PyramidFactory.scala:130)\n\tat org.openeo.geotrellisaccumulo.PyramidFactory.pyramid_seq(PyramidFactory.scala:156)\n\tat org.openeo.geotrellisaccumulo.PyramidFactory.pyramid_seq(PyramidFactory.scala:106)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat 
py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.GatewayConnection.run(GatewayConnection.java:238)\n\tat java.lang.Thread.run(Thread.java:748)\n"}
Note that north ends up as null in the generated graph, although north=0 was passed:
{"arguments": {"north": null, "south": -5, "east": -50, ....
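One plausible cause, not verified against the client code, is a truthiness fallback such as "north or top": it turns a legitimate 0 into the missing legacy value. A sketch of merging the new-style arguments (west/south/east/north) with the legacy ones (left/bottom/right/top) using explicit None checks follows. bbox_extent is a hypothetical helper.

```python
from typing import Dict, Optional


def bbox_extent(
    west: Optional[float] = None, south: Optional[float] = None,
    east: Optional[float] = None, north: Optional[float] = None,
    left: Optional[float] = None, bottom: Optional[float] = None,
    right: Optional[float] = None, top: Optional[float] = None,
    crs: str = "EPSG:4326",
) -> Dict[str, object]:
    """Merge new-style and legacy bbox arguments, treating 0 as a valid value."""

    def pick(new: Optional[float], old: Optional[float], name: str) -> float:
        # `new or old` would silently replace a legitimate 0 by the legacy value (None)
        value = new if new is not None else old
        if value is None:
            raise ValueError(f"missing bounding box edge: {name}")
        return value

    return {
        "west": pick(west, left, "west"),
        "south": pick(south, bottom, "south"),
        "east": pick(east, right, "east"),
        "north": pick(north, top, "north"),
        "crs": crs,
    }
```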
Could we get a PyPI release for 0.4 once it is ready, please? This would allow us to use this library as a dependency for other projects, e.g. the backend-result-validator. See Open-EO/openeo-result-validation-engine#43
As also discussed in the core API (https://github.com/Open-EO/openeo-api/issues/49), the processes will define the keywords of the process graph.
Right now there are two Python files, imagery.py and imagecollection.py, for the keywords "imagery" and "collection", which do basically the same thing. This redundancy should be removed in the future.
Public authorities / NGOs / companies may have restrictions to install software (as e.g. Python, R) on their office computers. A remotely hosted openEO web editor is a way for them also to use openEO, but is it also possible to run the openEO client using a portable python installation? Would be nice to check this and add a hint to the documentation.
connection.list_jobs() should return a list of (REST)Job objects
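A minimal sketch of wrapping the listing in objects follows. It assumes a GET /jobs response shaped like {"jobs": [{"id": ..., "status": ...}, ...]}, as in openEO API 0.4. Job and jobs_from_listing are simplified hypothetical stand-ins. A real RESTJob would also keep a reference to the connection so it can fetch results.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Job:
    """Hypothetical job wrapper; a real one would also hold the connection."""
    job_id: str
    status: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def jobs_from_listing(listing: Dict[str, Any]) -> List[Job]:
    """Wrap each entry of a GET /jobs response in a Job object."""
    return [
        Job(job_id=entry["id"], status=entry.get("status"), metadata=entry)
        for entry in listing.get("jobs", [])
    ]
```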
There is duplication between requirements.txt (openeo-python-client/requirements.txt, lines 1 to 6 in e72ca6f) and setup.py (lines 46 to 51 in e72ca6f).
As far as I understand the Python packaging world, projects that are meant to be used as a library in an existing environment or application, like the openeo client, should only define dependencies in install_requires of setup.py and should not provide a requirements.txt file. The latter is intended more for projects that are standalone applications.
Rename the function for UDFs so that "udf" appears in its name (currently e.g. image.apply_tiles()), because otherwise the function that applies a UDF is hard to find.
Hey, I wanted to update the processes of the client to v0.4.1, but at the moment we define processes in two places in the code:
I feel like the first one is the one the Python developers in this project like most, and the second one is the correct one according to the client guidelines.
So from my point of view we can keep both, but we should define the processes in only one place. There are three possibilities I can think of:
Define all processes in "ImagecollectionClient" (which is currently more up to date regarding the existing processes) and use it internally for "Processes" as well.
Define all processes in "Processes" (which is currently more up to date regarding the guidelines, and therefore all possible parameters) and use it internally for "ImagecollectionClient" as well.
Create a new class where the processes are defined, and use it in both classes.
What do you think?
The python client should properly support OpenID Connect as an authentication mechanism.
also see #10
(internal VITO ticket: EP-2209)
Error handling for user file upload / download / deletion should be improved by raising exceptions and returning more relevant values than true or false.
Base class ImageCollection (in openeo.imagecollection):
date_range_filter(self, start_date, end_date): marked as deprecated (use filter_daterange instead), implementation is just pass
filter_daterange(self, extent): implementation forwards to date_range_filter()
Child classes (ImageCollectionClient in openeo.rest and GeotrellisTimeSeriesImageCollection in the GeoPyspark driver):
date_range_filter(self, start_date, end_date): with the actual implementation
filter_daterange(self, extent)
Same pattern for bbox_filter and filter_bbox.
So the deprecated methods are implemented, and the new, non-deprecated methods forward to the deprecated implementations? Shouldn't it be the other way around?
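A sketch of the reversed pattern follows. Subclasses implement the new method, and the deprecated method in the base class emits a DeprecationWarning and forwards to it. Because the forwarding goes through self, the subclass implementation is the one that runs. The classes are simplified stand-ins, not the actual ones. The [start, end] extent shape follows the filter_daterange extent in the process graph shown elsewhere on this page.

```python
import warnings


class ImageCollection:
    """Simplified stand-in: the new method is the one subclasses implement."""

    def filter_daterange(self, extent):
        raise NotImplementedError

    def date_range_filter(self, start_date, end_date):
        """Deprecated: use filter_daterange([start_date, end_date]) instead."""
        warnings.warn(
            "date_range_filter() is deprecated, use filter_daterange()",
            DeprecationWarning,
            stacklevel=2,
        )
        # Dispatches to the subclass implementation of the new method
        return self.filter_daterange([start_date, end_date])


class ImageCollectionClient(ImageCollection):
    """Simplified stand-in that records calls instead of building a graph."""

    def __init__(self):
        self.calls = []

    def filter_daterange(self, extent):
        self.calls.append(("filter_daterange", tuple(extent)))
        return self
```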