
multiply-core's Issues

Rasterising state mask from vector file

@TonioF asked...

When you say that you resample it to the state mask, is that because you assume the state mask has the spatial extent and resolution specified by the user? I am asking because it might happen that we have to resample the state mask too, don't we? (if it is provided as vector data, we first have to bring it to a grid, in which case we can of course take the requested one right away)

The state mask is just a numpy array with True/False for different pixels. However, because different observations and prior data can come with different projections etc., it is important that the geographical reference is defined too. The easiest way is to store the state mask in a GDAL-compatible dataset. If the user wants to use a vector file, I have added a rasterise function to core for this purpose.
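To make the idea concrete, here is a minimal sketch of a state mask as a boolean numpy array with a GDAL-style geotransform carrying the geographical reference. All values and the helper name are illustrative, not the actual core API:

```python
import numpy as np

# The state mask itself is just a boolean numpy array; the geographical
# reference travels alongside it as a GDAL-style geotransform
# (ulx, xres, 0, uly, 0, -yres) -- values here are made up for illustration.
geo_transform = (600000.0, 10.0, 0.0, 5000000.0, 0.0, -10.0)
state_mask = np.zeros((4, 5), dtype=bool)
state_mask[1:3, 2:4] = True  # pixels inside the region of interest

def pixel_to_geo(col, row, gt):
    """Map a pixel index to the geographic coordinate of its upper-left corner."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y
```

With the geotransform attached, any module receiving the mask can relate its True/False pixels to the same grid that observations and priors are resampled to.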

Emulator Engine

Does the emulator engine have to be part of the platform itself?
The practical workflow will be that you have an RT model and calibrate a GP emulator based on it. The calibrated emulator is then used by the inference engine.

Thus I would consider the emulators as well as the RT models themselves to be software outside of the actual platform. Only the calibrated emulators will be used directly and should therefore be part of the platform.

@jgomezdans to comment

License?

Tonio suggested postponing the decision on the license until the workshop.

Do we really need to wrap a function with a class and a static method?

In reproject, the function is wrapped in a class and decorated as a static method. In Python, you would just write the function definition without the class boilerplate, although this might not be the convention in other languages (Java, C#). Is there a specific reason for this?
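For comparison, a small sketch of the two styles (the names and the toy body are illustrative, not the actual reproject code):

```python
class Reprojector:
    """Java/C#-style wrapping: a class used only as a namespace."""
    @staticmethod
    def reproject(values, scale):
        return [v * scale for v in values]


def reproject(values, scale):
    """Idiomatic Python: the same thing as a plain module-level function."""
    return [v * scale for v in values]
```

Both are called the same way and behave identically; the class adds nothing unless it is meant to carry state or be subclassed later.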

Tiling

Currently no tiling is foreseen in the dummy version. This is OK for the beginning.

The question I have is where the tiling is done in the end. This is in particular also relevant for the parallelization.
I think, deciding this quite soon is important as it has implications on the structure of the overall code development.

As I see it, the dummy version could represent the processing for a single workflow: it receives the coordinates of the target area and does the entire processing for that area.

The engine calling this processor (our current dummy) would then be responsible for splitting the processing of a larger area into different chunks, distributing them across different computing nodes, and collecting the results again (map-reduce).

If this is the baseline, then we don't need to think about parallelization at all for the dummy, as this will be done on a higher level in the end.
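The chunking step of such a map-reduce engine could be as simple as this sketch (the coordinate convention and the function name are assumptions for illustration):

```python
def split_into_tiles(ulx, uly, lrx, lry, tile_size):
    """Split a target area (given by upper-left and lower-right corners)
    into tile coordinates that can be distributed across computing nodes.
    Each tile is (tile_ulx, tile_uly, tile_lrx, tile_lry)."""
    tiles = []
    y = uly
    while y > lry:
        x = ulx
        while x < lrx:
            tiles.append((x, y,
                          min(x + tile_size, lrx),
                          max(y - tile_size, lry)))
            x += tile_size
        y -= tile_size
    return tiles
```

The engine would then call the single-area processor once per tile and merge the results.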

@TonioF @barsten to comment.

Resampling data throughout MULTIPLY platform

Should there be a central method for resampling input/intermediate (prior)/output data, to avoid duplicate code and preserve consistency throughout the platform?
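As a strawman for what such a central utility could look like, here is a nearest-neighbour resampler in pure numpy (the function name and interface are assumptions, not an existing platform API):

```python
import numpy as np

def resample_nearest(data, src_res, dst_res):
    """Resample a 2-D array from src_res to dst_res (same units, e.g. metres)
    by nearest-neighbour index lookup. One shared function like this, used by
    every module, keeps resampling behaviour consistent across the platform."""
    factor = src_res / dst_res
    rows = int(round(data.shape[0] * factor))
    cols = int(round(data.shape[1] * factor))
    row_idx = (np.arange(rows) / factor).astype(int)
    col_idx = (np.arange(cols) / factor).astype(int)
    return data[np.ix_(row_idx, col_idx)]
```

A real implementation would also have to handle projections and interpolation methods, but the point is that there is exactly one place where those choices are made.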

Passing on (e.g. prior) data as NetCDF files would be beneficial, as the resampling would be very easy. However, read/write operations should be limited.
@JorisTimmermans mentioned that there is an in-memory NetCDF file container format which could be used.

What are your thoughts on this @TonioF ?

Module naming convention

Avoid using '-' in the names of modules and project names. It is not compliant with the Python coding style conventions and results in import errors.

A statement import multiply-dummy does not work.

The PEP 8 conventions say:

Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

Thus the recommendation would be to replace multiply-dummy with multiply_dummy.

How to specify configuration?

While implementing the SAR preprocessor, I realized that it is quite important that we soon agree on how we store and pass the general configuration information between the core and the individual sub-modules.

In general I would be in favor of each sub-module getting only a single argument. This makes it much easier if things evolve over time. In principle I see two options:
a) configuration as a dictionary
b) configuration in its own class

Thus for a) the call to a sub-module would look like

config = {'region': ['some list of coordinates'],
          't_start': datetime.datetime(2000, 1, 5),
          't_stop': datetime.datetime(2005, 11, 2)}

S = SARPreProcessor(config)
S.do_something()

For option b) it would be more like

C = Config(t_start=datetime.datetime(2006, 1, 1),
           t_stop=datetime.datetime(2011, 7, 31))  # configuration stored as attributes
S = SARPreProcessor(C)
S.do_something()

Both options are extensible. A dictionary can easily be stored in a file (e.g. YAML). A class, on the other hand, can have methods that might help at some stage; storing the config is also possible through a save function.
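A minimal sketch of option b), with a save method to show that persisting the configuration is straightforward (JSON is used here as a stdlib stand-in for YAML; the field names are assumptions):

```python
import datetime
import json

class Config:
    def __init__(self, t_start, t_stop, region=None):
        # configuration stored as attributes
        self.t_start = t_start
        self.t_stop = t_stop
        self.region = region if region is not None else []

    def save(self, path):
        """Persist the configuration to a file (JSON here; YAML would work too)."""
        with open(path, 'w') as f:
            json.dump({'t_start': self.t_start.isoformat(),
                       't_stop': self.t_stop.isoformat(),
                       'region': self.region}, f)

config = Config(t_start=datetime.datetime(2006, 1, 1),
                t_stop=datetime.datetime(2011, 7, 31))
```

A dictionary could of course be dumped to a file just as easily, so the save capability alone does not decide between the two options.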

So, which of the two options do people think we should choose?
