
multiply-core's Issues

Rasterising state mask from vector file

@TonioF asked...

When you say that you resample it to the state mask, is that because you assume the state mask has the spatial extent and resolution specified by the user? I am asking because it might happen that we have to resample the state mask too, don't we? (if it is provided as vector data, we first have to bring it to a grid, in which case we can of course take the requested one right away)

The state mask is just a numpy array with True/False for different pixels. However, because different observations and prior data can come with different projections etc., it is important that the geographical reference is defined too. The easiest way is to store the state mask in a GDAL-compatible dataset. If the user wants to use a vector file, I have added a rasterise function to core for this purpose.
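To make the idea concrete, here is a minimal sketch of a state mask as a boolean numpy array with a GDAL-style geotransform carrying the geographical reference. All values and the helper name are illustrative, not the actual core API:

```python
import numpy as np

# The state mask itself is just a boolean numpy array; the geographical
# reference travels alongside it as a GDAL-style geotransform
# (ulx, xres, 0, uly, 0, -yres) -- values here are made up for illustration.
geo_transform = (600000.0, 10.0, 0.0, 5000000.0, 0.0, -10.0)
state_mask = np.zeros((4, 5), dtype=bool)
state_mask[1:3, 2:4] = True  # pixels inside the region of interest

def pixel_to_geo(col, row, gt):
    """Map a pixel index to the geographic coordinate of its upper-left corner."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y
```

With the geotransform attached, any module receiving the mask can relate its True/False pixels to the same grid that observations and priors are resampled to.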

Emulator Engine

Does the emulator engine have to be part of the platform itself?
The practical workflow will be that you have an RT model and calibrate a GP emulator based on it. The calibrated emulator is then used by the inference engine.

Thus I would consider the emulators as well as the RT models themselves to be software outside of the actual platform. Only the calibrated emulators will be used directly and should therefore be part of the platform.

@jgomezdans to comment

License?

Tonio suggested postponing the decision on the license until the workshop.

Do we really need to wrap a function with a class and a static method?

In reproject, the function is wrapped in a class and decorated as a static method. In Python, you would just write the function definition without the class boilerplate, although this might not be the convention in other languages (Java, C#). Is there a specific reason for this?
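For comparison, a small sketch of the two styles (the names and the toy body are illustrative, not the actual reproject code):

```python
class Reprojector:
    """Java/C#-style wrapping: a class used only as a namespace."""
    @staticmethod
    def reproject(values, scale):
        return [v * scale for v in values]


def reproject(values, scale):
    """Idiomatic Python: the same thing as a plain module-level function."""
    return [v * scale for v in values]
```

Both are called the same way and behave identically; the class adds nothing unless it is meant to carry state or be subclassed later.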

Tiling

Currently no tiling is foreseen in the dummy version. This is OK for the beginning.

The question I have is where the tiling is done in the end. This is in particular also relevant for the parallelization.
I think, deciding this quite soon is important as it has implications on the structure of the overall code development.

As I see it, the dummy version could represent the processing for a single workflow: it receives the coordinates of the target area and does the entire processing for that area.

The engine calling this processor (our current dummy) would then be responsible for splitting the processing of a larger area into different chunks, distributing them across different computing nodes, and collecting the results again (map-reduce).

If this is the baseline, then we don't need to think about parallelization at all for the dummy, as this will be done on a higher level in the end.
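The chunking step of such a map-reduce engine could be as simple as this sketch (the coordinate convention and the function name are assumptions for illustration):

```python
def split_into_tiles(ulx, uly, lrx, lry, tile_size):
    """Split a target area (given by upper-left and lower-right corners)
    into tile coordinates that can be distributed across computing nodes.
    Each tile is (tile_ulx, tile_uly, tile_lrx, tile_lry)."""
    tiles = []
    y = uly
    while y > lry:
        x = ulx
        while x < lrx:
            tiles.append((x, y,
                          min(x + tile_size, lrx),
                          max(y - tile_size, lry)))
            x += tile_size
        y -= tile_size
    return tiles
```

The engine would then call the single-area processor once per tile and merge the results.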

@TonioF @barsten to comment.

Resampling data throughout MULTIPLY platform

Should there be a central method for resampling input/intermediate (prior)/output data, to avoid duplicate code and preserve consistency throughout the platform?
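As a strawman for what such a central utility could look like, here is a nearest-neighbour resampler in pure numpy (the function name and interface are assumptions, not an existing platform API):

```python
import numpy as np

def resample_nearest(data, src_res, dst_res):
    """Resample a 2-D array from src_res to dst_res (same units, e.g. metres)
    by nearest-neighbour index lookup. One shared function like this, used by
    every module, keeps resampling behaviour consistent across the platform."""
    factor = src_res / dst_res
    rows = int(round(data.shape[0] * factor))
    cols = int(round(data.shape[1] * factor))
    row_idx = (np.arange(rows) / factor).astype(int)
    col_idx = (np.arange(cols) / factor).astype(int)
    return data[np.ix_(row_idx, col_idx)]
```

A real implementation would also have to handle projections and interpolation methods, but the point is that there is exactly one place where those choices are made.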

Passing on (e.g. prior) data as NetCDF files would be beneficial, as the resampling would be very easy. However, read/write operations should be limited.
@JorisTimmermans mentioned that there is an in-memory NetCDF file container format which could be used.

What are your thoughts on this @TonioF ?

Module naming convention

Avoid using '-' in the names of modules and project names. It is not compliant with the Python coding style conventions and results in import errors.

A statement import multiply-dummy does not work.

The PEP 8 conventions say:

Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

Thus the recommendation would be to replace multiply-dummy with multiply_dummy.

How to specify configuration?

While implementing the SAR preprocessor, I realized that it is quite important that we soon agree on how we store and pass the general configuration information between the core and the individual sub-modules.

In general I would be in favor of each sub-module getting only a single argument. This makes it much easier if things evolve over time. In principle I see two options:
a) configuration as a dictionary
b) configuration in its own class

Thus for a) the call to a sub-module would look like

config = {'region': ['some list of coordinates'],
          't_start': datetime.datetime(2000, 1, 5),
          't_stop': datetime.datetime(2005, 11, 2)}

S = SARPreProcessor(config)
S.do_something()

For option b) it would be more like

C = Config(t_start=datetime.datetime(2006, 1, 1),
           t_stop=datetime.datetime(2011, 7, 31))  # configuration stored as attributes
S = SARPreProcessor(C)
S.do_something()

Both options are extensible. A dictionary can easily be stored in a file (e.g. YAML). A class, on the other hand, can have methods that might help at some stage; storing the config is also possible through a save function.
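A minimal sketch of option b), with a save method to show that persisting the configuration is straightforward (JSON is used here as a stdlib stand-in for YAML; the field names are assumptions):

```python
import datetime
import json

class Config:
    def __init__(self, t_start, t_stop, region=None):
        # configuration stored as attributes
        self.t_start = t_start
        self.t_stop = t_stop
        self.region = region if region is not None else []

    def save(self, path):
        """Persist the configuration to a file (JSON here; YAML would work too)."""
        with open(path, 'w') as f:
            json.dump({'t_start': self.t_start.isoformat(),
                       't_stop': self.t_stop.isoformat(),
                       'region': self.region}, f)

config = Config(t_start=datetime.datetime(2006, 1, 1),
                t_stop=datetime.datetime(2011, 7, 31))
```

A dictionary could of course be dumped to a file just as easily, so the save capability alone does not decide between the two options.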

So, which of the two options do people think we should choose?
