multiply-org / multiply-core
The core functionality of the MULTIPLY platform
@TonioF asked...
When you say that you resample it to the state mask, is that because you assume the state mask has the spatial extent and resolution specified by the user? I am asking because it might happen that we have to resample the state mask too, mightn't it? (If it is provided as vector data, we first have to bring it to a grid, so in that case we can of course use the requested grid right away.)
The state mask is just a numpy array with True/False for each pixel. However, because different observations and prior data can come with different projections etc., it is important that the geographical reference is defined too. The easiest way is to store the state mask in a GDAL-compatible dataset. For users who want to provide a vector file, I have added a rasterise function to core for this purpose.
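To make the idea concrete, here is a minimal pure-numpy sketch of what such a state mask looks like: a boolean array plus a GDAL-style geotransform as the geographical reference. The variable and function names are illustrative, not the actual multiply-core API.

```python
import numpy as np

# A state mask is just a boolean numpy array plus a geographical reference.
# Here the reference is a GDAL-style geotransform:
# (x_origin, x_res, row_rotation, y_origin, col_rotation, -y_res).
geo_transform = (10.0, 0.01, 0.0, 50.0, 0.0, -0.01)
state_mask = np.zeros((100, 100), dtype=bool)
state_mask[20:80, 30:70] = True  # pixels to include in the inference

def pixel_to_coords(row, col, gt):
    """Map a pixel index to the geographic coordinates of its upper-left corner."""
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y

print(pixel_to_coords(0, 0, geo_transform))  # (10.0, 50.0)
print(int(state_mask.sum()))                 # 2400 pixels selected
```

Any dataset that stores this array together with the geotransform and a projection (e.g. a GeoTIFF) would serve as a GDAL-compatible state mask.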
Does the emulator engine have to be part of the platform itself?
The practical workflow will be that you have an RT model and calibrate a GP emulator based on it. The calibrated emulator is then used by the inference engine.
Thus I would consider the emulators as well as the RT models themselves to be software outside the platform proper. Only the calibrated emulators will be used directly and should therefore be part of the platform.
@jgomezdans to comment
Tonio suggested postponing the decision on the license to the workshop.
In reproject, the function is wrapped in a class and decorated as a static method. In Python, you would just put the function definition at module level without the class boilerplate, although this might be the convention in other languages (Java, C#). Is there a specific reason for this?
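To illustrate the question, here are the two styles side by side. The function bodies are stand-ins, not the real reprojection logic:

```python
# Option 1: plain module-level function -- the idiomatic Python choice.
def reproject(value):
    return value * 2  # stand-in for the real reprojection logic

# Option 2: static method inside a class -- familiar from Java/C#, but the
# class holds no state here, so it is pure boilerplate in Python.
class Reprojection:
    @staticmethod
    def reproject(value):
        return value * 2

print(reproject(3))                # 6
print(Reprojection.reproject(3))   # 6
```

Both calls behave identically; the class only adds an extra namespace level.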
Currently no tiling is foreseen in the dummy version. This is OK for the beginning.
The question I have is where the tiling is done in the end. This is also particularly relevant for the parallelization.
I think deciding this quite soon is important, as it has implications for the structure of the overall code development.
As I see it, the dummy version could represent the processing for a single workflow: it receives the coordinates of the target area and does the entire processing for this area.
The engine calling this processor (our current dummy) would then be responsible for splitting the processing of a larger area into different chunks, distributing them across different computing nodes, and collecting the results again (map-reduce).
If this is the baseline, then we don't need to think about parallelization for the dummy at all, as this will be done on a higher level in the end.
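The splitting step in such a map-reduce setup could be sketched roughly as follows (the function name and interface are assumptions, not the engine's actual API):

```python
def split_into_tiles(x_min, y_min, x_max, y_max, tile_size):
    """Yield (x0, y0, x1, y1) tiles covering the bounding box.

    Each tile could then be handed to one processor instance (our dummy)
    independently; the engine collects the results afterwards.
    """
    x = x_min
    while x < x_max:
        y = y_min
        while y < y_max:
            yield (x, y, min(x + tile_size, x_max), min(y + tile_size, y_max))
            y += tile_size
        x += tile_size

tiles = list(split_into_tiles(0.0, 0.0, 2.5, 1.0, 1.0))
print(len(tiles))  # 3 tiles; the last one is clipped to the box edge
```

Because each tile is an independent unit of work, distributing them over nodes and merging the per-tile outputs is the engine's job, exactly as described above.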
Only https://multiply-core.readthedocs.io/en/doc/ holds the combined documentation, which in my opinion should definitely be on the "home page"/master page for users searching for documentation (supporting the test users?).
Therefore, I suggest adding the link or, better, merging the efforts from https://github.com/multiply-org/multiply-core/tree/doc
@TonioF ? @JorisTimmermans
Should there be a central method for resampling input/intermediate (prior)/output data, to avoid duplicated code and preserve consistency throughout the platform?
Passing on (e.g. prior) data as NetCDF files would be beneficial, as the resampling would be very easy. However, read/write operations should be limited.
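A central resampling helper could look roughly like this: a pure-numpy nearest-neighbour sketch that brings any 2-D input onto the state-mask grid. The function name and interface are assumptions; real inputs would carry a geo-reference, here we only match array shapes:

```python
import numpy as np

def resample_to_mask(data, mask_shape):
    """Nearest-neighbour resample a 2-D array onto the state-mask grid.

    Using one shared helper like this would guarantee every sub-module
    resamples the same way.
    """
    rows = np.arange(mask_shape[0]) * data.shape[0] // mask_shape[0]
    cols = np.arange(mask_shape[1]) * data.shape[1] // mask_shape[1]
    return data[np.ix_(rows, cols)]

coarse = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
fine = resample_to_mask(coarse, (4, 4))
print(fine.shape)  # (4, 4): each coarse pixel repeated onto the finer grid
```

In practice the shared method would also honour projections and geotransforms, but the principle of "one resampler, used everywhere" is the same.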
@JorisTimmermans mentioned that there is an in-memory netcdf-file-container format which could be used.
What are your thoughts on this @TonioF ?
Avoid having any '-' in module and project names. This is not compliant with the Python coding style conventions and results in import errors.
A statement such as import multiply-dummy does not work.
In the PEP8 conventions it says that
Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.
Thus the recommendation would be to replace multiply-dummy with multiply_dummy.
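The failure is not even an ImportError but a SyntaxError, because a hyphenated name cannot be parsed by the import grammar at all. A quick demonstration:

```python
# A module name containing '-' cannot even be parsed in an import statement:
try:
    compile("import multiply-dummy", "<example>", "exec")
    hyphen_parses = True
except SyntaxError as err:
    hyphen_parses = False
    print("SyntaxError:", err.msg)  # the hyphen breaks the import grammar

# The underscored name parses fine (whether the module actually exists is a
# separate question, resolved only at runtime):
compile("import multiply_dummy", "<example>", "exec")
print("multiply_dummy parses")
```

So renaming to multiply_dummy fixes the problem at the language level, independently of packaging.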
While implementing the SAR preprocessor, I realized that it is quite important that we soon agree on how we store and pass the general configuration information between the core and the individual sub-modules.
In general I would be in favor of each sub-module only getting a single argument. This makes it much easier if things evolve over time. In principle I see two options:
a) configuration as a dictionary
b) configuration as its own class
Thus for a) the call to a sub-module would look like
config = {'region': 'some list of coordinates', 't_start': datetime.datetime(2000, 1, 5), 't_stop': datetime.datetime(2005, 11, 2)}
S = SARPreProcessor(config)
S.do_something()
For option b) it would be more like
C = Config(t_start='2006,1,1', t_stop='2011,7,31')  # configuration stored as attributes
S = SARPreProcessor(C)
S.do_something()
Both options are extendable. A dictionary can easily be stored in a file (e.g. YAML). A class, on the other hand, can have methods that might help at some stage; storing the config is also possible there through a save function.
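The two options could be sketched as follows. The Config class, its save method, and the JSON serialisation are illustrative assumptions, not an agreed design (JSON is used here only to keep the sketch dependency-free; YAML would work just as well):

```python
import datetime
import json

# Option a): configuration as a plain dictionary -- trivially serialisable.
config = {
    'region': [(11.0, 48.0), (11.5, 48.5)],  # placeholder coordinates
    't_start': datetime.datetime(2000, 1, 5),
    't_stop': datetime.datetime(2005, 11, 2),
}

# Option b): configuration as its own class -- attribute access plus room
# for helper methods such as save().
class Config:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)  # store every keyword as an attribute

    def save(self, path):
        """Persist the configuration (dates serialised via str)."""
        with open(path, 'w') as fh:
            json.dump({k: str(v) for k, v in self.__dict__.items()}, fh)

c = Config(t_start=datetime.datetime(2006, 1, 1),
           t_stop=datetime.datetime(2011, 7, 31))
print(c.t_start.year)  # attribute access instead of dict lookup
```

Either way, the sub-module receives a single object, so signatures stay stable as the configuration grows.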
So, which option do people think we should choose?