multiply-org / prior-engine
License: GNU General Public License v3.0
The extensibility of the prior engine through this abstraction allows users to write their own code to be evaluated by the prior engine, as Tonio stated.
From #19 (closed because it was a duplicate and split up into multiple sub-issues):
Tonio:
The framework, however, would go beyond that. It would allow users to not only add their own data, but to write their own code which can be interpreted by the prior engine.
This functionality must be documented.
@TonioF minor idea / enhancement:
Could you add the webhooks for ReadTheDocs in the prior-engine repo? I have auto-generated the documentation, but until the hooks are set in the repo, it does not update correctly.
We have included prior-engine-specific logging; however, this may cause trouble in the auto-documentation process, see #40.
I read that, as a best practice:

> Modules, that is, libraries intended to be used by other programs, should only emit log messages. They should never configure how log messages are handled. That is the responsibility of the application that imports and uses the modules.

Is a MULTIPLY application-wide logger (possibly based in multiply-core) envisaged? And if so, should we just instantiate loggers via

```python
import logging

log = logging.getLogger(__name__)

def do_something():
    log.debug("Doing something!")
```
As referenced in #21:

I did set up logging in the prior engine to avoid print statements and to let us (later the user?) decide how much information (level, e.g. debug, info, error, ...) should be printed on screen and written to a log file.
Maybe @TonioF can comment on this. What do you think of the approach of a file-based logger configuration and per-submodule loading of the prior_logger as a class instance? (I actually did not really test it yet.)
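The split recommended by the quoted best practice can be sketched with the standard library alone: library modules only create named loggers and emit, while one place in the application configures handlers. This is a minimal illustration, not the actual `PriorLogger` implementation:

```python
import logging

# In each library module (e.g. a prior creator): only create a named
# logger and emit messages; never attach handlers here.
log = logging.getLogger(__name__)

def do_something():
    log.debug("Doing something!")

# In the application entry point (e.g. a runner script): configure
# handlers, format, and level once for the whole process.
if __name__ == "__main__":
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
    do_something()
```

With this layout, the per-submodule loading discussed above reduces to one `getLogger(__name__)` line per module, and the file-based configuration can live entirely on the application side (e.g. via `logging.config.dictConfig`).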
It shall be possible to add new prior creators without changing code within the prior engine package.
There are still many tests missing.

Module | statements | missing | excluded | coverage
---|---|---|---|---
Total | 754 | 596 | 0 | 21%
multiply_prior_engine/__init__.py | 2 | 1 | 0 | 50%
multiply_prior_engine/prior.py | 50 | 17 | 0 | 66%
multiply_prior_engine/prior_engine.py | 108 | 66 | 0 | 39%
multiply_prior_engine/soilmoisture_prior.py | 162 | 119 | 0 | 27%
multiply_prior_engine/vegetation_prior.py | 424 | 389 | 0 | 8%
prior_engine_runner.py | 8 | 4 | 0 | 50%
Sphinx auto-doc of modules is not working on ReadTheDocs:

```
WARNING: autodoc: failed to import module 'vegetation_prior_creator' from module 'multiply_prior_engine'; the following exception was raised:
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/multiply-core/conda/doc/lib/python3.6/site-packages/sphinx/ext/autodoc/importer.py", line 154, in import_module
    __import__(modname)
  File "/home/docs/checkouts/readthedocs.org/user_builds/multiply-core/checkouts/doc/docs/prior-engine/multiply_prior_engine/__init__.py", line 27, in <module>
    PriorLogger(level='debug', handlers=['console'])
  File "/home/docs/checkouts/readthedocs.org/user_builds/multiply-core/checkouts/doc/docs/prior-engine/multiply_prior_engine/prior_logger.py", line 50, in __init__
    config_dict['handlers']['file']['filename'] = filename
TypeError: 'Mock' object is not subscriptable
```

This fails for all prior-engine modules, not only for vegetation_prior_creator.
I understand that not all Python modules are installed on ReadTheDocs but many are 'mocked'/mimicked; hence `config_dict`, being a mocked object returned from `config_dict = yaml.load(..)`, is not subscriptable.
Does this problem only occur because the PriorLogger is instantiated in the `__init__.py` file and hence is loaded (/executed?) even when gathering the auto-doc?
@TonioF, do you have experience with this?
Prior code shall be presented in the form of plug-ins: this way, we avoid having to change the prior engine code every time we introduce a new prior, and it facilitates the addition of new priors.
How should the prior data be passed on to the inference-engine?
A secondary question to this is whether the MULTIPLY platform will have a central way/module of resampling input/intermediate (prior)/output data (see issue in multiply-core).
Passing on prior data as NetCDF files would be beneficial, as the resampling would be very easy. But read/write operations should be limited.
@JorisTimmermans mentioned that there is an in-memory NetCDF-file-container format which could be used.
What are your thoughts on this, @jgomezdans and @TonioF?
I think this should be done in the prior_engine itself.
Also, I'm thinking of an OrderedDict for the prior config, to keep the order of entered priors. Is that suitable? Does this work with YAML-imported config files?
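One note on the OrderedDict question: since Python 3.7, plain dicts preserve insertion order by language specification, and PyYAML loads mappings into plain dicts, so the priors keep their file order without an explicit OrderedDict. A minimal check of the ordering guarantee:

```python
# Plain dicts keep insertion order on Python 3.7+, so a mapping of
# priors built in file order stays in that order when iterated.
priors = {}
priors["sm_clim"] = {"type": "climatology"}
priors["sm_recent"] = {"type": "recent"}
priors["veg_pft"] = {"type": "pft"}

assert list(priors) == ["sm_clim", "sm_recent", "veg_pft"]
```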
Check and add the changes from #39 to the soil moisture prior creator as well.
@JorisTimmermans
For consistency, please change the documentation name for the repository to MULTIPLY - Prior Engine.
Methods for importing user prior data have to be written.
So far, a standardized form of a CSV file can be read (this limitation should be noted in the README).
@JorisTimmermans we should agree upon a config file structure. The discussion is now initialized 😃
I don't know about your experience with this, so I'll just start rambling.
Maybe you saw my sample config already. To explain my layout ideas:
The hierarchy is: `Prior` section → `priors` subsection → `sm_clim` sub-subsection:

```yaml
Prior:
  priors:
    sm_clim:
      type: climatology
      climatology_file: ../aux_data/CCI_SM_climatology_eur_merged_inv.nc
    # sm_recent:
    #   type: recent
    # sm_static:
    #   type: static
    # sm_dynamic:
    #   type: dynamic
    #   model:
    #     - API
    #     - other
    #? veg_pft:
    #?   type: pft
    #?   database: /aux_data/some_DB
    #? veg_spec:
    #?   type: species
    #?   database: /user_data/some_DB
```
The original list structure from the very-alpha version of the config file is replaced by a more dictionary-centered layout. In my mind, this makes it easier to access the right values than having explicit references to list indices, which should be avoided, as @TonioF already correctly pointed out (#3).
@TonioF (also José?) could guide me (us?) here with 'good practice' advice and a feasibility assessment.
@McWhity also brought up a nice way to access the dictionary structure:

```python
class AttributeDict(object):
    """
    A class to convert a nested dictionary into an object with key-values
    accessible using attribute notation (AttributeDict.attribute) instead of
    key notation (Dict["key"]). This class recursively sets dicts to objects,
    allowing you to recurse down nested dicts (like AttributeDict.attr.attr).
    """
    def __init__(self, **entries):
        self.add_entries(**entries)

    def add_entries(self, **entries):
        for key, value in entries.items():
            if type(value) is dict:
                self.__dict__[key] = AttributeDict(**value)
            else:
                self.__dict__[key] = value

    def __getitem__(self, key):
        """Provide dict-style access to attributes."""
        return getattr(self, key)
```
Then load:

```python
def _get_config(self):
    """
    Load configuration from self.config.bb.pre_process()
    writes to self.config.
    """
    with open(self.config, 'r') as cfg:
        self.config = yaml.load(cfg)
    self.config = AttributeDict(**self.config)
```

and access, in our case, e.g. `self.config.Prior.priors.sm_clim.type` (also shorten and reassign the sub-subsection to a single prior-specific config, e.g. `SoilmoisturePrior.config`).
Should we use this approach as well? (It could also be implemented module-independently.)
What are your thoughts on this generally? Still not that important? Split the config files?
PS: It would be nice if you could just wink at my lack of experience with config file input to Python scripts, if it's too obvious, and state the right way.
At the moment the ROI is mentioned in the sample config as `POLYGON ( (0, 0), (0,1 )) #blabla`.
Is this a 'convention' I don't know, and if so, how is it meant to be loaded? With the shapely module? GDAL? So far this is not clear to me (for the moment I am parsing the string).
--> How will the coordinates finally be given? Has there been a decision yet?
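For context: the sample string is close to, but not exactly, WKT (well-known text), where coordinates are space-separated, e.g. `POLYGON ((0 0, 0 1, 1 1, 0 0))`; standard WKT can be parsed with `shapely.wkt.loads` or GDAL/OGR. As a stdlib-only stopgap, a tolerant number extractor could look like this (an illustrative sketch, not a WKT parser):

```python
import re

def parse_polygon(roi_string):
    """Extract (x, y) float pairs from a POLYGON-like ROI string.

    Tolerant sketch: pulls out all numbers and pairs them up, so it
    accepts both standard WKT ('POLYGON ((0 0, 0 1, ...))') and the
    comma-separated form from the sample config.
    """
    numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", roi_string)]
    if len(numbers) % 2:
        raise ValueError("Odd number of coordinates in ROI string")
    return list(zip(numbers[0::2], numbers[1::2]))

parse_polygon("POLYGON ( (0, 0), (0,1 ))")  # → [(0.0, 0.0), (0.0, 1.0)]
```

If the platform settles on proper WKT, swapping this for shapely would also give validity checks and geometric operations for free.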
Get default variables from multiply_core/variables/variables.py, e.g. in the soil moisture prior creator:

```python
try:
    default_variables = multiply_core.variables.get_default_variables()
except Exception:
    # fall back if multiply_core is unavailable
    default_variables = ['sm', 'lai', 'cab', ...]
```

or so... just to avoid duplication in the MULTIPLY code.
@JorisTimmermans and I discussed the interface of the prior-engine and came to the conclusion that the module may/should get a list of variables as input from the inference-engine. These represent the 'parameters/variables' asked for by the user, so that prior data can be provided accordingly.
The selection of which priors are to be calculated/provided will then be switched from the config file to the actual needs of the inference-engine.
What are your thoughts on this, @TonioF and @jgomezdans?
The information provided should consist of:
These could be provided as a dictionary or as single variables; it depends on @jgomezdans.
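The interface change discussed here (the inference engine passing a list of variables, rather than the config file driving the selection) could be sketched as follows; the function and key names are hypothetical, not the actual prior-engine API:

```python
# Hypothetical interface sketch: the inference engine passes the
# variables it needs, and the prior engine returns priors only for
# those, instead of computing everything listed in the config file.
def get_priors(variables, config):
    """Return a mapping of variable name -> prior configuration."""
    available = config.get("priors", {})
    priors = {}
    for var in variables:
        if var not in available:
            raise KeyError("No prior configured for '%s'" % var)
        priors[var] = available[var]
    return priors
```

The config file would then only describe *how* each prior is built, while *which* priors are built per run is decided by the caller.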
So far, this is implemented in write_config with shutil.copyfile. But in my case it is not copying the file, nor is an exception raised when '.bak' is added to the file name. Any ideas, @TonioF? (Working directory: /home/thomas/Code/prior-engine/multiply_prior_engine)
Other options:
- subprocess / os.system: should be avoided to retain OS independence.
- `with open(source, 'r') as src, open(dest, 'w') as dst: dst.write(src.read())`
Here's the code:

```python
def write_config(self, configuration, **kwargs):
    """Write configuration to a YAML file.

    :param configuration: configuration dictionary to write to file.

    :Keyword Arguments:
        * *path_to_config* (``str``) --
          path to config file. If None, a tempfile will be created.
        * *new_config_filename* (``str``) --
          Filename of new user config. Only has effect if path_to_config
          is specified. If None, a temporary filename will be used.

    :returns: config file name
    :rtype: string
    """
    path_to_config = kwargs.get('path_to_config', None)
    new_config_filename = kwargs.get('new_config_filename', None)
    if new_config_filename is not None and path_to_config is None:
        warnings.warn('Entered config file name ({}) will be omitted '
                      '--> no path specified!'
                      .format(new_config_filename), Warning)
    self.check_path_to_config_or_create(path_to_config)
    assert os.path.isfile(self.configfile)
    if self.configfile == default_config:
        if os.path.exists(default_config):
            src = os.path.abspath(default_config)
            dest = src + '.bak'
            logger.info('Creating {}.'.format(dest))
            print('Creating {}.'.format(dest))
            # TODO not writing file
            shutil.copyfile(src, dest)
    logger.info('User config file: {}'.format(self.configfile))
    with open(self.configfile, 'w') as cfg:
        cfg.write(yaml.dump(configuration, default_flow_style=False))
    return self.configfile
```
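To help debug the backup step, the pure-Python read/write copy mentioned as one of the options could be wrapped with an explicit existence check, so a silent failure cannot slip through. A minimal sketch (the function name is illustrative):

```python
import os

def backup_file(src, dest=None):
    """Copy src to dest (default: src + '.bak') and verify the result.

    Plain binary read/write copy, the OS-independent fallback to
    shutil.copyfile; raises if the backup did not land on disk.
    """
    if dest is None:
        dest = src + '.bak'
    with open(src, 'rb') as fsrc, open(dest, 'wb') as fdst:
        fdst.write(fsrc.read())
    if not os.path.isfile(dest):
        raise IOError('Backup {} was not written'.format(dest))
    return dest
```

If this succeeds where `shutil.copyfile` appeared not to, the original problem is likely elsewhere (e.g. `self.configfile` not matching `default_config`, so the branch is never entered, or a relative-path/working-directory mismatch).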
@JorisTimmermans @RaT0M
please add a file called "authors.rst" with your names (e.g. * Thomas Weiß <"[email protected]">) in the main directory of this repository. I will then link this information to our main MULTIPLY documentation page.
There shall be a first initial version of the prior engine. It shall implement the interfaces and return usable - not necessarily scientifically sound - results.
Interpolate the SM climatology (at time of retrieval?) for a smoother transition from month to month.
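Temporal interpolation of a monthly climatology could be as simple as a linear blend between neighboring months, wrapping around the year. This is a sketch under the assumption that each monthly value represents mid-month; the real prior engine may weight differently:

```python
def interpolate_climatology(monthly, day_of_year):
    """Linearly interpolate 12 monthly climatology values to a day.

    Assumes each monthly value is representative of mid-month and
    wraps around the year, so the December-to-January transition is
    as smooth as any other month boundary.
    """
    month_len = 365.0 / 12.0
    pos = (day_of_year - month_len / 2.0) / month_len
    i = int(pos // 1) % 12          # index of the earlier month
    frac = pos % 1                  # fraction of the way to the next
    j = (i + 1) % 12
    return monthly[i] * (1.0 - frac) + monthly[j] * frac
```

For mid-month days this returns the monthly value itself; in between, a weighted mix of the two neighbors, which is exactly the "smoother transition from month to month" asked for above.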
Here is my idea how the Data Access Component could be integrated:
The Prior Engine will have its own Data Access Component which is decoupled from the rest of the system. It deals only with (a) the prior .vrt-files, (b) the .tiff- or other files which form the input to the .vrt-files, and (c) any required aux-data. For (a) and (b), a data store will be used which will also be available for the orchestrator. The Data Access Component will be configured to find the aux data that is required by the prior engine (or its prior creators, to be exact). This data may not be available locally from the beginning, so it might need to be downloaded. All this data will be stored in the .multiply-folder in the user's home directory.
The workflow is this:
The Prior Engine will be asked for a prior file for a variable that covers some spatial and temporal extent. The Data Access Component will check whether a prior (.vrt-)file exists that meets these requirements. If so, it will be returned. If not (or if the user wishes to use his / her dedicated auxdata files), the prior engine will be triggered to compute such a prior file. After this has been done, the file (and, if necessary, files it references) will be permanently stored by the Data Access Component for future use.
This could be done by writing a wrapper script around the prior engine or even by integrating it into the prior engine module itself. An open question is how to design this so that the prior creators would get the aux data they need from the Data Access Component.
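The check-cache-or-compute workflow described above could be sketched as a thin wrapper; all names here are hypothetical, since the actual Data Access Component interface is still open:

```python
import os

# Hypothetical sketch of the caching workflow: return an existing
# prior (.vrt) file if the data store has one for the requested
# variable and time range, otherwise trigger computation and store
# the result permanently for future use.
def get_prior_file(variable, start, end, cache_dir, compute_prior):
    fname = "{}_{}_{}.vrt".format(variable, start, end)
    path = os.path.join(cache_dir, fname)
    if os.path.isfile(path):                         # already stored
        return path
    computed = compute_prior(variable, start, end)   # prior engine call
    os.makedirs(cache_dir, exist_ok=True)
    os.replace(computed, path)                       # store permanently
    return path
```

The open question from the text maps onto `compute_prior` here: the prior creators inside that call would themselves need a way to request aux data from the Data Access Component.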
Status of the CLI for user-defined priors:
The basic command (entry point in setup.py) is multiply_prior_engine.user_prior.main (it can be run from prior_engine_runner.py in the parent folder in case of a ModuleNotFoundError).
There are 4 sub-commands (like in e.g. git --> git branch, git merge, ...):
The commands are documented already.
Please check out the CLI; find help, like in every CLI, with the -h flag.
What do you think?
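A git-style CLI with sub-commands like the one described is typically built with `argparse` sub-parsers. Since the four actual sub-command names are not listed here, the names in this sketch are placeholders, not the real user_prior commands:

```python
import argparse

# Sketch of a git-style CLI using argparse sub-commands; the
# sub-command names ("add", "show", "remove") are placeholders.
def build_parser():
    parser = argparse.ArgumentParser(prog="user_prior")
    sub = parser.add_subparsers(dest="command", required=True)

    add = sub.add_parser("add", help="add a user prior")
    add.add_argument("variable")
    add.add_argument("-f", "--file", help="path to prior data file")

    sub.add_parser("show", help="show configured priors")

    remove = sub.add_parser("remove", help="remove a user prior")
    remove.add_argument("variable")
    return parser
```

With this structure, `-h` works both at the top level and per sub-command (`user_prior add -h`), matching the help behavior mentioned above.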
Config file parameter loading with the AttributeDict method may be implemented for more readable code.
Read the config via:

```python
def _get_config(self):
    """
    Load configuration from self.config.bb.pre_process()
    writes to self.config.
    """
    with open(self.config, 'r') as cfg:
        self.config = yaml.load(cfg)
    self.config = AttributeDict(**self.config)
```

and access, in our case, e.g. `self.config.Prior.priors.sm_clim.type`.
The README file should cover the CLI.