multiply-org / prior-engine
License: GNU General Public License v3.0
The extensibility of the prior engine through this abstraction allows users to write their own code to be evaluated by the prior engine, as Tonio stated.
From #19 (closed because it was a duplicate and split up into multiple sub-issues):
Tonio:
The framework, however, would go beyond that. It would allow users to not only add their own data, but to write their own code which can be interpreted by the prior engine.
This functionality must be documented.
@TonioF minor idea / enhancement:
Could you add the webhooks for ReadTheDocs in the prior-engine repo? I have auto-generated the documentation, but until the hooks are set in the repo, it does not update correctly.
We have included prior-engine-specific logging; however, this may cause trouble in the auto-documentation process, see #40.
I read that, as a best practice:

> Modules, that is, libraries intended to be used by other programs, should only emit log messages. They should never configure how log messages are handled. That is the responsibility of the application that imports and uses the modules.

Is a MULTIPLY application-wide logger (possibly based in multiply-core) envisaged? And if so, should we just instantiate loggers via

```python
import logging

log = logging.getLogger(__name__)

def do_something():
    log.debug("Doing something!")
```
As referenced in #21:

I did set up logging in the prior engine to avoid print statements and to let us (later the user?) decide how much information (level, e.g. debug, info, error, ...) should be printed on screen and written to a log file.
Maybe @TonioF can comment on this. What do you think of the approach of a file-based logger configuration and per-submodule loading of the prior_logger as a class instance? (I actually did not really test it yet.)
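The split recommended by the quoted best practice can be sketched with the standard library alone: library modules only create named loggers and emit, while one place in the application configures handlers. This is a minimal illustration, not the actual `PriorLogger` implementation:

```python
import logging

# In each library module (e.g. a prior creator): only create a named
# logger and emit messages; never attach handlers here.
log = logging.getLogger(__name__)

def do_something():
    log.debug("Doing something!")

# In the application entry point (e.g. a runner script): configure
# handlers, format, and level once for the whole process.
if __name__ == "__main__":
    logging.basicConfig(
        level=logging.DEBUG,
        format="%(asctime)s %(name)s %(levelname)s: %(message)s",
    )
    do_something()
```

With this layout, the per-submodule loading discussed above reduces to one `getLogger(__name__)` line per module, and the file-based configuration can live entirely on the application side (e.g. via `logging.config.dictConfig`).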
It shall be possible to add new prior creators without changing code within the prior engine package.
There are still many tests missing.

Module | statements | missing | excluded | coverage
---|---|---|---|---
Total | 754 | 596 | 0 | 21%
multiply_prior_engine/__init__.py | 2 | 1 | 0 | 50%
multiply_prior_engine/prior.py | 50 | 17 | 0 | 66%
multiply_prior_engine/prior_engine.py | 108 | 66 | 0 | 39%
multiply_prior_engine/soilmoisture_prior.py | 162 | 119 | 0 | 27%
multiply_prior_engine/vegetation_prior.py | 424 | 389 | 0 | 8%
prior_engine_runner.py | 8 | 4 | 0 | 50%
Sphinx auto-doc of modules is not working on ReadTheDocs:

```
WARNING: autodoc: failed to import module 'vegetation_prior_creator' from module 'multiply_prior_engine'; the following exception was raised:
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/multiply-core/conda/doc/lib/python3.6/site-packages/sphinx/ext/autodoc/importer.py", line 154, in import_module
    __import__(modname)
  File "/home/docs/checkouts/readthedocs.org/user_builds/multiply-core/checkouts/doc/docs/prior-engine/multiply_prior_engine/__init__.py", line 27, in <module>
    PriorLogger(level='debug', handlers=['console'])
  File "/home/docs/checkouts/readthedocs.org/user_builds/multiply-core/checkouts/doc/docs/prior-engine/multiply_prior_engine/prior_logger.py", line 50, in __init__
    config_dict['handlers']['file']['filename'] = filename
TypeError: 'Mock' object is not subscriptable
```

This fails for all prior-engine modules, not only for vegetation_prior_creator.
I understand that not all Python modules are installed on ReadTheDocs but many are 'mocked'/mimicked; hence `config_dict`, being a mocked object returned from `config_dict = yaml.load(..)`, is not subscriptable.
Does this problem only occur because the PriorLogger is instantiated in the `__init__.py` file and hence is loaded (/executed?) even when gathering the auto-doc?
@TonioF, do you have experience with this?
Prior code shall be presented in the form of plug-ins: this way, we avoid having to change the prior engine code every time we introduce a new prior, and it facilitates the addition of new priors.
How should the prior data be passed on to the inference-engine?
A secondary question to this is whether the MULTIPLY platform will have a central way/module of resampling input/intermediate (prior)/output data (see issue in multiply-core).
Passing on prior data as NetCDF files would be beneficial, as the resampling would be very easy. But read/write operations should be limited.
@JorisTimmermans mentioned that there is an in-memory NetCDF-file-container format which could be used.
What are your thoughts on this, @jgomezdans and @TonioF?
I think this should be done in the prior_engine itself.
Also, I'm thinking of an OrderedDict for the prior config, to keep the order of entered priors. Is that suitable? Does this work with YAML-imported config files?
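One note on the OrderedDict question: since Python 3.7, plain dicts preserve insertion order by language specification, and PyYAML loads mappings into plain dicts, so the priors keep their file order without an explicit OrderedDict. A minimal check of the ordering guarantee:

```python
# Plain dicts keep insertion order on Python 3.7+, so a mapping of
# priors built in file order stays in that order when iterated.
priors = {}
priors["sm_clim"] = {"type": "climatology"}
priors["sm_recent"] = {"type": "recent"}
priors["veg_pft"] = {"type": "pft"}

assert list(priors) == ["sm_clim", "sm_recent", "veg_pft"]
```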
Check and add the changes from #39 to the soil moisture prior creator as well.
@JorisTimmermans
For consistency, please change the documentation name for the repository to MULTIPLY - Prior Engine.
Methods for importing user prior data have to be written.
So far, a standardized form of a CSV file can be read (this limitation should be noted in the README).
@JorisTimmermans we should agree upon a config file structure. The discussion is now initialized 😃
I don't know about your experience with this, so I'll just start rambling.
Maybe you saw my sample config already. To explain my layout ideas:
The hierarchy is: `Prior` section → `priors` subsection → `sm_clim` sub-subsection:

```yaml
Prior:
  priors:
    sm_clim:
      type: climatology
      climatology_file: ../aux_data/CCI_SM_climatology_eur_merged_inv.nc
    # sm_recent:
    #   type: recent
    # sm_static:
    #   type: static
    # sm_dynamic:
    #   type: dynamic
    #   model:
    #     - API
    #     - other
    #? veg_pft:
    #?   type: pft
    #?   database: /aux_data/some_DB
    #? veg_spec:
    #?   type: species
    #?   database: /user_data/some_DB
```
The original list structure from the very-alpha version of the config file is replaced by a more dictionary-centered layout. In my mind, this makes it easier to access the right values than having explicit references to list indices, which should be avoided, as @TonioF already correctly pointed out (#3).
@TonioF (also José?) could guide me (us?) here with 'good practice' advice and a feasibility assessment.
@McWhity also brought up a nice way to access the dictionary structure:

```python
class AttributeDict(object):
    """
    A class to convert a nested dictionary into an object with key-values
    accessible using attribute notation (AttributeDict.attribute) instead of
    key notation (Dict["key"]). This class recursively sets dicts to objects,
    allowing you to recurse down nested dicts (like AttributeDict.attr.attr).
    """
    def __init__(self, **entries):
        self.add_entries(**entries)

    def add_entries(self, **entries):
        for key, value in entries.items():
            if type(value) is dict:
                self.__dict__[key] = AttributeDict(**value)
            else:
                self.__dict__[key] = value

    def __getitem__(self, key):
        """Provide dict-style access to attributes."""
        return getattr(self, key)
```
Then load:

```python
def _get_config(self):
    """
    Load configuration from self.config.bb.pre_process()
    writes to self.config.
    """
    with open(self.config, 'r') as cfg:
        self.config = yaml.load(cfg)
    self.config = AttributeDict(**self.config)
```

and access, in our case, e.g. `self.config.Prior.priors.sm_clim.type` (also shorten and reassign the sub-subsection to a single prior-specific config, e.g. `SoilmoisturePrior.config`).
Should we use this approach as well? (It could also be implemented module-independently.)
What are your thoughts on this generally? Still not that important? Split the config files?
PS: It would be nice if you could just wink at my lack of experience with config file input to Python scripts, if it's too obvious, and state the right way.
At the moment the ROI is mentioned in the sample config as `POLYGON ( (0, 0), (0,1 )) #blabla`.
Is this a 'convention' I don't know, and if so, how is it meant to be loaded? With the shapely module? GDAL? So far this is not clear to me (for the moment I am parsing the string).
--> How will the coordinates finally be given? Has there been a decision yet?
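For context: the sample string is close to, but not exactly, WKT (well-known text), where coordinates are space-separated, e.g. `POLYGON ((0 0, 0 1, 1 1, 0 0))`; standard WKT can be parsed with `shapely.wkt.loads` or GDAL/OGR. As a stdlib-only stopgap, a tolerant number extractor could look like this (an illustrative sketch, not a WKT parser):

```python
import re

def parse_polygon(roi_string):
    """Extract (x, y) float pairs from a POLYGON-like ROI string.

    Tolerant sketch: pulls out all numbers and pairs them up, so it
    accepts both standard WKT ('POLYGON ((0 0, 0 1, ...))') and the
    comma-separated form from the sample config.
    """
    numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", roi_string)]
    if len(numbers) % 2:
        raise ValueError("Odd number of coordinates in ROI string")
    return list(zip(numbers[0::2], numbers[1::2]))

parse_polygon("POLYGON ( (0, 0), (0,1 ))")  # → [(0.0, 0.0), (0.0, 1.0)]
```

If the platform settles on proper WKT, swapping this for shapely would also give validity checks and geometric operations for free.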
Get default variables from multiply_core/variables/variables.py, e.g. in the soil moisture prior creator:

```python
try:
    default_variables = multiply_core.variables.get_default_variables()
except Exception:
    # fall back if multiply_core is unavailable
    default_variables = ['sm', 'lai', 'cab', ...]
```

or so... just to avoid duplication in the MULTIPLY code.
@JorisTimmermans and I discussed the interface of the prior-engine and came to the conclusion that the module may/should get a list of variables as input from the inference-engine. These represent the 'parameters/variables' asked for by the user, so that prior data can be provided accordingly.
The selection of which priors are to be calculated/provided will then be switched from the config file to the actual needs of the inference-engine.
What are your thoughts on this, @TonioF and @jgomezdans?
The information provided should consist of:
These could be provided as a dictionary or as single variables; it depends on @jgomezdans.
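The interface change discussed here (the inference engine passing a list of variables, rather than the config file driving the selection) could be sketched as follows; the function and key names are hypothetical, not the actual prior-engine API:

```python
# Hypothetical interface sketch: the inference engine passes the
# variables it needs, and the prior engine returns priors only for
# those, instead of computing everything listed in the config file.
def get_priors(variables, config):
    """Return a mapping of variable name -> prior configuration."""
    available = config.get("priors", {})
    priors = {}
    for var in variables:
        if var not in available:
            raise KeyError("No prior configured for '%s'" % var)
        priors[var] = available[var]
    return priors
```

The config file would then only describe *how* each prior is built, while *which* priors are built per run is decided by the caller.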
So far, this is implemented in write_config with shutil.copyfile. But in my case it is not copying the file, nor is an exception raised when '.bak' is added to the file name. Any ideas, @TonioF? (Working directory: /home/thomas/Code/prior-engine/multiply_prior_engine)
Other options:
- subprocess / os.system: should be avoided to retain OS independence.
- `with open(source, 'r') as src, open(dest, 'w') as dst: dst.write(src.read())`
Here's the code:

```python
def write_config(self, configuration, **kwargs):
    """Write configuration to a YAML file.

    :param configuration: configuration dictionary to write to file.

    :Keyword Arguments:
        * *path_to_config* (``str``) --
          path to config file. If None, a tempfile will be created.
        * *new_config_filename* (``str``) --
          Filename of new user config. Only has effect if path_to_config
          is specified. If None, a temporary filename will be used.

    :returns: config file name
    :rtype: string
    """
    path_to_config = kwargs.get('path_to_config', None)
    new_config_filename = kwargs.get('new_config_filename', None)
    if new_config_filename is not None and path_to_config is None:
        warnings.warn('Entered config file name ({}) will be omitted '
                      '--> no path specified!'
                      .format(new_config_filename), Warning)
    self.check_path_to_config_or_create(path_to_config)
    assert os.path.isfile(self.configfile)
    if self.configfile == default_config:
        if os.path.exists(default_config):
            src = os.path.abspath(default_config)
            dest = src + '.bak'
            logger.info('Creating {}.'.format(dest))
            print('Creating {}.'.format(dest))
            # TODO not writing file
            shutil.copyfile(src, dest)
    logger.info('User config file: {}'.format(self.configfile))
    with open(self.configfile, 'w') as cfg:
        cfg.write(yaml.dump(configuration, default_flow_style=False))
    return self.configfile
```
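To help debug the backup step, the pure-Python read/write copy mentioned as one of the options could be wrapped with an explicit existence check, so a silent failure cannot slip through. A minimal sketch (the function name is illustrative):

```python
import os

def backup_file(src, dest=None):
    """Copy src to dest (default: src + '.bak') and verify the result.

    Plain binary read/write copy, the OS-independent fallback to
    shutil.copyfile; raises if the backup did not land on disk.
    """
    if dest is None:
        dest = src + '.bak'
    with open(src, 'rb') as fsrc, open(dest, 'wb') as fdst:
        fdst.write(fsrc.read())
    if not os.path.isfile(dest):
        raise IOError('Backup {} was not written'.format(dest))
    return dest
```

If this succeeds where `shutil.copyfile` appeared not to, the original problem is likely elsewhere (e.g. `self.configfile` not matching `default_config`, so the branch is never entered, or a relative-path/working-directory mismatch).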
@JorisTimmermans @RaT0M
please add a file called "authors.rst" with your names (e.g. * Thomas Weiß <"[email protected]">) in the main directory of this repository. I will then link this information to our main MULTIPLY documentation page.
There shall be a first initial version of the prior engine. It shall implement the interfaces and return usable - not necessarily scientifically sound - results.
Interpolate the SM climatology (at time of retrieval?) for a smoother transition from month to month.
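Temporal interpolation of a monthly climatology could be as simple as a linear blend between neighboring months, wrapping around the year. This is a sketch under the assumption that each monthly value represents mid-month; the real prior engine may weight differently:

```python
def interpolate_climatology(monthly, day_of_year):
    """Linearly interpolate 12 monthly climatology values to a day.

    Assumes each monthly value is representative of mid-month and
    wraps around the year, so the December-to-January transition is
    as smooth as any other month boundary.
    """
    month_len = 365.0 / 12.0
    pos = (day_of_year - month_len / 2.0) / month_len
    i = int(pos // 1) % 12          # index of the earlier month
    frac = pos % 1                  # fraction of the way to the next
    j = (i + 1) % 12
    return monthly[i] * (1.0 - frac) + monthly[j] * frac
```

For mid-month days this returns the monthly value itself; in between, a weighted mix of the two neighbors, which is exactly the "smoother transition from month to month" asked for above.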
Here is my idea how the Data Access Component could be integrated:
The Prior Engine will have its own Data Access Component which is decoupled from the rest of the system. It deals only with (a) the prior .vrt-files, (b) the .tiff- or other files which form the input to the .vrt-files, and (c) any required aux-data. For (a) and (b), a data store will be used which will also be available for the orchestrator. The Data Access Component will be configured to find the aux data that is required by the prior engine (or its prior creators, to be exact). This data may not be available locally from the beginning, so it might need to be downloaded. All this data will be stored in the .multiply-folder in the user's home directory.
The workflow is this:
The Prior Engine will be asked for a prior file for a variable that covers some spatial and temporal extent. The Data Access Component will check whether a prior (.vrt-)file exists that meets these requirements. If so, it will be returned. If not (or if the user wishes to use his / her dedicated auxdata files), the prior engine will be triggered to compute such a prior file. After this has been done, the file (and, if necessary, files it references) will be permanently stored by the Data Access Component for future use.
This could be done by writing a wrapper script around the prior engine or even by integrating it into the prior engine module itself. An open question is how to design this so that the prior creators would get the aux data they need from the Data Access Component.
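The check-cache-or-compute workflow described above could be sketched as a thin wrapper; all names here are hypothetical, since the actual Data Access Component interface is still open:

```python
import os

# Hypothetical sketch of the caching workflow: return an existing
# prior (.vrt) file if the data store has one for the requested
# variable and time range, otherwise trigger computation and store
# the result permanently for future use.
def get_prior_file(variable, start, end, cache_dir, compute_prior):
    fname = "{}_{}_{}.vrt".format(variable, start, end)
    path = os.path.join(cache_dir, fname)
    if os.path.isfile(path):                         # already stored
        return path
    computed = compute_prior(variable, start, end)   # prior engine call
    os.makedirs(cache_dir, exist_ok=True)
    os.replace(computed, path)                       # store permanently
    return path
```

The open question from the text maps onto `compute_prior` here: the prior creators inside that call would themselves need a way to request aux data from the Data Access Component.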
Status of the CLI for user-defined priors:
The basic command (entry point in setup.py) is multiply_prior_engine.user_prior.main (it can be run from prior_engine_runner.py in the parent folder in case of a ModuleNotFoundError).
There are 4 sub-commands (like in e.g. git --> git branch, git merge, ...):
The commands are documented already.
Please check out the CLI; find help, like in every CLI, with the -h flag.
What do you think?
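A git-style CLI with sub-commands like the one described is typically built with `argparse` sub-parsers. Since the four actual sub-command names are not listed here, the names in this sketch are placeholders, not the real user_prior commands:

```python
import argparse

# Sketch of a git-style CLI using argparse sub-commands; the
# sub-command names ("add", "show", "remove") are placeholders.
def build_parser():
    parser = argparse.ArgumentParser(prog="user_prior")
    sub = parser.add_subparsers(dest="command", required=True)

    add = sub.add_parser("add", help="add a user prior")
    add.add_argument("variable")
    add.add_argument("-f", "--file", help="path to prior data file")

    sub.add_parser("show", help="show configured priors")

    remove = sub.add_parser("remove", help="remove a user prior")
    remove.add_argument("variable")
    return parser
```

With this structure, `-h` works both at the top level and per sub-command (`user_prior add -h`), matching the help behavior mentioned above.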
Config file parameter loading with the AttributeDict method may be implemented for more readable code.
Read the config via:

```python
def _get_config(self):
    """
    Load configuration from self.config.bb.pre_process()
    writes to self.config.
    """
    with open(self.config, 'r') as cfg:
        self.config = yaml.load(cfg)
    self.config = AttributeDict(**self.config)
```

and access, in our case, e.g. `self.config.Prior.priors.sm_clim.type`.
The README file should cover the CLI.