prior-engine's People

Contributors

bulli92, dependabot[bot], joristimmermans, toniof, tramsauer


prior-engine's Issues

Document extensibility of prior engine through framework

The abstraction layer makes the prior engine extensible: it allows users to write their own code to be evaluated by the prior engine, as Tonio stated:

From #19 (closed as a duplicate and split up into multiple sub-issues), Tonio wrote:

The framework, however, would go beyond that. It would allow users to not only add their own data, but to write their own code which can be interpreted by the prior engine.

This functionality must be documented.

Documentation

@TonioF minor idea..or enhancement:
could you add the webhooks for ReadTheDocs to the prior-engine repo? I have auto-generated the documentation, but until the hooks are set up in the repo, it does not update correctly..

Modify logging in prior engine

We have included prior-engine-specific logging; however, this may cause trouble in the auto-documentation process, see #40.

I read that

Modules, that is, libraries intended to be used by other programs, should only emit log messages as a best practice. They should never configure how log messages are handled. That is the responsibility of the application that imports and uses the modules.

Is a MULTIPLY application-wide logger (possibly based in multiply-core) envisaged? And if so, should we just instantiate loggers via

import logging

log = logging.getLogger(__name__)

def do_something():
    log.debug("Doing something!")
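Following the quoted best practice, a minimal sketch of the split (assuming no project-wide logger exists yet): the module only emits, while the application configures handlers and verbosity once at startup:

```python
import logging

# Application side: configure handlers and level once, at startup.
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s %(name)s %(levelname)s: %(message)s')

# Module side (as in the snippet above): only emit, never configure.
log = logging.getLogger('multiply_prior_engine.example')
log.debug('Doing something!')
```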

Logging / printing

As referenced in #21:

I set up logging in the prior engine to avoid print statements and to let us (later the user?) decide how much information (by level, e.g. debug, info, error, ..) should be printed on screen and written to a log file.

Maybe @TonioF can comment on this. What do you think of the approach of a file-based logger configuration and per-submodule loading of the prior_logger as a class instance? (I have not really tested it yet.)

Tests missing

There are still many tests missing..

Module                                       statements  missing  excluded  coverage
multiply_prior_engine/__init__.py                     2        1         0       50%
multiply_prior_engine/prior.py                       50       17         0       66%
multiply_prior_engine/prior_engine.py               108       66         0       39%
multiply_prior_engine/soilmoisture_prior.py         162      119         0       27%
multiply_prior_engine/vegetation_prior.py           424      389         0        8%
prior_engine_runner.py                                8        4         0       50%
Total                                               754      596         0       21%

Fix auto documentation

Sphinx auto-doc of modules is not working on ReadTheDocs:

WARNING: autodoc: failed to import module 'vegetation_prior_creator' from module 'multiply_prior_engine'; the following exception was raised:
Traceback (most recent call last):
  File "/home/docs/checkouts/readthedocs.org/user_builds/multiply-core/conda/doc/lib/python3.6/site-packages/sphinx/ext/autodoc/importer.py", line 154, in import_module
    __import__(modname)
  File "/home/docs/checkouts/readthedocs.org/user_builds/multiply-core/checkouts/doc/docs/prior-engine/multiply_prior_engine/__init__.py", line 27, in <module>
    PriorLogger(level='debug', handlers=['console'])
  File "/home/docs/checkouts/readthedocs.org/user_builds/multiply-core/checkouts/doc/docs/prior-engine/multiply_prior_engine/prior_logger.py", line 50, in __init__
    config_dict['handlers']['file']['filename'] = filename
TypeError: 'Mock' object is not subscriptable

This fails for all prior-engine modules, not only for vegetation_prior_creator.

I understand that not all Python modules are installed on ReadTheDocs; many are 'mocked'/mimicked. Hence config_dict, being a mocked object returned by config_dict = yaml.load(..), is not subscriptable.
Does this problem only occur because the PriorLogger is placed in the __init__.py file and is therefore executed even when gathering the auto-doc?
@TonioF do you have experience with this?
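One common workaround is to guard import-time side effects so they never run during a docs build. A sketch (the guard and its placement are assumptions; ReadTheDocs does set the READTHEDOCS environment variable during builds):

```python
import os

# Sketch of a guard for __init__.py: READTHEDOCS is set to 'True' during
# ReadTheDocs builds, so import-time side effects (like instantiating
# PriorLogger) can be skipped there and the mocked yaml is never executed.
ON_RTD = os.environ.get('READTHEDOCS') == 'True'

if not ON_RTD:
    # real runtime environment: safe to configure logging here, e.g.
    # PriorLogger(level='debug', handlers=['console'])
    pass
```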

Set up prior engine framework

Prior code shall be presented in the form of plug-ins: in this way, we avoid having to change the prior engine code every time we introduce a new prior, and adding new priors becomes easier.
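A minimal sketch of what such a plug-in mechanism could look like (all names hypothetical — a registry where prior creators register under the `type` name used in the config, so the engine never has to be edited):

```python
class PriorCreator:
    """Hypothetical plug-in base: new priors register under a name instead
    of requiring changes to the engine code."""
    registry = {}

    @classmethod
    def register(cls, name):
        def decorator(subclass):
            cls.registry[name] = subclass
            return subclass
        return decorator


@PriorCreator.register('climatology')
class ClimatologyPrior(PriorCreator):
    def compute(self, variable):
        return 'climatology prior for {}'.format(variable)


# the engine looks creators up by the `type` given in the config
creator = PriorCreator.registry['climatology']()
result = creator.compute('sm')  # -> 'climatology prior for sm'
```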

Interface prior-engine OUTPUT

How should the prior data be passed on to the inference-engine?

A secondary question is whether the MULTIPLY platform will have a central way/module of resampling input/intermediate (prior)/output data (see the issue in multiply-core).

Passing on prior data as NetCDF files would be beneficial, as resampling would then be very easy. But read/write operations should be limited.
@JorisTimmermans mentioned that there is an in-memory netcdf-file-container format which could be used.

What are your thoughts on this @jgomezdans and @TonioF?

Merging of multiple available priors for one variable

I think this should be done in the prior_engine itself.
Also, I'm thinking of an OrderedDict for the prior config, to keep the order of entered priors.. Is that suitable? Does this work with YAML-imported config files?
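On the OrderedDict question: PyYAML's default loader builds plain dicts, so order either has to come from a custom constructor (registered via yaml.add_constructor, building an OrderedDict from loader.construct_pairs(node)) or, from Python 3.7 on, from plain dicts preserving insertion order anyway. A dependency-free sketch of the OrderedDict side (the pairs stand in for what such a constructor would receive, in document order):

```python
from collections import OrderedDict

# Pairs as a custom yaml constructor would receive them via
# loader.construct_pairs(node), in document order:
pairs = [('sm_clim', {'type': 'climatology'}),
         ('sm_recent', {'type': 'recent'})]
priors = OrderedDict(pairs)

list(priors)  # -> ['sm_clim', 'sm_recent']
```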

Import User Prior

Methods for importing user prior data have to be written.

So far, a standardized form of CSV file can be read (this limitation should be noted in the README..).

config file structure - explanation of ideas and discussion

@JorisTimmermans we should agree upon a config file structure. Discussion is hereby initialized 😃
I don't know about your experience with this, so I'll just start rambling.

Maybe you have already seen my sample config.. To explain my layout ideas:

  • Prior section:
    the section in the global config file reserved for prior settings
  • priors subsection:
    all uncommented entries are used/provided to the inference engine; this can be toggled on/off by the user.
  • e.g. sm_clim sub-subsection:
    holds the information relevant to the specific module/method in the prior engine,
    here: type, file path, ..
Prior:
  priors:
    sm_clim:
      type: climatology
      climatology_file: ../aux_data/CCI_SM_climatology_eur_merged_inv.nc
    # sm_recent:
    #       type: recent
    # sm_static:
    #       type: static
    # sm_dynamic:
    #     type: dynamic
    #     model:
    #         - API
    #         - other

    #? veg_pft:
    #?   type: pft
    #?   database: /aux_data/some_DB
    #? veg_spec:
    #?   type: species
    #?   database: /user_data/some_DB

The original list structure from the very-alpha version of the config file is replaced by a more dictionary-centered layout. To my mind, this makes it easier to access the right values than explicit references to list indices, which should be avoided, as @TonioF correctly pointed out in #3.

@TonioF (also José?) could guide me (us?) here with 'good practice' advice and a feasibility assessment..

@McWhity also brought up a nice way to access the dictionary structure:

class AttributeDict(object):
    """
    A class to convert a nested Dictionary into an object with key-values
    accessibly using attribute notation (AttributeDict.attribute) instead of
    key notation (Dict["key"]). This class recursively sets Dicts to objects,
    allowing you to recurse down nested dicts (like: AttributeDict.attr.attr)
    """
    def __init__(self, **entries):
        self.add_entries(**entries)

    def add_entries(self, **entries):
        for key, value in entries.items():
            if type(value) is dict:
                self.__dict__[key] = AttributeDict(**value)
            else:
                self.__dict__[key] = value

    def __getitem__(self, key):
        """
        Provides dict-style access to attributes
        """
        return getattr(self, key)

Then load:

    def _get_config(self):
        """
        Load the configuration from the file path stored in self.config
        and overwrite self.config with the parsed content.
        """
        with open(self.config, 'r') as cfg:
            self.config = yaml.safe_load(cfg)
            self.config = AttributeDict(**self.config)

and access in our case e.g.:

    self.config.Prior.priors.sm_clim.type

(We could also shorten this by reassigning each sub-subsection to a single prior-specific config, e.g. SoilmoisturePrior.config.)

Should we use this approach as well? (Could be implemented module independent as well..)

What are your thoughts on this in general? Still not that important? Should we split the config files?

PS: It would be nice if you could just wink at my lack of experience with config file input to Python scripts, if it's too obvious, and state the right way.

ROI description in config

At the moment the ROI is mentioned in the sample config as POLYGON ( (0, 0), (0,1 )) #blabla.
Is this a 'convention' I don't know, and if so, how is it meant to be loaded? With the shapely module? GDAL? So far this is not clear to me (for the moment I am parsing the string..).

--> How will the coordinates finally be given? Has there been a decision yet?
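For the moment, the string parsing can be sketched without any GIS dependency (a hypothetical helper; the format resembles WKT, for which shapely.wkt.loads would apply, but the sample's comma-separated pairs are not valid WKT, so a tolerant parser is used here):

```python
import re

def parse_polygon(s):
    """Extract coordinate pairs from a POLYGON-style string.
    Hypothetical helper: accepts both WKT-like 'POLYGON ((0 0, 0 1))'
    and the sample config's 'POLYGON ( (0, 0), (0,1 ))' notation."""
    nums = re.findall(r'-?\d+(?:\.\d+)?', s)
    return [(float(x), float(y)) for x, y in zip(nums[::2], nums[1::2])]

coords = parse_polygon("POLYGON ( (0, 0), (0,1 ))")
# -> [(0.0, 0.0), (0.0, 1.0)]
```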

Interface prior-engine: INPUT

@JorisTimmermans and I discussed the interface of the prior engine and came to the conclusion that the module may/should get a list of variables as input from the inference engine. These represent the parameters/variables asked for by the user, for which prior data is to be provided accordingly.
The selection of which priors are to be calculated/provided will then be driven by the actual needs of the inference engine rather than by the config file.

What are your thoughts on this, @TonioF and @jgomezdans ?

The information provided should consist of:

  • variables/parameters like LAI, soil moisture
  • timestep (day of interest, ..)

These could be provided as a dictionary or as single variables; this depends on @jgomezdans.
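A sketch of the dictionary variant (function and field names are hypothetical; only the inputs listed above — variables plus a timestep — are assumed):

```python
import datetime

def get_priors(variables, date):
    """Hypothetical interface sketch: the inference engine passes the
    variables the user asked for plus the timestep of interest, and
    receives one prior record per variable back."""
    return {var: {'variable': var, 'date': date} for var in variables}

request = get_priors(['lai', 'sm'], datetime.date(2018, 3, 1))
sorted(request)  # -> ['lai', 'sm']
```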

Creating backup of config file

So far, this is implemented in write_config with shutil.copyfile. But in my case the file is not copied, nor is an exception raised when '.bak' is appended to the file name.. Any ideas, @TonioF?

  • using '/tmp/test.test' as destination did work (instead of my home path: /home/thomas/Code/prior-engine/multiply_prior_engine)
  • using a copy of my home path in tmp did not work (also with '-' replaced by '_')

Other options:

  • subprocess / os.system: should be avoided to retain OS independence..
  • statement like:
        with open(source, 'r') as src, open(dest, 'w') as dst: dst.write(src.read()) 
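A portable variant that at least surfaces a failure instead of silently doing nothing could look like this (a sketch with a hypothetical helper name, verified against a temp file):

```python
import os
import shutil
import tempfile

def backup_file(path):
    """Copy `path` to `path + '.bak'` and verify the copy exists.
    Hypothetical helper; uses shutil for OS independence as discussed
    above, but fails loudly if no backup appears."""
    dest = path + '.bak'
    shutil.copyfile(path, dest)
    if not os.path.isfile(dest):
        raise IOError('Backup was not created: {}'.format(dest))
    return dest

# usage: write a temp config and back it up
with tempfile.NamedTemporaryFile('w', suffix='.yml', delete=False) as f:
    f.write('Prior: {}')
    src = f.name
bak = backup_file(src)
```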

Here's the code:

   def write_config(self, configuration, **kwargs):
       """Write configuration to a YAML file.

       :param configuration: configuration dictionary to write to file.
       :Keyword Arguments:
           * *path_to_config* (``str``) --
             path to config file. if None, a tempfile will be created.
           * *new_config_filename* (``str``) --
             Filename of new user config. Only has effect if path_to_config
             is specified.If None, a temporary filename will be used.

       :returns: config file name
       :rtype: string

       """
       path_to_config = kwargs.get('path_to_config', None)
       new_config_filename = kwargs.get('new_config_filename', None)

       if new_config_filename is not None and path_to_config is None:
           warnings.warn('Entered config file name ({}) will be omitted '
                         '--> no path specified!'
                         .format(new_config_filename), Warning)

       self.check_path_to_config_or_create(path_to_config)

       assert os.path.isfile(self.configfile)
       if self.configfile == default_config and os.path.exists(default_config):
           # combined condition: previously `src` could stay unbound when
           # default_config did not exist, making `dest = src + '.bak'` fail
           src = os.path.abspath(default_config)
           dest = src + '.bak'
           logger.info('Creating {}.'.format(dest))
           # TODO not writing file
           shutil.copyfile(src, dest)
       logger.info('User config file: {}'.format(self.configfile))

       with open(self.configfile, 'w') as cfg:
           cfg.write(yaml.dump(configuration, default_flow_style=False))
       return self.configfile

Set up initial version of prior engine

There shall be a first initial version of the prior engine. It shall implement the interfaces and return usable - not necessarily scientifically sound - results.

Make use of Data Access Component

Here is my idea how the Data Access Component could be integrated:
The Prior Engine will have its own Data Access Component which is decoupled from the rest of the system. It deals only with (a) the prior .vrt files, (b) the .tiff or other files which form the input to the .vrt files, and (c) any required aux data. For (a) and (b), a data store will be used which will also be available to the orchestrator. The Data Access Component will be configured to find the aux data that is required by the prior engine (or its prior creators, to be exact). This data may not be available locally from the beginning, so it might need to be downloaded. All this data will be stored in the .multiply folder in the user's home directory.
The workflow is this:
The Prior Engine will be asked for a prior file for a variable that covers some spatial and temporal extent. The Data Access Component will check whether a prior (.vrt-)file exists that meets these requirements. If so, it will be returned. If not (or if the user wishes to use his / her dedicated auxdata files), the prior engine will be triggered to compute such a prior file. After this has been done, the file (and, if necessary, files it references) will be permanently stored by the Data Access Component for future use.
This could be done by writing a wrapper script around the prior engine or even by integrating it into the prior engine module itself. An open question is how to design this so that the prior creators would get the aux data they need from the Data Access Component.

  • Clarify where to retrieve prior-creator-specific aux data from the Data Access Component
  • Extend Data Access Component by download functionality (this one is actually an issue for Data Access)
  • Put auxdata on remote server somewhere
  • Write wrapper script around prior engine
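The workflow described above can be sketched as follows (all class, method, and file names are hypothetical stand-ins, not the real Data Access Component API):

```python
class DataAccessComponent:
    """Minimal stub of the Data Access Component (hypothetical API)."""
    def __init__(self):
        self._files = {}

    def find_prior(self, variable, roi, time_range):
        return self._files.get((variable, roi, time_range))

    def store_prior(self, variable, roi, time_range, path):
        self._files[(variable, roi, time_range)] = path


def get_prior_file(dac, compute_prior, variable, roi, time_range):
    """Workflow sketch: return an existing prior .vrt if the DAC has one;
    otherwise trigger the prior engine and store the result permanently."""
    path = dac.find_prior(variable, roi, time_range)
    if path is None:
        path = compute_prior(variable, roi, time_range)
        dac.store_prior(variable, roi, time_range, path)
    return path


dac = DataAccessComponent()
calls = []
def compute_prior(variable, roi, time_range):
    calls.append(variable)
    return '{}_{}.vrt'.format(variable, time_range)

first = get_prior_file(dac, compute_prior, 'sm', 'roi1', '2017-06')
second = get_prior_file(dac, compute_prior, 'sm', 'roi1', '2017-06')
# the second request is served from the store; compute_prior runs only once
```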

CLI user_prior.py

Status of CLI for user defined priors:

The basic command (entry point in setup.py) is multiply_prior_engine.user_prior.main (it can be run from prior_engine_runner.py in the parent folder in case of a ModuleNotFoundError).

There are 4 sub-commands (as in e.g. git --> git branch, git merge, ..):

  • show
  • add
  • remove
  • import

The commands are documented already.
Please check out the CLI; help is available, as in any CLI, via the -h flag.
What do you think?
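For reference, a git-style sub-command layout with the four names above can be sketched with argparse (the real implementation lives in multiply_prior_engine.user_prior; this is only an illustrative skeleton):

```python
import argparse

def build_parser():
    """Sketch of a git-style sub-command parser using the sub-command
    names from this issue (show, add, remove, import)."""
    parser = argparse.ArgumentParser(prog='user_prior')
    sub = parser.add_subparsers(dest='command')
    for name in ('show', 'add', 'remove', 'import'):
        sub.add_parser(name, help='{} user prior data'.format(name))
    return parser

args = build_parser().parse_args(['show'])  # -> Namespace(command='show')
```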

Implement config loading via AttributeDict

Config file parameter loading via the AttributeDict class may be implemented for more readable code.

Read config via:

    def _get_config(self):
        """
        Load the configuration from the file path stored in self.config
        and overwrite self.config with the parsed content.
        """
        with open(self.config, 'r') as cfg:
            self.config = yaml.safe_load(cfg)
            self.config = AttributeDict(**self.config)

and access in our case e.g.:

    self.config.Prior.priors.sm_clim.type
