hydrologie / xhydro

Hydrological analysis library built with xarray.
Home Page: https://xhydro.readthedocs.io
License: Apache License 2.0
English-only documentation is not desirable.
Here is a page that describes how to set up this solution easily:
https://ihenrywu.medium.com/how-to-create-a-readthedocs-online-document-in-multiple-languages-dbdc1e67068d
While netCDF/zarr and the CF conventions are widely used for storing and exchanging n-dimensional arrays in the climate sciences, there is presently no comparable standard or specification for n-d array hydrometric data (WaterML exists, but it consists of XML files and still requires a lot of processing to use with the modern Python stack).
Furthermore, as we report to diverse organizations, each entity already has its own unique methods for organizing and sharing hydrometric data (e.g., miranda for Ouranos).
To foster collaboration, facilitate development, enable rigorous testing with real data, and enhance reproducibility of studies conducted through Xhydro, substantial benefits can be gained by standardizing hydrometric data and ensuring its universal accessibility on the internet through open-source means wherever feasible.
More specifically, this would involve:
While it may appear as a significant undertaking, I have already dedicated several months to implementing a solution, drawing upon the advancements achieved in PAVICS/PAVICS-Hydro. I am excited to present what I have so far and seek valuable feedback from experts in the field.
Here is a simplified overview of the solution currently being developed, which follows a similar approach to accessing large-scale climate data as described in this GitHub issue:
Here is an example for an actual study that we are working on right now. The requirements are:
This can be achieved simply with the following query, leveraging xdatasets's capabilities:
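The query itself was not captured in this page; as a sketch, its structure would look like the following (field names are mirrored from the other xd.Query examples in this thread; the station id and duration threshold are placeholders):

```python
# Sketch of an xdatasets query dict (names assumed from other examples in
# this thread; the station id and duration threshold are placeholders).
query = {
    "datasets": {
        "deh": {                          # DEH hydrometric stations (Québec)
            "id": ["020"],                # station identifier prefix
            "regulated": ["Natural"],     # keep only natural (unregulated) flow
            "variables": ["streamflow"],
        }
    },
    "time": {
        "start": "1970-01-01",
        "minimum_duration": (15 * 365, "d"),  # require roughly 15 years of record
    },
}

# With xdatasets installed, the query would then be executed as:
#   import xdatasets as xd
#   ds = xd.Query(**query).data.squeeze().load()
```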
Below is the list of retrieved data, which can be easily viewed:
The hydrometric data specification presented above is the result of extensive deliberation and collaboration with @TC-FF, drawing from our real-world experience of utilizing this kind of data. Through this process, we have determined that this format enables the representation of a wide range of hydrometric data types (flow rates, water levels, basin-scale or station-specific weather data), at various time intervals, with different temporal aggregations (maximum, minimum, mean, sum, etc.) and spatial aggregations (such as a point (outlet or station) or polygon (basin)), and includes information about the data source. We are seeking feedback on the proposed data specification for representing hydrometric datasets, including suggestions for improved variable naming, adherence to conventions, and potential modifications to the data model itself (for example, adding timezone information, time bounds, etc.). Your input on these aspects would be greatly appreciated.
Also note that we intend to have approximately 20,000 daily-updated gauged basins in xdatasets, with precomputed climate variables at each basin (temperature, precipitation, radiation, dew point, SWE, etc.) from different sources (ERA5, ERA5-Land, Daymet, etc.), by the end of July. To retrieve the additional variables, one will simply need to include them in the query. The majority of basins are located in North America, with additional regions worldwide used for training deep learning algorithms. For this, we build upon the work already accomplished in HYSETS and CARAVAN, but our focus is on making it operational and easily queryable.
There is much more detail to be said regarding the various components of the presented solution. Additionally, xdatasets offers a broader range of capabilities (such as working directly with climate datasets like ERA5) than the simple example presented here, with even more ambitious plans on the roadmap. However, considering the length of this post, I will conclude here to let you absorb all the details. If you have any questions or suggestions, please don't hesitate to reach out.
There are several small details in our installation instructions and in "Contributing" that are wrong or not precise enough, including the use of conda (whereas mamba would be preferable) and when (and for which needs) to run pip install -e . versus pip install xhydro.
environment.yml currently has many dependencies that should either be removed or moved to environment-docs (such as the sphinx ones). environment.yml should be kept as lean as possible, then populated as we add functions.
We also have 5 files where dependencies are listed. I think we can get rid of a few of them.
Original conversation in #11 (comment)
On our computing clusters, installation via conda is impossible (and we are potentially not the only ones).
xHydro is therefore currently impossible to use because of ESMPY. Could ESMPY be distributed as a wheel? Or could it be moved out of the __init__ so that the functions requiring ESMPY live in a single module?
Currently, hydrological modelling is handled by a function qsim = run_hydrological_model() that takes a model_config dictionary as input and returns streamflow. This works fine for a simple model such as GR4J, but very quickly becomes complicated for a more complex model such as Hydrotel or Raven, where a good part of the parameters and features hide in configuration files.
The location and names of the relevant files (weather, outputs) depend on information scattered across a few CSV files.
Whatever we decide, in the case of Hydrotel, model_config will have to contain parameters such as simulation_options or output_options to allow consulting and modifying the CSV files.
Solution 1: Continue with the dictionary approach
The current list of functions is not sufficient; we will absolutely need to code additional functions:
- qsim = run_hydrological_model(model_config, return_outputs=True) to run the model and return streamflow.
- ds_in = get_inputs(model_config) to locate the right input files.
- qsim = get_streamflow(model_config) to locate the right file and return streamflow, after the model has been run.
In short, model_config is always a required input. One issue is that it could be difficult to have a single function, since some models might require additional arguments.
Solution 2: Implement a class with a predefined list of functions
I see no issue with keeping the model_config approach here. Once the model is initialized, we could have a list of sub-functions similar to Solution 1:
- model = HydrologicalModel(model="Hydrotel", model_config=model_config) to initialize the model.
- qsim = model.run(return_outputs=True) to run the model and return streamflow.
- ds_in = model.get_inputs() to locate the right input files.
- qsim = model.get_streamflow() to locate the right file and return streamflow, after the model has been run.
Here, model_config is only used once, since its attributes are added to the class during __init__(). This potentially simplifies the calls to the other functions. It also opens the door to more easily having a list of parameters that differs from one hydrological model to another for functions such as .get_inputs(), but we will probably want to avoid that as much as possible so as not to add too much complexity...
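A minimal sketch of what Solution 2 could look like (class and method names follow the proposal above; the method bodies are placeholders, not the real Hydrotel/Raven logic, and the config keys are illustrative):

```python
class HydrologicalModel:
    """Sketch of Solution 2: model_config is consumed once, in __init__."""

    def __init__(self, model: str, model_config: dict):
        self.model = model
        # model_config entries become attributes, e.g. self.simulation_options
        for key, value in model_config.items():
            setattr(self, key, value)

    def get_inputs(self):
        # Placeholder: a real implementation would locate the meteorological
        # files by reading the paths referenced in the Hydrotel CSV configs.
        return self.simulation_options["input_file"]

    def run(self, return_outputs: bool = True):
        # Placeholder: would execute the model, then read back the streamflow
        # output file named in output_options.
        return self.output_options["streamflow_file"] if return_outputs else None


# Illustrative config; real Hydrotel options would be richer than this.
model_config = {
    "simulation_options": {"input_file": "meteo.nc"},
    "output_options": {"streamflow_file": "debit_aval.nc"},
}
model = HydrologicalModel(model="Hydrotel", model_config=model_config)
```

The main design point is that callers never pass model_config again after initialization; every subsequent method reads from the attributes set in __init__().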
Links following the format :pull:`number` are used throughout a few files such as HISTORY.rst to link to previous PRs, Issues, and Users. However, the proper sphinx hooks need to be implemented for them to work.
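Sphinx's built-in sphinx.ext.extlinks extension provides exactly this kind of shorthand role. A possible conf.py fragment (the URL patterns assume the hydrologie/xhydro repository; the captions are just one choice):

```python
# conf.py (fragment) -- enable shorthand roles for PRs, issues, and users
extensions = [
    "sphinx.ext.extlinks",
]

# Each entry maps a role name to (URL pattern, caption pattern); the %s is
# replaced by the role's argument, e.g. :pull:`11` -> PR/11.
extlinks = {
    "pull": ("https://github.com/hydrologie/xhydro/pull/%s", "PR/%s"),
    "issue": ("https://github.com/hydrologie/xhydro/issues/%s", "GH/%s"),
    "user": ("https://github.com/%s", "@%s"),
}
```

With this in place, writing :pull:`11` in HISTORY.rst renders as a link labelled PR/11 pointing at the pull request.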
With the current setup (xhydro 0.3.0, git clone of master, then pip install -e .), pydantic 2.5.2 gets installed along with xarray 2023.10.1.
However, ravenpy needs pydantic<2.0,>=1.10.8 and xarray<2023.9.9,>=2022.12.0.
When I try to install ravenpy, import ravenpy just returns a bunch of pydantic errors and TypeError: @validator cannot be applied to fields with a schema of str errors.
If we can arrange for some continuity between versions, we will be able to manage these packages better. I don't know how to solve this kind of problem, so I defer to the experts!
We are currently adding the notebooks at the root level of our documentation rather than under a section such as Usage in sphinx. To keep things cleaner and prepare for the addition of other notebooks, could we bring the notebooks under the Usage section?
xscen already has several functionalities that could be reused directly in xhydro, notably for computing indicators (excluding the more advanced frequency indicators) and for the functions needed for hydroclimatic analyses (climatological_mean, compute_deltas, ensemble_stats, generate_weights, produce_horizon).
Rather than copy-pasting the code, I propose importing the relevant xscen functions and exposing them through the __init__, which means we could, for example, directly use xhydro.ensemble_stats without having to worry about the fact that the code lives in another library. The mechanism to carry the documentation over to ReadTheDocs also exists.
I created a branch so you can see what it could look like: https://github.com/hydrologie/xhydro/tree/indicators/xhydro
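The re-export mechanism itself is only a few lines in xhydro/__init__.py. A sketch (guarded with try/except so the snippet also runs where xscen is absent; the branch linked above may organize this differently):

```python
# xhydro/__init__.py (sketch)
try:
    # Re-export selected xscen functions so users can call, e.g.,
    # xhydro.ensemble_stats without knowing the code lives in xscen.
    from xscen import (
        climatological_mean,
        compute_deltas,
        ensemble_stats,
        generate_weights,
        produce_horizon,
    )
except ImportError:  # xscen not installed in this environment
    climatological_mean = compute_deltas = ensemble_stats = None
    generate_weights = produce_horizon = None
```

In the real package, xscen would be a hard dependency and the try/except would be unnecessary; it is included here only so the sketch is self-contained.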
We ask for return periods, but what is displayed in the return-period dimension are frequencies.
I created a diagram of what we are trying to accomplish with xhydro. This diagram should be included somewhere here.
A page in the documentation that is updated before each new release.
I get an error at this line:
ds = xd.Query(
    *{
        "datasets": {
            "deh": {
                "id": ["020"],
                "regulated": ["Natural"],
                "variables": ["streamflow"],
            }
        },
        "time": {
            "start": "1970-01-01",
            "minimum_duration": (15 * 365, "d"),
        },
    }
).data.squeeze().load()
The error:
ValidationError Traceback (most recent call last)
Cell In[2], line 1
----> 1 ds = xd.Query(
2 *{
3 "datasets":{
4 "deh":{
5 "id" :["020"],
6 "regulated":["Natural"],
7 "variables":["streamflow"],
8 }
9 }, "time":{"start": "1970-01-01",
10 "minimum_duration":(15*365, 'd')},
11
12 }
13 ).data.squeeze().load()
15 # This dataset lacks some of the aforementioned attributes, so we need to add them.
16 ds["id"].attrs["cf_role"] = "timeseries_id"
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\xdatasets\core.py:122, in Query.init(self, datasets, space, time, catalog_path)
119 self.space = self._resolve_space_params(**space)
120 self.time = self._resolve_time_params(**time)
--> 122 self.load_query(datasets=self.datasets, space=self.space, time=self.time)
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\xdatasets\core.py:256, in Query.load_query(self, datasets, space, time)
253 except:
254 pass
--> 256 ds_one = self._process_one_dataset(
257 dataset_name=dataset_name,
258 variables=variables_name,
259 space=space,
260 time=time,
261 **kwargs,
262 )
263 dsets.append(ds_one)
265 try:
266 # Try naively merging datasets into single dataset
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\xdatasets\core.py:299, in Query._process_one_dataset(self, dataset_name, variables, space, time, **kwargs)
296 dataset_category = "user-provided"
298 elif isinstance(dataset_name, str):
--> 299 dataset_category = [
300 category
301 for category in self.catalog._entries.keys()
302 for name in self.catalog[category]._entries.keys()
303 if name == dataset_name
304 ][0]
306 if dataset_category in ["atmosphere"]:
307 with warnings.catch_warnings():
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\xdatasets\core.py:302, in (.0)
296 dataset_category = "user-provided"
298 elif isinstance(dataset_name, str):
299 dataset_category = [
300 category
301 for category in self.catalog._entries.keys()
--> 302 for name in self.catalog[category]._entries.keys()
303 if name == dataset_name
304 ][0]
306 if dataset_category in ["atmosphere"]:
307 with warnings.catch_warnings():
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\intake\catalog\base.py:472, in Catalog.getitem(self, key)
463 """Return a catalog entry by name.
464
465 Can also use attribute syntax, like cat.entry_name
, or
(...)
468 cat['name1', 'name2']
469 """
470 if not isinstance(key, list) and key in self:
471 # triggers reload_on_change
--> 472 s = self._get_entry(key)
473 if s.container == "catalog":
474 s.name = key
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\intake\catalog\utils.py:43, in reload_on_change..wrapper(self, *args, **kwargs)
40 @functools.wraps(f)
41 def wrapper(self, *args, **kwargs):
42 self.reload()
---> 43 return f(self, *args, **kwargs)
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\intake\catalog\base.py:355, in Catalog._get_entry(self, name)
353 ups = [up for name, up in self.user_parameters.items() if name not in up_names]
354 entry._user_parameters = ups + (entry._user_parameters or [])
--> 355 return entry()
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\intake\catalog\entry.py:60, in CatalogEntry.call(self, persist, **kwargs)
58 def call(self, persist=None, **kwargs):
59 """Instantiate DataSource with given user arguments"""
---> 60 s = self.get(**kwargs)
61 s._entry = self
62 s._passed_kwargs = list(kwargs)
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\intake\catalog\local.py:313, in LocalCatalogEntry.get(self, **user_parameters)
310 return self._default_source
312 plugin, open_args = self._create_open_args(user_parameters)
--> 313 data_source = plugin(**open_args)
314 data_source.catalog_object = self._catalog
315 data_source.name = self.name
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\intake\catalog\local.py:613, in YAMLFileCatalog.init(self, path, text, autoreload, **kwargs)
611 self.filesystem = kwargs.pop("fs", None)
612 self.access = "name" not in kwargs
--> 613 super(YAMLFileCatalog, self).init(**kwargs)
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\intake\catalog\base.py:128, in Catalog.init(self, entries, name, description, metadata, ttl, getenv, getshell, persist_mode, storage_options, user_parameters)
126 self.updated = time.time()
127 self._entries = entries if entries is not None else self._make_entries_container()
--> 128 self.force_reload()
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\intake\catalog\base.py:186, in Catalog.force_reload(self)
184 """Imperative reload data now"""
185 self.updated = time.time()
--> 186 self._load()
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\intake\catalog\local.py:648, in YAMLFileCatalog._load(self, reload)
646 logger.warning("Use of '!template' deprecated - fixing")
647 text = text.replace("!template ", "")
--> 648 self.parse(text)
File ~\Anaconda3\envs\xhydro-dev\Lib\site-packages\intake\catalog\local.py:728, in YAMLFileCatalog.parse(self, text)
726 result = CatalogParser(data, context=context, getenv=self.getenv, getshell=self.getshell)
727 if result.errors:
--> 728 raise exceptions.ValidationError(
729 "Catalog '{}' has validation errors:\n\n{}"
730 "".format(self.path, "\n".join(result.errors)),
731 result.errors,
732 )
734 cfg = result.data
736 self._entries = {}
ValidationError: Catalog 'C:/Users/maied01/AppData/Local/Temp/catalogs//hydrology.yaml' has validation errors:
("missing 'module'", {'module': 'intake_xarray'})
I created a conda environment following the steps of the procedure and launched a Jupyter Notebook using Anaconda Navigator.
I extract a station (I tested 023422 and 090605) and compute an indicator (I tested min and max), and the values displayed by
ds_4fa.streamflow_min_annual and ds_4fa.streamflow_min_annual.values are different.
ds = xd.Query(
    **{
        "datasets": {
            "deh": {
                "id": ["023422"],
                "variables": ["streamflow"],
                "spatial_agg": ["watershed"],
            }
        },
    }
).data.squeeze().load()
ds["id"].attrs["cf_role"] = "timeseries_id"
ds["streamflow"].attrs = {"long_name": "Streamflow", "units": "m3 s-1", "standard_name": "water_volume_transport_in_river_channel", "cell_methods": "time: mean"}
ds_4fa = xh.indicators.get_yearly_op(ds, op="min", missing="pct", missing_options={"tolerance": 0.15})
ds_4fa.streamflow_min_annual
ds_4fa.streamflow_min_annual.values
We should have some documentation under CONTRIBUTING.rst (or on its own page) that explains how to generate and edit the .po files needed for the French translations.
I can help with the more generalized steps (i.e., project creation and generation of the initial .po files), but it would be great if @TC-FF could briefly document how best to use the poedit tool.
https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html
https://poedit.net/
https://userbase.kde.org/Lokalize (KDE-based application)
Hydrology relies on geospatial operations, encompassing tasks such as watershed delineation and extraction of physiographic variables at the watershed scale. PAVICS-Hydro has implemented various functionalities in ravenpy to execute these operations.
It would be interesting to integrate some of these features into xhydro by leveraging the work done in ravenpy, while also adding some new functionalities.
The solution would include the functionalities found in ravenpy plus the following:
Watershed delineation
Extraction of physiographic (or other) variables
For the tests and the documentation in xhydro, we will need to host some simulated hydrological data somewhere. For now, we anticipate needing at a minimum:
A first option would be to put the data in an existing repo, raven-testdata or xclim-testdata, but the fit does not seem ideal for either one.
A second option would be to create our own xhydro-testdata repo, either here or under Ouranosinc.
Finally, a third option would be to host this data on xdatasets. However, I don't think hosting small test datasets is within the scope of xdatasets? Given their size, I also don't think it would be realistic to host the complete datasets. @sebastienlanglois
Edit: Unless advised otherwise, putting this data directly in xhydro would be ill-advised.
The current documentation is hosted on Github Pages, but ReadTheDocs would be preferable.
Original conversation in #11 (comment)
Trying to install xhydro with pip on Windows with pip install xhydro results in an error when building the wheel for raven-hydro:
Building wheels for collected packages: raven-hydro
Building wheel for raven-hydro (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for raven-hydro (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [38 lines of output]
*** scikit-build-core 0.9.8 using CMake 3.29.6 (wheel)
*** Configuring CMake...
2024-07-02 14:41:20,456 - scikit_build_core - WARNING - Can't find a Python library, got libdir=None, ldlibrary=None, multiarch=None, masd=None
loading initial cache file build\CMakeInit.txt
-- Building for: Visual Studio 17 2022
-- CMAKE_BUILD_TYPE set to ''
CMake Warning (dev) in CMakeLists.txt:
A logical block opening on the line
C:/Users/KAMIL PC/AppData/Local/Temp/pip-install-_jrlkrk1/raven-hydro_3e654a4004f8426d80ea2d67005ff33e/CMakeLists.txt:33 (IF)
closes on the line
C:/Users/KAMIL PC/AppData/Local/Temp/pip-install-_jrlkrk1/raven-hydro_3e654a4004f8426d80ea2d67005ff33e/CMakeLists.txt:36 (ENDIF)
with mis-matching arguments.
This warning is for project developers. Use -Wno-dev to suppress it.
-- Selecting Windows SDK version 10.0.22621.0 to target Windows 10.0.19045.
-- The CXX compiler identification is MSVC 19.39.33522.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2022/BuildTools/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Sources found: C:/Users/KAMIL PC/AppData/Local/Temp/pip-install-_jrlkrk1/raven-hydro_3e654a4004f8426d80ea2d67005ff33e/RavenHydroFramework
-- Modified compile flags with '-Dnetcdf'
CMake Error at C:/Users/KAMIL PC/AppData/Local/Temp/pip-build-env-n7rk5gyy/normal/Lib/site-packages/cmake/data/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find NetCDF (missing: NetCDF_LIBRARY NetCDF_INCLUDE_DIR)
Call Stack (most recent call first):
C:/Users/KAMIL PC/AppData/Local/Temp/pip-build-env-n7rk5gyy/normal/Lib/site-packages/cmake/data/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
helpers/FindNetCDF.cmake:194 (find_package_handle_standard_args)
CMakeLists.txt:73 (find_package)
-- Configuring incomplete, errors occurred!
*** CMake configuration failed
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for raven-hydro
Failed to build raven-hydro
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (raven-hydro)
sebastienlanglois told me the problem is fixable on Linux by running sudo apt-get install gcc libnetcdf-dev gdal proj geos, but that there is currently no solution for Windows.
pip install xhydro
Note that I ran pip install xhydro in cmd with Python version 3.12.3, so the problem is not related to any environment configuration.
Pydantic has just made a new release today (2023-12-22), and since then we are getting errors coming from xscen:
Installing xscen or xhydro in a new environment with pydantic's newest release (v2.5.3).
It would be good to flesh out the README on the repo's home page (https://github.com/hydrologie/xhydro/blob/main/README.rst). It doesn't need to be very complete; it would simply serve to welcome new contributors by describing the different sections of the repo. We could also put a link there to a document that explains the project in detail.