ecmwf / anemoi-models
License: Apache License 2.0
Allow optional training with optimisation towards tendency as opposed to state
Adapt the logic so that the training objective can be tendencies rather than states.
Training towards tendencies has been shown to improve results with a diffusion model targeting the atmosphere; this will be useful for investigating whether the improvements extend to other models and to other domains such as ocean and wave.
Besides implementing the logic itself, there are key things to consider.
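As a minimal sketch of the proposed objective (all names are illustrative, not the anemoi-models API), the change amounts to computing the loss on the increment x_{t+1} - x_t rather than on the next state itself:

```python
import numpy as np

def state_loss(pred_state: np.ndarray, x_tp1: np.ndarray) -> float:
    """Current objective: MSE between predicted and true next state."""
    return float(np.mean((pred_state - x_tp1) ** 2))

def tendency_loss(pred_tendency: np.ndarray, x_t: np.ndarray, x_tp1: np.ndarray) -> float:
    """Proposed objective: MSE on the tendency x_{t+1} - x_t."""
    target_tendency = x_tp1 - x_t
    return float(np.mean((pred_tendency - target_tendency) ** 2))

def reconstruct_state(x_t: np.ndarray, pred_tendency: np.ndarray) -> np.ndarray:
    """At inference the next state is recovered from the predicted tendency."""
    return x_t + pred_tendency
```

Note that at inference the state must be reconstructed from the predicted tendency, so the rollout logic changes alongside the loss.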
ECMWF
Limited Area Modelling (LAM) is a use case that reuses much of the functionality of global weather models, such as the model architectures.
It would be interesting to extend the capabilities of anemoi-models to support LAM.
The main difference with respect to the current use case is that some input nodes are not part of the output state. These nodes are the boundary forcing.
Define an output mask that selects the data nodes which are part of the output state; this mask can be defined in several ways.
This assumes that the LAM data and the boundary forcing are passed together.
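A minimal sketch of how such a mask could be used, assuming a boolean array over the nodes (names are illustrative): the loss is evaluated only on the data nodes, while boundary-forcing nodes are excluded.

```python
import numpy as np

def masked_loss(pred: np.ndarray, target: np.ndarray, output_mask: np.ndarray) -> float:
    """MSE computed only on the nodes that belong to the output state.

    output_mask is a boolean array over the nodes: True for LAM data
    nodes, False for boundary-forcing nodes.
    """
    diff = (pred - target)[output_mask]
    return float(np.mean(diff ** 2))
```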
Currently, the MultiHeadSelfAttention module has a fixed dropout rate of 0.0, which limits the ability to tune this hyperparameter for different use cases. This lack of configurability can hinder model optimisation and performance, especially in scenarios where overfitting may occur due to smaller datasets.
I would like to see the addition of a configurable dropout parameter to the MultiHeadSelfAttention module. This parameter should allow users to specify the dropout rate when initialising the module, enabling better customisation and optimisation of the model.
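A sketch of the proposed interface (this is not the anemoi-models implementation; the class, its signature, and `apply_dropout` are illustrative): `dropout` becomes a constructor argument, defaulting to the currently fixed value of 0.0, and is only active in training mode.

```python
import numpy as np

class MultiHeadSelfAttentionSketch:
    """Illustrative sketch of a configurable-dropout attention module."""

    def __init__(self, num_heads: int, embed_dim: int, dropout: float = 0.0):
        if not 0.0 <= dropout < 1.0:
            raise ValueError(f"dropout must be in [0, 1), got {dropout}")
        self.num_heads = num_heads
        self.embed_dim = embed_dim
        self.dropout = dropout
        self._rng = np.random.default_rng(0)

    def apply_dropout(self, attn_weights: np.ndarray, training: bool) -> np.ndarray:
        """Inverted dropout on the attention weights, active only in training."""
        if not training or self.dropout == 0.0:
            return attn_weights
        keep = self._rng.random(attn_weights.shape) >= self.dropout
        return attn_weights * keep / (1.0 - self.dropout)
```

In the real module the parameter would simply be forwarded to the underlying dropout layer; validating the range at construction time gives an early, clear error for bad configs.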
When using the GNN architecture with a graph object that has no connections within the hidden nodes, we get an error like the following, which does not describe the actual problem:
Error executing job with overrides: []
Error in call to target 'anemoi.models.layers.processor.GNNProcessor':
KeyError('edge_length')
Raise a more meaningful error if the subgraph is not correct.
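A sketch of such a check (the function name and the `edge_attrs` dictionary are illustrative, not the anemoi-models API): validate the hidden-to-hidden subgraph up front and raise an error that names the missing attribute and the likely cause.

```python
def check_hidden_subgraph(edge_attrs: dict, required: tuple = ("edge_length",)) -> None:
    """Validate the hidden-to-hidden subgraph before building the processor.

    Raises an informative ValueError instead of letting a bare
    KeyError('edge_length') surface from deep inside the model.
    """
    missing = [name for name in required if name not in edge_attrs]
    if missing:
        raise ValueError(
            f"The hidden-to-hidden subgraph is missing edge attribute(s) {missing}. "
            "The GNN processor requires connections within the hidden nodes; "
            "check that the graph was built with these edges."
        )
```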
Specific problem: We want the model to treat a variable x in degrees in the range [0, 360] as cos(x) and sin(x) inside the model, and map the output back to the representation in degrees.
Preprocessor:
The model should handle the remapped variables, and the training loss must be calculated on the remapped set of variables. The validation loss should be calculated on the original variable, which will also be output in inference. Therefore, this remapping should be implemented as a preprocessor that allows the remapping of one variable to several new ones.
Data Indices: The preprocessors are the first layers of the model, so the input variables are the same.
After the preprocessors, the set of variables has changed. Therefore, this information needs to be included in the data indices.
Changing the set of variables when creating the dataset is not an option since the model output needs to contain the variables we are interested in for inference.
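The two directions of the remapping can be sketched as follows (function names are illustrative; the point is that the forward map produces two variables from one, and the inverse recovers the original representation for validation and inference output):

```python
import numpy as np

def remap_degrees(x_deg: np.ndarray) -> tuple:
    """Preprocessor direction: one variable in degrees -> (cos, sin)."""
    x_rad = np.deg2rad(x_deg)
    return np.cos(x_rad), np.sin(x_rad)

def inverse_remap(cos_x: np.ndarray, sin_x: np.ndarray) -> np.ndarray:
    """Postprocessor direction: (cos, sin) -> degrees in [0, 360)."""
    return np.rad2deg(np.arctan2(sin_x, cos_x)) % 360.0
```

Using atan2 for the inverse keeps the reconstruction well-defined over the full circle, including the wrap-around near 0/360.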
The output of some specific variables, total precipitation for example, is not bounded. Therefore, the model sometimes outputs negative values, which is not physical.
Add a bounding strategy to the model via activation functions.
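One candidate bounding activation, sketched here for illustration (softplus is one option; the issue does not prescribe a specific function): it maps the unbounded raw output to non-negative values, so negative precipitation cannot occur.

```python
import numpy as np

def softplus(x: np.ndarray) -> np.ndarray:
    """Numerically stable softplus, log(1 + exp(x)), mapping R -> (0, inf)."""
    return np.logaddexp(0.0, x)

def bound_precipitation(raw: np.ndarray) -> np.ndarray:
    """Keep total precipitation non-negative.

    ReLU would also enforce the bound but has zero gradient for
    negative inputs; softplus stays smooth everywhere.
    """
    return softplus(raw)
```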
Maintainability and code quality are important for keeping anemoi alive and well and for avoiding technical debt.
Pre-commit hooks are an easy way to automatically flag and possibly fix these issues; beyond this, we can run pre-commit in GitHub Actions to automatically check code compliance.
We already implement many pre-commit hooks, but from experience I would propose the following hooks in addition:
pygrep-hooks to enforce type annotations, check for blanket noqa comments (they should be specific about what they exempt), and check for deprecated log.warn calls:

- repo: https://github.com/pre-commit/pygrep-hooks
  rev: v1.10.0  # Use the ref you want to point at
  hooks:
    - id: python-use-type-annotations  # Check for missing type annotations
    - id: python-check-blanket-noqa    # Check for blanket # noqa
    - id: python-no-log-warn           # Check for log.warn
docsig to check docstrings against function signatures:

- repo: https://github.com/jshwi/docsig  # Check docstrings against function signatures
  rev: v0.44.2
  hooks:
    - id: docsig
      args:
        - --ignore-no-params  # Allow docstrings without parameters
        - --check-dunders     # Check dunder methods
        - --check-overridden  # Check overridden methods
        - --check-protected   # Check protected methods
        - --check-class       # Check class docstrings
        - --disable=E113      # Disable empty docstrings
        - --summary           # Print a summary
There is currently no formal check that docstrings actually represent the function they describe. The additional settings make the docstring check apply only when parameters are present, and it does not fail when no docstring is set at all, giving a fairly lenient implementation. (Whether we want to force docstrings and parameter descriptions can be a point of discussion.)
ruff: set the ruleset to ALL, which expands the checking to a wide variety of possible problems. By default ruff includes rulesets E and F, which are the pycodestyle errors and the Pyflakes rules.
There are many more rulesets that improve the overall code quality and should be considered. We could also enable those explicitly, or simply rely on ALL to include the full set of community-accepted best practices.
From experimentation I had also disabled some specific rulesets:
"E203", whitespace before punctuation
"D100", missing docstring in public module
"D101", missing docstring in public class
"D102", missing docstring in public method
"D103", missing docstring in public function
"D104", missing docstring in public package
"D105", missing docstring in magic method
"D401", First line of docstring written in imperative mood
"S101", Use of assert (asserts are usually good in our case)
"PT018", Composite pytest asserts
These could be discussed, as missing docstrings may actually not be wanted in the framework.
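As a sketch (not the project's current configuration), the ALL-with-exclusions variant could be expressed in pyproject.toml like this, with the ignore list mirroring the rules above:

```toml
[tool.ruff.lint]
select = ["ALL"]
ignore = [
  "E203",  # whitespace before punctuation
  "D100", "D101", "D102", "D103", "D104", "D105",  # missing docstrings
  "D401",  # first line should be in imperative mood
  "S101",  # use of assert
  "PT018", # composite pytest asserts
]
```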
Alternatives would be to enable each package we want individually:
Currently in AIFS we are working with these:
select = [
"A", # flake8 builtins
"B", # flake8-bugbear
"D", # pydocstyle
"E", # pycodestyle error
"W", # pycodestyle warning
"F", # flake8 error
"UP", # Pyupgrade
"SIM", # flake8-simplify
"N", # pep8 naming
"YTT", # flake8-2020
"S", # bandit
"COM", # Commas
"C4", # Comprehensions
"DTZ", # Datetimes
"ISC", # Implicit string concatenation
"ICN", # Import conventions
"LOG", # Logging
"PIE", # Misc lints
"T20", # Print statements
"PT", # Pytest
"Q", # Quotes
"RSE", # Raises
"TID", # Tidy imports
"PTH", # Use Pathlib
"PGH", # Pygrep hooks
"R", # Refactor
"FLY", # Fstrings
"PERF", # Perfomance linting
"FURB", # Modernising code
"RUF", # Ruff specific
"NPY", # Numpys
# "PL", # Pylint
# "TD", # Todos
# "FBT", # Boolean traps
# "CPY", # Copyright
]
This disables only four rulesets, each of which would still be useful for a framework. The boolean-traps ruleset might make coding harder, however, and might necessitate some refactoring of parts of the code.
Alternatively, we can introduce the pre-commit hooks that implement these rulesets:
bandit for common code vulnerabilities (implemented in ruleset S):

- repo: https://github.com/pycqa/bandit  # Check code for common security issues
  rev: 1.7.7
  hooks:
    - id: bandit
docformatter to enforce consistent formatting of docstrings (rulesets from pydocstyle: D, E, W):

- repo: https://github.com/PyCQA/docformatter  # Format docstrings
  rev: v1.7.5
  hooks:
    - id: docformatter
      args:
        - -s numpy
        - --black
        - --in-place
pyupgrade to automatically upgrade Python syntax from older patterns (e.g. upgrading from percent-style formatting '%s %s' % (a, b) to '{} {}'.format(a, b)):

- repo: https://github.com/asottile/pyupgrade  # Upgrade Python syntax
  rev: v3.15.1
  hooks:
    - id: pyupgrade
      args:
        - --py38-plus
Bandit (or ruleset S) especially may expose vulnerabilities in the code, which makes our lives as maintainers a little easier.