<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Hi, <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url=

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

JOSS review: Automated tests about xinvert HOT 12 CLOSED

DamienIrving commented on June 8, 2024

JOSS review: Automated tests

from xinvert.

Comments (12)

miniufo commented on June 8, 2024 1

Hi, @DamienIrving thanks for your careful check on this. You are right that one could find some analytical results to do the tests. But here I adopted an indirect way here. Basically, we are not checking the streamfunction itself, but its gradient (or velocity components).

According to Helmholtz decomposition theory, a vector field, say, $\mathbf u=(u, v)$, can be decomposed into rotational and divergent components $\mathbf u = \mathbf u_r + \mathbf u_d$. The rotational component fully recovers the vorticity of the original $\mathbf u$ and has exactly zero-divergence, whereas the divergent component fully recovers the divergence of $\mathbf u$ and has exactly zero-vorticity. So we can theck this to see if the decomposition (i.e., Poisson inverion) is right.

That is why I put asserts to see if the divergence of $\mathbf u_r$ is zero, and if the vorticity of $\mathbf u_d$ is zero, which are expected results. The slicing from 1 to -1 is because the finite difference at the boundaries (north-south poles) could result larger errors above the tolerance so it is better to check the interior. But zonal is periodic so not a problem here.

One of the potential problem with analytical expression is that, we use finite difference scheme here. If we use larger grid size, there would be grid-scale truncation error relative to analytical results, which may be above the tolerance. But the current asserts do not depend on grid size, because the inversion for streamfunction, as well we the calculation of its gradient or velocity components, all use the consistent finite difference scheme, which turns out to be quite good that the expected $\nabla\cdot\mathbf u_r=0$ is achieved with default numpy tolerance.

Also, there will be an arbitary constant to Poisson solution. So any streamfunction, after adding a constant, is still a solution to Poisson equation. I guess this could be a problem with windspharm to determine the arbitary constant. That is why I do really prefer the indirect way of assertion here, as it avoid the arbitary constant.

from xinvert.

miniufo commented on June 8, 2024

You are right, I know this concept, probably using assert keyword in the program, but not every detail. Could you please send me some concrete examples that I can follow?

Since it is a numerical solver, I cannot expect the repeated runs of the codes reproduce the same results. I only visualize the plots to see if they are close to those plots in the related papers.

from xinvert.

DamienIrving commented on June 8, 2024

This chapter in Research Software Engineering with Python gives an overview of general testing concepts:
https://merely-useful.tech/py-rse/testing.html

This page gives more information on using pytest:
https://realpython.com/pytest-python-testing/

In terms of a concrete example, here's a test file I wrote recently:
https://github.com/climate-innovation-hub/qqscale/blob/master/test_qdm.py

While you can't expect repeated runs to reproduce exactly the same results, you should hopefully be able to create a fixture that produces an expected result within some tolerance (i.e. assert np.allclose(expected_result, actual_result)).

Once you've got your tests setup, you can use GitHub Actions to run the tests every time you make a commit to your repository. That involves creating a tests.yml file in a new .github/workflows/ directory. It will look something like:
https://github.com/climate-innovation-hub/qqscale/blob/master/.github/workflows/tests.yml

Hope that helps 😄

from xinvert.

miniufo commented on June 8, 2024

Thanks very much for your kind help. I will follow these to make auto-tests for xinvert soon.

from xinvert.

miniufo commented on June 8, 2024

Hi @DamienIrving, I just followed your guide and made a action for tests. Everything on pytest works fine locally on my laptop. But the action build raised an error: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'scipy']

I made both netcdf4 and scipy in the requirement.txt file, so I don't know how to resolve this. Could you help me find why this is happening?

PS: I use git lfs this time because some of the sample netcdf files are larger than github limits.

from xinvert.

DamienIrving commented on June 8, 2024

If you add an extra step in tests.yml after the "install dependencies" step to list the installed packages you'll see that netCDF4 is successfully installed, so that isn't the problem.

      - name: List installed packages
        shell: bash -l {0}
        run: pip list

It might be that the files are too large... you could try a test with a very small data file?

from xinvert.

miniufo commented on June 8, 2024

Hi @DamienIrving, thanks for your tries and suggestion. I guess I found the reason. All netcdf files in Data has been tracked by LFS, which are then not recognized by xr.open_dataset. I modified a small nc file but leave it outside of LFS, then it is passed (you may check the log).

So this is really embarrassing, because I need some global dataset to make a test, which exceeds the limit by github. LFS seems to upload big file transparently but not openable online. Do you have any suggestion?

from xinvert.

DamienIrving commented on June 8, 2024

It sounds like you need to refactor the tests you want GitHub Actions to run so they don't require large input datasets.

For automated testing, it's most common to either have the test code create it's own fake data (e.g. https://github.com/climate-innovation-hub/qqscale/blob/master/test_qdm.py#L15) so that you don't need to include any data files in your repo, or if using "real" data is unavoidable then it's common to use a netCDF file containing only a single grid point or small spatial area.

You can do your own testing offline where you process and plot large global datasets, but those offline tests wouldn't be included in the tests run by GitHub Actions.

Does that make sense?

from xinvert.

miniufo commented on June 8, 2024

Well, that's right. I should skip those tests requiring large datasets (>100MB), and do them locally by hands.

from xinvert.

miniufo commented on June 8, 2024

Hi @DamienIrving, all the pytests has been passed. So please see if this issue is fixed now. Thank you very much for you kind help with this.

from xinvert.

DamienIrving commented on June 8, 2024

Thanks for implementing automated testing.

Just so I understand, what are the tests in test_Poisson.py checking?

I would expect that each test would have three basic steps:

Read in vorticity and divergence data from a file (that's the test "fixture")
Calculate the streamfunction and velocity potential (that's the "actual result")
Compare the actual result to an "expected result" to see if the test passes (i.e. using an assertion)

Do you have an expected result? If not, is there some vorticity and divergence data that you can create (i.e. simple example data) where you know what the streamfunction and velocity potential values should be?

from xinvert.

DamienIrving commented on June 8, 2024

Thanks for the explanation - that's a nice way to write the tests!

from xinvert.

JOSS review: Automated tests about xinvert HOT 12 CLOSED

Comments (12)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent