ismn's Introduction

ismn

Readers for the data from the International Soil Moisture Network (ISMN).

Documentation

The full documentation is available at https://ismn.readthedocs.io and includes a tutorial on reading ISMN data in Python after downloading it from https://ismn.earth.

The following tutorials are also available as ipython notebooks in docs/examples:

  1. ISMN reader basic functionality
  2. Adding custom metadata readers

Data used in the tutorials is not provided in this package. Please create an account at ismn.earth to download the required files.

Citation

(Zenodo DOI badge)

If you use the software in a publication, please cite it using the Zenodo DOI. Be aware that this badge links to the latest package version.

Please select your specific version at https://doi.org/10.5281/zenodo.855308 to get the DOI of that version. You should always cite the DOI of the specific version you used, so that other researchers can access the exact research artefact and reproduce your results.

Additional information regarding DOI versioning can be found at http://help.zenodo.org/#versioning.

Installation

This package should be installable through pip:

pip install ismn

Optional dependencies

The cartopy and matplotlib packages are only needed when creating data visualisations. They can be installed separately with:

conda install -c conda-forge matplotlib
conda install -c conda-forge cartopy

Example installation script

The following script will install Miniconda and set up the environment on a UNIX-like system. Miniconda will be installed into $HOME/miniconda.

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
bash miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
git clone git@github.com:TUW-GEO/ismn.git ismn
cd ismn
conda env create -f environment.yml
source activate ismn

This script adds $HOME/miniconda/bin temporarily to the PATH. To do this permanently, add export PATH="$HOME/miniconda/bin:$PATH" to your .bashrc or .zshrc.

The second to last line in the example activates the ismn environment.

After that you should be able to run:

pytest

to run the test suite.

Description

ISMN data can be downloaded for free after creating an account on the ISMN website.

ISMN data can be downloaded in two different formats:

  • Variables stored in separate files (CEOP formatted)

  • Variables stored in separate files (Header+values)

Both formats are fully supported and should work with all examples.

If you downloaded ISMN data in one of the supported formats in the past, station names may not be recognized correctly if they contain the '_' character, which is supposed to be the separator. If you experience problems because of this, please download fresh data from the ISMN; this issue should be fixed there.

Landcover Classification

The ISMN data comes with land cover classification information from the ESA CCI land cover project (years 2000, 2005 and 2010) as well as from in-situ measurements. To use the ESA CCI land cover variables for filtering the data in the get_dataset_ids function, set the keyword parameters (landcover_2000, landcover_2005 or landcover_2010) to the corresponding integer values (e.g. 10) from the list below. To get a list of possible values for filtering by in-situ values (keyword parameter: "landcover_insitu"), call the get_landcover_types method of your ISMN_Interface object with landcover='landcover_insitu'. A code sketch follows the list.

  • 10: Cropland, rainfed
  • 11: Cropland, rainfed / Herbaceous cover
  • 12: Cropland, rainfed / Tree or shrub cover
  • 20: Cropland, irrigated or post-flooding
  • 30: Mosaic cropland (>50%) / natural vegetation (tree, shrub, herbaceous cover) (<50%)
  • 40: Mosaic natural vegetation (tree, shrub, herbaceous cover) (>50%) / cropland (<50%)
  • 50: Tree cover, broadleaved, evergreen, closed to open (>15%)
  • 60: Tree cover, broadleaved, deciduous, closed to open (>15%)
  • 61: Tree cover, broadleaved, deciduous, closed (>40%)
  • 62: Tree cover, broadleaved, deciduous, open (15-40%)
  • 70: Tree cover, needleleaved, evergreen, closed to open (>15%)
  • 71: Tree cover, needleleaved, evergreen, closed (>40%)
  • 72: Tree cover, needleleaved, evergreen, open (15-40%)
  • 80: Tree cover, needleleaved, deciduous, closed to open (>15%)
  • 81: Tree cover, needleleaved, deciduous, closed (>40%)
  • 82: Tree cover, needleleaved, deciduous, open (15-40%)
  • 90: Tree cover, mixed leaf type (broadleaved and needleleaved)
  • 100: Mosaic tree and shrub (>50%) / herbaceous cover (<50%)
  • 110: Mosaic herbaceous cover (>50%) / tree and shrub (<50%)
  • 120: Shrubland
  • 121: Shrubland / Evergreen Shrubland
  • 122: Shrubland / Deciduous Shrubland
  • 130: Grassland
  • 140: Lichens and mosses
  • 150: Sparse vegetation (tree, shrub, herbaceous cover) (<15%)
  • 152: Sparse vegetation (tree, shrub, herbaceous cover) (<15%) / Sparse shrub (<15%)
  • 153: Sparse vegetation (tree, shrub, herbaceous cover) (<15%) / Sparse herbaceous cover (<15%)
  • 160: Tree cover, flooded, fresh or brackish water
  • 170: Tree cover, flooded, saline water
  • 180: Shrub or herbaceous cover, flooded, fresh/saline/brackish water
  • 190: Urban areas
  • 200: Bare areas
  • 201: Consolidated bare areas
  • 202: Unconsolidated bare areas
  • 210: Water
  • 220: Permanent snow and ice
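
A minimal sketch of the filtering described above (the data path is a placeholder; depending on the package version the class is passed either via the landcover_2010 keyword described here, or via filter_meta_dict with an 'lc_2010' key as in some of the issue reports further down):

from ismn.interface import ISMN_Interface

# Placeholder path to an extracted ISMN download.
ismn_data = ISMN_Interface('/path/to/ISMN_data')

# Keep only rainfed cropland sensors (class 10) according to the 2010 CCI map.
ids = ismn_data.get_dataset_ids(variable='soil_moisture',
                                min_depth=0, max_depth=0.1,
                                landcover_2010=10)

# Possible values for filtering by the in-situ land cover classification:
insitu_types = ismn_data.get_landcover_types(landcover='landcover_insitu')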

Climate Classification

The ISMN data comes with climate classification information from the Koeppen-Geiger Climate Classification (2007) as well as from in-situ measurements. To use the Koeppen-Geiger variable for filtering the data in the get_dataset_ids function, set the keyword parameter "climate" to one of the keys (e.g. 'Af') from the list below. To get a list of possible values for filtering by in-situ values (keyword parameter: "climate_insitu"), call the get_climate_types method of your ISMN_Interface object with climate='climate_insitu'. A code sketch follows the list.

  • Af: Tropical Rainforest
  • Am: Tropical Monsoon
  • As: Tropical Savanna Dry
  • Aw: Tropical Savanna Wet
  • BWk: Arid Desert Cold
  • BWh: Arid Desert Hot
  • BWn: Arid Desert With Frequent Fog
  • BSk: Arid Steppe Cold
  • BSh: Arid Steppe Hot
  • BSn: Arid Steppe With Frequent Fog
  • Csa: Temperate Dry Hot Summer
  • Csb: Temperate Dry Warm Summer
  • Csc: Temperate Dry Cold Summer
  • Cwa: Temperate Dry Winter, Hot Summer
  • Cwb: Temperate Dry Winter, Warm Summer
  • Cwc: Temperate Dry Winter, Cold Summer
  • Cfa: Temperate Without Dry Season, Hot Summer
  • Cfb: Temperate Without Dry Season, Warm Summer
  • Cfc: Temperate Without Dry Season, Cold Summer
  • Dsa: Cold Dry Summer, Hot Summer
  • Dsb: Cold Dry Summer, Warm Summer
  • Dsc: Cold Dry Summer, Cold Summer
  • Dsd: Cold Dry Summer, Very Cold Winter
  • Dwa: Cold Dry Winter, Hot Summer
  • Dwb: Cold Dry Winter, Warm Summer
  • Dwc: Cold Dry Winter, Cold Summer
  • Dwd: Cold Dry Winter, Very Cold Winter
  • Dfa: Cold Without Dry Season, Hot Summer
  • Dfb: Cold Without Dry Season, Warm Summer
  • Dfc: Cold Without Dry Season, Cold Summer
  • Dfd: Cold Without Dry Season, Very Cold Winter
  • ET: Polar Tundra
  • EF: Polar Eternal Winter
  • W: Water
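
Analogously, a sketch of the climate filter (same placeholder path and version caveat as the land cover example above):

# Select soil moisture sensors in a Tropical Rainforest (Af) climate.
ids = ismn_data.get_dataset_ids(variable='soil_moisture',
                                min_depth=0, max_depth=0.1,
                                climate='Af')

# Possible values for filtering by the in-situ climate classification:
insitu_climates = ismn_data.get_climate_types(climate='climate_insitu')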

Contribute

We are happy if you want to contribute. Please raise an issue explaining what is missing or if you find a bug. We will also gladly accept pull requests against our master branch for new features or bug fixes.

Development setup

For development we also recommend a conda environment. You can create one including test dependencies and a debugger by running conda env create -f environment.yml. This creates a new ismn environment, which you can activate using conda activate ismn.

Guidelines

If you want to contribute please follow these steps:

  • Fork the ismn repository to your account
  • Clone the repository
  • Make a new feature branch from the ismn master branch
  • Add your feature
  • Include tests for your contributions in one of the test directories; we use pytest, so a simple function called test_my_feature is enough
  • Submit a pull request to our master branch

Code Formatting

To apply PEP8-conformant styling to any changed files we use yapf (https://github.com/google/yapf). The correct settings are already defined in setup.cfg, so the following command should be enough:

yapf file.py --in-place

Release new version

To release a new version of this package, make sure all tests are passing on the master branch and the CHANGELOG.rst is up-to-date, with changes for the new version at the top.

Then draft a new release at https://github.com/TUW-GEO/ismn/releases. Create a version tag following the v{MAJOR}.{MINOR}.{PATCH} pattern. This will trigger a new build on GitHub and should push the packages to PyPI after all tests have passed.
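
For reference, creating such a tag from the command line would look like this (a sketch; the version number is an example, and drafting the release through the GitHub web interface creates the tag for you):

git checkout master
git pull
# example version following the v{MAJOR}.{MINOR}.{PATCH} pattern
git tag v1.3.0
git push origin v1.3.0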

If this does not work (tests pass but the upload fails), you can download the wheel and source packages for each workflow run from https://github.com/TUW-GEO/ismn/actions (under Artifacts) and push them manually to https://pypi.org/project/ismn/ (you need to be a package maintainer on PyPI for that).

In any case, pip install ismn should download the newest version afterwards.


ismn's Issues

get_min_max_timestamp functions

Hi there!
I just noticed that get_min_max_timestamp_header_values, get_min_max_timestamp_ceop_sep and get_min_max_timestamp_ceop are all pretty much the same, except for the line _ = fid.readline() which is included in the header_values function. It seems like those functions were copy-pasted and never finished?
Anyway, I'm working with header_values files, and to actually get the min & max timestamps I had to change the function a little bit; it now looks like this:

import io
from datetime import datetime

def get_min_max_timestamp_header_values(file_path):
    with io.open(file_path, mode='r', newline=None) as fid:
        lines = fid.readlines()
        first = lines[3]     # first data row in my files
        last = tail(fid)[0]  # tail() is a helper returning the last line(s) of the file

    min_date = datetime.strptime(first[:16], '%Y/%m/%d %H:%M')
    max_date = datetime.strptime(last[:16], '%Y/%m/%d %H:%M')

    return min_date, max_date

If I find some time, I can also test the other two functions and fix them.

Cheers, Marco

Improvement: don't use absolute path names in metadata cache

The ISMN reader seems to create a cache of the ISMN metadata in a subfolder called python_metadata, in a file called metadata.npy.

metadata.npy contains absolute pathnames of the stm files, e.g. /path/to/mydata/ISMN/ISMN_V20180830_GLOBAL/AMMA-CATCH/Banizoumbou/AMMA-CATCH_AMMA-CATCH_Banizoumbou_sm_0.050000_0.050000_CS616-1_19780101_20180830.stm.

In our setup, we use the same NFS mount containing ISMN data in several systems, and not all systems use the same mountpoint (e.g. one might have the data at /path/to/mydata and the other have it at /some/completely/different/path/to/my/data). Also, we use linux softlinks, so we might also access the data as /softlink/to/data/....

Once the metadata folder is created by one of the systems, the other systems may produce errors if they don't use the same path to access the data because they can't find the stm files at the paths cached in the metadata.npy file.

Would it be possible to not use absolute file paths or make it possible in some other way to use the same ISMN folder from several systems?
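
One possible direction (a sketch of the idea, not the package's actual implementation) would be to cache paths relative to the data root and resolve them against the local root at load time:

import os

def to_cached_path(abs_path, data_root):
    # Store only the part below the ISMN data root in the cache ...
    return os.path.relpath(abs_path, start=data_root)

def from_cached_path(rel_path, data_root):
    # ... and resolve it against whatever root the current system uses.
    return os.path.join(data_root, rel_path)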

add read() function

Can we make an alias for reading time series? I.e., in addition to the current read_ts function in ISMN_Interface, also add a function read that calls read_ts (and does the same). In pytesmo the default function name for reading data is still read_ts (https://github.com/TUW-GEO/pytesmo/blob/796dc3c8c1629b86260dac0bf8ca2413faa9c8e7/src/pytesmo/validation_framework/data_manager.py#L95), which causes a lot of warnings to be raised by default. This could be fixed by making read the default function name, but ISMN does not have that function yet, so it needs to be added here first to be safe to use in pytesmo.
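
The requested alias would be a thin wrapper on ISMN_Interface, along these lines (sketch):

def read(self, *args, **kwargs):
    # Alias so the interface also exposes the `read` name that
    # pytesmo's data manager looks for first.
    return self.read_ts(*args, **kwargs)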

No 'variable' in station '3.09' and min_depth/max_depth don't work

I am trying to extract data from station 3.09 for the variable 'soil_moisture' from depth 0.01 to 0.04. By default I would write command 1 like this:

min_depth,max_depth=0.01, 0.04
ids = ismn_data.get_dataset_ids(variable='soil_moisture',
                                        min_depth=min_depth,
                                        max_depth=max_depth,
                                        filter_meta_dict={'station': '3.09',
                                                          'lc_2000': [10,11,12,20,60,130],
                                                          'lc_2005': [10,11,12,20,60,130],
                                                          'lc_2010': [10,11,12,20,60,130],})

But this yields no elements in ids. So I tried to print the metadata for 3.09 via ismn_data.read(1098, return_meta=True), and found that there is indeed soil moisture in the metadata, but no values under variable:

ismn_data.read(1098, return_meta=True)
Out[117]: 
(                     soil_moisture soil_moisture_flag soil_moisture_orig_flag
 date_time                                                                    
 2017-01-01 00:00:00          0.192                  G                       M
                           ...                ...                     ...
 2019-02-22 09:00:00          0.155                  G                       M
 
 [12745 rows x 3 columns],
 variable        key       
 clay_fraction   val                           5.2
                 depth_from                    0.0
                 depth_to                     0.05
 climate_KG      val                           Dfb
 climate_insitu  val                       unknown
 elevation       val                         104.0
 instrument      val                 Decagon-5TE-B
                 depth_from                    0.0
                 depth_to                     0.05
 latitude        val                       55.8609
 lc_2000         val                            10
 lc_2005         val                            10
 lc_2010         val                            10
 lc_insitu       val                          None
 longitude       val                        9.2945
 network         val                          HOBE
 organic_carbon  val                           0.5
                 depth_from                    0.0
                 depth_to                      0.3
 sand_fraction   val                          85.1
                 depth_from                    0.0
                 depth_to                     0.05
 saturation      val                          0.41
                 depth_from                    0.0
                 depth_to                      0.3
 silt_fraction   val                           5.7
                 depth_from                    0.0
                 depth_to                     0.05
 station         val                          3.09
 timerange_from  val           2017-01-01 00:00:00
 timerange_to    val           2019-02-22 09:00:00
 variable        val                 soil_moisture
                 depth_from                    0.0
                 depth_to                     0.05
 Name: data, dtype: object)

As you can see, right above clay_fraction there is no value for the variable key. So I have to use command 2:

ids = ismn_data.get_dataset_ids(variable=None,min_depth=min_depth,
                                        max_depth=max_depth,
                                        filter_meta_dict={'station': '3.09',
                                                          'variable': 'soil_moisture',
                                                          'lc_2000': [10,11,12,20,60,130],
                                                          'lc_2005': [10,11,12,20,60,130],
                                                          'lc_2010': [10,11,12,20,60,130],})

but I still get nothing in ids. I found that's because I set min_depth and max_depth: if I delete min_depth and max_depth from command 2, I can get ids as [1098, 1104]. But I do want to extract values between 0.01 and 0.04. So is there anything wrong with the data for 3.09? And I am confused about the difference between command 1 and command 2.

handle invalid folder names

e.g. RUSWET-AGRO/SyrdarinskayaO. leads to an error when the metadata was created on Ubuntu but loaded on CentOS (and vice versa)

"orig_flag" in "ts" via "ismn_data.read"

Hi,
Here is my code to read the data:

ids = ismn_data.get_dataset_ids(variable='soil_moisture', 
                                max_depth=1)


ts, meta = ismn_data.read(ids[0], return_meta=True)
ts

After running it, I get:

                     soil_moisture soil_moisture_flag soil_moisture_orig_flag
date_time                                                                    
2015-01-01 00:00:00         0.2608                D03                       M
2015-01-01 01:00:00         0.2607                D03                       M
2015-01-01 02:00:00         0.2606                D03                       M
2015-01-01 03:00:00         0.2605                D03                       M

I found the meaning of soil_moisture_flag here, but failed to find the meaning of soil_moisture_orig_flag. What does the M denote?
Could you please explain it to me?

No write access to directory where data is stored.

I am raising this issue since it might help in building an (even) more user-friendly interface for the ismn package. For my setup, I have no write access to the directory in which the ISMN data is stored. The whole institute has read access, but only data managers have write access. Therefore, I get an error when the package tries to create the metadata directory (python_metadata).

I see a few potential solutions (either on the side of the user, the package maintainer, or the data provider):

  • Copy all the data (not a nice solution)
  • Working with symbolic links pointing from my own directories to the data directory (not tested yet, but might work)
  • Add the feature to ismn that the user can specify another directory where the metadata file can be written
  • Supply the metadata file together with the data instead, only the paths need to be added in some way

Just some suggestions; I didn't think about them in detail yet, and maybe there are more. I hope it is useful feedback. It relates a bit to #9.
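
For reference, newer versions of the package address the third suggestion with a meta_path keyword in ISMN_Interface (as also visible in the v1.2.0 traceback further down); a sketch with placeholder paths:

from ismn.interface import ISMN_Interface

# Read-only data location; metadata is written to a user-writable directory.
ismn_data = ISMN_Interface('/shared/readonly/ISMN_data',
                           meta_path='/home/user/ismn_metadata')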

Regarding issues and suggestions about metadata.

First of all, thank you for your dedication to this project; it has been very helpful to me. However, I am not clear about the units of the continuous variables in the metadata, such as clay_fraction, organic_carbon, and sand_fraction; I cannot intuitively obtain the units of these variables. Could you please provide the units of these variables in the metadata? As with the issue I mentioned earlier, I believe it is crucial to be able to obtain the units of the different variables intuitively, and this needs to be addressed urgently.

Therefore, I have a suggestion to address this issue: it would be helpful to add a function def get_unit() -> dict[str, str] which directly returns all the variables related to ISMN along with their units. This information could be stored as a JSON file, and when users call get_unit() it would return a dictionary directly. I suggest using LaTeX formatting for the units.

I hope to receive your response soon. Thank you.
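
A sketch of the suggested helper (the file name and the example units are illustrative assumptions, not authoritative ISMN metadata):

import json

def get_unit() -> dict:
    # Load a variable -> unit mapping shipped with the package as JSON,
    # e.g. {"soil_moisture": "m^3 m^{-3}", "clay_fraction": "% weight"}.
    with open('units.json') as f:
        return json.load(f)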

ISMN v1.2.0: error when reading dataset

When I tried v1.2.0, I got an error when reading the dataset Data_separate_files_header_20170101_20211231_9078_Zd6I_20220911 (a file I downloaded from the ISMN portal, containing latitudes from 36N to 58N and longitudes from 11.75W to 29.5E):

ismn_data = ISMN_Interface(data_path)
Files Processed: 100%|██████████| 321/321 [00:00<00:00, 4521.32it/s]
Processing metadata for all ismn stations into folder /Users/xushan/research/TUD/ISMN_westEurope/Data_separate_files_header_20170101_20211231_9078_Zd6I_20220911.
This may take a few minutes, but is only done once...
Hint: Use `parallel=True` to speed up metadata generation for large datasets
Metadata generation finished after 0 Seconds.
Metadata and Log stored in /Users/xushan/research/TUD/ISMN_westEurope/Data_separate_files_header_20170101_20211231_9078_Zd6I_20220911/python_metadata

Traceback (most recent call last):

  File "<ipython-input-23-84af3e3a7ed0>", line 1, in <module>
    ismn_data = ISMN_Interface(data_path)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/ismn/interface.py", line 135, in __init__
    self.activate_network(network=network, meta_path=meta_path, temp_root=temp_root)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/ismn/interface.py", line 166, in activate_network
    self.__file_collection.to_metadata_csv(meta_csv_file)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/ismn/filecollection.py", line 403, in to_metadata_csv
    dfs = pd.concat(dfs, axis=0, sort=True)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 304, in concat
    sort=sort,

  File "/Users/xushan/opt/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py", line 351, in __init__
    raise ValueError("No objects to concatenate")

ValueError: No objects to concatenate

Can you please help me with this? Thanks!

Custom metadata strategies

Add more classes/strategies to transfer metadata from different sources into the python metadata, e.g.:

  • NetcdfStrategy to read data from (static) netcdf files (e.g. via xarray)
  • PynetCF strategy to use the pynetcf interface
  • etc.

read from zip file

It would be nice if (optional) reading data directly from the downloaded zip file were possible with this package (it would be slower, of course). It would save a lot of storage space. Could someone have a look at this? I guess there should be Python libraries for that.
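
For reference, recent versions of the package do accept the downloaded archive directly (the file name below is a placeholder); a sketch:

from ismn.interface import ISMN_Interface

# Pass the downloaded zip archive instead of an extracted folder (slower).
ismn_data = ISMN_Interface('/path/to/Data_separate_files_header_download.zip')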

Error unzipping downloaded ismn data

Hi!
For QA4SM, we've tried downloading the latest ISMN data and unzipping it, but we all get an error on decompression:

file #14549:  bad zipfile offset (local header sig):  21299
file #14550:  bad zipfile offset (local header sig):  796807
file #14551:  bad zipfile offset (local header sig):  1546605
...

This isn't a problem with the reader, but I thought I should report it somewhere.

(attachment: log.zip)

setuptools_scm conflicts with other environments

Since the 1.0 release, setuptools_scm is a package requirement.

However, it would be nice if that could be changed to a build dependency, since it is not required for running the package itself (or is it?).

For environments that use an (outdated) version of pyscaffold, the installation of setuptools_scm results in a failed installation:

    AttributeError: module 'setuptools_scm.git' has no attribute 'list_files_in_archive'

file read error because of csv formatting issue

The package gives an error when it tries to read a file from the RISMA network, CEF station.

file: RISMA_RISMA_CEF_sm_0.050000_0.050000_Hydraprobe-II-Sdi-12-A_20000101_20230605.stm

error: ParserError: Error tokenizing data. C error: Expected 9 fields in line 5316, saw 10

The error seems to be generated by pd.read_csv in filehandler.py > __read_format_header_values > __read_csv when it tries to parse the line:

2015/12/01 06:00 0.06 D01,D02,D03 OK,OK,NA,NA,Out of WFV average range,Frozen soil

Object arrays cannot be loaded when allow_pickle=False

When using ismn in my environment, I get this error:

in test
    ismn_reader = ISMN_Interface(ismn_data_folder)
.../lib/python3.6/site-packages/ismn/interface.py:550: in __init__
    os.path.join(path_to_data, 'python_metadata', 'metadata.npy'))
.../lib/python3.6/site-packages/numpy/lib/npyio.py:447: in load
    pickle_kwargs=pickle_kwargs)
.../lib/python3.6/site-packages/numpy/lib/format.py:692: in read_array
    raise ValueError("Object arrays cannot be loaded when "
E   ValueError: Object arrays cannot be loaded when allow_pickle=False

numpy version: 1.16.3

This is probably because the default value for allow_pickle changed to False (see the numpy release notes and the security issue description).

Filter depths by depth_from only

@daberer If the option to filter sensors only by their depth_from (ignoring the depth_to) really makes sense, we should add an option to

def get_dataset_ids(self, variable, min_depth=0, max_depth=0.1,
to do that.

I would suggest either adding an argument to ignore the max_depth, or setting a rule (e.g. if max_depth is None, it is ignored).
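
The suggested rule could look like this (a sketch, not the package's actual implementation):

def depth_matches(depth_from, depth_to, min_depth=0.0, max_depth=0.1):
    # Sketch of the suggested rule: max_depth=None disables the check on
    # depth_to, so sensors are selected by their depth_from only.
    if depth_from < min_depth:
        return False
    return max_depth is None or depth_to <= max_depth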

Initializing ISMN_reader errors "INFO:root:no header:" on static_variables.csv files

ISMN_reader.get_climate_types() and ISMN_reader.get_landcover_types() result in empty dicts. For a subset of ~550 stations, ISMN_reader.get_dataset_ids() works as expected for variable='soil moisture' and variable='soil temperature' but returns an empty list for variable='air temperature'.

./python_metadata/metadata.log records an error for every static_variables.csv file in my subset, indicating that the files do not have a header. The behavior exists when reading the subset files both as CEOP and as Header+values formatted. The module (v0.3.2) passed the setup tests without error (two deprecation warnings). Python 3.7.6.

(ismn) []$ head -4 calval/ISMN/PanArctic_20150401-20181231/python_metadata/metadata.log 
INFO:root:no header: calval/ISMN/PanArctic_20150401-20181231/BIEBRZA_S-1/grassland-meteo-0/calval/ISMN/PanArctic_20150401-20181231/BIEBRZA_S-1/grassland-meteo-0/BIEBRZA-S-1_BIEBRZA-S-1_grassland-meteo-0_static_variables.csv
INFO:root:Error occured while reading metadata from csv file (calval/ISMN/PanArctic_20150401-20181231/BIEBRZA_S-1/grassland-meteo-0)
INFO:root:no header: calval/ISMN/PanArctic_20150401-20181231/BIEBRZA_S-1/grassland-soil-1/calval/ISMN/PanArctic_20150401-20181231/BIEBRZA_S-1/grassland-soil-1/BIEBRZA-S-1_BIEBRZA-S-1_grassland-soil-1_static_variables.csv
INFO:root:Error occured while reading metadata from csv file (calval/ISMN/PanArctic_20150401-20181231/BIEBRZA_S-1/grassland-soil-1)

The static_variables.csv files do appear to have a header.

(ismn) []$ head -3 calval/ISMN/PanArctic_20150401-20181231/BIEBRZA_S-1/grassland-soil-4/BIEBRZA-S-1_BIEBRZA-S-1_grassland-soil-4_static_variables.csv 
quantity_name;unit;depth_from[m];depth_to[m];value;description;quantity_source_name;quantity_source_description;quantity_source_provider;quantity_source_version;quantity_source_resolution;quantity_source_timerange;quantity_source_url;
bulk density;g/cm3;0.00;0.10;0.30;;insitu;in situ;;;;;;
saturation;m^3*m^-3;0.00;0.10;0.82;;insitu;in situ;;;;;;

ISMN reader produces different indices for same stations on different systems

Copied from TUW-GEO/pytesmo#143

Like it says in the title: the ISMN reader (in particular get_dataset_ids) produces different indices for the same stations on different systems. This makes it harder to check errors occurring in a production system on your developer machine. I think this comes from the metadata collector (pytesmo.io.ismn.metadata_collector.collect_from_folder) line 57, which uses os.walk. os.walk doesn't guarantee an order, but we could sort the folders and files lists into alphabetical order. That would take care of the problem, barring locale issues (different sorting orders under different locales).
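
Sorting inside the walk would make the traversal order deterministic; a sketch of the proposed fix:

import os

def collect_from_folder(root):
    collected = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk makes no ordering guarantee; sorting dirnames in place
        # also fixes the order in which subdirectories are visited.
        dirnames.sort()
        for name in sorted(filenames):
            collected.append(os.path.join(dirpath, name))
    return collected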

'ISMN_Interface' object has no attribute 'collection'

To whom it may concern,
I downloaded ISMN data (network = ORACLE) in northern France and got two files: Data_separate_files_19801023_20211011_9078_f2h4_20211012.zip and Data_separate_files_header_19801023_20211011_9078_dRV2_20211012.zip. I used the following code to read them:

from ismn.interface import ISMN_Interface
import numpy as np
import matplotlib.pyplot as plt  # matplotlib is not installed automatically
%matplotlib inline

# Either a .zip file or one folder that contains all networks, here we read from .zip
data_path = "/Users/xushan/research/TUD/ISMN/Data_separate_files_header_19801023_20211011_9078_dRV2_20211012"
ismn_data = ISMN_Interface(data_path)

ismn_data.collection #  or just:  ismn_data  - overview over loaded networks and their stations

But I got an error: AttributeError: 'ISMN_Interface' object has no attribute 'collection'. Could you please help me with this? The package version is '0.3.2'.

The unit of the variable.

I couldn't find a description of the units. Can I take the unit of soil moisture to be m3/m3; the units of soil temperature, air temperature, and soil surface temperature to be °C; the unit of precipitation to be mm/h; the unit of soil suction to be kPa; and the units of snow water equivalent and snow depth to be mm? With respect for your work, I hope to receive your reply. Thank you.

Collection not created

Hello, I am using your package with environment.yml on miniconda.

I followed the tutorial and created an ismn_data instance; however, ismn_data does not contain collection.

Could you please help me with this problem?

from ismn.interface import ISMN_Interface

# Enter the path to your ISMN data
path_to_ismn_data = "myfilepath/Data_separate_files_20000101_20200229_6817_FTVy_20210520"
ismn_data = ISMN_Interface(path_to_ismn_data, network=['SCAN', 'USCRN'])

no 'display' is found in 'docs/examples/interface.ipynb'

Hi,
I am trying to run some of the scripts in docs/examples/interface.ipynb.
However, when it comes to the line display(network), an 'undefined' error occurs.
I checked the whole notebook and did not find the relevant definition.
Could you help me figure it out?

Thread error when finding the nearest station given the lat and lon

I simply ran a few lines like the ones below and got an error, Fatal Python error: Aborted, Restarting kernel...

from ismn.interface import ISMN_Interface
data_path = "/Users/xushan/research/TUD/ISMN_westEurope/Data_separate_files_header_20170101_20191231_9078_rEeC_20221101"
ismn_data = ISMN_Interface(data_path)  # instance creation, implied by the log below
ismn_data.collection.get_nearest_station(-5.54427, 41.23923)

The logging information is:

Found existing ismn metadata in /Users/xushan/research/TUD/ISMN_westEurope/Data_separate_files_header_20170101_20191231_9078_rEeC_20221101/python_metadata/Data_separate_files_header_20170101_20191231_9078_rEeC_20221101.csv.

OMP: Error #15: Initializing libomp.dylib, but found libiomp5.dylib already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://openmp.llvm.org/


Fatal Python error: Aborted


Main thread:
Current thread 0x00007ff857c1ab40 (most recent call first):
  File "/Users/xushan/miniconda3/envs/py310/lib/python3.10/site-packages/pygeogrids/nearest_neighbor.py", line 215 in find_nearest_index
  File "/Users/xushan/miniconda3/envs/py310/lib/python3.10/site-packages/pygeogrids/grids.py", line 456 in find_k_nearest_gpi
  File "/Users/xushan/miniconda3/envs/py310/lib/python3.10/site-packages/pygeogrids/grids.py", line 421 in find_nearest_gpi
  File "/Users/xushan/miniconda3/envs/py310/lib/python3.10/site-packages/ismn/components.py", line 784 in get_nearest_station
  File "/var/folders/sk/q1bd_nkn1z345sbmdty96hpn6_w5dn/T/ipykernel_12129/1796113207.py", line 1 in <module>


Restarting kernel...
 

ismn_data.collection.get_nearest_station

I seem to get only station Weslaco even if I change the coordinates to a different place...

from ismn.interface import ISMN_Interface
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Either a .zip file or one folder that contains all networks

data_path = "F:\RainfallRunoff\Data_separate_files_header_19800101_20141231_8943_9sUo_20210606.zip"
ismn_data = ISMN_Interface(data_path)

import pandas as pd
grid = ismn_data.collection.grid
gpis, lons, lats = grid.get_grid_points()
pd.DataFrame(index=pd.Index(gpis, name='gpi'),
             data={'lon': lons, 'lat': lats}).T

station, dist = ismn_data.collection.get_nearest_station(47.53786,-124.31575)
print(f'Station {station.name} is {int(dist)} metres away from the passed coordinates:')
assert ismn_data.collection.station4gpi(0) == station # same result when selecting with GPI
station

Station Weslaco is 8741467 metres away from the passed coordinates:

Traceback (most recent call last):

  File "", line 1, in
    assert ismn_data.collection.station4gpi(0) == station # same result when selecting with GPI

AssertionError

Consistent Ids between subsets

ISMN_Interface.get_dataset_ids results differ when a different subset (network) is initialized. It would be better if they were consistent with the global dataset.

I.e. the following IDs should be identical:

ds = ISMN_Interface(".../FRM4SM/07_data/ismn_data/ISMN_v202301/extracted/", network=['WEGENERNET', 'SMOSMANIA'])
ids = ds.get_dataset_ids('soil_moisture', 0, 10, filter_meta_dict={'network': 'WEGENERNET'})
print(ids)

>> [215, 216, 220, 221, 225, 226, 231, 234, 238, 242, 244, 248, 250, 253, 256, 258, ... 

ds = ISMN_Interface(".../FRM4SM/07_data/ismn_data/ISMN_v202301/extracted/", network=['WEGENERNET'])
ids = ds.get_dataset_ids('soil_moisture', 0, 10, filter_meta_dict={'network': 'WEGENERNET'})
print(ids)

>> [2, 3, 7, 8, 12, 13, 18, 21, 25, 29,..

how to read the COSMOS, FR_Aqui and SMOSMANIA data?

Hi, thanks for your help!
I downloaded data within southeastern France which contains sites from COSMOS, FR_Aqui and SMOSMANIA; they are all part of the ISMN dataset.
When I tried the approach from your demo (https://pytesmo.readthedocs.io/en/latest/examples.html#examples-page), I can successfully read the data under the 'SMOSMANIA' directory, but not 'FR_Aqui' or 'COSMOS'. How can I do that?

Here is the directory of my data, downloaded from https://www.geo.tuwien.ac.at/insitu/data_viewer/?user_id=7565&session=2mj3vrhpg1ho06zj3lfizaz1lkobexhd# (CEOP format):

Data_seperate_files_19960101_20191231_7565_Qqnl_20200622 xushan$ pwd
/Users/xushan/research/TUD/surfex/southfr/Data_seperate_files_19960101_20191231_7565_Qqnl_20200622
Data_seperate_files_19960101_20191231_7565_Qqnl_20200622 xushan$ ls
COSMOS Readme.txt
FR_Aqui SMOSMANIA
ISMN_qualityflags_description.txt python_metadata
Metadata.xml

My code is:

import os
import pytesmo.validation_framework.metric_calculators as metrics_calculators
from ascat import AscatSsmCdr
# from pytesmo.io.ismn.interface import ISMN_Interface
from ismn.interface import ISMN_Interface
from pytesmo.validation_framework.validation import Validation
from pytesmo.validation_framework.results_manager import netcdf_results_manager

testdata_folder = '/Users/xushan/research/TUD/surfex/southfr/Data_seperate_files_19960101_20191231_7565_Qqnl_20200622/'

# testdata_folder = '/Users/xushan/research/TUD/surfex/southfr/pytesmo-test-data-master/'
# output_folder = '/pytesmo/code/examples/output'

ismn_data_folder = testdata_folder  # os.path.join(testdata_folder, 'COSMOS/Toulouse')

ismn_reader = ISMN_Interface(ismn_data_folder)

jobs = []

ids = ismn_reader.get_dataset_ids(variable='soil moisture', min_depth=0, max_depth=0.1)
for idx in ids:
    metadata = ismn_reader.metadata[idx]
    jobs.append((idx, metadata['longitude'], metadata['latitude'], metadata[1]))
    print((idx, metadata['longitude'], metadata['latitude'], metadata[1]))

Thanks, and I hope for your early reply!
