
ecmwf-opendata's Introduction

ecmwf-opendata

ecmwf-opendata is a package to simplify the download of ECMWF open data. It implements a request-based interface to the dataset using ECMWF's MARS language to select meteorological fields, similar to the existing ecmwf-api-client Python package.

A collection of Jupyter Notebooks that make use of that package is available here.

Installation

The ecmwf-opendata Python package can be installed from PyPI with:

$ pip install ecmwf-opendata

Usage

The example below will download the latest available 10-day forecast for the mean sea level pressure (msl) into a local file called data.grib2:

from ecmwf.opendata import Client

client = Client()

client.retrieve(
    step=240,
    type="fc",
    param="msl",
    target="data.grib2",
)

NOTE: This package is designed for users who want to download a subset of the whole dataset. If you plan to download a large percentage of each data file, it may be more efficient to download whole files and filter out the data you want locally. See the documentation on the file naming convention for more information. Alternatively, you can use this tool to download whole files by only specifying date, time, step, stream and type. Please be aware that all data for a full day is on the order of 726 GiB.

Options

The constructor of the client object takes the following options:

client = Client(
    source="ecmwf",
    model="ifs",
    resol="0p25",
    preserve_request_order=False,
    infer_stream_keyword=True,
)

where:

  • source is either the name of the server to contact or a fully qualified URL. Possible values are ecmwf to access ECMWF's servers, or azure to access data hosted on Microsoft's Azure. Default is ecmwf.

  • model is the name of the model that produced the data. Use ifs for the physics-driven model and aifs for the data-driven model. Please note that aifs is currently experimental and only produces a small subset of fields. Default is ifs.

  • resol specifies the resolution of the data. Default is 0p25 for 0.25 degree resolution, and is the only resolution that is currently available.

  • preserve_request_order. If this flag is set to True, the library will attempt to write the retrieved data into the target file in the order specified by the request. For example, if the request specifies param=[2t,msl] the library will ensure that the field 2t is first in the target file, while with param=[msl,2t], the field msl will be first. This also works across different keywords: ...,levelist=[500,100],param=[z,t],... will produce different output to ...,param=[z,t],levelist=[500,100],... If the flag is set to False, the library will sort the request to minimise the number of HTTP requests made to the server, leading to faster download speeds. Default is False.

  • infer_stream_keyword. The stream keyword represents the ECMWF forecasting system that creates the data. Setting it properly requires knowledge of how ECMWF runs its operations. If this boolean is set to True, the library will try to infer the correct value for the stream keyword based on the rest of the request. Default is True if model is ifs.

⚠️ NOTE: It is recommended not to set the preserve_request_order flag to True when downloading a large number of fields as this will add extra load on the servers.
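
To make the ordering effect concrete, the sketch below expands a request into individual fields by iterating the keywords in the order they were written. It only illustrates why levelist before param differs from param before levelist. The helper expand_request is hypothetical, and the assumption that the first keyword varies slowest is made for illustration, not taken from documented library behaviour.

```python
from itertools import product


def expand_request(request):
    """Expand a request into one dict per field, in keyword order.

    Hypothetical helper for illustration only; not part of ecmwf-opendata.
    Assumes the first keyword in the request varies slowest.
    """
    keys = list(request)
    # Wrap scalar values so every keyword maps to a list of candidates.
    values = [v if isinstance(v, (list, tuple)) else [v] for v in request.values()]
    return [dict(zip(keys, combo)) for combo in product(*values)]


levels_first = expand_request({"levelist": [500, 100], "param": ["z", "t"]})
params_first = expand_request({"param": ["z", "t"], "levelist": [500, 100]})

for field in levels_first:
    print(field)
```

Both expansions contain the same four fields, but in a different sequence, which is the difference the flag is meant to preserve in the target file.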

Methods

Client.retrieve()

The Client.retrieve() method takes a request as input, retrieves the corresponding data from the server and writes it to the user's target file.

A request is a list of keyword/value pairs used to select the desired data. It is possible to specify a list of values for a given keyword.

The request can either be specified as a dictionary:

from ecmwf.opendata import Client

client = Client(source="ecmwf")

request = {
    "time": 0,
    "type": "fc",
    "step": 24,
    "param": ["2t", "msl"],
}

client.retrieve(request, "data.grib2")

# or:

client.retrieve(
    request=request,
    target="data.grib2",
)

or directly as arguments to the retrieve() method:

from ecmwf.opendata import Client

client = Client(source="ecmwf")

client.retrieve(
    time=0,
    type="fc",
    step=24,
    param=["2t", "msl"],
    target="data.grib2",
)

The date and time keywords are used to select the date and time of the forecast run (see Date and time below). If date or both date and time are not specified, the library will query the server for the most recent matching data. The date and time of the downloaded forecast are returned by the retrieve() method.

from ecmwf.opendata import Client

client = Client(source="ecmwf")

result = client.retrieve(
    type="fc",
    step=24,
    param=["2t", "msl"],
    target="data.grib2",
)

print(result.datetime)

may print 2022-01-23 00:00:00.

Client.download()

The Client.download() method takes the same parameters as the Client.retrieve() method, but will download the whole data files from the server, ignoring keywords like param, levelist or number.

The example below will download all fields for time step 24 of the latest forecast, ignoring the keyword param:

from ecmwf.opendata import Client

client = Client(source="ecmwf")

client.download(
    param="msl",
    type="fc",
    step=24,
    target="data.grib2",
)

Client.latest()

The Client.latest() method takes the same parameters as the Client.retrieve() method, and returns the date of the most recent matching forecast without downloading the data:

from ecmwf.opendata import Client

client = Client(source="ecmwf")

print(client.latest(
    type="fc",
    step=24,
    param=["2t", "msl"],
    target="data.grib2",
))

may print 2022-01-23 00:00:00.

NOTE: The data is available between 7 and 9 hours after the forecast starting date and time, depending on the forecasting system and the time step specified.

Request keywords

The supported keywords are:

  • type: the type of data (compulsory, defaults to fc).
  • stream: the forecast system (optional if unambiguous, compulsory otherwise). See the infer_stream_keyword above.
  • date: the date at which the forecast starts.
  • time: the time at which the forecast starts.
  • step: the forecast time step in hours, or fcmonth, the time step in months for the seasonal forecast (compulsory, defaults to 0 and 1, respectively).

and (all optional, with no defaults):

  • param: the meteorological parameters, such as wind, pressure or humidity.
  • levtype: select between single level parameters and parameters on pressure levels.
  • levelist: the list of pressure levels when relevant.
  • number: the list of ensemble member numbers when relevant.

The keywords in the first list are used to identify which file to access, while the second list is used to identify which parts of the files need to be actually downloaded. Some HTTP servers are able to return multiple parts of a file, while others can only return a single part of a file. In the latter case, the library may perform many HTTP requests to the server. If you wish to download whole files, only provide keywords from the first list.
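
As a rough illustration of the two keyword groups, the sketch below splits a request into the part that selects a file and the part that selects fields inside it. The names FILE_KEYWORDS, FIELD_KEYWORDS and split_request are hypothetical and only mirror the two lists above; the library's internal handling may differ.

```python
# Keyword groups copied from the two lists above.
FILE_KEYWORDS = {"type", "stream", "date", "time", "step", "fcmonth"}
FIELD_KEYWORDS = {"param", "levtype", "levelist", "number"}


def split_request(request):
    """Split a request into file-selecting and field-selecting keywords.

    Hypothetical helper for illustration; not part of ecmwf-opendata.
    """
    file_part = {k: v for k, v in request.items() if k in FILE_KEYWORDS}
    field_part = {k: v for k, v in request.items() if k in FIELD_KEYWORDS}
    return file_part, field_part


file_part, field_part = split_request(
    {"time": 0, "type": "fc", "step": 24, "param": ["2t", "msl"]}
)
# An empty field part corresponds to asking for whole files.
whole_file = not field_part
print(file_part, field_part, whole_file)
```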

Date and time

The date and time parameters refer to the starting time of the forecast. All dates and times are expressed in UTC.

There are several ways to specify the date and time in a request.

Date can be specified using strings, numbers and Python datetime.datetime or datetime.date objects:

...
    date='20220125',
    time=12,
...
    date='2022-01-25',
    time=12,
...
    date='2022-01-25 12:00:00',
...
    date=20220125,
    time=12,
...
    date=datetime.datetime(2022, 1, 25, 12, 0, 0),
...
    date=datetime.date(2022, 1, 25),
    time=12,
...

Dates can also be given as a number less than or equal to zero. In this case, it is equivalent to the current UTC date minus the given number of days:

...
    date=0, # today
    date=-1, # yesterday
    date=-2, # the day before yesterday
...
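
The relative-date rule can be sketched as follows. The helper resolve_relative_date is hypothetical and simply adds the non-positive offset to the current UTC date; the today argument exists only so that the example is reproducible.

```python
import datetime


def resolve_relative_date(offset, today=None):
    """Turn a number <= 0 into a calendar date relative to the current UTC date.

    Hypothetical helper; `today` can be injected to make the result reproducible.
    """
    if offset > 0:
        raise ValueError("relative dates must be zero or negative")
    if today is None:
        today = datetime.datetime.now(datetime.timezone.utc).date()
    return today + datetime.timedelta(days=offset)


reference = datetime.date(2022, 1, 25)
print(resolve_relative_date(0, reference))   # 2022-01-25
print(resolve_relative_date(-1, reference))  # 2022-01-24
print(resolve_relative_date(-2, reference))  # 2022-01-23
```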

The keyword time can be given as a string or an integer, or a Python datetime.time object. All values of time below are equivalent:

...
    time=12,
...
    time=1200,
...
    time='12',
...
    time='1200',
...
    time=datetime.time(12),
...

Valid values for time: 0, 6, 12 and 18.
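
One way to see why these spellings are equivalent is to reduce each of them to an hour. The sketch below does that with a hypothetical normalise_time helper; the rule that values of 100 or more are read as HHMM is an assumption made for the example, not the library's own parser.

```python
import datetime

VALID_HOURS = (0, 6, 12, 18)


def normalise_time(value):
    """Return the forecast start hour for any of the accepted spellings of `time`.

    Hypothetical helper for illustration; not the library's own parser.
    """
    if isinstance(value, datetime.time):
        hour = value.hour
    else:
        number = int(value)  # accepts 12, 1200, "12" and "1200"
        # Values of 100 or more are read as HHMM, smaller ones as plain hours.
        hour = number // 100 if number >= 100 else number
    if hour not in VALID_HOURS:
        raise ValueError(f"time must be one of {VALID_HOURS}, got {value!r}")
    return hour


for spelling in (12, 1200, "12", "1200", datetime.time(12)):
    print(repr(spelling), "->", normalise_time(spelling))
```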

If time is not specified, the time is extracted from the date.

...
   date='2022-01-25 12:00:00',
...

is equivalent to:

...
   date='2022-01-25',
   time=12,
...

If the time keyword is specified, it overrides any time given in the date.

...
   date='2022-01-25 12:00:00',
   time=18,
...

is equivalent to:

...
   date='2022-01-25',
   time=18,
...
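
The two rules above (time taken from the date when absent, explicit time winning otherwise) can be sketched as a single function. The helper forecast_start is hypothetical and only handles ISO-style strings, not every date format the library accepts.

```python
import datetime


def forecast_start(date, time=None):
    """Combine `date` and an optional `time` (hour) into one datetime.

    Hypothetical helper; it only handles ISO strings such as '2022-01-25'
    or '2022-01-25 12:00:00', not every format the library accepts.
    """
    start = datetime.datetime.fromisoformat(date)
    if time is not None:
        # An explicit time keyword replaces any time embedded in the date.
        start = start.replace(hour=int(time), minute=0, second=0)
    return start


print(forecast_start("2022-01-25 12:00:00"))      # time taken from the date
print(forecast_start("2022-01-25 12:00:00", 18))  # time keyword overrides it
print(forecast_start("2022-01-25", 18))
```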

As stated before, if date or both date and time are not specified, the library will query the server for the most recent matching data. The date and time of the downloaded forecast is returned by the retrieve() method:

Example without the date keyword:

from ecmwf.opendata import Client

client = Client(source="ecmwf")

result = client.retrieve(
    time=12,
    type="fc",
    param="2t",
    step="24",
    target="data.grib2",
)

print(result.datetime)

will print 2022-01-22 12:00:00 if run in the morning of 2022-01-23.

Example without the date and time keywords:

from ecmwf.opendata import Client

client = Client(source="ecmwf")

result = client.retrieve(
    type="fc",
    param="2t",
    step="24",
    target="data.grib2",
)

print(result.datetime)

will print 2022-01-23 00:00:00 if run in the morning of 2022-01-23.

Stream and type

ECMWF runs several forecasting systems:

  • HRES: High Resolution Forecast.
  • ENS: Ensemble Forecasts.
  • SEAS: Long-Range (Seasonal) Forecast.

Each of these forecasts also produces several types of products, that are referred to using the keywords stream and type.

Valid values for type are:

HRES:

  • fc: Forecast.

ENS:

  • cf: Control forecast.
  • pf: Perturbed forecast.
  • em: Ensemble mean.
  • es: Ensemble standard deviation.
  • ep: Probabilities.

Valid values for stream are:

  • oper: Atmospheric fields from HRES - 00 UTC and 12 UTC.
  • wave: Ocean wave fields from HRES - 00 UTC and 12 UTC.
  • enfo: Atmospheric fields from ENS.
  • waef: Ocean wave fields from ENS.
  • scda: Atmospheric fields from HRES - 06 UTC and 18 UTC.
  • scwv: Ocean wave fields from HRES - 06 UTC and 18 UTC.

📌 NOTE: if the client's flag infer_stream_keyword is set to True, the library will infer the stream from the type and time. In that case, you just need to specify stream=wave to access ocean wave products, and don't provide a value for stream in other cases.
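
The mapping from type and time to stream can be read off the lists above. The sketch below encodes that reading in a hypothetical infer_stream function; it is not the library's implementation, and it deliberately refuses types such as tf that appear with more than one stream in the examples.

```python
ENS_TYPES = {"cf", "pf", "em", "es", "ep"}


def infer_stream(data_type, time, wave=False):
    """Guess the stream from type and time, following the lists above.

    Hypothetical helper; the library's own inference may differ in detail.
    """
    if data_type in ENS_TYPES:
        return "waef" if wave else "enfo"
    if data_type == "fc":
        if time in (0, 12):
            return "wave" if wave else "oper"
        if time in (6, 18):
            return "scwv" if wave else "scda"
    raise ValueError(f"cannot infer stream for type={data_type!r}, time={time!r}")


print(infer_stream("fc", 0))              # oper
print(infer_stream("fc", 6))              # scda
print(infer_stream("pf", 12))             # enfo
print(infer_stream("fc", 18, wave=True))  # scwv
```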

Time steps

To select a time step, use the step keyword:

...
   step=24,
...
   step=[24, 48],
...

Forecasting system | Time | List of time steps
HRES | 00 and 12 | 0 to 144 by 3, 144 to 240 by 6
ENS | 00 and 12 | 0 to 144 by 3, 144 to 360 by 6
HRES | 06 and 18 | 0 to 90 by 3
ENS | 06 and 18 | 0 to 144 by 3
Probabilities - Instantaneous weather events | 00 and 12 | 0 to 360 by 12
Probabilities - Daily weather events | 00 and 12 | 0-24 to 336-360 by 12

📌 NOTE: Not specifying step will return all available time steps.
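
If you prefer to pass the steps explicitly, the HRES and ENS rows of the table can be turned into lists. The helper valid_steps below is a hypothetical sketch based only on that table; check it against the current product schedule before relying on it.

```python
def valid_steps(system, time):
    """List the time steps from the table above for a forecasting system and run time.

    Hypothetical helper; `system` is "HRES" or "ENS" and `time` is 0, 6, 12 or 18.
    """
    if system not in ("HRES", "ENS"):
        raise ValueError(f"unknown forecasting system: {system!r}")
    if time in (0, 12):
        last = 240 if system == "HRES" else 360
        # 0 to 144 by 3, then 144 to `last` by 6 (144 is listed once).
        return list(range(0, 144, 3)) + list(range(144, last + 1, 6))
    if time in (6, 18):
        last = 90 if system == "HRES" else 144
        return list(range(0, last + 1, 3))
    raise ValueError(f"time must be 0, 6, 12 or 18, got {time!r}")


steps = valid_steps("HRES", 0)
print(len(steps), steps[:4], steps[-3:])
```

The resulting list can be passed as step=steps in a request.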

Parameters and levels

To select a parameter, use the param keyword:

...
   param="msl",
...
   param=["2t", "msl"]
...

for pressure level parameters, use the levelist keyword:

...
   param="t",
   levelist=850,
...
   param=["u", "v"],
   levelist=[1000, 850, 500],
...

📌 NOTE: Not specifying levelist will return all available levels, and not specifying param will return all available parameters.

List of pressure levels (hPa)
1000, 925, 850, 700, 500, 300, 250, 200 and 50

Below is the list of all parameters:

Atmospheric fields on pressure levels

Parameter Description Units
d Divergence s-1
gh Geopotential height gpm
q Specific humidity kg kg-1
r Relative humidity %
t Temperature K
u U component of wind m s-1
v V component of wind m s-1
vo Vorticity (relative) s-1

Atmospheric fields on a single level

Parameter Description Units
10u 10 metre U wind component m s-1
10v 10 metre V wind component m s-1
2t 2 metre temperature K
msl Mean sea level pressure Pa
ro Runoff m
skt Skin temperature K
sp Surface pressure Pa
st Soil Temperature K
stl1 Soil temperature level 1 K
tcwv Total column vertically-integrated water vapour kg m-2
tp Total Precipitation m

Ocean waves fields

Parameter Description Units
mp2 Mean zero-crossing wave period s
mwd Mean wave direction Degree true
mwp Mean wave period s
pp1d Peak wave period s
swh Significant height of combined wind waves and swell m

Ensemble mean and standard deviation - pressure levels

Parameter | Description | Units | Levels
gh | Geopotential height | gpm | 300, 500, 1000
t | Temperature | K | 250, 500, 850
ws | Wind speed | m s-1 | 250, 850

Ensemble mean and standard deviation - single level

Parameter Description Units
msl Mean sea level pressure Pa

Instantaneous weather events - atmospheric fields - 850 hPa

Parameter Description Units
ptsa_gt_1p5stdev Probability of temperature standardized anomaly greater than 1.5 standard deviation %
ptsa_gt_1stdev Probability of temperature standardized anomaly greater than 1 standard deviation %
ptsa_gt_2stdev Probability of temperature standardized anomaly greater than 2 standard deviation %
ptsa_lt_1p5stdev Probability of temperature standardized anomaly less than -1.5 standard deviation %
ptsa_lt_1stdev Probability of temperature standardized anomaly less than -1 standard deviation %
ptsa_lt_2stdev Probability of temperature standardized anomaly less than -2 standard deviation %

Daily weather events - atmospheric fields - single level

Parameter Description Units
10fgg10 10 metre wind gust of at least 10 m/s %
10fgg15 10 metre wind gust of at least 15 m/s %
10fgg25 10 metre wind gust of at least 25 m/s %
tpg1 Total precipitation of at least 1 mm %
tpg10 Total precipitation of at least 10 mm %
tpg100 Total precipitation of at least 100 mm %
tpg20 Total precipitation of at least 20 mm %
tpg25 Total precipitation of at least 25 mm %
tpg5 Total precipitation of at least 5 mm %
tpg50 Total precipitation of at least 50 mm %

Instantaneous weather events - ocean waves fields

Parameter Description Units
swhg2 Significant wave height of at least 2 m %
swhg4 Significant wave height of at least 4 m %
swhg6 Significant wave height of at least 6 m %
swhg8 Significant wave height of at least 8 m %

Ensemble numbers

You can select individual members of the ensemble forecast using the keyword number.

...
   stream="enfo",
   step=24,
   param="msl",
   number=1,
...
   stream="enfo",
   step=24,
   param="msl",
   number=[1, 10, 20],
...

Valid ensemble numbers: 1 to 50.

📌 NOTE: Not specifying number will return all ensemble forecast members.

Examples

Download a single surface parameter at a single forecast step from ECMWF's 00UTC HRES forecast

from ecmwf.opendata import Client

client = Client(source="ecmwf")

client.retrieve(
    time=0,
    stream="oper",
    type="fc",
    step=24,
    param="2t",
    target="data.grib2",
)

Download the tropical cyclone tracks from ECMWF's 00UTC HRES forecast

from ecmwf.opendata import Client

client = Client(source="ecmwf")

client.retrieve(
    time=0,
    stream="oper",
    type="tf",
    step=240,
    target="data.bufr",
)
  • The downloaded data are encoded in BUFR edition 4
  • For the HRES Tropical Cyclone tracks at time=06 and time=18 use:
...
   step = 90,
...

NOTE: Tropical cyclone tracks products are only available when there are tropical cyclones observed or forecast.

Download a single surface parameter at a single forecast step for all ensemble members from ECMWF's 00UTC ENS forecast

from ecmwf.opendata import Client

client = Client(source="ecmwf")

client.retrieve(
    time=0,
    stream="enfo",
    type="pf",
    param="msl",
    target="data.grib2",
)
  • To download a single ensemble member, use the number keyword: number=1.
  • To download all of the odd-numbered ensemble members, use number=[num for num in range(1, 51, 2)].
  • To download the control member, use type="cf".

Download the Tropical Cyclone tracks from ECMWF's 00UTC ENS forecast

The Tropical Cyclone tracks are identified by the keyword type="tf".

from ecmwf.opendata import Client

client = Client(source="ecmwf")

client.retrieve(
    time=0,
    stream="enfo",
    type="tf",
    step=240,
    target="data.bufr",
)
  • The downloaded data are encoded in BUFR edition 4
  • For the ENS Tropical Cyclone tracks at time=06 and time=18 replace step=240 with step=144.

Download the ensemble mean and standard deviation for all parameters at a single forecast step from ECMWF's 00UTC ENS forecast

The ensemble mean and standard deviation are identified by the keywords type="em":

from ecmwf.opendata import Client

client = Client(source="ecmwf")

client.retrieve(
    time=0,
    stream="enfo",
    type="em",
    step=24,
    target="data.grib2",
)

and type="es", respectively:

from ecmwf.opendata import Client

client = Client(source="ecmwf")

client.retrieve(
    time=0,
    stream="enfo",
    type="es",
    step=24,
    target="data.grib2",
)

Download the ensemble probability products

The ensemble probability products are identified by the keyword type="ep". The probability products are available only for time=00 and time=12.

Two different products are available.

Probabilities - Instantaneous weather events - Pressure levels

The probabilities of temperature standardized anomalies at a constant pressure level of 850 hPa are available at 12-hourly forecast steps.

from ecmwf.opendata import Client

client = Client(source="ecmwf")

client.retrieve(
    time=0,
    stream="enfo",
    type="ep",
    step=[i for i in range(12, 361, 12)],
    levelist=850,
    param=[
        "ptsa_gt_1stdev",
        "ptsa_gt_1p5stdev",
        "ptsa_gt_2stdev",
        "ptsa_lt_1stdev",
        "ptsa_lt_1p5stdev",
        "ptsa_lt_2stdev",
    ],
    target="data.grib2",
)

Probabilities - Daily weather events - Single level

The probabilities of total precipitation and wind gusts exceeding specified thresholds in a 24-hour period are available for step ranges 0-24 to 336-360 by 12. These are specified in the retrieval request using, e.g.: step=["0-24", "12-36", "24-48"].

from ecmwf.opendata import Client

client = Client(source="ecmwf")

steps = [f"{12 * i}-{12 * i + 24}" for i in range(29)]

client.retrieve(
    time=0,
    stream="enfo",
    type="ep",
    step=steps,
    param=["tpg1", "tpg5", "10fgg10"],
    target="data.grib2",
)

ECMWF open data license

By downloading data from the ECMWF open data dataset, you agree to their terms: Attribution 4.0 International (CC BY 4.0). If you do not agree with such terms, do not download the data. Visit this page for more information.

License

Apache License 2.0. In applying this licence, ECMWF does not waive the privileges and immunities granted to it by virtue of its status as an intergovernmental organisation nor does it submit to any jurisdiction.

ecmwf-opendata's People

Contributors

b8raoult, floriankrb, milanavuckovic, palewire, pdando, tlmquintino

ecmwf-opendata's Issues

Allow "area" keyword ?

Hi ECMWF,

Thanks for releasing opensource data and providing this python package !

I would like to access only data on a particular geographic area. That would :

  • limit bandwidth
  • limit storage on my server

I tried the following, inspired by ecmwf-api-client:

>>> from ecmwf.opendata import Client
>>> client = Client()
>>> r = client.retrieve(step=240, type="fc", param="msl", area="70/-130/30/-60", target="plop.grib2")
MARS post-processing keywords {'area'} not supported

Is there any plan to add the "area" keyword, or any other way to achieve this ?

Stéphane

Extra directory included in Opendata path from 06z on 20240228

The 06z run for 28th Feb 2024 has introduced an extra directory level in the path to the ECMWF data files which breaks client.py (and possibly other scripts). client.py is unable to locate data files as a result.

The new structure is:

https://data.ecmwf.int/forecasts/20240228/06z/ifs/
or
https://data.ecmwf.int/forecasts/20240228/06z/aifs/

As a quick fix, assuming you want to use IFS data, you can modify the value of HOURLY_PATTERN in the client.py file, i.e.

HOURLY_PATTERN = (
"{_url}/{_yyyymmdd}/{_H}z/ifs/{resol}/{_stream}/"
"{_yyyymmddHHMMSS}-{step}h-{_stream}-{type}.{_extension}"
)
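
To check what the modified pattern produces, it can be filled in with str.format. The placeholder values below are assumptions chosen to resemble a 06z HRES request; the library derives them internally from the request, and the resulting URL is not guaranteed to exist on the server.

```python
HOURLY_PATTERN = (
    "{_url}/{_yyyymmdd}/{_H}z/ifs/{resol}/{_stream}/"
    "{_yyyymmddHHMMSS}-{step}h-{_stream}-{type}.{_extension}"
)

# Placeholder values are assumptions for illustration only.
url = HOURLY_PATTERN.format(
    _url="https://data.ecmwf.int/forecasts",
    _yyyymmdd="20240228",
    _H="06",
    resol="0p25",
    _stream="scda",
    _yyyymmddHHMMSS="20240228060000",
    step=0,
    type="fc",
    _extension="grib2",
)
print(url)
```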

Not specifying step will not return all available steps

According to the documentation, not specifying step should return all available time steps, however if I try to do so I only get back the first time step 0

import xarray as xr
from ecmwf.opendata import Client


def get_ECMWF_open_data_grib_ensemble(target, stream, param, number):
    try:
        client = Client(source="ecmwf")
        # No step keyword is given, so all time steps are expected.
        downloaded_file = client.retrieve(
            target=target,
            stream=stream,
            param=param,
            number=number,
        )
        return downloaded_file

    except Exception:  # a bare except would also swallow KeyboardInterrupt
        print('The ECMWF service is not currently available, please try again later')

grib_stream = 'enfo'
grib_format = 'grib2'
grib_target = f'aaa_{grib_stream}.{grib_format}'
grib_param = ["10u", "10v"]
grib_number = list(range(1, 51))  # ensemble members 1 to 50
grib_file = get_ECMWF_open_data_grib_ensemble(grib_target, grib_stream, grib_param, grib_number)

ds = xr.open_mfdataset(grib_target, engine='cfgrib', parallel=True, chunks={'step': 3, 'number': 3,},
                               backend_kwargs={'filter_by_keys': {'typeOfLevel': 'heightAboveGround', 'topLevel': 10}, })
print(ds)

ensemble = 11 #21
mask = (ds.number.values == ensemble)
xarr = ds.sel(number=mask).squeeze(drop=True)
print('xarr')
print(xarr)

print(xarr.coords['step'].values)

Evaporation

Dear All,
Is the evaporation at the earth surface available in the ecmwf-opendata database?
Thank you

latest() function fails between UTC 0:00 and 1:12

I'm in UTC+8, so this morning at 9:00 when I use the following code to get the latest ecmwf forecast run,

from ecmwf.opendata import Client
latest_time = Client().latest(type='fc')

I get an error message:

Cannot establish latest date for {'date': ['2024-08-01 18:00:00'], 'model': ['ifs'], 'resol': ['0p25'], 'stream': ['oper'], 'type': ['fc'], 'step': ['0'], '_url': ['https://data.ecmwf.int/forecasts']}

I took a look at client.py and the full_date function in date.py. I think the reason is that the search only goes back to the last 18z run instead of the last 12z run. Why not search back to yesterday's 12z? The function returns as soon as a URL gets a positive response, so adding another time to the search won't do any harm. The ECMWF forecasts come out at UTC 7:55, 13:12, 19:55 and 1:12 for the 00/06/12/18 runs.
Maybe change the line in client.py to stop = date - datetime.timedelta(days=1, hours=12), or use date >= stop?

def latest(self, request=None, **kwargs):
       if request is None:
           params = dict(**kwargs)
       else:
           params = dict(**request)
       if "time" not in params:
           delta = datetime.timedelta(hours=6)
       else:
           delta = datetime.timedelta(days=1)
       date = full_date(0, params.get("time", 18))
       stop = date - datetime.timedelta(days=1, hours=6)
       while date > stop:
           result = self._get_urls(
               request=None,
               use_index=False,
               date=date,
               **params,
           )
           codes = [
               robust(self.session.head)(url, verify=self.verify).status_code
               for url in result.urls
           ]
           if len(codes) > 0 and all(c == 200 for c in codes):
               return date
           date -= delta
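
The effect of the proposed change can be simulated without any network access. The sketch below is a simplified restatement of the loop above: find_latest and is_available are hypothetical names, the availability check stands in for the HTTP HEAD requests, the release delays are derived from the times quoted in this issue, and the start value assumes that full_date(0, 18) yields today's 18:00 UTC.

```python
import datetime

# Approximate release delays per run hour, from the times quoted above
# (00z at 07:55, 06z at 13:12, 12z at 19:55, 18z at 01:12 the next day).
RELEASE_DELAY = {
    0: datetime.timedelta(hours=7, minutes=55),
    6: datetime.timedelta(hours=7, minutes=12),
    12: datetime.timedelta(hours=7, minutes=55),
    18: datetime.timedelta(hours=7, minutes=12),
}


def find_latest(start, is_available, lookback, delta=datetime.timedelta(hours=6)):
    """Walk back from `start` in steps of `delta` until a run is available."""
    date = start
    stop = start - lookback
    while date > stop:
        if is_available(date):
            return date
        date -= delta
    return None


now = datetime.datetime(2024, 8, 2, 1, 0)     # 09:00 in UTC+8
start = datetime.datetime(2024, 8, 2, 18, 0)  # assumed result of full_date(0, 18)


def is_available(run):
    # Stand-in for the HEAD checks: a run exists once its delay has elapsed.
    return run + RELEASE_DELAY[run.hour] <= now


current = find_latest(start, is_available, datetime.timedelta(days=1, hours=6))
proposed = find_latest(start, is_available, datetime.timedelta(days=1, hours=12))
print(current, proposed)
```

Under these assumptions the current look-back window finds nothing at 01:00 UTC, while the wider window returns the 12z run of the previous day.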

Reproject aifs to a regular lat/lon grid

I am trying to reproject the reduced-gaussian aifs grib file to a regular 0.25° lat/lon grid.
I have tried 3 approaches, one using nearest neighbor and 2 others that should be better but both failed.

The first one using nearest neighbor :
cdo remapnn,target_grid_world_025 input_file output_file
That works but is it really appropriate?

The second one with bilinear interpolation:
cdo remapbil,target_grid_world_025.txt input_file output_file
=> cdo remapbil: Bilinear weights from gaussian_reduced (542080) to lonlat (1440x721) grid
=> cdo remapbil (Abort): Can't do bilinear interpolation if the source grid is not a regular 2D grid!

The third one using first-order conservative remapping:
cdo remapycon,target_grid_world_025.txt input_file output_file
=> cdo remapcon (Abort): Source grid cell corner coordinates missing!

Any idea as to how I should correctly reproject the aifs to a regular lat/lon grid?

Historic data availability

Hi! I wanted to know what is the historic availability for the open-data.

On the 2022-01-25 I managed to fetch data for that day. But today, 2022-01-31, data for that date is no longer available.

Accessing by FTP, the only data available is the past ~4 days. Trying to fetch data specifying a date before ~4 days ago, I receive a 404 from the API.

Is there a restriction for the open-data historic availability?

If there is a limitation, it would be nice to document it, I couldn't find this information anywhere.

best regards!

Links in README are broken

image

All of these links are now not found.
Sorry I don't have enough experience with the platform to submit corrected links.

Possibility for subdomain specification in request

Dear ECMWF,

Would it be possible to specify a subdomain in the request? I'm interested in ensemble data for the EU but have to download the whole domain at the moment.

Thanks in advance!

With regards, Teun

Find latest run with all timesteps

I know that the latest() function returns the latest run for a specific setting, but this considers all runs, including those that still don't have all files online.
Is there a way to find the latest complete run with all steps present on the server?
I tried to specify step in the request but that didn't work.

Members Scrambled

I noticed that when I download both control and perturbed members with this, the members seem to get scrambled. For example, I run

client.retrieve(
    date=20230127,
    time=6,
    step=72,
    stream="enfo",
    type=["cf", "pf"],
    param='tp',
    target="tmp.grib2",
)
fieldset = ecdata.read("tmp.grib2")

and loop through forecast times, when I plot total precipitation, something like box and whiskers look good, but the data for each member over time does not increase or stay the same (in many cases, decreases). This is very noticeable with total precipitation, but would be much harder to know for something like temperature or height. This happens for every cycle I try. Not sure that it occurs if I give the entire list of forecast hours in one request, but that would result in a large file, and pretty crazy memory usage if done that way. When I spot check members with the multiple single forecast hour request, it shows the same order ([0,1,10,11,12,13,14,15,16,17,18,19,2,20,21,22,23,24,25,26,27,28,29,3,30,31,32,33,34,35,36,37,38,39,4,40,41,42,43,44,45,46,47,48,49,5,50,6,7,8,9]), so I am puzzled as to what leads to this.
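
The member order quoted above matches what a lexicographic (string) sort of the numbers 0 to 50 produces. The sketch below checks that and shows how sorting on the numeric member value gives a stable order. Whether a string sort in the download or decoding path is the actual cause is an assumption, and the field dictionaries are hypothetical stand-ins for decoded GRIB messages.

```python
reported_order = [
    0, 1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
    2, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
    3, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
    4, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
    5, 50, 6, 7, 8, 9,
]

# Sorting the member numbers as strings reproduces the reported order.
string_sorted = [int(s) for s in sorted(str(m) for m in range(51))]
print(string_sorted == reported_order)

# Hypothetical decoded fields: sort on the integer member number instead.
fields = [{"number": n, "step": 72} for n in reported_order]
fields_by_member = sorted(fields, key=lambda f: int(f["number"]))
print([f["number"] for f in fields_by_member][:5])
```

If the files for different steps are each re-ordered this way before comparison, member k at one step lines up with member k at the next.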

AI version of ECMWF - (model = "aifs") doesn't seem to be downloading

Version 0.3.3 works fine for downloading model="ifs", but after changing the model to model="aifs" it seems to download the same data as the "ifs" version.

I've tried with infer_stream_keyword set to True and also to False, with no difference.

from ecmwf.opendata import Client

client = Client(
    source="ecmwf",
    model="aifs",
    resol="0p25",
    preserve_request_order=False,
    infer_stream_keyword=False,
)
from ecmwf.opendata import Client

client = Client(source="ecmwf")

request = {
    "date": 20240309,
    "time": 12,
    "type": "fc",
    "step": [3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60],
    "param": ["2t", "msl", "tp", "10u", "10v", "tcwv"],
}

client.retrieve(request, "surface1.grib2")

Feedback would be appreciated. Thanks so much!

support 0p25 data

The 0p25 data was published on January 31, 2024. The current version does not seem to support this data.
Client(...
beta=True,
...
)
beta is a boolean that indicates whether to access the beta or the production version of the dataset. Currently only beta=True is supported.

Specifying output Resolution

Hi,
I installed ecmwf-opendata very recently. After working with it for a while without any significant issue, I found on the ECMWF page that it is possible to pass the output resolution to the script. Since I wanted the HRES resolution (0.1 ~ 9 km), I tried to pass that value to the script, but it failed with "URL not found". I checked the open data directory (https://data.ecmwf.int/forecasts/20220608/00z/0p4-beta/oper/) and, of course, there is not yet a way to request it.
So my questions are:
Is the keyword resol supported by the server?
If yes, is there a way to use it, and how? Does that mean the data will be interpolated, or is it the raw data itself?
Thanks in advance for any answers

Missing import files

I have downloaded the ecmwf-open data files. No clear doc on what to do next. I have tried executing some of the individual files, but always receive missing import files. Seems to be a missing library of some kind? Any suggestions would be appreciated.

Corrupted index files for tp param

Hello,

Some .index files for the ensemble models (that I've checked) have the param tp "corrupted". It is shown as "unknown".

Example:
{"domain": "g", "date": "20220222", "time": "0000", "expver": "0001", "class": "od", "type": "pf", "stream": "enfo", "levtype": "sfc", "number": "28", "step": "12", "param": "unknown", "_offset": 36341019, "_length": 812043}

When specifying this param in the API, I get:

Did you mean 'sp' instead of 'tp'?

The problem is only on the index file. Passing "unknown" as a param, the grib file returns with the tp variable.
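
For context, each line of an index file is a JSON object like the one quoted above. The sketch below parses that line and builds the HTTP Range header that would fetch just this field, assuming _offset and _length are the byte offset and byte length of the field inside the data file. The helper byte_range_header is hypothetical, and whether the library issues exactly this header is not confirmed here.

```python
import json

index_line = (
    '{"domain": "g", "date": "20220222", "time": "0000", "expver": "0001", '
    '"class": "od", "type": "pf", "stream": "enfo", "levtype": "sfc", '
    '"number": "28", "step": "12", "param": "unknown", '
    '"_offset": 36341019, "_length": 812043}'
)

entry = json.loads(index_line)


def byte_range_header(entry):
    """Build an HTTP Range header covering one field of the data file."""
    first = entry["_offset"]
    last = first + entry["_length"] - 1  # Range end positions are inclusive
    return {"Range": f"bytes={first}-{last}"}


print(entry["param"], byte_range_header(entry))
```

Because the selection is made by matching keywords such as param against these entries, an entry labelled "unknown" cannot be found when asking for tp, which is consistent with the behaviour described above.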

best regards

Where is 2t ?

Dear Developers,

When I download the latest open data with the script below (adapted from the README.md of this repo), there is no variable 2t (and variable 10u is u10 and 10v is v10, not as documented in this repo). Is it missing in the open data or have I made a mistake?

Variables I see are:

['v', 'u', 'skt', 'gh', 'r', 'st', 't', 'tp', 'msl', 'sp', 'v10', 'tcwv', 'u10', 'q', 'vo', 'd', 'ro', 'time', 'step', 'isobaricInhPa', 'latitude', 'longitude', 'valid_time', 'surface', 'depthBelowLandLayer', 'meanSea', 'heightAboveGround', 'entireAtmosphere']

If I remove the beta=True in the script below, I get even fewer variables.

terveisin, Markus

from ecmwf.opendata import Client
import xarray as xr
import matplotlib.pyplot as plt

client = Client("ecmwf", beta=True)

filename_osuite = "open_data_operational.grib"

dl = client.retrieve(
        step=[0],
        stream="oper",
        type="fc",
        target=filename_osuite,
)
print(dl.datetime)

ds = xr.open_dataset(filename_osuite, engine="cfgrib")

# Get the names of all variables
variable_names = list(ds.variables.keys())

# Print the variable names
print("Variable Names:", variable_names)

Pulls from S3 bucket rather than from AWS URL

Hello there folks,

Was making use of this opendata, to get that new 0.25 degree data. I noticed something that I would like investigate.

As it stands now, the tools I see, namely 'client.py', pull data from a URL. For example, something like:
wget https://ecmwf-forecasts.s3.eu-central-1.amazonaws.com/20240227/12z/0p25/enfo/20240227120000-0h-enfo-ef.index

I believe that, from an amazon AWS EC2 instance, this would be a faster pull mechanism:
aws s3 cp --no-sign-request s3://ecmwf-forecasts/20240227/12z/0p25/enfo/20240227120000-0h-enfo-ef.index .

Those are command-line steps, of course; inside client.py and the like, different tools would be needed. My description above was simply to show the difference between a pull via HTTPS and a pull via AWS S3.

Any chance of adding capability to pull from the S3 bucket (and thus, the AWS 'backbone') into an AWS EC2 instance, rather than HTTP?
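
To make the relationship between the two commands above concrete, here is a minimal sketch that turns the HTTPS form of an object URL into the `s3://` form. It assumes the virtual-hosted URL style shown in the `wget` example; `https_to_s3_uri` is a hypothetical helper and not something the package provides.

```python
from urllib.parse import urlparse


def https_to_s3_uri(url):
    """Convert 'https://<bucket>.s3.<region>.amazonaws.com/<key>' to 's3://<bucket>/<key>'."""
    parsed = urlparse(url)
    host = parsed.netloc
    if ".s3." not in host or not host.endswith(".amazonaws.com"):
        raise ValueError(f"Not a virtual-hosted S3 URL: {url}")
    # Everything before the first '.s3.' is the bucket name.
    bucket = host.split(".s3.", 1)[0]
    return f"s3://{bucket}{parsed.path}"


url = (
    "https://ecmwf-forecasts.s3.eu-central-1.amazonaws.com/"
    "20240227/12z/0p25/enfo/20240227120000-0h-enfo-ef.index"
)
print(https_to_s3_uri(url))
```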

Regards,
Brian E.

Change of file paths - Client.retrieve does not work anymore

Hello, it appears that ECMWF recently changed the file paths by adding new folders, so the "client.retrieve" function is not able to find the correct files.

Previous path (example)
https://data.ecmwf.int/forecasts/20240425/00z/0p4-beta/enfo/20240425000000-174h-enfo-ef.index

New path: (with new IFS folder)
https://data.ecmwf.int/forecasts/20240425/00z/ifs/0p4-beta/enfo/20240425000000-174h-enfo-ef.index

This change was made around 23 April 2024.
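
For scripts that build these URLs by hand, the difference between the two layouts can be sketched as below. This only illustrates the path change described above and is not how the package itself resolves paths; `add_model_folder` is a hypothetical helper, and it assumes the run folder always has the form `<HH>z`.

```python
def add_model_folder(url, model="ifs"):
    """Insert the model folder right after the '<HH>z' run folder, if missing."""
    parts = url.split("/")
    for i, part in enumerate(parts):
        is_run_folder = len(part) == 3 and part[:2].isdigit() and part.endswith("z")
        if is_run_folder:
            if i + 1 < len(parts) and parts[i + 1] == model:
                return url  # already in the new layout
            return "/".join(parts[: i + 1] + [model] + parts[i + 1 :])
    raise ValueError(f"No run folder such as '00z' found in: {url}")


old = "https://data.ecmwf.int/forecasts/20240425/00z/0p4-beta/enfo/20240425000000-174h-enfo-ef.index"
print(add_model_folder(old))
```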

Request downloads all grib fields

Attempting to download a single parameter (MSLP) triggers a download of more than 800 MB of data, which eventually fails. Looking at the incomplete GRIB files, it seems that all parameters are in there.

from ecmwf.opendata import Client
import ecmwf.data as ecdata
import dotenv
import xarray as xr

dotenv.load_dotenv()

client = Client("ecmwf", beta=True)

parameters = 'msl'
em_filename = 'medium-mslp-mean-spread_em.grib'

client.retrieve(
    date=-2,
    time=0,
    step=12,
    stream="enfo",
    type="em",
    levtype="sfc",
    param='msl',
    target = em_filename
)

This triggers a download that fails at about 800 MB:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-9fc6f715bf9a> in <module>
     19     levtype="sfc",
     20     param='msl',
---> 21     target = em_filename
     22 )

~\AppData\Local\Continuum\anaconda3\envs\analytics\lib\site-packages\ecmwf\opendata\client.py in retrieve(self, request, target, **kwargs)
138     def retrieve(self, request=None, target=None, **kwargs):
139         result = self._get_urls(request, target=target, use_index=True, **kwargs)
--> 140         result.size = download(result.urls, target=result.target)
141         return result
142 

~\AppData\Local\Continuum\anaconda3\envs\analytics\lib\site-packages\multiurl\downloader.py in download(url, target, **kwargs)
107 
108 def download(url, target, **kwargs):
--> 109     return Downloader(url, **kwargs).download(target)

~\AppData\Local\Continuum\anaconda3\envs\analytics\lib\site-packages\multiurl\base.py in download(self, target)
131             assert (
132                 os.path.getsize(download) == size
--> 133             ), f"File size mismatch {os.path.getsize(download)} bytes instead of {size}"
134 
135         if download != target:

AssertionError: File size mismatch 834454190 bytes instead of 609068

Resulting grib file:

msl_em = ecdata.read(em_filename)
msl_em.describe()

image
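
The mismatch in the traceback (834454190 bytes received when 609068 were expected) is what one would see if the whole file came back instead of the requested byte range. One way that can happen in general is a server or proxy ignoring the HTTP `Range` header and answering `200 OK` with the full body instead of `206 Partial Content`. Whether that is the cause here is an assumption. The sketch below only illustrates the check; `range_header` and `classify_range_response` are made-up helpers, not part of `multiurl`.

```python
def range_header(offset, length):
    """Build the HTTP Range header for `length` bytes starting at `offset`."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}


def classify_range_response(status_code, received_bytes, expected_bytes):
    """Classify the outcome of a single-range request."""
    if status_code == 206 and received_bytes == expected_bytes:
        return "ok"
    if status_code == 200 and received_bytes > expected_bytes:
        # The Range header was not honoured and the full file was sent.
        return "range ignored"
    return "unexpected"


# Sizes taken from the AssertionError above; the 200 status is an assumption.
print(classify_range_response(200, 834454190, 609068))
```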

Radiation variables unavailable?

Variables like surface net solar radiation (ssr) don't seem to be available, even though they are listed in the product description from ECMWF.

Is this an issue on their side?

ECMWF open data website lists 13 pressure levels for high-resolution products, so why does the FTP contain only 9 levels for u, v, r, t?

On the website: https://www.ecmwf.int/en/forecasts/datasets/open-data, it says:

High-resolution products:

Single and Pressure Levels (hPa): 1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 50

But on the FTP, in both of these directories:

https://data.ecmwf.int/forecasts/20240301/12z/ifs/0p4-beta/oper/

https://data.ecmwf.int/forecasts/20240301/12z/ifs/0p25/oper/

I can only get 9 pressure levels for u,v,r,t variables.

Would you please explain why?

Thank you!
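
One way to check which levels are actually published is to read the `.index` files in those directories and collect the levels per parameter. The sketch below assumes that pressure-level entries carry `levtype` set to `pl` and a `levelist` key; the sample entries are hypothetical and do not reflect the real contents of the server. `levels_by_param` and `missing_levels` are illustrative names.

```python
# The 13 levels listed on the website, as quoted above.
DOCUMENTED_LEVELS = [1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100, 50]


def levels_by_param(entries):
    """Map each pressure-level parameter to the set of levels found for it."""
    levels = {}
    for entry in entries:
        if entry.get("levtype") == "pl":
            levels.setdefault(entry["param"], set()).add(int(entry["levelist"]))
    return levels


def missing_levels(found, documented=DOCUMENTED_LEVELS):
    """Return the documented levels that are absent from `found`."""
    return [level for level in documented if level not in found]


# Hypothetical index entries, for illustration only.
sample_entries = [
    {"levtype": "pl", "param": "t", "levelist": "1000"},
    {"levtype": "pl", "param": "t", "levelist": "850"},
    {"levtype": "sfc", "param": "msl"},
]

found = levels_by_param(sample_entries)
print(missing_levels(found["t"]))
```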

ECMWF Order 10250

Please note that comments on this platform are not monitored by ECMWF. The open data is provided on a best effort basis, without further support from ECMWF. However, users are free to discuss their experiences and issues related to the software tool, and propose clean merge requests with appropriate testing and documentation. If you do require support or assistance, we encourage you to create a support request in the ECMWF Support Portal. Thank you for your understanding and for your contribution to our community.

Error after changing 0p4 to 0p25

After changing resol=0p4-beta to resol=0p25, the following error occurred:
urllib3.exceptions.IncompleteRead: IncompleteRead(2947216 bytes read, 388822 more expected)
urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(2947216 bytes read, 388822 more expected)', IncompleteRead(2947216 bytes read, 388822 more expected))
requests.exceptions.ChunkedEncodingError: ('Connection broken: IncompleteRead(2947216 bytes read, 388822 more expected)', IncompleteRead(2947216 bytes read, 388822 more expected))
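
An `IncompleteRead` means the connection was cut before the full response arrived, so a common general-purpose mitigation is to retry the download a few times. The sketch below is a generic retry helper and not a feature of the package; `retry` and `flaky` are hypothetical names, and the built-in `ConnectionError` stands in for the `requests` exception shown above.

```python
import time


def retry(func, attempts=3, delay=1.0, exceptions=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds, waiting a little longer after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # give up after the last attempt
            sleep(delay * attempt)


# Stand-in for a download that fails twice before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Connection broken")
    return "done"


result = retry(flaky, attempts=5, delay=0, exceptions=(ConnectionError,))
print(result, calls["n"])
```

In a real script the wrapped call would be something like `lambda: client.retrieve(request, target)`, with `exceptions=(requests.exceptions.ChunkedEncodingError,)`.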

Cumulative data becomes smaller

Hello, I have an important question.
I have ERA5-Land weather data. The data there are always accumulated within a day, which means the values should keep growing over the course of the day. But sometimes they do not: the accumulated values become smaller again within the same day. Why is that? Is that okay? And if so, where can I find an explanation of why this is okay?

possible data corruption when retrieving data for a large number of steps

I think there might be a problem with the data that is downloaded when a large number of steps are passed.

In this example I am passing all steps from 0h to 240h for all ensemble numbers from 1 to 50 for the parameters 10u and 10v.
I open the file with xarray, filter the array by one ensemble member, and loop through every single step, printing all the values for one of the variables.

The code fails with the error reported below. However, if I divide the list of steps into two halves and process them separately, the code runs correctly, which makes me think there might be an issue with the data when a large number of steps is retrieved in one go.

Can you replicate this behaviour?

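import sys

import numpy as np
import pandas as pd
import xarray as xr
from ecmwf.opendata import Client

# Print full arrays instead of truncated summaries.
np.set_printoptions(threshold=sys.maxsize)
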
def get_ECMWF_open_data_grib_ensemble(target, stream, step, param, number):
    try:
        client = Client(source="ecmwf")
        downloaded_file = client.retrieve(
            # time=0,
            target = target,
            stream=stream,
            # type=type,
            step=step,
            param=param,
            number=number,
        )
        return downloaded_file

    except Exception:
        print('The ECMWF service is not currently available, please try again later')

grib_stream = 'enfo'
# grib_type = 'fc'
grib_format = 'grib2'
grib_target = f'aaa_{grib_stream}.{grib_format}'
step = [0, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99, 102, 105, 108, 111, 114, 117, 120, 123, 126, 129, 132, 135, 138, 141, 144, 150, 156, 162, 168, 174, 180, 186, 192, 198, 204, 210, 216, 222, 228, 234, 240]
grib_param = ["10u", "10v"]
grib_number = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
grib_file = get_ECMWF_open_data_grib_ensemble(grib_target, grib_stream, step, grib_param, grib_number)

ds = xr.open_mfdataset(grib_target, engine='cfgrib', parallel=True, chunks={'step': 3, 'number': 3,},
                               backend_kwargs={'filter_by_keys': {'typeOfLevel': 'heightAboveGround', 'topLevel': 10}, })
print(ds)


ensemble = 11 #21
mask = (ds.number.values == ensemble)
xarr = ds.sel(number=mask).squeeze(drop=True)
print('xarr')
print(xarr)

print(xarr.coords['step'].values)

for number in step:
    # number = 0
    h = str(number) + 'hours'
    timedelta = pd.Timedelta(h)
    res = timedelta.to_timedelta64()
    print(res)


    new_da = xarr.where(xarr.step == res, drop=True)
    print('new_da')
    print(new_da)

    print(new_da['u10'].values)
    print(new_da['u10'].size)

error:

Traceback (most recent call last):
  File "C:\Users\Giacomo\PycharmProjects\HIT-v3\test.py", line 378, in <module>
    print(new_da['u10'].values)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\xarray\core\dataarray.py", line 732, in values
    return self.variable.values
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\xarray\core\variable.py", line 614, in values
    return _as_array_or_item(self._data)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\xarray\core\variable.py", line 314, in _as_array_or_item
    data = np.asarray(data)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\array\core.py", line 1701, in __array__
    x = self.compute()
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\base.py", line 310, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\base.py", line 595, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\threaded.py", line 89, in get
    results = get_async(
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\local.py", line 511, in get_async
    raise_exception(exc, tb)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\local.py", line 319, in reraise
    raise exc
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\local.py", line 224, in execute_task
    result = _execute_task(task, data)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\core.py", line 121, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\core.py", line 121, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\core.py", line 121, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\optimization.py", line 992, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\core.py", line 151, in get
    result = _execute_task(task, cache)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\dask\array\core.py", line 126, in getter
    c = np.asarray(c)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\xarray\core\indexing.py", line 484, in __array__
    return np.asarray(self.get_duck_array(), dtype=dtype)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\xarray\core\indexing.py", line 487, in get_duck_array
    return self.array.get_duck_array()
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\xarray\core\indexing.py", line 664, in get_duck_array
    return self.array.get_duck_array()
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\xarray\core\indexing.py", line 551, in get_duck_array
    array = self.array[self.key]
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\cfgrib\xarray_plugin.py", line 155, in __getitem__
    return xr.core.indexing.explicit_indexing_adapter(
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\xarray\core\indexing.py", line 858, in explicit_indexing_adapter
    result = raw_indexing_method(raw_key.tuple)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\cfgrib\xarray_plugin.py", line 164, in _getitem
    return self.array[key]
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\cfgrib\dataset.py", line 358, in __getitem__
    message = self.index.get_field(message_ids[0])  # type: ignore
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\cfgrib\messages.py", line 484, in get_field
    return ComputedKeysAdapter(self.fieldset[message_id], self.computed_keys)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\cfgrib\messages.py", line 344, in __getitem__
    return self.message_from_file(file, offset=item)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\cfgrib\messages.py", line 340, in message_from_file
    return Message.from_file(file, offset, **kwargs)
  File "C:\Users\Giacomo\anaconda3\envs\HIT-v3\lib\site-packages\cfgrib\messages.py", line 93, in from_file
    file.seek(offset)
OSError: [Errno 22] Invalid argument
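
The workaround mentioned above, splitting the step list and handling each part separately, can be sketched as follows. This does not explain the `OSError`; it only automates the split. `chunked` and `retrieve_in_parts` are hypothetical helpers, the `part_<n>.grib2` file names are made up, and `client` is expected to be an `ecmwf.opendata.Client` instance.

```python
def chunked(values, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [values[i : i + size] for i in range(0, len(values), size)]


def retrieve_in_parts(client, steps, size, **request):
    """Download each group of steps into its own file and return the file names."""
    targets = []
    for index, part in enumerate(chunked(steps, size)):
        target = f"part_{index}.grib2"  # hypothetical naming scheme
        client.retrieve(step=part, target=target, **request)
        targets.append(target)
    return targets


# Same steps as in the script above: 3-hourly to 144 h, then 6-hourly to 240 h.
steps = list(range(0, 145, 3)) + list(range(150, 241, 6))
parts = chunked(steps, 33)
print([len(part) for part in parts])
```

Each resulting file can then be opened and checked on its own, in the same way as the single large file.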

Incorrect unit for TP

Hello!

I think that the provided unit for total precipitation (kg m-2) on the README is incorrect. Loading the data I got really low values as the maximum.

To corroborate that, I found a notebook example where TP is given in metres, not kg m-2 (mm).

The documentation could also provide the "validity" of the data, as is done for the S2S dataset:

total precipitation: accumulated from the beginning of the forecast
surface_air_maximum_temperature: maximum determined over the last 6 hours

This kind of specification really helped me, as a newcomer to the "meteorology world".

Thanks for the work, this is really helping me!

best regards!
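
If the values are indeed in metres, as the post suggests, converting them is a multiplication by 1000, since 1 kg m-2 of liquid water corresponds to a depth of 1 mm. The sketch below also shows how a series accumulated from the beginning of the forecast can be turned into per-interval amounts. The input values are made up for illustration, and `metres_to_mm` and `deaccumulate` are hypothetical helper names.

```python
def metres_to_mm(value_m):
    """Convert a water depth in metres to millimetres (numerically kg m-2)."""
    return value_m * 1000.0


def deaccumulate(accumulated):
    """Turn totals accumulated since forecast start into per-interval amounts."""
    previous = [0.0] + accumulated[:-1]
    return [current - before for before, current in zip(previous, accumulated)]


# Made-up accumulated totals in metres at three consecutive steps.
tp_m = [0.0, 0.001, 0.0035]
tp_mm = [metres_to_mm(value) for value in tp_m]
print(tp_mm)
print(deaccumulate(tp_mm))
```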

Requesting "aifs" version appears to download the "ifs" version using ECMWF-OPENDATA 0.3.8

Not a new issue with 0.3.8, but it seems that requesting the "aifs" version downloads a version of the ECMWF that appears to be the same as the "ifs" version of the model. Here's a sample of the request I'm using:

from ecmwf.opendata import Client
time.sleep(1)
client = Client(
    source="ecmwf",
    model="aifs",
    resol="0p25",
    preserve_request_order=False,
    infer_stream_keyword=False,
)

from ecmwf.opendata import Client

client = Client(source="ecmwf")

request = {
    "date": 20240602,
    "time": 6,
    "step": [66,72,78,84,90],
    "param": ["r", "t"],
    "levelist": [925, 850, 700, 500,]
}

client.retrieve(request, "upper4.grib2")



Am I making the request correctly, or is there an issue with the Python script not properly requesting the aifs version? (I've tried numerous different levels and parameters, all resulting in a downloaded grib2 file that is indistinguishable from the "ifs" version.)
I've also tried infer_stream_keyword=True.

Any help would be appreciated. Thank you.
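
One thing that stands out in the snippet as posted: the second `client = Client(source="ecmwf")` replaces the client that was built with `model="aifs"`, and the README gives `ifs` as the default model, so the request would go to the IFS data. A minimal sketch of keeping a single client is below. `retrieve_aifs` is a hypothetical wrapper, and whether AIFS actually provides every parameter and level in this request has not been checked.

```python
# Options for the one and only client; model="aifs" must not be overwritten later.
AIFS_OPTIONS = {"source": "ecmwf", "model": "aifs", "resol": "0p25"}

# Request copied from the post above.
REQUEST = {
    "date": 20240602,
    "time": 6,
    "step": [66, 72, 78, 84, 90],
    "param": ["r", "t"],
    "levelist": [925, 850, 700, 500],
}


def retrieve_aifs(target, options=AIFS_OPTIONS, request=REQUEST):
    """Create a single AIFS client and run the request with it."""
    from ecmwf.opendata import Client  # requires the ecmwf-opendata package

    client = Client(**options)
    return client.retrieve(request, target)


print(AIFS_OPTIONS["model"])
```

Calling `retrieve_aifs("upper4.grib2")` would then use the AIFS client for the download.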

Client.latest cannot establish latest date

Hello,

I've recently started using ecmwf-opendata and have a problem when I try to automate the download.

It says: "Cannot establish latest date for {'date': ['2022-07-07 18:00:00'], 'resol': ['0p4-beta'], 'stream': ['oper'], 'type': ['fc'], 'step': ['0'], 'url': ['https://data.ecmwf.int/forecasts']}"
---> 26 if client.latest().strftime("%Y-%m-%d_%H") == date:

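import time
from os import listdir

# `client`, `request` and `path` are defined earlier in the script (not shown).
while True:
    files = [f for f in listdir(path) if f.endswith(".grib2")]
    if not files:
        # Nothing downloaded yet: fetch the latest available run.
        date = client.latest().strftime("%Y-%m-%d_%H")
        filename = path + "{}.grib2".format(date)
        client.retrieve(request, filename)
    else:
        # Date stamp taken from the name of the last listed file.
        date = str(files[-1])[-20:-6]
        if client.latest().strftime("%Y-%m-%d_%H") == date:  #### 26
            pass
        else:
            date = client.latest().strftime("%Y-%m-%d_%H")
            filename = path + "{}.grib2".format(date)
            client.retrieve(request, filename)
    time.sleep(60 * 60)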
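
Separately from the error itself, the loop above depends on the order returned by `listdir`, which is arbitrary, and on slicing the file name. A more robust way to decide whether a new run needs downloading is sketched below. It does not address the "Cannot establish latest date" message; `stamp_from_filename` and `needs_download` are hypothetical helpers, and the file names are assumed to follow the `%Y-%m-%d_%H.grib2` pattern used in the script.

```python
import os
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d_%H"


def stamp_from_filename(filename):
    """Parse the run date and hour out of a name like '2022-07-07_18.grib2'."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    return datetime.strptime(stem, STAMP_FORMAT)


def needs_download(filenames, latest):
    """True when no local file is at least as recent as the latest run."""
    stamps = [stamp_from_filename(f) for f in filenames if f.endswith(".grib2")]
    return not stamps or max(stamps) < latest


local_files = ["2022-07-07_12.grib2", "2022-07-07_18.grib2"]
print(needs_download(local_files, datetime(2022, 7, 8, 0)))
```

In the loop, `latest` would be the value returned by `client.latest()`.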

Download error

Ignore, it started working again. There was a permissions issue on their site.

KeyError came out!

I am a researcher who wants to use the ecmwf-opendata API, but whenever I try to execute this code, it fails with a KeyError. I don't know why it doesn't run.

Screen Shot 2023-04-19 at 12 31 55 AM
Screen Shot 2023-04-19 at 12 32 25 AM
