
ecmwfr's People

Contributors

dlebauer, eliocamp, kant, khufkens, martindoublem, retostauffer


ecmwfr's Issues

Order of input arguments to wf_request

cds_request is basically a copy of wf_request; however, I have changed the order of the input arguments. To me, 'my' order makes more sense, but it is certainly not set in stone.

  • wf_request(email, path, time_out, transfer, request, verbose = TRUE)
  • cds_request(user, request, transfer, path, time_out, verbose)

When I think about a use case, it goes something like:

  • With my personal user (email/user) I would like to ...
  • ... request the following data set ...
  • ... and transfer it to my machine, ...
  • ... stored in the following path.

Just a suggestion!

wf_set_key works for webapi but not for cds

First, thanks a lot for your work with ecmwfr.

I have edited all the messages down to the one remaining problem:
wf_user_info and wf_services can only be used for the webapi service, not cds or ads.

NetCDF should not be forced (wf_request)

The wf_request function forces the NetCDF format. I don't think this should be done, as some people (including myself) use good old GRIB, and some data sets (e.g., regional models such as TIGGE) cannot be converted to NetCDF.

Request time series for a point or several points

Hi,

First, I would like to congratulate you on this wonderful package that brings a way to download ECMWF data from R.

I would like to know if there is a way to get time-series of data like in this example/tutorial from CDS: https://cds.climate.copernicus.eu/toolbox/doc/how-to/2_how_to_retrieve_time_series_and_extract_point_information/2_how_to_retrieve_time_series_and_extract_point_information.html

If not, I wonder if this functionality will be added at some point.

Thank you and congratulations!

Date period specification

Documentation says:

For those familiar with ECMWF's MARS syntax: CDS does not accept date = "2000-01-01/to/2000-12-31" specifications at the moment. It is possible to specify one specific date via date = "2000-01-01" or multiple days via date = ["2000-01-01","2000-01-02","2000-10-20"], but not via ".../to/...".

So, how should the "date" piece of the request be built when I need to extract date1 to date2?

date = "[2018-01-30 , 2019-02-01]", #does not work
date = "['2018-01-30', '2019-02-01']", #does not work
date = '["2018-01-30", "2019-02-01"]', #does not work
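Since CDS only takes explicit lists of dates, one workaround (a sketch, not taken from the package documentation) is to expand the range into a character vector in R and pass that vector as the date field:

```r
# Expand the date range into the individual dates CDS does accept;
# an R character vector becomes a JSON list in the request.
dates <- format(seq(as.Date("2018-01-30"), as.Date("2019-02-01"), by = "day"),
                "%Y-%m-%d")
request <- list(date = dates)  # instead of date = ".../to/..."
```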

wf_set_key results in error message

If I try to set my webapi or cds key with wf_set_key, I get the following error:

... unused argument (email = "EMAILADDRESS")

Also, if I try to manually set the key, I get the following error message:
Error in wf_set_key(service = "webapi") :
Could not validate login information.

This happens with both the webapi and cds services.
Any advice?

User + key management

While thinking about user verification, it occurred to me that there is currently no way of knowing whether a user/password pair belongs to the webapi or the cds service. Should there be a service argument to wf_set_key() and the rest?

PS: You are free to tell me if I'm opening too many issues 😅

Can't save files in location

I've recently found your great package and wanted to test it and got this error:

library(ecmwfr)

query <- list(
  class="ei",
  dataset="interim",
  date="1979-01-01",
  expver="1",
  grid="2.5/2.5",
  levelist="200",
  levtype="pl",
  param="129.128",
  step="0",
  stream="oper",
  time="00:00:00/06:00:00/12:00:00/18:00:00",
  type="an",
  format= "netcdf",
  target="file.nc")

ecmwfr::wf_request("[email protected]", 
                   request = query, 
                   transfer = TRUE, 
                   path = getwd())

#> - staging data transfer at url endpoint or request id:
#>   https://api.ecmwf.int/v1/datasets/interim/requests/5c8fe53ebcd319dff7058b5a
#> 
#> - timeout set to 3.0 hours
#> - polling server for a data transfer
#> - writing data to disk ("/tmp/RtmpERehHV/ecmwfr_e491543096c.nc")
#> - moved temporary file to -> /home/elio/file.nc
#> - request purged from queue!
#> Warning message:
#> In file.rename(src, dst) :
#>   cannot rename file '/tmp/RtmpERehHV/ecmwfr_e491543096c.nc' to '/home/elio/file.nc', reason 'Invalid cross-device link'

The temporary file is in there, correctly downloaded, but "file.nc" is nowhere to be found. Changing the file.rename() call to file.copy() solves the issue, but considering the potentially big files involved, it's probably not ideal.

I'm on elementaryOS 5 (based on Ubuntu 18.04)
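One possible fix (a sketch using a hypothetical helper, not actual ecmwfr code) keeps the fast file.rename() path when it works and falls back to copy-plus-delete across filesystems:

```r
# Hypothetical helper: rename when source and target share a device,
# otherwise copy the file and remove the original.
move_file <- function(src, dst) {
  ok <- suppressWarnings(file.rename(src, dst))
  if (!ok) {
    # rename fails across mount points (e.g. /tmp -> /home); copy instead
    ok <- file.copy(src, dst, overwrite = TRUE)
    if (ok) file.remove(src)
  }
  invisible(ok)
}
```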

download with multiple leadtime_month

Hi
I am trying to download a dataset from Cds with multiple leadtime, the API is the following:

c.retrieve(
    'seasonal-postprocessed-single-levels',
    {
        'originating_centre': 'ecmwf',
        'system': '5',
        'variable': 'total_precipitation_anomalous_rate_of_accumulation',
        'product_type': 'ensemble_mean',
        'year': '2020',
        'month': '01',
        'leadtime_month': [
            '2', '3', '4', '5',
        ],
        'format': 'grib',
    },
    'download.grib')

The R version below works only with a single leadtime_month; I am not able to run it with multiple months.
Any help? Thanks

request <- list(
  originating_centre = "ecmwf",
  system = "5",
  variable = "total_precipitation_anomalous_rate_of_accumulation",
  product_type = "ensemble_mean",
  year = "2020",
  month = "01",
  leadtime_month = "2",
  format = "grib",
  dataset = "seasonal-postprocessed-single-levels",
  target = "download.grib"
)
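In ecmwfr requests, multiple values for a field are normally passed as a character vector rather than a single string; here is a sketch of the same request with several lead times (untested against the live service):

```r
request <- list(
  originating_centre = "ecmwf",
  system             = "5",
  variable           = "total_precipitation_anomalous_rate_of_accumulation",
  product_type       = "ensemble_mean",
  year               = "2020",
  month              = "01",
  leadtime_month     = c("2", "3", "4", "5"),  # vector instead of a single "2"
  format             = "grib",
  dataset            = "seasonal-postprocessed-single-levels",
  target             = "download.grib"
)
```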

Support for .ecmwfapirc file

Right now the package uses a keyring, but it's also possible to use a .ecmwfapirc file in the home folder. This method has the advantage that it works with the python module too, so if someone is working with both versions there's no need to keep credentials in two places.

How about adding support for both methods? A good way of doing this would be to use the information in .ecmwfapirc if no user is passed to wf_request(). This would also have the side benefit of being able to have a default user (in my case, I have only one user so the user parameter is redundant).

Something like this should work (along with some modifications to wf_request()):

wf_get_key <- function(user) {
  if (missing(user)) {
    creds <- .read_ecmwfapirc()
    message(paste0("Using ", creds$user, " credentials from .ecmwfapirc"))
    creds$key
  } else {
    keyring::key_get(service = "ecmwfr", username = user)
  }
}

# Function to locate and parse .ecmwfapirc
# (only tested on linux!)
.read_ecmwfapirc <- function() {
  os_type <- Sys.info()["sysname"]
  if (os_type == "Windows") {
    file_path <- file.path(Sys.getenv("USERPROFILE"), ".ecmwfapirc")
  } else {
    file_path <- file.path(Sys.getenv("HOME"), ".ecmwfapirc")
  }
  lines <- readLines(file_path)

  # pull the quoted value for a given field name out of the file
  parse_data <- function(name) {
    line  <- lines[grepl(name, lines)]
    value <- strsplit(line, ":")[[1]][2]
    value <- gsub('"', "", value, fixed = TRUE)
    gsub(" ", "", value, fixed = TRUE)
  }

  key  <- parse_data("key")
  user <- parse_data("email")
  list(user = user, key = key)
}

Problem retrieving multiple variables from CDS

I get error messages when I try to retrieve multiple variables from the CDS store. Below is my request list using the ecmwfr package:
request <- list("dataset"      = "reanalysis-era5-single-levels",
                "product_type" = "reanalysis",
                "variable"     = c("10m_u_component_of_wind", "10m_u_component_of_wind",
                                   "10m_v_component_of_wind", "2m_temperature",
                                   "mean_boundary_layer_dissipation", "total_precipitation"),
                "year"   = "2010",
                "month"  = "01",
                "day"    = "01",
                "area"   = "-28/25/48/22",
                "format" = "netcdf",
                "time"   = "00:00",
                "target" = "era5-jan2010.nc")

However, I am able to retrieve one variable.

Thanks
Seyi

Add APSIM formatting support

Check the {nasapower} package for a template; we might want to move this into something called ecmwfrExtra to keep the base package for data access only?

encouraging pkg citation

Just saw your tweet, and fully agree with your concerns. I've had success getting citations for my software by making it clear to users that citing is expected, and by making it easy for them to do. For example, make it the first thing people see in the README and in the vignette, like this:

ecmwfr

Build Status other badges, etc.

Programmatic interface to the two European Centre for Medium-Range Weather Forecasts API services. The package provides easy access to the 'ECMWF' web API services
and Copernicus Climate Data Store or 'CDS' from within R, matching and expanding upon the ECMWF python tools.

How to cite this package in your article

You can cite this package like this: "we obtained data from the European Centre for Medium-Range Weather Forecasts API using the ecmwfr R package (Hufkens, Stauffer, and Campitelli 2019)".

Here is the full bibliographic reference to include in your reference list (don't forget to update the 'last accessed' date):

Hufkens, K., R. Stauffer, & E. Campitelli. (2019). ecmwfr: Programmatic interface to the two European Centre for Medium-Range Weather Forecasts API services. (Version v1.2.0). Zenodo. http://doi.org/10.5281/zenodo.2647541. Last accessed 15 May 2020

Installation

stable release

To install the current stable release use a CRAN repository:

install.packages("ecmwfr")
library("ecmwfr")

Add support for ADS?

Hi, very nice package! Are there any plans to add support for the Atmosphere Data Store (https://ads.atmosphere.copernicus.eu/)? It appears to use exactly the same Python interface as the Climate Data Store, so I assume it should be quite straightforward to implement. Or maybe this is already possible, but my attempts to retrieve ADS data failed so far.

Requests for era5-single-levels results in an empty netcdf file

Hello! I am trying to download a netcdf file of the 10 m v component of wind for a whole month (every hour of every day) by running the following request,

CDS_request <- list("dataset"      = "reanalysis-era5-single-levels",
                    "product_type" = "reanalysis",
                    "variable"     = "10m_v_component_of_wind",
                    "year"         = 1981,
                    "month"        = 1,
                    "day"          = 1:31,
                    "time"         = c('00:00', '01:00', '02:00',
                                       '03:00', '04:00', '05:00',
                                       '06:00', '07:00', '08:00',
                                       '09:00', '10:00', '11:00',
                                       '12:00', '13:00', '14:00',
                                       '15:00', '16:00', '17:00',
                                       '18:00', '19:00', '20:00',
                                       '21:00', '22:00', '23:00'),
                    "area"         = "6.125/-75.125/-34.125/-33.875", # this area covers Brazilian territory
                    "format"       = "netcdf",
                    "target"       = "netcdf/wind_u_1981.nc")

wf_request(
  user     = "nnnnn",
  request  = CDS_request,
  transfer = TRUE,
  path     = "./",
  time_out = 36000)

which results in a very small file (28 kB, as attached):
wind_u_1981.zip

I already tried to open it in R with the raster package, which results me the following error:

Error in R_nc4_open: NetCDF: Unknown file format
Error in ncdf4::nc_open(filename, suppress_dimvals = TRUE) : 
  Error in nc_open trying to open file C:\Users\victofs\Desktop\wind_u_1981.nc

What is a probable cause for this error? The function apparently is not downloading the data properly

wf_set_key in Ubuntu Server

Hi,

I can use the package normally on my laptop to download data. However, on an Ubuntu server it is not possible to set the key, and I get a warning.

wf_set_key(user = *****, key = *****, service = "cds")
User ***** for cds service added successfully
Warning messages:
 1: In default_backend_auto() :
   Selecting 'env' backend. Secrets are stored in environment variables
 2: In default_backend_auto() :
   Selecting 'env' backend. Secrets are stored in environment variables

sessionInfo()
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] keyring_1.1.0 dplyr_1.0.0   stringr_1.4.0 ecmwfr_1.2.3 

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6       lubridate_1.7.8    lattice_0.20-41    getPass_0.2-2     
 [5] prettyunits_1.1.1  ps_1.3.3           gtools_3.8.2       assertthat_0.2.1  
 [9] rprojroot_1.3-2    digest_0.6.25      packrat_0.5.0      R6_2.4.1          
[13] backports_1.1.7    httr_1.4.1         ggplot2_3.3.1      pillar_1.4.4      
[17] rlang_0.4.6        curl_4.3           rstudioapi_0.11    data.table_1.12.9 
[21] callr_3.4.3        raster_3.1-5       blob_1.2.1         Matrix_1.2-18     
[25] RPostgres_1.2.0    desc_1.2.0         devtools_2.3.0     splines_3.6.3     
[29] rgdal_1.5-8        bit_1.1-15.2       munsell_0.5.0      compiler_3.6.3    
[33] pkgconfig_2.0.3    pkgbuild_1.0.8     tidyselect_1.1.0   tibble_3.0.1      
[37] codetools_0.2-16   fitdistrplus_1.1-1 fansi_0.4.1        crayon_1.3.4      
[41] withr_2.2.0        rappdirs_0.3.1     MASS_7.3-51.6      grid_3.6.3        
[45] jsonlite_1.6.1     gtable_0.3.0       lifecycle_0.2.0    DBI_1.1.0         
[49] magrittr_1.5       scales_1.1.1       cli_2.0.2          stringi_1.4.6     
[53] fs_1.4.1           remotes_2.1.1      sp_1.4-2           testthat_2.3.2    
[57] filelock_1.0.2     ellipsis_0.3.1     generics_0.0.2     vctrs_0.3.0       
[61] tools_3.6.3        bit64_0.9-7        glue_1.4.1         purrr_0.3.4       
[65] hms_0.5.3          processx_3.4.2     pkgload_1.1.0      parallel_3.6.3    
[69] survival_3.1-12    yaml_2.2.1         sodium_1.1         wcengine_4.02     
[73] colorspace_1.4-1   sessioninfo_1.1.1  memoise_1.1.0      usethis_1.6.1

For each new R session I need to set the key again to download data, and with each request I get the backend warning. It seems that keyring needs a GUI to work. Is it possible to set up the key and user on an OS without a GUI? I found this solution but I cannot make it work with wf_set_key.

https://community.rstudio.com/t/keyring-on-amazon-ec2/34595

Thank you very much,
Oriol

Using request string in `wf_request()`

I was wondering if it would be a good idea to support using a copy-pasted request string as the 'request' parameter. Something like:

wf_request("
c.retrieve(
    'sis-water-level-change-timeseries',
    {
        'format': 'tgz',
        'variable': 'mean_sea_level',
        'experiment': 'rcp4_5',
        'year': '2072',
    },
    'download.tar.gz')")

The string would be passed to the python-to-list functionality already used for the addin.

My motivation is that perhaps if users can skip the formatting step, then you would get fewer issues related to badly-formatted requests.
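As a rough illustration of the idea (a hypothetical helper, not an existing ecmwfr function, and limited to flat request dictionaries), the pasted string could be parsed like this:

```r
# Sketch: extract the {...} body from a pasted c.retrieve() call and
# parse it into a named R list via jsonlite.
python_to_list <- function(txt) {
  body <- regmatches(txt, regexpr("\\{[^}]*\\}", txt))  # no nested braces
  body <- gsub("'", '"', body)        # python quotes -> JSON quotes
  body <- gsub(",\\s*}", "}", body)   # drop trailing commas
  jsonlite::fromJSON(body)
}
```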

Parallel requests

I'm on the situation that I need to download a lot of data which I need to divide into multiple requests. Each request can take in the order of hours (depending on CDS mood that day) but I figured out that you could just send a bunch of request in parallel and let them handle the load as they see fit.

There are many ways of doing this but I went for the easy one of literally parallelising the calls like this:

download_date <- function(date, force_download = TRUE) {
  request <- # build the request based on date

  file <- wf_request(request,
                     user = user,
                     time_out = 5 * 3600,
                     path = path)
  return(file)
}
future::plan("multicore", workers = 5)   # CDS processes 5 requests at a time
files <- furrr::future_map(dates, download_date, force_download = FALSE)

Do you think it would be worthwhile to handle multiple requests natively? Essentially allowing the request argument to be a list of requests?

wf_request problem

Hi,
after successfully completing the wf_set_key command using my uid as user identity, the wf_request command terminates with an error message (using a sample request list) :
Error in FUN(X[[i]], ...) : argument is missing, with no default
Calls: sourceWithProgress ... eval -> eval -> wf_request -> which -> vapply -> FUN
In addition: Warning messages:
1: In default_backend_auto() :
Selecting 'env' backend. Secrets are stored in environment variables
2: In default_backend_auto() :
Selecting 'env' backend. Secrets are stored in environment variables
3: In default_backend_auto() :
Selecting 'env' backend. Secrets are stored in environment variables
4: In default_backend_auto() :
Selecting 'env' backend. Secrets are stored in environment variables
Execution halted

What can be the problem?

Bests,
Zsolt

wf_request error

Using wf_get_key gives this error:

wf_get_key(user = "5168x")
Error in b_wincred_i_get(target): Windows credential store error in 'get': Element not found.

But when using wf_set_key, there is no problem:

wf_set_key(user = "5168x",
           key = "23a2dddd-cc98-431c-8a3e-d06713eaxxxx",
           service = "cds")
User 5168x for cds service added successfully in keychain

When I try to run wf_request I get the following error.
file <- wf_request(user = "5168x",
                   request = request,
                   transfer = TRUE,
                   path = ".")

Error: Data identifier reanalysis-era5-single-level is not found in Web API, CDS or ADS datasets. Or your login credentials do not match your request.

Why do you think this is happening? I am doing everything according to the documentation.

rOpenSci review + (JOSS) paper

Currently functionality is good, and working toward the ecmwfrExtra will provide considerable context, and worked examples, so writing a paper will be easier (or vice versa). It might be time to think about how to move forward with this. Thoughts, @jwagemann @eliocamp @retostauffer?

worked examples + data integration

For my part I have these ideas:

  • phenology modelling, which includes temperature + precipitation variables
  • bird migration + wind interactions (a recent request from a birding friend); includes wind and altitude differences in the processes

`wf_user_info()` can return "bad" info

For debugging I added a new user with username "user" and the same API key as "[email protected]":

keyring::key_list()
#>   service                 username
#> 1  ecmwfr [email protected]
#> 2  ecmwfr                     user
keyring::key_get("ecmwfr", "user") == keyring::key_get("ecmwfr", "[email protected]")  
#> [1] TRUE

But even though there's no "user" username, wf_user_info() still thinks it is a valid username and returns information, albeit for the "eliocampitelli" account.

ecmwfr::wf_user_info("user")
#>   first_name code                      uid  last_name       full_name
#> 1       Elio  200 [email protected] Campitelli Elio Campitelli
#>                      email
#> 1 [email protected]

Created on 2019-03-19 by the reprex package (v0.2.1.9000)

The ECMWF API seems to ignore the username part of the request and returns the user information associated with the API key. In other words, in this part of the code:

response <- httr::GET(
  paste0(wf_server(), "/who-am-i"),
  httr::add_headers(
    "Accept"       = "application/json",
    "Content-Type" = "application/json",
    "From"         = user,
    "X-ECMWF-KEY"  = key),
  encode = "json"
)

the "From" = user part is ignored (it works the same if commented out). One solution could be to validate the combination (even if ECMWF won't) by checking that the username returned by the API is the same as the one in the local keyring.
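The validation could be sketched like this (hypothetical code; it assumes response is the /who-am-i reply from above and user is the locally stored username):

```r
# Sketch: refuse to proceed when the identity the API reports for the key
# does not match the locally supplied user.
info <- httr::content(response)
if (!identical(info$email, user) && !identical(info$uid, user)) {
  stop("API key belongs to '", info$email,
       "', not to the locally stored user '", user, "'")
}
```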

User for testing

I'm trying to test the changes related to #55. Since your user is hardcoded in the source code, I would need to change all of the instances to mine, test, and then change them back (I think; maybe there's a better way?).

Maybe it would be better to store the user in an environment variable. Then the same code could work on any machine. I think you would have to set up the environment variables in Travis.

CRAN issue, ... in @param not in function

Not sure why this is detected now and not x months ago, as nothing has changed.

Anyway, the @param here should be removed.

https://github.com/khufkens/ecmwfr/blob/60d2b35fc1fd1eaf92505dea5e7b9db36ae79ed3/R/wf_transfer.R#L13

Dear maintainer,

Please see the problems shown on
<https://cran.r-project.org/web/checks/check_results_ecmwfr.html>.

Specifically, see the warnings about 'Documented arguments not in
\usage' in the r-devel checks.  These are from a recent bug fix
(PR#16223, see
<https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16223>): can you
please fix your man pages as necessary?  (In most cases, remove the
documentation for argument '...'.)

Please correct before 2020-02-17 to safely retain your package on CRAN.

Best,
-k

Check functions - both WebAPI and CDS

Still need to check all functions using the new integrated routines:

  • WebAPI
  • CDS

@retostauffer I follow roughly these guidelines, mostly based on rOpenSci experience with my MODISTools package:

errors

  • forward errors from underlying services if possible
  • trap all missing parameters

messages

  • functional messaging; repeated statements of the same information should be avoided
  • limit incremental info (see first point), which might fill a terminal when not properly refreshing

formatting of data, functions

  • functional tidy data where possible (human readable, but machine parse-able)
  • interface with other data formats when possible (hence me removing html output)
  • limited fragmentation to limit overhead (more functions == more unit tests == more documentation == way more maintenance overhead) for the same task
  • exceptions to this rule are small helper functions for very specific tasks (complex feedback formatting, data shuffling etc)

documentation

  • small worked examples are ok, i.e. basic but good documentation
  • larger examples should go into the vignettes or "workshop" based documentation outside the package (specific use cases out of the scope of basic data queries)

How to manage 'dataset' property within a dataset

Dear Koen,

Thanks for your nice contribution with this package.

I have a problem and I don't know how to solve it. In some datasets of the CDS (e.g. Fire Danger https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-fire-historical?tab=form or River discharge https://cds.climate.copernicus.eu/cdsapp#!/dataset/cems-glofas-historical?tab=form) there is a mandatory property of the dataset called "dataset". When I include both dataset descriptors in my request, it gives an error.

How can I call this property from the ecmwfr package?

Thanks in advance

Selection spatial window

Dear friend, I am trying to download a spatial subset of the thermal comfort indices database using ecmwfr.
I am facing two problems: I am not able to download the spatial window requested via the 'area' key (the size of the file suggests the whole global spatial coverage, 90N-60S, 180W-180E), and I am also unable to open the resulting tar.gz file (or the alternative zip file) on my PC using WinRAR.
Am I doing something wrong with the syntax?
Best regards
request <- list(
  'dataset'      = "derived-utci-historical",
  'variable'     = "Universal thermal climate index",
  'product_type' = "Consolidated dataset",
  'year'         = "1979",
  'month'        = "01",
  'day'          = "01",
  'area'         = "50/-10/35/5",
  'format'       = "tgz",
  'target'       = "download.tar.gz")

Send failure: Connection was reset

I'm trying to get a request through, but each time I only get this message (after a short while):

- timeout set to 1.0 hours
Error in curl::curl_fetch_memory(url, handle = handle) : 
  Timeout was reached: Send failure: Connection was reset

The thing is that on cds.climate.copernicus the request is still in progress. Any idea what to do?

Default user

I was thinking that for the common use case of people with only one user per service, it would be great not to need to pass it as an argument, especially for CDS, where the username is a hard-to-remember random number.

Searching the keyring to see if there's only one user for the selected service is easy to implement. But from what I've seen in the code, since the service is actually selected based on the user supplied, I've run into a bit of a chicken-and-egg situation and I'm not sure how to proceed.

Any ideas?

Continuous Integration (travis)

Not really an issue, more a note that might have to be discussed/require adjustments. I had to change some settings regarding Travis CI:

*_set_key, *_get_key

To be able to test the set and get key functions (keyring), I am currently exporting four global variables on travis-ci (ECMWFAPIEMAIL, ECMWFAPIKEY, CDSAPIUSER, and CDSAPIKEY). This allows loading the login details via Sys.getenv(...) within the test scripts; it works nicely, but might not be your preferred way. An alternative would be to add encrypted variables in the .travis.yml file. The way it is implemented right now allows different Travis CI checks (different users) to use different user accounts (might not be required in the future).

*_key_from_file

One of the extensions is that ~/.ecmwfapirc and ~/.cdsapirc files can be used (as for the python ecmwfapi/cdsapi). To be able to run the checks, I have to create them on travis-ci. Thus, .travis.yml calls the create_apirc script (https://github.com/retostauffer/ecmwfr/commit/098ffc964b2e0120efd4aded9d5613b177d96997; see .travis.yml line 14). On travis-ci the local files are created based on the global variables (see above).

Works fairly OK, but I do not have much experience with CI so far.

Keyring fails with env backend.

I'm trying to download from the CDS service on a server which doesn't have proper keyring support, so it uses environment variables for storing keys. So far so good. However, it seems there are subtle differences with the env backend such that part of the code fails with this error:

#> Error in b_env_list(self, private, service, keyring) : 
  'service' is required for 'env' backend.

I'm posting this basically to remind myself to figure the error out.

Unidentified error in wf_request

Hi,

I used wf_request many times in the past and it worked all right. Today, I tried to use it again but received an unidentified error that reads only "Error: " (there is nothing written after the colon).

After receiving this message, I reinstalled the package from github, but the error persisted.

The basic code I'm running is as follows:

request <- list("dataset"        = "reanalysis-era5-pressure-levels",
                  "product_type"   = "reanalysis",
                  "variable"       = "temperature",
                  "pressure_level" = "850",
                  "year"           = "2000",
                  "month"          = "04",
                  "day"            = "04",
                  "time"           = "00:00",
                  "area"           = "70/-20/30/60",
                  "format"         = "netcdf",
                  "target"         = "era5-demo.nc")
  
  # Start downloading the data, the path of the file
  # will be returned as a variable (ncfile)
  ncfile <- wf_request(user = myUID, # <-- your USER ID HERE
                       request = request,   
                       transfer = TRUE,  
                       path = "~",
                       verbose = T)

And my system config

> version
               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          6.1                         
year           2019                        
month          07                          
day            05                          
svn rev        76782                       
language       R                           
version.string R version 3.6.1 (2019-07-05)
nickname       Action of the Toes 

Thanks a lot.

support for era-5-land

Hello,

Recently, the ERA5-Land dataset was released. I'm trying to use wf_request to obtain this data, but am unsure what to put as the product_type and dataset entries in the request list. Any advice would be much appreciated. Here is an example request:

# area of interest
ar <- paste0(5,"/",-2,"/",1,"/",2)

request <- list(
      area = ar,
      product_type = "reanalysis-era5-land",
      format = "netcdf",
      variable = c("2m_temperature", "2m_dewpoint_temperature", 
                   "10m_u_component_of_wind", "10m_v_component_of_wind"),
      year = "2007",
      month = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", 
                "12"),
      day = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", 
              "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", 
              "23", "24", "25", "26", "27", "28", "29", "30", "31"),
      time = c("00:00", "01:00", "02:00", "03:00", "04:00", "05:00", "06:00", 
               "07:00", "08:00", "09:00", "10:00", "11:00", "12:00", "13:00", 
               "14:00", "15:00", "16:00", "17:00", "18:00", "19:00", "20:00", 
               "21:00", "22:00", "23:00"),
      dataset = "reanalysis-era5-land",
      target = "test.nc")
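For comparison, here is a sketch of a request that follows the pattern of the other CDS examples in this thread (the assumption being that ERA5-Land is selected via the dataset entry alone and has no product_type field, so that entry is dropped; untested against the live service):

```r
request <- list(
  dataset  = "reanalysis-era5-land",  # selects the data set; no product_type
  area     = "5/-2/1/2",
  variable = c("2m_temperature", "2m_dewpoint_temperature"),
  year     = "2007",
  month    = "01",
  day      = "01",
  time     = "00:00",
  format   = "netcdf",
  target   = "test.nc")
```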

Error in wf_check[[correct]] : attempt to select less than one element in get1index

Hi,

I am trying to download a massive amount of data from ERA5 and have been running into the error in the title. I have now gone back to the demo on CRAN and am still getting the error; the script is straight from the demo:

# Specify the data set
request <- list("dataset"        = "reanalysis-era5-pressure-levels",
                "product_type"   = "reanalysis",
                "variable"       = "temperature",
                "pressure_level" = "850",
                "year"   = "2000",
                "month"  = "04",
                "day"    = "04",
                "time"   = "00:00",
                "area"   = "70/-20/30/60",
                "format" = "netcdf",
                "target" = "era5-demo.nc")

# Start downloading the data; the path of the file
# will be returned as a variable (ncfile)
ncfile <- wf_request(user = "2088",
                     request = request,
                     transfer = TRUE,
                     path = "~",
                     verbose = FALSE)


any idea what could be causing this?

I am working in R version 3.6.3 (64-bit) with ecmwfr version 1.2.3. When I run wf_get_key(user = "", service = "cds") it lists my correct key, just as a reference.

Error without message

First of all, thank you for this nice package.

I'm trying to download mean sea level pressure data with code that worked for me in the past. I don't know why it doesn't work now:

request <- list(
  product_type = "reanalysis",
  format = "netcdf",
  variable = "mean_sea_level_pressure",
  year = c("1981", "1982", "1983", "1984", "1985", "1986", "1987", "1988", "1989", "1990", "1991", "1992", "1993", "1994", "1995", "1996", "1997", "1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015"),
  month = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12"),
  day = c("01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", "28", "29", "30", "31"),
  time = c("00:00", "06:00", "12:00",  "18:00"),
  area = "55/-12/30/15",
  grid = "1.0/1.0",
  dataset = "reanalysis-era5-single-levels",
  target = "mslp_era5_1deg.nc"
)

file <- wf_request(user     = "8933",   # user ID (for authentication)
                   request  = request,  # the request
                   transfer = TRUE,     # download the file
                   path     = "DATA/")  # store the data in the DATA/ directory

And the result:

Error:

As I said before, this same code worked fine on previous occasions. I have checked my API key and everything seems OK.
Thank you in advance!
Marc

Notify when download starts

Right now the package makes a request to the data URL and downloads the content; if it's not a binary file, it continues to poll the server in the same manner. The problem is that, as a user, there's no real indication of when the file is ready and the download has started. Furthermore, there's no way of indicating progress.

I seem to have misplaced the URL of the API documentation, but I remember that the URL with the request ID was supposed to return a different status code for each case. In that case, we could change this line and, instead of getting the full content, get only httr::status_code(response) and test on it. Once it gets the a-ok for download, inform the user and get the content.

This setup would also allow downloading files bigger than RAM, since we could potentially bypass loading the content into memory.
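The idea above could be sketched roughly as follows. Assumptions (not verified against the actual API): the request-status URL returns HTTP 202 while the request is queued and 200 once the file is ready, and authentication works via a key header whose name ("X-API-Key") is hypothetical.

```r
library(httr)

poll_and_download <- function(status_url, key, target, sleep = 5) {
  repeat {
    response <- HEAD(status_url, add_headers("X-API-Key" = key))
    code <- status_code(response)
    if (code == 200L) break                      # file is ready
    if (code >= 400L) stop("request failed with HTTP status ", code)
    message("Request still queued/running, retrying in ", sleep, " s ...")
    Sys.sleep(sleep)
  }
  message("Download started ...")
  # write_disk() streams the body straight to a file, so files larger
  # than the available RAM can be downloaded; progress() shows a bar
  GET(status_url, add_headers("X-API-Key" = key),
      write_disk(target, overwrite = TRUE), progress())
  invisible(target)
}
```

Testing on the status code also gives a natural hook for user feedback (messages, progress bar) at each stage.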

If you think it's a good idea, I could do it myself. However, as I said, I'm not finding the correct documentation page and I don't remember which status code was which. (embarrassing! 😳️)

requests of era5-single-levels give different results when setting different target parameter

Hello! I am running a request for reanalysis-era5-single-levels and am having an issue. Apparently, when changing the "target" parameter, the resulting file of the request is affected. When changing the parameter from 1981_02_08_wind_u100.nc to 1981_02_09_wind_u100.nc, the downloaded file has a very small size (as you can see attached).

I believe this is not a problem with the requests, since the first one is successful; I confirm this by plotting the downloaded raster:

image

    CDS_request <- list("dataset"                = "reanalysis-era5-single-levels",
                        "product_type"           = "reanalysis",
                        "variable"               = "100m_u_component_of_wind",
                        "year"                   =  "1981",
                        "month"                  =  "02",
                        "day"                    =  "08",
                        "time"                   = c("00:00", "01:00", "02:00", "03:00",
                                                     "04:00", "05:00", "06:00", "07:00",
                                                     "08:00", "09:00", "10:00", "11:00",
                                                     "12:00", "13:00", "14:00", "15:00",
                                                     "16:00", "17:00", "18:00", "19:00",
                                                     "20:00", "21:00", "22:00", "23:00"),
                        
                        "area"                   = "6.12/-75.12/-34.12/-33.87",
                        "format"                 = "netcdf",
                        "target"                 = "1981_02_08_wind_u100.nc"


    CDS_request <- list("dataset"                = "reanalysis-era5-single-levels",
                        "product_type"           = "reanalysis",
                        "variable"               = "100m_u_component_of_wind",
                        "year"                   =  "1981",
                        "month"                  =  "02",
                        "day"                    =  "08",
                        "time"                   = c("00:00", "01:00", "02:00", "03:00",
                                                     "04:00", "05:00", "06:00", "07:00",
                                                     "08:00", "09:00", "10:00", "11:00",
                                                     "12:00", "13:00", "14:00", "15:00",
                                                     "16:00", "17:00", "18:00", "19:00",
                                                     "20:00", "21:00", "22:00", "23:00"),
                        
                        "area"                   = "6.12/-75.12/-34.12/-33.87",
                        "format"                 = "netcdf",
                        "target"                 = "1981_02_09_wind_u100.nc"

image

Also, I am using an RStudio Server virtual machine. Could this be related to the issue? What could be causing this?

Thanks in advance!

EDIT: I noticed that there is no issue when running either of the scripts on my local machine (Windows 10).

Default request as function default value

In wf_request a default is set for the request input parameter. I would make this one of the required parameters (line 38+):

In addition, don't you think that the default for transfer should be TRUE? I guess that the 'standard user' wants easy access to the data and will not use ecmwfr as a 'queue-batch-submit' script?

wf_request <- function(
  email,
  path = tempdir(),
  time_out = ifelse(request$dataset == "mars", 3*3600, 3600),
  transfer = FALSE,
  request = list(stream = "oper",
                 levtype = "sfc",
                 param = "167.128",
                 dataset = "interim",
                 step = "0",
                 grid = "0.75/0.75",
                 time = "00",
                 date = "2014-07-01/to/2014-07-02",
                 type = "an",
                 class = "ei",
                 area = "50/10/51/11",
                 format = "netcdf",
                 target = "tmp.nc"),
  verbose = TRUE
)
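A minimal sketch of what the suggested signature could look like (a proposal only, not the package's actual API; the body is elided):

```r
wf_request <- function(
  email,
  request,            # required: no default demo request
  transfer = TRUE,    # download by default, as most users will want the file
  path     = tempdir(),
  time_out = ifelse(request$dataset == "mars", 3 * 3600, 3600),
  verbose  = TRUE
) {
  # fail early instead of silently falling back to a built-in demo request
  if (missing(request) || !is.list(request)) {
    stop("please provide a request as a named list")
  }
  # ... rest of the function unchanged ...
}
```

Making request mandatory turns a forgotten argument into an immediate, explicit error rather than a surprise download of demo data.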

request failure

Data identifier XXX is not found in Web API, CDS or ADS datasets. Or your login credentials do not match your request

A similar issue was already posted, but no solution was found. Since yesterday I have started to get this error (before, there were no problems). The only thing I did was update all the libraries I use.
wf_get_key returns my key (I have tried 3 accounts already).
I tried 'reanalysis-era5-single-levels-preliminary-back-extension', 'reanalysis-era5-single-levels' and 'seasonal-original-single-levels'.

p.s. I even tried your example from the GitHub documentation:

request <- list("dataset_short_name" = "reanalysis-era5-pressure-levels",
                "product_type" = "reanalysis",
                "variable" = "temperature",
                "pressure_level" = "850",
                "year" = "2000",
                "month" = "04",
                "day" = "04",
                "time" = "00:00",
                "area" = "70/-20/00/60",
                "format" = "netcdf",
                "target" = "era5-demo.nc")

My sample request (basically mimicking the example above):

request <- list('dataset_short_name' = dataset_short_name,
                'product_type' = 'reanalysis',
                'grid' = '0.1/0.1',
                'variable' = my_var,
                'date' = my_dates,
                'time' = my_time,
                'area' = my_area,
                'format' = 'netcdf',
                'target' = my_target)

The function which checks request validity, wf_check_request, gives an error itself:

encl$_hash(c(encl$_f_hash, args, lapply(encl$_additional, : object 'rlang_hash' not found

The R version I use is 4.0.3.

p.s.2 After another re-update of all packages it works, sometimes. Voodoo. Maybe this has nothing to do with the package itself, then.

Problem with locked keyring

I updated my OS and had to set up the keys again on my computer, and was getting this error:

Secret service keyring error in 'set': 'Cannot create an item in a locked collection'

I had to use keyring::keyring_unlock() and then it worked. I'm not familiar with managing keyrings, but would it be necessary to add this command before adding (or retrieving?) keys?
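A possible guard along these lines (a sketch only; keyring_is_locked() and keyring_unlock() are real keyring functions, but whether ecmwfr should call them automatically, and the credentials below, are placeholders/open questions):

```r
library(keyring)

# unlock the default keyring first if the OS has locked it,
# e.g. after a system update or a fresh login session
if (keyring_is_locked()) {
  keyring_unlock()  # may prompt for the keyring password
}

# storing the key should now not fail with a 'locked collection' error
ecmwfr::wf_set_key(user    = "user@example.com",  # placeholder credentials
                   key     = "abcd1234",
                   service = "cds")
```

The same check could arguably run before key retrieval as well, since reading from a locked collection fails in the same way.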

MARS or python based query

Hello, I can't find the MARS or Python based queries you mentioned in Data Requests. Can you help me find them? I searched the ECMWF site a lot, but still could not find them.

Building requests

I was thinking about how to approach the issue of specifying requests. In my experience as a recent user, creating them by hand is really complicated because of the many options, unmemorable dataset names and invalid combinations. It would be great to have a more intuitive way of specifying requests, but it seems a daunting task for the same reasons, and also because of the vast variety of datasets and users.

Of course one can go to the web API and click the fields. But that gets annoying fast because of the sheer amount of clicking required to get a somewhat complete dataset (37 clicks to get all levels in ERA Interim). It also has some limitations, like not being able to select more than one year in daily fields.

What I (and probably most people) do is go to the website to get a request skeleton for the type of data I want (monthly, pressure levels, etc...), paste it into R and build around it, adding dates, changing resolution, etc... I think that formalising this process could be a good start.

My idea is to have "archetypes": a combination of a list, optional default arguments, and a function that combines the archetype with new data to build the request. This is a rough sketch:

add_defaults <- function(object, ...) {
  # store the default values as an attribute of the archetype
  attr(object, "defaults") <- list(...)
  object
}

build_request <- function(archetype, ...) {
  new <- list(...)
  defaults <- attr(archetype, "defaults")
  if (is.null(defaults)) defaults <- list()
  # values supplied in the call override the stored defaults
  data <- modifyList(defaults, new)
  lapply(archetype, function(x) as.character(glue::glue_data(data, x)))
}

Then one would go to the website and build a list like this:

ERAI_monthly_levs <- list(
  class = "ei",
  dataset = "interim",
  expver = "1",
  levtype = "pl",
  stream = "moda",
  type = "an",
  format = "netcdf",
  date = "{date}",
  grid = "{res}/{res}",
  levelist = "{levs}",
  param = "155.128",
  target = "output"
)

ERAI_monthly_levs <- add_defaults(ERAI_monthly_levs, res = 2.5, levs = 500)

where the strings with curly brackets are the variables that will be populated. To build a real request one would do something like:

request <- build_request(ERAI_monthly_levs, 
                         date = "20100101", 
                         res = "2.5")
str(request)
#> List of 12
#>  $ class   : chr "ei"
#>  $ dataset : chr "interim"
#>  $ expver  : chr "1"
#>  $ levtype : chr "pl"
#>  $ stream  : chr "moda"
#>  $ type    : chr "an"
#>  $ format  : chr "netcdf"
#>  $ date    : chr "20100101"
#>  $ grid    : chr "2.5/2.5"
#>  $ levelist: chr "500"
#>  $ param   : chr "155.128"
#>  $ target  : chr "output"

and then pass that to wf_request().
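For completeness, the built request could then be submitted like any hand-written one (the user ID and path below are placeholders):

```r
ncfile <- wf_request(user     = "2088",      # placeholder user ID
                     request  = request,     # the request built above
                     transfer = TRUE,        # download the file
                     path     = tempdir())   # store it in a temporary directory
```

This way the archetype machinery stays a thin layer on top of the existing API rather than a replacement for it.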

What do you think?
