
stationaRy's Introduction

stationaRy


Overview

Get meteorological data from met stations located all over the world. That’s what you can do with this R package. There are LOTS of stations too (29,729 available in this dataset) and many have data that go pretty far back in time. The data comes from the Integrated Surface Dataset (ISD), which is maintained by the National Oceanic and Atmospheric Administration (NOAA).

Retrieving Met Data with a station_id

Let’s get some met data from La Guardia Airport in New York City (the station ID value is "725030-14732"). This station has a pretty long history (starting operations in 1973) but we’ll just obtain data from the years of 2017 and 2018.

lga_met_data <- 
  get_met_data(
    station_id = "725030-14732",
    years = 2017:2018
  )
lga_met_data
#> # A tibble: 17,520 x 10
#>    id    time                 temp    wd    ws atmos_pres dew_point    rh
#>    <chr> <dttm>              <dbl> <dbl> <dbl>      <dbl>     <dbl> <dbl>
#>  1 7250… 2017-01-01 00:00:00   7.2   230   5.7      1012.      -4.4  43.5
#>  2 7250… 2017-01-01 01:00:00   7.8   230   4.6      1012.      -3.9  43.4
#>  3 7250… 2017-01-01 02:00:00   7.2   230   3.6      1012.      -2.2  51.3
#>  4 7250… 2017-01-01 03:00:00   7.8   240   5.7      1013.      -3.3  45.4
#>  5 7250… 2017-01-01 04:00:00   7.8   240   4.6      1013.      -3.9  43.4
#>  6 7250… 2017-01-01 05:00:00   8.3   240   4.6      1014.      -4.4  40.4
#>  7 7250… 2017-01-01 06:00:00   8.3   250   5.1      1015.      -3.9  41.9
#>  8 7250… 2017-01-01 07:00:00   8.3   260   5.7      1016.      -3.3  43.8
#>  9 7250… 2017-01-01 08:00:00   8.3   240   5.1      1017.      -2.8  45.5
#> 10 7250… 2017-01-01 09:00:00   8.3   260   6.2      1019.      -2.8  45.5
#> # … with 17,510 more rows, and 2 more variables: ceil_hgt <dbl>,
#> #   visibility <dbl>
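The hourly observations returned here are ready for further work with dplyr. As a quick sketch (not from the package documentation; it assumes the `lga_met_data` tibble obtained above), the readings could be collapsed into daily mean temperatures like this:

```r
library(dplyr)

# A sketch: reduce the hourly readings to one mean temperature per
# calendar day, ignoring missing values
daily_means <-
  lga_met_data %>%
  mutate(date = as.Date(time)) %>%
  group_by(date) %>%
  summarize(mean_temp = mean(temp, na.rm = TRUE))
```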

Discovering Met Stations

At a minimum we need a station’s identifier to obtain its met data. We can start the process of getting an identifier by accessing the entire catalog of station metadata with the get_station_metadata() function. The output tibble has station id values in the first column. Let’s get a subset of stations from that: those stations that are located in Norway.

stations_norway <- 
  get_station_metadata() %>%
  dplyr::filter(country == "NO")

stations_norway
#> # A tibble: 405 x 16
#>    id    usaf  wban  name  country state icao    lat   lon  elev begin_date
#>    <chr> <chr> <chr> <chr> <chr>   <chr> <chr> <dbl> <dbl> <dbl> <date>    
#>  1 0100… 0100… 99999 BOGU… NO      <NA>  ENRS   NA   NA     NA   2001-09-27
#>  2 0100… 0100… 99999 JAN … NO      <NA>  ENJA   70.9 -8.67   9   1931-01-01
#>  3 0100… 0100… 99999 ROST  NO      <NA>  <NA>   NA   NA     NA   1986-11-20
#>  4 0100… 0100… 99999 SORS… NO      <NA>  ENSO   59.8  5.34  48.8 1986-11-20
#>  5 0100… 0100… 99999 BRIN… NO      <NA>  <NA>   61.4  5.87 327   1987-01-17
#>  6 0100… 0100… 99999 RORV… NO      <NA>  <NA>   64.8 11.2   14   1987-01-16
#>  7 0100… 0100… 99999 FRIGG NO      <NA>  ENFR   60.0  2.25  48   1988-03-20
#>  8 0100… 0100… 99999 VERL… NO      <NA>  <NA>   80.0 16.2    8   1986-11-09
#>  9 0100… 0100… 99999 HORN… NO      <NA>  <NA>   77   15.5   12   1985-06-01
#> 10 0100… 0100… 99999 NY-A… NO      <NA>  ENAS   78.9 11.9    8   1973-01-01
#> # … with 395 more rows, and 5 more variables: end_date <date>,
#> #   begin_year <int>, end_year <int>, tz_name <chr>, years <list>

This table can be reduced further to isolate the stations of interest. For example, we could elect to get only high-altitude stations (above 1,000 meters) in Norway.

norway_high_elev <-
  stations_norway %>% 
  dplyr::filter(elev > 1000)

norway_high_elev
#> # A tibble: 12 x 16
#>    id    usaf  wban  name  country state icao    lat   lon  elev begin_date
#>    <chr> <chr> <chr> <chr> <chr>   <chr> <chr> <dbl> <dbl> <dbl> <date>    
#>  1 0122… 0122… 99999 MANN… NO      <NA>  <NA>   62.4  7.77 1294  2010-03-15
#>  2 0123… 0123… 99999 HJER… NO      <NA>  <NA>   62.2  9.55 1012  2010-09-07
#>  3 0134… 0134… 99999 MIDT… NO      <NA>  <NA>   60.6  7.27 1162  2011-11-25
#>  4 0135… 0135… 99999 FINS… NO      <NA>  <NA>   60.6  7.53 1208  2003-03-30
#>  5 0135… 0135… 99999 FINS… NO      <NA>  <NA>   60.6  7.5  1224  1973-01-02
#>  6 0135… 0135… 99999 SAND… NO      <NA>  <NA>   60.2  7.48 1250  2004-01-07
#>  7 0136… 0136… 99999 JUVV… NO      <NA>  <NA>   61.7  8.37 1894  2009-06-26
#>  8 0136… 0136… 99999 SOGN… NO      <NA>  <NA>   61.6  8    1413  1979-03-01
#>  9 0137… 0137… 99999 KVIT… NO      <NA>  <NA>   61.5 10.1  1028  1973-01-01
#> 10 0143… 0143… 99999 MIDT… NO      <NA>  <NA>   59.8  6.98 1081  1973-01-01
#> 11 0144… 0144… 99999 BLAS… NO      <NA>  <NA>   59.3  6.87 1105. 1973-01-01
#> 12 0146… 0146… 99999 GAUS… NO      <NA>  <NA>   59.8  8.65 1804. 2014-06-05
#> # … with 5 more variables: end_date <date>, begin_year <int>, end_year <int>,
#> #   tz_name <chr>, years <list>

The station IDs from the tibble can be transformed into a vector of station IDs with dplyr::pull().

norway_high_elev %>% dplyr::pull(id)
#>  [1] "012200-99999" "012390-99999" "013460-99999" "013500-99999" "013510-99999"
#>  [6] "013520-99999" "013620-99999" "013660-99999" "013750-99999" "014330-99999"
#> [11] "014400-99999" "014611-99999"

Suppose you’d like to collect several years of met data from a particular station and fetch only the observations that meet some set of conditions. Here’s an example of obtaining temperatures above 16 degrees Celsius from the high-altitude "JUVVASSHOE" station in Norway and adding a column with temperatures in degrees Fahrenheit.

station_data <- 
  get_station_metadata() %>%
  dplyr::filter(name == "JUVVASSHOE") %>%
  dplyr::pull(id) %>%
  get_met_data(years = 2011:2019)

high_temp_data <-
  station_data %>%
  dplyr::select(id, time, wd, ws, temp) %>% 
  dplyr::filter(temp > 16) %>%
  dplyr::mutate(temp_f = ((temp * (9/5)) + 32) %>% round(1)) %>%
  dplyr::arrange(dplyr::desc(temp_f))
high_temp_data
#> # A tibble: 50 x 6
#>    id           time                   wd    ws  temp temp_f
#>    <chr>        <dttm>              <dbl> <dbl> <dbl>  <dbl>
#>  1 013620-99999 2019-07-26 15:00:00   160     5  18.5   65.3
#>  2 013620-99999 2019-07-26 17:00:00   210     3  18.4   65.1
#>  3 013620-99999 2019-07-26 18:00:00   180     2  18.3   64.9
#>  4 013620-99999 2019-07-26 16:00:00   180     4  18.2   64.8
#>  5 013620-99999 2014-07-23 16:00:00   270     2  17.6   63.7
#>  6 013620-99999 2019-07-26 14:00:00   150     4  17.5   63.5
#>  7 013620-99999 2014-07-23 17:00:00   300     4  17.3   63.1
#>  8 013620-99999 2019-07-28 16:00:00   130     6  17.3   63.1
#>  9 013620-99999 2014-07-23 18:00:00   280     3  17.2   63  
#> 10 013620-99999 2018-07-04 15:00:00   340     2  17.2   63  
#> # … with 40 more rows
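The Fahrenheit conversion applied in the mutate() call above is a simple linear transform, and it can be sanity-checked in base R against the first row of the output:

```r
# Check the conversion used above: 18.5 degrees C should round to the
# 65.3 degrees F shown in the first row
temp_c <- 18.5
temp_f <- round((temp_c * (9/5)) + 32, 1)
temp_f
#> [1] 65.3
```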

Additional Data Fields

There can be a substantial amount of additional met data beyond wind speed, ambient temperature, etc. However, these additional fields can vary greatly across stations. The nomenclature for the additional categories of data uses ‘two-letter + digit’ identifiers (e.g., AA1, GA1, etc.). Within each category are numerous fields, where the variables are coded as [identifier]_[index]. More information about these additional data fields can be found in this PDF document.
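To illustrate the naming scheme (a constructed example, not package output): the first three fields of the AJ1 category would surface as columns named like this:

```r
# Field columns follow the pattern [identifier]_[index], lowercased
category <- "AJ1"
paste0(tolower(category), "_", 1:3)
#> [1] "aj1_1" "aj1_2" "aj1_3"
```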

To find out which categories of additional data fields are available for a station, we can use the station_coverage() function. You’ll get a tibble with the available additional categories and their counts over the specified period.

additional_data_fields <-
  get_station_metadata() %>%
  dplyr::filter(name == "JUVVASSHOE") %>%
  dplyr::pull(id) %>%
  station_coverage(years = 2015)
additional_data_fields
#> # A tibble: 87 x 3
#>    id           category count
#>    <chr>        <chr>    <int>
#>  1 013620-99999 AA1          0
#>  2 013620-99999 AB1          0
#>  3 013620-99999 AC1          0
#>  4 013620-99999 AD1          0
#>  5 013620-99999 AE1          0
#>  6 013620-99999 AG1          0
#>  7 013620-99999 AH1          0
#>  8 013620-99999 AI1          0
#>  9 013620-99999 AJ1        194
#> 10 013620-99999 AK1          0
#> # … with 77 more rows
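Most categories report zero observations, so it can help to drop the empty ones. A base-R sketch (using a small subset of the counts shown above, to keep it self-contained):

```r
# Keep only the categories that actually have observations; these counts
# are a hand-picked subset of the tibble printed above
counts <- c(AA1 = 0, AJ1 = 194, AK1 = 0)
counts[counts > 0]
#> AJ1 
#> 194
```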

We can use purrr’s map_df() function to get additional data field coverage for a subset of stations (those that are near sea level and have data in 2019). With the station_coverage() function set to output tibbles in wide mode (one row per station, field categories as columns, and counts of observations as values), we can ascertain which stations have the particular fields we need.

stns <- 
  get_station_metadata() %>%
  dplyr::filter(country == "NO", elev <= 5 & end_year == 2019)

coverage_tbl <- 
  purrr::map_df(
    seq(nrow(stns)),
    function(x) {
      stns %>%
        dplyr::pull(id) %>%
        .[[x]] %>%
        station_coverage(
          years = 2019,
          wide_tbl = TRUE
        )
    }
  )
coverage_tbl
#> # A tibble: 1 x 88
#>   id       AA1   AB1   AC1   AD1   AE1   AG1   AH1   AI1   AJ1   AK1   AL1   AM1
#> * <chr>  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1 01167…   491     0     0     0     0     0     0     0   167     0     0     0
#> # … with 75 more variables: AN1 <int>, AO1 <int>, AP1 <int>, AU1 <int>,
#> #   AW1 <int>, AX1 <int>, AY1 <int>, AZ1 <int>, CB1 <int>, CF1 <int>,
#> #   CG1 <int>, CH1 <int>, CI1 <int>, CN1 <int>, CN2 <int>, CN3 <int>,
#> #   CN4 <int>, CR1 <int>, CT1 <int>, CU1 <int>, CV1 <int>, CW1 <int>,
#> #   CX1 <int>, CO1 <int>, CO2 <int>, ED1 <int>, GA1 <int>, GD1 <int>,
#> #   GF1 <int>, GG1 <int>, GH1 <int>, GJ1 <int>, GK1 <int>, GL1 <int>,
#> #   GM1 <int>, GN1 <int>, GO1 <int>, GP1 <int>, GQ1 <int>, GR1 <int>,
#> #   HL1 <int>, IA1 <int>, IA2 <int>, IB1 <int>, IB2 <int>, IC1 <int>,
#> #   KA1 <int>, KB1 <int>, KC1 <int>, KD1 <int>, KE1 <int>, KF1 <int>,
#> #   KG1 <int>, MA1 <int>, MD1 <int>, ME1 <int>, MF1 <int>, MG1 <int>,
#> #   MH1 <int>, MK1 <int>, MV1 <int>, MW1 <int>, OA1 <int>, OB1 <int>,
#> #   OC1 <int>, OE1 <int>, RH1 <int>, SA1 <int>, ST1 <int>, UA1 <int>,
#> #   UG1 <int>, UG2 <int>, WA1 <int>, WD1 <int>, WG1 <int>

For the "KAWAIHAE" station in Hawaii, some interesting data fields are available. In particular, its SA1 category provides sea surface temperature data, where the sa1_1 and sa1_2 variables represent the sea surface temperature and its quality code.

Combining the use of get_met_data() with functions from dplyr, we can create a table of the mean ambient and sea-surface temperatures by month. The additional data is included in the met data table by using the add_fields argument and specifying the "SA1" category (multiple categories can be included).

kawaihae_sst <- 
  get_met_data(
    station_id = "997173-99999",
    years = 2017:2018,
    add_fields = "SA1"
  ) %>%
  dplyr::mutate(
    year = lubridate::year(time),
    month = lubridate::month(time)
  ) %>%
  dplyr::filter(sa1_2 == 1) %>%
  dplyr::group_by(year, month) %>%
  dplyr::summarize(
    avg_temp = mean(temp, na.rm = TRUE),
    avg_sst = mean(sa1_1, na.rm = TRUE)
  )
kawaihae_sst
#> # A tibble: 6 x 4
#> # Groups:   year [2]
#>    year month avg_temp avg_sst
#>   <dbl> <dbl>    <dbl>   <dbl>
#> 1  2017    12     24.0    25.7
#> 2  2018     1     23.8    25.2
#> 3  2018     2     23.7    25.1
#> 4  2018     3     23.8    25.0
#> 5  2018     4     25.6    26.3
#> 6  2018    12     26.5    25.9

Installation

The stationaRy package can be easily installed from CRAN.

install.packages("stationaRy")

To install the development version of stationaRy, use the following:

install.packages("remotes")
remotes::install_github("rich-iannone/stationaRy")

If you encounter a bug, have usage questions, or want to share ideas to make this package better, feel free to file an issue.

License

MIT © Richard Iannone

stationaRy's People

Contributors

hadley, nutterb, paulponcet, rich-iannone


stationaRy's Issues

`Return` section in documentation

It would be nice if the documentation of get_isd_station_data had a description of what the vectors in the resulting data frame represent. I may not be very good at troubleshooting some of the problems I find, but if it's okay with you, I can at least make some contribution in writing out some of this documentation.

For reference, the appropriate source information on this is, I believe,
http://www1.ncdc.noaa.gov/pub/data/ish/ish-format-document.pdf

Issue with get_ncdc_station_data

From the list of stations, I wish to obtain detailed data for the LAKEFRONT AIRPORT station
USAF 722315 WBAN 53917 in Louisiana

station_data <- get_ncdc_station_data(station_id = "722315-53917",
                                      startyear = 2014,
                                      endyear = 2015)

Error in if (tz != "UTC") { : missing value where TRUE/FALSE needed

Issue pulling in data with get_ncdc_station_data

Trying to work through your examples, I get an error

library(stationaRy)
library(pipeR)

tofino_airport_2005_2010 <- 
  get_ncdc_station_info() %>>%
    select_ncdc_station(name = "tofino", number = 1) %>>%
    get_ncdc_station_data(startyear = 2005, endyear = 2010)

Error in col_positions$begin : $ operator is invalid for atomic vectors

I get the same error when working through the other examples as well. I've run each line in the example above separately, and have no issues pulling in the station info.

EDIT/UPDATE: I reinstalled stationaRy using install.packages("stationaRy") rather than through GitHub (using devtools::install_git("rich-iannone/stationaRy")) and now I can pull in the data properly.

Thoughts?

Not clear that local_tz is being implemented

This is for the West Vancouver station which has gn_gmtoffset and rawoffsets of -8

get_isd_station_data("717840-99999", startyear = 2014, endyear = 2015) %>%
  filter(year == 2015 & month == 1 & day == 26 & !is.na(temp)) %>%
  select(year, day, hour, temp)

   year day hour temp
1  2015  26    0 12.3
2  2015  26    0 12.3
3  2015  26    1 11.2
4  2015  26    2 10.2
5  2015  26    3  9.8
6  2015  26    4  9.3
7  2015  26    5  9.0
8  2015  26    6  8.0
9  2015  26    6  8.0
10 2015  26    7  8.0

I figure the hour=0 temp (for which, BTW, there are two values that are not always the same) is the 4 pm reading of the 25th.

Applying local_tz = FALSE

get_isd_station_data("717840-99999", startyear = 2014, endyear = 2015, local_tz = FALSE) %>%
  filter(year == 2015 & month == 1 & day == 26 & !is.na(temp)) %>%
  select(year, day, hour, temp)

produces the same output

Downloaded 0 bytes Error in if (file.exists(data_files[i])) { : argument is of length zero

I am collecting weather data in a loop from several stations by country. But sometimes there seems to be no data and my script breaks. Could you suggest what to do to continue the loop in such cases?

Regards,
Claudio

get_isd_station_data(station_id = "64080-99999",
                     startyear = 2014,
                     endyear = 2015)
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2013/064080-99999-2013.gz'
downloaded 0 bytes
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2014/064080-99999-2014.gz'
downloaded 0 bytes
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2015/064080-99999-2015.gz'
downloaded 0 bytes
Error in if (file.exists(data_files[i])) { : argument is of length zero
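One common way to keep such a loop going (a sketch, not from this thread; `station_ids` is a hypothetical vector of station IDs) is to wrap each call in tryCatch() and skip stations whose downloads fail:

```r
# Skip stations whose downloads error out instead of stopping the loop;
# `station_ids` stands in for whatever vector of IDs is being iterated over
for (id in station_ids) {
  result <- tryCatch(
    get_isd_station_data(station_id = id, startyear = 2014, endyear = 2015),
    error = function(e) NULL
  )
  if (is.null(result)) next  # no data for this station; move on
  # ... process `result` here ...
}
```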

Release stationaRy 0.5.0

Prepare for release:

  • Check that description is informative
  • Check licensing of included files
  • usethis::use_cran_comments()
  • devtools::check()
  • devtools::check_win_devel()
  • rhub::check_for_cran()
  • Polish pkgdown reference index

Submit to CRAN:

  • usethis::use_version('patch')
  • Update cran-comments.md
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_dev_version()
  • Update install instructions in README
  • Tweet

get_met_data() returns NAs and warnings regarding the column 'temp'

lga_met_data <- get_met_data( station_id = "725030-14732", years = 2017:2018 )
Returns 4 warning messages like this:
"1: Problem with mutate() column temp.
i temp = dplyr::case_when(temp == 9999 ~ NA_real_, TRUE ~ temp/10).
i One or more parsing issues, see problems() for details"

Then:
View(lga_met_data[1:10,])
It shows that the temp column just has NAs, inconsistent with the output shown in the example.

R version: 4.1.1
stationaRy version: 0.5.1

embedded nul in string: 'y\r\n\0<'

I can't completely rule out that I'm the problem, but I'm mostly certain it's an issue with reading in the data.

I can run this code from your README file just fine

tofino_airport_2005_2010 <- 
  get_isd_stations() %>%
  select_isd_station(name = "tofino", number = 1) %>%
  get_isd_station_data(startyear = 2005, endyear = 2010)

But running this code gives me an error

bergen_point_met <- 
  get_isd_stations() %>%
  select_isd_station(name = "bergen point") %>%
  get_isd_station_data(startyear = 2006, endyear = 2015)

Error in collectorsGuess(source, tokenizer, n = 100) : 
  embedded nul in string: 'y\r\n\0<'

I get the same error when I try to pull some local data

get_isd_stations() %>%
  select_isd_station(name = "FORT CAMPBELL / TEST") %>%
  get_isd_station_data(startyear = 2000, endyear = 2005)

get_isd_stations() %>%
  select_isd_station(name = "CAMPBELL AAF AIRPORT") %>%
  get_isd_station_data(startyear = 2013, endyear = 2015)

I'm not quite sure where this issue is. I may try to look at it more closely from home later this week. Sorry I can't offer a more specific idea of where the problem might be.

R version 3.2.1 (2015-06-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] stationaRy_0.3 dplyr_0.4.2    httr_0.6.1     devtools_1.8.0

loaded via a namespace (and not attached):
 [1] Rcpp_0.11.6     digest_0.6.8    assertthat_0.1  bitops_1.0-6    plyr_1.8.3     
 [6] R6_2.0.1        DBI_0.3.1       git2r_0.10.1    magrittr_1.5    stringi_0.5-2  
[11] lazyeval_0.1.10 curl_0.9        xml2_0.1.1      tools_3.2.1     readr_0.1.1    
[16] downloader_0.3  stringr_1.0.0   RCurl_1.95-4.6  parallel_3.2.1  rversions_1.0.1
[21] memoise_0.2.1  

Mistake in AJ1 identifiers in README table

Minor issue, but I believe the individual snow ("AJ1") identifiers in the README table are incorrect. They should run from 1-6 not 6-11, i.e.:

Category Identifier Column Name
... ... ...
AJ1 aj1_1 aj1_snow_depth_dimension
AJ1 aj1_2 aj1_snow_depth_condition_code
AJ1 aj1_3 aj1_snow_depth_quality_code
AJ1 aj1_4 aj1_snow_depth_equiv_water_depth_dimension
AJ1 aj1_5 aj1_snow_depth_equiv_water_condition_code
AJ1 aj1_6 aj1_snow_depth_equiv_water_quality_code

tests fail with dev version of readr

checking tests ... ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  1 710040 99999 CYPRESS BOWL FREESTYLE      CA       49.400 -123.2   969  2007
  2 715620 99999 CYPRESS BOWL SNOWBOARD      CA       49.383 -123.2  1180  2010
  3 716930 99999     CYPRESS BOWL SOUTH      CA       49.383 -123.2   886  2007
  Variables not shown: end (dbl), gmt_offset (dbl), time_zone_id (chr),
    country_name (chr), country_code (chr), iso3166_2_subd (chr), fips10_4_subd
    (chr)
  testthat results ================================================================
  OK: 30 SKIPPED: 0 FAILED: 2
  1. Error: get_isd_station_data returns correct number of columns 
  2. Error: error messages are provided in certain situations 

  Error: testthat unit tests failed
  Execution halted

I did a bit of exploration, and it looks like this is because readr no longer guesses that strings with leading 0s are integers. You can fix it by manually specifying col_types (and I think, in general, that's a better strategy when you're reading files with a predefined format).

I'm about to submit to CRAN today, so you might want to prepare a fix in the near future. Apologies for not spotting this earlier - I'm not sure why it didn't crop up when I did the previous round of revdep checks.

get_isd_station_data drops

I'm running a loop that grabs a bunch of weather stations, one after another, pulls the data, and binds the rows into one huge data frame. The loop goes smoothly, but then will just freeze on a random year/weather station. Each time I run it, it will stop at a different point, sometimes on the first station in the loop, sometimes (agonizingly) on the last. It will merely show me the last (next?) attempted download:

trying URL 'ftp://ftp.ncdc.noaa.gov/....'

and just hang forever. FWIW, this is what happens when I try on my office computer, which is still running Mac OSX Mavericks, R 3.2.3, and all of the latest associated packages (as of 2016-03-07). I'm downloading data from Southeast Asia (Cambodia, Laos, Singapore, Taiwan, Philippines, Malaysia).

In the same problem but a different situation (potentially a separate issue), I gave this a try on my home laptop, which is running Mac OSX El Capitan. Whenever I try to pull data I get the attached error message.
(screenshot of the error message attached)

It's like it can't even connect to the server, but my internet works for everything else. Perplexing.

Is the github version the same as the CRAN version? Both appear to give the same errors. If so, I may fork and fiddle to see if I can get something that works.

Thanks for making this program, it has been immensely helpful.

Handle parenthesis in station names for select_isd_station

library(dplyr)
library(stationaRy)

#get Alaska stations
stations <- get_isd_stations(lower_lat = 50.943870,
                             upper_lat = 71.545427,
                             lower_lon = -179.652659,
                             upper_lon = -125.405937) %>%
  filter(country == "US", state == "AK")

a_row <- stations[13, ]

# doesn't work
station_data <- a_row %>%
  select_isd_station(name = a_row$name) %>%
  get_isd_station_data(startyear = a_row$begin,
                       endyear = a_row$end)

# works
a_row %>%
  select_isd_station(name = "WAINWRIGHT \\(DEW\\)") %>%
  get_isd_station_data(startyear = a_row$begin,
                       endyear = a_row$end)

# a function to escape regex special characters, from
# http://stackoverflow.com/questions/14836754/is-there-an-r-function-to-escape-a-string-for-regex-characters
re.escape <- function(strings) {
  vals <- c("\\\\", "\\[", "\\]", "\\(", "\\)",
            "\\{", "\\}", "\\^", "\\$", "\\*",
            "\\+", "\\?", "\\.", "\\|")
  replace.vals <- paste0("\\\\", vals)
  for (i in seq_along(vals)) {
    strings <- gsub(vals[i], replace.vals[i], strings)
  }
  strings
}

# works
a_row %>%
  select_isd_station(name = re.escape(a_row$name)) %>%
  get_isd_station_data(startyear = a_row$begin,
                       endyear = a_row$end)

expanded_column_names

Hi,
very useful package, nice work! Just a question: why did you define the expanded_column_names in get_isd_station_data without using them in the output?

Error message when running loop

Hi,

I'm attempting to get station data using get_isd_station. Here's part of my loop:

assign(paste(id, "_data", sep = ""),
       get_isd_station_data(station_id = id,
                            startyear = year(start),
                            endyear = year(end)))

# Collecting data for time period
temp_station <- get(paste(id, "_data", sep = "")) %>%
  dplyr::mutate(date = ymd(paste(year, month, day, sep = "-"))) %>%
  dplyr::filter(date >= start, date <= end) %>%
  as.data.frame()
but I get the following error message:

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1980/946720-99999-1980.gz'
downloaded 31 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1981/946720-99999-1981.gz'
downloaded 34 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1982/946720-99999-1982.gz'
downloaded 31 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1983/946720-99999-1983.gz'
downloaded 33 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1984/946720-99999-1984.gz'
downloaded 33 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1949/037720-99999-1949.gz'
downloaded 258 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1950/037720-99999-1950.gz'
downloaded 263 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1951/037720-99999-1951.gz'
downloaded 262 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1952/037720-99999-1952.gz'
downloaded 261 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1971/911820-22521-1971.gz'
downloaded 100 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1972/911820-22521-1972.gz'
downloaded 103 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1973/911820-22521-1973.gz'
downloaded 201 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1974/911820-22521-1974.gz'
downloaded 228 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1975/911820-22521-1975.gz'
downloaded 218 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2002/747043-99999-2002.gz'
downloaded 117 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2003/747043-99999-2003.gz'
downloaded 117 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2004/747043-99999-2004.gz'
downloaded 122 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2005/747043-99999-2005.gz'
downloaded 125 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2006/747043-99999-2006.gz'
Error in if (file.info(file.path(temp_folder, data_file_to_download))$size > :
missing value where TRUE/FALSE needed
In addition: Warning message:
In download.file(url, ...) : InternetOpenUrl failed: ''
Some files are OK but I was not able to download from station 747043-99999. Would anyone be able to help?

Missing years causes error

For a handful of stations, there seem to be some missing years between the begin and end date, which cause an error when selected. For instance station id 483565-99999 (Sakon Nakhon, Thailand) produces an error for the following code:

get_ncdc_station_data(station_id = "483565-99999", startyear = 1999, endyear = 2014)

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1998/483565-99999-1998.gz'
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/1999/483565-99999-1999.gz'
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2000/483565-99999-2000.gz'
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2001/483565-99999-2001.gz'
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2002/483565-99999-2002.gz'
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2003/483565-99999-2003.gz'
ftp data connection made, file length 1940 bytes
==================================================
downloaded 1940 bytes

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2004/483565-99999-2004.gz'
ftp data connection made, file length 4974 bytes
==================================================
downloaded 4974 bytes

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2005/483565-99999-2005.gz'
ftp data connection made, file length 37845 bytes
==================================================
downloaded 36 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2006/483565-99999-2006.gz'
ftp data connection made, file length 39581 bytes
==================================================
downloaded 38 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2007/483565-99999-2007.gz'
ftp data connection made, file length 43194 bytes
==================================================
downloaded 42 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2008/483565-99999-2008.gz'
ftp data connection made, file length 48048 bytes
==================================================
downloaded 46 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2009/483565-99999-2009.gz'
ftp data connection made, file length 44746 bytes
==================================================
downloaded 43 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2010/483565-99999-2010.gz'
ftp data connection made, file length 34278 bytes
==================================================
downloaded 33 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2011/483565-99999-2011.gz'
ftp data connection made, file length 57183 bytes
==================================================
downloaded 55 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2012/483565-99999-2012.gz'
ftp data connection made, file length 70566 bytes
==================================================
downloaded 68 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2013/483565-99999-2013.gz'
ftp data connection made, file length 62401 bytes
==================================================
downloaded 60 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2014/483565-99999-2014.gz'
ftp data connection made, file length 71820 bytes
==================================================
downloaded 70 KB

Error in tryCatchOne(expr, names, parentenv, handlers[[1L]]) : 
  attempt to apply non-function

However, this function works properly when startyear = 2004. It seems as though years 1999-2003 are missing. I'm running a loop to get a bunch of data and would prefer a warning message for missing years, with the data for the available years still returned.

(Besides this issue, this package has been really useful and easy to use, thanks for making it!)

Include some basic mapping functionality?

I made an Rpubs that shows how to do some basic mapping using facilities from the oce package (for which I am a co-author), found here:

https://rpubs.com/clarkrichards/86727

@rich-iannone expressed an interest in including some mapping functionality in stationaRy, so this discussion is to narrow down what that would entail and how it would look. Some questions are:

  1. What functionality could be added that would differ from directly connecting with and using other packages that provide mapping (e.g. oce, sp, etc)?
  2. What kinds of maps would be general enough to include? Some examples could include:
  • Station locations (as in the Rpub)
  • Met values (e.g. temperature or whatever) for a certain date at a variety of stations
  • Gridded (i.e. spatially interpolated) met fields
  • ... ?

Do you need to include lubridate?

Deploying to shinyapps.io, got the error

Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : 
  there is no package called 'lubridate'
ERROR: lazy loading failed for package 'stationaRy'

Adding library(lubridate) to the app seemed to solve the problem.

Create documentation for the additional met fields

There are dozens upon dozens of additional met fields. They are currently hard to discover and not documented in the package (there’s an external doc but it’s a slog to get through). There should be some documentation here in this package.

Can't find "file.size" function

Hi,
when I run this code

met_data <- get_isd_station_data(station_id = "165220-99999",
                                 startyear = 2009,
                                 endyear = 2015)

the library starts to get files from ftp.ncdc.noaa.gov correctly, but at the end of the process I get this message:

can't find file.size function

Update history_tbl for 2021

Hi - thanks very much for maintaining this package. It's great!

Is it possible to update the history_tbl to include 2021 please? As far as I can tell the CRAN version runs up to 2020 at the moment. Not sure if there is a way of doing this myself (would be a great feature if so).

Thanks,

Erik

Missing years

It's no big deal, but do you have any idea why some years are missing from some stations?

e.g. Dawson Creek, BC

get_isd_station_data("717840-99999", startyear=2010,endyear= 2013)
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2010/717840-99999-2010.gz'
downloaded 249 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2011/717840-99999-2011.gz'
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2012/717840-99999-2012.gz'
trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2013/717840-99999-2013.gz'
downloaded 272 KB

trying URL 'ftp://ftp.ncdc.noaa.gov/pub/data/noaa/2014/717840-99999-2014.gz'
downloaded 304 KB

It seems a bit unlikely that there was no data during 2010/11, although I note that there was a nearby station with begin = 2007 and end = 2013.
