edwindj / cbsodataR
Statistics Netherlands (CBS) OpenData API Client for R
Home Page: https://edwindj.github.io/cbsodataR
Hi!
I ran into the cbs_get_meta() error (Error in strsplit(params$`$select`, ", ") : non-character argument) that was reported and fixed by @sarahouweling a few days ago.
The CRAN version is not yet updated, so the solution for now is to install from GitHub:
devtools::install_github("edwindj/cbsodataR")
Thanks for creating this awesome package!
Hi Edwindj,
When I start my R script from the terminal (instead of via RStudio), I get a message that setwd() is called inside the package. That is not allowed for terminal scripts. Could you make the paths dynamic, so that the package does not depend on setwd()?
The problem is in cbs_download_meta.R
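One way to avoid setwd() is to build every output path with file.path() from a target directory. The sketch below only illustrates that idea: the helper name write_meta_table and the toy data are made up for this example and are not taken from cbs_download_meta.R.

```r
# Hypothetical helper (not part of cbsodataR): writes one metadata table
# into `dir` without ever changing the working directory.
write_meta_table <- function(x, name, dir = tempdir()) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(dir, paste0(name, ".csv"))
  write.csv(x, path, row.names = FALSE)
  invisible(path)
}

wd_before <- getwd()
toy_meta <- data.frame(Key = "2013JJ00", Title = "2013")
path <- write_meta_table(toy_meta, "Perioden", file.path(tempdir(), "81452NED"))
```

Because the working directory is never touched, the same call behaves identically from RStudio, Rscript, or a scheduled job.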
Why is there a number behind the colnames? Is this how you receive the data from CBS?
# Get the data (doodsoorzaken)
doodsoorzaken <- get_data('81452NED')
colnames(doodsoorzaken)
[1] "ID" "Geslacht"
[3] "Leeftijd" "Perioden"
[5] "TotaalDodelijkeOngevallen_1" "TotaalDodelijkeVervoersongevallen_2"
[7] "Voetganger_3" "Fiets_4"
[9] "BromEnSnorfietsEnBrommobiel_5" "GemotInvalidenvoertuigScootmobiel_6"
[11] "Motorfiets_7" "Personenauto_8"
[13] "BestelautoVrachtauto_9" "OverigOnbekend_10"
[15] "AccidenteleVal_11" "AccidenteleVerdrinking_12"
[17] "TotaalAccidenteleVergiftiging_13" "Medicijnen_14"
[19] "Drugs_15" "Alcohol_16"
[21] "OverigOnbekend_17" "TotaalOverigeDodelijkeOngevallen_18"
[23] "MechanischEffect_19" "RookVuurEnVlammen_20"
[25] "Verstikking_21" "OverigInclLaatGevolg_22"
[27] "TotaalDodelijkeOngevallen_23" "DodelijkeVervoersongevallen_24"
[29] "AccidenteleVal_25" "AccidenteleVerdrinking_26"
[31] "AccidenteleVergiftigingInclOpzetOnb_27" "OverigeDodelijkeOngevallen_28"
I am not so sure what the recode argument does, but it doesn't change anything.
Version 0.3 of cbsodataR may generate an error on some Windows configurations.
This is due to proxy configurations and will be solved in a coming version.
cbs_get_data fails, trying to connect to http://opendata.cbs.nl. Fixed in version 0.5.2.
Remedies for now:
catalog parameter, that works... (and was intended to make the base_url obsolete): https://edwindj.github.io/cbsodataR/reference/cbs_get_data.html
verbose=TRUE, for now.
progress=TRUE for now.
Thanks to Mirjam Zengers for reporting.
This is a feature request, not a bug I think.
Use case: I am interested in table 85067NED, "Gebieden in Nederland"
library(cbsodataR)
library(dplyr)
toc = cbs_get_toc()
filter(toc, Identifier=='85067NED') |> select(Identifier, Title, Period)
Identifier Title Period
<chr> <chr> <chr>
1 85067NED Gebieden in Nederland 2022 2022
However, newer versions of the table may become available and I would like to always get the most recent version. I don't see any pattern in the Identifier of this table and its previous versions. I thought cbs_search might do the trick, but it returns only one version of the table, and an old one at that.
s=cbs_search('gebieden in nederland',language='nl')
select(s, Title, Identifier, Period)
Title Identifier Period
<chr> <chr> <chr>
1 "Regionale kerncijfers Nederland" 70072ned Jaarcijfers 1995 - 2023
2 "Conjunctuurenquête Nederland; kwartaal, bedrijfstakken" 82435NED 1e kwartaal 2012 - 1e kwartaal 20…
3 "Banen van werknemers in december; economische activiteit (SBI2008), regio" 83582NED 2010-2021
4 "Bevolkingsontwikkeling; regio per maand" 37230ned Januari 2002 - maart 2023
5 "Bedrijven; bedrijfstak" 81589NED 2007 kwI - 2023 kw II
6 "Winning, invoer en uitvoer van materialen naar soort; nationale rekeningen " 83180NED 1996-2020
7 "Bodemgebruik; uitgebreide gebruiksvorm, per gemeente" 70262ned 1996, 2000, 2003, 2006, 2008, 201…
8 "Winning, invoer en uitvoer materialen per continent; nationale rekeningen" 83177NED 2004-2020
9 "Onderwijsinstellingen; grootte, soort, levensbeschouwelijke grondslag" 03753 1990/'91 - 2021/'22
10 "Gebieden in Nederland 2020" 84721NED 2020
So my request: is there a way, or could there be a way, to get the most recent version of a specific table? And a second note: it is not clear to me from the help where cbs_search actually searches. Apparently not in the title, or it would have found the most recent version as well.
Many thanks for the package by the way, it greatly simplifies things.
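A possible workaround, sketched here on a toy excerpt of the table of contents: filter on the title prefix and take the row with the highest Period. The columns Identifier, Title and Period are the ones shown above. The helper name latest_table is made up, and the approach assumes that Period is a plain year for this family of tables.

```r
# Toy excerpt of the output of cbs_get_toc(), using rows quoted in this issue.
toc <- data.frame(
  Identifier = c("84721NED", "85067NED", "70072ned"),
  Title = c("Gebieden in Nederland 2020",
            "Gebieden in Nederland 2022",
            "Regionale kerncijfers Nederland"),
  Period = c("2020", "2022", "Jaarcijfers 1995 - 2023"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: newest table whose title starts with `title_prefix`.
latest_table <- function(toc, title_prefix) {
  hits <- toc[startsWith(toc$Title, title_prefix), , drop = FALSE]
  if (nrow(hits) == 0) return(NA_character_)
  # Assumption: Period is a four-digit year, so string order equals time order.
  hits <- hits[order(hits$Period, decreasing = TRUE), , drop = FALSE]
  hits$Identifier[1]
}

latest_id <- latest_table(toc, "Gebieden in Nederland")
```

With the real table of contents, the first argument would be the result of cbs_get_toc(). Tables whose Period is a range rather than a single year would need a different ordering rule.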
Hi
I run the following code (2 lines):
library(cbsodataR)
dt <- cbs_download_data(id='84910NED')
It errors out with:
Error in isTRUE(catalog != "CBS") :
promise already under evaluation: recursive default argument reference or earlier problems?
I re-installed R, RStudio and installed the cbsodataR package again but no difference.
Can you help me?
In version 0.3 cbs_get_toc fails when either:
(Thanks to Rob van Harrevelt for reporting!)
Currently all data columns are characters. This is part of the API of Statistics Netherlands, which uses multiple special values, e.g. "not possible", "unknown", "strictly zero".
An option for get_data and download_table could be to automatically change data columns into numeric columns, thereby changing these special values into NA.
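A minimal sketch of such a conversion in base R, applied after the download. The helper name as_numeric_columns is hypothetical, and the toy values mimic the padded strings the API returns. Every value that is not a number becomes NA, so the distinction between the different special values is lost.

```r
# Hypothetical helper, not part of cbsodataR.
as_numeric_columns <- function(df, columns) {
  for (col in columns) {
    # as.numeric() ignores surrounding whitespace; anything that is not a
    # number (for example ".") becomes NA. The coercion warning is silenced.
    df[[col]] <- suppressWarnings(as.numeric(df[[col]]))
  }
  df
}

raw <- data.frame(
  ID = 1:3,
  Mannen_2 = c("       .", "    2521", "    2550"),
  stringsAsFactors = FALSE
)
clean <- as_numeric_columns(raw, "Mannen_2")
```

Passing the column names explicitly keeps dimension columns such as Perioden or RegioS as character.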
I am only interested in data from the gemeentes from the 'Kerncijfers wijken en buurten'.
So my code looks like this:
cbs_get_data("84583NED", SoortRegio_2 = "Gemeente ", verbose = T)
(with two spaces after 'gemeente'). This does not seem to work however, all types of regions are loaded. Any idea why?
Could it possibly have something to do with the '_' in the column name?
SN data contains temporal indicators: years (YYYYJJ00), months (YYYYMMxx), and quarters (YYYYKWxx). A utility function that converts them to dates/datetimes would be helpful.
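A sketch of what such a utility could look like in base R. The function name period_to_date is made up. It maps each code to the first day of the period and returns NA for any other code type, which is a design assumption rather than documented behaviour.

```r
# Hypothetical utility: converts CBS period codes to the first day of the period.
# Handles JJ (year), MM (month) and KW (quarter); other codes yield NA.
period_to_date <- function(x) {
  year <- substr(x, 1, 4)
  type <- substr(x, 5, 6)
  num  <- suppressWarnings(as.integer(substr(x, 7, 8)))
  month <- ifelse(type == "JJ", 1L,
           ifelse(type == "MM", num,
           ifelse(type == "KW", (num - 1L) * 3L + 1L, NA_integer_)))
  # With an explicit format, unparseable strings give NA instead of an error.
  as.Date(sprintf("%s-%02d-01", year, month), format = "%Y-%m-%d")
}

dates <- period_to_date(c("2013JJ00", "2002MM03", "2012KW02", "1995W101"))
```

Here the quarter code "2012KW02" maps to 1 April 2012, and the week code "1995W101" is left as NA instead of being guessed.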
The help documentation is pretty thin. It could be greatly improved by adding a few lines to the @description section and adding a @return everywhere.
It seems to be a good idea to strip whitespace. See the output below
> t <- get_data('37556')
Writing TableInfos.csv...
Writing DataProperties.csv...
Writing CategoryGroups.csv...
Writing Perioden.csv...
Retrieving data from table '37556'
Done!
> t$Mannen_2
[1] " ." " 2521" " 2550" " 2584" " 2622" " 2660" " 2699" " 2737"
[9] " 2777" " 2817" " 2855" " 2899" " 2945" " 2987" " 3037" " 3088"
[17] " 3141" " 3188" " 3236" " 3282" " 3311" " 3352" " 3410" " 3465"
[25] " 3516" " 3574" " 3629" " 3683" " 3735" " 3785" " 3838" " 3886"
[33] " 3943" " 4006" " 4068" " 4124" " 4177" " 4221" " 4264" " 4307"
[41] " 4353" " 4408" " 4454" " 4497" " 4530" " 4558" " 4603" " 4634"
[49] " 4748" " 4838" " 4926" " 4998" " 5084" " 5146" " 5198" " 5256"
[57] " 5321" " 5391" " 5460" " 5529" " 5619" " 5686" " 5754" " 5838"
[65] " 5924" " 6001" " 6091" " 6178" " 6262" " 6317" " 6383" " 6465"
[73] " 6550" " 6624" " 6676" " 6722" " 6772" " 6837" " 6872" " 6907"
[81] " 6945" " 6994" " 7048" " 7082" " 7103" " 7124" " 7150" " 7185"
[89] " 7224" " 7274" " 7317" " 7358" " 7420" " 7480" " 7535" " 7586"
[97] " 7627" " 7662" " 7697" " 7740" " 7793" " 7846" " 7910" " 7972"
[105] " 8015" " 8046" " 8066" " 8077" " 8089" " 8112" " 8156" " 8203"
[113] " 8243" " 8283" " 8307" " 8334" " 8373" " 8417"
Maybe add strip.white=TRUE to line https://github.com/edwindj/cbsodataR/blob/master/R/get-data.R#L31? Not tested.
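To show the effect of that argument in isolation, here is a small sketch that assumes the line in question is a read.csv() or read.table() call. The CSV text is a toy stand-in for the downloaded file.

```r
# Toy CSV with padded values, similar to the Mannen_2 output above.
csv_text <- "ID,Mannen_2\n1,       .\n2,    2521\n"

# Default: whitespace in character fields is kept.
padded <- read.csv(text = csv_text, colClasses = "character")

# With strip.white = TRUE, leading and trailing blanks are dropped.
stripped <- read.csv(text = csv_text, colClasses = "character",
                     strip.white = TRUE)
```

After stripping, the special value is a bare "." and the numbers no longer carry padding, which makes later comparisons and conversions simpler.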
Hi Edwin,
I am using the function cbs_add_label_columns() as shown below:
## download the cbs package
#devtools::install_github("edwindj/cbsodataR")
## libraries
lib <- c("cbsodataR", "dplyr")
## attach
lapply(lib, require, character.only = TRUE)
## Specify the 21 districts of Haarlem
Wijken <- c("WK039201 ", "WK039202 ", "WK039203 ", "WK039204 ", "WK039205 ", "WK039206 ", "WK039207 ", "WK039208 ", "WK039209 ", "WK039210 ",
"WK039211 ", "WK039212 ", "WK039213 ", "WK039214 ", "WK039215 ", "WK039216 ", "WK039217 ", "WK039218 ", "WK039219 ",
"WK039220 ", "WK039221 ")
## Fetch the data
kern <- cbs_get_data("84286NED",
select = c("WijkenEnBuurten",
"Gemeentenaam_1",
"SoortRegio_2", "Codering_3",
"PersonenautoSTotaal_89",
"PersonenautoSBrandstofBenzine_90",
"PersonenautoSOverigeBrandstof_91",
"PersonenautoSPerHuishouden_92",
"PersonenautoSNaarOppervlakte_93",
"GemiddeldeHuishoudensgrootte_32"),
WijkenEnBuurten = Wijken) %>% cbs_add_label_columns()
This gives the following error message:
Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels, :
factor level [177] is duplicated
I use both a MacBook and a Windows 10 computer. On the Windows computer I simply get the labels back, while on the Mac the error above is generated.
I would like to hear whether you have an idea where this error comes from.
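The message says that a factor was being given the same level twice. As a possible workaround until the cause is found, labels can be attached with a plain match() lookup, which never builds a factor and therefore tolerates duplicate titles. This is only a sketch: the helper name add_label and the titles "Wijk A" and "Wijk B" are invented toy values, not real CBS metadata.

```r
# Toy metadata with a deliberately duplicated title.
meta_wijken <- data.frame(
  Key   = c("WK039201", "WK039202", "WK039203"),
  Title = c("Wijk A", "Wijk B", "Wijk A"),
  stringsAsFactors = FALSE
)

# Keys in the data may carry trailing blanks, as in the vector above.
data_keys <- c("WK039203  ", "WK039201  ", "WK039202  ")

# Hypothetical helper: look up the title for each key without using factor().
add_label <- function(keys, meta) {
  meta$Title[match(trimws(keys), trimws(meta$Key))]
}

labels <- add_label(data_keys, meta_wijken)
```

The result is a character vector. It can be turned into a factor afterwards with factor(labels) if one is needed, since duplicates in the data are not a problem there.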
> ##Show session
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.4
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_0.8.0.1 cbsodataR_0.3.2 forcats_0.3.0 stringr_1.4.0
[5] purrr_0.3.1 readr_1.1.1 tidyr_0.8.3 tibble_2.0.1
[9] tidyverse_1.2.1 spData_0.2.9.4 leaflet_2.0.2 plotly_4.8.0
[13] ggplot2_3.1.0 shiny_1.3.2
loaded via a namespace (and not attached):
[1] tidyselect_0.2.5 haven_1.1.2 lattice_0.20-35
[4] colorspace_1.3-2 htmltools_0.3.6 viridisLite_0.3.0
[7] yaml_2.1.19 rlang_0.3.1 later_0.8.0
[10] pillar_1.3.1 glue_1.3.0 withr_2.1.2
[13] readxl_1.1.0 modelr_0.1.2 plyr_1.8.4
[16] cellranger_1.1.0 munsell_0.5.0 gtable_0.2.0
[19] rvest_0.3.2 devtools_1.13.5 htmlwidgets_1.3
[22] memoise_1.1.0 knitr_1.20 httpuv_1.5.0
[25] crosstalk_1.0.0 curl_3.2 broom_0.5.0
[28] Rcpp_1.0.0 xtable_1.8-2 scales_1.0.0
[31] promises_1.0.1 backports_1.1.2 jsonlite_1.6
[34] mime_0.6 hms_0.4.2 digest_0.6.18
[37] stringi_1.3.1 grid_3.4.4 cli_1.0.1
[40] tools_3.4.4 magrittr_1.5 lazyeval_0.2.1
[43] crayon_1.3.4 whisker_0.3-2 pkgconfig_2.0.2
[46] rsconnect_0.8.12 xml2_1.2.0 data.table_1.10.4-3
[49] lubridate_1.7.4 assertthat_0.2.0 httr_1.4.0
[52] rstudioapi_0.7 R6_2.4.0 nlme_3.1-131.1
[55] git2r_0.21.0 compiler_3.4.4
Kind regards,
Tobias Brils
The naming of the functions will change: they will be prefixed with cbs_ from version 0.3 on.
cbsodataR retrieves data via http://opendata.cbs.nl/ODataApi/odata/xxx
In many places, plain HTTP APIs may no longer be accessed. Is it possible to switch to an HTTPS connection?
Many tables of SN contain regional data: it would be nice if these tables could be converted into sf objects.
However:
Hello,
Is it an idea to change the license into a license with less restrictions like the BSD license? This package is hard to include in some projects due to the GPL license.
What are your thoughts about this?
Converting the cbs date to the regular date format does not seem to work:
-----------code snippet----------------
library(cbsodataR)
dt <- cbs_get_data("70895ned")
dt <- cbs_add_date_column (dt)
head (dt)
-----------output----------------
Geslacht LeeftijdOp31December Perioden Perioden_Date Perioden_freq Overledenen_1
1 1100 10000 1995X000 394
2 1100 10000 1995W101 2719
3 1100 10000 1995W102 2823
4 1100 10000 1995W103 2609
5 1100 10000 1995W104 2664
6 1100 10000 1995W105 2577
-----------end-----------------
The code did add the columns, but it did not fill in the correct date and frequency.
N.B.: not shown here is that for a yearly period, both columns are filled in correctly.
Hi Edwin,
very nice package.
In your examples you show how to filter on a specific variable-value:
cbs_get_data(id="03759ned", Perioden=c("2013JJ00","2014JJ00"), Geslacht="T001038")
however, in this large file of 45 million records I want to filter to e.g.
Perioden > "1990JJ00",
Geslacht != "T001038",
! Leeftijd %in% c(10000, 60100,60200,60300,60400,60500,60600,60700,60800,60900,21900)
Is that possible?
What would be the correct syntax for "not equal", "greater than" or "not in"?
Or a filter like substr(RegioS,1,2)=="GM", to select just the municipalities :-)
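For reference, the OData query language behind the API defines comparison operators such as ne and gt, and string functions such as startswith, for use in a $filter expression. Whether cbs_get_data forwards such expressions, and whether the CBS endpoint supports each of them, is not confirmed here. The sketch below only builds the filter string, and the helper names odata_filter and not_in are made up.

```r
# Hypothetical helpers: build an OData $filter expression as a string.
# Nothing is sent to the server in this sketch.
odata_filter <- function(...) paste(c(...), collapse = " and ")

# "not in" expressed as a chain of "ne" comparisons.
not_in <- function(column, values) {
  paste0("(", paste0(column, " ne '", values, "'", collapse = " and "), ")")
}

flt <- odata_filter(
  "Perioden gt '1990JJ00'",                 # greater than
  "Geslacht ne 'T001038'",                  # not equal
  not_in("Leeftijd", c("10000", "60100")),  # not in
  "startswith(RegioS,'GM')"                 # municipalities only
)
```

If the endpoint does not accept one of these expressions, filtering the downloaded data frame locally remains the fallback, at the cost of transferring more rows.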
Hi,
I am trying to pull out data but it seems I have a problem with the proxy and I get this error.
tables_en <- cbs_get_toc(Language="en")
Error in curl::curl_fetch_memory(url, handle = handle) :
Failed to connect to opendata.cbs.nl port 443: Connection refused
Could you help me to get a solution to it?
Best,
Nmta
cbs_get_data(id = "84378NED")
error:
Error in open.connection(con, "rb") :
cannot open the connection to 'http://opendata.cbs.nl/ODataApi/odata/84378NED'
In addition: Warning messages:
1: In open.connection(con, "rb") :
URL 'http://opendata.cbs.nl/ODataApi/odata/84378NED': status was 'Server returned nothing (no headers, no data)'
2: Failing: http://opendata.cbs.nl/ODataApi/odata/84378NED
Retrying...
3: In open.connection(con, "rb") :
URL 'http://opendata.cbs.nl/ODataApi/odata/84378NED': status was 'Server returned nothing (no headers, no data)'
Any idea? The site does show a page when I load it in the browser..
It would be helpful if a progressbar was shown when downloading a table
id <- '80302ned'
library(cbsodataR)
meta <- cbs_get_meta(id,verbose=TRUE,cache=TRUE)
View(meta)
View(meta$Voertuigtypes)
View(meta$Perioden)
data <- cbs_get_data(id)
View(data)
# Add the metadata to the data itself
data2 <- cbs_add_label_columns(data)
View(data2)
# this works for the year but not for the vehicle type
# in the data the vehicle type is a number and in the metadata a string, e.g. 0 and '00'; possibly that is the cause?
# naming the column explicitly does not help either
data2 <- cbs_add_label_columns(data,columns = 'Voertuigtypes')
The title says it all. I may do a PR if I get round to it.
Dear Edwin,
thank you for developing this great package! I tried to run it in the microdata remote-access environment of CBS, but I could not get it running. The same code worked perfectly fine outside the environment, but inside it simply did not download the data (even though the URL is accessible in the environment).
Are you familiar with this problem?
Best,
Benedikt
When there are too many filter statements, the request URL to the odata portal becomes too long and the request fails. Currently, the resulting error message is very uninformative. Most people probably don't know how to address this error.
Example:
library(cbsodataR)
tbl <- "70072ned"
meta <- cbs_get_meta(tbl)
gemeentes <- meta$RegioS$Key
gemeentes <- gemeentes[grep("^GM", gemeentes)]
jaren <- meta$Perioden$Key
onderwerpen <- c("TotaleBevolking_1",
"JongerDan5Jaar_4",
"k_5Tot10Jaar_5",
"k_10Tot15Jaar_6",
"k_15Tot20Jaar_7",
"k_20Tot25Jaar_8",
"k_25Tot45Jaar_9",
"k_45Tot65Jaar_10",
"k_65Tot80Jaar_11",
"k_80JaarOfOuder_12",
"Bevolkingsdichtheid_57",
"VestigingUitAndereGemeente_69",
"VertrekNaarAndereGemeente_70",
"BinnenlandsMigratiesaldo_71",
"BinnenlandsMigratiesaldoRelatief_72",
"VerhuismobiliteitRelatief_73",
"Bevolkingsgroei_79",
"TotaalAantalParticuliereHuishoudens_82",
"VoorraadOp1Januari_90",
"Nieuwbouwwoningen_91",
"Woningen_97",
"GemiddeldeWoningwaarde_99",
"TotaleOppervlakte_187")
dta <- cbs_get_data(id = tbl, Perioden = jaren,
RegioS = gemeentes,
select = c("Perioden", "RegioS", onderwerpen))
This results in either the following error message:
Error in curl::curl_fetch_memory(url, handle = handle) :
OpenSSL SSL_read: SSL_ERROR_SYSCALL, errno 104
Or sometimes the following:
Error in get_json(url, verbose = verbose) :
Request-URI Too Long (HTTP 414). Failed to Client error: (414) Request-URI Too Long.
Perhaps add the following lines to cbs_download_data after url <- URLencode(url):
if (nchar(url) > 2000L)
warning(paste0(c("The request URL is longer than 2000 characters. ",
"This could cause the request to fail on some platforms. ",
"If so, try to reduce the number of filter statements and filter the data afterwards.")))
Or catch the error, with something like:
res <- tryCatch({
  get_json(url, verbose = verbose)
}, error = function(e) {
  # Only append the hint when the URL is long enough to be a likely cause
  hint <- if (nchar(url) < 2000L) "" else
    paste0("\n\nThe request URL is longer than 2000 characters. ",
           "This could cause the request to fail on some platforms. ",
           "Try to reduce the number of filter statements and ",
           "filter the data afterwards.")
  stop("Request failed with the following message:\n", e$message, hint)
})
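On the user side, one way to reduce the number of filter statements per request is to split the keys into batches and combine the results. The sketch below assumes the caller supplies a fetch function that takes a vector of keys and returns a data frame. The helper name fetch_in_chunks and the chunk size of 50 are arbitrary choices for illustration.

```r
# Hypothetical helper: calls `fetch` once per batch of keys and row-binds
# the results, so that each request URL stays short.
fetch_in_chunks <- function(keys, fetch, chunk_size = 50L) {
  chunks <- split(keys, ceiling(seq_along(keys) / chunk_size))
  parts <- lapply(chunks, fetch)
  do.call(rbind, unname(parts))
}

# Stand-in for a real download, used here so the sketch runs offline.
n_calls <- 0L
fake_fetch <- function(keys) {
  n_calls <<- n_calls + 1L
  data.frame(RegioS = keys, stringsAsFactors = FALSE)
}

res <- fetch_in_chunks(sprintf("GM%04d", 1:120), fake_fetch, chunk_size = 50L)
```

With the example above, fetch could be something like function(k) cbs_get_data(id = tbl, RegioS = k, select = c("Perioden", "RegioS", onderwerpen)), leaving the Perioden filter to be applied locally afterwards.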
The security certificates on https://opendata.cbs.nl have been updated, which causes IE11, Windows 7 and Windows 8 to have trouble connecting to the open data server.
Add the following lines to your R script (before making any calls with cbsodataR):
Sys.setenv(CURL_SSL_BACKEND = "openssl")
options("url.method" = "libcurl")
Thanks to Jasper Dupont for reporting the issue and problem.
Hello Edwin,
Maybe it would be good to mention somewhere in the README that you have to load dplyr to run this example, for the more inexperienced R users... I was trying to call View, but it fails without dplyr.
> get_data('71509ENG') %>% select(2:5) %>% head
Source: local data frame [6 x 4]
FruitFarmingRegions Periods TotalAppleVarieties_1 CoxSOrangePippin_2
1 Total Netherlands 1997 420 43
2 Total Netherlands 1998 518 40
3 Total Netherlands 1999 568 39
4 Total Netherlands 2000 461 27
5 Total Netherlands 2001 408 30
6 Total Netherlands 2002 354 17
Very nice package btw. Is CBS going to support it?
Kind regards Jonathan
Hi,
I'm having problems loading the 70747ned table; other tables seem to be all right. When I try to get the table I get the following:
Attempt 1
code:
fname = "70747ned"
bucket1 <- cbs_get_data(id = fname)
Output:
Error: parse error: premature EOF
{ "odata.metadata":"https://
(right here) ------^
In addition: Warning message:
In fun(libname, pkgname) : couldn't connect to display ":0"
Attempt 2
code:
fname = "70747ned"
cbs_download_table(id=fname, dir=fname)
output:
[1] "https://opendata.cbs.nl/ODataFeed/odata/70747ned/UntypedDataSet?$format=json&$skip=7060000"
Writing...
Reading...
[1] "https://opendata.cbs.nl/ODataFeed/odata/70747ned/UntypedDataSet?$format=json&$skip=7070000"
Error: parse error: premature EOF
{ "odata.metadata":"https://
(right here) ------^
Regards,
Maarten.