adibender / coalitions
Coalition probabilities in multi-party democracies
Home Page: https://adibender.github.io/coalitions/
License: Other
In draw_from_posterior the correction is currently added to the simulated shares. However, it should be added to the raw shares before simulation (with newly drawn runif corrections for each simulation, of course), because we want to adjust the raw shares with it and because different alpha values also affect the variance of the rdirichlet() draws, which in the end (slightly) changes the resulting distribution of simulated shares.
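A minimal sketch of the proposed order of operations, in base R. The helper name draw_shares_sketch, the Dirichlet(1/2) prior and the rgamma-based Dirichlet draw are assumptions made for illustration; this is not the package's draw_from_posterior.

```r
# Hypothetical helper: apply the rounding correction to the raw shares
# BEFORE the Dirichlet draw, with a fresh correction per simulation.
draw_shares_sketch <- function(percent, n_respondents, n_sim = 1000) {
  k <- length(percent)
  sims <- matrix(NA_real_, nrow = n_sim, ncol = k)
  for (i in seq_len(n_sim)) {
    # new uniform correction on (-0.5, 0.5) for every simulation
    corrected <- pmax(percent + runif(k, min = -0.5, max = 0.5), 0)
    # assumed prior weight of 1/2 per party; alpha now reflects the correction
    alpha <- corrected / 100 * n_respondents + 0.5
    # a Dirichlet draw built from independent gamma draws
    g <- rgamma(k, shape = alpha, rate = 1)
    sims[i, ] <- g / sum(g)
  }
  sims
}

set.seed(1)
sims <- draw_shares_sketch(c(39, 23, 8, 10, 8, 8, 4),
                           n_respondents = 1016, n_sim = 200)
```

Each row of sims is one simulated set of shares; because the correction enters through alpha, it changes the variance of the draws as well as their location.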
The hashtag on the updated website is still #koala17. Maybe you need a new one? #koalaneuwahlen, #koala18, #koala21, who knows? 🤣
So apparently coalitions has been archived on CRAN because I got ill and wasn't able to submit in time. However, it still doesn't appear to be compliant with CRAN policy; I'm not sure which part exactly this refers to. I guess the skip_on_cran part. I replaced it with skip_if_offline in the current devel branch. Hope this will do.
> we fixed the issue and built in checks within the functions that use
> internet resources. Additionally all tests and examples that require
> internet connection have been wrapped in either \dontrun{} or
> skip_cran() statements.

That does not comply with the policy, which does not apply only to usage
by CRAN.

> Unfortunately, I was not able to re-submit before today due to illness
> and now it appears the package has already been archived.
> Please advise on how to proceed.

Resubmit when you have 'a fully compliant update'.
There is a new LTW survey where CDU and SPD have exactly the same result. Because of this, the pool_survey() function throws an error.
Cause of the error: The max_party object has to be a single party (it is used for the effective sample size in the function), but with the two parties having the same share, max_party is a vector of the two parties.
Solution: Just add max_party <- max_party[1] or something like that.
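A small base-R illustration of the tie and the suggested fix. The shares vector is made up for the example and the variable names only mirror the description above, not the package internals.

```r
# Made-up shares with a tie for first place
shares <- c(cdu = 30, spd = 30, greens = 12)

# Selecting by equality returns both tied parties
max_party <- names(shares)[shares == max(shares)]

# Suggested fix: keep only the first one
max_party <- max_party[1]

# which.max() also returns only the first maximum
first_max <- names(which.max(shares))
```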
Using coalitions::get_surveys() with dplyr v0.7.1 throws an error: Evaluation error: Variable context not set
The error comes from gather(..., one_of(av.parties)) in collapse_parties.
I don't know how to solve the error with one_of(). My current workaround is to use cdu:sonstige instead of the one_of() argument.
Even though get_surveys() uses collapse_parties(), the NA values (e.g. Piraten in GMS) are not removed as they should be, leading to an error when applying draw_from_posterior() to a survey with a Piraten and FW share of NA.
get_surveys() %>%
filter(institute == "gms") %>%
unnest() %>%
filter(datum == "2017-06-01") %>%
unnest()
institute datum start end befragte party percent votes
1 gms 2017-06-01 2017-05-26 2017-05-31 1016 cdu 39 396.24
2 gms 2017-06-01 2017-05-26 2017-05-31 1016 spd 23 233.68
3 gms 2017-06-01 2017-05-26 2017-05-31 1016 gruene 8 81.28
4 gms 2017-06-01 2017-05-26 2017-05-31 1016 fdp 10 101.60
5 gms 2017-06-01 2017-05-26 2017-05-31 1016 linke 8 81.28
6 gms 2017-06-01 2017-05-26 2017-05-31 1016 piraten NA NA
7 gms 2017-06-01 2017-05-26 2017-05-31 1016 fw NA NA
8 gms 2017-06-01 2017-05-26 2017-05-31 1016 afd 8 81.28
9 gms 2017-06-01 2017-05-26 2017-05-31 1016 sonstige 4 40.64
What does work is the following, so there's no problem with collapse_parties():
lookup <- coalitions:::institutes_df
address <- lookup$address[lookup$institute == "gms"]
coalitions::scrape_wahlrecht(address) %>%
filter(datum == "2017-06-01") %>%
coalitions::collapse_parties() %>%
unnest()
datum start end befragte party percent votes
1 2017-06-01 2017-05-26 2017-05-31 1016 cdu 39 396.24
2 2017-06-01 2017-05-26 2017-05-31 1016 spd 23 233.68
3 2017-06-01 2017-05-26 2017-05-31 1016 gruene 8 81.28
4 2017-06-01 2017-05-26 2017-05-31 1016 fdp 10 101.60
5 2017-06-01 2017-05-26 2017-05-31 1016 linke 8 81.28
6 2017-06-01 2017-05-26 2017-05-31 1016 afd 8 81.28
7 2017-06-01 2017-05-26 2017-05-31 1016 sonstige 4 40.64
devtools::release()
On the Ubuntu 15.04 server (running R 3.4.4, coalitions 0.6.4) get_surveys() sometimes throws an error:
Error in mutate_impl(.data, dots) : Evaluation error: schannel: next InitializeSecurityContext failed: SEC_E_BUFFER_TOO_SMALL (0x80090321) - Der Puffer, der einer Funktion übergeben wurde, war zu klein..
(English: "The buffer passed to a function was too small.")
I was also able to reproduce this on Windows (R 3.5.1, coalitions 0.6.4).
As I noted, it doesn't happen all the time, but by using for(i in 1:20) { coalitions::get_surveys() } I was able to reproduce the error right away.
@adibender Do you know this error?
as_survey output/without using purrr::map
Coverage down to 17%, needs to be improved as well as testing
html_table (see PR #132)
devtools::check()
utf8 to be fixed for Win devel
devtools::release
Some of the newest surveys of Forschungsgruppe Wahlen are not scraped by get_surveys() since the structure of their Sonstige column was changed.
Structure of the newest surveys on wahlrecht.de:
Trying to scrape these data:
dat <- coalitions::get_surveys()
dat %>%
filter(pollster == "fgw") %>%
unnest() %>%
head()
# Result
# A tibble: 6 × 6
pollster date start end respondents survey
<chr> <date> <date> <date> <dbl> <list>
1 fgw 2021-08-13 2021-08-10 2021-08-12 1252 <tibble [7 × 3]>
2 fgw 2021-07-16 2021-07-13 2021-07-15 1224 <tibble [7 × 3]>
3 fgw 2021-06-25 2021-06-22 2021-06-24 1271 <tibble [7 × 3]>
4 fgw 2021-05-07 2021-05-04 2021-05-06 1271 <tibble [7 × 3]>
5 fgw 2021-04-16 2021-04-13 2021-04-15 1292 <tibble [7 × 3]>
6 fgw 2021-03-26 2021-03-23 2021-03-25 1030 <tibble [7 × 3]>
Pooled shares don't sum to 100%:
> dat <- get_surveys()
There were 14 warnings (use warnings() to see them)
> dat
# A tibble: 7 x 2
pollster surveys
<chr> <list>
1 allensbach <tibble [39 x 5]>
2 emnid <tibble [213 x 5]>
3 forsa <tibble [219 x 5]>
4 fgw <tibble [79 x 5]>
5 gms <tibble [94 x 5]>
6 infratest <tibble [102 x 5]>
7 insa <tibble [291 x 5]>
> pool_surveys(dat, last_date = as.Date("2017-11-21"))
# A tibble: 7 x 8
pollster date start end respondents party percent votes
<chr> <date> <date> <date> <dbl> <chr> <dbl> <dbl>
1 pooled 2017-11-21 2017-11-17 2017-11-21 2942.704 afd 20.602092 606.2587
2 pooled 2017-11-21 2017-11-17 2017-11-21 2942.704 cdu 51.859722 1526.0784
3 pooled 2017-11-21 2017-11-17 2017-11-21 2942.704 fdp 17.359794 510.8474
4 pooled 2017-11-21 2017-11-17 2017-11-21 2942.704 greens 18.002364 529.7564
5 pooled 2017-11-21 2017-11-17 2017-11-21 2942.704 left 15.652744 460.6140
6 pooled 2017-11-21 2017-11-17 2017-11-21 2942.704 others 7.038114 207.1109
7 pooled 2017-11-21 2017-11-17 2017-11-21 2942.704 spd 34.422840 1012.9624
tab <- coalitions::scrape_wahlrecht("http://www.wahlrecht.de/umfragen/gms.htm")
tab <- tab[tab$datum >= "2016-10-01",] # only take data since october 2016
datum start end cdu spd gruene fdp linke piraten fw afd sonstige befragte
1 2017-05-04 2017-04-30 2017-05-03 36 29 7 7 8 NA NA 9 4 1.004
2 2017-03-23 2017-03-16 2017-03-22 34 31 8 6 8 NA NA 9 4 1.008
3 2017-02-09 2017-02-02 2017-02-08 33 29 9 6 8 NA NA 11 4 1.009
4 2017-01-05 2017-12-28 2017-01-04 36 20 10 7 10 NA NA 13 4 1.006
5 2016-11-03 2016-10-27 2016-11-02 33 22 11 7 11 NA NA 12 4 1.012
6 2016-10-13 2016-10-07 2016-10-12 34 21 12 7 10 NA NA 13 3 1.003
The following yields an error:
tab <- collapse_parties(tab) # reshape to long format
Error in collapse_parties(tab) :
Assertion on 'surveys' failed: Contains only missing values.
Hello, I'm not sure if I'm doing everything correctly, but I'm having trouble pooling some surveys.
The thing is that the pooled percent is not a percent but a large number. Here is a minimal reproducible example.
df <- tibble(pollster = c("poll1", "poll2"),
date = as.Date(c("2019-05-14", "2019-05-13")),
respondents = c(1000, 5000),
party = c("PSOE", "PSOE"),
percent = c(27.1, 28.3),
votes = c(7203000, 7405004))
surveys <- nest(df, survey = c(party, percent, votes))
surveys <- nest(surveys, surveys = c(date, respondents, survey))
x <- pool_surveys(surveys, pollsters = c("poll1", "poll2"), last_date = as.Date("2019-05-15"), period = 14, period_extended = 20)
x$percent
x$percent
[1] 243466.7
Am I doing something wrong?
Thanks for the package.
There is a "hidden" xml page at wahlrecht.de: http://www.wahlrecht.de/umfragen/bundesweite.xml
It could be used to get the newest data, relatively nicely formatted, directly:
df <- XML::xmlToDataFrame(XML::xmlParse("http://www.wahlrecht.de/umfragen/bundesweite.xml"))
However, apparently only the 22 most recent results are provided there.
Wahlrecht.de currently contains a duplicate entry for the INSA poll on 05.01.2021. scrape_wahlrecht outputs both entries, which causes problems in subsequent functions.
Solution: Drop one of the duplicates in scrape_wahlrecht.
Reproducible example:
library(coalitions)
library(dplyr)
# scrape data
dat <- scrape_wahlrecht("https://www.wahlrecht.de/umfragen/insa.htm")
dat %>% filter(date == "2021-01-05")
#> date start end cdu spd greens fdp left pirates fw afd
#> 1 2021-01-05 2021-01-01 2021-01-04 36 15 18 7.5 7.5 NA NA 11
#> 2 2021-01-05 2021-01-01 2021-01-04 36 15 18 7.5 7.5 NA NA 11
#> others respondents
#> 1 5 2072
#> 2 5 2072
Created on 2021-05-12 by the reprex package (v0.3.0)
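A possible way to drop the duplicate, sketched in base R on a hand-built data frame. The first two rows copy a few columns from the reprex above; the third row is made up so that a distinct survey is present. Where exactly this would go inside scrape_wahlrecht is left open.

```r
# Two identical rows (from the reprex) plus one made-up distinct row
dat <- data.frame(
  date        = as.Date(c("2021-01-05", "2021-01-05", "2020-12-29")),
  cdu         = c(36, 36, 35),
  spd         = c(15, 15, 16),
  respondents = c(2072, 2072, 2051)
)

# Keep only the first occurrence of each fully identical row
dat_unique <- dat[!duplicated(dat), ]
```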
Saarland, Schleswig-Holstein, NRW, Niedersachsen
get_seat_distribution -> sls, improvement possible? maybe using Rcpp?
utils::globalVariables(.. list where possible
Simple example:
dHondt(c(250, 0, 100), c("A", "B", "C"), n_seats=3)
[1] 2 1
I believe the result is wrong; it should be
[1] 2 0 1
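For comparison, a minimal base-R sketch of a D'Hondt allocation that keeps zero-seat parties in the output. The name dhondt_sketch is hypothetical and this is not the package's dHondt(); ties are resolved by position here rather than at random.

```r
# Hypothetical helper: D'Hondt seat allocation that always returns
# one entry per party, including parties with zero seats.
dhondt_sketch <- function(votes, parties, n_seats) {
  # quotient matrix: one row per party, one column per divisor 1..n_seats
  quot <- outer(votes, seq_len(n_seats), "/")
  # positions of the n_seats largest quotients
  idx <- order(quot, decreasing = TRUE)[seq_len(n_seats)]
  # row index of each winning quotient = index of the winning party
  winners <- row(quot)[idx]
  # count seats per party; nbins keeps zero-seat parties in the result
  seats <- tabulate(winners, nbins = length(votes))
  names(seats) <- parties
  seats
}

seats <- dhondt_sketch(c(250, 0, 100), c("A", "B", "C"), n_seats = 3)
```

Counting with tabulate() over a fixed number of bins is what keeps party B in the result with a 0.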
If this is not fixed in rvest, coalitions will go down as well. If the issue is not fixed, we will have to rewrite a lot of the scrapers.
get_surveys as example data)
Includes
Probable reason: Weird question marks in the table on wahlrecht.de: http://www.wahlrecht.de/umfragen/forsa.htm
Need functions that return how often specific events happened, e.g.
Where appropriate, convert all functions that return data frames to a true "long format".
Date|Party|Percent|
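A base-R sketch of such a wide-to-long conversion. The column names date, party and percent are assumptions for the example; the two rows reuse CDU/SPD values from the GMS table quoted elsewhere in this tracker.

```r
# Wide format: one column per party
wide <- data.frame(
  date = as.Date(c("2017-05-04", "2017-03-23")),
  cdu  = c(36, 34),
  spd  = c(29, 31)
)

party_cols <- setdiff(names(wide), "date")

# Long format: one row per date and party
long <- data.frame(
  date    = rep(wide$date, times = length(party_cols)),
  party   = rep(party_cols, each = nrow(wide)),
  percent = unlist(wide[party_cols], use.names = FALSE)
)
```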
Hello,
I recently used your package to simulate seat distributions in parliament. Specifically, I needed the Sainte-Laguë formula, so I slightly modified your dhondt function. I'm leaving it here in case you think it's useful and want to add it to the package! :)
Best regards,
Pau Grau
slague <- function(votes, parties, n_seats) {
  # The only difference from dHondt is the step of 2 here (odd divisors 1, 3, 5, ...).
  # length.out = n_seats provides enough divisors for one party to win every seat.
  divisor.mat <- sum(votes) / sapply(votes, "/", seq(1, by = 2, length.out = n_seats))
  colnames(divisor.mat) <- parties
  m.mat <- tidyr::gather(as.data.frame(divisor.mat), key = "name", value = "value", everything())
  # keep the n_seats smallest inverse quotients, i.e. the largest quotients
  m.mat <- m.mat[rank(m.mat$value, ties.method = "random") <= n_seats, ]
  rle.seats <- rle(as.character(m.mat$name))
  if (sum(rle.seats$lengths) != n_seats)
    stop(paste("Number of seats distributed not equal to", n_seats))
  # add parties without any seat and restore the input order
  if (any(!(parties %in% rle.seats$values))) {
    missing_parties <- parties[!(parties %in% rle.seats$values)]
    for (party in missing_parties) {
      rle.seats$lengths <- c(rle.seats$lengths, 0)
      rle.seats$values <- c(rle.seats$values, party)
    }
    rle.seats$lengths <- rle.seats$lengths[match(parties, rle.seats$values)]
    rle.seats$values <- rle.seats$values[match(parties, rle.seats$values)]
  }
  rle.seats$lengths
}
Errors are arising for INSA and ForschGrWahlen:
coalitions::scrape_wahlrecht("http://www.wahlrecht.de/umfragen/insa.htm")
coalitions::scrape_wahlrecht("http://www.wahlrecht.de/umfragen/politbarometer.htm")
Applying coalitions::get_surveys() leads to warnings I've never had before:
e.g. In extract_num(cdu) : NAs durch Umwandlung erzeugt ("NAs introduced by coercion")
Possibly a problem caused by the first line of the Emnid/Forsa/... tables on wahlrecht.de, which currently is not a survey but shows the btw17 result?
https://cran.r-project.org/web/checks/check_results_coalitions.html
Dear maintainer,
Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_coalitions.html.
Please correct before 2018-10-08 to safely retain your package on CRAN.
Best,
-k
And again. It seems we need to remind you of the CRAN policy:
'Packages which use Internet resources should fail gracefully with an
informative message if the resource is not available (and not give a
check warning nor error).'
Please submit a fully compliant update ASAP and before Jan 23.
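One way to read the policy quoted above, as a sketch only: wrap the download in tryCatch(), emit a message instead of an error or warning, and return NULL so callers can stop early. The helper name fetch_or_null and its reader argument are hypothetical and not part of coalitions.

```r
# Hypothetical wrapper: returns the result of reader(url), or NULL together
# with an informative message if the resource cannot be read.
fetch_or_null <- function(url, reader = readLines) {
  tryCatch(
    suppressWarnings(reader(url)),
    error = function(e) {
      message("Resource not available: ", url,
              " (", conditionMessage(e), ")")
      NULL
    }
  )
}

# Simulate an unreachable resource without touching the network
res <- fetch_or_null("http://example.invalid/umfragen.htm",
                     reader = function(u) stop("could not resolve host"))
```

A function that uses this would check is.null(res) and return invisibly instead of continuing to parse.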
@bauer-alex Do you want to create a hex-sticker for the package?
Example: https://github.com/tidyverse/dplyr/blob/master/man/figures/logo.png
Forsa released 2 surveys in one day. Appears to cause errors
The function takes a surveys df plus a time-window specification and calculates the respective effective sample size, then returns a survey df with this sample size and the weighted means of the vote shares, etc.
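A rough base-R sketch of the idea, under simplifying assumptions: shares are pooled as respondent-weighted means and the sample size is the plain sum of respondents within the window. The name pool_sketch, the column names and the example numbers are made up, and the effective sample size calculation itself is not reproduced here.

```r
# Hypothetical pooling helper: respondent-weighted mean share per party
# over all surveys published within `period` days before `last_date`.
pool_sketch <- function(surveys, last_date, period = 14) {
  in_window <- surveys$date > last_date - period & surveys$date <= last_date
  win <- surveys[in_window, ]
  parties <- unique(win$party)
  pct <- vapply(parties, function(p) {
    rows <- win$party == p
    weighted.mean(win$percent[rows], win$respondents[rows])
  }, numeric(1))
  # naive sample size: sum of respondents over distinct surveys
  polls <- unique(win[c("pollster", "date", "respondents")])
  data.frame(party = parties, percent = unname(pct),
             respondents = sum(polls$respondents))
}

surveys <- data.frame(
  pollster    = rep(c("poll1", "poll2"), each = 2),
  date        = as.Date(rep(c("2017-11-20", "2017-11-18"), each = 2)),
  respondents = rep(c(1000, 3000), each = 2),
  party       = rep(c("A", "B"), times = 2),
  percent     = c(30, 70, 40, 60)
)
pooled <- pool_sketch(surveys, last_date = as.Date("2017-11-21"))
```

With weighted means of shares that each sum to 100 within a survey, the pooled shares sum to 100 as well, which is a cheap sanity check on any pooling routine.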
library(coalitions)
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 8 7
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 8 7
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
(tested with latest version)
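A possible explanation, assuming dHondt() breaks ties at random (the slague variant posted in this tracker uses ties.method = "random"): the 15th and 16th largest quotients are identical, so the last seat is decided by the tie-break. The check below uses base R only.

```r
# D'Hondt quotients for both parties
quot_a <- 360 / seq_len(15)
quot_b <- 280 / seq_len(15)

# All quotients in descending order
quot <- sort(c(quot_a, quot_b), decreasing = TRUE)

# 360/9 and 280/7 are both exactly 40, so seat 15 is a genuine tie
tie_for_last_seat <- quot[15] == quot[16]
```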
... to be able to define S3 methods for different types of (nested) survey objects.
Kind of annoying to have to call nest()/unnest() all the time...
Most results are rounded to the nearest integer percentage.
Allow extra variation by correcting for rounding with a uniform draw between -0.5 and +0.5, i.e. runif(n, -.5, +.5).
coalitions::get_surveys() leads to Error in `[.data.frame`(atab, , !ind.empty) : undefined columns selected.
This is caused by the latest update of Wahlrecht's GMS table, which now leads to two trailing NA columns in the following dataset:
atab <- read_html(address) %>%
html_nodes("table") %>% .[[2]] %>%
html_table(fill = TRUE)
However, scrape_wahlrecht() currently cannot handle such NA columns.
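One possible way to make the scraper robust to such columns, sketched in base R on a hand-built stand-in for atab (the column names X1 and X2 are made up): drop every column that contains only NA values before further processing.

```r
# Stand-in for the scraped table with two trailing all-NA columns
atab <- data.frame(cdu = c(36, 34), spd = c(29, 31), X1 = NA, X2 = NA)

# Keep only columns with at least one non-missing value
keep <- colSums(!is.na(atab)) > 0
atab_clean <- atab[, keep, drop = FALSE]
```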
https://cran.r-project.org/web/checks/check_results_coalitions.html
I think most of them are related to tbl_df vs. as_tibble, which has been fixed in the latest PR. But I also got "ripleyed":
Dear maintainer,
Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_coalitions.html.
Please correct before 2020-08-29 to safely retain your package on CRAN.
The CRAN Team
This is not complying with the CRAN policy
'Packages which use Internet resources should fail gracefully with an
informative message if the resource is not available or has changed (and
not give a check warning nor error).'
and taking 90m to fail its tests is anti-social.
@bauer-alex do you have time to look into this? Not sure where the 90m come from?
The scrape_austria function isn't working anymore, apparently because the link to the JSON database is dead.