
coalitions's People

Contributors

adibender, andinatro, bauer-alex, fsolt, hadley, hssnippets, romainfrancois, schade-rebekka


coalitions's Issues

Application of survey correction not optimal

In draw_from_posterior the correction is currently added to the simulated shares. It should instead be added to the raw shares before simulation (with newly drawn runif() corrections for each simulation run), because we want to adjust the raw shares themselves and because different alpha values also affect the variance of the rdirichlet() draws, which in the end (slightly) changes the resulting distribution of simulated shares.
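A minimal sketch of the proposed ordering, assuming gtools::rdirichlet(); the function name, max_corr argument and the +0.5 Dirichlet parameterization are illustrative placeholders, not the package's implementation:

library(gtools)

simulate_with_correction <- function(raw_shares, n_respondents, max_corr, nsim = 10000) {
  t(sapply(seq_len(nsim), function(i) {
    # draw a fresh uniform correction for every simulation run
    corr   <- runif(length(raw_shares), -max_corr, max_corr)
    shares <- pmax(raw_shares + corr, 0)
    # placeholder Dirichlet parameters derived from the corrected raw shares
    alpha  <- shares / sum(shares) * n_respondents + 0.5
    rdirichlet(1, alpha)
  }))
}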

Hashtag on updated website

The hashtag on the updated website is still #koala17. Maybe you need a new one? #koalaneuwahlen, #koala18, #koala21, who knows? 🤣

Get coalitions back on CRAN

So apparently coalitions has been archived on CRAN because I got ill and wasn't able to submit in time. However, it still doesn't appear to be compliant with CRAN policy; I'm not sure, however, which part exactly this refers to. I guess the skip_on_cran part. I replaced it with skip_if_offline in the current devel branch. Hope this will do.

We fixed the issue and built in checks within the functions that use
internet resources. Additionally, all tests and examples that require an
internet connection have been wrapped in either \dontrun{} or
skip_on_cran() statements.

That does not comply with the policy, which does not apply only to usage
by CRAN.

Unfortunately, I was not able to re-submit before today due to illness
and now it appears the package has already been archived.
Please advise on how to proceed.

Resubmit when you have 'a fully compliant update'.

Bug in pool_surveys()

There is a new LTW survey in which CDU and SPD have exactly the same result. Because of this, the pool_surveys() function throws an error.

Cause of the error: the max_party object must contain only one party (it is used for the effective sample size in the function), but with two parties having the same share, max_party is a vector of two parties.

Solution: just add max_party <- max_party[1] or something along those lines (see the illustration below).
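For illustration (hypothetical tied shares, not the actual internals of pool_surveys()), the maximum-share lookup returns two names, and the proposed fix keeps only the first:

shares    <- c(cdu = 30, spd = 30, gruene = 15)
max_party <- names(shares)[shares == max(shares)]
max_party                  # "cdu" "spd" -- two parties tied
max_party <- max_party[1]  # proposed fix: keep only the first entry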

Error when using dplyr 0.7.1

Using coalitions::get_surveys() with dplyr v0.7.1 throws an error: Evaluation error: Variable context not set

The error comes from gather(..., one_of(av.parties)) in collapse_parties.

I don't know how to solve the error with one_of(). My current workaround is to use cdu:sonstige instead of the one_of() argument, as sketched below.
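A sketch of that workaround, assuming the wide survey table uses the usual wahlrecht column names (surveys_wide is a placeholder for the table inside collapse_parties):

library(tidyr)
surveys_long <- gather(surveys_wide, key = "party", value = "percent", cdu:sonstige)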

Bug in get_surveys()

Even though get_surveys() uses collapse_parties(), the NA values (e.g. Piraten in GMS) are not dropped the way they should be, leading to an error when applying draw_from_posterior() to a survey with Piraten and FW shares of NA.

get_surveys() %>%
filter(institute == "gms") %>%
unnest() %>%
filter(datum == "2017-06-01") %>%
unnest()

institute datum start end befragte party percent votes
1 gms 2017-06-01 2017-05-26 2017-05-31 1016 cdu 39 396.24
2 gms 2017-06-01 2017-05-26 2017-05-31 1016 spd 23 233.68
3 gms 2017-06-01 2017-05-26 2017-05-31 1016 gruene 8 81.28
4 gms 2017-06-01 2017-05-26 2017-05-31 1016 fdp 10 101.60
5 gms 2017-06-01 2017-05-26 2017-05-31 1016 linke 8 81.28
6 gms 2017-06-01 2017-05-26 2017-05-31 1016 piraten NA NA
7 gms 2017-06-01 2017-05-26 2017-05-31 1016 fw NA NA
8 gms 2017-06-01 2017-05-26 2017-05-31 1016 afd 8 81.28
9 gms 2017-06-01 2017-05-26 2017-05-31 1016 sonstige 4 40.64

What does work is the following, so there's no problem with collapse_parties() itself:
lookup <- coalitions:::institutes_df
address <- lookup$address[lookup$institute == "gms"]
coalitions::scrape_wahlrecht(address) %>%
filter(datum == "2017-06-01") %>%
coalitions::collapse_parties() %>%
unnest()

   datum      start        end befragte    party percent  votes
1 2017-06-01 2017-05-26 2017-05-31 1016 cdu 39 396.24
2 2017-06-01 2017-05-26 2017-05-31 1016 spd 23 233.68
3 2017-06-01 2017-05-26 2017-05-31 1016 gruene 8 81.28
4 2017-06-01 2017-05-26 2017-05-31 1016 fdp 10 101.60
5 2017-06-01 2017-05-26 2017-05-31 1016 linke 8 81.28
6 2017-06-01 2017-05-26 2017-05-31 1016 afd 8 81.28
7 2017-06-01 2017-05-26 2017-05-31 1016 sonstige 4 40.64

Release coalitions 0.6.10

  • local checks
  • travis + appveyor
  • win_builder
  • rhub checks
  • update news.md
  • update CRAN comments
  • devtools::release()
  • Create GitHub release

get_surveys() crashes from time to time

On the Ubuntu 15.04 server (running R 3.4.4, coalitions 0.6.4) get_surveys() sometimes throws an error:
Error in mutate_impl(.data, dots) : Evaluation error: schannel: next InitializeSecurityContext failed: SEC_E_BUFFER_TOO_SMALL (0x80090321) - Der Puffer, der einer Funktion übergeben wurde, war zu klein. (The buffer supplied to a function was too small.)

I was also able to reproduce this on Windows (R 3.5.1, coalitions 0.6.4).

As I noted, it doesn't happen all the time, but by using for(i in 1:20) { coalitions::get_surveys() } I was able to reproduce the error right away.

@adibender Do you know this error?

Prepare new CRAN release

  • fix test error that resulted from changes in html_table (see PR #132)
  • update Description
  • update News
  • devtools::check()
  • wait for utf8 to be fixed for Win devel
  • devtools::release
  • tag release on github

Error in scraping the new structure of fgw surveys

Some of the newest surveys of Forschungsgruppe Wahlen are not scraped by get_surveys() since the structure of their Sonstige column was changed.

Structure of the newest surveys on wahlrecht.de (screenshot omitted):

Trying to scrape these data:

dat <- coalitions::get_surveys()
dat %>% 
  filter(pollster == "fgw") %>% 
  unnest() %>%
  head()

# Result
# A tibble: 6 × 6
  pollster date       start      end        respondents survey          
  <chr>    <date>     <date>     <date>           <dbl> <list>          
1 fgw      2021-08-13 2021-08-10 2021-08-12        1252 <tibble [7 × 3]>
2 fgw      2021-07-16 2021-07-13 2021-07-15        1224 <tibble [7 × 3]>
3 fgw      2021-06-25 2021-06-22 2021-06-24        1271 <tibble [7 × 3]>
4 fgw      2021-05-07 2021-05-04 2021-05-06        1271 <tibble [7 × 3]>
5 fgw      2021-04-16 2021-04-13 2021-04-15        1292 <tibble [7 × 3]>
6 fgw      2021-03-26 2021-03-23 2021-03-25        1030 <tibble [7 × 3]>

`pool_surveys()` problem with two Forsa polls published on one day

Pooled shares don't sum to 100%:

> dat <- get_surveys()
There were 14 warnings (use warnings() to see them)
> dat
# A tibble: 7 x 2
    pollster            surveys
       <chr>             <list>
1 allensbach  <tibble [39 x 5]>
2      emnid <tibble [213 x 5]>
3      forsa <tibble [219 x 5]>
4        fgw  <tibble [79 x 5]>
5        gms  <tibble [94 x 5]>
6  infratest <tibble [102 x 5]>
7       insa <tibble [291 x 5]>
> pool_surveys(dat, last_date = as.Date("2017-11-21"))
# A tibble: 7 x 8
  pollster       date      start        end respondents  party   percent     votes
     <chr>     <date>     <date>     <date>       <dbl>  <chr>     <dbl>     <dbl>
1   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704    afd 20.602092  606.2587
2   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704    cdu 51.859722 1526.0784
3   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704    fdp 17.359794  510.8474
4   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704 greens 18.002364  529.7564
5   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704   left 15.652744  460.6140
6   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704 others  7.038114  207.1109
7   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704    spd 34.422840 1012.9624

`collapse_parties()` cannot handle only-NA columns

tab <- coalitions::scrape_wahlrecht("http://www.wahlrecht.de/umfragen/gms.htm")
tab <- tab[tab$datum >= "2016-10-01",] # only take data since october 2016

   datum      start        end cdu spd gruene fdp linke piraten fw afd sonstige befragte
1 2017-05-04 2017-04-30 2017-05-03 36 29 7 7 8 NA NA 9 4 1.004
2 2017-03-23 2017-03-16 2017-03-22 34 31 8 6 8 NA NA 9 4 1.008
3 2017-02-09 2017-02-02 2017-02-08 33 29 9 6 8 NA NA 11 4 1.009
4 2017-01-05 2017-12-28 2017-01-04 36 20 10 7 10 NA NA 13 4 1.006
5 2016-11-03 2016-10-27 2016-11-02 33 22 11 7 11 NA NA 12 4 1.012
6 2016-10-13 2016-10-07 2016-10-12 34 21 12 7 10 NA NA 13 3 1.003

The following yields an error:
tab <- collapse_parties(tab) # reshape to long format
Error in collapse_parties(tab) :
Assertion on 'surveys' failed: Contains only missing values.
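A possible workaround (a sketch only, not necessarily how the package should fix it): drop party columns that are entirely NA before collapsing:

all_na <- vapply(tab, function(x) all(is.na(x)), logical(1))
tab    <- tab[, !all_na, drop = FALSE]
tab    <- collapse_parties(tab)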

pool_surveys() not pooling properly

Hello, I'm not sure if I'm doing everything correctly, but I'm having trouble pooling some surveys.

The thing is that the pooled percent is not a percentage but a large number. Here is a minimal reproducible example.

library(coalitions)
library(tibble)
library(tidyr)

df <- tibble(pollster = c("poll1", "poll2"),
             date = as.Date(c("2019-05-14", "2019-05-13")),
             respondents = c(1000, 5000),
             party = c("PSOE", "PSOE"),
             percent = c(27.1, 28.3),
             votes = c(7203000, 7405004))

surveys <- nest(df, survey = c(party, percent, votes))
surveys <- nest(surveys, surveys = c(date, respondents, survey))


x <- pool_surveys(surveys, pollsters = c("poll1", "poll2"), last_date = as.Date("2019-05-15"), period = 14, period_extended = 20)

x$percent
[1] 243466.7

Am I doing something wrong?

Thanks for the package.

Use wahlrecht api?

There is a "hidden" XML page at wahlrecht.de: http://www.wahlrecht.de/umfragen/bundesweite.xml.
It could be used to get the newest data, relatively nicely formatted, directly:

library(XML)
df <- xmlToDataFrame(xmlParse("http://www.wahlrecht.de/umfragen/bundesweite.xml"))

However, apparently only the 22 most recent results are provided there.

Correctly handle duplicate wahlrecht entries

Wahlrecht.de currently contains a duplicate entry for the INSA poll on 05.01.2021. scrape_wahlrecht outputs both entries, which causes problems in subsequent functions.

Solution: drop one of the duplicates in scrape_wahlrecht(); a sketch follows the reprex below.

Reproducible example:

library(coalitions)
library(dplyr)

# scrape data
dat <- scrape_wahlrecht("https://www.wahlrecht.de/umfragen/insa.htm")

dat %>% filter(date == "2021-01-05")
#>         date      start        end cdu spd greens fdp left pirates fw afd
#> 1 2021-01-05 2021-01-01 2021-01-04  36  15     18 7.5  7.5      NA NA  11
#> 2 2021-01-05 2021-01-01 2021-01-04  36  15     18 7.5  7.5      NA NA  11
#>   others respondents
#> 1      5        2072
#> 2      5        2072

Created on 2021-05-12 by the reprex package (v0.3.0)
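A minimal sketch of the proposed deduplication, assuming a dplyr-based fix applied to the scraped table (where exactly inside scrape_wahlrecht() this should happen is left open):

library(dplyr)
dat <- distinct(dat, date, start, end, .keep_all = TRUE)  # keep one row per poll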

speed improvements?

  • bottleneck in get_seat_distribution -> sls, improvement possible? maybe using Rcpp?

Update to tidyeval

  • make all functions compliant with tidyeval framework
  • remove variables from the utils::globalVariables() list where possible

prepare CRAN release

  • add fixed example data sets (a subset of the get_surveys() output as example data)
  • rewrite tests to use these internal data sets / remove the dependency on an internet connection for tests, where possible
  • compliance with tidyeval
    - [ ] describe usage without the predefined scrapers (enter data manually)
  • Fix R-dev CI errors
  • JOSS submission/publication
  • Update all addresses to https (especially on homepage)
  • Submit to CRAN
  • Add Git Tag + GitHub release

number of times functions

Need functions that return how often specific events happened across the simulations, e.g. (a sketch follows the list):

  • SPD gets more seats than CDU/CSU, and vice versa
  • FDP does not clear the 5% hurdle
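A rough sketch of such a helper, assuming simulated seat distributions in a long data frame sims with columns sim, party and seats (these names are placeholders, not the package's API):

count_event <- function(sims, event) {
  per_sim <- split(sims, sims$sim)
  # share of simulations in which `event` evaluates to TRUE
  mean(vapply(per_sim, event, logical(1)))
}

# e.g. share of simulations where the SPD gets more seats than the CDU/CSU:
# count_event(sims, function(d) d$seats[d$party == "spd"] > d$seats[d$party == "cdu"])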

Convert data format

Where applicable, convert all functions that return data frames to a proper long format:

Datum | Partei | Prozent
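A sketch of the wide-to-long conversion with tidyr (wide is a placeholder for a scraped wide-format table; pivot_longer() requires tidyr >= 1.0):

library(tidyr)
long <- pivot_longer(wide, cols = cdu:sonstige, names_to = "party", values_to = "percent")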

Adding the Sainte-Laguë formula

Hello,

I used your package recently to run some simulations of seat distributions in parliament. Specifically, I needed to use the Sainte-Laguë formula and slightly modified your dhondt function. I leave it here just in case you think it's useful and want to add it to the package! :)

Best regards,

Pau Grau

slague <- function(votes, parties, n_seats) {
  # the only difference from dHondt is the divisor sequence 1, 3, 5, ... (step of 2)
  divisor.mat <- sum(votes) / sapply(votes, "/", seq(1, n_seats, 2))
  colnames(divisor.mat) <- parties
  m.mat <- tidyr::gather(as.data.frame(divisor.mat), key = "name", value = "value", everything())
  m.mat <- m.mat[rank(m.mat$value, ties.method = "random") <= n_seats, ]
  rle.seats <- rle(as.character(m.mat$name))
  if (sum(rle.seats$length) != n_seats)
    stop(paste("Number of seats distributed not equal to", n_seats))
  if (any(!(parties %in% rle.seats$values))) {
    missing_parties <- parties[!(parties %in% rle.seats$values)]
    for (party in missing_parties) {
      rle.seats$lengths <- c(rle.seats$lengths, 0)
      rle.seats$values  <- c(rle.seats$values, party)
    }
    rle.seats$lengths <- rle.seats$lengths[match(parties, rle.seats$values)]
    rle.seats$values  <- rle.seats$values[match(parties, rle.seats$values)]
  }
  rle.seats$length
}
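For reference, the function is meant as a drop-in replacement for dHondt(), e.g. (hypothetical vote counts):

slague(c(360, 280, 120), c("A", "B", "C"), n_seats = 15)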

Scraping data still not working for some institutes

Errors arise for INSA and Forschungsgruppe Wahlen:
coalitions::scrape_wahlrecht("http://www.wahlrecht.de/umfragen/insa.htm")
coalitions::scrape_wahlrecht("http://www.wahlrecht.de/umfragen/politbarometer.htm")

get_surveys() possibly needs refinement

Applying coalitions::get_surveys() leads to warnings I've never seen before,
e.g. In extract_num(cdu) : NAs durch Umwandlung erzeugt (NAs introduced by coercion)

Possibly a problem caused by the first line of the Emnid/Forsa/... tables on wahlrecht.de, which currently is not a survey but shows the btw17 result?

fix CRAN errors

https://cran.r-project.org/web/checks/check_results_coalitions.html

Dear maintainer,

Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_coalitions.html.

Please correct before 2018-10-08 to safely retain your package on CRAN.

Best,
-k

And again. It seems we need to remind you of the CRAN policy:

'Packages which use Internet resources should fail gracefully with an
informative message if the resource is not available (and not give a
check warning nor error).'

Please submit a fully compliant update ASAP and before Jan 23.

Add wrapper function for pooled survey

The function takes a surveys data frame plus a time-window specification, calculates the respective effective sample size, and then returns a survey data frame with this sample size, the weighted means of the vote shares, etc. A rough sketch follows below.
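A rough sketch of the idea, not the package's exact formula: take the latest survey per pollster within the window, weight each party's share by the poll's sample size, and report a (here simplistic) effective sample size. The column names (pollster, date, respondents, party, percent) are assumed to match a long-format survey table:

library(dplyr)

pool_sketch <- function(surveys_long, last_date, period = 14) {
  recent <- surveys_long %>%
    filter(date <= last_date, date > last_date - period) %>%
    group_by(pollster) %>%
    filter(date == max(date)) %>%   # latest poll per pollster within the window
    ungroup()
  # simplistic placeholder for the effective sample size: total respondents of the pooled polls
  n_eff <- sum(distinct(recent, pollster, respondents)$respondents)
  recent %>%
    group_by(party) %>%
    summarize(percent = weighted.mean(percent, respondents), .groups = "drop") %>%
    mutate(respondents = n_eff)
}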

Recalculating the same input data multiple times gives different results (depending on what?)

library(coalitions)
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 8 7
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 8 7
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6

(tested with latest version)
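For what it's worth, the quotients involved suggest a tie for the last seat, which would explain the run-to-run variation if ties are broken at random:

q <- sort(c(360 / 1:15, 280 / 1:15), decreasing = TRUE)
q[14:16]
# 45 40 40 -> the two tied quotients (360/9 and 280/7) compete for the 15th seat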

Add option for rounding errors

Most published results are rounded to the nearest integer percentage.
Allow extra variation by correcting for the rounding with a runif(-0.5, +0.5) term, as sketched below.
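A minimal sketch of the idea (illustrative shares; the renormalization step is an assumption, not necessarily how the package would implement it):

reported  <- c(cdu = 39, spd = 23, gruene = 8, sonstige = 30)   # rounded published shares
derounded <- reported + runif(length(reported), -0.5, 0.5)      # reflect rounding uncertainty
derounded <- derounded / sum(derounded) * 100                   # renormalize to 100%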

Error caused by newly introduced NA columns in GMS Wahlrecht table

coalitions::get_surveys()

leads to: Error in `[.data.frame`(atab, , !ind.empty) : undefined columns selected.
This is caused by the latest update of Wahlrecht's GMS table, which now produces two trailing NA columns in the following data set:

library(rvest)
# address is the GMS URL, i.e. http://www.wahlrecht.de/umfragen/gms.htm
atab <- read_html(address) %>%
  html_nodes("table") %>% .[[2]] %>%
  html_table(fill = TRUE)

However, scrape_wahlrecht() currently cannot handle such NA columns. A possible fix is sketched below.
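One possible fix, sketched only: drop columns that contain nothing but NA (or empty strings) before any positional indexing:

ind_empty <- vapply(atab, function(x) all(is.na(x) | x == ""), logical(1))
atab      <- atab[, !ind_empty, drop = FALSE]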

New failures on CRAN

https://cran.r-project.org/web/checks/check_results_coalitions.html

I think most of them are related to tbl_df vs. as_tibble, which has been fixed in the latest PR. But I also got "ripleyed":

Dear maintainer,

Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_coalitions.html.

Please correct before 2020-08-29 to safely retain your package on CRAN.

The CRAN Team

This is not complying with the CRAN policy

'Packages which use Internet resources should fail gracefully with an
informative message if the resource is not available or has changed (and
not give a check warning nor error).'

and taking 90m to fail its tests is anti-social.

@bauer-alex do you have time to look into this? I'm not sure where the 90m comes from.

scrape_austria not working

The scrape_austria() function isn't working anymore, apparently because the link to the JSON database is dead.
