
coalitions's People

Contributors

adibender, andinatro, bauer-alex, fsolt, hadley, hssnippets, romainfrancois, schade-rebekka


coalitions's Issues

Application of survey correction not optimal

In draw_from_posterior the correction is currently added to the simulated shares. It should instead be added to the raw shares before simulation (with newly drawn runif() corrections for each simulation run), because we want to adjust the raw shares themselves and because different alpha values also affect the variance of the rdirichlet() draws, which in the end (slightly) changes the resulting distribution of simulated shares.
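A minimal sketch of the proposed ordering, assuming gtools::rdirichlet(); the function name, max_corr argument and the +0.5 Dirichlet parameterization are illustrative placeholders, not the package's implementation:

library(gtools)

simulate_with_correction <- function(raw_shares, n_respondents, max_corr, nsim = 10000) {
  t(sapply(seq_len(nsim), function(i) {
    # draw a fresh uniform correction for every simulation run
    corr   <- runif(length(raw_shares), -max_corr, max_corr)
    shares <- pmax(raw_shares + corr, 0)
    # placeholder Dirichlet parameters derived from the corrected raw shares
    alpha  <- shares / sum(shares) * n_respondents + 0.5
    rdirichlet(1, alpha)
  }))
}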

Hashtag on updated website

The hashtag on the updated website is still #koala17. Maybe you need a new one? #koalaneuwahlen, #koala18, #koala21, who knows? 🤣

Get coalitions back on CRAN

So apparently coalitions has been archived on CRAN because I got ill and wasn't able to submit in time. However, it still doesn't appear to be compliant with CRAN policy; I'm not sure, however, which part exactly this refers to. I guess the skip_on_cran part. I replaced it with skip_if_offline in the current devel branch. Hope this will do.

We fixed the issue and built in checks within the functions that use
internet resources. Additionally, all tests and examples that require an
internet connection have been wrapped in either \dontrun{} or
skip_on_cran() statements.

That does not comply with the policy, which does not apply only to usage
by CRAN.

Unfortunately, I was not able to re-submit before today due to illness
and now it appears the package has already been archived.
Please advise on how to proceed.

Resubmit when you have 'a fully compliant update'.

Bug in pool_surveys()

There is a new LTW survey in which CDU and SPD have exactly the same result. Because of this, the pool_surveys() function throws an error.

Cause of the error: the max_party object must contain only one party (it is used for the effective sample size in the function), but with two parties having the same share, max_party is a vector of two parties.

Solution: just add max_party <- max_party[1] or something along those lines (see the illustration below).
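For illustration (hypothetical tied shares, not the actual internals of pool_surveys()), the maximum-share lookup returns two names, and the proposed fix keeps only the first:

shares    <- c(cdu = 30, spd = 30, gruene = 15)
max_party <- names(shares)[shares == max(shares)]
max_party                  # "cdu" "spd" -- two parties tied
max_party <- max_party[1]  # proposed fix: keep only the first entry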

Error when using dplyr 0.7.1

Using coalitions::get_surveys() with dplyr v0.7.1 throws an error: Evaluation error: Variable context not set

The error comes from gather(..., one_of(av.parties)) in collapse_parties.

I don't know how to solve the error with one_of(). My current workaround is to use cdu:sonstige instead of the one_of() argument, as sketched below.
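A sketch of that workaround, assuming the wide survey table uses the usual wahlrecht column names (surveys_wide is a placeholder for the table inside collapse_parties):

library(tidyr)
surveys_long <- gather(surveys_wide, key = "party", value = "percent", cdu:sonstige)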

Bug in get_surveys()

Even though get_surveys() uses collapse_parties(), the NA values (e.g. Piraten in GMS) are not dropped the way they should be, leading to an error when applying draw_from_posterior() to a survey with Piraten and FW shares of NA.

get_surveys() %>%
filter(institute == "gms") %>%
unnest() %>%
filter(datum == "2017-06-01") %>%
unnest()

institute datum start end befragte party percent votes
1 gms 2017-06-01 2017-05-26 2017-05-31 1016 cdu 39 396.24
2 gms 2017-06-01 2017-05-26 2017-05-31 1016 spd 23 233.68
3 gms 2017-06-01 2017-05-26 2017-05-31 1016 gruene 8 81.28
4 gms 2017-06-01 2017-05-26 2017-05-31 1016 fdp 10 101.60
5 gms 2017-06-01 2017-05-26 2017-05-31 1016 linke 8 81.28
6 gms 2017-06-01 2017-05-26 2017-05-31 1016 piraten NA NA
7 gms 2017-06-01 2017-05-26 2017-05-31 1016 fw NA NA
8 gms 2017-06-01 2017-05-26 2017-05-31 1016 afd 8 81.28
9 gms 2017-06-01 2017-05-26 2017-05-31 1016 sonstige 4 40.64

What does work is the following, so there's no problem with collapse_parties() itself:
lookup <- coalitions:::institutes_df
address <- lookup$address[lookup$institute == "gms"]
coalitions::scrape_wahlrecht(address) %>%
filter(datum == "2017-06-01") %>%
coalitions::collapse_parties() %>%
unnest()

   datum      start        end befragte    party percent  votes
1 2017-06-01 2017-05-26 2017-05-31 1016 cdu 39 396.24
2 2017-06-01 2017-05-26 2017-05-31 1016 spd 23 233.68
3 2017-06-01 2017-05-26 2017-05-31 1016 gruene 8 81.28
4 2017-06-01 2017-05-26 2017-05-31 1016 fdp 10 101.60
5 2017-06-01 2017-05-26 2017-05-31 1016 linke 8 81.28
6 2017-06-01 2017-05-26 2017-05-31 1016 afd 8 81.28
7 2017-06-01 2017-05-26 2017-05-31 1016 sonstige 4 40.64

Release coalitions 0.6.10

  • local checks
  • travis + appveyor
  • win_builder
  • rhub checks
  • update news.md
  • update CRAN comments
  • devtools::release()
  • Create GitHub release

get_surveys() crashes from time to time

On the Ubuntu 15.04 server (running R 3.4.4, coalitions 0.6.4) get_surveys() sometimes throws an error:
Error in mutate_impl(.data, dots) : Evaluation error: schannel: next InitializeSecurityContext failed: SEC_E_BUFFER_TOO_SMALL (0x80090321) - Der Puffer, der einer Funktion übergeben wurde, war zu klein. (The buffer supplied to a function was too small.)

I was also able to reproduce this on Windows (R 3.5.1, coalitions 0.6.4).

As I noted, it doesn't happen all the time, but by using for(i in 1:20) { coalitions::get_surveys() } I was able to reproduce the error right away.

@adibender Do you know this error?

Prepare new CRAN release

  • fix test error that resulted from changes in html_table (see PR #132)
  • update Description
  • update News
  • devtools::check()
  • wait for utf8 to be fixed for Win devel
  • devtools::release
  • tag release on github

Error in scraping the new structure of fgw surveys

Some of the newest surveys of Forschungsgruppe Wahlen are not scraped by get_surveys() since the structure of their Sonstige column was changed.

Structure of the newest surveys on wahlrecht.de (screenshot omitted):

Trying to scrape these data:

dat <- coalitions::get_surveys()
dat %>% 
  filter(pollster == "fgw") %>% 
  unnest() %>%
  head()

# Result
# A tibble: 6 × 6
  pollster date       start      end        respondents survey          
  <chr>    <date>     <date>     <date>           <dbl> <list>          
1 fgw      2021-08-13 2021-08-10 2021-08-12        1252 <tibble [7 × 3]>
2 fgw      2021-07-16 2021-07-13 2021-07-15        1224 <tibble [7 × 3]>
3 fgw      2021-06-25 2021-06-22 2021-06-24        1271 <tibble [7 × 3]>
4 fgw      2021-05-07 2021-05-04 2021-05-06        1271 <tibble [7 × 3]>
5 fgw      2021-04-16 2021-04-13 2021-04-15        1292 <tibble [7 × 3]>
6 fgw      2021-03-26 2021-03-23 2021-03-25        1030 <tibble [7 × 3]>

`pool_surveys()` problem with two Forsa polls published on one day

Pooled shares don't sum to 100%:

> dat <- get_surveys()
There were 14 warnings (use warnings() to see them)
> dat
# A tibble: 7 x 2
    pollster            surveys
       <chr>             <list>
1 allensbach  <tibble [39 x 5]>
2      emnid <tibble [213 x 5]>
3      forsa <tibble [219 x 5]>
4        fgw  <tibble [79 x 5]>
5        gms  <tibble [94 x 5]>
6  infratest <tibble [102 x 5]>
7       insa <tibble [291 x 5]>
> pool_surveys(dat, last_date = as.Date("2017-11-21"))
# A tibble: 7 x 8
  pollster       date      start        end respondents  party   percent     votes
     <chr>     <date>     <date>     <date>       <dbl>  <chr>     <dbl>     <dbl>
1   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704    afd 20.602092  606.2587
2   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704    cdu 51.859722 1526.0784
3   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704    fdp 17.359794  510.8474
4   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704 greens 18.002364  529.7564
5   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704   left 15.652744  460.6140
6   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704 others  7.038114  207.1109
7   pooled 2017-11-21 2017-11-17 2017-11-21    2942.704    spd 34.422840 1012.9624

`collapse_parties()` cannot handle only-NA columns

tab <- coalitions::scrape_wahlrecht("http://www.wahlrecht.de/umfragen/gms.htm")
tab <- tab[tab$datum >= "2016-10-01",] # only take data since october 2016

   datum      start        end cdu spd gruene fdp linke piraten fw afd sonstige befragte
1 2017-05-04 2017-04-30 2017-05-03 36 29 7 7 8 NA NA 9 4 1.004
2 2017-03-23 2017-03-16 2017-03-22 34 31 8 6 8 NA NA 9 4 1.008
3 2017-02-09 2017-02-02 2017-02-08 33 29 9 6 8 NA NA 11 4 1.009
4 2017-01-05 2017-12-28 2017-01-04 36 20 10 7 10 NA NA 13 4 1.006
5 2016-11-03 2016-10-27 2016-11-02 33 22 11 7 11 NA NA 12 4 1.012
6 2016-10-13 2016-10-07 2016-10-12 34 21 12 7 10 NA NA 13 3 1.003

The following yields an error:
tab <- collapse_parties(tab) # reshape to long format
Error in collapse_parties(tab) :
Assertion on 'surveys' failed: Contains only missing values.
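A possible workaround (a sketch only, not necessarily how the package should fix it): drop party columns that are entirely NA before collapsing:

all_na <- vapply(tab, function(x) all(is.na(x)), logical(1))
tab    <- tab[, !all_na, drop = FALSE]
tab    <- collapse_parties(tab)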

pool_surveys() not pooling properly

Hello, I'm not sure if I'm doing everything correctly, but I'm having trouble pooling some surveys.

The thing is that the pooled percent is not a percentage but a large number. Here is a minimal reproducible example.

library(coalitions)
library(tibble)
library(tidyr)

df <- tibble(pollster = c("poll1", "poll2"),
             date = as.Date(c("2019-05-14", "2019-05-13")),
             respondents = c(1000, 5000),
             party = c("PSOE", "PSOE"),
             percent = c(27.1, 28.3),
             votes = c(7203000, 7405004))

surveys <- nest(df, survey = c(party, percent, votes))
surveys <- nest(surveys, surveys = c(date, respondents, survey))


x <- pool_surveys(surveys, pollsters = c("poll1", "poll2"), last_date = as.Date("2019-05-15"), period = 14, period_extended = 20)

x$percent
[1] 243466.7

Am I doing something wrong?

Thanks for the package.

Use wahlrecht api?

There is a "hidden" XML page at wahlrecht.de: http://www.wahlrecht.de/umfragen/bundesweite.xml.
It could be used to get the newest data, relatively nicely formatted, directly:

library(XML)
df <- xmlToDataFrame(xmlParse("http://www.wahlrecht.de/umfragen/bundesweite.xml"))

However, apparently only the 22 most recent results are provided there.

Correctly handle duplicate wahlrecht entries

Wahlrecht.de currently contains a duplicate entry for the INSA poll on 05.01.2021. scrape_wahlrecht outputs both entries, which causes problems in subsequent functions.

Solution: drop one of the duplicates in scrape_wahlrecht(); a sketch follows the reprex below.

Reproducible example:

library(coalitions)
library(dplyr)

# scrape data
dat <- scrape_wahlrecht("https://www.wahlrecht.de/umfragen/insa.htm")

dat %>% filter(date == "2021-01-05")
#>         date      start        end cdu spd greens fdp left pirates fw afd
#> 1 2021-01-05 2021-01-01 2021-01-04  36  15     18 7.5  7.5      NA NA  11
#> 2 2021-01-05 2021-01-01 2021-01-04  36  15     18 7.5  7.5      NA NA  11
#>   others respondents
#> 1      5        2072
#> 2      5        2072

Created on 2021-05-12 by the reprex package (v0.3.0)
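A minimal sketch of the proposed deduplication, assuming a dplyr-based fix applied to the scraped table (where exactly inside scrape_wahlrecht() this should happen is left open):

library(dplyr)
dat <- distinct(dat, date, start, end, .keep_all = TRUE)  # keep one row per poll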

speed improvements?

  • bottleneck in get_seat_distribution -> sls, improvement possible? maybe using Rcpp?

Update to tidyeval

  • make all functions compliant with tidyeval framework
  • remove variables from the utils::globalVariables() list where possible

prepare CRAN release

  • add fixed example data sets (a subset of the get_surveys() output as example data)
  • rewrite tests to use these internal data sets / remove the dependency on an internet connection for tests, where possible
  • compliance with tidyeval
    - [ ] describe usage without the predefined scrapers (enter data manually)
  • Fix R-dev CI errors
  • JOSS submission/publication
  • Update all addresses to https (especially on homepage)
  • Submit to CRAN
  • Add Git Tag + GitHub release

number of times functions

Need functions that return how often specific events happened across the simulations, e.g. (a sketch follows the list):

  • SPD gets more seats than CDU/CSU, and vice versa
  • FDP does not clear the 5% hurdle
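A rough sketch of such a helper, assuming simulated seat distributions in a long data frame sims with columns sim, party and seats (these names are placeholders, not the package's API):

count_event <- function(sims, event) {
  per_sim <- split(sims, sims$sim)
  # share of simulations in which `event` evaluates to TRUE
  mean(vapply(per_sim, event, logical(1)))
}

# e.g. share of simulations where the SPD gets more seats than the CDU/CSU:
# count_event(sims, function(d) d$seats[d$party == "spd"] > d$seats[d$party == "cdu"])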

Convert data format

Where applicable, convert all functions that return data frames to a proper long format:

Datum | Partei | Prozent
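A sketch of the wide-to-long conversion with tidyr (wide is a placeholder for a scraped wide-format table; pivot_longer() requires tidyr >= 1.0):

library(tidyr)
long <- pivot_longer(wide, cols = cdu:sonstige, names_to = "party", values_to = "percent")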

Adding the Sainte-Laguë formula

Hello,

I used your package recently to run some simulations of seat distributions in parliament. Specifically, I needed to use the Sainte-Laguë formula and slightly modified your dhondt function. I leave it here just in case you think it's useful and want to add it to the package! :)

Best regards,

Pau Grau

slague <- function(votes, parties, n_seats) {
  # the only difference from dHondt is the divisor sequence 1, 3, 5, ... (step of 2)
  divisor.mat <- sum(votes) / sapply(votes, "/", seq(1, n_seats, 2))
  colnames(divisor.mat) <- parties
  m.mat <- tidyr::gather(as.data.frame(divisor.mat), key = "name", value = "value", everything())
  m.mat <- m.mat[rank(m.mat$value, ties.method = "random") <= n_seats, ]
  rle.seats <- rle(as.character(m.mat$name))
  if (sum(rle.seats$length) != n_seats)
    stop(paste("Number of seats distributed not equal to", n_seats))
  if (any(!(parties %in% rle.seats$values))) {
    missing_parties <- parties[!(parties %in% rle.seats$values)]
    for (party in missing_parties) {
      rle.seats$lengths <- c(rle.seats$lengths, 0)
      rle.seats$values  <- c(rle.seats$values, party)
    }
    rle.seats$lengths <- rle.seats$lengths[match(parties, rle.seats$values)]
    rle.seats$values  <- rle.seats$values[match(parties, rle.seats$values)]
  }
  rle.seats$length
}
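For reference, the function is meant as a drop-in replacement for dHondt(), e.g. (hypothetical vote counts):

slague(c(360, 280, 120), c("A", "B", "C"), n_seats = 15)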

Scraping data still not working for some institutes

Errors arise for INSA and Forschungsgruppe Wahlen:
coalitions::scrape_wahlrecht("http://www.wahlrecht.de/umfragen/insa.htm")
coalitions::scrape_wahlrecht("http://www.wahlrecht.de/umfragen/politbarometer.htm")

get_surveys() possibly needs refinement

Applying coalitions::get_surveys() leads to warnings I've never seen before,
e.g. In extract_num(cdu) : NAs durch Umwandlung erzeugt (NAs introduced by coercion)

Possibly a problem caused by the first line of the Emnid/Forsa/... tables on wahlrecht.de, which currently is not a survey but shows the btw17 result?

fix CRAN errors

https://cran.r-project.org/web/checks/check_results_coalitions.html

Dear maintainer,

Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_coalitions.html.

Please correct before 2018-10-08 to safely retain your package on CRAN.

Best,
-k

And again. It seems we need to remind you of the CRAN policy:

'Packages which use Internet resources should fail gracefully with an
informative message if the resource is not available (and not give a
check warning nor error).'

Please submit a fully compliant update ASAP and before Jan 23.

Add wrapper function for pooled survey

The function takes a surveys data frame plus a time-window specification, calculates the respective effective sample size, and then returns a survey data frame with this sample size, the weighted means of the vote shares, etc. A rough sketch follows below.
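A rough sketch of the idea, not the package's exact formula: take the latest survey per pollster within the window, weight each party's share by the poll's sample size, and report a (here simplistic) effective sample size. The column names (pollster, date, respondents, party, percent) are assumed to match a long-format survey table:

library(dplyr)

pool_sketch <- function(surveys_long, last_date, period = 14) {
  recent <- surveys_long %>%
    filter(date <= last_date, date > last_date - period) %>%
    group_by(pollster) %>%
    filter(date == max(date)) %>%   # latest poll per pollster within the window
    ungroup()
  # simplistic placeholder for the effective sample size: total respondents of the pooled polls
  n_eff <- sum(distinct(recent, pollster, respondents)$respondents)
  recent %>%
    group_by(party) %>%
    summarize(percent = weighted.mean(percent, respondents), .groups = "drop") %>%
    mutate(respondents = n_eff)
}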

Recalculating the same input data multiple times gives different results (depending on what?)

library(coalitions)
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 8 7
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 8 7
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6
dHondt(c(360, 280), c("A", "B"), n_seats=15)
[1] 9 6

(tested with latest version)
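For what it's worth, the quotients involved suggest a tie for the last seat, which would explain the run-to-run variation if ties are broken at random:

q <- sort(c(360 / 1:15, 280 / 1:15), decreasing = TRUE)
q[14:16]
# 45 40 40 -> the two tied quotients (360/9 and 280/7) compete for the 15th seat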

Add option for rounding errors

Most published results are rounded to the nearest integer percentage.
Allow extra variation by correcting for the rounding with a runif(-0.5, +0.5) term, as sketched below.
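A minimal sketch of the idea (illustrative shares; the renormalization step is an assumption, not necessarily how the package would implement it):

reported  <- c(cdu = 39, spd = 23, gruene = 8, sonstige = 30)   # rounded published shares
derounded <- reported + runif(length(reported), -0.5, 0.5)      # reflect rounding uncertainty
derounded <- derounded / sum(derounded) * 100                   # renormalize to 100%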

Error caused by newly introduced NA columns in GMS Wahlrecht table

coalitions::get_surveys()

leads to: Error in `[.data.frame`(atab, , !ind.empty) : undefined columns selected.
This is caused by the latest update of Wahlrecht's GMS table, which now produces two trailing NA columns in the following data set:

library(rvest)
# address is the GMS URL, i.e. http://www.wahlrecht.de/umfragen/gms.htm
atab <- read_html(address) %>%
  html_nodes("table") %>% .[[2]] %>%
  html_table(fill = TRUE)

However, scrape_wahlrecht() currently cannot handle such NA columns. A possible fix is sketched below.
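One possible fix, sketched only: drop columns that contain nothing but NA (or empty strings) before any positional indexing:

ind_empty <- vapply(atab, function(x) all(is.na(x) | x == ""), logical(1))
atab      <- atab[, !ind_empty, drop = FALSE]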

New failures on CRAN

https://cran.r-project.org/web/checks/check_results_coalitions.html

I think most of them are related to tbl_df vs. as_tibble, which has been fixed in the latest PR. But I also got "ripleyed":

Dear maintainer,

Please see the problems shown on
https://cran.r-project.org/web/checks/check_results_coalitions.html.

Please correct before 2020-08-29 to safely retain your package on CRAN.

The CRAN Team

This is not complying with the CRAN policy

'Packages which use Internet resources should fail gracefully with an
informative message if the resource is not available or has changed (and
not give a check warning nor error).'

and taking 90m to fail its tests is anti-social.

@bauer-alex do you have time to look into this? I'm not sure where the 90m comes from.

scrape_austria not working

The scrape_austria() function isn't working anymore, apparently because the link to the JSON database is dead.
