discovr's Introduction

discovr: Resources for Discovering Statistics Using R and RStudio (Field, 2023)

NOTE: This package is incomplete but under active development. Check back here for updates/new tutorials.

The discovr package contains resources for my 2023 textbook Discovering Statistics Using R and RStudio. There are tutorials written using learnr. Once a tutorial is running it’s a bit like reading a book but with places where you can practice the code that you have just been taught. The discovr package is free and offered to support tutors and students using my textbook who want to learn R.

Installing discovr

To use discovr you first need to install R and RStudio and familiarise yourself with R, RStudio and good workflow practice. You can do this using this interactive tutorial. Once you have installed R and RStudio you can install discovr. The package is in development so you have to install it from GitHub. To install the package execute (in R):

if(!require(remotes)){
  install.packages('remotes')
}

remotes::install_github("profandyfield/discovr")

If you are trying to install on a networked computer the install might fail (it’s to do with install.packages() not liking UNC paths, which I’m not even going to pretend to understand). The solution is to specify the location of your library at the point of install. Most networks will map network locations to a drive name (for example, at my own University, user accounts are on the ‘N’ drive). Find the location of your library (e.g., N:/Documents/R/win-library/3.5), possibly executing .libPaths() to help you, and specify this location using the lib argument:

remotes::install_github("profandyfield/discovr", lib = "N:/Documents/R/win-library/3.5")
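
A small sketch of how the library location might be checked before installing. It assumes that the first entry returned by .libPaths() is the library R installs into by default, and that a path starting with two backslashes or two forward slashes is a UNC path. The names is_unc_path() and drive_lib are hypothetical and used only for illustration; the install call is commented out because the right drive mapping depends on your network.

```r
# Hypothetical helper: TRUE when a path starts with \\ or // (a UNC path).
is_unc_path <- function(path) {
  grepl("^(\\\\\\\\|//)", path)
}

# Where R currently looks for packages (the first entry is the default install location).
lib_paths <- .libPaths()
print(lib_paths)

if (is_unc_path(lib_paths[1])) {
  message("The default library is on a UNC path; try its mapped drive letter instead.")
}

# Assumed mapped-drive equivalent of the library (replace with your own location).
drive_lib <- "N:/Documents/R/win-library/3.5"
# remotes::install_github("profandyfield/discovr", lib = drive_lib)
```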

Interactive tutorials

I recommend working through this playlist of tutorials on how to install, set up and work within R and RStudio before starting the interactive tutorials.

List of tutorials

  • discovr_01: Key concepts in R (functions and objects, packages and functions, style, data types, tidyverse, tibbles)
  • discovr_02: Summarizing data (frequency distributions, grouped frequency distributions, relative frequencies, histograms, mean, median, variance, standard deviation, interquartile range)
  • discovr_03: Confidence intervals: interactive app demonstrating what a confidence interval is, computing normal and bootstrap confidence intervals using R, adding confidence intervals to data summaries.
  • discovr_05: Visualizing data. The ggplot2 package, boxplots, plotting means, violin plots, scatterplots, grouping by colour, grouping using facets, adjusting scales, adjusting positions.
  • discovr_06: The beast of bias. Restructuring data from messy to tidy format (and back). Spotting outliers using histograms and boxplots. Calculating z-scores (standardizing scores). Writing your own function. Using z-scores to detect outliers. Q-Q plots. Calculating skewness, kurtosis and the number of valid cases. Grouping summary statistics by multiple categorical/grouping variables.
  • discovr_07: Associations. Plotting data with GGally. Pearson’s r, Spearman’s Rho, Kendall’s tau, robust correlations.
  • discovr_08: The general linear model (GLM). Visualizing the data, fitting GLMs with one and two predictors. Viewing model parameters with broom, model parameters, standard errors, confidence intervals, fit statistics, significance, Bayes factors and Bayesian estimates (using default priors).
  • discovr_09: Categorical predictors with two categories (comparing two means). Comparing two independent means, comparing two related means, effect sizes, robust comparisons of means (independent and related), Bayes factors and estimation (independent and related means).
  • discovr_10: Moderation and mediation. Centring variables (grand mean centring), specifying interaction terms, moderation analysis, simple slopes analysis, Johnson-Neyman intervals, mediation with one predictor, direct and indirect effects, mediation using lavaan.
  • discovr_11: Comparing several means. Essentially ‘One-way independent ANOVA’ but taught using a general linear model framework. Covers setting contrasts (dummy coding, contrast coding, and linear and quadratic trends), the F-statistic and Welch’s robust F, robust parameter estimation, heteroscedasticity-consistent tests of parameters, robust tests of means based on trimmed data, post hoc tests, Bayes factors.
  • discovr_12: Comparing means adjusted for other variables. Essentially ‘Analysis of Covariance (ANCOVA)’ designs but taught using a general linear model framework. Covers setting contrasts, Type III sums of squares, the F-statistic, robust parameter estimation, heteroscedasticity-consistent tests of parameters, robust tests of adjusted means, post hoc tests, Bayes factors.
  • discovr_13: Factorial designs. Fitting models for two-way factorial designs (independent measures) using both lm() and the afex package. This tutorial builds on previous ones to show how models can be fit with two categorical predictors to look at the interaction between them. We look at fitting the models, setting contrasts for the two categorical predictors, obtaining estimated marginal means, interaction plots, simple effects analysis, diagnostic plots, partial eta-squared and partial omega-squared, robust models and Bayes factors.
  • discovr_14: Multilevel models.
  • discovr_15: Repeated measures designs. Fitting models for one- and two-way repeated measures designs using the afex package. This tutorial builds on previous ones to show how models can be fit with one or two categorical predictors when these variables have been manipulated within the same entities. We look at fitting the models, setting contrasts for the categorical predictors, obtaining estimated marginal means, interaction plots, simple effects analysis, diagnostic plots, robust models and Bayes factors.
  • discovr_16: Mixed designs. Fitting models for mixed designs using the afex package. This tutorial builds on previous ones to show how models can be fit with one or two categorical predictors when at least one of these variables has been manipulated within the same entities and at least one other has been manipulated using different entities. We look at fitting the models, setting contrasts for the categorical predictors, obtaining estimated marginal means, interaction plots, simple effects analysis, diagnostic plots, robust models and Bayes factors.
  • discovr_18: Exploratory Factor Analysis (EFA). Applying factor analysis using the psych package. This tutorial uses a fictitious questionnaire (the R Anxiety Scale, RAQ) with 23 items to show how EFA can be used to identify clusters of items that may, or may not, represent constructs associated with anxiety about using R. We look at inspecting the correlation matrix, obtaining the Bartlett test and KMO statistics, using parallel analysis to determine the number of factors to extract, extracting factors, rotating the solution and interpreting the factors. We also learn to obtain Cronbach’s alpha on each of the subscales.
  • discovr_19: Categorical variables. Entering categorical data, contingency tables, associations between categorical variables, the chi-square test, standardized residuals, Fisher’s exact test.
  • discovr_20: Categorical outcomes (logistic regression). This tutorial builds on previous ones to show how the general linear model extends to situations where you want to predict a binary outcome (logistic regression). We look at fitting the models and interpreting the odds ratio.

Running a tutorial

In RStudio version 1.3 onwards there is a tutorial pane. Having executed

library(discovr)

a list of tutorials appears in this pane. Scroll through them and click on the button next to a tutorial to run it.

Alternatively, to run a particular tutorial from the console execute:

library(discovr)
learnr::run_tutorial("name_of_tutorial", package = "discovr")

and replace "name_of_tutorial" with the name of the tutorial you want to run. For example, to run tutorial 2 execute:

learnr::run_tutorial("discovr_02", package = "discovr")

The name of each tutorial is given at the start of its entry in the list above. Once the command to run the tutorial is executed it will spring to life in a web browser.
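
Because the tutorial names follow the pattern discovr_ plus a two-digit number, the name can be built from the tutorial number. The sketch below assumes that pattern holds for every tutorial in the list; tutorial_name() is a hypothetical helper, not part of discovr. The tutorial is launched only in an interactive session where learnr and discovr are installed.

```r
# Hypothetical helper: build a tutorial name such as "discovr_02" from a number.
tutorial_name <- function(number) {
  sprintf("discovr_%02d", as.integer(number))
}

tutorial_name(2)

# Only launch the tutorial in an interactive session with both packages installed.
if (interactive() &&
    requireNamespace("learnr", quietly = TRUE) &&
    requireNamespace("discovr", quietly = TRUE)) {
  # List the tutorials that the installed version of discovr provides.
  print(learnr::available_tutorials(package = "discovr"))
  learnr::run_tutorial(tutorial_name(2), package = "discovr")
}
```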

Suggested workflow

The tutorials are self-contained (you practice code in code boxes) so you don’t need to use RStudio at the same time. However, to get the most from them I would recommend that you create an RStudio project and within that open (and save) a new R Markdown file each time you work through a tutorial. Within that Markdown file, replicate parts of the code from the tutorial (in code chunks) and use Markdown to write notes about what you have done, to reflect on things that you have struggled with, or to note useful tips to help you remember things. Basically, write a learning journal. This workflow has the advantage of not just teaching you the code that you need to do certain things, but also providing practice in using RStudio itself.

See this video explaining my suggested workflow:

https://www.youtube.com/embed/FhoYCsZttGc
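
One possible way to start such a learning journal is to generate a skeleton R Markdown file for each tutorial. This is only a sketch: new_journal(), the headings and the file name are hypothetical choices, and the file is written to a temporary directory here, whereas in practice you would point dir at your project folder.

```r
# Hypothetical helper: write a minimal R Markdown journal for one tutorial.
new_journal <- function(tutorial, dir = tempdir()) {
  fence <- strrep("`", 3)  # the chunk delimiter, built rather than typed literally
  lines <- c(
    "---",
    sprintf("title: \"Learning journal: %s\"", tutorial),
    "output: html_document",
    "---",
    "",
    "## Code I practised",
    "",
    paste0(fence, "{r}"),
    "# paste code from the tutorial here",
    fence,
    "",
    "## Notes and things I struggled with",
    ""
  )
  path <- file.path(dir, paste0(tutorial, "_journal.Rmd"))
  writeLines(lines, path)
  path
}

journal <- new_journal("discovr_02")
cat(readLines(journal), sep = "\n")
```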

Colour palettes

Inspired by the rockthemes package and adapting code from that package I have come up with a bunch of colour themes based around the studio albums of my favourite band Iron Maiden. Full disclosure, I’m not a designer, so this largely involved uploading images of their sleeves to colorpalettefromimage.com and seeing what happened. If you have a better palette design send me the hex codes for the colours! If you’re wondering why some albums are missing, here’s the explanation: X Factor (would basically be 8 shades of gray), Fear of the Dark (shit album), The Book of Souls (would basically be 8 shades of black).

There is also a colourblind-friendly colour palette from Okabe and Ito.

The following palettes exist.

  • amolad_pal(): Colour palette (8 colour) based on Iron Maiden’s A Matter of Life and Death album sleeve. In ggplot2 use scale_color_amolad() and scale_fill_amolad().
  • bnw_pal(): Colour palette (8 colour) based on Iron Maiden’s Brave New World album sleeve. In ggplot2 use scale_color_bnw() and scale_fill_bnw().
  • dod_pal(): Colour palette (8 colour) based on Iron Maiden’s Dance of Death album sleeve. In ggplot2 use scale_color_dod() and scale_fill_dod().
  • frontier_pal(): Colour palette (8 colour) based on Iron Maiden’s The Final Frontier album sleeve. In ggplot2 use scale_color_frontier() and scale_fill_frontier().
  • im_pal(): Colour palette (8 colour) based on Iron Maiden’s eponymous album sleeve. In ggplot2 use scale_color_im() and scale_fill_im().
  • killers_pal(): Colour palette (8 colour) based on Iron Maiden’s Killers album sleeve. In ggplot2 use scale_color_killers() and scale_fill_killers().
  • nob_pal(): Colour palette (8 colour) based on Iron Maiden’s The Number of the Beast album sleeve. In ggplot2 use scale_color_nob() and scale_fill_nob().
  • okabe_ito_pal(): Colourblind-friendly palette (8 colour) from Okabe and Ito. In ggplot2 use scale_color_oi() and scale_fill_oi().
  • pom_pal(): Colour palette (8 colour) based on Iron Maiden’s Piece of Mind album sleeve. In ggplot2 use scale_color_pom() and scale_fill_pom().
  • power_pal(): Colour palette (8 colour) based on Iron Maiden’s Powerslave album sleeve. In ggplot2 use scale_color_power() and scale_fill_power().
  • prayer_pal(): Colour palette (8 colour) based on Iron Maiden’s No Prayer for the Dying album sleeve. In ggplot2 use scale_color_prayer() and scale_fill_prayer().
  • senjutsu_pal(): Colour palette (10 colour) based on the inner gatefold image of Iron Maiden’s Senjutsu album sleeve. In ggplot2 use scale_color_senjutsu() and scale_fill_senjutsu().
  • sit_pal(): Colour palette (8 colour) based on Iron Maiden’s Somewhere in Time album sleeve. In ggplot2 use scale_color_sit() and scale_fill_sit().
  • ssoass_pal(): Colour palette (8 colour) based on Iron Maiden’s Seventh Son of a Seventh Son album sleeve. In ggplot2 use scale_color_ssoass() and scale_fill_ssoass().
  • virtual_pal(): Colour palette (8 colour) based on Iron Maiden’s Virtual XI album sleeve. In ggplot2 use scale_color_virtual() and scale_fill_virtual().

To view the palette execute

scales::show_col(name_of_palette()(8))

replacing name_of_palette() with the name of the palette, for example:

scales::show_col(pom_pal()(8))
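
The double set of brackets in pom_pal()(8) can look odd at first. In the call above, pom_pal() returns a function, and that function is then called with the number of colours wanted. The sketch below imitates this pattern with a made-up palette: demo_pal() and its hex codes are hypothetical, and discovr’s own implementation may differ.

```r
# Hypothetical palette in the same style: demo_pal() returns a function,
# and calling that function with n returns the first n colours.
demo_pal <- function(reverse = FALSE) {
  colours <- c("#1B9E77", "#D95F02", "#7570B3", "#E7298A")  # illustrative hex codes
  if (reverse) colours <- rev(colours)
  function(n) {
    stopifnot(n <= length(colours))
    colours[seq_len(n)]
  }
}

pal_fun <- demo_pal()  # step 1: get the palette function
pal_fun(3)             # step 2: ask it for three colours
demo_pal()(3)          # both steps in one line, as in pom_pal()(8)
```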

To apply, for example, the Powerslave palette to the colours of a ggplot2 plot add scale_color_power() as a layer:

library(ggplot2)

# Get albums in the classic era from the discovr::eddiefy data.
# I'm not including fear of the dark because it's not in any way classic.
# No prayer for the dying was pushing its luck too if I'm honest.

classic_era <- subset(discovr::eddiefy, year < 1992)
#> Loading required package: lubridate
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
 
ggplot(classic_era, aes(x = energy, y = valence, color = album_name)) +
  geom_point(size = 2) +
  discovr::scale_color_power() +
  theme_minimal()

Similarly, to apply the Powerslave palette to the fill of objects in a ggplot2 plot, add scale_fill_power() as a layer:

ggplot(classic_era, aes(x = album_name, y = valence, fill = album_name)) +
  geom_violin() +
  discovr::scale_fill_power() +
  theme_minimal() +
  # Rotate the axis text after theme_minimal() so the complete theme does not overwrite it
  theme(axis.text.x = element_text(angle = 45))

Datasets

See the book or data descriptions for more details. This is a list of available datasets within the package. Raw CSV files are available from the book’s website.

  • acdc: Data about whether Bon Scott or Brian Johnson is the best singer of AC/DC. For details execute ?acdc.
  • album_sales: fictitious data about predicting album sales from advertising, airplay and the band’s image. For details execute ?album_sales.
  • alien_scents: fictitious data about training sniffer dogs to detect alien space lizards when they try to mask their identity with different scents. For details execute ?alien_scents.
  • animal_bride: fictitious data about life satisfaction when married to a dog or a goat. For details execute ?animal_bride.
  • angry_pigs: fictitious data about whether playing the video game angry pigs makes people more aggressive towards pigs. For details execute ?angry_pigs.
  • angry_real: fictitious data about whether playing the video game angry pigs makes people more aggressive in everyday life. For details execute ?angry_real.
  • animal_dance: Fictitious data about training cats and dogs to dance. For details execute ?animal_dance.
  • beckham_1929: Data from a study by Beckham (1929). For details execute ?beckham_1929.
  • big_hairy_spider: fictitious data about whether anxiety is greater after exposure to real spiders or pictures of spiders. For details execute ?big_hairy_spider.
  • biggest_liar: fictitious data about creativity and telling lies. For details execute ?biggest_liar.
  • bronstein_2019: Data about whether delusion proneness predicts belief in fake news because of less analytic thinking. For details execute ?bronstein_2019.
  • bronstein_miss_2019: The data in bronstein_2019 but with missing values inserted using MCAR amputation. For details execute ?bronstein_miss_2019.
  • catterplot: fictitious data for plotting a catterplot. For details execute ?catterplot.
  • cat_dance: fictitious data about training cats to dance. For details execute ?cat_dance.
  • cat_reg: fictitious data about training cats to dance. For details execute ?cat_reg.
  • cetinkaya_2006: data from a study by Cetinkaya and Domjan (2006) about quails with sexual fetishes. Seriously. For details execute ?cetinkaya_2006.
  • chamorro_premuzic: Data about what students want (personality wise) from their lecturers. For details execute ?chamorro_premuzic.
  • child_aggression: fictitious data (based on real research) about predicting aggression in children. For details execute ?child_aggression.
  • coldwell_2006: Data predicting childhood adjustment from various parenting variables. For details execute ?coldwell_2006.
  • cosmetic: Fictitious multilevel data predicting quality of life from cosmetic surgery. For details execute ?cosmetic.
  • daniels_2012: Data about the effects of sexualised sports images on self-image. For details execute ?daniels_2012.
  • dark_lord: fictitious data about the subliminal messages in songs. For details execute ?dark_lord.
  • davey_2003: Data about the effects of mood and stop rules on checking behaviour. For details execute ?davey_2003.
  • download: fictitious data about the download music festival and being smelly. For details execute ?download.
  • df_beta: fictitious data used to illustrate the DF Beta statistic. For details execute ?df_beta.
  • eel: Fictitious data about a randomized control trial to test whether eel therapy is an effective treatment of constipation. For details execute ?eel.
  • elephooty: Fictitious data about elephants playing football (soccer). For details execute ?elephooty.
  • escape: Fictitious data about whether I’m a better songwriter than my old bandmate Malcolm. For details execute ?escape.
  • essay_marks: fictitious data about essay marking. For details execute ?essay_marks.
  • exam_anxiety: fictitious data about exam performance, anxiety and revision. For details execute ?exam_anxiety.
  • field_2006: Data that tests a hypothesis that threat information affects children’s avoidance of novel animals. For details execute ?field_2006.
  • gallup_2003: Data that tests a hypothesis about why penises have a bell end. For details execute ?gallup_2003.
  • gelman_2009: Data used to critically evaluate the explanations (and claim) that there are more beautiful women than men in the world. For details execute ?gelman_2009.
  • glastonbury: More fictitious data about music festivals and being smelly. For details execute ?glastonbury.
  • goggles: fictitious data about whether alcohol affects perception of physical attractiveness. For details execute ?goggles.
  • goggles_lighting: fictitious data about the moderating effect of lighting on ratings of the attractiveness of faces after different doses of alcohol. For details execute ?goggles_lighting.
  • grades: fictitious data about statistics grades. For details execute ?grades.
  • hangover: fictitious data about the efficacy of different drinks as cures for a hangover. For details execute ?hangover.
  • hiccups: fictitious data on digital rectal stimulation and hiccups. For details execute ?hiccups.
  • hill_2007: Data from Hill et al. (2007) testing the effect of different forms of psychoeducation on exercise behaviour. For details execute ?hill_2007.
  • honesty_lab: fictitious data about perceptions of honesty. For details execute ?honesty_lab.
  • ice_bucket: Data about the ice bucket challenge. For details execute ?ice_bucket.
  • invisibility_base: Fictitious data about how much mischief people would get up to if they had an invisibility cloak using a pre-post study design. For details execute ?invisibility_base.
  • invisibility_cloak: fictitious data about how much mischief people would get up to if they had an invisibility cloak using an independent design. For details execute ?invisibility_cloak.
  • invisibility_rm: fictitious data about how much mischief people would get up to if they had an invisibility cloak but using a repeated measures design. For details execute ?invisibility_rm.
  • jiminy_cricket: fictitious data about whether wishing on a star makes you successful. For details execute ?jiminy_cricket.
  • johns_2012: Data about whether the colour red is a mating signal to men. For details execute ?johns_2012.
  • lambert_2012: Data about whether pornography use is related to relationship commitment and infidelity. For details execute ?lambert_2012.
  • massar_2012: Data about whether gossiping has an evolutionary function. For details execute ?massar_2012.
  • mcnulty_2008: Simulated data to match the results of a study about whether attractiveness is linked to the support given within a relationship. For details execute ?mcnulty_2008.
  • men_dogs: fictitious data about whether men exhibit dog-like behaviours (compared to dogs). For details execute ?men_dogs.
  • metal: Fictitious data about whether listening to metal music makes you angry. For details execute ?metal.
  • metal_health: fictitious data about whether listening to heavy metal negatively affects mental health. For details execute ?metal_health.
  • metallica: Data about thrash metal band, Metallica. For details execute ?metallica.
  • miller_2007: Data from Miller et al. (2007) testing the hidden-estrus theory. For details execute ?miller_2007.
  • mixed_attitude: Fictitious data about whether different types of imagery in advertising affect ratings of different types of drinks based on the gender identity of the participant. For details execute ?mixed_attitude.
  • murder: fictitious data about the number of murders each month at three street locations (Ruskin Avenue, Acacia Avenue and Rue Morgue). For details execute ?murder.
  • muris_2008: Data about whether you can train children to interpret ambiguous situations in a particular way. For details execute ?muris_2008.
  • nichols_2004: Data from the development of the Internet Addiction Scale, IAS (Nichols & Nicki, 2004). For details execute ?nichols_2004.
  • notebook: fictitious data about whether watching the film The Notebook is emotionally arousing. For details execute ?notebook.
  • ocd: Fictitious data about interventions for obsessive compulsive disorder. For details execute ?ocd.
  • ong_2011: Data about social media profile pictures and personality traits. For details execute ?ong_2011.
  • ong_tidy: Data about social media profile pictures and personality traits. For details execute ?ong_tidy.
  • penalty: Fictitious data about predictors of penalty kick success in soccer (or whatever sport you enjoy). For details execute ?penalty.
  • profile_pic: Fictitious data related to whether the number of friend requests from random people on social media is affected by whether your profile picture depicts you as single or part of a romantic couple. For details execute ?profile_pic.
  • pubs: Data illustrating the difference between an outlier and an influential case. For details execute ?pubs.
  • puppies: Fictitious data related to whether puppy therapy works. For details execute ?puppies.
  • puppy_love: Fictitious data related to whether puppy therapy works when you adjust for a person’s love of puppies. For details execute ?puppy_love.
  • r_exam: Fictitious data relating to an R exam at two universities. For details execute ?r_exam.
  • reality_tv: Fictitious data relating to whether being on a reality TV show exacerbates personality disorder traits. For details execute ?reality_tv.
  • raq: Fictitious data relating to a fictional questionnaire about R anxiety that is not an actual questionnaire. For details execute ?raq.
  • roaming_cats: fictitious data about how far cats roam from their homes. For details execute ?roaming_cats.
  • rollercoaster: Fictitious data about how roller-coaster induced fear affects attractiveness ratings. For details execute ?rollercoaster.
  • santas_log: Fictitious data related to whether the type and quantity of treat consumed on Christmas night affects whether elves successfully deliver presents. For details execute ?santas_log.
  • self_help: fictitious data about whether self-help books improve relationship satisfaction. For details execute ?self_help.
  • self_help_dsur: fictitious data about whether self-help books improve relationship satisfaction compared to statistics books. For details execute ?self_help_dsur.
  • sharman_2015: Data from Sharman & Dingle (2015) about whether listening to metal music increases anger. For details execute ?sharman_2015.
  • shopping: fictitious data about shopping. For details execute ?shopping_exercise.
  • sniffer_dogs: fictitious data about training sniffer dogs to detect alien space lizards. For details execute ?sniffer_dogs.
  • social_anxiety: fictitious (I think) data about whether social anxiety symptoms are specific to social anxiety. For details execute ?social_anxiety.
  • social_media: fictitious data about the effects of social media on grammar. For details execute ?social_media.
  • soya: fictitious data about the effects of eating soya on sperm count. For details execute ?soya.
  • speed_date: Fictitious data related to the extent to which interest in dating someone is affected by their looks, personality or the dating strategy they adopt. For details execute ?speed_date.
  • stalker: fictitious data about therapy for stalking. For details execute ?stalker.
  • students: I can’t even remember what this data file contains. For details execute ?student.
  • superhero: fictitious data about whether wearing different superhero costumes leads to more severe physical injuries. For details execute ?superhero.
  • supermodel: fictitious data about supermodel salaries. For details execute ?supermodel.
  • switch: Fictitious data relating to whether injuries from playing video console games can be mitigated by a warm up. For details execute ?switch.
  • tablets: fictitious data about predicting the desirability of computing tablets. For details execute ?tablets.
  • tea_15: fictitious data based on real data about cognitive functioning and drinking tea. For details execute ?tea_makes_you_brainy_15.
  • tea_716: fictitious data based on real data about cognitive functioning and drinking tea. For details execute ?tea_makes_you_brainy_716.
  • teaching: fictitious data about the success of different methods of teaching. For details execute ?teaching.
  • teach_method: more fictitious data about the success of different methods of teaching. For details execute ?teach_method.
  • text_messages: fictitious data about whether use of messaging apps ruins your grammar. For details execute ?text_messages.
  • tosser: Fictitious data relating to a fictional questionnaire about The Teaching of Statistics for Scientific Experiments, which is fictional. For details execute ?tosser.
  • tuk_2011: Data about whether needing to urinate helps decision making. For details execute ?tuk_2011.
  • tumour: fictitious data about mobile phone use and brain tumours. For details execute ?tumour.
  • tutor_marks: fictitious data comparing four tutors’ marks of the same essays. For details execute ?tutor_marks.
  • van_bourg_2020: Data from van Bourg et al (2020) relating to whether dogs would release their distressed owners from a box. For details execute ?van_bourg_2020.
  • video_games: fictitious data about the relationship between video game use, callous unemotional traits and aggression. For details execute ?video_games.
  • williams: Data relating to the development of a questionnaire to measure organizational ability. For details execute ?williams.
  • xbox: Fictitious data relating injuries to the type of video console game played and the console it was played on. For details execute ?xbox.
  • zhang_sample: Data about whether performing a maths test under a different name assists performance. For details execute ?zhang_2013_subsample.
  • zibarras_2008: Data from Zibarras, Port, and Woods (2008) relating to the relationship between personality and creativity. For details execute ?zibarras_2008.
  • zombie_growth: fictitious data that mimics a randomised control trial over time testing an intervention to transform zombies back to their pre-zombified state. For details execute ?zombie_growth.
  • zombie_rehab: fictitious data that mimics a randomised control trial testing an intervention to transform zombies back to their pre-zombified state in different clinics. For details execute ?zombie_rehab.
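
The list above can also be produced from within R. A minimal sketch, assuming the package is installed: list_package_datasets() is a hypothetical helper built on base R’s data() function, shown first on the datasets package that ships with R and then, if discovr is available, on discovr itself.

```r
# Hypothetical helper: names of the datasets that an installed package provides.
list_package_datasets <- function(pkg) {
  unname(data(package = pkg)$results[, "Item"])
}

# Demonstrated on the 'datasets' package that ships with R.
head(list_package_datasets("datasets"))

# The same call on discovr, only if it is installed.
if (requireNamespace("discovr", quietly = TRUE)) {
  print(list_package_datasets("discovr"))
  str(discovr::album_sales)  # inspect one dataset from the list above
}
```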

Smart Alex solutions

Solutions for end of chapter tasks are available at www.discovr.rocks.

Labcoat Leni solutions

Solutions for the Labcoat Leni tasks are available at www.discovr.rocks.

Chapter code

Although I recommend working through the interactive tutorials, each book chapter has online code and a downloadable R Markdown file available from www.discovr.rocks.

discovr's People

Contributors

jakub-jedrusiak, profandyfield

discovr's Issues

Error using across() function within a mutate - Tutorial 10

There appears to be an error in Tutorial 10 when using the across() function.

The tutorial asks you to create centred versions of a couple of variables in one go, and the code provided to do that is the following:

vids_tib <- vids_tib %>%
  dplyr::mutate(
    dplyr::across(c(vid_game, caunts), cent = centre)
    )

This is meant to create two variables called "vid_game_cent" and "caunts_cent", which are the centred versions of the original variables. For me it doesn't create those variables and doesn't seem to do anything else.

I had a look at the across() function and it appears the correct way to specify the names of the new variables is with the .names argument:

vids_tib <- vids_tib %>%
  dplyr::mutate(
    dplyr::across(c(vid_game, caunts), centre, .names = "{.col}_cent")
    ) 
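
The suggested fix can be checked on a small made-up tibble. This is a sketch under assumptions: the tutorial’s centre() function is not shown in this issue, so a stand-in that subtracts the mean is defined here, and the scores are invented rather than taken from the tutorial data.

```r
library(dplyr)

# Assumed definition: the tutorial's centre() is not shown in the issue, so this
# stand-in subtracts the mean from each score (grand mean centring).
centre <- function(x) x - mean(x, na.rm = TRUE)

# Made-up scores standing in for the tutorial's vids_tib data.
vids_tib <- tibble::tibble(
  vid_game = c(10, 20, 30),
  caunts   = c(1, 2, 6)
)

vids_tib <- vids_tib %>%
  dplyr::mutate(
    dplyr::across(c(vid_game, caunts), centre, .names = "{.col}_cent")
  )

# The original columns are kept and the centred versions are added alongside them.
names(vids_tib)
```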

Release discovr 0.1.5.7

First release:

Prepare for release:

  • urlchecker::url_check()
  • devtools::check(remote = TRUE, manual = TRUE)
  • devtools::check_win_devel()
  • rhub::check_for_cran()

Submit to CRAN:

  • usethis::use_version('patch')
  • devtools::submit_cran()
  • Approve email

Wait for CRAN...

  • Accepted 🎉
  • usethis::use_github_release()
  • usethis::use_news_md()
  • usethis::use_dev_version()
  • Update install instructions in README

discovr_08: Deprecated function in `across`

across(.fns = list(valid = ~sum(!is.na(.x)), missing = ~sum(is.na(.x))), .names = "{.col}_{.fn}")

The function itself is not deprecated, but using across() without .cols has been deprecated since dplyr 1.1.0.

When trying to replicate the function on line 986 in my own .qmd file in VS Code, I get an error, but it only displays once every 8 hours.
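
A possible way to avoid the deprecation message is to supply .cols explicitly. The sketch below uses made-up data, and .cols = everything() is an assumption about which columns the tutorial intends to summarise; a narrower selection may be more appropriate there.

```r
library(dplyr)

# Made-up data with some missing values.
toy_tib <- tibble::tibble(
  x = c(1, NA, 3),
  y = c(NA, NA, 6)
)

# .cols = everything() is an assumption about which columns were intended.
counts <- toy_tib %>%
  dplyr::summarise(
    dplyr::across(
      .cols = dplyr::everything(),
      .fns = list(valid = ~ sum(!is.na(.x)), missing = ~ sum(is.na(.x))),
      .names = "{.col}_{.fn}"
    )
  )

counts
```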

VS Code version

Version: 1.86.2 (Universal)
Commit: 903b1e9d8990623e3d7da1df3d33db3e42d80eda
Date: 2024-02-13T19:42:13.651Z
Electron: 27.2.3
ElectronBuildId: 26908389
Chromium: 118.0.5993.159
Node.js: 18.17.1
V8: 11.8.172.18-electron.0
OS: Darwin x64 22.6.0

dplyr version

Type: Package
Package: dplyr
Title: A Grammar of Data Manipulation
Version: 1.1.4
Authors@R: c(
    person("Hadley", "Wickham", , "[email protected]", role = c("aut", "cre"),
           comment = c(ORCID = "0000-0003-4757-117X")),
    person("Romain", "François", role = "aut",
           comment = c(ORCID = "0000-0002-2444-4226")),
    person("Lionel", "Henry", role = "aut"),
    person("Kirill", "Müller", role = "aut",
           comment = c(ORCID = "0000-0002-1416-3412")),
    person("Davis", "Vaughan", , "[email protected]", role = "aut",
           comment = c(ORCID = "0000-0003-4777-038X")),
    person("Posit Software, PBC", role = c("cph", "fnd"))
  )
Description: A fast, consistent tool for working with data frame like
    objects, both in memory and out of memory.
License: MIT + file LICENSE
URL: https://dplyr.tidyverse.org, https://github.com/tidyverse/dplyr
BugReports: https://github.com/tidyverse/dplyr/issues
Depends: R (>= 3.5.0)
Imports: cli (>= 3.4.0), generics, glue (>= 1.3.2), lifecycle (>=
        1.0.3), magrittr (>= 1.5), methods, pillar (>= 1.9.0), R6,
        rlang (>= 1.1.0), tibble (>= 3.2.0), tidyselect (>= 1.2.0),
        utils, vctrs (>= 0.6.4)
Suggests: bench, broom, callr, covr, DBI, dbplyr (>= 2.2.1), ggplot2,
        knitr, Lahman, lobstr, microbenchmark, nycflights13, purrr,
        rmarkdown, RMySQL, RPostgreSQL, RSQLite, stringi (>= 1.7.6),
        testthat (>= 3.1.5), tidyr (>= 1.3.0), withr
VignetteBuilder: knitr
Config/Needs/website: tidyverse, shiny, pkgdown, tidyverse/tidytemplate
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.2.3
NeedsCompilation: yes
Packaged: 2023-11-16 21:48:56 UTC; hadleywickham
Author: Hadley Wickham [aut, cre] (<https://orcid.org/0000-0003-4757-117X>),
  Romain François [aut] (<https://orcid.org/0000-0002-2444-4226>),
  Lionel Henry [aut],
  Kirill Müller [aut] (<https://orcid.org/0000-0002-1416-3412>),
  Davis Vaughan [aut] (<https://orcid.org/0000-0003-4777-038X>),
  Posit Software, PBC [cph, fnd]
Maintainer: Hadley Wickham <[email protected]>
Repository: CRAN
Date/Publication: 2023-11-17 16:50:02 UTC
Built: R 4.2.0; x86_64-apple-darwin17.0; 2023-12-21 16:49:49 UTC; unix
Archs: dplyr.so.dSYM

RStudio version

Version 2023.12.1+402 (2023.12.1+402)

R Version

4.2.3

Unable to open discovr_16: Mixed designs

When attempting to open the tutorial, I get the following error:

Error in shiny_prerendered_app(target_file, render_args = render_args) :
No server contexts or server.R available for /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library/discovr/tutorials/discovr_16/discovr_16.Rmd
Calls: .rs.sourceWithProgress ... -> force -> -> shiny_prerendered_app
Execution halted

issue with running tutorial discovr_08

processing file: discovr_08.Rmd
solution]

Quitting from lines at lines 1984-2007 [unnamed-chunk-22] (discovr_08.Rmd)
Error in validObject():
! invalid class "ddenseModelMatrix" object: superclass "xMatrix" not defined in the environment of the object's class
Backtrace:

  1. base::summary(BayesFactor::posterior(album_full_bf, iterations = 10000))
  2. BayesFactor::posterior(album_full_bf, iterations = 10000)
  3. BayesFactor::posterior(...)
  4. BayesFactor:::fullDesignMatrix(formula, data, dataTypes)
  5. base::lapply(...)
    ...
  6. MatrixModels::model.Matrix(formula(fmla), data = data, sparse = sparse)
  7. methods::new(...)
  8. Matrix (local) initialize(value, ...)
  9. methods (local) .nextMethod(.Object = .Object, ... = ...)
  10. methods::validObject(.Object)
    Warning messages:
    1: In eval(parse_only(code), envir = envir) :
    2: In eval(parse_only(code), envir = envir) :
    Execution halted

Replace model_parameters() by parameters()

In easystats we tend to push for the short alias parameters() instead of model_parameters(), which is a bit too long and clunky. Let me know if you're interested in replacing it here; I'm happy to make a PR.
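
A minimal sketch of the proposed swap, assuming the parameters package is installed; the model (`fit`, built on the built-in `mtcars` data) is only an illustration and does not come from the tutorials:

```r
# Illustrative model on a built-in dataset
fit <- lm(mpg ~ wt, data = mtcars)

# Long name and short alias, called with the package prefix for clarity
long_form <- parameters::model_parameters(fit)
short_form <- parameters::parameters(fit)

# Both calls should report the same coefficient estimates
all.equal(long_form$Coefficient, short_form$Coefficient)
```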

Issues in tutorial "The beast of bias"

In the section "Spotting outliers"
image
The text says the new columns will have a 'z' appended, but the results of the code don't show this (in fact the code itself doesn't correspond to what the text says).
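
For reference, one way to get new columns with a `_z` suffix is the `.names` argument of `across()`. This is a sketch under assumptions, not the tutorial's code: `toy_tib` and its column names are hypothetical.

```r
library(dplyr)

# Hypothetical data standing in for the tutorial's tibble
toy_tib <- tibble(
  day_1 = c(1, 2, 3),
  day_2 = c(2, 4, 6)
)

# Convert every column to z-scores and keep the originals;
# .names appends "_z" to each original column name
z_tib <- toy_tib %>%
  mutate(
    across(
      .cols = everything(),
      .fns = ~ (.x - mean(.x, na.rm = TRUE)) / sd(.x, na.rm = TRUE),
      .names = "{.col}_z"
    )
  )

names(z_tib)
```

Without `.names`, `across()` inside `mutate()` overwrites the original columns, which may explain a mismatch between text and output of the kind described.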

In the section "Spotting normality"
image
There must be a typo in this self-contradictory sentence...

tutorial fails to run

Upon clicking the button to start tutorial 1, I get the following error:

$ exercise      : logi TRUE
 $ exercise.lines: num 4
  |..................                                                    |  25%
  ordinary text without R code
  |..................                                                    |  26%
label: met_name-hint-1
Quitting from lines 401-404 (discovr_01.Rmd) 
Error in eval(expr, envir, enclos) : object '.....' not found
Calls: sourceWithProgress ... handle -> withCallingHandlers -> withVisible -> eval -> eval
Execution halted

This is using R 4.0.2 and RStudio 1.3.959 with all of the latest packages and the latest version of discovr as of July 10.

Minor content issue in discovr_03.Rmd

Directory: discovr/inst/tutorials/discovr_03/discovr_03.Rmd

In the discovr_03 tutorial, subsection "What is a confidence interval?", "Replace all" was used a bit too liberally. Line 128 reads:

I think I used Instagram, but that's because I haven't yet realised that Instagram is populated only by people over the age of 40. I'm told that Instagram is the place to be, but no-one wants to see my wrinkly old scrotum of a face so I never use it. I pretty much don't use Instagram either. Did I mention I like statistics? Draw your own conclusions.

while I guess what it SHOULD say is (bold for clarity):

I think I used Facebook, but that's because I haven't yet realised that Facebook is populated only by people over the age of 40. I'm told that Instagram is the place to be, but no-one wants to see my wrinkly old scrotum of a face so I never use it. I pretty much don't use Facebook either. Did I mention I like statistics? Draw your own conclusions.

Issues in tutorial "Moderation and Mediation"

In section "Mediation"

  • Minor formatting errors
    image

  • This is not an error, but I think the coefficient b can be confused with the path b and label b in this particular example.
    image

  • Minor formatting errors: extra dollar signs
    image
    image

A few mistakes in discovr_13

There are a few mistakes in discovr_13:

  1. One of the hints in the Exploring data section uses the word "happiness" instead of "attractiveness".
  2. I'm not sure if it's a mistake, but "vcov.type = “HC4”" in the main text of Robust models uses typographic quotation marks and thus causes an error if copied and pasted.
  3. In the first quiz question in the Transfer task we're asked to "select ALL valid answers" but only one answer can be chosen.
  4. In the Transfer task, in the coding challenge with simple effects analysis, the word "console" is repeated twice.
  5. The Interpreting interactions (level 2) quiz in the Transfer task shows values of statistics that I can't find in any of the produced tables, but I might not have produced the right tables.
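
The copy-and-paste problem in point 2 can be reproduced and repaired in base R. The sketch below is only an illustration: `pasted` is a hypothetical string holding the text as it would arrive from the clipboard.

```r
# Text as it would arrive from the clipboard, with curly quotes (U+201C, U+201D)
pasted <- "vcov.type = \u201cHC4\u201d"

# Swap each curly double quote for a straight one
fixed <- gsub("\u201c", "\"", pasted, fixed = TRUE)
fixed <- gsub("\u201d", "\"", fixed, fixed = TRUE)

# The repaired text now parses as R code
parsed <- parse(text = fixed)
fixed
```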

This is a good occasion for me to thank you for "Discovering statistics using R"; it's the best book for learning statistics I've encountered. Pity it wasn't published in my country.

calculation of alpha

First, and perhaps most important, the value of Alpha at the very top is Cronbach's $\alpha$, and we are given its 95% confidence interval below: we're looking for values in the range of .7 to .8 (or thereabouts). In this case, `r get_alpha(r1)` so this probably indicates good reliability.

I get different values for alpha here:

Reliability analysis   
Call: psych::alpha(x = dplyr::select(raq_tib, raq_06, raq_07, raq_10, 
    raq_13, raq_14, raq_15, raq_18))

  raw_alpha std.alpha G6(smc) average_r S/N    ase mean   sd median_r
      0.77      0.77    0.75      0.32 3.3 0.0069  3.5 0.64     0.29

    95% confidence boundaries 
         lower alpha upper
Feldt     0.75  0.77  0.78
Duhachek  0.76  0.77  0.78

whereas you report 𝛼 = 0.80 [0.79, 0.81]

I understand that my values are still within the range we are looking for, but just wondering if I should be concerned about the difference.
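
As a cross-check that does not depend on psych, raw alpha can be computed by hand from the item variances and the variance of the total score, using alpha = k/(k - 1) * (1 - sum of item variances / variance of total). This is a minimal sketch assuming complete cases and items scored in the same direction; `cronbach_alpha()` and the toy data are hypothetical and not part of discovr.

```r
# Raw Cronbach's alpha from a data frame or matrix of item scores
# (hypothetical helper; assumes no missing values)
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)                      # number of items
  item_vars <- apply(items, 2, var)     # variance of each item
  total_var <- var(rowSums(items))      # variance of the total score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Hypothetical toy responses: 5 respondents, 3 items
toy_items <- data.frame(
  q1 = c(1, 2, 3, 4, 5),
  q2 = c(2, 2, 3, 5, 5),
  q3 = c(1, 3, 3, 4, 4)
)

cronbach_alpha(toy_items)
```

Running the same helper on the seven RAQ items selected in the call above would show whether the hand calculation agrees with the psych output or with the value reported in the tutorial.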

Ideas

Hi Andy, I am going through the factor analysis tutorial and I thought I might write down some ideas/thoughts here for future reference, and we can discuss them someday and I'll make PRs for what you want ☺️

discovr_18: EFA

  • Unify correlation: Avoid switching to psych by using the polychoric option in correlation() + its plotting features
  • sphericity and KMO: Use performance::check_factorstructure() for a prettier check output (easystats/performance#552)
  • how many factors: reference (or use, but it might be overkill) n_factors()?
    • TODO for easystats: add the plotting example that we have in the vignette + more details about each method
  • Use parameters::factor_analysis() directly? Though the benefits are probably small here
  • Replace model_parameters() calls with the alias parameters() as it's a bit less confusing and simpler

@strengejacke

Issues in tutorial "Mixed designs (GLM5)"

The following issues come from page "Fitting a model to a mixed design"

  1. Minor typo
    image
    Which should be ref = x instead of rel = x

  2. Some misinterpretation
    image
    I think the first interaction term should show the effect of average dates compared to unattractive dates, comparing playing hard to get to normal. However, in the tutorial, you wrote "The first contrast for the interaction term shows the effect of low attractive dates compared to average dates, comparing playing hard to get to normal."
    Similarly, the second interaction term shows the effect of attractive dates compared to average dates, comparing playing hard to get to normal, rather than "low attractive dates compared to average-looking dates".
    (I also suggest double-checking the interpretations of interaction between looks and personality as well as the three-way interactions. There seem to be similar problems...)

Thank you Andy for your hard work! Will be supporting you as always :D

error in rendering discovr_06.Rmd tutorial from the discovr package

processing file: discovr_06.Rmd

Quitting from lines 16-45 (discovr_06.Rmd)
Error: package or namespace load failed for 'qqplotr' in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]):
there is no package called 'twosamples'
Execution halted

R version 4.2.3
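
The log indicates that qqplotr could not load because twosamples is not installed. A minimal sketch for checking which packages are missing before running the tutorial; `missing_packages()` is a hypothetical helper, and the install line is left commented out because it needs a CRAN mirror and network access:

```r
# Return the subset of `pkgs` that cannot be loaded from the current library paths
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Packages named in the error message above
to_install <- missing_packages(c("qqplotr", "twosamples"))
to_install

# Uncomment to install whatever is missing
# if (length(to_install) > 0) install.packages(to_install)
```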

Issues in tutorial "Multilevel model"

In section "Fitting multilevel models":
There are several places where the font effect doesn't show:
image

The overall intervention effect should be 6.79 instead of -4.3:
image

Unable to open discovr_09: Comparing two means

When I try to preview discovr_09, this message appears:

"processing file: discovr_09.Rmd

Error in t.test.formula():
! cannot use 'paired' in formula method
Backtrace:

  1. stats::t.test(...)
  2. stats:::t.test.formula(...)

Quitting from lines 511-515 [unnamed-chunk-11] (discovr_09.Rmd)
Execution halted"
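
This error appears to come from newer versions of R (4.4.0 and later), where the formula method of t.test() no longer accepts `paired`. One workaround, sketched here with hypothetical data rather than the tutorial's, is to pass the two sets of scores as separate vectors to the default method:

```r
# Hypothetical long-format data: 5 participants measured under two conditions,
# with rows in the same participant order within each condition
scores <- data.frame(
  id = rep(1:5, times = 2),
  condition = rep(c("a", "b"), each = 5),
  outcome = c(10, 12, 9, 14, 11, 13, 15, 10, 18, 12)
)

# Split the outcome by condition so the pairs line up by position
a <- scores$outcome[scores$condition == "a"]
b <- scores$outcome[scores$condition == "b"]

# Paired t-test via the default (non-formula) method
paired_fit <- t.test(a, b, paired = TRUE)
paired_fit
```

The pairing relies on both vectors being sorted by participant, so it is worth arranging the data by `id` before splitting.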
