
learnb4ss's Introduction

Learn Bayesian Analysis for the Speech Sciences

Learning materials

Here you can find the learning materials for the workshop Learn Bayesian Analysis for the Speech Sciences (learnB4SS), in the form of an R package.

For dates and other info, see the workshop homepage.

Let’s get started!

Prerequisites

The first step is to install brms and its dependencies. If you haven’t yet done so, check out the installation instructions here.

Workshop materials

To save you some hassle, we created a Starter Kit, which you can download from here: Go to download page.

The download page also contains instructions to get you set up.

NOTE. The kit is just a convenient way of setting up an RStudio project that you can use during the workshop. If you are comfortable with RStudio, you can choose to set up your own RStudio project instead of downloading this kit. If you choose not to use the Starter Kit, you can just follow the installation instructions for the learnB4SS package. Note that this package and materials are best used from RStudio.

Check list

Here is a summary of all the prepping steps.

  • Install brms and dependencies (essential).
  • Download the Starter Kit (optional) and install learnB4SS (essential).
  • Hang out with us on Slack (link and instructions have been provided via email).
  • Prepare some snacks and refreshments to keep your energy up during the workshop (unfortunately we can’t provide those).
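A quick way to confirm the two essential items on the check list is to ask R whether the packages can be found. This is a minimal sketch in base R only. It installs nothing, and it does not test whether Stan can actually compile a model on your machine.

```r
# Packages the check list marks as essential.
needed <- c("brms", "learnB4SS")

# TRUE if the package is installed and loadable, FALSE otherwise.
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# List anything that still needs installing.
if (!all(installed)) {
  message("Still missing: ", paste(needed[!installed], collapse = ", "))
}
```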

learnb4ss's People

Contributors

jvcasillas, stefanocoretta, troettge

Forkers

stannam

learnb4ss's Issues

Minutes 2021-01-22

  • decided on name: BASS or B4SS - Bayesian analysis for speech scientists.
  • color and logo proposal approved.
  • discussed data set, need input.
  • distributed some topics, not complete yet

Plan for 2021-01-29:

  • finalize topic distribution (?)
  • discuss data set (bodo's politeness data set)

Exercise Rmd

Sessions with exercises

  • 02. Bayes theorem
  • 03. Priors
  • 06. More Bayesian inference
  • 07 > 08. Leveling up

How to

EDIT

@troettge @jvcasillas

The package now provides two functions:

  • open_slides() to open a session's slides in the browser (e.g. open_slides(1)).
  • open_exercise() to open a copy of the exercise Rmd in the editor (e.g. open_exercise(3)).

To create an exercise Rmd, run (adapt session number and title):

usethis::use_vignette("ex_03", title = "Application to regression II - The prior and Bayesian updating")

Make sure to set eval = FALSE (not echo = FALSE) in knitr::opts_chunk$set().

The data table can be loaded with:

library(learnB4SS)
data("polite")

I will write a function that opens the exercise Rmd file of a session in the editor (after copying it into the working directory).

We can just create vignettes in the package learnB4SS by naming them according to the session title. Best if we use the same names as in the folders in the slides repository.

The participants will just have to run:

open_exercise(session = 1)

They will also be able to see the knitted versions from the package website.

So you can go ahead and create vignettes in the package as the exercise Rmd files.
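The copy-then-open behaviour described above could look roughly like the sketch below. It is not the package's actual implementation: the function name `copy_exercise()`, the `exercise_dir` argument and the `ex_03.Rmd` naming pattern (borrowed from the `use_vignette("ex_03", ...)` call above) are assumptions made for illustration.

```r
# Hypothetical sketch: copy a session's exercise Rmd into a folder and open it.
# `exercise_dir` is an assumed location of the exercise files, not the
# real layout of the learnB4SS package.
copy_exercise <- function(session, exercise_dir, dest_dir = getwd(),
                          open = interactive()) {
  file_name <- sprintf("ex_%02d.Rmd", session)  # session 3 -> "ex_03.Rmd"
  src <- file.path(exercise_dir, file_name)
  if (!file.exists(src)) {
    stop("No exercise file found for session ", session, call. = FALSE)
  }

  dest <- file.path(dest_dir, file_name)
  # Do not overwrite a copy the participant has already edited.
  if (!file.exists(dest)) {
    file.copy(src, dest)
  }

  # Only try to open an editor in an interactive session.
  if (open) {
    utils::file.edit(dest)
  }
  invisible(dest)
}
```

Refusing to overwrite an existing copy means that running the function twice does not wipe out a participant's answers.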

dataset

Bodo's dataset

(doesn't support upload of .csvs here LOL)

POP.txt

Minutes - 2021/01/22

Action items

  • Think about what you would prefer to present.
  • Think about branding and co (#7).
    • Colours, typefaces, ... (@troettge will prepare a Keynote template)
    • Name of org (maybe speech related).
    • Logo(s).

Meeting 2021-05-12

  • Practicals
    • Perhaps use .Rmd (vignettes in the package).
  • Timeline:
    • Conceptual content ready for June.
    • Practical content ready for end of June.

Meeting 2021-06-18

Agenda

  • Slides for Day 1 ready.
  • Decide how to deploy Rmd files of live coding and exercises and slides.
    • Would be good to have everything in the learnB4SS package (I remember Jo mentioning something about it but don't remember the deets).
    • Put rmds in package, and add functions to package.
  • Make concrete time-line (when does what happen, how and how long does it take)
  • decide on coding details

Priors

What priors to use and how to explain?

  • normal(0,10) for population-levels for articulation rate
  • normal(0,10) for group-levels for articulation rate
  • LKJ(2) for correlation

Keeping it simple in the applications and maybe extending it in the take-home exercises (to, say, Student-t and Cauchy) would probably suffice. For now, normal might be good enough? Dunno. What do you think?
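For reference, the three priors listed above could be written in brms roughly as follows. This is a sketch: mapping "population-levels" to class `b`, "group-levels" to class `sd` and the correlation prior to class `cor` is an assumed reading of the list, and no model formula is implied.

```r
library(brms)

# Assumed mapping of the proposed priors onto brms prior classes.
priors <- c(
  prior(normal(0, 10), class = b),   # population-level slopes (the intercept has its own class)
  prior(normal(0, 10), class = sd),  # group-level standard deviations (bounded below at 0)
  prior(lkj(2), class = cor)         # correlations between group-level effects
)

priors
```

The resulting object would be passed to `brm()` through its `prior` argument. The `cor` prior only applies if the model includes correlated group-level effects.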

Resources

Dump your favorite online resources (apps / Shiny apps) and books / papers here for the last session.

exercise07 up

Still need to add inference at the end, and we need to change the naming conventions once we have agreed on them. Please have a look and give feedback (typos you can just edit and push).

CHECK: Email to participants

Dear Bayesian enthusiasts,

we are very much looking forward to meeting you all and giving you an introduction to Bayesian data analysis in R using the wonderful and truly game-changing package brms. Our landing page contains the most important information about the workshop.

For you to get the most out of the workshop, we would like to seize the moment and manage expectations. Bayesian Inference is a powerful tool that requires somewhat complex computational resources. We need to prepare ourselves for that.

Moreover, we won’t have time to troubleshoot individual problems with running brms on individual machines during the workshop, but we can do our best to assist you before the workshop. brms has several dependencies and compiles Stan code in C++ under the hood. You don’t need to understand either of these things, but the computations that run in the background are more complex than our everyday R shenanigans. Thus, we highly recommend that we all try our best to fix these issues in advance with the help of capable IT staff and our colleagues. Here, we give you a head start, and since we still have three weeks, we are confident that we can get most machines in shape.

To prepare your laptops for brms and its dependencies…

(1) please download / update to the latest version of R: https://cran.r-project.org/mirrors.html

(2) I also highly recommend installing RStudio, as we will provide you with RMarkdown files as additional resources to review after the workshop. Please download / update to the latest version of RStudio: https://www.rstudio.com/products/rstudio/download/#download

(3) Now we need our Bayesian engine. We have written a detailed step-by-step guide to get your machine ready for this:
https://learnb4ss.github.io/learnB4SS/articles/install-brms.html

Now this should make 95% of you operational! Some of you might have idiosyncratic issues that are difficult to predict. Again, we hope that you can try to approach capable colleagues at your university to help you solve these things.

In order to streamline our communication in general, we have set up a Slack channel for all of you. Here we can share resources, ask questions and give answers, or simply hang out and be social. It is obviously not mandatory to join the channel, but we think it provides an additional layer of helpful communication and a sense of community. We are all rookies in the domain of statistical inference, but we are keen to learn from and with each other.

If you don’t have Slack, you can download it here: https://slack.com/intl/en-no/

To join our Slack channel, click this link:

We are really looking forward to meeting all of you in July.

Cheers,
Timo, Joseph & Stefano

Package website

The package website is not compiling because it depends on a package that has been just recently updated and there aren't binary versions available yet, so the GH action fails. It should work in the next few days though. Just a heads up.

Minutes - TBA

Action items

  • Start drafting workshop materials (slides, tutorials, etc...).

Discuss

  • Questions from ALP:
    • Who manages registration?
      • Something the Events and Outreach Committee could take the lead on, but let me know if you think otherwise.
      • What else the Events and Outreach Committee can do to facilitate the workshop?
    • Need to limit the number of participants?
      • SC: If we do need to limit number, we should reserve a number of places for people from LICs and MICs and minorities.
    • What can we advertise about the subsequent sharing of resources to ALP members who might not be able to attend?

Day3

Still a big question mark: What do you want to do?

Just 1:1 sessions with people? If we do 15-minute sessions and take three breaks (15 minutes each) in a four-hour block, we could meet 13 people each. That is 39 people in total, which is maybe a realistic number of people who would want to make use of this offer.
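The slot count can be checked with a few lines of arithmetic. The four-hour block is an assumption here (it is the length that makes 13 sessions each work out), and the three instructors are the three workshop organisers.

```r
block_minutes   <- 4 * 60  # assumption: a four-hour block
break_minutes   <- 3 * 15  # three breaks of 15 minutes
session_minutes <- 15      # one 1:1 session
instructors     <- 3

# Whole sessions that fit into the remaining time, per instructor and overall.
sessions_each  <- (block_minutes - break_minutes) %/% session_minutes
sessions_total <- sessions_each * instructors

cat(sessions_each, "per instructor,", sessions_total, "in total\n")
# prints: 13 per instructor, 39 in total
```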

What do you think?

Showreel

What should we show in the show reel at the end? Some fancy stuff you can do in principle with brms.

  • Maybe a GAM-like model with splines?
  • Mixture model?

Minutes - 2021/01/15

  • 🗓 Discuss schedule in light of Molly's email.
    • We go for Mon 5/Tue 6 July, and Mon 12 July reconvene.
  • 📋 Discuss content of workshop (#1).
    • See #1.
    • Also, think about hands on content based on practical aspects related to learnr.
      • Save model outputs to speed up things.
    • Who does what.
      • One presents, and other can monitor chat for Qs.
  • 🛠 Move repo to organisation for more granular permissions settings on GitHub.

Questionnaire feedback

survey here.

Cool! Two things:

  • Career stage. It's multiple choice, but the categories overlap and some are missing. Postgrad and Doctoral student overlap, Postdoc and ECR overlap, and the category between assistant prof/lecturer and full professor is missing (associate/reader). I'd say: students vs. PhD student vs. postdoc vs. faculty (early career = Assistant & Associate Professor) vs. faculty (late career = Full Professor)
  • Minorities, make sure you don't forget a major category in the example listing

@stefanocoretta

01_intro and 03_bayes_theorem comments

@troettge

General comment: Are you planning on using the template (i.e., H1 colors, title slide, final slide, etc)?

01_intro:

  • good graphics, well described
  • "type 1 in chat" I like this idea to keep audience actively participating... maybe "type F in comments" for lolz

03_bayes_theorem

  • I really like the weather example and how it is grounded in our personal experience
  • First 3:30 is centered on same image/slide with a lot of talking. I imagine this could make it difficult to keep attention (though it will likely be different because they will also see you), but something to think about
  • It would be nice if you dedicate 30 sec to a minute at the end to relating the info presented back to the info presented in 02 (first application to regression). Specifically, making explicit the connection between using Bayes' theorem with a single event and how we are learning to apply it to regression with an entire data set

Content

Link to new document.


Structure of Timo's Bayesian Workshop:

  • Show them how easy brms is to use (live coding TIMO)

  • Replication Crisis --> partly due to NHST, we need to rethink statistical inference (TIMO)

  • Bayes theorem (vampire example) (TIMO)

    • Introduces concepts of prior and belief update.
  • Application to Regression (JOSEPH)

    • Introduction to data set, introduction to research question (hypothesis) (JOSEPH)
    • HANDS-ON early on.
  • inference over posterior (STEFANO)

    • ❓ Also introduce random variables?
    • ❓ Maybe a little exercise, or show them (conceptually) how you can get things from posterior distributions.
    • Short exercise to do before the reconvening day on wrangling posteriors.
  • NHST vs. Bayesian inference (JOSEPH)

    • ❓ Also introduce random variables?
  • Pros and cons of Bayesian inference (JOSEPH)

  • "One more thing" running Bayesian models is easy! Steve Jobs.

This part is related to R syntax.

  • Introduction to data set, introduction to research question (hypothesis) (JOSEPH)
  • explain brm() syntax, explain outcome (JOSEPH)
  • Interpret output (JOSEPH)
    • interpret outcome, regression coefficients
    • HANDS-ON: RUN MODEL AND INTERPRET OUTPUT, EXTRACT POSTERIORS, PLOT
      • Save model output and ask for interpretations questions.
      • "You are fully operational now!"
  • Hypotheses (TIMO)
    • testing hypothesis, parameter estimation, BF, LOO(?)
    • HANDS-ON: HYPOTHESIS() FUNCTION, parameter estimation
  • Priors (STEFANO)
    • Priors, types of priors, impact on posterior, syntax
      • "Priors are the boogie man"
    • Prior predictive checks?
    • HANDS-ON: SPECIFY PRIORS
  • Diagnostics, model convergence, and posterior predictive check (STEFANO)
    • Approximating probability distributions (island hopping example) (STEFANO)
    • HANDS-ON: Posterior predictive check
  • Show reel, what you can do with it

NOTES ON DATA

  • The articulation rate variable is slightly skewed, which might be a great example later on for using pp_check and using alternative distributional assumptions.

Names, branding, logos, etc.

Creating this issue for documenting ideas I come up with over the course of the week. Feel free to ignore, comment, and/or include your ideas. Will delete later.

  • Key terms
    • bayesian data analysis
    • bayesian approaches
    • bayesian framework
    • speech sciences
    • linguistics
    • linguists
  • Ideas for org. names
    • bda4ss (bayesian data analysis for speech sciences, this kind of looks like it says 'badass')
    • b4ss (bayes for speech sciences)
    • bda4ling (bayesian data analysis for linguists)

meh

agreeR syntax to use

We probably should all agree on and stick to one and the same syntax for the same thing (to avoid confusion).

Here are some things that come up often:

  • define priors (with or without quotation marks etc.)
  • arguments of brm (should probably always define the same arguments across applications)
  • extract posteriors
  • plot extracted posteriors

I think participants would benefit from seeing the same recurring syntax for these things (feel free to add stuff). You guys are more up to date with code-efficient ways of extracting and plotting posteriors, so feel free to suggest them here.
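As a starting point for the discussion, here is one possible convention for the last two items (extracting and plotting posteriors), written in base R so that it does not depend on any plotting package. It is only a sketch: the draws are simulated stand-ins rather than real model output, and the parameter name `b_attitudepol` and the helper `summarise_posterior()` are hypothetical. In a real session the data frame would come from something like `as_draws_df(fit)` on a fitted brms model.

```r
# Summarise one parameter from a data frame of posterior draws:
# posterior median plus a central interval (95% by default).
summarise_posterior <- function(draws, variable, prob = 0.95) {
  x <- draws[[variable]]
  tail_prob <- (1 - prob) / 2
  c(
    median = median(x),
    lower  = unname(quantile(x, tail_prob)),
    upper  = unname(quantile(x, 1 - tail_prob))
  )
}

# Stand-in for draws extracted from a fitted model (simulated, not real output).
set.seed(1)
draws <- data.frame(b_attitudepol = rnorm(4000, mean = -0.4, sd = 0.1))

# Numerical summary, then a histogram with the interval bounds marked.
post_summary <- summarise_posterior(draws, "b_attitudepol")
print(round(post_summary, 2))

hist(draws$b_attitudepol, breaks = 40,
     main = "Posterior of b_attitudepol (simulated)", xlab = "Estimate")
abline(v = post_summary[c("lower", "upper")], lty = 2)
```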

02_a2r_demo

@jvcasillas:

S2: “As we just heard from Timo”, we don’t actually, right? Bayes theorem comes after

So the narrative here should be something like: Before we actually tackle the conceptual background, we will get our hands dirty immediately to lose our fear.

S4: Worth explaining the nature of standard error (so you can relate it to Bayesian equivalent?) Maybe too much..

I think there might be a lot of question marks here: What do plausible lines mean? where do they come from? Why are they different? what is a posterior?

I like the walking through the dataset part. Maybe we should do that every time we introduce new variables / measures.

S9: This might confuse people. When they hear intercept they think random intercept. Maybe worth talking about estimating the mean of the measurement without any predictors instead. What is a parameter?

S10: explain piping %>%

S13: Code chunk on the right goes out of bounds

you also refer to ar_bayes_int in the code chunk, which was not defined earlier on the slides. More descriptive names would help to avoid confusion maybe

Overall, people might be confused by talking about the estimation of sigma at this point. Maybe we can drop that? If not, explain what sigma conceptually represents (the residual standard deviation)?

Minutes - 2021/01/08

8 Jan 2021

Agreed on format: 3hrs + 3hrs + 3hrs

Agreed on tent. date: 2nd + 3rd July (fr-sa) + 10th July (sa).
Timeslot: start at 8am PST / 10am ET / 4pm CET

To do:

  • Set up GitHub discussion (SC)
  • write to Molly regarding dates and format (TR)
  • write down structure of Timo's workshop (TR)

Idea for third date. Distilling the challenges that people faced and potentially review content related to common questions.

Q&A function on Zoom

As far as I understand, the Q&A function is only available for webinars, which are a special kind of Zoom meeting I don't have access to (and which is restricted to 100 participants with my account). @stefanocoretta do you know more about this, and do you know what LMU has access to?

Dataset properties

Collect properties of ideal dataset (and then ask on twitter)

  • should be speech related
  • ideally super intuitive even for non-phoneticians (to make it compatible with other linguists)
  • ideally open data set of published study that
    • has run a frequentist analysis
    • did not use the full random effect structure
  • contains at least one continuous variable
  • contains at least two additional variables (categorical because easier to interpret, ideally 2x2).
  • contains at least one grouping variable

To be honest, Bodo's original pitch dataset ticks many of these boxes but has very few data points. I could ask him for the whole data set from this paper here.

Org name

Organisation name on GitHub

  • BASS (already taken)
  • learnBASS
  • ...

Minutes 2021-01-29

  • Times
    • 7am Pacific Time, 4pm CEST.
    • Four hour block
      • 45 + 10 + 45 + 20 + 45 + 10 + 45
  • Finalise content distribution.
  • Finalise dataset.
  • learnB4SS 🎉

Reasons for switching to Bayes

A random list of reasons.

Practical reasons

  • Fitting frequentist models can lead to anti-conservative p-values (meaning an increased false positive rate: there is no effect and yet you got a significant p-value). Another interesting example of this for the non-technically inclined reader can be found here: https://365datascience.com/bayesian-vs-frequentist-approach/. LMER tends to be more sensitive to small sample sizes than Bayesian models (with small sample sizes, Bayesian models return estimates with greater uncertainty, which is a more conservative approach).
  • While very simple models will return very similar confidence and credible intervals, in most cases more complex models won't fit if run with frequentist packages like lme4, especially with inadequate sample sizes. So, BRMs with sensible priors can usually still be fitted where LMERs fail to converge.
  • In reality, LMERs require as much work as BRMs, although the common practice is to skip most of the necessary steps when fitting LMERs, which gives the impression of LMERs being quicker. Factoring out the time needed to run MCMCs in BRMs, in LMER you still have to perform a robust prospective power analysis and post-hoc model checks.
  • With BRMs, you can iteratively reuse posterior distributions from previous work, which effectively speeds up the discovery process (getting to the real value faster). In other words, you can embed previous knowledge in BRMs while you can't in LMERs.
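The idea of reusing a posterior as the next prior can be illustrated with a toy beta-binomial model, where the update has a closed form. This is a sketch with made-up counts, not a brms workflow and not data from any study. The helper `update_beta()` is hypothetical.

```r
# Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + successes, b + failures).
update_beta <- function(prior, successes, failures) {
  c(shape1 = prior[["shape1"]] + successes,
    shape2 = prior[["shape2"]] + failures)
}

flat_prior <- c(shape1 = 1, shape2 = 1)  # uniform prior on a proportion

# Study 1 (illustrative counts): 7 successes, 3 failures.
post_study1 <- update_beta(flat_prior, successes = 7, failures = 3)

# Study 2 reuses the posterior of study 1 as its prior: 12 successes, 8 failures.
post_study2 <- update_beta(post_study1, successes = 12, failures = 8)

# Analysing both studies at once from the flat prior gives the same posterior.
post_pooled <- update_beta(flat_prior, successes = 19, failures = 11)

print(post_study2)
print(post_pooled)
```

In this simple conjugate case the sequential and pooled analyses agree exactly. With regression models the posterior usually has to be approximated by a prior distribution, so the match is only approximate.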

Conceptual reasons

  • LMER cannot provide evidence for a difference between groups, only evidence to reject the null hypothesis.
  • While a frequentist CI only tells us that, if we ran the study many times, n% of the resulting CIs would contain the real value (and we don't know whether the CI we got is one of the (100 − n)% of CIs that DO NOT CONTAIN the real value), a Bayesian CrI ALWAYS tells us that the real value is within a certain range with n% probability. (Of course this is all conditional on the model and data, which is true for frequentist and Bayesian models alike.) So, LMER really just gives you a point estimate, while BRMs give a range of values.
  • With BRM you can compare any H, not just Null vs Alternative. (Although you can use Info Criteria with LMER).
  • LMER is based on an imaginary set of experiments that you never actually carry out.
  • BRM will converge towards the true value in the long run. LMER does not.
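The long-run reading of a frequentist confidence interval can be made concrete with a small simulation. The sketch below uses base R only and invented settings (true mean 5, 20 observations per study). It repeats the "imaginary set of experiments" many times and counts how often the 95% CI contains the true value.

```r
set.seed(42)

true_mean <- 5     # known here only because we simulate the data
n_studies <- 2000  # number of imaginary replications

# For each simulated study, check whether its 95% CI covers the true mean.
covered <- replicate(n_studies, {
  y  <- rnorm(20, mean = true_mean, sd = 2)
  ci <- t.test(y, conf.level = 0.95)$conf.int
  ci[1] <= true_mean && true_mean <= ci[2]
})

# Proportion of intervals that contain the true value (close to 0.95).
coverage <- mean(covered)
print(coverage)
```

The coverage is a property of the procedure across replications. For any single interval in `covered`, the answer is simply TRUE or FALSE, and in a real study we never get to see which.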

Of course, there are merits in fitting LMERs, for example in corporate decisions, but you'll still have to do a lot of work. The main conceptual difference then is that LMER and BRMs answer very different questions and as (basic) scientists we are generally interested in questions that BRMs can answer and LMER cannot.

Things to mention

Please add things that I should mention

First session

  • WHY are you here (Replication crisis / tools)
  • Expectations (what will you learn / what won't you learn here)
    • won't learn R
    • won't learn interpretation of regression
  • How to get the most of it
    • join Slack channel
    • prepare your machine for brms
    • relax cause materials will be available for review later
  • Roadmap
  • It's ok not to be ok.
  • Be nice to others and yourself.

General

We should make sure to mention:

  • Running chains in parallel (and multi-threading).
  • Using a seed.
  • Saving output.

How to share screen recordings

Hey, I have screen recorded my short intro to Bayes theorem and will do the initial WHY session. How should we share these videos? What would be a good workflow?

Blurb for website

Draft:

"Our understanding of human speech is increasingly shaped by quantitative data. It is thus of critical importance to evaluate quantitative findings inferentially. This workshop aims at introducing Bayesian inference for the quantification of phonetic data. Following other scientific disciplines, the last decades were dominated by statistical inference within the null-hypothesis-significance-testing framework. This framework comes with many conceptual challenges and pitfalls, and comes with technical limitations that prevent us from analyzing our data in an appropriate way. More recently, many researchers have started to use an alternative inferential framework: Bayesian inference. Bayesian inference more closely answers the research questions we ask; it is much more flexible; and it allows us to run appropriate statistical tests. Until recently, this framework was technically very involved and represented computational challenges. These challenges have now been overcome, making Bayesian inference conceptually, technically, and computationally feasible for researchers across disciplines.
This workshop will introduce the logic of Bayesian inference and contrast it to null-hypothesis-significance-testing. After a brief conceptual introduction, the course will walk through a Bayesian statistical analysis using R and the package brms (Bürkner 2017), explaining how to set up a Bayesian regression model (including setting appropriate priors), how to test ‘hypotheses’ (including parameter estimation and Bayes factor), how to interpret the results, how to diagnose model convergence, and how to visualize and report the results. In hands-on exercises, the participants will immediately apply their knowledge to new data sets in R."

Pre- and post-workshop questionnaires

I think it would be very helpful to do very short pre- and post-workshop questionnaires to ask participants for their expectations and their feedback.

People feel more engaged when they are directly asked to share their thoughts and in the pre-q we can ask to confirm they installed things for example.

We should probably make the post-q optional.
