
jaspbook's Issues

10.10, spurious period

Effect size calculations for the difference between means can be calculated via the Cohen’s d statistic. (Section 10.7).

This ends with a period but the other items in this list do not.

6.2.2, extra "if"

"If someone offers me a bet that if it rains tomorrow then I win $5, but if it doesn’t rain I lose $5."

The word "If" should be removed from the start of this sentence.

2.75, unclear

When you say "In general, the main effect of homogeneous attrition is likely to be that it makes
your sample unrepresentative." is that just because it makes the sample size smaller? Not knowing much about sample size (presumably that is coming later in the book), it seems like it might be useful to say something here about what makes a sample size too small (even if only as a preview for later sections).

2.74, unhelpful sentence

This is a matter of taste, but I really dislike statements like "Selection bias is a pretty broad term." What does "pretty broad" mean? How does this sentence in any way help me understand what selection bias is? The example that follows helps me understand, but it would be better if this introductory sentence said what selection bias is in general, instead of just saying it's "broad."

JASP now performs z-tests

from Rachel Stephens (University of Adelaide), submitted via email:

"...on p212 you say "Now, as I mentioned earlier, the z-test is almost never used in practice. It’s so rarely used in real life that JASP doesn’t have a built in function for it."

But in case you are not aware, it looks like z-tests are (now?) available as an option within the one-sample t-test."


3.7, capitalization

"We still haven’t arrived at anything that resembles data analysis. Maybe the next Chapter will get us
a bit closer!"

chapter should be lowercase here

9.1.7, missing paren

Not surprisingly, JASP provides an analysis that will do these calculations for you. From the main ‘Analyses’ toolbar select ‘Frequencies’ - ‘Multinomial Test’. Then in the analysis window that appears move the variable you want to analyse (choice_1 across into the ‘Factor’ box.

Missing paren after choice_1.

8.5.3, more help in reasoning about the meaning of p value

I've felt bad ever since reading section 8.5.3. I understand the two points it makes about why interpreting the p value as the "probability the null value is false" is wrong. However, given that the intuition to see it that way is so strong, it seems like the section could be more helpful than it is at present, where it mostly offers a kind of scolding and dire admonition.

In particular, I found this: http://www.dcscience.net/2014/03/24/on-the-hazards-of-significance-testing-part-2-the-false-discovery-rate-or-how-not-to-make-a-fool-of-yourself-with-p-values/

Now, I'm not sure if you'll agree with the main example he gives on that page, but I found the tree diagram image and the associated example quite helpful in fleshing out my understanding of the relationship between the p value and the probability that the test is giving a meaningful result. It seems to me that the analysis he gives there is frequentist, but it still lets us reason about the probability that our test is giving a meaningful result. That is, we are not asking the probability that the null hypothesis of our test is true, but rather: in the universe of all tests (e.g., for treatments for depression) similar to the one we are running, and given an assumption about the overall success rate of such treatments, plus an assumption about the power of these tests, how likely is it that a significant result is actually a false positive? All of this reasoning is subtle (at least in my mind) so I'm not sure if I'm saying everything correctly. But it seems to me that giving an example/analysis like this would help people understand much better the difference between the p value and the chance that they are actually seeing a true positive.

It might also be useful to show the values for some other assumptions about treatment success rates, p values, and test powers, as I've done in the following spreadsheet, which helps give an intuition for how these false positive rates vary based on those assumptions.

https://docs.google.com/spreadsheets/d/1Gxl1jObj-Jtrshl0I3HeUQrP3EbUbXemyuSlIy4bO5s/edit#gid=0
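In case it helps, here is roughly the arithmetic I have in mind, as a small Python sketch. All of the numbers (the prior fraction of real effects, the power, and alpha) are illustrative assumptions of mine, not values from the book or from that blog post:

    # P(effect is not real | significant result), under an assumed prior,
    # power, and alpha. All of these numbers are illustrative assumptions.
    def false_discovery_rate(prior_real=0.1, power=0.8, alpha=0.05):
        true_positives = prior_real * power          # real effects that reach significance
        false_positives = (1 - prior_real) * alpha   # true nulls wrongly flagged
        return false_positives / (true_positives + false_positives)

    for prior in (0.1, 0.3, 0.5):
        print(prior, round(false_discovery_rate(prior_real=prior), 3))

Under these assumptions, a field where only 10% of tested effects are real ends up with roughly 36% of its significant results being false positives, even though alpha is .05.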

2.7.7 Regression to the mean, high school example seems really bad

After coming to understand what "regression to the mean" means, I think the long example given in 2.7.7 is pretty bad. It asks why these college students are not performing quite as well as they did in high school, and it suggests that the natural intuition might be that psychology classes are hurting them. Well, first of all, that wouldn't be my first intuition. It might be that psychology classes are graded more harshly than other classes at university, or something like that. But the larger point is that this whole example just seems like a really bad example of regression to the mean. Regression to the mean (as I now understand it) means that your initial measurements might have been extreme in some (random) way, so it's not surprising that follow-up measurements are less extreme. But to make that work in this high school example you have to posit that "luck" played a big part in the top students being at the top. I mean, maybe a little, but in general being at the top of a high school class is usually not very luck-based at all. Of course, those students who are at the top in high school might do worse at university for various other reasons -- for instance, maybe the more structured, hand-holding nature of high school worked well for them and the more independent nature of university classes is harder for them. But that's not regression to the mean. It just seems like a really poor example for regression to the mean, to assume that all of these students' high school GPAs are somehow a reflection of a random extreme measurement.

9.1.1, pet peeve

I hate it when books say "Hopefully that's pretty clear." It adds no new information. And if it's not clear to me, it just makes me feel dumb. It just feels like a way for the author to express anxiety that his explanation is not in fact clear, and the reader doesn't need/want to see that.

7.5, typo

"What this is telling is is that the range of values has a 95% probability of containing the population mean µ."

is -> us

10.7.1 typo

"In this case, this is the one sample mean X¯ and one (hypothesised) population mean µo to compare it to."

second this -> there

10.3.5, typo

"The confidence interval reported in Figure 10.10 tells you that there’s a if we replicated this study again and again"

delete "there’s a"

2.7.12, Data mining

It might be good to distinguish between this use of "data mining" and the way it's used in recent computer science.

How does one actually compile from scratch?

Is there a makefile hidden somewhere? In the build directory, there is a simple lsj.tex but when I run pdflatex lsj.tex on a clean repo c03797c I get

! Undefined control sequence.
l.4 \abx@aux@sortscheme
                       {nyt}

When I hit [enter] to just push past it I encounter all sorts of additional undefined control sequences.
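For what it's worth, the \abx@... control sequences come from biblatex, so my guess (an assumption on my part, I haven't verified it against the repo) is that this is a stale-auxiliary-file problem, i.e. the .aux was written by a different biblatex version. If the book uses biblatex with the biber backend, the usual cycle would be something like:

    rm -f lsj.aux lsj.bbl lsj.bcf lsj.blg lsj.run.xml   # clear stale auxiliary files
    pdflatex lsj.tex
    biber lsj
    pdflatex lsj.tex
    pdflatex lsj.tex   # extra pass to resolve cross-references

If it actually uses bibtex rather than biber, that guess is wrong, but deleting the stale auxiliary files should help either way.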

Causal validity

Hi,

first of all thanks for writing this wonderful, up-to-date and very accessible textbook and making it available for free! This is a tremendous service to all of the behavioral sciences, and I will be sure to use it in my teaching in the future.

When I read chapter 2 on experimental design and particularly the section on validity, I could not help noticing the absence of one crucial concept, which I would term causal validity and which was defined by Borsboom, Mellenbergh & van Heerden (2004, Psychological Review) roughly as follows: A measure of an attribute is valid if and only if (a) the attribute exists and (b) manipulations of the attribute lead to corresponding changes in the measure. Think heat and the thermometer here. This is a natural-science definition of validity and was in fact already used more than 50 years earlier in McClelland and colleagues' derivation of motive measures (see, for instance, McClelland, 1958, 1987, chapter 6). Borsboom et al. (2009) later argued that construct validity is no longer a tenable concept (e.g., because it tends to fluctuate over time, such as in the case of the phlogiston concept). While it may not be prudent to purge all textbooks of the concept of construct validity (yet), I think your text would benefit tremendously from incorporating Borsboom et al.'s (2004) definition of validity and contrasting it with construct validity.

Best wishes,
Oliver

references:
Borsboom, D., Cramer, A. O. J., Kievit, R. A., Zand Scholten, A., & Franic, S. (2009). The end of construct validity. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 135-170). Charlotte, NC: Information Age Publishing.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061-1071. doi: 10.1037/0033-295X.111.4.1061
McClelland, D. C. (1958). Methods of measuring human motivation. In J. W. Atkinson (Ed.), Motives in fantasy, action, and society: A method of assessment and study (pp. 7-42). Princeton, NJ: Van Nostrand.
McClelland, D. C. (1987). Human motivation. New York: Cambridge University Press.

4.6.1, remove apostrophes

"Each one of those deaths had it’s own story, was it’s own tragedy, and only some of those are known to us now."

Both should be its.

10, remove words

Lots of real world situations have that character, and so you’ll find that chi-square tests in particular are quite widely used.

Remove the words "in particular".

10.3.2, population means vs sample means of tutorial groups

Okay, so let’s let µ1 denote the true population mean for group 1 (e.g., Anastasia’s students), and µ2 will be the true population mean for group 2 (e.g., Bernadette’s students),7 and as usual we’ll let X¯1 and X¯2 denote the observed sample means for both of these groups.

It confuses me to think about "population" vs. "sample" means in this sentence, because as I understand it the data set we have for Dr. Harpo's class includes all the students in Anastasia’s tutorial and all the students in Bernadette’s tutorial. So I think this sentence is talking about the more general case where the sample is just a representation, but in the context of this particular example, where we have the complete data, it's confusing.

Also, on page 226 it says:

In our example, Anastasia’s students had an average grade of 74.533, and Bernadette’s students had an average grade of 69.056, so the difference between the two sample means is 5.478. But of course the difference between population means might be bigger or smaller than this.

But in this case we have all the students that are in the respective tutorials, right? So what does "population" mean?

Well, I guess the population would be the theoretical students who might take each of the two tutorials in future years. Might be good to call this out. I know there is a footnote about this in 10.3.2. It still seems messy.

10.3.1, "factor"

"The tutor variable is a factor that indicates who each student’s tutor was"

I'm not sure if the word "factor" has been used thus far in the text to describe a type of data. If it's a technical term, it should be explained. If it's just a normal word, it's a bad choice. I would just write "The tutor variable indicates who each student’s tutor was".

section 2.6.4, face validity example would help

In all the other sections in this part, there is an example of the kind of validity. An example of face validity would be helpful also -- that is, an example of one or more claims that don't have good face validity, maybe one that actually turned out to be wrong and one that turned out to be right. As I was reading I was trying to imagine a study with poor face validity, maybe something like "Eating bacon makes you smarter," but as soon as I imagined that, I imagined most people being genuinely interested in the result and willing to entertain the premise. What is the relationship between face validity and the statement "extraordinary claims require extraordinary proof"?

10.2.2, low-level commands

"So there’s not much point in going through the tedious exercise of showing you how to do the calculations using low level commands."

I think "using low level commands" is a carryover from the R version of the book, and should be changed to "by hand."

6.4.1, "binomial" distribution

I happen to know from prior classes that the "binomial distribution" is so called because of the coefficients you get when multiplying out (a + b)^n. It feels to me that it would be worth saying this (and maybe giving the example of (a + b)^4 to show how the coefficients work out), because the term "binomial" just sounds so mysterious otherwise (you ask yourself: "two numbers? what two numbers?"). Given the choice between saying something about (a + b)^n vs. dumping the factorial formula, I would choose the former (but of course you can do both).
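To sketch the connection I have in mind (my own phrasing, so adapt as you see fit): expanding

    (a + b)^4 = a^4 + 4 a^3 b + 6 a^2 b^2 + 4 a b^3 + b^4

gives the binomial coefficients 1, 4, 6, 4, 1, and if you substitute a = θ (the probability of success) and b = 1 − θ, each term C(n,k) θ^k (1 − θ)^(n−k) is exactly the probability of getting k successes in n = 4 trials, with the whole expansion summing to (θ + (1 − θ))^4 = 1.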

section 2.2.2, averaging

I want to complain a little about:

That said, notice that while we can use the natural ordering of these items to construct sensible groupings, what we can’t do is average them. For instance, in my simple example here, the “average” response to the question is 1.97. If you can tell me what that means I’d love to know, because it seems like gibberish to me!

If we accept the premise that it's meaningful to order the four statements on “the extent to which they agree with the current science” (which is the point of this section), then an average of 1.97 means "on a scale of 1 to 4, 1 being agreement with current science and 4 being disagreement with it, the average response was 1.97." I wouldn't call that gibberish. In one population I might get an average of 1.2. In another population I might get an average of 3.5. That tells me a lot about those populations.
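As a toy illustration (made-up responses, coded 1 = agrees with current science through 4 = disagrees):

    pop_a = [1, 1, 2, 1, 2, 1]       # mostly agrees with current science
    pop_b = [4, 3, 4, 3, 4, 3]       # mostly disagrees
    print(sum(pop_a) / len(pop_a))   # ~1.33
    print(sum(pop_b) / len(pop_b))   # 3.5

Those two averages are perfectly interpretable, and clearly different.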

continue with (IV) and (DV)

After introducing "predictor" and "outcome" as preferred terms to IV and DV, for a while the book continues to say "predictor (IV)" and "outcome (DV)" for several sections. But by section 2.7 it seems to fall back to just "predictor" and "outcome". I understand the reasons for preferring those words, but still I have to stop and think each time I see them, and I'd prefer if the (IV) and (DV) continued to be next to those words throughout the book.

4.6.1, better word

"This is the job of descriptive statistics, but it’s not a job that can be told solely using the numbers."

I don't think one really "tells" a job. So I'd change it to:

"This is the job of descriptive statistics, but it’s not a story that can be told solely using the numbers."

9.1.3, why square?

In talking about negative vs. positive differences it says: "One easy way to fix this is to square everything"

Well, naively, a simpler solution is just to take the absolute value. It might be worth saying in passing why squaring is preferred to taking the absolute value, even if it's just to say "because of math you can't understand."
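For reference, I believe the statistic under discussion is the chi-square goodness-of-fit statistic,

    X^2 = Σ_i (O_i − E_i)^2 / E_i

and, as I understand it, the standard answer is that sums of squared deviations have a known and tractable sampling distribution (approximately chi-square), and squares are differentiable where absolute values are not, which makes the underlying math work. Even a one-sentence version of that would be better than nothing.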

6.4.1, dice in hand

"In my hot little hand I’m holding 20 identical six-sided dice."

(a) I'm not averse to the whimsical nature of parts of this book, but "hot little hand" seems a bit unnecessarily silly.

(b) I don't know about you, but I doubt I could hold 20 dice in one hand. I'm not sure how important the number 20 is to this overall example, but maybe that can be lowered, or you can change this to something like "holding this mass of dice in both hands".

9.1.9, pet peeve

"If I wanted to write this result up for a paper or something,"

This makes it sound like it might be some random whim to want to write up the result. But of course virtually everyone who is reading this book is doing it precisely because they will need to be writing up such results. (It's as if in book on learning to drive you said, "if you actually wanted to go somewhere or something.") If you want to acknowledge that there might be other reasons to write up the result aside from a paper, you could say "for a paper or other report". But really just dropping "or something" would be best.

8.4.1, missing paren

(1) we choose an α level (e.g., α = .05;

Needs paren before semicolon.

Also, this table duplicates the item numbers.

10.3, independent-samples vs one-sample

I'm confused about the difference between the one-sample t test example of section 10.2.1 and the independent-samples t test example of section 10.3. You say

A much more common situation arises when you’ve got two different groups of observations. In psychology, this tends to correspond to two different groups of participants, where each group corresponds to a different condition in your study

So the example in 10.3 is dividing the students by who their tutor is, and seeing if it makes a difference in their grades. But 10.2.1 seems parallel, because it divided the students by whether they are taking a psychology course (vs. students who are not taking a psychology course). So there are two conditions there also. Or maybe the distinction is that 10.2.1 is comparing the psychology students to the full set of students in Harpo's class (including the psychology students), whereas in 10.3 we are comparing two disjoint groups. If that's the case, I think it's a subtle point and should be called out. Because when I just think of "two different conditions" it feels to me like 10.2.1 is also testing two different conditions (being in a psychology class vs (probably) not).

9.1.9, more pet peeve

"This is pretty straightforward and hopefully it seems pretty unremarkable. That said, there’s a few things that you should note about this description:"

Please change this to just "There’s a few things that you should note about this description:". Again, saying it's unremarkable and straightforward just makes me feel dumb if I have any questions about it. And in fact I do have questions about it. It says "A chi-square goodness-of fit test was conducted to test whether the choice probabilities were identical for all four suits." As you state below, it's counter-intuitive to say you're testing for the null hypothesis when in reality what you care about (what you're actually doing the test for) is to check if the alternative hypothesis is true. It's good that you explain that point below, but saying up front that it's "straightforward and unremarkable" just makes me feel dumb if I'm puzzled at first about why it's phrased in terms of the null hypothesis.

Actually, on closer reading, I see that your section below does not discuss the strangeness of saying you're testing the null hypothesis and not the alternative hypothesis (you just talk about whether it's helpful to state the hypothesis at all). So I think it would be good to call out the strangeness that even though what you actually care about is whether people choose suits non-randomly, you say you are testing whether they choose suits randomly (and hoping they don't).

2.7, missing word

For the most part, artefactual results tend to be [more of] a concern for experimental studies than for non-experimental studies.

some comments about 8.5.2 and 8.5.3

When you say

Okay, so you can see that there are two rather different but legitimate ways to interpret the p value, one based on Neyman’s approach to hypothesis testing and the other based on Fisher’s.

I'm having trouble grasping that the two are "rather different". On the one hand, we say "what error rate are you willing to tolerate?" and on the other hand we say "what is the chance that you might have gotten this particular data given that the null hypothesis is true?" It may be just that I'm not understanding, but to me those sound quite similar (and I personally prefer the latter because it feels like a simpler way to say it).

But the main point I want to make is that it seems like the standard/Fisher definition is rushed over too quickly in 8.5.2, with just the sentence "we can define the p-value as the probability that we would have observed a test statistic that is at least as extreme as the one we actually did get". That's an abstract-sounding mouthful, and given that this concept is so fundamental, and that the Fisher formulation is the standard, it seems to me it would be good to restate it in terms of the ESP experiment numbers, as is done in section 8.5.1. So it would be a statement like, "On Fisher's definition, by saying p = .021, we are saying that there is a 2.1% chance that, even though the null hypothesis is true (that is, θ does actually equal 0.5 and ESP is in fact bogus), I would nonetheless get a result at least as extreme as the 62 out of 100 I got in my particular experiment."
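(As a sanity check on those numbers, assuming the p = .021 the book reports comes from a two-sided binomial test, this little Python snippet reproduces it:

    from scipy.stats import binom
    p_one_sided = binom.sf(61, 100, 0.5)   # P(X >= 62 | n = 100, theta = 0.5)
    print(round(2 * p_one_sided, 3))       # 0.021

so a restatement along those lines would be numerically consistent with what the book says.)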

9.1., could be clearer

"For example, suppose a group of patients has been
undergoing an experimental treatment and have had their health assessed to see whether their condition
has improved, stayed the same or worsened. A goodness-of-fit test could be used to determine
whether the numbers in each category - improved, no change, worsened - match the numbers that
would be expected given the standard treatment option."

If I understand what this is trying to say, I think it would be clearer if the second sentence was:

"A goodness-of-fit test could be used to determine
whether the numbers in each category - improved, no change, worsened - match the numbers that
would be expected given the standard treatment option, or differ significantly from those numbers, lending evidence to the claim that the experimental treatment is more effective than the standard treatment."

section 2.5.2, anecdotal evidence

It might be useful in this section to mention the term "anecdotal" and how it relates to "case study." It is common to hear people say "Oh, but that's just anecdotal evidence." But what does that really mean? Where is the line between an anecdote and a case study? And what is the value of "anecdotal evidence" to the overall scientific process?
