
thinkbayes's Introduction

ThinkBayes

Code repository for Think Bayes: Bayesian Statistics Made Simple by Allen B. Downey

Available from Green Tea Press at http://thinkbayes.com.

Published by O'Reilly Media, October 2013.

thinkbayes's People

Contributors

alessandro-gentilini, allendowney, apaleyes, bgschiller, martynovs, recursing


thinkbayes's Issues

Why no binomial distribution in the Euro Problem

In the Euro problem, when calculating the likelihood of the entire set at once, it seems like this should use the binomial distribution. The binomial distribution calculates what the odds are of seeing K instances in N draws if the probability is P, and it seems like that's exactly what the likelihood should be, with N being tails + heads, K being heads, and P being x.

How does this likelihood function differ from a binomial?
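The two likelihoods differ only by the binomial coefficient C(n, k), which does not depend on the hypothesis x and therefore cancels when the posterior is normalized, so both give the same answer. A small self-contained check, using plain dicts rather than the book's Pmf class (the counts 140 heads / 110 tails are the ones from the book's Euro problem):

```python
from math import comb

def posterior(prior, like):
    # Multiply prior by likelihood, then normalize.
    post = {h: prior[h] * like(h) for h in prior}
    total = sum(post.values())
    return {h: p / total for h, p in post.items()}

hypos = [i / 100 for i in range(101)]      # x = P(heads)
prior = {x: 1 / 101 for x in hypos}        # uniform prior
heads, tails = 140, 110
n = heads + tails

# Per-toss likelihood: x**heads * (1 - x)**tails.
seq = posterior(prior, lambda x: x**heads * (1 - x)**tails)

# Binomial likelihood: C(n, heads) * x**heads * (1 - x)**tails.
binom = posterior(prior, lambda x: comb(n, heads) * x**heads * (1 - x)**tails)

# C(n, heads) is the same for every hypothesis, so it cancels on
# normalization: the two posteriors are identical.
assert all(abs(seq[x] - binom[x]) < 1e-12 for x in hypos)
```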

No module named 'thinkbayes'

Hey Allen,

I wrote "from thinkbayes.py import Pmf" in order to practice but it shows a message that says "No module named 'thinkbayes'".
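The fix is to drop the .py extension: Python module names never include it, so "from thinkbayes.py import Pmf" is parsed as package "thinkbayes" with a submodule "py" and fails. The correct form is "from thinkbayes import Pmf", run from the directory containing thinkbayes.py (or with that directory on sys.path). A self-contained demonstration with a stand-in module file (demo_module.py is our invention, not part of the book's code):

```python
import os
import sys

# Create a tiny stand-in module file with a Pmf class in it.
with open('demo_module.py', 'w') as f:
    f.write('class Pmf:\n    pass\n')

# Make sure the directory holding the .py file is on sys.path.
sys.path.insert(0, os.getcwd())

from demo_module import Pmf   # module name only, no ".py"
print(Pmf.__name__)           # Pmf

os.remove('demo_module.py')   # clean up the stand-in file
```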

TypeError: unhashable type: 'Pmf' (thinkbayes.py)

When trying to run code from hockey.py, I get:

Traceback (most recent call last):
  File "", line 1, in
    runfile('C:/Users/ssrra/.spyder-py3/temp.py', wdir='C:/Users/ssrra/.spyder-py3')
  File "C:\Users\ssrra\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "C:\Users\ssrra\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/ssrra/.spyder-py3/temp.py", line 541, in
    main()
  File "C:/Users/ssrra/.spyder-py3/temp.py", line 435, in main
    goal_dist1 = MakeGoalPmf(suite1)
  File "C:/Users/ssrra/.spyder-py3/temp.py", line 127, in MakeGoalPmf
    metapmf.Set(pmf, prob)
  File "C:\Users\ssrra\.spyder-py3\thinkbayes.py", line 589, in Set
    self.d[x] = y
TypeError: unhashable type: 'Pmf'
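One possible cause, offered as an assumption since we cannot see the reporter's exact files: in Python 3, a class that defines __eq__ without also defining __hash__ becomes unhashable, and Set stores the inner pmf as a dictionary key (self.d[x] = y). A minimal illustration with toy classes, not the book's Pmf:

```python
class Unhashable:
    # Defining __eq__ without __hash__ sets __hash__ to None in
    # Python 3, so instances cannot be used as dict keys.
    def __eq__(self, other):
        return self is other

class Hashable(Unhashable):
    # Restoring identity-based hashing makes instances usable as
    # dict keys again, which is what metapmf.Set needs.
    __hash__ = object.__hash__

d = {}
try:
    d[Unhashable()] = 1.0
except TypeError as e:
    print(e)              # unhashable type: 'Unhashable'

d[Hashable()] = 1.0       # works: identity hash + identity equality
print(len(d))             # 1
```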

Oops

These files are all symbolic links, rather than the actual files.

Chapter 7 Predictions, Section 7.6 Sudden Death

Dear Professor Downey,

In chapter 7, predictions, we are calculating the probability of winning in sudden-death overtime:

  1. We are creating a mixture of Exponential distributions, but we are taking the parameters of the Poisson distribution to do so. The posterior is our belief of what lambda is, which is the parameter for the Poisson distribution, which is also the expected goals per game, not the time between goals. So what is the rationale behind constructing the exponential mixture from goals per game posterior?

  2. I am assuming this is done to find the distribution of "games until goals". But figure 7.3 is named as "Distribution of time between goals". What is the relationship between "games until goal" (i.e. Poisson parameter) and "time between goals" (i.e. exponential parameter)?

  3. Would it be prudent to use the actual times between goals for this computation? For example, we could collect the times between goals up until the last 4 matches and use them as a prior, then update with the times between goals from the last 4 games to get a posterior over the exponential-distribution parameter, and then make a mixture of it.

  4. I feel this would also account for situations where both teams score the same number of goals (e.g. 2-2) and go to sudden-death overtime. In point 2, we are only considering going to overtime if neither team scores a goal, if I am not mistaken (unless that is what the rules are; please forgive my ignorance of hockey).

I would be grateful if you or anyone can throw some light on the matter.

Thanks a lot!
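On question 2, the two parameters are the same number viewed from two sides: if goals occur as a Poisson process at rate lam goals per game, then the count of goals in one game is Poisson(lam) and the time between consecutive goals, measured in games, is Exponential(lam). A quick simulation sketch (our own code, not the book's) showing that exponential inter-arrival times with rate lam produce lam goals per game on average:

```python
import random
random.seed(1)

lam = 2.5        # goals per game (the Poisson rate)
n = 50_000       # number of simulated games

def goals_in_one_game(lam):
    """Count arrivals of a Poisson process within one game by summing
    exponential inter-arrival times (measured in games)."""
    t, goals = 0.0, 0
    while True:
        t += random.expovariate(lam)   # time between goals
        if t > 1.0:
            return goals
        goals += 1

mean_goals = sum(goals_in_one_game(lam) for _ in range(n)) / n
print(mean_goals)   # close to lam: the exponential rate and the
                    # expected goals per game are the same parameter
```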

figures for ThinkBayes

Hello,
Thanks for having your book available here. Would it also be possible to upload the figures so the book compiles?

Missing file "BBB_data_from_Rob.csv"

Hi,
I am trying to run the code in species.py. The method RunSubject('B1242', conc=1, high=100) requires a CSV file, "BBB_data_from_Rob.csv". If it is not confidential, could you please share it?

Thanks for the book and code; it is nicely explained and written, and I enjoyed reading it.

Thanks,
Qiang

Some clarification for Chapter 8 Observer Bias model formulation

Chapter 8 makes an interesting point about Observer Bias on the Red Line, but it took me a while to understand why the distribution over passengers' observed wait times is greater than the true wait times. After some thought it turns out I was assuming a more complicated model than the text. I don't think either model is unreasonable; my intuition just wasn't on the same page and I didn't find an explicit reason in the text to invalidate my model. The correct model might be obvious to most but perhaps the clarification below will help someone in the future:

The text reads:

The average time between trains, as seen by a random passenger, is substantially higher than the true average.
Why? Because a passenger is more like (sic) to arrive during a large interval than a small one. Consider a simple example: suppose that the time between trains is either 5 minutes or 10 minutes with equal probability. In that case the average time between trains is 7.5 minutes.
But a passenger is more likely to arrive during a 10 minute gap than a 5 minute gap; in fact, twice as likely. If we surveyed arriving passengers, we would find that 2/3 of them arrived during a 10 minute gap, and only 1/3 during a 5 minute gap. So the average time between trains, as seen by an arriving passenger, is 8.33 minutes.

For this to be true, I believe we have to assume that a passenger arriving 0 minutes after the previous train has the same observed waiting time as a passenger arriving any arbitrary n > 0 minutes after the train. In other words, a passenger who just missed the previous train and waited the full gap is treated the same as a passenger who just barely made it onto the train.

My intuition was as follows: In reality, a passenger can arrive at the 9th minute of a 10-minute gap or the 4th minute of a 5-minute gap. Both passengers wait 1 minute. If you model it this way, the biased distribution actually shifts to the left. Why? Let's say two passengers arrive per minute (lam = 2). For a 2-minute gap, you might have the following wait times for 4 passengers: [0, 0, 1, 1]. For a 3-minute gap, you might have the following wait times for 6 passengers: [0, 0, 1, 1, 2, 2]. A passenger who waits 0 has arrived just before the train departs. For an n-minute gap, wait time n-1 indicates the passenger arrived within the first minute after the previous train departed. From the 2-minute and 3-minute gaps above, you can deduce that across all trains P(wait n) < P(wait n-1). That is, there is always a chance for a passenger to wait 0 minutes, but for, say, a 5-minute gap it is impossible to wait 6 minutes.

Here is some code to simulate the process and the resulting histogram.

from math import floor
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(0)

n = 50000  # Number of trains.
lam = 2    # Passengers arriving per minute.
T = np.random.normal(10, 2, n)  # True time between trains.
W1 = []    # Passengers' observed waiting time (my initial formulation).
W2 = []    # Passengers' observed waiting time (Think Bayes formulation).

for t in T:
    size = int(floor(t * lam))  # This many passengers board the next train.
    W1 += list(np.random.uniform(0, floor(t), size))
    W2 += list(np.ones(size) * t)

bins = int(T.max() - T.min())
# Note: `normed` has been removed from plt.hist; use `density=True`.
plt.hist(T, color='red', bins=bins, alpha=0.3, density=True, label=r'True wait $\mu=%.3lf$' % T.mean())
plt.hist(W1, color='blue', bins=bins, alpha=0.3, density=True, label=r'Observed wait $\mu=%.3lf$' % np.mean(W1))
plt.hist(W2, color='green', bins=bins, alpha=0.3, density=True, label=r'Observed wait simplified $\mu=%.3lf$' % np.mean(W2))
plt.legend(fontsize=8)
plt.show()

[figure_1: histograms of the true and simulated observed wait-time distributions]

bug on root2, unresolved references

def GaussianCdfInverse(p, mu=0, sigma=1):
    """Evaluates the inverse CDF of the gaussian distribution.

    See http://en.wikipedia.org/wiki/Normal_distribution#Quantile_function

    Args:
        p: float
        mu: mean parameter
        sigma: standard deviation parameter

    Returns:
        float
    """
    x = root2 * erfinv(2 * p - 1)
    return mu + x * sigma
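The unresolved names are root2 (presumably sqrt(2)) and erfinv, which in other copies of thinkbayes.py likely come from a module-level constant and scipy.special respectively (that is an assumption). A standard-library-only sketch of a fix, deriving erfinv from the normal quantile function:

```python
import math
from statistics import NormalDist

ROOT2 = math.sqrt(2)

def erfinv(y):
    # erfinv is not in the standard library; it relates to the standard
    # normal quantile by erfinv(y) = Phi^{-1}((y + 1) / 2) / sqrt(2).
    return NormalDist().inv_cdf((y + 1) / 2) / ROOT2

def gaussian_cdf_inverse(p, mu=0, sigma=1):
    """Inverse CDF (quantile function) of the Gaussian distribution."""
    x = ROOT2 * erfinv(2 * p - 1)
    return mu + x * sigma

print(gaussian_cdf_inverse(0.5))                  # median of N(0, 1): 0.0
print(round(gaussian_cdf_inverse(0.975), 2))      # ~1.96
```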

Possible issues with figures and results in Chapter 9

Hi,

I have been trying to replicate some of your results and I might have found an issue in Chapter 9.

My results suggest a much tighter posterior after observing the four data points x=[15, 16, 18, 21] than what is suggested in your Figures 9.2 and 9.5, as well as in the reported posterior credible intervals.

In fact, I get something closer to your posterior plots if I only include the last datapoint x=[21].

You can see my results in this colab notebook

trying to solve cookie3.py from first edition

Hello,

I have been trying to do exercise 2.1, the cookie example without replacement, but after failing miserably I checked the Git repository for ThinkBayes2 and found code with the solution for the second edition.

To my surprise, I was on the right track; however, when trying to rewrite that solution for the first edition of ThinkBayes, I still could not make it work.

if I use the following to set the hypos

bowl1=dict(vanilla=30,chocolate=10)
bowl2=dict(vanilla=20,chocolate=20)
pmf=Cookie([bowl1, bowl2])

I get:

TypeError: unhashable type: 'dict'

If I use the following (as in cookie3.py in ThinkBayes2)

bowl1=Hist(dict(vanilla=30,chocolate=10))
bowl2=Hist(dict(vanilla=20,chocolate=20)) 

I get:

AttributeError: 'Hist' object has no attribute 'Normalize'

I really want to see the solution to this, any hint or suggestion will be really appreciated!

Many thanks!

Leo
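One way to sidestep both errors in the first edition, sketched here with plain dictionaries rather than thinkbayes.py's API (the names are ours): use hashable string labels as the hypothesis keys and keep each hypothesis's mutable bowl contents in a separate dict, decrementing the drawn flavor after each update.

```python
# Each hypothesis is a string label; its (mutable) bowl contents live
# in a separate dict, so the Pmf-style keys stay hashable.
bowls = {
    'bowl1': dict(vanilla=30, chocolate=10),
    'bowl2': dict(vanilla=20, chocolate=20),
}
probs = {'bowl1': 0.5, 'bowl2': 0.5}   # uniform prior

def update(probs, bowls, flavor):
    """Bayesian update for one draw of `flavor`, without replacement."""
    for hypo in probs:
        bowl = bowls[hypo]
        total = sum(bowl.values())
        # Likelihood: fraction of `flavor` cookies remaining in this bowl.
        probs[hypo] *= bowl[flavor] / total if total else 0
        if bowl[flavor] > 0:
            bowl[flavor] -= 1          # remove the drawn cookie
    norm = sum(probs.values())
    for hypo in probs:
        probs[hypo] /= norm
    return probs

update(probs, bowls, 'vanilla')
print(probs['bowl1'])   # 0.6 after one vanilla draw, as in the book
```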

thinkplot.brewer throwing exception in dungeons.py

If dungeons.py is run as-is, the following exception is thrown:

Traceback (most recent call last):
  File "dungeons.py", line 117, in
    main()
  File "dungeons.py", line 63, in main
    colors = thinkplot.Brewer.Colors()
AttributeError: 'module' object has no attribute 'Brewer'

I think line 63 should be thinkplot._Brewer.Colors(). It works then; however, I am not sure what exactly you intend in terms of the single underscore (weakly hidden method).
