pml2-book's Introduction

"Probabilistic Machine Learning: Advanced Topics" by Kevin Murphy.

This repo is used to store the pdf for book 2 (see the "releases" tab on the RHS). This lets me track downloads and issues separately from book 1.

pml2-book's People

Contributors

murphyk


pml2-book's Issues

Typo page 1171 (printed) counterfactual queries

Page 1171 (printed on page, not viewer page), line 45:
" a distinction between Y(O) and Y(1)" -> "... Y(0) and Y(1)", i.e., the argument of the first Y should be a zero, not an uppercase O.

Version: February 28, 2022.

Energy based models

Feb 28th version

This is real nitpicking, so ignore if you don't care about these things.

P857 "autoregressive models" Perhaps I'm just revealing my ignorance here (in which case this is going to be very embarrassing), but I thought that auto-regressive models could model an arbitrary joint probability distribution if each variable is conditioned on all of the previous ones (as might be possible in principle with e.g., transformers). Either I am mistaken, or possibly you are referring to models that only condition on the previous few variables (I didn't read that chapter yet). It stuck out to me anyway.
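For reference, the factorization this comment appeals to: by the chain rule, an autoregressive model that conditions each variable on all of its predecessors,

$$p(x_{1:T}) = \prod_{t=1}^{T} p(x_t \mid x_{1:t-1}),$$

can represent any joint distribution; generality is lost only if the conditioning is truncated (e.g., to a fixed-order Markov window) or the conditionals themselves are restricted.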

P852 L15. I don't think the word "datavector" exists. Perhaps change to "at one of the training data examples" or similar.

P852, 853 Algorithms 28 and 29 have inconsistent spacing of semicolons at the ends of lines: sometimes with a space, sometimes without. There is no semicolon at all at the end of Algorithm 28.

P866 Eq 25.40 has a full stop at the end, which is a little weird given that it's part of a continuing sentence. In general the punctuation of equations is inconsistent throughout, but I don't think anyone will notice except pedants like me. In this case though, it's actually wrong.

P872 L42 If this is camera-ready production, then you need to deal with the overrun of the inline equation here.
P875 L10 Similar issue with a reference.

Chapter 20 typos

| Section | Page | Comment | Line | Version |
|---|---|---|---|---|
| 20.2.3 | 736 | Mentions "manifestation shift", but it is defined as "conditional shift" in Table 20.1. | 18 | February 28, 2022 |
| 20.3.1 | 739 | "...method requires that x values that may occur in the test distribution should also be possible in the training distribution" should read $p_\text{te}({\bf x}) > 0 \implies p_\text{tr}({\bf x}) > 0$. It currently reads as $p_\text{tr}({\bf x}) = 0$. (See the note below.) | 27 | February 28, 2022 |
| 20.3.2 | 740 | Suggestion: Define $f_{\boldsymbol\theta}$ and $g_{\boldsymbol\phi}$. | 30 | February 28, 2022 |
| 20.3.2 | 740 | Suggestion: Define $\mathcal{X}_1$, $\mathcal{X}_2$, $\mathcal{H}$, and $\mathcal{Y}_t$. | 30 | February 28, 2022 |
| 20.4.1 | 742 | The $y$ variable in $p_\text{tr}({\bf x}, {\bf y}) = p_\text{te}({\bf x}, {\bf y})$ is bold, but the rest of the references to $y$ are not bold. | 22 | February 28, 2022 |
| 20.4.2.3 | 744 | Repeated notation: $K = \min\left\{K: \sum_{c=1}^K f({\bf x})_c > \lambda_2\right\}$. | 28 | February 28, 2022 |
| 20.5.5 | 750 | Typo, $n$ instead of $t$: $\mathcal{D}^t = \{({\bf x}_n^t, {\bf y}_n^t): n=1:N_t\}$. It's currently written as $t=1:N_t$. | 42 | February 28, 2022 |

A prettified version of the table can be found here.
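For context on the 20.3.1 row: assuming the method in question is the standard importance-weighting correction for covariate shift (an assumption on my part), the support condition comes from the change-of-measure identity

$$\mathbb{E}_{p_\text{te}}[f(\mathbf{x})] = \int f(\mathbf{x}) \, \frac{p_\text{te}(\mathbf{x})}{p_\text{tr}(\mathbf{x})} \, p_\text{tr}(\mathbf{x}) \, d\mathbf{x},$$

which is only valid if $p_\text{tr}(\mathbf{x}) > 0$ wherever $p_\text{te}(\mathbf{x}) > 0$, i.e., $p_\text{te}(\mathbf{x}) > 0 \implies p_\text{tr}(\mathbf{x}) > 0$.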

Typo in equation 12.73 and the form of $\Sigma^{-1}$

Draft of “Probabilistic Machine Learning: Advanced Topics” by Kevin Murphy. February 28, 2022, page 486 (print page).

I assume the correct form of equation 12.73 is $\theta_{t+1}=\theta_{t}+\eta\Sigma^{-1}v_{t}$.

Also, it might be better to clarify the form of $\Sigma^{-1}$, as it seems to be assumed that $\Sigma^{-T}=\Sigma^{-1}$.
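For what it's worth, that assumption holds automatically whenever $\Sigma$ is symmetric (as a covariance or preconditioning matrix typically is):

$$\Sigma^{-\top} = (\Sigma^{\top})^{-1} = \Sigma^{-1} \quad \text{when } \Sigma = \Sigma^{\top}.$$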

Chap 24 Normalizing Flows

(Really struggled to find any errata with this chapter...)
Feb 28th version

P538 L43 For the second term --> For the second term in equation 24.7
P848 L42 neural-network --> neural network
P855 L35 Sylsvester flows --> Sylvester flows

Typo page 65, "Thie->The"

Page 65 (printed in the book, not the viewer page), lines 15-16. "Thie MAP" -> "The MAP".
Version: 28 Feb 2022.

page 624, missing reference, typo

Page 624 (printed in the book, not the viewer page):

Line 6: missing reference.
Line 23: Let $f(x,\theta)$ be ..., and $\theta$ ... be (not "is") ...

Version 28 Feb 2022.

Chap 17 Bayesian Neural networks pt1

28th Feb version

P623 Eq 17.11 should probably define what $\mathcal{S}$ is. I guess it's some slightly non-typical notation for a softmax operation.
P624 L6 Missing reference.
P625 L18. Split this sentence in two. You have a recursive structure of "which"s.

Typo in Section 2.3.3 [version 2022-03-14]

(latex page 18, line 2)
"The conditional distributions can be shown (see the supplementary material) prequel to this book, to have the following form:"
-> "The conditional distributions can be shown (see the supplementary material) to have the following form:"

20.2.3 Covariate shift

  1. "In this section, we consider covariate shift, which refers to discriminative classifiers of the form p(X)p(Y|X) where p(X) changes"
    Why are these classifiers called discriminative and not generative? (Since we learn p(X), which could be used to generate x.)

  2. In "discriminative classifiers of the form p(X)p(Y|X)" and "For a discriminative model of the form p(y|x)": do x and X mean the same thing in this context, or is letter case important here? (I looked at the notation section in the first book; it said that notation is context dependent.)

ch 16 DNNs (Simon Prince)

Feb 17 2022 build

P588 L45 visualiziob --> visualization
P589 Eq 16.4 Might be worth mentioning that this is not the typical definition of convolution (no minus signs; strictly speaking it is cross-correlation)
P595 L41 If $f$ and $f$ are affine --> If $f$ and $g$ are affine
P596 L45 $\mathbf{h}_{1}^{*}$ is not defined anywhere
Figure 16.11 caption "square boxes" --> "dotted square boxes". (There are a lot of square boxes in this figure!)
P600 L37 I don't understand why Q=V=K=X. These matrices don't even usually have the same dimensionality. Something I'm not getting here, but maybe not an error. (See the sketch after this list.)
P601 L29 Similarly, I don't understand why Q=V=K=Y
P607 L28 "In the practice... abundant pattern" Maybe rewrite this sentence?
P612 L11 "load-bearding" --> load-bearing
P614 L17 "mdeo" --> mode
Throughout -- decide between "forward mode", "forward-mode" and "forwards mode"
Throughout -- figure out whether you are going to punctuate equations (as in 16.54) or not (as in 16.4)
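For concreteness on the Q=V=K=X comments above: a minimal NumPy sketch of how I read that shorthand, namely that the same input X feeds three separate learned projections (for the cross-attention case at P601, the queries would instead be projected from the other sequence). All names and sizes below are made up for illustration:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # "Q = K = V = X" is shorthand for feeding the same input X into all
    # three learned projections; the projected Q, K, V are generally
    # different matrices, and V may even have a different width.
    Q = X @ Wq                                    # queries, shape (T, d_k)
    K = X @ Wk                                    # keys,    shape (T, d_k)
    V = X @ Wv                                    # values,  shape (T, d_v)
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # (T, T) scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                            # (T, d_v)

rng = np.random.default_rng(0)
T, d, d_k, d_v = 5, 8, 4, 6  # hypothetical sizes
X = rng.normal(size=(T, d))
out = self_attention(X,
                     rng.normal(size=(d, d_k)),
                     rng.normal(size=(d, d_k)),
                     rng.normal(size=(d, d_v)))
print(out.shape)  # (5, 6)
```

On this reading the dimensionality objection dissolves: X is never used as Q, K, or V directly, only its projections are.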

Overall, I would suggest separating this into two chapters. The autodiff stuff is at a much finer level of detail relative to the building blocks themselves, and it's rather jarring jumping from one to the other. Also, autodiff doesn't really fall under the category of "deep learning".

Typo page 774

Line 40: we often the inputs -> we often denote the inputs

Also, is p(x|c) really image captioning if c is a caption and x is an image? Wouldn't it rather generate an image for a given caption?

Small typo p.598

Chapter DNN, p598, L35: vecotr -> vector

(version of February 28, 2022)

Typo in Section 2.2.1.5 [version 2022-03-14]

(latex page 8, line 11)
"There are (B choose x) ways to choose the B blue balls..."
-> "There are (B choose x) ways to choose the x blue balls..."
OR "There are (B choose x) ways to choose from the B blue balls..."

Small Typo in Section 8.1.2 **Example: Casino HMM**

In the last sentence, "In a temporal model, there are several kinds of posteriors we may want to computes, as we discuss in Section 8.1.4", "computes" should be "compute". Also, it isn't clear what "In a temporal model" means.

Equation 25.30

Feb 28 version

Has an unnecessary pair of brackets in the denominator of the second derivative.

ch 21 (genmo overview) Simon Prince

February 17th version

P767 Bayes' rule equation p(x)-->P(x|z)
P768 L35 It's not strictly true that factor analysis is aka PPCA. More precisely, PPCA is a special case of factor analysis.
P770 L5 -LL --> NLL (to be consistent with previous page)
P774 L28 one can train classifier --> one can train a classifier
P775 L4&5 Chapter 24 --> (Chapter 24), Chapter 22 --> (Chapter 22)

Typo in section 3.1.3.1 [version 2022-03-14]

(latex page 64, line 32)
"We say that a sequence of random variables (\vec{x}_1, vx_2, \dots) is..."
-> "We say that a sequence of random variables (\vec{x}_1, \vec{x}_2, \dots) is..."

Three typos

Feb 28 2022 version
The page numbers are the printed ones

p.29, the last line of the first paragraph of Section 2.5.1: "In Section 2.5.3, we show that A is a convex function over the concave set". I believe it should be "convex set" instead of "concave set"

p.42, equation (2.237), it should be the plus sign instead of the minus sign

p.303, the line above equation (7.8), he --> the

Chap 19 Structured Prediction

Feb 28 2022 version

P712 L44 "PGM-D" has not been defined nearby
P712 L46 "less modular much slower" --> "less modular and much slower"
P713 L39 I think p(BIO_{1:T}, POS_{1:T} | words_{1:T}) might be clearer
P714 L41 I'm not an expert, but I think what you are describing here is a PCFG in Chomsky normal form (or almost; there should be another production rule that yields an empty string). In general, a PCFG can have more general production rules like C -> Baa
P717 L1 "Convolutional networks or CNNs" This reads strangely, as if these are two different things.
P719 Eq 19.14 I think you have swapped here from (i) x being the data and inferring y, to (ii) y being the data, x being an "exogenous variable", and inferring z. I could be wrong, but if there is indeed such a swap, clarifying this change of notation (or, better, not changing it) would be helpful.
P721 L31 Open bracket with no close bracket
P726 L45 Sentence ends with "and"
P727 L8 Either "trends is" --> "trends are", or "trends is" --> "trend is"
P729 L19 "in top row" I might have misunderstood this, but I think the observations are the y's, which are in the second row.

Chap 7 Inference Algorithms overview

Feb 28th version
(I thought this chapter was really great BTW)

P302 Figure 7.1 you should probably mention what solid vs empty circles mean
P303 L2 "a combination known variational EM"
P303 L14 You should probably define SVI somewhere
P303 L39 "he unknown variables"
P303 Eq 7.8 might be good to define $\mathcal{T}(\boldsymbol\theta)$
P303 Eq 7.10 Something wrong here. I suspect it's supposed to be two equations and there's a \ missing
P304 Eq 7.13 puzzled me, but I am quite stupid so it might be right. I guess what is confusing me is that when we integrate a function by dividing it into bars, there needs to be some sense of the width of the bars. Otherwise, this denominator will just get bigger as K increases, whereas it should become closer to the normalization constant. (See the note after this list.)
P304 Eq 7.14 Technically either $\text{Bern}(y_n|\theta)$ or something like $\text{Bin}_{1}(y_{n}, \theta)$.
P305 Fig 7.2 (and elsewhere). Tsk... axis labels please.
P306 Fig 7.3c Is this supposed to be increasing? And if so, could you plot a smoothed version to show that?
P306 Fig 7.3d It's a bit yucky that this goes below zero, outside the valid range of the parameters. I assume this is due to the kernel density estimation.
P307 Eq 7.22 This is a key chapter for getting the main ideas of inference down; I think this equation could use an extra line. At the moment, you both add in the definition of KL divergence and use Bayes' rule in a single step, without either being defined nearby, which is a lot to take in if you don't already know all this stuff.
P309 L42 It seemed weird to consider this a uniform noise model, given that its variance is a tenth of the noiseless data value. Just maybe double-check this is right. It would have made more sense to me if the variances were the other way around.
P310 Fig 7.5. Not an error, but it just made me laugh that the Laplace approximation does so well.
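Regarding the Eq 7.13 comment above: if (an assumption on my part) the equation is the usual grid approximation to a posterior, the bin width $\Delta$ cancels between numerator and denominator, which may be why it is omitted:

$$p(\boldsymbol{\theta}_k \mid \mathcal{D}) \approx \frac{\tilde{p}(\boldsymbol{\theta}_k)\,\Delta}{\sum_{k'=1}^{K} \tilde{p}(\boldsymbol{\theta}_{k'})\,\Delta} = \frac{\tilde{p}(\boldsymbol{\theta}_k)}{\sum_{k'=1}^{K} \tilde{p}(\boldsymbol{\theta}_{k'})}, \qquad \tilde{p}(\boldsymbol{\theta}) = p(\boldsymbol{\theta})\,p(\mathcal{D} \mid \boldsymbol{\theta}).$$

The observation about growth is still right for the denominator in isolation: $\sum_{k'} \tilde{p}(\boldsymbol{\theta}_{k'}) \approx Z/\Delta$ grows as $K$ increases, and only $\Delta \sum_{k'} \tilde{p}(\boldsymbol{\theta}_{k'})$ approaches the normalization constant $Z$.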

typo

"dstribution" typo in section 19.3.1

Error page 623, relationship cold & tempered priors

Page 623 (printed in the book, not the viewer page), line 36. The relationship should be $\tilde{\sigma}^{2} = \sigma^{2} / T$, i.e., it should involve variances, not standard deviations.

Version: 28 Feb 2022.
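A quick check that supports this, assuming the convention that the tempered prior raises the density to the power $T$ (if the book instead uses $1/T$, the relation flips to $\tilde{\sigma}^2 = \sigma^2 T$): for a Gaussian prior,

$$\mathcal{N}(\theta \mid 0, \sigma^2)^{T} \propto \exp\!\left(-\frac{T\theta^2}{2\sigma^2}\right) \propto \mathcal{N}\!\left(\theta \mid 0, \frac{\sigma^2}{T}\right),$$

so tempering rescales the variance, not the standard deviation.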

Typo in section 3.1.2 [version 2022-03-14]

(latex page 63, last line of the page)
"Our assumptions about how the data depends on the parameters is captured in the ..."
-> "Our assumptions about how the data depends on the parameters are captured in the ..."
