pml2-book's Introduction

"Probabilistic Machine Learning: Advanced Topics" by Kevin Murphy.

This repo is used to store the pdf for book 2 (see the "releases" tab on the RHS). This lets me track downloads and issues separately from book 1.

pml2-book's People

Contributors

murphyk


pml2-book's Issues

Typo page 1171 (printed) counterfactual queries

Page 1171 (printed on page, not viewer page), line 45:
" a distinction between Y(O) and Y(1)" -> "... Y(0) and Y(1)", i.e., the argument of the first Y should be a zero, not an uppercase O.

Version: February 28, 2022.

Energy based models

Feb 28th version

This is real nitpicking, so ignore if you don't care about these things.

P857 "autoregressive models" Perhaps I'm just revealing my ignorance here (in which case this is going to be very embarrassing), but I thought that auto-regressive models could model an arbitrary joint probability distribution if each variable is conditioned on all of the previous ones (as might be possible in principle with e.g., transformers). Either I am mistaken, or possibly you are referring to models that only condition on the previous few variables (I didn't read that chapter yet). It stuck out to me anyway.
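For reference, the factorization this comment appeals to: by the chain rule, an autoregressive model that conditions each variable on all of its predecessors,

$$p(x_{1:T}) = \prod_{t=1}^{T} p(x_t \mid x_{1:t-1}),$$

can represent any joint distribution; generality is lost only if the conditioning is truncated (e.g., to a fixed-order Markov window) or the conditionals themselves are restricted.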

P852 L15. I don't think the word "datavector" exists. Perhaps change to "at one of the training data examples" or similar.

P852, 853 Algorithms 28 and 29 have inconsistent spacing of semicolons at the ends of lines: sometimes with a space, sometimes without. There is no semicolon at all at the end of Algorithm 28.

P866 Eq 25.40 has a full stop at the end, which is a little weird given that it's part of a continuing sentence. In general the punctuation of equations is inconsistent throughout, but I don't think anyone will notice except pedants like me. In this case though, it's actually wrong.

P872 L42 If this is camera-ready production, then you need to deal with the overrun of the inline equation here.
P875 L10 Similar issue with a reference.

Chapter 20 typos

| Section | Page | Comment | Line | Version |
|---|---|---|---|---|
| 20.2.3 | 736 | Mentions "manifestation shift", but it is defined as "conditional shift" in Table 20.1. | 18 | February 28, 2022 |
| 20.3.1 | 739 | "...method requires that x values that may occur in the test distribution should also be possible in the training distribution" should read $p_\text{te}({\bf x}) > 0 \implies p_\text{tr}({\bf x}) > 0$. It currently reads as $p_\text{tr}({\bf x}) = 0$. (See the note below.) | 27 | February 28, 2022 |
| 20.3.2 | 740 | Suggestion: Define $f_{\boldsymbol\theta}$ and $g_{\boldsymbol\phi}$. | 30 | February 28, 2022 |
| 20.3.2 | 740 | Suggestion: Define $\mathcal{X}_1$, $\mathcal{X}_2$, $\mathcal{H}$, and $\mathcal{Y}_t$. | 30 | February 28, 2022 |
| 20.4.1 | 742 | The $y$ variable in $p_\text{tr}({\bf x}, {\bf y}) = p_\text{te}({\bf x}, {\bf y})$ is bold, but the rest of the references to $y$ are not bold. | 22 | February 28, 2022 |
| 20.4.2.3 | 744 | Repeated notation: $K = \min\left\{K: \sum_{c=1}^K f({\bf x})_c > \lambda_2\right\}$. | 28 | February 28, 2022 |
| 20.5.5 | 750 | Typo, $n$ instead of $t$: $\mathcal{D}^t = \{({\bf x}_n^t, {\bf y}_n^t): n=1:N_t\}$. It's currently written as $t=1:N_t$. | 42 | February 28, 2022 |

A prettified version of the table can be found here.
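For context on the 20.3.1 row: assuming the method in question is the standard importance-weighting correction for covariate shift (an assumption on my part), the support condition comes from the change-of-measure identity

$$\mathbb{E}_{p_\text{te}}[f(\mathbf{x})] = \int f(\mathbf{x}) \, \frac{p_\text{te}(\mathbf{x})}{p_\text{tr}(\mathbf{x})} \, p_\text{tr}(\mathbf{x}) \, d\mathbf{x},$$

which is only valid if $p_\text{tr}(\mathbf{x}) > 0$ wherever $p_\text{te}(\mathbf{x}) > 0$, i.e., $p_\text{te}(\mathbf{x}) > 0 \implies p_\text{tr}(\mathbf{x}) > 0$.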

Typo in equation 12.73 and the form of $\Sigma^{-1}$

Draft of “Probabilistic Machine Learning: Advanced Topics” by Kevin Murphy. February 28, 2022, page 486 (print page).

I assume the correct form of equation 12.73 is $\theta_{t+1}=\theta_{t}+\eta\Sigma^{-1}v_{t}$.

Also, it might be better to clarify the form of $\Sigma^{-1}$, as it seems to be assumed that $\Sigma^{-T}=\Sigma^{-1}$.
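For what it's worth, that assumption holds automatically whenever $\Sigma$ is symmetric (as a covariance or preconditioning matrix typically is):

$$\Sigma^{-\top} = (\Sigma^{\top})^{-1} = \Sigma^{-1} \quad \text{when } \Sigma = \Sigma^{\top}.$$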

Chap 24 Normalizing Flows

(Really struggled to find any errata with this chapter...)
Feb 28th version

P538 L43 For the second term --> For the second term in equation 24.7
P848 L42 neural-network --> neural network
P855 L35 Sylsvester flows --> Sylvester flows

Typo page 65, "Thie->The"

Page 65 (printed in the book, not the viewer page), lines 15-16. "Thie MAP" -> "The MAP".
Version: 28 Feb 2022.

page 624, missing reference, typo

Page 624 (printed in the book, not the viewer page):

Line 6: missing reference.
Line 23: Let $f(x,\theta)$ be ..., and $\theta$ ... be (not "is") ...

Version 28 Feb 2022.

Chap 17 Bayesian Neural networks pt1

28th Feb version

P623 Eq 17.11 should probably define what $\mathcal{S}$ is. I guess it's some slightly non-typical notation for a softmax operation.
P624 L6 Missing reference.
P625 L18. Split this sentence in two. You have a recursive structure of "which"s.

Typo in Section 2.3.3 [version 2022-03-14]

(latex page 18, line 2)
"The conditional distributions can be shown (see the supplementary material) prequel to this book, to have the following form:"
-> "The conditional distributions can be shown (see the supplementary material) to have the following form:"

20.2.3 Covariate shift

  1. "In this section, we consider covariate shift, which refers to discriminative classifiers of the form p(X)p(Y|X) where p(X) changes"
    Why are these classifiers called discriminative and not generative? (Since we learn p(X), which could be used to generate x.)

  2. In "discriminative classifiers of the form p(X)p(Y|X)" and "For a discriminative model of the form p(y|x)": do x and X mean the same thing in this context, or is letter case important here? (I looked at the notation section in the first book; it said that notation is context dependent.)

ch 16 DNNs (Simon Prince)

Feb 17 2022 build

P588 L45 visualiziob --> visualization
P589 Eq 16.4 Might be worth mentioning that this is not the typical definition of convolution (no minus signs; strictly speaking it is cross-correlation)
P595 L41 If $f$ and $f$ are affine --> If $f$ and $g$ are affine
P596 L45 $\mathbf{h}_{1}^{*}$ is not defined anywhere
Figure 16.11 caption "square boxes" --> "dotted square boxes". (There are a lot of square boxes in this figure!)
P600 L37 I don't understand why Q=V=K=X. These matrices don't even usually have the same dimensionality. Something I'm not getting here, but maybe not an error. (See the sketch after this list.)
P601 L29 Similarly, I don't understand why Q=V=K=Y
P607 L28 "In the practice... abundant pattern" Maybe rewrite this sentence?
P612 L11 "load-bearding" --> load-bearing
P614 L17 "mdeo" --> mode
Throughout -- decide between "forward mode", "forward-mode" and "forwards mode"
Throughout -- figure out whether you are going to punctuate equations (as in 16.54) or not (as in 16.4)
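For concreteness on the Q=V=K=X comments above: a minimal NumPy sketch of how I read that shorthand, namely that the same input X feeds three separate learned projections (for the cross-attention case at P601, the queries would instead be projected from the other sequence). All names and sizes below are made up for illustration:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    # "Q = K = V = X" is shorthand for feeding the same input X into all
    # three learned projections; the projected Q, K, V are generally
    # different matrices, and V may even have a different width.
    Q = X @ Wq                                    # queries, shape (T, d_k)
    K = X @ Wk                                    # keys,    shape (T, d_k)
    V = X @ Wv                                    # values,  shape (T, d_v)
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # (T, T) scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                            # (T, d_v)

rng = np.random.default_rng(0)
T, d, d_k, d_v = 5, 8, 4, 6  # hypothetical sizes
X = rng.normal(size=(T, d))
out = self_attention(X,
                     rng.normal(size=(d, d_k)),
                     rng.normal(size=(d, d_k)),
                     rng.normal(size=(d, d_v)))
print(out.shape)  # (5, 6)
```

On this reading the dimensionality objection dissolves: X is never used as Q, K, or V directly, only its projections are.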

Overall, I would suggest separating this into two chapters. The autodiff stuff is at a much finer level of detail relative to the building blocks themselves, and it's rather jarring jumping from one to the other. Also, autodiff doesn't really fall under the category of "deep learning".

Typo page 774

Line 40: we often the inputs -> we often denote the inputs

Also, is p(x|c) really image captioning if c is a caption and x is an image? Wouldn't it rather generate an image for a given caption?

Small typo p.598

Chapter DNN, p598, L35: vecotr -> vector

(version of February 28, 2022)

Typo in Section 2.2.1.5 [version 2022-03-14]

(latex page 8, line 11)
"There are (B choose x) ways to choose the B blue balls..."
-> "There are (B choose x) ways to choose the x blue balls..."
OR "There are (B choose x) ways to choose from the B blue balls..."

Small Typo in Section 8.1.2 **Example: Casino HMM**

In the last sentence, "In a temporal model, there are several kinds of posteriors we may want to computes, as we discuss in Section 8.1.4", "computes" should be "compute". Also, it isn't clear what "In a temporal model" means.

Equation 25.30

Feb 28 version

Has an unnecessary pair of brackets in the denominator of the second derivative.

ch 21 (genmo overview) Simon Prince

February 17th version

P767 Bayes' rule equation p(x)-->P(x|z)
P768 L35 It's not strictly true that factor analysis is aka PPCA. More precisely, PPCA is a special case of factor analysis.
P770 L5 -LL --> NLL (to be consistent with previous page)
P774 L28 one can train classifier --> one can train a classifier
P775 L4&5 Chapter 24 --> (Chapter 24), Chapter 22 --> (Chapter 22)

Typo in section 3.1.3.1 [version 2022-03-14]

(latex page 64, line 32)
"We say that a sequence of random variables (\vec{x}_1, vx_2, \dots) is..."
-> "We say that a sequence of random variables (\vec{x}_1, \vec{x}_2, \dots) is..."

Three typos

Feb 28 2022 version
The page numbers are the printed ones

p.29, the last line of the first paragraph of Section 2.5.1: "In Section 2.5.3, we show that A is a convex function over the concave set". I believe it should be "convex set" instead of "concave set"

p.42, equation (2.237), it should be the plus sign instead of the minus sign

p.303, the line above equation (7.8), he --> the

Chap 19 Structured Prediction

Feb 28 2022 version

P712 L44 "PGM-D" has not been defined nearby
P712 L46 "less modular much slower" --> "less modular and much slower"
P713 L39 I think p(BIO_{1:T}, POS_{1:T} | words_{1:T}) might be clearer
P714 L41 I'm not an expert, but I think what you are describing here is a PCFG in Chomsky normal form (or almost; there should be another production rule that yields an empty string). In general, a PCFG can have more general production rules like C -> Baa
P717 L1 "Convolutional networks or CNNs" This reads strangely, as if these are two different things.
P719 Eq 19.14 I think you have swapped here from (i) x being the data and inferring y, to (ii) y being the data, x being an "exogenous variable", and inferring z. I could be wrong, but if there is indeed such a swap, clarifying this change of notation (or, better, not changing it) would be helpful.
P721 L31 Open bracket with no close bracket
P726 L45 Sentence ends with "and"
P727 L8 Either "trends is" --> "trends are", or "trends is" --> "trend is"
P729 L19 "in top row" I might have misunderstood this, but I think the observations are the y's, which are in the second row.

Chap 7 Inference Algorithms overview

Feb 28th version
(I thought this chapter was really great BTW)

P302 Figure 7.1 you should probably mention what solid vs empty circles mean
P303 L2 "a combination known variational EM"
P303 L14 You should probably define SVI somewhere
P303 L39 "he unknown variables"
P303 Eq 7.8 might be good to define $\mathcal{T}(\boldsymbol\theta)$
P303 Eq 7.10 Something wrong here. I suspect it's supposed to be two equations and there's a \ missing
P304 Eq 7.13 puzzled me, but I am quite stupid so it might be right. I guess what is confusing me is that when we integrate a function by dividing it into bars, there needs to be some sense of the width of the bars. Otherwise, this denominator will just get bigger as K increases, whereas it should become closer to the normalization constant. (See the note after this list.)
P304 Eq 7.14 Technically either $\text{Bern}(y_n|\theta)$ or something like $\text{Bin}_{1}(y_{n}, \theta)$.
P305 Fig 7.2 (and elsewhere). Tsk... axis labels please.
P306 Fig 7.3c Is this supposed to be increasing? And if so, could you plot a smoothed version to show that?
P306 Fig 7.3d It's a bit yucky that this goes below zero, outside the valid range of the parameters. I assume this is due to the kernel density estimation.
P307 Eq 7.22 This is a key chapter for getting the main ideas of inference down; I think this equation could use an extra line. At the moment, you both add in the definition of KL divergence and use Bayes' rule in a single step, without either being defined nearby, which is a lot to take in if you don't already know all this stuff.
P309 L42 It seemed weird to consider this a uniform noise model, given that its variance is a tenth of the noiseless data value. Just maybe double-check this is right. It would have made more sense to me if the variances were the other way around.
P310 Fig 7.5. Not an error, but it just made me laugh that the Laplace approximation does so well.
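Regarding the Eq 7.13 comment above: if (an assumption on my part) the equation is the usual grid approximation to a posterior, the bin width $\Delta$ cancels between numerator and denominator, which may be why it is omitted:

$$p(\boldsymbol{\theta}_k \mid \mathcal{D}) \approx \frac{\tilde{p}(\boldsymbol{\theta}_k)\,\Delta}{\sum_{k'=1}^{K} \tilde{p}(\boldsymbol{\theta}_{k'})\,\Delta} = \frac{\tilde{p}(\boldsymbol{\theta}_k)}{\sum_{k'=1}^{K} \tilde{p}(\boldsymbol{\theta}_{k'})}, \qquad \tilde{p}(\boldsymbol{\theta}) = p(\boldsymbol{\theta})\,p(\mathcal{D} \mid \boldsymbol{\theta}).$$

The observation about growth is still right for the denominator in isolation: $\sum_{k'} \tilde{p}(\boldsymbol{\theta}_{k'}) \approx Z/\Delta$ grows as $K$ increases, and only $\Delta \sum_{k'} \tilde{p}(\boldsymbol{\theta}_{k'})$ approaches the normalization constant $Z$.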

typo

"dstribution" typo in section 19.3.1

Error page 623, relationship cold & tempered priors

Page 623 (printed in the book, not the viewer page), line 36. The relationship should be $\tilde{\sigma}^{2} = \sigma^{2} / T$, i.e., it should involve variances, not standard deviations.

Version: 28 Feb 2022.
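A quick check that supports this, assuming the convention that the tempered prior raises the density to the power $T$ (if the book instead uses $1/T$, the relation flips to $\tilde{\sigma}^2 = \sigma^2 T$): for a Gaussian prior,

$$\mathcal{N}(\theta \mid 0, \sigma^2)^{T} \propto \exp\!\left(-\frac{T\theta^2}{2\sigma^2}\right) \propto \mathcal{N}\!\left(\theta \mid 0, \frac{\sigma^2}{T}\right),$$

so tempering rescales the variance, not the standard deviation.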

Typo in section 3.1.2 [version 2022-03-14]

(latex page 63, last line of the page)
"Our assumptions about how the data depends on the parameters is captured in the ..."
-> "Our assumptions about how the data depends on the parameters are captured in the ..."
