
dlfs_code's Introduction

Deep Learning From Scratch code

This repo contains all the code from the book Deep Learning From Scratch, published by O'Reilly in September 2019.

It was mostly for me to keep the code I was writing for the book organized, but my hope is readers can clone this repo and step through the code systematically themselves to better understand the concepts.

Structure

Each chapter has two notebooks: a Code notebook and a Math notebook. Each Code notebook contains the Python code for the corresponding chapter and can be run start to finish to generate the results from the chapter. The Math notebooks were just for me to store the LaTeX equations used in the book, taking advantage of Jupyter's LaTeX rendering functionality.

lincoln

In the notebooks in the Chapters 4, 5, and 7 folders, I import classes from lincoln, rather than putting those classes in the Jupyter Notebook itself. lincoln is not currently a pip-installable library; the way I'd recommend making it importable so you can run these notebooks is to add a line like the following to your .bashrc file:

export PYTHONPATH=$PYTHONPATH:/Users/seth/development/DLFS_code/lincoln

This will cause Python to search this path for a module called lincoln when you run the import command (of course, you'll have to replace the path above with the relevant path on your machine once you clone this repo). Then, simply source your .bashrc file before running the jupyter notebook command and you should be good to go.
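If you'd rather not edit your .bashrc, a couple of lines like the following at the top of a notebook should work just as well (this is only a sketch, using the same example path as above; adjust it to wherever you cloned the repo on your machine):

import sys

# Same example path as in the export line above; replace it with the location
# of the cloned repo on your machine.
sys.path.append("/Users/seth/development/DLFS_code/lincoln")

import lincoln  # should now import successfully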

Chapter 5: Numpy Convolution Demos

While I don't spend much time delving into the details in the main text of the book, I have implemented the batch, multi-channel convolution operation in pure Numpy (I do describe how to do this and share the code in the book's Appendix). In this notebook, I demonstrate using this operation to train a single layer CNN from scratch in pure Numpy to get over 90% accuracy on MNIST.
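For a rough sense of what that operation involves, here is a deliberately simple, loop-based sketch of a batch, multi-channel convolution in pure Numpy. Treat it as illustrative only: the kernel layout is an assumption on my part, and the version described in the Appendix is organized differently and is far faster.

import numpy as np

def conv2d_naive(inp: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    # inp: [batch, in_channels, height, width]
    # kernels: [in_channels, out_channels, k, k] (layout assumed; k odd, stride 1, "same" padding)
    batch, in_c, h, w = inp.shape
    in_c_k, out_c, k, _ = kernels.shape
    assert in_c == in_c_k
    pad = k // 2
    padded = np.pad(inp, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((batch, out_c, h, w))
    for b in range(batch):
        for o in range(out_c):
            for i in range(h):
                for j in range(w):
                    patch = padded[b, :, i:i + k, j:j + k]           # [in_channels, k, k]
                    out[b, o, i, j] = np.sum(patch * kernels[:, o])  # kernels[:, o]: [in_channels, k, k]
    return out

out = conv2d_naive(np.random.randn(2, 3, 8, 8), np.random.randn(3, 4, 3, 3))
print(out.shape)  # (2, 4, 8, 8)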

dlfs_code's People

Contributors

sethhweidman


dlfs_code's Issues

clarification in chap1

In the code below, could you clarify why you are calculating dLdN when you are not using it in subsequent calculations?

dLdS = np.ones_like(S)
dSdN = deriv(sigma, N)
dLdN = dLdS * dSdN
dNdX = np.transpose(W, (1, 0))
dLdX = np.dot(dSdN, dNdX)
return dLdX
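For context, here is a self-contained version of the backward pass this snippet comes from, written so that dLdN is the quantity fed into the final matrix product. This is only my reading of the intended chain rule, not an official fix; note that because dLdS is all ones, dLdN and dSdN are numerically identical here, which may be why the original line appears to go unused.

import numpy as np

def sigma(x: np.ndarray) -> np.ndarray:
    # sigmoid activation
    return 1 / (1 + np.exp(-x))

def deriv(func, x: np.ndarray, delta: float = 0.001) -> np.ndarray:
    # central-difference approximation of the elementwise derivative
    return (func(x + delta) - func(x - delta)) / (2 * delta)

def matrix_backward(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    # forward pass: N = X W, S = sigma(N), L = sum(S)
    N = np.dot(X, W)
    S = sigma(N)
    # backward pass: chain rule, each factor evaluated at the forward values
    dLdS = np.ones_like(S)
    dSdN = deriv(sigma, N)
    dLdN = dLdS * dSdN             # dLdN now feeds into the product below
    dNdX = np.transpose(W, (1, 0))
    dLdX = np.dot(dLdN, dNdX)      # uses dLdN rather than dSdN
    return dLdX

X = np.random.randn(3, 4)
W = np.random.randn(4, 2)
print(matrix_backward(X, W).shape)  # (3, 4), same shape as X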

Typo in 01_foundations/Code.ipynb

In the first plot of the Square and ReLU functions:

ax[0].set_title("Square function")
ax[0].set_xlabel("input")
ax[0].set_ylabel("input")

ylabel is a duplicate of xlabel.

should have been:

ax[0].set_title("Square function")
ax[0].set_xlabel("input")
ax[0].set_ylabel("output")

Typo in 02_fundamentals/Code.ipynb

Dear Mr. Weidman,

I am currently trying to understand the code in cell [45], the function "loss_gradients".

I just want to ask, if in the line
loss_gradients['B1'] = dLdB1.sum(axis=0)

it should be written instead:
loss_gradients['B1'] = dLdB1

Reason:
In my test project, the expression dLdB1 has dimension (hidden_size, 1), and the dimension of weights['B1'] is also (hidden_size, 1). If we additionally sum over all hidden_size entries, then every entry of weights['B1'] gets updated with the same value, which does not seem correct to me.
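As a side note for anyone comparing shapes, here is a quick stand-alone check (my own sketch, with made-up sizes and assuming a (1, hidden_size) bias) of why the gradient of a broadcast bias involves a sum over the batch axis:

import numpy as np

batch_size, hidden_size = 4, 3                 # made-up sizes
M1 = np.random.randn(batch_size, hidden_size)  # pre-bias activations for a batch
B1 = np.random.randn(1, hidden_size)           # bias, broadcast over the batch

N1 = M1 + B1                                      # forward: B1 is added to every row
dLdN1 = np.random.randn(batch_size, hidden_size)  # some upstream gradient

# Because B1 contributes to every row of N1, its gradient is the sum of the
# upstream gradient over the batch axis:
dLdB1 = dLdN1.sum(axis=0, keepdims=True)
print(dLdB1.shape)  # (1, 3), the same shape as B1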

Best Regards

exp_ratios function missing

When trying to import from lincoln.losses, it complains:
ImportError: cannot import name 'exp_ratios' from 'lincoln.utils.np_utils'

I have looked, and this method does not exist in lincoln.utils.np_utils (or anywhere else in the repo that I can see).

Wrong glorot initialization?

Thanks for your book! It's great!

I wonder, shouldn't there be a square root?

Like that:

if self.weight_init == "glorot":
    scale = np.sqrt(2/(num_in + self.neurons))

The NumPy documentation for np.random.normal says that scale is the standard deviation (whereas, I suppose, no root is applied here).
The PyTorch documentation says there should be a square root.
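For comparison, a minimal sketch of Glorot (Xavier) normal initialization with the square root applied (my own illustration of the suggestion above, not the repo's code):

import numpy as np

def glorot_normal(num_in: int, num_out: int) -> np.ndarray:
    # Glorot/Xavier normal: std = sqrt(2 / (fan_in + fan_out)),
    # passed to np.random.normal as the *standard deviation*.
    scale = np.sqrt(2.0 / (num_in + num_out))
    return np.random.normal(loc=0.0, scale=scale, size=(num_in, num_out))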

scikit-learn has removed Boston data set

The Boston house pricing data set was removed from scikit-learn. Trying to import load_boston as in the chapter 2 examples raises an error noting that it was removed in 1.2 and citing an article on problems with the data set. This removal is not noted in the scikit-learn changelog as far as I can tell. (ETA: its deprecation in 1.0, September 2021, was noted in that changelog.)

Unfortunately the California dataset doesn't work as a direct drop-in, having 8 features instead of 13. The Ames dataset has 80(!) features, which is a lot more interesting than I usually give Ames credit for.

I'm not sure of the best path forward, but probably the most expedient fix is to implement the workaround for pulling the Boston data from the original source and patch the feature names back in, as annoying as it is to continue using that data set. Otherwise, adapting to California is probably workable (but diverges from the text).
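For reference, the workaround in scikit-learn's deprecation message pulls the original data from the CMU StatLib archive; with the old feature names patched back in, it looks roughly like this:

import numpy as np
import pandas as pd

# Fetch the original Boston data from StatLib and reassemble it, as suggested
# in scikit-learn's deprecation notice.
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

# Feature names from the old scikit-learn documentation
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS",
                 "RAD", "TAX", "PTRATIO", "B", "LSTAT"]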

Clearing up my confusion on symbols in Chapter 1 and Appendix A.

I had a hard time wrapping my head around the matrix chain rule in both the digital and printed versions, and there are a couple of typos, or at least inconsistencies (I think; I'm not an expert, but it just didn't click for me), in both.

The whole crusade started because I was more or less getting the core idea, but couldn't figure out a) why the same process left the weights in a different spot in the calculation and b) how the heck W(t) came about. The more I read through the appendix, the less I understood.

I did go through p. 221-224 multiple times, but frankly, having forgotten most things about derivatives beyond the basics, the inconsistent symbols in a couple of places and (to my untrained eye) a couple of typos here and there made it not only hard but also frustrating.

Specifically, on p. 221 there was no information on what is actually being calculated (it wouldn't hurt to remind the reader, as I learn visually), and on p. 223 we were calculating dLdX(S), which had to be guessed to be equal to (or literally be) dLdX(X) a page later. What is the difference, or if there is no difference, why use two different symbols? Earlier, in chapter 1, it was reinforced that dLdu(S) is a matrix of ones, but then on p. 224 it was a whole other matrix (I get where it came from, but the transition could be underlined). I even tried going through the digital version and it wasn't better. Never mind the zoom issue, but there were things like dLdu(N) = dLdu(N) (why?), or later, again, dLdX(S) being this big matrix and then later apparently being equivalent to dLdX(X), which is also equivalent to dLdu(S), which is also (???) equivalent to dLdu(S) multiplied by W(t). That last point could also be a little clearer.

I get that this is trivial to you, but I'm trying to learn after a fairly long break from advanced math in general, and if it really is supposed to be 'from scratch', where it is underlined how important it is to grasp the concepts (and I guess how gradients are actually calculated is pretty important), it would be beneficial to have the math extra clear so everybody plays on an even playing field. Somebody who had only heard about deep learning and just took the book off the shelf to give it a go could be really put off, and maybe discouraged from the whole idea, by something like this, despite reading diligently. I hope I didn't come off as venting; I just really want to have a good grasp on this subject so I can feel confident about coding it myself later, once I understand the background. Well, back to reading; maybe the coding part will clear that up for me.
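For readers with the same question, one way to convince yourself where the transposed weight matrix W(t) comes from is to check the formula numerically. A small stand-alone sketch (my own, using the same N = XW, S = sigma(N), L = sum(S) setup as chapter 1):

import numpy as np

def sigma(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(0)
X = np.random.randn(3, 4)
W = np.random.randn(4, 2)

def L(X_in):
    # forward pass from chapter 1: N = X W, S = sigma(N), L = sum of all entries of S
    return sigma(np.dot(X_in, W)).sum()

# analytic gradient: dL/dX = (dL/dS * dS/dN) W^T, with dL/dS a matrix of ones
N = np.dot(X, W)
dLdX_analytic = (np.ones_like(N) * sigma(N) * (1 - sigma(N))) @ W.T

# numerical gradient, entry by entry (central differences)
eps = 1e-6
dLdX_numeric = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        X_plus, X_minus = X.copy(), X.copy()
        X_plus[i, j] += eps
        X_minus[i, j] -= eps
        dLdX_numeric[i, j] = (L(X_plus) - L(X_minus)) / (2 * eps)

print(np.allclose(dLdX_analytic, dLdX_numeric, atol=1e-6))  # True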

Errata in chapter 1

I guess there is an errata page at the publisher, but I'm more concerned about the author seeing this. The initial explanations of functions, with their mixture of math, diagrams, and code written a few different ways, confusingly mix up ReLU and Leaky ReLU repeatedly.

For example, in Figure 1-1, the diagram labelled "ReLU function" shows a Leaky ReLU. On page 6, the code defines leaky_relu correctly, but then the note immediately below says that x.clip(min=0) (i.e. ReLU) is equivalent.

I can just see the writing process vacillating between whether ReLU or Leaky ReLU is the better example, but the result would be really mysterious to someone who was not already familiar with both of them.
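For anyone tripped up by this, a quick side-by-side of the two functions (my own snippet, just to make the distinction concrete; the 0.2 slope is an arbitrary choice):

import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # ReLU: zero for negative inputs; equivalent to x.clip(min=0)
    return np.maximum(0, x)

def leaky_relu(x: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    # Leaky ReLU: small negative slope alpha for negative inputs
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [0.   0.   0.   1.5]
print(leaky_relu(x))  # [-0.4 -0.1  0.   1.5]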
