
dlfs_code's Introduction

Deep Learning From Scratch code

This repo contains all the code from the book Deep Learning From Scratch, published by O'Reilly in September 2019.

It was mostly for me to keep the code I was writing for the book organized, but my hope is readers can clone this repo and step through the code systematically themselves to better understand the concepts.

Structure

Each chapter has two notebooks: a Code notebook and a Math notebook. Each Code notebook contains the Python code for the corresponding chapter and can be run start to finish to generate the results from the chapter. The Math notebooks were just for me to store the LaTeX equations used in the book, taking advantage of Jupyter's LaTeX rendering functionality.

lincoln

In the notebooks in the Chapters 4, 5, and 7 folders, I import classes from lincoln, rather than putting those classes in the Jupyter Notebook itself. lincoln is not currently a pip-installable library; the way I'd recommend making it importable so you can run these notebooks is to add a line like the following to your .bashrc file:

export PYTHONPATH=$PYTHONPATH:/Users/seth/development/DLFS_code/lincoln

This will cause Python to search this path for a module called lincoln when you run the import command (of course, you'll have to replace the path above with the relevant path on your machine once you clone this repo). Then, simply source your .bashrc file before running the jupyter notebook command and you should be good to go.
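If you'd rather not edit your .bashrc, a couple of lines like the following at the top of a notebook should work just as well (this is only a sketch, using the same example path as above; adjust it to wherever you cloned the repo on your machine):

import sys

# Same example path as in the export line above; replace it with the location
# of the cloned repo on your machine.
sys.path.append("/Users/seth/development/DLFS_code/lincoln")

import lincoln  # should now import successfully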

Chapter 5: Numpy Convolution Demos

While I don't spend much time delving into the details in the main text of the book, I have implemented the batch, multi-channel convolution operation in pure Numpy (I do describe how to do this and share the code in the book's Appendix). In this notebook, I demonstrate using this operation to train a single layer CNN from scratch in pure Numpy to get over 90% accuracy on MNIST.
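For a rough sense of what that operation involves, here is a deliberately simple, loop-based sketch of a batch, multi-channel convolution in pure Numpy. Treat it as illustrative only: the kernel layout is an assumption on my part, and the version described in the Appendix is organized differently and is far faster.

import numpy as np

def conv2d_naive(inp: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    # inp: [batch, in_channels, height, width]
    # kernels: [in_channels, out_channels, k, k] (layout assumed; k odd, stride 1, "same" padding)
    batch, in_c, h, w = inp.shape
    in_c_k, out_c, k, _ = kernels.shape
    assert in_c == in_c_k
    pad = k // 2
    padded = np.pad(inp, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((batch, out_c, h, w))
    for b in range(batch):
        for o in range(out_c):
            for i in range(h):
                for j in range(w):
                    patch = padded[b, :, i:i + k, j:j + k]           # [in_channels, k, k]
                    out[b, o, i, j] = np.sum(patch * kernels[:, o])  # kernels[:, o]: [in_channels, k, k]
    return out

out = conv2d_naive(np.random.randn(2, 3, 8, 8), np.random.randn(3, 4, 3, 3))
print(out.shape)  # (2, 4, 8, 8)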

dlfs_code's People

Contributors

sethhweidman


dlfs_code's Issues

clarification in chap1

In the code below, could you clarify why you are calculating dLdN when you are not using it in subsequent calculations?

dLdS = np.ones_like(S)
dSdN = deriv(sigma, N)
dLdN = dLdS * dSdN
dNdX = np.transpose(W, (1, 0))
dLdX = np.dot(dSdN, dNdX)
return dLdX
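For context, here is a self-contained version of the backward pass this snippet comes from, written so that dLdN is the quantity fed into the final matrix product. This is only my reading of the intended chain rule, not an official fix; note that because dLdS is all ones, dLdN and dSdN are numerically identical here, which may be why the original line appears to go unused.

import numpy as np

def sigma(x: np.ndarray) -> np.ndarray:
    # sigmoid activation
    return 1 / (1 + np.exp(-x))

def deriv(func, x: np.ndarray, delta: float = 0.001) -> np.ndarray:
    # central-difference approximation of the elementwise derivative
    return (func(x + delta) - func(x - delta)) / (2 * delta)

def matrix_backward(X: np.ndarray, W: np.ndarray) -> np.ndarray:
    # forward pass: N = X W, S = sigma(N), L = sum(S)
    N = np.dot(X, W)
    S = sigma(N)
    # backward pass: chain rule, each factor evaluated at the forward values
    dLdS = np.ones_like(S)
    dSdN = deriv(sigma, N)
    dLdN = dLdS * dSdN             # dLdN now feeds into the product below
    dNdX = np.transpose(W, (1, 0))
    dLdX = np.dot(dLdN, dNdX)      # uses dLdN rather than dSdN
    return dLdX

X = np.random.randn(3, 4)
W = np.random.randn(4, 2)
print(matrix_backward(X, W).shape)  # (3, 4), same shape as X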

Typo in 01_foundations/Code.ipynb

In the first plot of the Square and ReLU functions:

ax[0].set_title("Square function")
ax[0].set_xlabel("input")
ax[0].set_ylabel("input")

ylabel is a duplicate of xlabel.

should have been:

ax[0].set_title("Square function")
ax[0].set_xlabel("input")
ax[0].set_ylabel("output")

Typo in 02_fundamentals/Code.ipynb

Dear Mr. Weidman,

I am currently trying to understand the code in cell [45], the function "loss_gradients".

I just want to ask, if in the line
loss_gradients['B1'] = dLdB1.sum(axis=0)

it should be written instead:
loss_gradients['B1'] = dLdB1

Reason:
In my test project, the expression dLdB1 has dimension (hidden_size, 1), and the dimension of weights['B1'] is also (hidden_size, 1). If we additionally sum over all hidden_size entries, then every entry of weights['B1'] gets updated with the same value, which does not seem correct to me.
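As a side note for anyone comparing shapes, here is a quick stand-alone check (my own sketch, with made-up sizes and assuming a (1, hidden_size) bias) of why the gradient of a broadcast bias involves a sum over the batch axis:

import numpy as np

batch_size, hidden_size = 4, 3                 # made-up sizes
M1 = np.random.randn(batch_size, hidden_size)  # pre-bias activations for a batch
B1 = np.random.randn(1, hidden_size)           # bias, broadcast over the batch

N1 = M1 + B1                                      # forward: B1 is added to every row
dLdN1 = np.random.randn(batch_size, hidden_size)  # some upstream gradient

# Because B1 contributes to every row of N1, its gradient is the sum of the
# upstream gradient over the batch axis:
dLdB1 = dLdN1.sum(axis=0, keepdims=True)
print(dLdB1.shape)  # (1, 3), the same shape as B1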

Best Regards

exp_ratios function missing

When trying to import from lincoln.losses, it complains:
ImportError: cannot import name 'exp_ratios' from 'lincoln.utils.np_utils'

I have looked, and this method does not exist in lincoln.utils.np_utils (or anywhere else in the repo that I can see).

Wrong glorot initialization?

Thanks for your book! It's great!

I wonder, shouldn't there be a square root?

Like that:

if self.weight_init == "glorot":
    scale = np.sqrt(2/(num_in + self.neurons))

The NumPy documentation for np.random.normal says that scale is the standard deviation (whereas, I suppose, no root is applied here).
The PyTorch documentation says there should be a square root.
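For comparison, a minimal sketch of Glorot (Xavier) normal initialization with the square root applied (my own illustration of the suggestion above, not the repo's code):

import numpy as np

def glorot_normal(num_in: int, num_out: int) -> np.ndarray:
    # Glorot/Xavier normal: std = sqrt(2 / (fan_in + fan_out)),
    # passed to np.random.normal as the *standard deviation*.
    scale = np.sqrt(2.0 / (num_in + num_out))
    return np.random.normal(loc=0.0, scale=scale, size=(num_in, num_out))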

scikit-learn has removed Boston data set

The Boston house pricing data set was removed from scikit-learn. Trying to import load_boston as in the chapter 2 examples raises an error noting that it was removed in 1.2 and citing an article on problems with the data set. This removal is not noted in the scikit-learn changelog as far as I can tell. (ETA: its deprecation in 1.0, September 2021, was noted in that changelog.)

Unfortunately the California dataset doesn't work as a direct drop-in, having 8 features instead of 13. The Ames dataset has 80(!) features, which is a lot more interesting than I usually give Ames credit for.

I'm not sure of the best path forward, but probably the most expedient fix is to implement the workaround for pulling the Boston data from the original source and patch the feature names back in, as annoying as it is to continue using that data set. Otherwise, adapting to California is probably workable (but diverges from the text).
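For reference, the workaround in scikit-learn's deprecation message pulls the original data from the CMU StatLib archive; with the old feature names patched back in, it looks roughly like this:

import numpy as np
import pandas as pd

# Fetch the original Boston data from StatLib and reassemble it, as suggested
# in scikit-learn's deprecation notice.
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]

# Feature names from the old scikit-learn documentation
feature_names = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE", "DIS",
                 "RAD", "TAX", "PTRATIO", "B", "LSTAT"]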

Clearing up my confusion on symbols in Chapter 1 and Appendix A.

I had a hard time wrapping my head around the matrix chain rule in both the digital and printed versions, and there are a couple of typos, or at least inconsistencies (I think; I'm not an expert, but it just didn't click for me), in both.

The whole crusade started because I was more or less getting the core idea, but couldn't figure out a) why the same process left the weights in a different spot in the calculation and b) how the heck W(t) came about. The more I read through the appendix, the less I understood.

I did go through p. 221-224 multiple times, but frankly, having forgotten most things about derivatives beyond the basics, the inconsistent symbols in a couple of places and (to my untrained eye) a couple of typos here and there made it not only hard but also frustrating.

Specifically, on p. 221 there was no information on what is actually being calculated (it wouldn't hurt to remind the reader, as I learn visually), and on p. 223 we were calculating dLdX(S), which had to be guessed to be equal to (or literally be) dLdX(X) a page later. What is the difference, or if there is no difference, why use two different symbols? Earlier, in chapter 1, it was reinforced that dLdu(S) is a matrix of ones, but then on p. 224 it was a whole other matrix (I get where it came from, but the transition could be underlined). I even tried going through the digital version and it wasn't better. Never mind the zoom issue, but there were things like dLdu(N) = dLdu(N) (why?), or later, again, dLdX(S) being this big matrix and then later apparently being equivalent to dLdX(X), which is also equivalent to dLdu(S), which is also (???) equivalent to dLdu(S) multiplied by W(t). That last point could also be a little clearer.

I get that this is trivial to you, but I'm trying to learn after a fairly long break from advanced math in general, and if it really is supposed to be 'from scratch', where it is underlined how important it is to grasp the concepts (and I guess how gradients are actually calculated is pretty important), it would be beneficial to have the math extra clear so everybody plays on an even playing field. Somebody who had only heard about deep learning and just took the book off the shelf to give it a go could be really put off, and maybe discouraged from the whole idea, by something like this, despite reading diligently. I hope I didn't come off as venting; I just really want to have a good grasp on this subject so I can feel confident about coding it myself later, once I understand the background. Well, back to reading; maybe the coding part will clear that up for me.
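For readers with the same question, one way to convince yourself where the transposed weight matrix W(t) comes from is to check the formula numerically. A small stand-alone sketch (my own, using the same N = XW, S = sigma(N), L = sum(S) setup as chapter 1):

import numpy as np

def sigma(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(0)
X = np.random.randn(3, 4)
W = np.random.randn(4, 2)

def L(X_in):
    # forward pass from chapter 1: N = X W, S = sigma(N), L = sum of all entries of S
    return sigma(np.dot(X_in, W)).sum()

# analytic gradient: dL/dX = (dL/dS * dS/dN) W^T, with dL/dS a matrix of ones
N = np.dot(X, W)
dLdX_analytic = (np.ones_like(N) * sigma(N) * (1 - sigma(N))) @ W.T

# numerical gradient, entry by entry (central differences)
eps = 1e-6
dLdX_numeric = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        X_plus, X_minus = X.copy(), X.copy()
        X_plus[i, j] += eps
        X_minus[i, j] -= eps
        dLdX_numeric[i, j] = (L(X_plus) - L(X_minus)) / (2 * eps)

print(np.allclose(dLdX_analytic, dLdX_numeric, atol=1e-6))  # True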

Errata in chapter 1

I guess there is an errata page at the publisher, but I'm more concerned about the author seeing this. The initial explanations of functions, with their mixture of math, diagrams, and code written a few different ways, confusingly mix up ReLU and Leaky ReLU repeatedly.

For example, in Figure 1-1, the diagram labelled "ReLU function" shows a Leaky ReLU. On page 6, the code defines leaky_relu correctly, but then the note immediately below says that x.clip(min=0) (i.e. ReLU) is equivalent.

I can just see the writing process vacillating between whether ReLU or Leaky ReLU is the better example, but the result would be really mysterious to someone who was not already familiar with both of them.
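For anyone tripped up by this, a quick side-by-side of the two functions (my own snippet, just to make the distinction concrete; the 0.2 slope is an arbitrary choice):

import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    # ReLU: zero for negative inputs; equivalent to x.clip(min=0)
    return np.maximum(0, x)

def leaky_relu(x: np.ndarray, alpha: float = 0.2) -> np.ndarray:
    # Leaky ReLU: small negative slope alpha for negative inputs
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [0.   0.   0.   1.5]
print(leaky_relu(x))  # [-0.4 -0.1  0.   1.5]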
