fastai / fastbook

The fastai book, published as Jupyter Notebooks

License: Other

Jupyter Notebook 99.94% Python 0.06%

notebooks fastai deep-learning machine-learning data-science python book

fastbook's Issues

Typo in 01_intro take tou -> take you

In the 01_intro notebook:
"The "Show in docs" link take tou to the full documentation"
Tou -> you
Thanks a lot, hope it helps

Error in chapter 10

There is a mistake in the notebook for chapter 10 which makes the sentence mean the opposite of the authors' intent.

Section 1.3.3. Last sentence of the paragraph following the 3rd cell:
The line

The model not without the final layer is
called the encoder.

should be replaced by either of the following:

The model without the final layer is
called the encoder.

The model not including the final layer is
called the encoder.

Not sure: is this wrong in chapter 5 (0.999 is ten times more than 0.99)?

Is it a typo for "s:"?
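
One possible reading of the "ten times" claim, offered here as an assumption rather than as the authors' stated intent, is that it compares the remaining error (1 - p) and not the probabilities themselves. A minimal sketch of the arithmetic:

```python
import math

p_low, p_high = 0.99, 0.999

# Compared directly, the two probabilities are almost identical.
ratio_of_probabilities = p_high / p_low

# Compared by how far each is from certainty, they differ by about 10x.
ratio_of_error = (1 - p_low) / (1 - p_high)

# On a log scale the gap in remaining error is one unit of log10.
log_gap = math.log10(1 - p_low) - math.log10(1 - p_high)

print(ratio_of_probabilities, ratio_of_error, log_gap)
```

Under that reading the sentence is about error shrinking tenfold, not about 0.999 being ten times larger than 0.99.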

Incorrect Momentum Calculation in 16_accel_sgd

I think the momentum calculations in 16_accel_sgd are wrong.

Early in the chapter the formula for weighted average is shown as:

weight.avg = beta * weight.avg + (1-beta) * weight.grad

That looks correct to me for an exponentially weighted average. But the actual implementation a little later in the book is:

def average_grad(p, mom, grad_avg=None, **kwargs):
    if grad_avg is None: grad_avg = torch.zeros_like(p.grad.data)
    return {'grad_avg': grad_avg*mom + p.grad.data}

The last line there is calculating the weight average, but AFAICT it's missing the (1-beta) term. Shouldn't it be return {'grad_avg': grad_avg*mom + p.grad.data*(1.0-mom)}?

The same thing occurs in the RMSProp implementation.

Is that intentional? Is the gradient being weighted somewhere else somehow?

As a side note, I noticed with a few runs that training didn't seem to progress much differently with either the versions as they are or with "fixing" them. Kinda odd, since it seems like an important implementation detail to me...
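
A possible explanation for the similar training runs, sketched in plain Python on scalar "gradients": when both averages start from zero, the version with the (1-mom) term is the version without it scaled by the constant (1-mom). The function names and the gradient values below are made up for illustration; this is not the fastai implementation.

```python
def running_sum(grads, mom):
    """Update from the quoted average_grad: avg = avg*mom + grad."""
    avg = 0.0
    for g in grads:
        avg = avg * mom + g
    return avg


def weighted_average(grads, mom):
    """Formula shown earlier in the chapter: avg = avg*mom + (1-mom)*grad."""
    avg = 0.0
    for g in grads:
        avg = avg * mom + (1.0 - mom) * g
    return avg


grads = [0.5, -0.2, 0.1, 0.4, 0.3]  # hypothetical gradient values
mom = 0.9

undampened = running_sum(grads, mom)
dampened = weighted_average(grads, mom)
ratio = dampened / undampened  # equals 1 - mom, whatever the gradients are

print(undampened, dampened, ratio)
```

For plain SGD with momentum, a constant factor like this can be absorbed into the learning rate, which may be why the two variants behave similarly. For RMSProp the factor sits under a square root, so the effect on step size is different, and the same argument should be checked separately there.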

CAM chapter. class selected for plot

Hi
Should the class selected for this plot be 1 instead of 0, since dls.train.vocab is [False, True] with True for a cat?
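
A quick way to check which index corresponds to "cat", assuming the vocab is exactly the list quoted above:

```python
# Vocab as quoted in the issue; True is the "cat" label.
vocab = [False, True]

# Position of the True label in the vocab, i.e. the class index for "cat".
cat_class_index = vocab.index(True)

print(cat_class_index)
```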

Missing picture and full explanation in notebook 16.

On notebook 16, it says:
Another approach is to replace the nearest neighbor and convolution combination with a transposed convolution otherwise known as a stride half convolution. This is identical to a regular convolution, but first zero padding is inserted between every pixel in the input. This is easiest to see with a picture — here's a diagram from the excellent convolutional arithmetic paper we have seen before, showing a 3x3 transposed convolution applied to a 3x3 image:

But the problem is that the picture shown is for nearest neighbor interpolation. The picture of the 3x3 transposed convolution is missing, and possibly its explanation too.

overconfident when training is used but not defined

Thank you for the book, it is great.

In notebook 05 you use the term overconfident for a model but don't define what it means or whether it is good or bad. It would be great to clarify this part.

Strange pattern of missing sentences in notebooks

I am reviewing a few chapters of the notebooks. I am not sure if it is a bug, but I consistently noticed that whenever a sentence contains the characters "in <", the words or sentences right after them disappear.

Here are a few examples:

example 1: from 07_sizing_and_tta.ipynb

example 2: from 05_pet_breeds.ipynb

Remarks

I am viewing the notebooks directly on GitHub.
When I view the page in raw format, the sentences that are missing from the rendered markdown are actually present.

It is a bit strange that it happens in many places. Would like to see if others could confirm the phenomenon.

nb_06: transforms and normalization introduced without explanation.

In nb_06, when doing regression, the following code is introduced without explanation, which makes the passage confusing to read.

  batch_tfms=[*aug_transforms(size=(240,320)), 
                Normalize.from_stats(*imagenet_stats)]
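
For readers hitting the same confusion, here is a rough sketch of what per-channel normalization does: subtract a mean and divide by a standard deviation for each colour channel. The mean and std values are the commonly published ImageNet statistics; the helper name and the example pixel are made up, and this is not the fastai implementation of Normalize.

```python
# Commonly published ImageNet per-channel statistics (R, G, B).
imagenet_mean = [0.485, 0.456, 0.406]
imagenet_std = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean, std):
    """Return (value - mean) / std for each channel of one RGB pixel."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


# Hypothetical pixel with channel values in the 0..1 range.
pixel = [0.485, 0.680, 0.181]
normalized = normalize_pixel(pixel, imagenet_mean, imagenet_std)

print(normalized)
```

The usual motivation is that a pretrained model expects inputs on the same scale as the data it was originally trained on.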

TypeError: 'function' object cannot be interpreted as an integer (After running the first cell of the book)

After running the first cell of the book, I got the error below (I had also installed all the requirements successfully).

#hide
from utils import *

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-a48dbe7e39cc> in <module>
      1 #hide
----> 2 from utils import *

~/fastbook/utils.py in <module>
      1 # Numpy and pandas by default assume a narrow screen - this fixes that
----> 2 from fastai2.vision.all import *
      3 from nbdev.showdoc import *
      4 from ipywidgets import widgets
      5 from pandas.api.types import CategoricalDtype

~/anaconda3/lib/python3.7/site-packages/fastai2/vision/all.py in <module>
----> 1 from ..basics import *
      2 from ..callback.all import *
      3 from .augment import *
      4 from .core import *
      5 from .data import *

~/anaconda3/lib/python3.7/site-packages/fastai2/basics.py in <module>
----> 1 from .data.all import *
      2 from .optimizer import *
      3 from .callback.core import *
      4 from .learner import *
      5 from .metrics import *

~/anaconda3/lib/python3.7/site-packages/fastai2/data/all.py in <module>
      1 from ..torch_basics import *
----> 2 from .core import *
      3 from .load import *
      4 from .external import *
      5 from .transforms import *

~/anaconda3/lib/python3.7/site-packages/fastai2/data/core.py in <module>
    118 # Cell
    119 @docs
--> 120 class DataLoaders(GetAttr):
    121     "Basic wrapper around several `DataLoader`s."
    122     _default='train'

~/anaconda3/lib/python3.7/site-packages/fastai2/data/core.py in DataLoaders()
    131 
    132     def _set(i, self, v): self.loaders[i] = v
--> 133     train   ,valid    = add_props(lambda i,x: x[i], _set)
    134     train_ds,valid_ds = add_props(lambda i,x: x[i].dataset)
    135 

~/fastcore/fastcore/utils.py in add_props(f, n)
    530 def add_props(f, n=2):
    531     "Create properties passing each of `range(n)` to f"
--> 532     return (property(partial(f,i)) for i in range(n))
    533 
    534 # Cell

TypeError: 'function' object cannot be interpreted as an integer
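
The traceback can be reproduced in isolation. The add_props below is copied from the fastcore frame shown above, where the second parameter is a count; the call mirrors the fastai2 frame, which passes a setter function in that position. That the two installed packages are from mismatched versions is an assumption drawn from this traceback, not something confirmed in the thread.

```python
from functools import partial


def add_props(f, n=2):
    "Create properties passing each of `range(n)` to f"
    return (property(partial(f, i)) for i in range(n))


def _set(i, obj, v):
    # Stand-in for the setter passed by the DataLoaders class body.
    obj.loaders[i] = v


try:
    # The second argument lands in `n`, so range(n) receives a function.
    add_props(lambda i, x: x[i], _set)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If that diagnosis holds, installing fastcore and fastai2 versions that were released together would be the thing to try.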

clean folder needs a copy of utils

The stripped version of the code under the clean folder is missing the utils.py file.

ImageClassifierCleaner error in Chapter 2 with rstrip

While running the ImageClassifierCleaner in the Chapter 2 production notebook (02_production.ipynb):
for idx,cat in cleaner.change(): shutil.move(cleaner.fns[idx], path/cat)
I am seeing the following error:

AttributeError                            Traceback (most recent call last)
<ipython-input-45-98d27e7a0247> in <module>
      1 #hide
      2 # for idx in cleaner.delete(): cleaner.fns[idx].unlink()
----> 3 for idx,cat in cleaner.change(): shutil.move(cleaner.fns[idx], f"{path}/{cat}")

/opt/conda/envs/fastai/lib/python3.7/shutil.py in move(src, dst, copy_function)
    560             return
    561 
--> 562         real_dst = os.path.join(dst, _basename(src))
    563         if os.path.exists(real_dst):
    564             raise Error("Destination path '%s' already exists" % real_dst)

/opt/conda/envs/fastai/lib/python3.7/shutil.py in _basename(path)
    524     # Thus we always get the last component of the path, even for directories.
    525     sep = os.path.sep + (os.path.altsep or '')
--> 526     return os.path.basename(path.rstrip(sep))
    527 
    528 def move(src, dst, copy_function=copy2):

AttributeError: 'PosixPath' object has no attribute 'rstrip'

evansd/whitenoise#192
It might be related to errbotio/errbot#1340
https://bugs.python.org/issue32689
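
A workaround that should work on any Python 3 version is to convert both arguments to str before calling shutil.move. The sketch below uses a temporary directory and made-up file names instead of the notebook's cleaner object.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Hypothetical stand-ins for cleaner.fns[idx] and path/cat.
    src = root / "bear.jpg"
    src.write_text("placeholder")
    dest_dir = root / "grizzly"
    dest_dir.mkdir()

    # Passing plain strings avoids the rstrip call on a Path object.
    shutil.move(str(src), str(dest_dir))

    moved_ok = (dest_dir / "bear.jpg").exists() and not src.exists()

print(moved_ok)
```

In the notebook this would correspond to something like shutil.move(str(cleaner.fns[idx]), str(path/cat)). The Python documentation notes that shutil.move accepts path-like objects from version 3.9 onward, so upgrading Python is another option.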

"Lenet 5" link in 04_mnist_basics.html goes to a site requiring authentication

It just pops up an authentication dialog box with no other information. I realize this external site is not controlled by the authors, but a more accessible site would make for a better link.

Typo: change "benn" to "been" on 7th paragraph under Neural Network History in Chapter 1

Replace "couple" with "tuple" in definition of DataLoader in 06_multicat

"Couple" is grammatically correct, but I believe "tuple" was intended because each mini-batch is a tuple of a couple of x's and y's.

From the nb:

1.1.4 Constructing a DataBlock

How do we convert from a DataFrame object to a DataLoaders object? We generally suggest using the data block API for creating a DataLoaders object, where possible, since it provides a good mix of flexibility and simplicity. Here we will show you the steps that we take to use the data blocks API to construct a DataLoaders object in practice, using this dataset as an example.

As we have seen, PyTorch and fastai have two main classes for representing and accessing a training set or validation set:

Dataset:: A collection that returns a tuple of your independent and dependent variable for a single item
DataLoader:: An iterator that provides a stream of mini-batches, where each mini-batch is a couple of a batch of independent variables and a batch of dependent variables
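
To make the "tuple" wording concrete, here is a toy sketch in plain Python. ToyDataset and toy_dataloader are hypothetical names, not PyTorch or fastai classes: the dataset returns one (x, y) tuple per item, and the loader yields one (batch of x, batch of y) tuple per mini-batch.

```python
class ToyDataset:
    """Returns a tuple (independent, dependent) for a single item."""

    def __init__(self, xs, ys):
        self.xs, self.ys = xs, ys

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, i):
        return (self.xs[i], self.ys[i])


def toy_dataloader(ds, bs):
    """Yields mini-batches, each a tuple (batch of xs, batch of ys)."""
    for start in range(0, len(ds), bs):
        items = [ds[i] for i in range(start, min(start + bs, len(ds)))]
        xb = [x for x, _ in items]
        yb = [y for _, y in items]
        yield (xb, yb)


ds = ToyDataset([10, 20, 30, 40, 50], ["a", "b", "a", "b", "a"])
batches = list(toy_dataloader(ds, bs=2))

print(batches)
```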

Font Styling

I see some font inconsistencies. Should I open a PR for those?

Spelling error: Chapter 8, replace "twice as bit" with "twice as big"

", we can just make it twice as bit, so we" includes a spelling error.

Note: I've been doing these through pull requests, but there are enough duplicate corrections piling up that I feel it's not a good place any longer. Should I move to issues? Alternatively, if contributors could create meaningful PR titles and comments it would be easier to see if the fixes have already been committed.

AttributeError: type object 'Image' has no attribute 'size'

Not working in Colab, but OK with Binder.

While running the 5th notebook, I got an error from this line:

dls1 = dblock1.dataloaders([(Path.cwd()/'images'/'grizzly.jpg')]*100, bs=8)

[Errno 2] No such file or directory: '/content/images/grizzly.jpg'

Missing a word in 01_intro

I believe the word "large" or "wide" was accidentally omitted from the sentence:

"Neural networks are special because they are highly flexible, which means they can solve an unusually range of problems just by finding the right weights."

References are empty and unfollowable

I am not sure if it's a problem with notebooks or a problem with GitHub rendering but I cannot see links in notebooks. They are shown as "<". See screenshot attached. Chrome 84, macOS.

Can I translate the material into different languages?

Benefits:

The material would be easier to understand in local languages.
Wider reach, and no limits to contribution once the material is understood in a local language.
With languages like Chinese and Japanese, engagement would also increase.

Feel free to comment on the benefits everyone could get from this.

Continuity of text in chapter 2

We often talk to people who overestimate both the constraints, and the capabilities of deep learning. Both of these can be problems: underestimating the capabilities means that you might not even try things which could be very beneficial; underestimating the constraints might mean that you fail to consider and react to important issues.

Issue: In the text above, it seems the authors mean to say "We often talk to people who underestimate", but the actual text says "overestimate". It may need correcting if that is the case.

I might be inferring this incorrectly, but what the text is trying to say and what is actually written seem to differ.

log formula typo in chapter 5

Hi, I think there is a typo in the section 'Taking the log' in chapter 5.
We should have:
y = b**a to get a = log(y,b)

Best regards
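
The relationship proposed above can be checked numerically. The values of b and a are arbitrary examples:

```python
import math

b, a = 2, 5          # arbitrary base and exponent
y = b ** a           # y = b**a
recovered = math.log(y, b)  # a = log(y, b), up to floating-point rounding

print(y, recovered)
```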

Add bibtex entry to README?

Maybe add a bibtex entry to README.md for citations?

@book{gugger2020deep,
title={Deep Learning for Coders with Fastai and Pytorch: AI Applications Without a PhD},
author={Gugger, S. and Howard, J.},
isbn={9781492045526},
url={https://books.google.no/books?id=xd6LxgEACAAJ},
year={2020},
publisher={O'Reilly Media, Incorporated}
}

Chapter 2 typos

I think they should be `get_idxs` instead of `get_idx`.

Typo: change "Trianing Time" to "Training Time"

The typo appears before "Sidebar: This Book Was Written in Jupyter Notebooks" in Chapter 1.

In nb_10 sentence might need more explanation

I am not sure what this sentence means:

Picking a subword vocab size represents a compromise: a larger vocab means more fewer tokens per sentence, which means faster training, less memory, and less state for the model to remember; but on the downside, it means larger embedding matrices, which require more data to learn.
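
A toy illustration of the trade-off, under the assumption that "more fewer" was meant to read "fewer": with a larger vocab, the same sentence splits into fewer tokens. The greedy longest-match tokenizer and both vocabularies below are made up for illustration and are not the subword algorithm used in the book.

```python
def tokenize(text, vocab):
    """Greedy longest-match split; falls back to single characters."""
    tokens = []
    i = 0
    while i < len(text):
        match = text[i]
        # Try the longest candidate first, shrinking until one is in vocab.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                match = text[i:j]
                break
        tokens.append(match)
        i += len(match)
    return tokens


small_vocab = {"t", "h", "e", "c", "a", "s", " "}
large_vocab = small_vocab | {"th", "the", "cat", "sat"}

sentence = "the cat sat"
small_tokens = tokenize(sentence, small_vocab)
large_tokens = tokenize(sentence, large_vocab)

print(len(small_tokens), small_tokens)
print(len(large_tokens), large_tokens)
```

Fewer tokens per sentence means shorter sequences for the model to process, while the larger vocab needs an embedding row for every extra entry.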

04_mnist_basics: RuntimeError encountered after running learn.fit_one_cycle

dls = ImageDataLoaders.from_folder(path, num_workers=0)
learn = cnn_learner(dls, resnet18, pretrained=False,
                    loss_func=F.cross_entropy, metrics=accuracy)
learn.fit_one_cycle(1, 0.1)

RuntimeError Traceback (most recent call last)
in
2 learn = cnn_learner(dls, resnet18, pretrained=False,
3 loss_func=F.cross_entropy, metrics=accuracy)
----> 4 learn.fit_one_cycle(1, 0.1)

~\anaconda3\envs\fastai2\lib\site-packages\fastcore\utils.py in _f(*args, **kwargs)
429 init_args.update(log)
430 setattr(inst, 'init_args', init_args)
--> 431 return inst if to_return else f(*args, **kwargs)
432 return _f
433

~\anaconda3\envs\fastai2\lib\site-packages\fastai2\callback\schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
111 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
112 'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 113 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
114
115 # Cell

~\anaconda3\envs\fastai2\lib\site-packages\fastai2\learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
201 try:
202 self.epoch=epoch; self('begin_epoch')
--> 203 self._do_epoch_train()
204 self._do_epoch_validate()
205 except CancelEpochException: self('after_cancel_epoch')

~\anaconda3\envs\fastai2\lib\site-packages\fastai2\learner.py in _do_epoch_train(self)
173 try:
174 self.dl = self.dls.train; self('begin_train')
--> 175 self.all_batches()
176 except CancelTrainException: self('after_cancel_train')
177 finally: self('after_train')

~\anaconda3\envs\fastai2\lib\site-packages\fastai2\learner.py in all_batches(self)
151 def all_batches(self):
152 self.n_iter = len(self.dl)
--> 153 for o in enumerate(self.dl): self.one_batch(*o)
154
155 def one_batch(self, i, b):

~\anaconda3\envs\fastai2\lib\site-packages\fastai2\learner.py in one_batch(self, i, b)
159 self.pred = self.model(*self.xb); self('after_pred')
160 if len(self.yb) == 0: return
--> 161 self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
162 if not self.training: return
163 self.loss.backward(); self('after_backward')

~\anaconda3\envs\fastai2\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
2420 if size_average is not None or reduce is not None:
2421 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2422 return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
2423
2424

~\anaconda3\envs\fastai2\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
2216 .format(input.size(0), target.size(0)))
2217 if dim == 2:
-> 2218 ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
2219 elif dim == 4:
2220 ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target' in call to _thnn_nll_loss_forward

RuntimeError: cuda runtime error (801) : operation not supported at C:\w\1\s\windows\pytorch\torch/csrc/generic/StorageSharing.cpp:245

RuntimeError Traceback (most recent call last)
in
9
10 learn = cnn_learner(dls, resnet34, metrics=error_rate)
---> 11 learn.fine_tune(1)

d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\fastai2\callback\schedule.py in fine_tune(self, epochs, base_lr, freeze_epochs, lr_mult, pct_start, div, **kwargs)
155 "Fine tune with freeze for freeze_epochs then with unfreeze from epochs using discriminative LR"
156 self.freeze()
--> 157 self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
158 base_lr /= 2
159 self.unfreeze()

d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\fastai2\callback\schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
110 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
111 'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 112 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
113
114 # Cell

d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\fastai2\learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
190 try:
191 self.epoch=epoch; self('begin_epoch')
--> 192 self._do_epoch_train()
193 self._do_epoch_validate()
194 except CancelEpochException: self('after_cancel_epoch')

d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\fastai2\learner.py in _do_epoch_train(self)
163 try:
164 self.dl = self.dls.train; self('begin_train')
--> 165 self.all_batches()
166 except CancelTrainException: self('after_cancel_train')
167 finally: self('after_train')

d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\fastai2\learner.py in all_batches(self)
141 def all_batches(self):
142 self.n_iter = len(self.dl)
--> 143 for o in enumerate(self.dl): self.one_batch(*o)
144
145 def one_batch(self, i, b):

d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\fastai2\data\load.py in __iter__(self)
95 self.randomize()
96 self.before_iter()
---> 97 for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
98 if self.device is not None: b = to_device(b, self.device)
99 yield self.after_batch(b)

d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
717 # before it starts, and __del__ tries to join but will get:
718 # AssertionError: can only join a started process.
--> 719 w.start()
720 self._index_queues.append(index_queue)
721 self._workers.append(w)

d:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\process.py in start(self)
110 'daemonic processes are not allowed to have children'
111 _cleanup()
--> 112 self._popen = self._Popen(self)
113 self._sentinel = self._popen.sentinel
114 # Avoid a refcycle if the target function holds an indirect

d:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):

d:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):

d:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
63 try:
64 reduction.dump(prep_data, to_child)
---> 65 reduction.dump(process_obj, to_child)
66 finally:
67 set_spawning_popen(None)

d:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #

d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\torch\multiprocessing\reductions.py in reduce_tensor(tensor)
240 ref_counter_offset,
241 event_handle,
--> 242 event_sync_required) = storage._share_cuda_()
243 tensor_offset = tensor.storage_offset()
244 shared_cache[handle] = StorageWeakRef(storage)

RuntimeError: cuda runtime error (801) : operation not supported at C:\w\1\s\windows\pytorch\torch/csrc/generic/StorageSharing.cpp:245

chapter 10 confusion

The confusion comes from the line:
"So to recap, at every epoch we shuffle our collection of documents to pick one document, and then we transform that one into a stream of tokens."

But what I understood from the rest of the chapter is that we shuffle all the documents, concatenate them, and then create a stream of tokens. The following line appears earlier in the book:
"First we concatenate all of the documents in our dataset into one big long string"

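The reading described above (shuffle all documents, then concatenate them into one stream) can be sketched as follows. The token lists and the token_stream helper are made up for illustration; this is not the fastai data loader.

```python
import random


def token_stream(docs, seed):
    """Shuffle the document order, then concatenate all tokens into one stream."""
    order = list(range(len(docs)))
    random.Random(seed).shuffle(order)
    stream = []
    for idx in order:
        stream.extend(docs[idx])
    return stream


# Hypothetical tokenized documents, each starting with a marker token.
docs = [["xxbos", "a", "b"], ["xxbos", "c"], ["xxbos", "d", "e", "f"]]

# A different seed per epoch gives a different document order each time.
stream = token_stream(docs, seed=0)

print(stream)
```

Every document contributes to the stream on every epoch; only the order changes.
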
Can I translate it to Chinese?

Can I translate it to Chinese in my GitHub repo as study material? Of course, not for commercial use.

BCEWithLogitsLossFlat() as loss function in 06_multicat.ipynb

Hello,

In the paragraph Binary Cross-Entropy in the notebook 06_multicat.ipynb, it is written:

We don't actually need to tell fastai to use this loss function (although we can if we want) since it will be automatically chosen for us. fastai knows that the DataLoaders has multiple category labels, so it will use nn.BCEWithLogitsLoss by default.

In fact, the loss function by default in this case is not nn.BCEWithLogitsLoss but BCEWithLogitsLossFlat().
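
For context, here is a plain-Python sketch of binary cross-entropy computed on logits, applied after flattening predictions and targets. It assumes that "Flat" refers to flattening both tensors before the loss is applied. The helper names and numbers are made up, and this is neither the PyTorch nor the fastai implementation.

```python
import math


def flatten(rows):
    """Turn a list of rows into one flat list."""
    return [v for row in rows for v in row]


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy on raw logits, in a numerically stable form."""
    total = 0.0
    for x, y in zip(logits, targets):
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def bce_naive(logits, targets):
    """Same quantity via an explicit sigmoid, for comparison only."""
    total = 0.0
    for x, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-x))
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(logits)


# Two items, three labels each (multi-label targets are 0 or 1 per label).
logits = [[2.0, -1.0, 0.0], [-3.0, 0.5, 1.5]]
targets = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

loss = bce_with_logits(flatten(logits), flatten(targets))
naive_loss = bce_naive(flatten(logits), flatten(targets))

print(loss, naive_loss)
```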

It's the Oxford-IIIT Pet Dataset, not the Oxford-IIT Pet Dataset

It's IIIT, not IIT. IIT and IIIT are different universities.

ResNet block should add before the last activation function

In chapter 15, maybe the first ResBlock (def forward(self, x): return x + self.convs(x)) should be fixed to return the relu of that sum, with the relu dropped from the 2nd ConvLayer?

class ResBlock(Module):
    def __init__(self, ni, nf):
        self.convs = nn.Sequential(
            ConvLayer(ni,nf),
            ConvLayer(nf,nf, norm_type=NormType.BatchZero))
        
    def forward(self, x): return x + self.convs(x)

class ResBlock(Module):
    def __init__(self, ni, nf):
        self.convs = nn.Sequential(
            ConvLayer(ni,nf),
            ConvLayer(nf,nf, act_cls=None, norm_type=NormType.BatchZero))
        
    def forward(self, x): return  F.relu(x + self.convs(x))

Can we send PRs to this repo?

Edit:

I have just seen

Also, you'll see that we've removed relu (act_cls=None) from the final convolution in convs and from idconv, and moved it to after we add the skip connection. The thinking behind this is that the whole ResNet block is like a layer, and you want your activation to be after your layer.

Maybe this bug can be closed?
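
The difference between the two placements of the activation can be seen on a single number. In this sketch convs is a made-up scalar stand-in for the convolutional path, not a real layer:

```python
def relu(v):
    return max(v, 0.0)


def convs(x):
    # Hypothetical stand-in for the conv stack before its final activation.
    return 0.5 * x


def block_relu_inside(x):
    """relu applied inside the conv path, before the skip connection is added."""
    return x + relu(convs(x))


def block_relu_after(x):
    """relu applied after adding the skip connection."""
    return relu(x + convs(x))


x = -1.0
inside = block_relu_inside(x)  # -1.0: the skip path passes the negative input through
after = block_relu_after(x)    # 0.0: the activation is applied to the whole block

print(inside, after)
```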

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED Chapter 1

When I run the first text classifier example

#id training2
#caption Training loop in a text application
from fastai2.text.all import *

dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)

The network trains on my 1080Ti for a few minutes and the GPU is utilized. Midway through the second epoch it fails:


epoch | train_loss | valid_loss | accuracy | time
-- | -- | -- | -- | --
0 | 0.597115 | 0.438984 | 0.804160 | 01:34


epoch | train_loss | valid_loss | accuracy | time
-- | -- | -- | -- | --
0 | 0.394730 | 00:40

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-15-772d141f2ac2> in <module>
      5 dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
      6 learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
----> 7 learn.fine_tune(4, 1e-2)

~/fastai2/fastai2/callback/schedule.py in fine_tune(self, epochs, base_lr, freeze_epochs, lr_mult, pct_start, div, **kwargs)
    157     self.fit_one_cycle(freeze_epochs, slice(base_lr*2), pct_start=0.99, **kwargs)
    158     self.unfreeze()
--> 159     self.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)
    160 
    161 # Cell

~/fastai2/fastai2/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
    110     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
    111               'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 112     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
    113 
    114 # Cell

~/fastai2/fastai2/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
    174                     try:
    175                         self.epoch=epoch;          self('begin_epoch')
--> 176                         self._do_epoch_train()
    177                         self._do_epoch_validate()
    178                     except CancelEpochException:   self('after_cancel_epoch')

~/fastai2/fastai2/learner.py in _do_epoch_train(self)
    147         try:
    148             self.dl = self.dls.train;                        self('begin_train')
--> 149             self.all_batches()
    150         except CancelTrainException:                         self('after_cancel_train')
    151         finally:                                             self('after_train')

~/fastai2/fastai2/learner.py in all_batches(self)
    125     def all_batches(self):
    126         self.n_iter = len(self.dl)
--> 127         for o in enumerate(self.dl): self.one_batch(*o)
    128 
    129     def one_batch(self, i, b):

~/fastai2/fastai2/learner.py in one_batch(self, i, b)
    135             self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
    136             if not self.training: return
--> 137             self.loss.backward();                            self('after_backward')
    138             self.opt.step();                                 self('after_step')
    139             self.opt.zero_grad()

~/miniconda3/envs/data-science-stack-2.1.0/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    193                 products. Defaults to ``False``.
    194         """
--> 195         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    196 
    197     def register_hook(self, hook):

~/miniconda3/envs/data-science-stack-2.1.0/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     97     Variable._execution_engine.run_backward(
     98         tensors, grad_tensors, retain_graph, create_graph,
---> 99         allow_unreachable=True)  # allow_unreachable flag
    100 
    101 

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

If I try to rerun the cell it fails immediately with


epoch | train_loss | valid_loss | accuracy | time
-- | -- | -- | -- | --
0 | 0.975069 | 00:01

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-16-772d141f2ac2> in <module>
      5 dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
      6 learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
----> 7 learn.fine_tune(4, 1e-2)

~/fastai2/fastai2/callback/schedule.py in fine_tune(self, epochs, base_lr, freeze_epochs, lr_mult, pct_start, div, **kwargs)
    155     "Fine tune with `freeze` for `freeze_epochs` then with `unfreeze` from `epochs` using discriminative LR"
    156     self.freeze()
--> 157     self.fit_one_cycle(freeze_epochs, slice(base_lr*2), pct_start=0.99, **kwargs)
    158     self.unfreeze()
    159     self.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)

~/fastai2/fastai2/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
    110     scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
    111               'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 112     self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
    113 
    114 # Cell

~/fastai2/fastai2/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
    174                     try:
    175                         self.epoch=epoch;          self('begin_epoch')
--> 176                         self._do_epoch_train()
    177                         self._do_epoch_validate()
    178                     except CancelEpochException:   self('after_cancel_epoch')

~/fastai2/fastai2/learner.py in _do_epoch_train(self)
    147         try:
    148             self.dl = self.dls.train;                        self('begin_train')
--> 149             self.all_batches()
    150         except CancelTrainException:                         self('after_cancel_train')
    151         finally:                                             self('after_train')

~/fastai2/fastai2/learner.py in all_batches(self)
    125     def all_batches(self):
    126         self.n_iter = len(self.dl)
--> 127         for o in enumerate(self.dl): self.one_batch(*o)
    128 
    129     def one_batch(self, i, b):

~/fastai2/fastai2/learner.py in one_batch(self, i, b)
    135             self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
    136             if not self.training: return
--> 137             self.loss.backward();                            self('after_backward')
    138             self.opt.step();                                 self('after_step')
    139             self.opt.zero_grad()

~/miniconda3/envs/data-science-stack-2.1.0/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
    193                 products. Defaults to ``False``.
    194         """
--> 195         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    196 
    197     def register_hook(self, hook):

~/miniconda3/envs/data-science-stack-2.1.0/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     97     Variable._execution_engine.run_backward(
     98         tensors, grad_tensors, retain_graph, create_graph,
---> 99         allow_unreachable=True)  # allow_unreachable flag
    100 
    101 

RuntimeError: cuda runtime error (400) : invalid resource handle at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THC/generic/THCTensorMath.cu:35

The strange thing is that I ran this cell yesterday with the same conda environment and everything worked.

I'm using CUDA 10.1.

→ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

This is the output of nvidia-smi during the training right before failure

Sat Feb 29 11:17:50 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   49C    P2   198W / 280W |  11138MiB / 11177MiB |     96%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1222      G   /usr/lib/xorg/Xorg                          1690MiB |
|    0      1904      G   /usr/bin/compiz                               41MiB |
|    0      3764      G   ...uest-channel-token=10439615552343491592    42MiB |
|    0      4391      C   ...nvs/data-science-stack-2.1.0/bin/python  9351MiB |
+-----------------------------------------------------------------------------+
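
One detail in the table above that may be worth checking is how full the card is. The sketch below only parses the status row pasted above and does not query the GPU; `memory_usage` is a hypothetical helper name. Whether the low free memory is related to the error 400 is not established here.

```python
import re

# Status row copied from the nvidia-smi output above.
smi_line = "|  0%   49C    P2   198W / 280W |  11138MiB / 11177MiB |     96%      Default |"

def memory_usage(line):
    """Return (used_mib, total_mib) parsed from an nvidia-smi status row."""
    match = re.search(r"(\d+)MiB\s*/\s*(\d+)MiB", line)
    if match is None:
        raise ValueError("no memory column found")
    return int(match.group(1)), int(match.group(2))

used, total = memory_usage(smi_line)
free = total - used
print(f"{used} of {total} MiB in use, {free} MiB free ({used / total:.1%} full)")
```

Note that Xorg alone holds 1690MiB in the process list, so the training process shares the card with the desktop session.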

Thanks for making this prerelease available! Happy to provide more info if it helps.

Learning rate optimization diagrams not shown in 04_mnist_basics.ipynb

The graphs are missing but the caption is there:

Now we look to see what would happen if we increased or decreased our parameter by a little bit — the adjustment. This is simply the slope at a particular point:

A graph showing the squared function with the slope at one point

We can change our weight by a little in the direction of the slope, calculate our loss and adjustment again, and repeat this a few times. Eventually, we will get to the lowest point on our curve:

An illustration of gradient descent

It looks like the image links are broken?
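
For reference while the images are missing, the process the two captions describe can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the starting weight, the learning rate, and the numerical `slope` helper are all assumptions.

```python
def f(x):
    """The squared function used as the loss in the quoted passage."""
    return x ** 2

def slope(x, eps=1e-6):
    """Numerical estimate of the slope of f at x (central difference)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

w = -1.5   # assumed starting weight
lr = 0.1   # assumed learning rate
history = [w]
for _ in range(50):
    w -= lr * slope(w)   # step against the slope so the loss decreases
    history.append(w)

print(f"final weight {w:.6f}, final loss {f(w):.3e}")
```

Plotting `history` against `f` would give a picture similar to the missing "illustration of gradient descent": each step moves the weight closer to the lowest point of the curve.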

Confusing instruction in 04_mnist_basics.ipynb

Line 1591:
Where it says "Let's do the same thing for the sevens, but let's put all the steps together at once to save some time:", there is nothing done differently with the sevens than with the threes. The steps are not any more "together" with the sevens example than with the example code of the threes.

Typo in 04_mnist_basics notebook at Summarizing gradient descent section

Text in the notebook: Our weights need to be improved. To do this, we take a few data items (such as images) that and feed them to our model

Issue: In the text above, the word "that" (in "that and feed them") does not fit the sentence. Either something is missing after "that", or the word itself is a typo.

08_collab has a wrong sentence

It tells us not just whether a movie is of a kind that people tend not to enjoy watching, but that people type like watching it even if it is of a kind that they would otherwise enjoy!

Should be

It tells us not just whether a movie is of a kind that people tend not to enjoy watching, but that people type do not like watching it even if it is of a kind that they would otherwise enjoy!

Count on me if you ever consider translating it!

I understand that the license restricts the modification of the notebooks, but if you ever consider translating it to Brazilian Portuguese, I can help! Thanks!

Full and Stripped Notebooks - How do we hide the outputs for the course?

I'm running the latest version in Paperspace, and I see the following comment in the book;

Full and Stripped Notebooks: There are two folders containing different versions of the notebooks. The full folder contains the exact notebooks used to create the book you're reading now, with all the prose and outputs. The stripped version has the same headings and code cells, but all outputs and prose have been removed.

Unfortunately, there are no course-v4/nbs/full or course-v4/nbs/stripped folders in the repo (which makes sense: the course is changing all the time, so maybe they don't exist anymore), and even inside the fastbook/clean folder the outputs are still visible.

This is an image of what it looks like to open course-v4/nbs/14_resnet;

Here is what it looks like to open fastbook/clean/14_resnet

Here is fastbook/14_resnet

Note, each example has the outputs pre-displayed below the cell. (The four images)

I may be in the minority here, but when the notebooks show the output, it really takes away the magic of seeing everything "going". Playing devil's advocate: this obviously doesn't apply to people very focused on learning the content, but why run the model if I can already see 16 cat images with probabilities below them?

Essentially, I don't want to be "spoiled", I want to actually wait the 20 minutes to train some big model and get rewarded the hard way. The only thing better than that I would say is a function definition with some comments;

# Cell
def train_new_model():
    batch_size = 12

    # 1. Make sure you load in the learner!
    # 2. Make sure you've got the data ready from "some_global"!
    # 3. Begin training and print the outputs - hint: docs.fastai.com/logging/print_stuff

I want hardmode, and since there's nothing like the above right now, where can I find the stripped notebooks?

Polish is not an agglutinative language (chapter 10)

There are also "agglutinative languages", like Polish, which can add many morphemes together to create very long "words" which include a lot of separate pieces of information.

Polish does not work this way. Perhaps you meant Turkish.

Can't pickle local object 'parallel_gen.<locals>.f' in 01_intro

When I run the following code (from the ch1 intro), I get this error message.

from fastai2.text.all import *

dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)

AttributeError: Can't pickle local object 'parallel_gen.<locals>.f'

Would you please let me know how to fix this problem? Thanks in advance.
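
The error text itself points at the mechanism: `f` is defined inside `parallel_gen`, and Python's `pickle` cannot serialize functions defined inside other functions. The sketch below reproduces that with the standard library only; `make_local`, `top_level`, and `can_pickle` are hypothetical names for illustration. It is an assumption, not something stated in the report, that the pickling happens because worker processes are being started with the "spawn" method (as on Windows).

```python
import pickle

def make_local():
    # f is defined inside another function, so its qualified name is
    # 'make_local.<locals>.f' and pickle cannot look it up by name.
    def f(x):
        return x + 1
    return f

def top_level(x):
    # A module-level function is pickled by reference to its name.
    return x + 1

def can_pickle(obj):
    """Return True if obj can be pickled, printing the error otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError) as err:
        print(f"{type(err).__name__}: {err}")
        return False

print("top-level function:", can_pickle(top_level))
print("local function:", can_pickle(make_local()))
```

If the assumption about worker processes holds, avoiding them (for example by running the tokenization in a single process) would sidestep the pickling step, at the cost of speed.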

ModuleNotFoundError: No module named 'fastai2'

When I try to run the code in the first module, I get this error. I have already installed fastai, but when I try to install fastai2 with pip I get: ERROR: No matching distribution found for torch>=1.3.0 (from fastai2)
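
A quick way to see what the current environment actually contains is sketched below, using only the standard library; `is_installed` is a hypothetical helper name. One possible cause of pip's "No matching distribution" message is that no prebuilt torch package exists for the running Python version or architecture, so the sketch prints those as well. That cause is a guess and is not confirmed by the report.

```python
import importlib.util
import platform
import sys

def is_installed(module_name):
    """True if the module can be located in the current environment."""
    return importlib.util.find_spec(module_name) is not None

print("Python", platform.python_version(), platform.architecture()[0])
print("Interpreter:", sys.executable)
for name in ("torch", "fastai", "fastai2"):
    print(f"{name:8s}", "found" if is_installed(name) else "missing")
```

Running this in the same kernel as the notebook also shows whether pip installed into a different interpreter than the one the notebook uses.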

How to make them work in Colab because it uses the azure module?

how to make them work in colab because it uses the azure module?
Or can I remove the azure module?
Thank you.

CUDA on Windows error in Intro

Totally likely that this is my incompetence. But it seems that the code in the Intro notebook does not work out of the box on Windows. This would probably at least require a warning for (the few) Windows users among the book buyers.

The first little classifier fails with:

This is the most related info I could find:
https://pytorch.org/docs/stable/notes/windows.html#cuda-ipc-operations

SiameseImage(Tuple) in 11_midlevel_data subclasses Tuple

However, the Tuple class does not exist in fastai v2; some comments say it has been deleted.

Add Binder badge to launch notebooks

A Binder badge would let readers start working with the notebooks right away and share them. I've added one in #161.

Incorrect sentence in 04_mnist_basics.ipynb

Line 2225 says:
Next in mnist_distance we see abs(). You might be able to guess now what this does when applied to a tensor. It applies the method to each individual element in the tensor, and returns a tensor of the results (that is, it applies the method "elementwise"). So in this case, we'll get back 1,010 absolute values.

The shape of the returned tensor after abs() is actually 1,010 by 28 by 28, so there would be 1010*28*28 (791,840) absolute values, not 1,010.

This could be fixed by changing it to say "1,010 matrices of absolute values"
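
The count can be checked without loading the dataset. The sketch below uses nested Python lists as a small stand-in for the tensor (3 "images" of 2 by 2 instead of 1,010 of 28 by 28); `abs_elementwise` and `shape` are hypothetical helpers, not PyTorch calls.

```python
import math

def abs_elementwise(t):
    """Apply abs() to every number in a nested list, keeping its shape."""
    if isinstance(t, list):
        return [abs_elementwise(x) for x in t]
    return abs(t)

def shape(t):
    """Shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)

# Small stand-in for the 1010 x 28 x 28 tensor of differences.
diffs = [[[-1, 2], [3, -4]], [[0, -5], [6, -7]], [[-8, 9], [-10, 11]]]
result = abs_elementwise(diffs)

print(shape(result), math.prod(shape(result)))   # shape is unchanged by abs
print(math.prod((1010, 28, 28)))                 # element count for the real shape
```

The elementwise operation keeps the input shape, so the number of absolute values equals the number of elements in the tensor, not the number of images.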

1.1.4 Constructing a DataBlock

Benefits :
