fastai / fastbook
The fastai book, published as Jupyter Notebooks
License: Other
In 01_intro notebook :
"The "Show in docs" link take tou to the full documentation"
Tou -> you
Thanks a lot, hope it helps
There is a mistake in the notebook for chapter 10 which makes the sentence mean the opposite of the authors' intent.
Section 1.3.3. Last sentence of the paragraph following the 3rd cell:
The line
The model not without the final layer is
called the encoder.
should be replaced by either
The model without the final layer is
called the encoder.
or
The model not including the final layer is
called the encoder.
I think the momentum calculations in 16_accel_sgd are wrong.
Early in the chapter the formula for weighted average is shown as:
weight.avg = beta * weight.avg + (1-beta) * weight.grad
That looks correct to me for an exponentially weighted average. But the actual implementation a little later in the book is:
def average_grad(p, mom, grad_avg=None, **kwargs):
    if grad_avg is None: grad_avg = torch.zeros_like(p.grad.data)
    return {'grad_avg': grad_avg*mom + p.grad.data}
The last line there is calculating the weight average, but AFAICT it's missing the (1-beta) term. Shouldn't it be return {'grad_avg': grad_avg*mom + p.grad.data*(1.0-mom)}?
The same thing occurs in the RMSProp implementation.
Is that intentional? Is the gradient being weighted somewhere else somehow?
As a side note, I noticed with a few runs that training didn't seem to progress much differently with either the versions as they are or with "fixing" them. Kinda odd, since it seems like an important implementation detail to me...
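One way to see how the two update rules relate is to run both on the same gradient sequence. The sketch below uses plain Python floats instead of tensors, and the helper names undampened_avg and dampened_avg are hypothetical, not part of fastai. When both running values start at zero, the version without (1-mom) is the exponentially weighted average divided by (1-mom), so the two differ only by a constant scale factor, which acts like a change in the effective learning rate.

```python
def undampened_avg(grads, mom):
    """Running value as in the notebook's average_grad: avg*mom + grad."""
    avg, out = 0.0, []
    for g in grads:
        avg = avg * mom + g
        out.append(avg)
    return out

def dampened_avg(grads, mom):
    """Exponentially weighted average: avg*mom + (1-mom)*grad."""
    avg, out = 0.0, []
    for g in grads:
        avg = avg * mom + (1.0 - mom) * g
        out.append(avg)
    return out

grads = [0.5, -1.0, 2.0, 0.25, -0.75]   # arbitrary example gradients
mom = 0.9
a = undampened_avg(grads, mom)
b = dampened_avg(grads, mom)

# Scaling the undampened value by (1 - mom) recovers the dampened one.
ratios_match = all(abs(x * (1.0 - mom) - y) < 1e-12 for x, y in zip(a, b))
print(ratios_match)
```

This only shows the algebraic relationship; whether training behaves the same also depends on the learning rate and on how the optimizer uses the running value.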
On notebook 16, it says:
Another approach is to replace the nearest neighbor and convolution combination with a transposed convolution otherwise known as a stride half convolution. This is identical to a regular convolution, but first zero padding is inserted between every pixel in the input. This is easiest to see with a picture — here's a diagram from the excellent convolutional arithmetic paper we have seen before, showing a 3x3 transposed convolution applied to a 3x3 image:
The problem is that the picture shown is the one for nearest neighbor interpolation. The picture of the 3x3 transposed convolution, and any explanation that goes with it, is missing.
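Until the picture is restored, the zero-insertion idea quoted above can be sketched in one dimension. This is a minimal illustration with hypothetical helper names, not the notebook's code, and it uses plain lists rather than tensors. It compares the direct definition of a stride-2 transposed convolution with the "insert zeros, then run a regular convolution" view.

```python
def transposed_conv1d_scatter(x, w, stride=2):
    """Reference definition: each input value scatters a scaled copy of the kernel."""
    out = [0.0] * ((len(x) - 1) * stride + len(w))
    for i, xi in enumerate(x):
        for k, wk in enumerate(w):
            out[i * stride + k] += xi * wk
    return out

def transposed_conv1d_zero_insert(x, w, stride=2):
    """Same result via zero insertion followed by a regular stride-1 convolution."""
    # Insert (stride - 1) zeros between neighbouring inputs.
    z = []
    for i, xi in enumerate(x):
        z.append(xi)
        if i < len(x) - 1:
            z.extend([0.0] * (stride - 1))
    # Pad both borders so the kernel can overhang the ends.
    pad = [0.0] * (len(w) - 1)
    zp = pad + z + pad
    # Slide the flipped kernel over the padded signal with stride 1.
    wf = w[::-1]
    return [sum(zp[j + m] * wf[m] for m in range(len(w)))
            for j in range(len(zp) - len(w) + 1)]

x = [1.0, 2.0, 3.0]
w = [1.0, 0.5, 0.25]
a = transposed_conv1d_scatter(x, w)
b = transposed_conv1d_zero_insert(x, w)
print(len(a), all(abs(p - q) < 1e-12 for p, q in zip(a, b)))
```

A 3-element input with a 3-element kernel and stride 2 gives 7 outputs, which is the upsampling effect the passage describes.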
Thank you for the book it is great.
In notebook 05 you use the term "overconfident" for a model but don't properly define what it means or whether it is good or bad. It would be great to clarify this part.
I am reviewing a few chapters of the notebooks. I am not sure if it is a bug or something, but I consistently noticed that whenever there is a sentence with the words "in <", words/ sentences right after that disappear.
Here are a few examples:
example 1: from 07_sizing_and_tta.ipynb
example 2: from 05_pet_breeds.ipynb
Remarks
It is a bit strange that it happens in many places. Would like to see if others could confirm the phenomenon.
In nb_06, when doing regression, the following code is introduced without explanation, which makes the reading confusing.
batch_tfms=[*aug_transforms(size=(240,320)),
Normalize.from_stats(*imagenet_stats)]
After I ran the first cell of the book, I got the error below (I had also installed all the requirements successfully).
#hide
from utils import *
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-a48dbe7e39cc> in <module>
1 #hide
----> 2 from utils import *
~/fastbook/utils.py in <module>
1 # Numpy and pandas by default assume a narrow screen - this fixes that
----> 2 from fastai2.vision.all import *
3 from nbdev.showdoc import *
4 from ipywidgets import widgets
5 from pandas.api.types import CategoricalDtype
~/anaconda3/lib/python3.7/site-packages/fastai2/vision/all.py in <module>
----> 1 from ..basics import *
2 from ..callback.all import *
3 from .augment import *
4 from .core import *
5 from .data import *
~/anaconda3/lib/python3.7/site-packages/fastai2/basics.py in <module>
----> 1 from .data.all import *
2 from .optimizer import *
3 from .callback.core import *
4 from .learner import *
5 from .metrics import *
~/anaconda3/lib/python3.7/site-packages/fastai2/data/all.py in <module>
1 from ..torch_basics import *
----> 2 from .core import *
3 from .load import *
4 from .external import *
5 from .transforms import *
~/anaconda3/lib/python3.7/site-packages/fastai2/data/core.py in <module>
118 # Cell
119 @docs
--> 120 class DataLoaders(GetAttr):
121 "Basic wrapper around several `DataLoader`s."
122 _default='train'
~/anaconda3/lib/python3.7/site-packages/fastai2/data/core.py in DataLoaders()
131
132 def _set(i, self, v): self.loaders[i] = v
--> 133 train ,valid = add_props(lambda i,x: x[i], _set)
134 train_ds,valid_ds = add_props(lambda i,x: x[i].dataset)
135
~/fastcore/fastcore/utils.py in add_props(f, n)
530 def add_props(f, n=2):
531 "Create properties passing each of `range(n)` to f"
--> 532 return (property(partial(f,i)) for i in range(n))
533
534 # Cell
TypeError: 'function' object cannot be interpreted as an integer
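The last two frames of the traceback can be reproduced in isolation. The add_props below is copied from the fastcore frame shown above; getter and setter are hypothetical stand-ins for the lambda and _set in fastai2's DataLoaders. Because this add_props takes (f, n=2), a second function passed positionally ends up in n. Since fastcore is loaded from ~/fastcore while fastai2 comes from site-packages, one plausible reading (an assumption, not confirmed in this thread) is that the two installed versions do not match.

```python
from functools import partial

def add_props(f, n=2):
    "Create properties passing each of `range(n)` to f"
    return (property(partial(f, i)) for i in range(n))

def getter(i, x): return x[i]          # hypothetical stand-in for the lambda
def setter(i, x, v): x[i] = v          # hypothetical stand-in for _set

try:
    # The second positional argument lands in `n`, so range() receives a function.
    train, valid = add_props(getter, setter)
    message = None
except TypeError as e:
    message = str(e)
print(message)
```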
The stripped version of the code under the clean folder is missing the utils.py file.
While running the ImageClassifierCleaner
in Chapter 2 Production notebook 02_production.ipynb
for idx,cat in cleaner.change(): shutil.move(cleaner.fns[idx], path/cat)
I am seeing the following error:
AttributeError Traceback (most recent call last)
<ipython-input-45-98d27e7a0247> in <module>
1 #hide
2 # for idx in cleaner.delete(): cleaner.fns[idx].unlink()
----> 3 for idx,cat in cleaner.change(): shutil.move(cleaner.fns[idx], f"{path}/{cat}")
/opt/conda/envs/fastai/lib/python3.7/shutil.py in move(src, dst, copy_function)
560 return
561
--> 562 real_dst = os.path.join(dst, _basename(src))
563 if os.path.exists(real_dst):
564 raise Error("Destination path '%s' already exists" % real_dst)
/opt/conda/envs/fastai/lib/python3.7/shutil.py in _basename(path)
524 # Thus we always get the last component of the path, even for directories.
525 sep = os.path.sep + (os.path.altsep or '')
--> 526 return os.path.basename(path.rstrip(sep))
527
528 def move(src, dst, copy_function=copy2):
AttributeError: 'PosixPath' object has no attribute 'rstrip'
evansd/whitenoise#192
It might be related to errbotio/errbot#1340
https://bugs.python.org/issue32689
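In the traceback above the destination is already a string, but the source, cleaner.fns[idx], is still a Path, and that is the object on which rstrip is called. A minimal workaround sketch, assuming the Python 3.7 behaviour shown in the traceback, is to convert both arguments to str before calling shutil.move. The file and folder names below are hypothetical and a temporary directory stands in for the dataset path.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "bear.jpg"          # hypothetical file name
    src.write_bytes(b"fake image")
    dst_dir = root / "grizzly"       # hypothetical category folder
    dst_dir.mkdir()

    # Passing str(...) for both arguments avoids calling rstrip on a Path.
    shutil.move(str(src), str(dst_dir))
    moved = (dst_dir / "bear.jpg").exists() and not src.exists()
print(moved)
```

Applied to the notebook line, the same idea would read shutil.move(str(cleaner.fns[idx]), str(path/cat)).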
It just pops up an authentication dialog box with no other information. I realize this external site is not controlled by the authors, but a more accessible site would make for a better link.
Typo: change "benn" to "been" on 7th paragraph under Neural Network History in Chapter 1
"Couple" is grammatically correct, but I believe "tuple" was intended because each mini-batch is a tuple of a couple of x's and y's.
From the nb:
How do we convert from a DataFrame object to a DataLoaders object? We generally suggest using the data block API for creating a DataLoaders object, where possible, since it provides a good mix of flexibility and simplicity. Here we will show you the steps that we take to use the data blocks API to construct a DataLoaders object in practice, using this dataset as an example.
As we have seen, PyTorch and fastai have two main classes for representing and accessing a training set or validation set:
Dataset:: A collection that returns a tuple of your independent and dependent variable for a single item
DataLoader:: An iterator that provides a stream of mini-batches, where each mini-batch is a couple of a batch of independent variables and a batch of dependent variables
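The tuple structure described in the quoted passage can be illustrated without PyTorch. The sketch below is a plain-Python stand-in, not the real Dataset or DataLoader classes, and make_batches is a hypothetical helper. Each dataset item is an (independent, dependent) tuple, and each mini-batch is a two-element tuple holding a batch of x's and a batch of y's.

```python
def make_batches(dataset, bs):
    """Yield mini-batches as (batch of x, batch of y) tuples."""
    for start in range(0, len(dataset), bs):
        items = dataset[start:start + bs]
        xs = [x for x, _ in items]
        ys = [y for _, y in items]
        yield xs, ys

# A toy "dataset": indexing returns an (independent, dependent) tuple.
dataset = [(i, i * i) for i in range(5)]
batches = list(make_batches(dataset, bs=2))
print(batches[0])
```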
I see some font inconsistencies. Should I open a PR for those?
", we can just make it twice as bit, so we" includes a spelling error.
Note: I've been doing these through pull requests, but there are enough duplicate corrections piling up that I feel it's not a good place any longer. Should I move to issues? Alternatively, if contributors could create meaningful PR titles and comments it would be easier to see if the fixes have already been committed.
While running the 5th notebook, I got an error from this line:
dls1 = dblock1.dataloaders([(Path.cwd()/'images'/'grizzly.jpg')]*100, bs=8)
[Errno 2] No such file or directory: '/content/images/grizzly.jpg'
I believe the word "large" or "wide" was accidentally omitted from the sentence:
"Neural networks are special because they are highly flexible, which means they can solve an unusually range of problems just by finding the right weights."
Feel Free to comment the benefits everyone can have by this.
We often talk to people who overestimate both the constraints, and the capabilities of deep learning. Both of these can be problems: underestimating the capabilities means that you might not even try things which could be very beneficial; underestimating the constraints might mean that you fail to consider and react to important issues.
Issue: In the text above, it seems the authors mean to say "We often talk to people who underestimate", but the actual text says "overestimate". This may need correcting if that is the case.
I might be wrong in inferring this, but I feel that what the text is trying to say and what is actually written are different.
Hi, I think there is a typo in the section 'taking the log' in chapter 5.
we should have:
y = b**a to get a = log(y,b)
Best regards
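The relationship in the proposed correction can be checked numerically. This is a small sketch using Python's standard math.log(x, base); the values of b and a are arbitrary.

```python
import math

b, a = 2.0, 5.0
y = b ** a                    # y = b**a
recovered = math.log(y, b)    # a = log(y, b), up to floating-point rounding
print(math.isclose(recovered, a))
```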
Maybe add a bibtex entry to README.md for citations?
@book{gugger2020deep,
  title={Deep Learning for Coders with Fastai and Pytorch: AI Applications Without a PhD},
  author={Gugger, S. and Howard, J.},
  isbn={9781492045526},
  url={https://books.google.no/books?id=xd6LxgEACAAJ},
  year={2020},
  publisher={O'Reilly Media, Incorporated}
}
Typo: change "Trianing Time" to "Training Time"
before "Sidebar: This Book Was Written in Jupyter Notebooks" in Chapter 1
I am not sure what this sentence means:
Picking a subword vocab size represents a compromise: a larger vocab means more fewer tokens per sentence, which means faster training, less memory, and less state for the model to remember; but on the downside, it means larger embedding matrices, which require more data to learn.
dls = ImageDataLoaders.from_folder(path, num_workers=0)
learn = cnn_learner(dls, resnet18, pretrained=False,
loss_func=F.cross_entropy, metrics=accuracy)
learn.fit_one_cycle(1, 0.1)
RuntimeError Traceback (most recent call last)
in
2 learn = cnn_learner(dls, resnet18, pretrained=False,
3 loss_func=F.cross_entropy, metrics=accuracy)
----> 4 learn.fit_one_cycle(1, 0.1)
~\anaconda3\envs\fastai2\lib\site-packages\fastcore\utils.py in _f(*args, **kwargs)
429 init_args.update(log)
430 setattr(inst, 'init_args', init_args)
--> 431 return inst if to_return else f(*args, **kwargs)
432 return _f
433
~\anaconda3\envs\fastai2\lib\site-packages\fastai2\callback\schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
111 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
112 'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 113 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
114
115 # Cell
~\anaconda3\envs\fastai2\lib\site-packages\fastcore\utils.py in _f(*args, **kwargs)
429 init_args.update(log)
430 setattr(inst, 'init_args', init_args)
--> 431 return inst if to_return else f(*args, **kwargs)
432 return _f
433
~\anaconda3\envs\fastai2\lib\site-packages\fastai2\learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
201 try:
202 self.epoch=epoch; self('begin_epoch')
--> 203 self._do_epoch_train()
204 self._do_epoch_validate()
205 except CancelEpochException: self('after_cancel_epoch')
~\anaconda3\envs\fastai2\lib\site-packages\fastai2\learner.py in _do_epoch_train(self)
173 try:
174 self.dl = self.dls.train; self('begin_train')
--> 175 self.all_batches()
176 except CancelTrainException: self('after_cancel_train')
177 finally: self('after_train')
~\anaconda3\envs\fastai2\lib\site-packages\fastai2\learner.py in all_batches(self)
151 def all_batches(self):
152 self.n_iter = len(self.dl)
--> 153 for o in enumerate(self.dl): self.one_batch(*o)
154
155 def one_batch(self, i, b):
~\anaconda3\envs\fastai2\lib\site-packages\fastai2\learner.py in one_batch(self, i, b)
159 self.pred = self.model(*self.xb); self('after_pred')
160 if len(self.yb) == 0: return
--> 161 self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
162 if not self.training: return
163 self.loss.backward(); self('after_backward')
~\anaconda3\envs\fastai2\lib\site-packages\torch\nn\functional.py in cross_entropy(input, target, weight, size_average, ignore_index, reduce, reduction)
2420 if size_average is not None or reduce is not None:
2421 reduction = _Reduction.legacy_get_string(size_average, reduce)
-> 2422 return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
2423
2424
~\anaconda3\envs\fastai2\lib\site-packages\torch\nn\functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
2216 .format(input.size(0), target.size(0)))
2217 if dim == 2:
-> 2218 ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
2219 elif dim == 4:
2220 ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'target' in call to _thnn_nll_loss_forward
RuntimeError Traceback (most recent call last)
in
9
10 learn = cnn_learner(dls, resnet34, metrics=error_rate)
---> 11 learn.fine_tune(1)
d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\fastai2\callback\schedule.py in fine_tune(self, epochs, base_lr, freeze_epochs, lr_mult, pct_start, div, **kwargs)
155 "Fine tune with `freeze` for `freeze_epochs` then with `unfreeze` from `epochs` using discriminative LR"
156 self.freeze()
--> 157 self.fit_one_cycle(freeze_epochs, slice(base_lr), pct_start=0.99, **kwargs)
158 base_lr /= 2
159 self.unfreeze()
d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\fastai2\callback\schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
110 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
111 'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 112 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
113
114 # Cell
d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\fastai2\learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
190 try:
191 self.epoch=epoch; self('begin_epoch')
--> 192 self._do_epoch_train()
193 self._do_epoch_validate()
194 except CancelEpochException: self('after_cancel_epoch')
d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\fastai2\learner.py in _do_epoch_train(self)
163 try:
164 self.dl = self.dls.train; self('begin_train')
--> 165 self.all_batches()
166 except CancelTrainException: self('after_cancel_train')
167 finally: self('after_train')
d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\fastai2\learner.py in all_batches(self)
141 def all_batches(self):
142 self.n_iter = len(self.dl)
--> 143 for o in enumerate(self.dl): self.one_batch(*o)
144
145 def one_batch(self, i, b):
d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\fastai2\data\load.py in __iter__(self)
95 self.randomize()
96 self.before_iter()
---> 97 for b in _loaders[self.fake_l.num_workers==0](self.fake_l):
98 if self.device is not None: b = to_device(b, self.device)
99 yield self.after_batch(b)
d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\torch\utils\data\dataloader.py in __init__(self, loader)
717 # before it starts, and __del__ tries to join but will get:
718 # AssertionError: can only join a started process.
--> 719 w.start()
720 self._index_queues.append(index_queue)
721 self._workers.append(w)
d:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\process.py in start(self)
110 'daemonic processes are not allowed to have children'
111 _cleanup()
--> 112 self._popen = self._Popen(self)
113 self._sentinel = self._popen.sentinel
114 # Avoid a refcycle if the target function holds an indirect
d:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\context.py in _Popen(process_obj)
221 @staticmethod
222 def _Popen(process_obj):
--> 223 return _default_context.get_context().Process._Popen(process_obj)
224
225 class DefaultContext(BaseContext):
d:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\context.py in _Popen(process_obj)
320 def _Popen(process_obj):
321 from .popen_spawn_win32 import Popen
--> 322 return Popen(process_obj)
323
324 class SpawnContext(BaseContext):
d:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\popen_spawn_win32.py in __init__(self, process_obj)
63 try:
64 reduction.dump(prep_data, to_child)
---> 65 reduction.dump(process_obj, to_child)
66 finally:
67 set_spawning_popen(None)
d:\ProgramData\Anaconda3\envs\pytorch\lib\multiprocessing\reduction.py in dump(obj, file, protocol)
58 def dump(obj, file, protocol=None):
59 '''Replacement for pickle.dump() using ForkingPickler.'''
---> 60 ForkingPickler(file, protocol).dump(obj)
61
62 #
d:\ProgramData\Anaconda3\envs\pytorch\lib\site-packages\torch\multiprocessing\reductions.py in reduce_tensor(tensor)
240 ref_counter_offset,
241 event_handle,
--> 242 event_sync_required) = storage.share_cuda()
243 tensor_offset = tensor.storage_offset()
244 shared_cache[handle] = StorageWeakRef(storage)
RuntimeError: cuda runtime error (801) : operation not supported at C:\w\1\s\windows\pytorch\torch/csrc/generic/StorageSharing.cpp:245
The confusion comes from the line:
"So to recap, at every epoch we shuffle our collection of documents to pick one document, and then we transform that one into a stream of tokens."
But what I understood from reading the rest of the chapter is that we shuffle all the documents, combine all of them together, and then create a stream of tokens. The following line appears earlier in the book:
"First we concatenate all of the documents in our dataset into one big long string"
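The reading described in this report (shuffle all documents, concatenate them, then produce one token stream) can be sketched as follows. This reflects the reporter's interpretation, not a confirmed description of fastai's implementation. token_stream is a hypothetical helper, and whitespace splitting stands in for a real tokenizer.

```python
import random

def token_stream(docs, seed=None):
    """Shuffle documents, concatenate them, and return one token list."""
    order = list(docs)
    random.Random(seed).shuffle(order)
    stream = []
    for doc in order:
        stream.extend(doc.split())   # whitespace split stands in for a real tokenizer
    return stream

docs = ["the movie was great", "terrible plot", "i liked the cast"]
stream = token_stream(docs, seed=0)
print(len(stream))
```

Every document contributes to the stream on every pass; only the order of documents changes between shuffles.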
Can I translate it to Chinese in my GitHub repo as study material? Of course, not for commercial use.
Hello,
In the paragraph Binary Cross-Entropy in the notebook 06_multicat.ipynb, it is written:
We don't actually need to tell fastai to use this loss function (although we can if we want) since it will be automatically chosen for us. fastai knows that the DataLoaders has multiple category labels, so it will use nn.BCEWithLogitsLoss by default.
In fact, the default loss function in this case is not nn.BCEWithLogitsLoss but BCEWithLogitsLossFlat().
In chapter 15, maybe the first ResBlock (def forward(self, x): return x + self.convs(x)) should be fixed to return the relu of that sum, with the relu dropped from the 2nd ConvLayer?
i.e.
class ResBlock(Module):
    def __init__(self, ni, nf):
        self.convs = nn.Sequential(
            ConvLayer(ni,nf),
            ConvLayer(nf,nf, norm_type=NormType.BatchZero))
    def forward(self, x): return x + self.convs(x)

class ResBlock(Module):
    def __init__(self, ni, nf):
        self.convs = nn.Sequential(
            ConvLayer(ni,nf),
            ConvLayer(nf,nf, act_cls=None, norm_type=NormType.BatchZero))
    def forward(self, x): return F.relu(x + self.convs(x))
Can we send PRs to this repo ?
Edit:
I have just seen
Also, you'll see that we've removed relu (act_cls=None) from the final convolution in convs and from idconv, and moved it to after we add the skip connection. The thinking behind this is that the whole ResNet block is like a layer, and you want your activation to be after your layer.
Maybe this issue can be closed?
When I run the first text classifier example
#id training2
#caption Training loop in a text application
from fastai2.text.all import *
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)
The network trains on my 1080Ti for a few minutes and the GPU is utilized. Midway through the second epoch it fails:
epoch | train_loss | valid_loss | accuracy | time
-- | -- | -- | -- | --
0 | 0.597115 | 0.438984 | 0.804160 | 01:34
epoch | train_loss | valid_loss | accuracy | time
-- | -- | -- | -- | --
0 | 0.394730 | 00:40
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-15-772d141f2ac2> in <module>
5 dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
6 learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
----> 7 learn.fine_tune(4, 1e-2)
~/fastai2/fastai2/callback/schedule.py in fine_tune(self, epochs, base_lr, freeze_epochs, lr_mult, pct_start, div, **kwargs)
157 self.fit_one_cycle(freeze_epochs, slice(base_lr*2), pct_start=0.99, **kwargs)
158 self.unfreeze()
--> 159 self.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)
160
161 # Cell
~/fastai2/fastai2/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
110 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
111 'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 112 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
113
114 # Cell
~/fastai2/fastai2/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
174 try:
175 self.epoch=epoch; self('begin_epoch')
--> 176 self._do_epoch_train()
177 self._do_epoch_validate()
178 except CancelEpochException: self('after_cancel_epoch')
~/fastai2/fastai2/learner.py in _do_epoch_train(self)
147 try:
148 self.dl = self.dls.train; self('begin_train')
--> 149 self.all_batches()
150 except CancelTrainException: self('after_cancel_train')
151 finally: self('after_train')
~/fastai2/fastai2/learner.py in all_batches(self)
125 def all_batches(self):
126 self.n_iter = len(self.dl)
--> 127 for o in enumerate(self.dl): self.one_batch(*o)
128
129 def one_batch(self, i, b):
~/fastai2/fastai2/learner.py in one_batch(self, i, b)
135 self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
136 if not self.training: return
--> 137 self.loss.backward(); self('after_backward')
138 self.opt.step(); self('after_step')
139 self.opt.zero_grad()
~/miniconda3/envs/data-science-stack-2.1.0/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
193 products. Defaults to ``False``.
194 """
--> 195 torch.autograd.backward(self, gradient, retain_graph, create_graph)
196
197 def register_hook(self, hook):
~/miniconda3/envs/data-science-stack-2.1.0/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
97 Variable._execution_engine.run_backward(
98 tensors, grad_tensors, retain_graph, create_graph,
---> 99 allow_unreachable=True) # allow_unreachable flag
100
101
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
If I try to rerun the cell it fails immediately with
epoch | train_loss | valid_loss | accuracy | time
-- | -- | -- | -- | --
0 | 0.975069 | 00:01
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-16-772d141f2ac2> in <module>
5 dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
6 learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
----> 7 learn.fine_tune(4, 1e-2)
~/fastai2/fastai2/callback/schedule.py in fine_tune(self, epochs, base_lr, freeze_epochs, lr_mult, pct_start, div, **kwargs)
155 "Fine tune with `freeze` for `freeze_epochs` then with `unfreeze` from `epochs` using discriminative LR"
156 self.freeze()
--> 157 self.fit_one_cycle(freeze_epochs, slice(base_lr*2), pct_start=0.99, **kwargs)
158 self.unfreeze()
159 self.fit_one_cycle(epochs, slice(base_lr/lr_mult, base_lr), pct_start=pct_start, div=div, **kwargs)
~/fastai2/fastai2/callback/schedule.py in fit_one_cycle(self, n_epoch, lr_max, div, div_final, pct_start, wd, moms, cbs, reset_opt)
110 scheds = {'lr': combined_cos(pct_start, lr_max/div, lr_max, lr_max/div_final),
111 'mom': combined_cos(pct_start, *(self.moms if moms is None else moms))}
--> 112 self.fit(n_epoch, cbs=ParamScheduler(scheds)+L(cbs), reset_opt=reset_opt, wd=wd)
113
114 # Cell
~/fastai2/fastai2/learner.py in fit(self, n_epoch, lr, wd, cbs, reset_opt)
174 try:
175 self.epoch=epoch; self('begin_epoch')
--> 176 self._do_epoch_train()
177 self._do_epoch_validate()
178 except CancelEpochException: self('after_cancel_epoch')
~/fastai2/fastai2/learner.py in _do_epoch_train(self)
147 try:
148 self.dl = self.dls.train; self('begin_train')
--> 149 self.all_batches()
150 except CancelTrainException: self('after_cancel_train')
151 finally: self('after_train')
~/fastai2/fastai2/learner.py in all_batches(self)
125 def all_batches(self):
126 self.n_iter = len(self.dl)
--> 127 for o in enumerate(self.dl): self.one_batch(*o)
128
129 def one_batch(self, i, b):
~/fastai2/fastai2/learner.py in one_batch(self, i, b)
135 self.loss = self.loss_func(self.pred, *self.yb); self('after_loss')
136 if not self.training: return
--> 137 self.loss.backward(); self('after_backward')
138 self.opt.step(); self('after_step')
139 self.opt.zero_grad()
~/miniconda3/envs/data-science-stack-2.1.0/lib/python3.7/site-packages/torch/tensor.py in backward(self, gradient, retain_graph, create_graph)
193 products. Defaults to ``False``.
194 """
--> 195 torch.autograd.backward(self, gradient, retain_graph, create_graph)
196
197 def register_hook(self, hook):
~/miniconda3/envs/data-science-stack-2.1.0/lib/python3.7/site-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
97 Variable._execution_engine.run_backward(
98 tensors, grad_tensors, retain_graph, create_graph,
---> 99 allow_unreachable=True) # allow_unreachable flag
100
101
RuntimeError: cuda runtime error (400) : invalid resource handle at /opt/conda/conda-bld/pytorch_1579022060824/work/aten/src/THC/generic/THCTensorMath.cu:35
The strange thing is that I ran this cell yesterday with the same conda environment and everything worked.
I'm using CUDA 10.1.
→ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
This is the output of nvidia-smi during the training right before failure
Sat Feb 29 11:17:50 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59 Driver Version: 440.59 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 108... Off | 00000000:01:00.0 On | N/A |
| 0% 49C P2 198W / 280W | 11138MiB / 11177MiB | 96% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1222 G /usr/lib/xorg/Xorg 1690MiB |
| 0 1904 G /usr/bin/compiz 41MiB |
| 0 3764 G ...uest-channel-token=10439615552343491592 42MiB |
| 0 4391 C ...nvs/data-science-stack-2.1.0/bin/python 9351MiB |
+-----------------------------------------------------------------------------+
Thanks for making this prerelease available! Happy to provide more info if it helps.
The graphs are missing but the caption is there:
Now we look to see what would happen if we increased or decreased our parameter by a little bit — the adjustment. This is simply the slope at a particular point:
A graph showing the squared function with the slope at one point
We can change our weight by a little in the direction of the slope, calculate our loss and adjustment again, and repeat this a few times. Eventually, we will get to the lowest point on our curve:
An illustration of gradient descent
It looks like the image links are broken?
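While the graphs are missing, the process the two captions describe can be sketched numerically. This is a minimal illustration on the squared function; the starting point, step size, and number of steps are arbitrary choices, not values from the notebook.

```python
def f(x): return x ** 2          # the squared function from the caption
def slope(x): return 2 * x       # its derivative, the slope at a point

x, lr = 3.0, 0.1                 # starting point and step size are arbitrary choices
history = [x]
for _ in range(50):
    x -= lr * slope(x)           # step downhill, against the sign of the slope
    history.append(x)
print(abs(x) < 1e-3)
```

Each step shrinks x by a constant factor here, so the value approaches the lowest point of the curve at x = 0.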
Line 1591:
Where it says "Let's do the same thing for the sevens, but let's put all the steps together at once to save some time:", there is nothing done differently with the sevens than with the threes. The steps are not any more "together" with the sevens example than with the example code of the threes.
Text in the notebook: Our weights need to be improved. To do this, we take a few data items (such as images) that and feed them to our model
Issue: In the sentence quoted above, the word "that" does not fit the context. Either something is missing after "that", or "that" is a typo.
It tells us not just whether a movie is of a kind that people tend not to enjoy watching, but that people type like watching it even if it is of a kind that they would otherwise enjoy!
Should be
It tells us not just whether a movie is of a kind that people tend not to enjoy watching, but that people type do not like watching it even if it is of a kind that they would otherwise enjoy!
I understand that the license restricts the modification of the notebooks, but if you ever consider translating it to Brazilian Portuguese, I can help! Thanks!
I'm running the latest version in Paperspace, and I see the following comment in the book;
Full and Stripped Notebooks: There are two folders containing different versions of the notebooks. The full folder contains the exact notebooks used to create the book you're reading now, with all the prose and outputs. The stripped version has the same headings and code cells, but all outputs and prose have been removed.
Unfortunately, not only is there no course-v4/nbs/full or course-v4/nbs/stripped folders in the repo - which makes sense because the course is changing all the time, maybe they don't exist anymore - but it seems that even inside of the fastbook/clean folder, the outputs are still visible.
This is an image of what it looks like to open course-v4/nbs/14_resnet;
Here is what it looks like to open fastbook/clean/14_resnet
Here is fastbook/14_resnet
Note, each example has the outputs pre-displayed below the cell. (The four images)
I may be in the minority here, but when the notebooks show the output, it really takes away the magic of seeing everything "going". To play devil's advocate (this obviously doesn't apply to people very focused on learning the content): why run the model if I already see 16 cat images with probabilities below them?
Essentially, I don't want to be "spoiled", I want to actually wait the 20 minutes to train some big model and get rewarded the hard way. The only thing better than that I would say is a function definition with some comments;
# Cell
def train_new_model():
    batch_size = 12
    # 1. Make sure you load in the learner!
    # 2. Make sure you've got the data ready from "some_global"!
    # 3. Begin training and print the outputs - hint: docs.fastai.com/logging/print_stuff
I want hardmode, and since there's nothing like the above right now, where can I find the stripped notebooks?
There are also "agglutinative languages", like Polish, which can add many morphemes together to create very long "words" which include a lot of separate pieces of information.
Polish does not work this way. Perhaps you meant Turkish.
When I run the following code (from ch1 intro), I get an error message.
from fastai2.text.all import *
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)
AttributeError: Can't pickle local object 'parallel_gen.<locals>.f'
Would you please let me know how to fix this problem? Thanks in advance.
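The error message itself comes from Python's pickle module, which cannot serialize a function defined inside another function. The sketch below reproduces that behaviour in isolation. make_local and f are hypothetical names, and the link to this report is an assumption: worker processes that are started by spawning (the default on Windows) receive their work via pickle, so a local function such as the one named in the message would fail this way.

```python
import pickle

def make_local():
    def f(x): return x + 1       # defined inside another function
    return f

try:
    pickle.dumps(make_local())
    failed = False
except (AttributeError, pickle.PicklingError):
    # The exception type differs between Python versions, so both are caught.
    failed = True
print(failed)
```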
When I try to run the code in the first module, I get this error. I have already installed fastai, but when I try to install fastai2 with pip I get: ERROR: No matching distribution found for torch>=1.3.0 (from fastai2)
How can I make them work in Colab, given that it uses the azure module?
Or can I remove the azure module?
Thank you.
Totally likely that this is my incompetence. But it seems that the code in the Intro notebook does not work out of the box on Windows. This would probably at least require a warning for (the few) Windows users among the book buyers.
The first little classifier fails with:
This is the most related info I could find:
https://pytorch.org/docs/stable/notes/windows.html#cuda-ipc-operations
However, the Tuple class does not exist in fastaiv2 or some comments said it is deleted.
To start working with the code right away and share them. I've added it in #161.
Line 2225 says:
Next in mnist_distance we see abs(). You might be able to guess now what this does when applied to a tensor. It applies the method to each individual element in the tensor, and returns a tensor of the results (that is, it applies the method "elementwise"). So in this case, we'll get back 1,010 absolute values.
The shape of the returned tensor after abs() is actually 1,010 by 28 by 28, so there would be 1010*28*28 (791,840) absolute values, not 1,010.
This could be fixed by changing it to say "1,010 matrices of absolute values"
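The count can be checked on a small stand-in. The sketch below uses nested Python lists in place of a tensor, and count_elements and elementwise_abs are hypothetical helpers. An elementwise operation returns one value per scalar, so the number of results is the product of the shape's dimensions.

```python
def count_elements(t):
    """Count scalars in a nested list (a stand-in for a tensor's element count)."""
    return sum(count_elements(s) for s in t) if isinstance(t, list) else 1

def elementwise_abs(t):
    """Apply abs() to every scalar, keeping the nested shape."""
    return [elementwise_abs(s) for s in t] if isinstance(t, list) else abs(t)

small = [[[-1, 2], [3, -4]], [[5, -6], [-7, 8]], [[-9, 10], [11, -12]]]  # shape 3 x 2 x 2
result = elementwise_abs(small)
small_count = count_elements(result)   # 3 * 2 * 2 values, one per element
full_count = 1010 * 28 * 28            # the shape discussed in this report
print(small_count, full_count)
```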