radekosmulski / whale

whale's Introduction

Humpback Whale Identification Competition Starter Pack

The code in this repo is all you need to make a first submission to the Humpback Whale Identification Competition. It uses fastai library release 1.0.36.post1 for everything up to point 7 in the "Navigating through the repository" list below (this is important: you are likely to encounter an error if you use any other version of the library). From that point on I switch to 1.0.39.

For additional information please refer to discussion threads on Kaggle forums: classification, feature learning, detection.

Some people reported issues with running the first_submission notebook. If you run into problems with it, you should be fine skipping to the subsequent notebooks. The one that scores 0.760 on the LB is only_known_train.ipynb.

Making first submission

Install the fastai library, specifically version 1.0.36.post1. The easiest way to do it is to follow the developer install as outlined in the README of the fastai repository. Once you perform the installation, navigate to the fastai directory and execute git checkout 1.0.36.post1. You can verify that this worked by executing the following inside jupyter notebook or a Python REPL:

import fastai
fastai.__version__
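
Because the notebooks are tied to a specific release, a small guard at the top of a notebook can make a version mismatch fail early instead of surfacing as an obscure error later. This is only a minimal sketch: the helper name require_version is hypothetical and is not part of fastai or of this repo.

```python
EXPECTED_FASTAI_VERSION = "1.0.36.post1"  # release named in this README


def require_version(installed, expected=EXPECTED_FASTAI_VERSION):
    """Raise a RuntimeError when the installed version string differs from the expected one."""
    if installed != expected:
        raise RuntimeError(
            f"fastai {expected} is required for these notebooks, found {installed}"
        )


# In a notebook one would call (assuming fastai is installed):
#   import fastai
#   require_version(fastai.__version__)
```

For the later notebooks the expected value would need to change to 1.0.39, as described above.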

Clone this repository. cd into data. Download the competition data by running kaggle competitions download -c humpback-whale-identification. If you get a 403, you might need to accept the competition rules on the competition website first.
Create the train directory and extract the files by running mkdir train && unzip train.zip -d train
Do the same for test: mkdir test && unzip test.zip -d test
Open first_submission.ipynb in jupyter notebook and run all cells.
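
If unzip is not available (for example on Windows), the two extraction steps above can be approximated with Python's standard library. This is a sketch under the assumption that train.zip and test.zip sit in the current directory; the helper name extract_archive is made up for illustration.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Create dest_dir if needed and extract every member of zip_path into it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Return the extracted entry names so the caller can sanity-check the result
    return sorted(p.name for p in dest.iterdir())


# Mirrors `mkdir train && unzip train.zip -d train` and the same for test:
#   extract_archive("train.zip", "train")
#   extract_archive("test.zip", "test")
```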

Navigating through the repository

Here is the order in which I worked on the notebooks:

first_submission - getting all the basics in place
new_whale_detector - binary classifier known_whale / new_whale
oversample - addressing class imbalance
only_known_research - how to modify the architecture and what hyperparams to use
only_known_train - training on full dataset
resize - resize the images before training to free up CPU
siamese network - a fully working prototype of a siamese network
!!! Important !!! - to make use of some of the new functionality available in fast.ai at this point I switch to 1.0.39.
fluke detection - train a model to draw bounding boxes surrounding flukes
!!! Important !!! - here I switch to fastai master to incorporate a bug fix, will annotate with version once a new release comes out
fluke detection redux - better results, less code, works with current fastai master
extract bboxes - predicted bounding box extraction in images of specified size
classification and metric learning - training a model for predicting whale ids, places in top 7% of the competition

whale's Issues

When did you save 'small_lr'?

Thank you so much for your work. I followed your instructions, ran every notebook, and got a score of 0.754.
When I ran "only_known_research.ipynb", I couldn't find a file called "small_lr.pth". I just commented out learn.load('small_lr') and continued training until the end without any problems.

error in the data.show_batch

I got an error from the code below:
data.show_batch(rows=3)
The error is:
RuntimeError: DataLoader worker (pid 38) is killed by signal: Bus error.

Thanks in advance for your help!

error while training

I got an error from the code below, after the first cycle finished:
learn.fit_one_cycle(2)
The error is:
RuntimeError: Could not infer dtype of NoneType

Thanks in advance for your help!

contrastive loss

I tried changing the loss function to contrastive loss. This is the contrastive loss code:
import torch
import torch.nn.functional as F


class ContrastiveLoss(torch.nn.Module):
    """
    Contrastive loss function.
    Based on: http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
    """

    def __init__(self, margin=2.0):
        super(ContrastiveLoss, self).__init__()
        self.margin = margin

    def forward(self, output, label):
        # output[0] and output[1] are indexed as the two batches of embeddings of each pair
        euclidean_distance = F.pairwise_distance(output[0], output[1])
        # label 0 pulls a pair together, label 1 pushes it apart up to the margin
        loss_contrastive = torch.mean(
            (1 - label) * torch.pow(euclidean_distance, 2)
            + label * torch.pow(torch.clamp(self.margin - euclidean_distance, min=0.0), 2)
        )
        return loss_contrastive

But I got this error when running learn.lr_find():
Dimension out of range (expected to be in range of [-1, 0], but got 1)
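
Written out, the loss computed by the forward method above is the following. The symbols are introduced here only for illustration: $x^{(1)}_i$ and $x^{(2)}_i$ stand for the two embeddings of pair $i$ (output[0] and output[1] in the code), $y_i$ for its label, $m$ for the margin and $N$ for the batch size. In this code a label of 0 marks a pair to pull together and a label of 1 a pair to push apart. The distance is stated up to the small epsilon that F.pairwise_distance adds for numerical stability.

```latex
d_i = \lVert x^{(1)}_i - x^{(2)}_i \rVert_2, \qquad
L = \frac{1}{N} \sum_{i=1}^{N} \Big[ (1 - y_i)\, d_i^{2} + y_i \max(0,\; m - d_i)^{2} \Big]
```

The formula assumes that both embeddings arrive as batches of shape (N, D), so that each $d_i$ is one distance per pair.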

Using models outside fastai

First of all, thanks for your efforts. I wanted to ask how to use models defined outside create_cnn. I have asked this question on the forum, as have others, but it hasn't been addressed well. I have a generic PyTorch model (inheriting from nn.Module) similar to what torchvision provides (se_resnext101 etc.). How do I use it with create_cnn? I looked at the docs on customizing the model, but create_body and create_head aren't very useful here. It's very cumbersome to split the model into a head and a body as the library expects. Is there a more efficient way that you know of?

An error in siamese_prototype

I am on fastai version '1.0.36.post1'. In fact, 1.0.39 doesn't work either.

In the Predict part:

data = (
    ImageItemList
        .from_df(df, 'data/train_crop', cols=['Image'])
#         .no_split()
        .split_by_valid_func(lambda path: path2fn(path) in {'69823499d.jpg'}) # in newer version of the fastai library there is .no_split that could be used here
        .label_from_func(lambda path: fn2label[path2fn(path)], classes=classes)
        .add_test(ImageItemList.from_folder('data/test_crop'))
        .transform(tfms=None, size=SZ, resize_method=ResizeMethod.SQUISH)
        .databunch(bs=BS, num_workers=NUM_WORKERS, path='data')
        .normalize(imagenet_stats)
)

Creating the data produces this warning:
UserWarning: It's not possible to collate samples of your dataset together in a batch.

and it showed:
Shapes of the inputs/targets:
[[torch.Size([3, 349, 998]), torch.Size([3, 304, 797]), torch.Size([3, 258, 986]), torch.Size([3, 298, 970]), torch.Size([3, 176, 589]), torch.Size([3, 368, 1042]), torch.Size([3, 167, 651]), torch.Size([3, 314, 1024]), torch.Size([3, 238, 1044]), torch.Size([3, 506, 1004]), torch.Size([3, 340, 921]), torch.Size([3, 290, 966]), torch.Size([3, 270, 1010]), torch.Size([3, 329, 529]), torch.Size([3, 264, 941]), torch.Size([3, 411, 848]), torch.Size([3, 417, 1018]), torch.Size([3, 180, 891]), torch.Size([3, 274, 926]), torch.Size([3, 406, 995]), torch.Size([3, 339, 1009]), torch.Size([3, 238, 1031]), torch.Size([3, 221, 805]), torch.Size([3, 200, 573]), torch.Size([3, 299, 1018]), torch.Size([3, 319, 1042]), torch.Size([3, 408, 1014]), torch.Size([3, 184, 957]), torch.Size([3, 375, 1009]), torch.Size([3, 410, 1024]), torch.Size([3, 472, 911]), torch.Size([3, 226, 695])], [(), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), (), ()]]

and the next code block fails with the error shown after it:

test_feats = []
learn.model.eval()
for ims, _ in data.test_dl:
    test_feats.append(learn.model.process_features(learn.model.cnn(ims)).detach().cpu())

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 303 and 370 in dimension 2 at ...

Am I the only one who has this problem? Maybe it is because I used the cropped images? I don't think so.
The previous two attempts to create data worked fine. Do I need to convert all images to RGB?

MAP@5

Can someone give an intuitive explanation of what this evaluation metric actually is and what it signifies?
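
As a rough illustration of the metric: each image has one correct label and a submission may list up to five guesses in order of confidence. An image scores 1 if the first guess is right, 1/2 if the second is, down to 1/5 for the fifth, and 0 if none of the five is right. MAP@5 is the mean of those per-image scores. The sketch below assumes exactly one true label per image; the function name map_at_5 and the sample labels are made up for illustration.

```python
def map_at_5(true_labels, predictions):
    """Mean average precision at 5 when each item has exactly one correct label."""
    total = 0.0
    for truth, ranked in zip(true_labels, predictions):
        # Only the first five guesses count; the score is 1 / rank of the correct one
        for rank, guess in enumerate(ranked[:5], start=1):
            if guess == truth:
                total += 1.0 / rank
                break
    return total / len(true_labels)


truths = ["w_a", "w_b", "new_whale"]
guesses = [
    ["w_a", "w_x"],   # correct at rank 1 -> 1.0
    ["w_x", "w_b"],   # correct at rank 2 -> 0.5
    ["w_y", "w_z"],   # not found        -> 0.0
]
print(map_at_5(truths, guesses))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

Intuitively, the metric rewards putting the right whale as high as possible in the list, and a correct guess in fifth place is worth only a fifth of a correct first guess.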

IndexError: index 0 is out of bounds for axis 0 with size 0

Hello,

First, thank you so VERY MUCH for uploading your code to GitHub. This has been very helpful for me.

I am getting the error shown below when running first_submission. I confirmed that I am running the correct fastai version.

I'm running:
nvidia driver ver 415.27
cuda version 10.0.
anaconda ver 1.7.2

Not sure if these conflict with what you did.

Any ideas?

IndexError: Traceback (most recent call last):
  File "/home/devon/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/devon/anaconda3/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/devon/fastai/fastai/data_block.py", line 492, in __getitem__
    if self.item is None: x,y = self.x[idxs],self.y[idxs]
  File "/home/devon/fastai/fastai/data_block.py", line 92, in __getitem__
    if isinstance(try_int(idxs), int): return self.get(idxs)
  File "/home/devon/fastai/fastai/vision/data.py", line 265, in get
    fn = super().get(i)
  File "/home/devon/fastai/fastai/data_block.py", line 58, in get
    return self.items[i]

Error while creating data

Hi, I am getting the error below while running the following code.

code:

data = (
    ImageItemList
        .from_folder('data/whale/input/train')
        .random_split_by_pct(seed=SEED)
        .label_from_func(lambda path: fn2label[path.name])
        .add_test(ImageItemList.from_folder('data/whale/input/test'))
        .transform(get_transforms(do_flip=False, max_zoom=1, max_warp=0, max_rotate=2), size=SZ, resize_method=ResizeMethod.SQUISH)
        .databunch(bs=BS, num_workers=NUM_WORKERS, path='data/whale/input')
)

error:

KeyError Traceback (most recent call last)
~\fastai1\fastai\courses\dl1\fastai\data_block.py in process_one(self, item)
277 def process_one(self,item):
--> 278 try: return self.c2i[item] if item is not None else None
279 except:

KeyError: 'w_d8a08f8'

During handling of the above exception, another exception occurred:

Exception Traceback (most recent call last)
in
3 .from_folder('data/whale/input/train')
4 .random_split_by_pct(seed=SEED)
----> 5 .label_from_func(lambda path: fn2label[path.name])
6 .add_test(ImageItemList.from_folder('data/whale/input/test'))
7 .transform(get_transforms(do_flip=False, max_zoom=1, max_warp=0, max_rotate=2), size=SZ, resize_method=ResizeMethod.SQUISH)

~\fastai1\fastai\courses\dl1\fastai\data_block.py in _inner(*args, **kwargs)
391 self.valid = fv(*args, **kwargs)
392 self.__class__ = LabelLists
--> 393 self.process()
394 return self
395 return _inner

~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self)
438 "Process the inner datasets."
439 xp,yp = self.get_processors()
--> 440 for i,ds in enumerate(self.lists): ds.process(xp, yp, filter_missing_y=i==0)
441 return self
442

~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self, xp, yp, filter_missing_y)
563 def process(self, xp=None, yp=None, filter_missing_y:bool=False):
564 "Launch the processing on self.x and self.y with xp and yp."
--> 565 self.y.process(yp)
566 if filter_missing_y and (getattr(self.x, 'filter_missing_y', None)):
567 filt = array([o is None for o in self.y])

~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self, processor)
66 if processor is not None: self.processor = processor
67 self.processor = listify(self.processor)
---> 68 for p in self.processor: p.process(self)
69 return self
70

~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self, ds)
284 ds.classes = self.classes
285 ds.c2i = self.c2i
--> 286 super().process(ds)
287
288 def __getstate__(self): return {'classes':self.classes}

~\fastai1\fastai\courses\dl1\fastai\data_block.py in process(self, ds)
36 def __init__(self, ds:Collection=None): self.ref_ds = ds
37 def process_one(self, item:Any): return item
---> 38 def process(self, ds:Collection): ds.items = array([self.process_one(item) for item in ds.items])
39
40 class ItemList():

~\fastai1\fastai\courses\dl1\fastai\data_block.py in <listcomp>(.0)
36 def __init__(self, ds:Collection=None): self.ref_ds = ds
37 def process_one(self, item:Any): return item
---> 38 def process(self, ds:Collection): ds.items = array([self.process_one(item) for item in ds.items])
39
40 class ItemList():

~\fastai1\fastai\courses\dl1\fastai\data_block.py in process_one(self, item)
278 try: return self.c2i[item] if item is not None else None
279 except:
--> 280 raise Exception("Your validation data contains a label that isn't present in the training set, please fix your data.")
281
282 def process(self, ds):

Exception: Your validation data contains a label that isn't present in the training set, please fix your data.

Thanks in advance for your help!
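
The exception message says that some label ended up in the validation split without appearing in the training split, which can happen with a random split when a whale has very few images. Before building the data, one can list such labels with plain Python. This is a sketch: the helper name labels_missing_from_train and the sample labels other than w_d8a08f8 are hypothetical.

```python
def labels_missing_from_train(train_labels, valid_labels):
    """Return the validation labels that never occur in the training split, sorted."""
    return sorted(set(valid_labels) - set(train_labels))


# Toy split: w_d8a08f8 has a single image and it landed in validation
train = ["w_1", "w_2", "w_1"]
valid = ["w_2", "w_d8a08f8"]
print(labels_missing_from_train(train, valid))  # ['w_d8a08f8']
```

The data block call quoted in the siamese_prototype issue on this page passes an explicit classes=classes argument to label_from_func. Whether supplying the full class list avoids this exception in a given fastai version is something to verify rather than assume.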

use of new_whale_idx in siamese_network_prototype.ipynb:

Ubuntu18 crashes and reboots while training on only_known_train

Quick question - when running your only_known_train notebook in jupyter, my entire O/S crashes and reboots after saving res50-full-train-stage-2 but before the 3rd. I have the 1080 TI with 11GB, and as I was monitoring it before it crashed, it hadn't fully peaked on memory usage yet, so I'm a bit bewildered. I haven't checked server logs yet. Do you suggest I scale back on one of the following:

1.) Image size
2.) batch size

or both? Which should I start reducing first, in your opinion?

ValueError

ValueError: Expected y_max for bbox [0.009947089947089958, 0.14722363655921705, 0.28901822457378046, 1.0871899315142182, 0] to be in the range [0.0, 1.0], got 1.0871899315142182.

I didn't change any of the code, and I really don't have any idea how to fix this bbox coordinates error.
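
The message shows a normalized box whose y_max (about 1.087) lies outside the allowed range [0.0, 1.0]. One possible workaround, offered as an assumption rather than as the fix the notebook author used, is to clamp the four coordinates before they reach the augmentation step. The helper name clip_bbox is hypothetical, and clamping silently hides annotations that extend past the image edge, so it is worth checking why the box is out of range in the first place.

```python
def clip_bbox(bbox):
    """Clamp the four leading coordinates of a normalized bbox to [0.0, 1.0].

    Any trailing fields (for example a class id) are kept unchanged.
    """
    coords = [min(1.0, max(0.0, float(value))) for value in bbox[:4]]
    return coords + list(bbox[4:])


# The box from the error message above
bad_box = [0.009947089947089958, 0.14722363655921705, 0.28901822457378046, 1.0871899315142182, 0]
print(clip_bbox(bad_box))
```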

unable to complete the training with contrastive loss only

Hi Radek,

I finally managed to train my network according to the first part of the "classification_and_metric_learning" notebook.
I managed to train with SZ=224 and then proceeded with SZ=448 with the custom loss function.

But so far, I've been unable to do the last part of the training successfully. The part where you mention: "For the last segment of the training, I used contrastive loss only." Indeed my network seems to not learn anything anymore.
Here is my code:

[...]
learn = Learner(create_fake_data(), CustomModel(), loss_func=contr_loss, metrics=[accuracy_mod, map5_mod])

learn.model_dir = Path(os.path.join(ROOT_DIR, 'data/models/'))
learn = learn.clip_grad()
learn.split((learn.model.cnn[6], learn.model.head))
learn.freeze()

learn.load('s2.3-var1a-oywrks063e'); #loaded from the training with SZ=448


dists = create_similarity_dict(learn.model, basic_dataloader)
learn.data = create_data(SZ, dists, BS)  

print('training started...', flush=True)
learn.fit_one_cycle(20, 1e-3)

But neither the validation loss nor the training loss is decreasing... The network does not learn anymore. 😕

Would you have any advice? Did you use a k smaller than 20, eg learn.data = create_data(SZ, dists, BS, k=5)? Do you remember the details of how you trained this last segment (contrastive loss only)?

FYI: parallel() is ready, it could make resize notebook simpler

Hi,

Thank you for sharing these notebooks. They help us so much, and we can spend more time on the problem itself.

I found one small thing that might be useful for you. Please simply close this issue once you have read it; this is just for sharing with you.

This is about notebook resize.ipynb.

Fast.ai has a function parallel (please find in fastai/core.py) which wraps ThreadPoolExecutor.

Then, if you add an extra parameter to resize_img,

def resize_img(path, index):
    PIL.Image.open(path).resize((SZ,SZ), resample=PIL.Image.BICUBIC).save((PATH/f'{path.parent.name}-{SZ}'/path.name))

This will make it a little bit simpler:

parallel(resize_img, files)

Just FYI.
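
The extra index parameter in resize_img suggests that the helper calls the function with each element together with its position. The pattern can be sketched with the standard library alone. Note that run_parallel and fake_resize below are hypothetical stand-ins written for illustration; they are not fastai's parallel, and the worker count is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel(func, items, max_workers=4):
    """Call func(item, index) for every item on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(func, item, index) for index, item in enumerate(items)]
        return [future.result() for future in futures]


def fake_resize(path, index):
    # Stand-in for resize_img: the index argument is accepted but unused
    return f"resized {path}"


print(run_parallel(fake_resize, ["a.jpg", "b.jpg"]))
```

A function that takes only the path would fail under this calling convention, which is why the signature gains the second parameter.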

create_submission function?

Can you provide more information on the create_submission() function you are calling? How are you processing the output tensors?

cannot find your zen_dataset module

Hi Radek,
In the extract_bboxes.ipynb and classification_and_metric_learning.ipynb notebooks you use the zen_dataset library, but it seems that you have deleted it.

Could I get a copy so that I can run these notebooks?

TypeError: image2tensor() got an unexpected keyword argument 'augment_fn' in classification_and_metric_learning notebook

Hi, when I run your code block #18 (shown below) from classification_and_metric_learning.ipynb, I get an error message shown below this code block:

%%time
SZ = 224
NUM_WORKERS = 12
BS = 32

basic_dataloader = create_basic_dataloader(SZ, NUM_WORKERS, BS)
dists = create_similarity_dict(learn.model, basic_dataloader)
data = create_data(SZ, dists, BS)

Error message:
TypeError: Traceback (most recent call last):
  File "/home/devon/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/devon/anaconda3/envs/fastai/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/devon/fastai/kaggle/whale/zen_dataset/zen_dataset/dataset.py", line 10, in __getitem__
    return self.reader(item), self.labeler(item)
  File "", line 8, in __call__
    tensors = [image2tensor(image, augment_fn = self.augment_fn) for image in images]
  File "", line 8, in <listcomp>
    tensors = [image2tensor(image, augment_fn = self.augment_fn) for image in images]
TypeError: image2tensor() got an unexpected keyword argument 'augment_fn'

Here's my env:
=== Software ===
python : 3.6.8
fastai : 1.0.46
fastprogress : 0.1.20
torch : 1.0.1.post2
nvidia driver : 410.104
torch cuda : 10.0.130 / is available
torch cudnn : 7402 / is enabled

=== Hardware ===
nvidia gpus : 1
torch devices : 1

gpu0 : 11175MB | GeForce GTX 1080 Ti

=== Environment ===
platform : Linux-4.15.0-46-generic-x86_64-with-debian-buster-sid
distro : Ubuntu 18.04 bionic
conda env : fastai
python : /home/devon/anaconda3/envs/fastai/bin/python
sys.path :
/home/devon/anaconda3/envs/fastai/lib/python36.zip
/home/devon/anaconda3/envs/fastai/lib/python3.6
/home/devon/anaconda3/envs/fastai/lib/python3.6/lib-dynload
/home/devon/.local/lib/python3.6/site-packages
/home/devon/anaconda3/envs/fastai/lib/python3.6/site-packages
/home/devon/fastai/kaggle/whale/zen_dataset
/home/devon/fastai

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2243      G   /usr/lib/xorg/Xorg                            46MiB |
|    0      2313      G   /usr/bin/gnome-shell                          49MiB |
|    0      6726      G   /usr/lib/xorg/Xorg                           475MiB |
|    0      6881      G   /usr/bin/gnome-shell                         229MiB |
|    0      7580      G   ...quest-channel-token=2872686570769991275   151MiB |
|    0     12588      G   /usr/bin/python3                              31MiB |
|    0     31856      C   .../devon/anaconda3/envs/fastai/bin/python   627MiB |
+-----------------------------------------------------------------------------+


I know it's an environment issue and NOT an issue with your code.  Was hoping to get some advice on this.  Thanks so much
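
The TypeError means that the image2tensor in scope at call time does not accept an augment_fn keyword, for example because a different definition of the function was picked up. One way to check which signature is actually loaded is inspect.signature from the standard library. This is a diagnostic sketch only: accepts_keyword and the two image2tensor_* functions are hypothetical stand-ins, not the definitions from the notebook or from zen_dataset.

```python
import inspect


def accepts_keyword(func, name):
    """Return True when func can be called with the keyword argument `name`."""
    parameters = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()):
        return True  # a **kwargs parameter accepts any keyword
    param = parameters.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


# Hypothetical stand-ins for two different definitions of image2tensor
def image2tensor_without_augment(image):
    return image


def image2tensor_with_augment(image, augment_fn=None):
    return augment_fn(image) if augment_fn else image


print(accepts_keyword(image2tensor_without_augment, "augment_fn"))
print(accepts_keyword(image2tensor_with_augment, "augment_fn"))
```

In the notebook one would pass the real image2tensor to the check, and inspect.getsourcefile(image2tensor) can help show where that definition came from.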
