seutao / humpback-whale-identification
Humpback Whale Identification
Is there a built-in way to free GPU memory in main.py? I have tried adding torch.cuda.empty_cache() in several places in main.py, but I still get the out-of-memory error below.
change lr: 0.0001
change hard_ratio: 0.01
NW ratio!!!!!!! : 0.25
NW id num!!!!!! : 8
66656
Traceback (most recent call last):
File "main.py", line 486, in <module>
main(config)
File "main.py", line 450, in main
run_train(config)
File "main.py", line 256, in run_train
logit, logit_softmax, feas = net.forward(input, label = truth_, is_infer = True)
File "/home/hhh/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/hhh/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/hhh/Kaggle/Humpback-Whale-Identification-Challenge-2019_2nd_palce_solution/net/model_resnet101.py", line 80, in forward
x = self.basemodel.layer3(x)
File "/home/hhh/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/hhh/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/hhh/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/hhh/anaconda3/lib/python3.6/site-packages/torchvision/models/resnet.py", line 87, in forward
out = self.conv3(out)
File "/home/hhh/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/hhh/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 320, in forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 15.89 GiB total capacity; 14.89 GiB already allocated; 197.50 MiB free; 992.00 KiB cached)
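A note on the numbers in the error above: torch.cuda.empty_cache() only returns memory that PyTorch's caching allocator is holding but not using. The sketch below parses the message and compares the requested allocation with the most that could be reclaimed. It assumes that the "cached" figure in this PyTorch version means cached-but-unused memory; the helper name parse_oom_message is made up for illustration and is not part of the repository or of PyTorch.

```python
import re

# Error text copied from the traceback above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 15.89 GiB total "
    "capacity; 14.89 GiB already allocated; 197.50 MiB free; 992.00 KiB cached)"
)

UNIT_TO_MIB = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

SIZE_PATTERN = re.compile(
    r"(\d+\.\d+) (KiB|MiB|GiB)"
    r"(?: (total capacity|already allocated|free|cached))?"
)


def parse_oom_message(message):
    """Hypothetical helper: return every size in the message, in MiB."""
    sizes = {}
    for value, unit, label in SIZE_PATTERN.findall(message):
        # The first size has no trailing label; it is the failed request.
        sizes[label or "requested"] = float(value) * UNIT_TO_MIB[unit]
    return sizes


sizes = parse_oom_message(OOM_MESSAGE)

# Assumption: "cached" is memory held by the allocator but not in use,
# which is the only memory empty_cache() can hand back to the driver.
reclaimable_at_most = sizes["free"] + sizes["cached"]
cache_can_help = reclaimable_at_most >= sizes["requested"]

print(f"requested: {sizes['requested']:.2f} MiB")
print(f"free + cached: {reclaimable_at_most:.2f} MiB")
print(f"empty_cache() could satisfy the request: {cache_can_help}")
```

If that assumption holds, emptying the cache cannot free enough memory for this request, because almost all of the GPU is occupied by live tensors. The usual remedies are a smaller batch size or image size, or wrapping evaluation passes in torch.no_grad() so that activations are not kept for backpropagation.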
Thanks for your code!
I see val*.txt files in the image_lst folder.
How do I generate them?
It took me about 100 hours to train a single model (ResNet) on a single GPU (V100). Am I doing something wrong? How long should training take?
I could not find the code that sets up the validation set. Could you answer when you have time?
The code points to the following directories, but they do not exist on my machine. What is the best way to load the train/test dataset to try the code?
$ grep -r shentao *
bbox_model/kpda_parser.py:TRN_IMGS_DIR = '/data1/shentao/DATA/competitions/whale/train/'
bbox_model/kpda_parser.py:TST_IMGS_DIR = '/data1/shentao/DATA/competitions/whale/test/'
process/data_helper.py:PJ_DIR = r'/data1/shentao/Projects/Kaggle_Whale2019_2nd_place_solution'
process/data_helper.py:train_df = pd.read_csv('/data1/shentao/DATA/competitions/whale/train.csv')
process/data_helper.py:TRN_IMGS_DIR = '/data1/shentao/DATA/competitions/whale/train/'
process/data_helper.py:TST_IMGS_DIR = '/data1/shentao/DATA/competitions/whale/test/'
Binary file process/__pycache__/data_helper.cpython-36.pyc matches
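One possible workaround, sketched under assumptions: derive the hard-coded paths listed above from a single root directory, so that the code can run on another machine. The environment variable WHALE_DATA_DIR, the default ./data/whale, and the function resolve_data_paths are hypothetical names, not part of the repository. The layout (train/, test/, train.csv under one root) is taken from the grep output.

```python
import os


def resolve_data_paths(data_root=None):
    """Hypothetical helper: build the dataset paths from one root directory.

    The root comes from the argument, else from the WHALE_DATA_DIR
    environment variable (an assumed name), else from ./data/whale.
    """
    root = data_root or os.environ.get("WHALE_DATA_DIR", "./data/whale")
    return {
        # Trailing separators mirror the style of the original constants.
        "TRN_IMGS_DIR": os.path.join(root, "train") + os.sep,
        "TST_IMGS_DIR": os.path.join(root, "test") + os.sep,
        "TRAIN_CSV": os.path.join(root, "train.csv"),
    }


paths = resolve_data_paths("/tmp/whale")
for name, path in paths.items():
    print(name, "=", path)
```

The constants in bbox_model/kpda_parser.py and process/data_helper.py could then be assigned from this dictionary after the Kaggle competition data has been unpacked under the chosen root.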
Why not let the model learn the bias by itself via gradient descent? Or is this somehow related to ensembling?
I haven't encountered such a design before and would like to understand the logic behind it. Can you explain?
Thanks for sharing your code. It is terrific.
The following line appears in the run_train function:
logit, logit_softmax, feas = net.forward(input, label = truth_, is_infer = True)
Why is is_infer set to True during training? If is_infer is True, the calculation of the ArcFace loss looks weird.
Thanks again for your code. I really appreciate it.
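For readers unfamiliar with the question, here is a minimal sketch of the textbook ArcFace logit computation, to show why a flag like is_infer matters. This is not the repository's implementation. The function name, the scale and margin defaults, and the meaning given to is_infer (skip the margin) are all assumptions made for illustration.

```python
import math


def arcface_logits(cosines, label, scale=30.0, margin=0.5, is_infer=False):
    """Textbook ArcFace sketch (hypothetical, not the repo's code).

    cosines: cosine similarity between one embedding and each class centre.
    label:   index of the ground-truth class.
    During training the angular margin is added to the target class only;
    with is_infer=True the plain scaled cosines are returned.
    """
    logits = []
    for class_index, cos_theta in enumerate(cosines):
        if not is_infer and class_index == label:
            # Clamp before acos to guard against rounding outside [-1, 1].
            theta = math.acos(max(-1.0, min(1.0, cos_theta)))
            # Simplification: no special case for theta + margin > pi.
            cos_theta = math.cos(theta + margin)
        logits.append(scale * cos_theta)
    return logits


cosines = [0.8, 0.1]
train_logits = arcface_logits(cosines, label=0)
infer_logits = arcface_logits(cosines, label=0, is_infer=True)
print("train:", train_logits)
print("infer:", infer_logits)
```

In this sketch the margin lowers only the target-class logit during training, which is what makes the loss harder to minimise. If the repository's is_infer flag disables the margin in a similar way, then passing is_infer = True in run_train would train without it, which is presumably what the question is getting at.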
I finally got main.py running, but executing $ python main.py produces the errors below.
(binary_head): BinaryHead(
(fc): Sequential(
(0): Linear(in_features=2048, out_features=10008, bias=True)
)
)
)
)
change lr: 0.0001
change hard_ratio: 0.01
NW ratio!!!!!!! : 0.25
NW id num!!!!!! : 8
66656
Traceback (most recent call last):
File "main.py", line 486, in <module>
main(config)
File "main.py", line 450, in main
run_train(config)
File "main.py", line 239, in run_train
for input, truth_ , truth_NW_binary in train_loader:
File "/home/hhh/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
return self._process_next_batch(batch)
File "/home/hhh/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
AttributeError: Traceback (most recent call last):
File "/home/hhh/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/hhh/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
samples = collate_fn([dataset[i] for i in batch_indices])
File "/home/hhh/Kaggle/Humpback-Whale-Identification-Challenge-2019_2nd_palce_solution/process/data.py", line 88, in __getitem__
image = get_cropped_img(image, self.bbox_dict[os.path.split(image_path)[1]], is_mask=False)
File "/home/hhh/Kaggle/Humpback-Whale-Identification-Challenge-2019_2nd_palce_solution/process/augmentation.py", line 113, in get_cropped_img
size_x = image.shape[1]
AttributeError: 'NoneType' object has no attribute 'shape'
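An error like the one above usually means that the image was never loaded. The sketch below assumes the dataset reads images with a function such as cv2.imread, which returns None instead of raising when a file is missing or unreadable. It checks a list of file names against an image directory before training starts. The helper name find_missing_images and the example paths are hypothetical.

```python
import os


def find_missing_images(image_names, image_dir):
    """Hypothetical helper: return the names that are not files in image_dir."""
    return [
        name
        for name in image_names
        if not os.path.isfile(os.path.join(image_dir, name))
    ]


# Example with made-up names; replace with the names from the image list
# files and the directory that TRN_IMGS_DIR points to.
missing = find_missing_images(["0000e88ab.jpg"], "./data/whale/train")
if missing:
    print(f"{len(missing)} image(s) not found, e.g. {missing[0]}")
```

If this reports missing files, the training image directory is probably still set to the original author's path rather than to the local copy of the dataset.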