The hlb-cifar10 from tysam-code

Out of memory with 5GB VRAM

--------------------------------------------------------------------------------------------------------
|  epoch  |  train_loss  |  val_loss  |  train_acc  |  val_acc  |  ema_val_acc  |  total_time_seconds  |
--------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "main.py", line 621, in <module>
    main()
  File "main.py", line 540, in main
    for epoch_step, (inputs, targets) in enumerate(get_batches(data, key='train', batchsize=batchsize)):
  File "main.py", line 428, in get_batches
    images = batch_crop(data_dict[key]['images'], 32) # TODO: hardcoded image size for now?
  File "main.py", line 390, in batch_crop
    cropped_batch = torch.masked_select(inputs, crop_mask_batch).view(inputs.shape[0], inputs.shape[1], crop_size, crop_size)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 4.58 GiB (GPU 0; 5.81 GiB total capacity; 835.23 MiB already allocated; 2.35 GiB free; 1.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
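
The size of the failed allocation can be checked with simple arithmetic. The sketch below assumes that the crop is applied to all 50,000 CIFAR-10 training images at once (3 channels, 32x32 crop, as in the traceback) and that the masked selection materializes one 4-dimensional int64 coordinate per selected element. The second assumption is an inference from the numbers, not something confirmed from the PyTorch source.

```python
# Back-of-the-envelope check of the 4.58 GiB figure in the traceback.
# Assumption: every selected element costs one 4-D coordinate stored as int64.

NUM_IMAGES = 50_000      # size of the CIFAR-10 training set
CHANNELS = 3
CROP_SIZE = 32           # crop size passed to batch_crop in the traceback
INDEX_DIMS = 4           # (image, channel, row, column)
BYTES_PER_INT64 = 8

# Number of elements kept by the crop mask across the whole training set.
selected_elements = NUM_IMAGES * CHANNELS * CROP_SIZE * CROP_SIZE

# Bytes needed if each kept element is described by four int64 indices.
index_bytes = selected_elements * INDEX_DIMS * BYTES_PER_INT64
index_gib = index_bytes / 2**30

print(f"{selected_elements:,} selected elements")
print(f"{index_gib:.2f} GiB of index data")  # prints "4.58 GiB of index data"
```

Under these assumptions the result matches the "Tried to allocate 4.58 GiB" in the error message, which would mean the temporary index data, not the image data itself, dominates the peak memory.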

I have not looked at the code too closely, but it might be possible to shave off a few MB when preparing batches.
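
One possible way to lower the peak memory is to crop the data in chunks with per-image offsets instead of one boolean mask over the whole training set. The following is only a sketch under assumptions, not the method used in hlb-CIFAR10: the function name `random_crop_chunked` and the `chunk_size` parameter are hypothetical, and the images are assumed to be a single (N, C, H, W) tensor.

```python
import torch


def random_crop_chunked(images, crop_size, chunk_size=5000):
    """Randomly crop every image in an (N, C, H, W) tensor, chunk by chunk.

    Hypothetical helper: temporary index tensors are only ever built for
    `chunk_size` images at a time.
    """
    n, c, h, w = images.shape
    device = images.device
    out = torch.empty((n, c, crop_size, crop_size), dtype=images.dtype, device=device)
    offsets = torch.arange(crop_size, device=device)
    chan_idx = torch.arange(c, device=device)[None, :, None, None]

    with torch.no_grad():
        for start in range(0, n, chunk_size):
            chunk = images[start:start + chunk_size]
            m = chunk.shape[0]
            # One random top-left corner per image in this chunk.
            top = torch.randint(0, h - crop_size + 1, (m,), device=device)
            left = torch.randint(0, w - crop_size + 1, (m,), device=device)
            # Broadcastable index tensors: (m,1,1,1), (1,c,1,1), (m,1,S,1), (m,1,1,S).
            batch_idx = torch.arange(m, device=device)[:, None, None, None]
            rows = (top[:, None] + offsets)[:, None, :, None]
            cols = (left[:, None] + offsets)[:, None, None, :]
            out[start:start + m] = chunk[batch_idx, chan_idx, rows, cols]
    return out
```

A call such as `random_crop_chunked(padded_images, 32)` would return a tensor of shape (N, C, 32, 32). Smaller values of `chunk_size` trade some speed for a lower memory peak; whether this is fast enough for the timing goals of the project would need to be measured.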

Thank you for this comment, by the way (hlb-CIFAR10/main.py, line 523 in 132829f):

 ## has a timing feature too, but there's no synchronizes so I suspect the times reported are much faster than they may be in actuality

I totally forgot to add torch.cuda.synchronize(), but it is finally fixed (https://github.com/99991/cifar10-fast-simple). Fortunately, it did not make much of a difference: I now get 14.3 seconds with my code vs. 15.7 seconds with your code. Perhaps something during batch preparation accounts for the difference?
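
For reference, a minimal sketch of timing with synchronization is shown below. The helper name `timed` is hypothetical and is not taken from either repository. CUDA kernels are launched asynchronously, so the clock is only read after `torch.cuda.synchronize()` has waited for pending GPU work.

```python
import time

import torch


def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds).

    Hypothetical helper: synchronizes before and after the call when a GPU
    is available, so queued kernels are included in the measurement.
    """
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # finish work queued before the measurement
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for kernels launched by fn
    return result, time.perf_counter() - start
```

For example, `_, seconds = timed(train_one_epoch)` would measure one epoch, where `train_one_epoch` stands for whatever training function is being benchmarked.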