
aladdinpersson / machine-learning-collection

7.0K stars · 109 watchers · 2.6K forks · 120.71 MB

A resource for learning about Machine learning & Deep Learning

Home Page: https://www.youtube.com/c/AladdinPersson

License: MIT License

Python 60.60% Shell 0.22% Jupyter Notebook 39.19%
pytorch pytorch-implementation pytorch-tutorial pytorch-gan pytorch-examples tensorflow2 tensorflow-tutorials tensorflow-examples machine-learning machine-learning-algorithms

machine-learning-collection's People

Contributors

aladdinpersson, ankandrew, darveenvijayan


machine-learning-collection's Issues

RuntimeError: cannot perform reduction function argmax on a tensor with no elements because the operation does not have an identity

Thanks for this amazing tutorial and repo. I have been using this code to perform tumor detection on my dataset, and I keep getting the error mentioned in the title. Did anyone else get this error?

Just to mention, I have around 45000 2D images, but only around 4% of them contain tumors; the remaining 96% do not. So, to annotate the images with no tumors, I have created empty .txt files with the same name as the image. Is this correct? Or should I be using [0, 0, 0, 0, 0] in the annotation files for the images with no tumors?

Thanks in advance.
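For reference, the error itself is easy to reproduce in isolation: argmax has no identity element, so reducing an empty tensor raises it. This typically surfaces in the evaluation code when some class ends up with zero boxes (a minimal sketch):

import torch

# RuntimeError: cannot perform reduction function argmax on a tensor with
# no elements because the operation does not have an identity
empty = torch.empty(0)
empty.argmax()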

To learn positional encoding

Hi, nice work!
I have a question about the Transformer.

How can I make nn.Embedding cover both the token inputs and the positional encoding, so that the positional encoding is learned rather than fixed?

Thanks!
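For other readers: a common way to make the positional encoding learnable is to replace the fixed sinusoidal table with a second nn.Embedding indexed by position (a minimal sketch, not the repo's code; all sizes are made up):

import torch
import torch.nn as nn

class LearnedEmbeddings(nn.Module):
    def __init__(self, vocab_size=10000, max_len=512, embed_dim=256):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, embed_dim)  # token embeddings
        self.pos = nn.Embedding(max_len, embed_dim)     # learned positional encoding

    def forward(self, x):  # x: (batch, seq_len) of token ids
        positions = torch.arange(x.shape[1], device=x.device).expand(x.shape[0], -1)
        # both embedding tables are trained end to end with the rest of the model
        return self.tok(x) + self.pos(positions)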

Expected object of scalar type Long but got scalar type Float for sequence element 1 in sequence argument at position #1 'tensors'

Hi,

I rewrote the code along with watching your tutorial. When I run the training procedure, I get the following error:

Traceback (most recent call last):
  File "/home/niko/programs/pycharm-community-2019.2.1/helpers/pydev/pydevd.py", line 1415, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/niko/programs/pycharm-community-2019.2.1/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/niko/workspace/pytorch-and-lightning-tutorials/yolo/train_original.py", line 147, in <module>
    main()
  File "/home/niko/workspace/pytorch-and-lightning-tutorials/yolo/train_original.py", line 126, in main
    train_loader, model, iou_threshold=0.5, threshold=0.4
  File "/home/niko/workspace/pytorch-and-lightning-tutorials/yolo/utils.py", line 255, in get_bboxes
    true_bboxes = cellboxes_to_boxes(labels)
  File "/home/niko/workspace/pytorch-and-lightning-tutorials/yolo/utils.py", line 322, in cellboxes_to_boxes
    converted_pred = convert_cellboxes(out).reshape(out.shape[0], S * S, -1)
  File "/home/niko/workspace/pytorch-and-lightning-tutorials/yolo/utils.py", line 315, in convert_cellboxes
    (predicted_class, best_confidence, converted_bboxes), dim=-1
RuntimeError: Expected object of scalar type Long but got scalar type Float for sequence element 1 in sequence argument at position #1 'tensors'

Then I tried copying the exact same code from your train.py and dataset.py files, but the error persisted. I guess __getitem__ in dataset.py should return long instead of float types for the bounding boxes. Do you know what might be the cause of the error above?
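If it helps others: torch.cat requires every tensor in the sequence to have the same dtype, and predicted_class comes out of argmax as a LongTensor while the other two are float. Casting it before the cat is one possible fix (a sketch against the names in the traceback, not a tested patch):

# in convert_cellboxes (utils.py): argmax returns int64, the rest are float32
converted_preds = torch.cat(
    (predicted_class.float(), best_confidence, converted_bboxes), dim=-1
)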

Small error in pix2pix for anime dataset

Hey, wanted to let you know that in your pix2pix implementation, when using the anime set, the target image is actually on the left side instead of the right, so you'll have to switch columns 22/23 in dataset.py (as sketched below). You might want to add that to the README or something. Otherwise a really nice implementation, and thanks for providing it!
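For reference, the swap amounts to exchanging the two halves of the combined image in __getitem__ (a sketch, assuming the image is an array with the two halves side by side; the exact width depends on the dataset):

# dataset.py __getitem__ — for the anime set the target is the LEFT half
w = image.shape[1] // 2
target_image = image[:, :w, :]  # left half
input_image = image[:, w:, :]   # right half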

custom image testing

Hi,
I am using method 1 from tutorial 18 (an image dataset organized in subfolders) for my custom dataset.
My code runs perfectly, but I want to know how I can test my own image (one not included in the dataset) on the model.
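In case it helps, the usual pattern is to apply the same transforms used in training to the single image, add a batch dimension, and run the model in eval mode (a minimal sketch; my_transform and classes are placeholders for your own pipeline):

import torch
from PIL import Image

model.eval()
image = Image.open('my_image.jpg').convert('RGB')
x = my_transform(image).unsqueeze(0).to(device)  # (1, C, H, W)

with torch.no_grad():
    logits = model(x)
    pred = logits.argmax(dim=1).item()
print(classes[pred])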

Output becomes zero after optimizer.step() yolo-v1 model

I encountered this error while trying to train the model on my local GPU.

Here : Machine-Learning-Collection/ML/Pytorch/object_detection/YOLO/

This is the test script I used to test the YOLOv1 model:

import torch
import torch.optim as optim
from torch.autograd import Variable
from torch.utils.data import DataLoader

# assumed module layout of the repo's YOLO folder
from dataset import VOCDataset
from loss import YoloLoss
from model import Yolov1
from train import transform

if __name__ == '__main__':

    csv_file_path = 'PascalVOC_YOLO/100examples.csv'
    img_dir = 'PascalVOC_YOLO/images'
    label_path = 'PascalVOC_YOLO/labels'

    learning_rate = 1e-10
    num_workers = 2
    batch_size = 2
    weight_decay = 1e-4

    sample_dataset = VOCDataset(csv_file_path, img_dir, label_path, transform=transform)

    sample_loader = DataLoader(
        dataset=sample_dataset,
        batch_size=2,
        num_workers=2,
        pin_memory=True,
        shuffle=True,
        drop_last=True,
    )

    device = 'cuda' if torch.cuda.is_available() else 'cpu'

    model = Yolov1(split_size=7, num_boxes=2, num_classes=20).to(device).half()

    optimizer = optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)

    loss_func = YoloLoss().to(device)

    for _ in range(2):

        print('iter : ', _, '\n')

        x, y = next(iter(sample_loader))

        x, y = Variable(x).to(device).half(), Variable(y).to(device).half()

        # print('infinite : ', torch.isfinite(x))

        # print('x : ', x)

        out = model(x)

        print('out : ', out, '\n')

        loss = loss_func(out, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        print('loss : ', loss, '\n')
        print('loss : data ', loss.data, '\n')
        print('loss : grad ', loss.grad, '\n')

        for name, param in model.named_parameters():
            print(name, torch.isfinite(param.grad).all(), torch.max(abs(param.grad)))

        print('\n')

Note: I am using half() because otherwise I get the CUDA error RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED.

While running the script I got the output below:

iter :  0

out :  tensor([[-0.1432,  0.0819,  0.0342,  ..., -0.0377, -0.0745,  0.1312],
        [ 0.1110, -0.0650,  0.2410,  ..., -0.0765,  0.3328,  0.1908]],
       device='cuda:0', dtype=torch.float16, grad_fn=<AddmmBackward>)

loss :  tensor(1., device='cuda:0', dtype=torch.float16, grad_fn=<ClampBackward>)

loss : data  tensor(1., device='cuda:0', dtype=torch.float16)

test.py:94: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  print(' loss : grad ',loss.grad , '\n')
 loss : grad  None

darknet.0.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.0.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.0.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.2.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.2.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.2.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.4.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.4.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.4.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.5.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.5.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.5.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.6.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.6.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.6.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.7.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.7.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.7.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.9.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.9.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.9.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.10.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.10.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.10.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.11.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.11.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.11.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.12.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.12.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.12.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.13.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.13.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.13.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.14.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.14.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.14.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.15.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.15.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.15.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.16.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.16.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.16.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.17.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.17.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.17.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.18.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.18.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.18.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.20.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.20.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.20.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.21.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.21.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.21.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.22.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.22.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.22.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.23.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.23.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.23.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.24.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.24.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.24.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.25.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.25.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.25.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.26.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.26.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.26.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.27.conv.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.27.batchnorm.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
darknet.27.batchnorm.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
fcs.1.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
fcs.1.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
fcs.4.weight tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)
fcs.4.bias tensor(True, device='cuda:0') tensor(0., device='cuda:0', dtype=torch.float16)


iter :  1

out :  tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       dtype=torch.float16, grad_fn=<AddmmBackward>)

[W python_anomaly_mode.cpp:104] Warning: Error detected in MseLossBackward. Traceback of forward call that caused the error:
  File "test.py", line 86, in <module>
    loss = loss_func(out,y)
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/mnt/e/workspace/@training/@datasets/cnns/yolo/yolo-v1-pytorch/loss.py", line 120, in forward
    torch.flatten(exists_box * target[..., :20], end_dim=-2,),
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 528, in forward
    return F.mse_loss(input, target, reduction=self.reduction)
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/nn/functional.py", line 2929, in mse_loss
    return torch._C._nn.mse_loss(expanded_input, expanded_target, _Reduction.get_enum(reduction))
 (function _print_stack)
Traceback (most recent call last):
  File "test.py", line 89, in <module>
    loss.backward()
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/buckaroo/miniconda3/envs/dev/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: Function 'MseLossBackward' returned nan values in its 0th output.
(dev) buckaroo@hansolo:/mnt/e/workspace/@training/@datasets/cnns/yolo/yolo-v1-pytorch$ python3 test.py
iter :  0

out :  tensor([[-0.1044, -0.3135, -0.4897,  ..., -0.1079, -0.0055, -0.0380],
        [ 0.1190, -0.3154, -0.0910,  ..., -0.0995, -0.1595, -0.0576]],
       device='cuda:0', dtype=torch.float16, grad_fn=<AddmmBackward>)

loss :  tensor(1.0010, device='cuda:0', dtype=torch.float16, grad_fn=<AddBackward0>)

loss : data  tensor(1.0010, device='cuda:0', dtype=torch.float16)

test.py:94: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  print(' loss : grad ',loss.grad , '\n')
 loss : grad  None

(per-parameter gradient listing identical to iter 0 of the first run above: every gradient is finite and max |grad| is 0 for all layers)

iter :  1

out :  tensor([[nan, nan, nan,  ..., nan, nan, nan],
        [nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',
       dtype=torch.float16, grad_fn=<AddmmBackward>)

Observations

  • As you can see, on the first pass the model output is a valid tensor with values (i.e., before optimizer.step()).
  • When iteration 1 begins (i.e., after optimizer.step()), the output becomes NaN.

Debug method: 0

  • After setting torch.autograd.set_detect_anomaly(True) globally,
  • I found the result below.
(the same MseLossBackward anomaly warning and traceback already shown at the end of the first run above, ending in: RuntimeError: Function 'MseLossBackward' returned nan values in its 0th output.)

So I have tried:

  • clamping the loss tensors with torch.clamp(value, min=0.0, max=1.0) in loss.py
  • adding an epsilon (1e-6) inside torch.sqrt(), i.e. torch.sqrt(val + epsilon), in loss.py

But this did not fix my issue.
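One thing I am considering instead of casting everything with .half() is PyTorch's automatic mixed precision, which keeps fp32 master weights and scales the loss so small fp16 gradients don't flush to zero (a sketch of what I mean, not yet tested on this model):

import torch

scaler = torch.cuda.amp.GradScaler()

for x, y in sample_loader:
    x, y = x.to(device), y.to(device)    # keep the data in fp32
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():      # ops run in fp16 where it is safe
        out = model(x)
        loss = loss_func(out, y)
    scaler.scale(loss).backward()        # scale the loss to avoid fp16 underflow
    scaler.step(optimizer)               # unscales the grads, then steps
    scaler.update()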

Reference

Getting NaN values in backward pass

Output of Model is nan every time

Nan Loss coming after some time

Getting Nan after first iteration with custom loss

Weights become NaN values after first batch step

Why nan after backward pass?

NaN values popping up during loss.backward()

Debugging neural networks

So kindly help me debug this issue. Thanks in advance.

A few doubts about the ProGAN implementation

Thank you so much for the wonderful implementation of the ProGAN model. I just have a couple of doubts:

  1. In the WSConv2d here, I am not really sure what is happening: the Conv layer's bias is set to None, but its value is stored before that. I am a bit confused.
  2. Also here, is there a special reason for choosing a value of 30 for PROGRESSIVE_EPOCHS?

Would really appreciate your help.
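For context, the pattern being asked about usually looks like the sketch below (a paraphrase, not necessarily the repo's exact code): the bias is detached from the conv so that only the weights are multiplied by the equalized-learning-rate scale, and the bias is added back afterwards, unscaled.

import torch
import torch.nn as nn

class WSConv2d(nn.Module):
    """Conv2d with equalized learning rate (ProGAN)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, gain=2):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size, stride, padding)
        self.scale = (gain / (in_ch * kernel_size ** 2)) ** 0.5
        self.bias = self.conv.bias  # keep the bias parameter...
        self.conv.bias = None       # ...but remove it from the conv itself

    def forward(self, x):
        # scale only the weights; add the (unscaled) bias back afterwards
        return self.conv(x * self.scale) + self.bias.view(1, -1, 1, 1)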

Tensorflow tutorial - video - 8

Hi,
Thanks for the great resource on DL. I found that the code snippet for video tutorial 8 is missing from your repo. Could you please upload it?

YOLOv3, problems in get_evaluation_bboxes

Hi there! First, thanks for your work!
I would like to know if this is a known issue or if I am the only one with this problem.
I'm trying to run training on 8 examples with a batch size of 4 (but it is the same on the full training loader):
when I set CONF_THRESHOLD to less than 0.6, training never seems to get past "mean average precision": the print(mapval.item()) line never appears.
When I set it to 0.6 or higher, it is computed, but the mAP is always 0, the no-obj accuracy is always 100%, and the obj accuracy is always 0%.

I ran training on the training loader for roughly 70 epochs with CONF_THRESHOLD at 0.6, but the mAP was something like 0.004. I printed some images, but each had several dozen boxes in it.

I tried to debug it without any luck. Can someone look into this?

DCGAN - Dimension

If I change FEATURES_DISC and FEATURES_GEN to a number other than 64, I still get generated samples of size 64x64.
Is that expected, or is there a way to change the output size?
Thank you for your stunning work, btw.
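For other readers: FEATURES_DISC / FEATURES_GEN only set the channel widths of the layers; the 64x64 spatial size is fixed by how many stride-2 upsampling layers the generator has, independent of those constants. A sketch of the arithmetic (not code from the repo):

# the DCGAN generator maps the noise to a 4x4 feature map, then each
# stride-2 ConvTranspose2d doubles the spatial size: 4 -> 8 -> 16 -> 32 -> 64
output_size = 4 * 2 ** 4  # 4 upsampling blocks -> 64

# to get 128x128 output you would add one more stride-2 block to the
# generator (and mirror it with one more stride-2 conv in the discriminator)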

Yolov3 requirements.txt missing

Hi,

As mentioned in the README for the YOLOv3 folder, I was trying to find the requirements file in the repo, but unfortunately I am not able to find it. Can anyone help me out with this?

README.md
(screenshot of the README)

Folder structure
(screenshot of the folder structure)

Image captioning: all training example output is <UNK>

When training the image captioning model, in the first epoch the print_examples function returns the following:

Example 1 CORRECT: Dog on a beach by the ocean
Example 1 OUTPUT: chasing stores mossy participates player brush museum phone handle drops native punk buried alongside cellphones very bags hairy paintball mouths mats markings volleyball backpacker dressed backpacks legos light bitten various pillow singing attempt superman weather try gnawing ceiling shaped tree someone phone scarf crouching courtyard cows indoors seeds hits hits
Example 2 CORRECT: Child holding red frisbee outdoors
Example 2 OUTPUT: chasing stores mossy bushes tags hardwood tulips chin lining gnawing taken tinkerbell both kind cable tile colorfully shepherd dangling skinny cake scene tattooed swimmer beverage come points come 23 wheels puppy scenic ring snake one piggy snowboard camera slightly fireworks nature try gnawing ceiling shaped tree someone phone scarf crouching
Example 3 CORRECT: Bus driving by parked cars
Example 3 OUTPUT: trucks each that cheerleader hawk jeeps formal ring skeleton forested various plastic goofy snowmobile dances very wearing seaweed cards kick works baseman past daughter football waterfalls bathroom motorcycle bar bikers phone following kid ring past converse nose nose college wide skyscraper rough holding bending seeds broken kissing follows pouring pouring
Example 4 CORRECT: A small boat in the ocean
Example 4 OUTPUT: chasing stores mossy bushes tags hardwood tulips chin lining gnawing taken tinkerbell both kind cable tile colorfully shepherd dangling skinny cake scene tattooed swimmer beverage come points come 23 wheels puppy scenic ring snake one piggy snowboard camera slightly fireworks nature try gnawing ceiling shaped tree someone phone scarf crouching
Example 5 CORRECT: A cowboy riding a horse in the desert
Example 5 OUTPUT: avoid windsurfing alongside roof between enjoys dimly artists artists others biting upon holding silhouette ascending apples curve tennis o leaves gives dinner chasing picnic pack ceremony kayak kayak office festive hikes covered visible signs dancing construction construction when hiking pillow foot leotard about all pit between stool ear sports cigarette

However, after the first epoch and onwards, the print_examples function returns:

Example 1 CORRECT: Dog on a beach by the ocean                                  
Example 1 OUTPUT: <SOS> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK>
Example 2 CORRECT: Child holding red frisbee outdoors
Example 2 OUTPUT: <SOS> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK>
Example 3 CORRECT: Bus driving by parked cars
Example 3 OUTPUT: <SOS> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK>
Example 4 CORRECT: A small boat in the ocean
Example 4 OUTPUT: <SOS> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK>
Example 5 CORRECT: A cowboy riding a horse in the desert
Example 5 OUTPUT: <SOS> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK> <UNK>

I'm not sure what's going on.

true_boxes list

The ML/Pytorch/object_detection/metrics/mean_avg_precision.py is exactly what I have wanted for so long.
Great thanks to you @aladdinpersson.

But I just wanted to ask how I can get the true_boxes list while integrating the code into my validation pipeline.
I can produce the pred_boxes list in my validation code, since I know where I produce the detections, scores, classes, etc.

Does it mean I should build it the same way as the pred_boxes list, except for where the class_prediction is 0, according to the definition?
Am I right?

Thanks in advance
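For anyone with the same question: the function just needs both lists in the same [train_idx, class, score, x1, y1, x2, y2] format, with ground-truth boxes given a score of 1, so they can be collected straight from your labels (a sketch, assuming your validation set yields per-image ground-truth boxes; the names are hypothetical):

true_boxes = []
for train_idx, (image, gt_boxes) in enumerate(val_dataset):
    # gt_boxes: list of [class_label, x1, y1, x2, y2] for this image
    for class_label, x1, y1, x2, y2 in gt_boxes:
        # ground truth has no confidence, so use 1.0 as the score
        true_boxes.append([train_idx, class_label, 1.0, x1, y1, x2, y2])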

Train set up Yolov3

Hey Aladdin,
Thank you for your great tutorial! I have a question about your YOLOv3 training. I've tried training YOLOv3 on Pascal VOC with my own settings, but I'm stuck at a mAP of 57.5. How did you get 78.2 mAP? Can you share your settings?

My settings:

BATCH_SIZE = 16
IMAGE_SIZE = 416
LEARNING_RATE = 1e-5
WEIGHT_DECAY = 1e-4
NUM_EPOCHS = 500
CONF_THRESHOLD = 0.2
MAP_IOU_THRESH = 0.5
NMS_IOU_THRESH = 0.45

Transformer Question, and Request

Learning PyTorch and love your videos. Your code is so clean and your explanations so crisp.

Question/Bug?:
In SelfAttention you split values, keys, and queries by the number of heads, then pass each split into a Linear layer with the same input and output dimension. Why not keep the full dimension (i.e., not split) and let the Linear layer do the reduction?
That would allow the linear projection to learn what to take out of the input.

btw, https://github.com/tunz/transformer-pytorch/blob/master/model/transformer.py, class MultiHeadAttention(nn.Module) does this (if I interpret their code correctly).

The paper https://arxiv.org/pdf/1706.03762.pdf indicates "learned linear projections to dk, dk and dv dimensions".

If I'm all wrong, I would love to be corrected, as I am learning.
If I'm right, I would also love to know that I'm starting to understand this stuff.

Request:
I'm starting to understand the power of torch.einsum, but I am sure I am missing a bunch.
Can you do a video on this? (A few of the calls I mean are sketched below.)
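For other readers waiting on that video, here are the three einsum calls that cover most of the attention code; in each string, the letters name the input dimensions, and any letter missing from the output is summed over:

import torch

a = torch.randn(2, 3, 4)
b = torch.randn(2, 4, 5)

mm = torch.einsum('ij,jk->ik', a[0], b[0])      # plain matrix multiply
bmm = torch.einsum('bij,bjk->bik', a, b)        # batched matrix multiply

q = torch.randn(2, 8, 10, 16)                   # (batch, heads, seq, head_dim)
k = torch.randn(2, 8, 10, 16)
scores = torch.einsum('bhqd,bhkd->bhqk', q, k)  # attention scores per head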

Regards,
John

Getting an error while executing semantic segmentation with U-Net in PyTorch

Hi,
I watched your recent tutorial on semantic segmentation with PyTorch. Being new to PyTorch, I was looking for a tutorial with good explanations, especially for segmentation, and yours came as a great help.
I tried to implement your approach with a U-Net for segmentation on Google Colab, but I am getting an error that I haven't managed to fix. Can you please help me fix it?
The error I am getting is:


TypeError Traceback (most recent call last)
in <module>()
85
86 if __name__ == "__main__":
---> 87 main()

7 frames
in main()
67
68 for epoch in range(Num_epochs):
---> 69 train_fn(train_loader, model, optimizer, loss_fn, scaler)
70
71

in train_fn(loader, model, optimizer, loss_fn, scaler)
2 loop = tqdm(loader)
3
----> 4 for batch_idx, (data, targets) in enumerate(loop):
5 data= data.to(device=device)
6 targets= targets.float().unsqueeze(1).to(device=device)

/usr/local/lib/python3.6/dist-packages/tqdm/std.py in __iter__(self)
1102 fp_write=getattr(self.fp, 'write', sys.stderr.write))
1103
-> 1104 for obj in iterable:
1105 yield obj
1106 # Update and possibly print the progressbar.

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in __next__(self)
433 if self._sampler_iter is None:
434 self._reset()
--> 435 data = self._next_data()
436 self._num_yielded += 1
437 if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
473 def _next_data(self):
474 index = self._next_index() # may raise StopIteration
--> 475 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
476 if self._pin_memory:
477 data = _utils.pin_memory.pin_memory(data)

/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
42 def fetch(self, possibly_batched_index):
43 if self.auto_collation:
---> 44 data = [self.dataset[idx] for idx in possibly_batched_index]
45 else:
46 data = self.dataset[possibly_batched_index]

/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
42 def fetch(self, possibly_batched_index):
43 if self.auto_collation:
---> 44 data = [self.dataset[idx] for idx in possibly_batched_index]
45 else:
46 data = self.dataset[possibly_batched_index]

in __getitem__(self, index)
17
18 if self.transform is not None:
---> 19 augmentations= self.transform(image=image, mask=mask)
20 image = augmentations["image"]
21 mask = augmentations["mask"]

TypeError: 'int' object is not callable
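For comparison: the self.transform(image=..., mask=...) call signature matches an albumentations pipeline, so the transform passed into the dataset should be a Compose object like the sketch below. 'int' object is not callable usually means an int (e.g., an image-size constant) was passed as the transform instead:

import albumentations as A
from albumentations.pytorch import ToTensorV2

train_transform = A.Compose([
    A.Resize(height=160, width=240),
    A.Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0], max_pixel_value=255.0),
    ToTensorV2(),
])

# dataset = YourDataset(image_dir, mask_dir, transform=train_transform)
# NOT: transform=160  (that is what raises TypeError: 'int' object is not callable)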

data loader loss compute problem

(screenshot of the dataset target-assignment code)

According to the YOLO detection idea, the cell in which an object's midpoint (center_x, center_y) falls is responsible for detecting that object. But doesn't the code above fail to consider the adjoining grid cells? If they also have an IoU greater than ignore_iou_thresh, won't the adjoining grid cells also contribute to the loss, because the code does not set their targets[scale_idx][anchor_on_scale, i, j, 0] = -1? I am looking forward to your answer. Thank you in advance.
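For context, the assignment logic in question goes roughly like this (a paraphrase of the dataset code, not a verbatim copy):

# for one ground-truth box: iou_anchors holds the IoU of the box's w/h
# against every anchor; the best anchor per scale becomes the positive
# target, and near-misses above the threshold are marked ignore (-1)
for anchor_idx in iou_anchors.argsort(descending=True):
    scale_idx = anchor_idx // num_anchors_per_scale
    anchor_on_scale = anchor_idx % num_anchors_per_scale
    S = grid_sizes[scale_idx]
    i, j = int(S * y), int(S * x)  # the single cell holding the midpoint
    if not has_anchor[scale_idx]:
        targets[scale_idx][anchor_on_scale, i, j, 0] = 1   # positive
        has_anchor[scale_idx] = True
    elif iou_anchors[anchor_idx] > ignore_iou_thresh:
        targets[scale_idx][anchor_on_scale, i, j, 0] = -1  # ignored in the loss

# note that only the midpoint cell (i, j) is ever written to; neighbouring
# cells are never assigned, which is exactly what this issue asks about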

CycleGAN

Hey there,

With the exception of changing the paths to make it Google Colab friendly, removing the val loader in train, and setting load = False, I copied the files exactly as they are, but I keep getting this error. I'm not sure what I'm doing wrong.

0% 0/6287 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 150, in <module>
    main()
  File "train.py", line 141, in main
    train_fn(disc_H, disc_Z, gen_Z, gen_H, loader, opt_disc, opt_gen, L1, mse, d_scaler, g_scaler)
  File "train.py", line 26, in train_fn
    D_H_real = disc_H(image)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/Colab Notebooks/Project1Monet/CycleGAN_from_scratch_RESNET/discriminator_model.py", line 41, in forward
    x = self.initial(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 399, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 394, in _conv_forward
    _pair(0), self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 3, 4, 4], expected input[1, 256, 258, 5] to have 3 channels, but got 256 channels instead
0% 0/6287 [00:02<?, ?it/s]

Performance issues in the program

Hello, I found a performance issue in aladdinpersson_Machine-Learning-Collection/ML/TensorFlow/Basics/tutorial7-indepth-functional.py:
train_dataset.map is called without num_parallel_calls.
I think it would increase the efficiency of your program if you added it; a sketch of the change follows this note.

The same issue also exists in test_dataset.map, in
ds_train = ds_train.map(read_image).map(augment).batch(2),
and in three other places.
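Concretely, the change I am suggesting looks like this (using tf.data.AUTOTUNE, available in TF 2.4+, to let the runtime pick the parallelism):

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE

ds_train = (
    ds_train.map(read_image, num_parallel_calls=AUTOTUNE)
    .map(augment, num_parallel_calls=AUTOTUNE)
    .batch(2)
    .prefetch(AUTOTUNE)  # overlap preprocessing with training
)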

Here is the TensorFlow documentation supporting this.

Looking forward to your reply. Btw, I would be glad to create a PR to fix it if you are too busy.

I have a question about the gradient propagation of the Discriminator in WGAN-GP.

In the training process of WGAN-GP (train.py), the following gradient propagation is performed:

        fake = gen(noise)
        critic_real = critic(real).reshape(-1)
        critic_fake = critic(fake).reshape(-1)
        gp = gradient_penalty(critic, real, fake, device=device)
        loss_critic = (
            -(torch.mean(critic_real)-torch.mean(critic_fake)) + LAMBDA_GP * gp
        )
        critic.zero_grad()
        loss_critic.backward(retain_graph=True)
        opt_critic.step()

You used the final loss_critic for gradient propagation. I additionally looked at other people's code, and I saw implementations that instead call critic_real.backward() and critic_fake.backward() separately. What's the difference between the two methods, and which would you prefer?

Example: Zeleni9/pytorch-wgan/models/wgan_gradient_penalty.py — https://github.com/Zeleni9/pytorch-wgan
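For comparison, the two styles side by side; they compute the same critic gradients, just accumulated differently (a sketch using the variables from the snippet above; the per-term style assumes critic_fake was computed from fake.detach() for the critic update):

# single-backward style (this repo): one scalar loss, one backward call
loss_critic = -(torch.mean(critic_real) - torch.mean(critic_fake)) + LAMBDA_GP * gp
critic.zero_grad()
loss_critic.backward(retain_graph=True)

# per-term style (as in Zeleni9/pytorch-wgan): backward on each mean with a
# +1 / -1 gradient argument; the gradients accumulate in .grad between calls
one = torch.tensor(1.0, device=device)
critic.zero_grad()
torch.mean(critic_real).backward(-one)  # maximize E[critic(real)]
torch.mean(critic_fake).backward(one)   # minimize E[critic(fake)]
(LAMBDA_GP * gp).backward()             # gradient-penalty term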

error when running code

When running your code I get this error:

RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'index' in call to _th_index_select

Any tips? Thank you.
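This particular error usually means an index tensor (e.g., token ids going into an nn.Embedding) is still on the CPU while the model's weights are on the GPU; moving everything to the same device is the usual fix (a sketch):

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = model.to(device)
for x, y in loader:
    x, y = x.to(device), y.to(device)  # inputs must live on the same
    out = model(x)                     # device as the model's weights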

Training on a subset of Voc

If I want to train on a subset of VOC, do I have to create a dataset containing that subset with its annotations, or is it possible to do it directly from this code?

too much time in get_evaluation_bboxes in yolo v3

It takes a very long time: in get_evaluation_bboxes, the code below takes more than 10 hours to run.

for idx in range(batch_size):
    nms_boxes = non_max_suppression(
        bboxes[idx],
        iou_threshold=iou_threshold,
        threshold=threshold,
        box_format=box_format,
    )
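If it helps: the pure-Python NMS is roughly O(n²) per image, which blows up when a low confidence threshold lets thousands of boxes through. Raising the threshold helps, and so does swapping in torchvision's C++ NMS (a sketch; torchvision.ops.nms expects corner-format (x1, y1, x2, y2) boxes and a score tensor):

import torch
from torchvision.ops import nms

# boxes: (N, 4) tensor in corner format, scores: (N,) tensor
keep = nms(boxes, scores, iou_threshold=0.45)  # indices of surviving boxes
boxes, scores = boxes[keep], scores[keep]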

ProGan RuntimeError

I downloaded the celeba_hq image dataset, modified config.py (DATASET = 'celeba_hq'), and modified train.py (commenting out the import sys / sys.exit() lines in main()).
Then when I run python train.py I get this error:

    return F.conv_transpose2d(
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [512, 512, 4, 4], but got 2-dimensional input of size [256, 512] instead
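The shapes in the error suggest the generator is being fed a flat (batch, z_dim) noise tensor, while its first ConvTranspose2d expects a 4-D (batch, z_dim, 1, 1) input; a sketch of the usual fix (assuming the call signature from train.py):

# ProGAN's generator input must be 4-D: (batch, z_dim, 1, 1)
noise = torch.randn(cur_batch_size, config.Z_DIM, 1, 1).to(config.DEVICE)
fake = gen(noise, alpha, step)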

GTA5 to Cityscapes translation

Hello,
I trained this model to translate GTA5 images to Cityscapes images,
but it gave me poor results.
Can anyone help me improve them?

Model overfitting for 20 classes ( PASCAL VOC 2007 + 2012 dataset )

Hi Aladdin, thank you so much for your video and explanations.
I am currently doing a project on object detection, and your video helped me a lot.
Thank you once again.

I have a problem with overfitting in the model: I am getting a test mAP of 10% and a train mAP of 90%. I trained on the PASCAL VOC 2007 + VOC 2012 data.
I have tried every way I could think of to reduce the overfitting (dropout layer, weight decay, adding 5k more images, data augmentation, using pretrained extraction weights, step LR, etc.), keeping everything as close as possible to the original paper.
It's been a month now and I am still not able to figure out why. Could you please help me? (I have used your code for everything.) It would be a great help if you could suggest something with respect to your code.

P.S.: I used the same code modified for 2 classes and 5 classes and got good results: 2 classes, test mAP 50%; 5 classes, test mAP 60%.

RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[30, 416, 416, 3] to have 3 channels, but got 416 channels instead

RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[30, 416, 416, 3] to have 3 channels, but got 416 channels instead

I get a RuntimeError when trying to train my model.

Has anyone encountered this problem and is able to help with it?

I can't really understand where these dimension problems come from or which parameters I need to check.

Greetings!
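For what it's worth, input[30, 416, 416, 3] is a channels-last batch: PyTorch conv layers expect (N, C, H, W), so either the transform pipeline should end with something like ToTensorV2 (which moves channels first for you) or the tensor needs to be permuted by hand (a sketch):

# x currently has shape (30, 416, 416, 3) -> (30, 3, 416, 416)
x = x.permute(0, 3, 1, 2).contiguous()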

How can I obtain the output of an intermediate layer (feature extraction) in Model Subclassing ?

Please refer to the link below for more details on the issue.

https://stackoverflow.com/questions/64471742/skip-some-layers-in-keras-model-during-evaluation-validation-phase

My requirement is to override the evaluation/validation step after each epoch while still using the existing fit function.

Following the link below does not work (when this code is written in the test_step method):
https://keras.io/getting_started/faq/#how-can-i-obtain-the-output-of-an-intermediate-layer-feature-extraction
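In case it is useful to others: with model subclassing there is no static graph to slice, so the keras.Model(inputs, layer.output) trick from the FAQ does not apply. One workaround is to keep a handle to the sublayer and expose its output on demand (a minimal sketch; the layer names are hypothetical):

import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.backbone = tf.keras.layers.Dense(64, activation='relu')
        self.head = tf.keras.layers.Dense(10)

    def call(self, x, return_features=False):
        features = self.backbone(x)
        if return_features:  # expose the intermediate output on demand
            return features
        return self.head(features)

# inside a custom test_step you can then do:
# features = self(x, return_features=True)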

Idea for new Video

3 ideas:

  1. a cheat sheet, with a video, of all the functions in PyTorch and the context in which function x is used.
  2. key lessons for deep learning, for example: skip connections good, normalizing data good, dropout good but watch out when combined with batch norm...
  3. what to expect when training big networks, and how the loss normally behaves.

btw, your einsum video was perfect.

Why do you need to slice captions?

https://github.com/AladdinPerzon/Machine-Learning-Collection/blob/9f3b2a82c0b8b6ba8c16293d8118d8d8c888f8e6/ML/Pytorch/more_advanced/image_captioning/train.py#L82

Hello, thank you for your version of the image captioning solution!
However, one thing is not clear to me: why do you take that slice? If I understand correctly, captions in that case is a padded batch of captions, so it looks like:
1 1 1 1 1 2
1 1 1 2 0 0
1 1 1 1 2 0

and if you take the slice [:, :-1]
that would be:
1 1 1 1 1
1 1 1 2 0
1 1 1 1 2
(1 is any token, 2 is <EOS>, and 0 is padding)

So if you want to get rid of the <EOS> tokens, that would not work.
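For other readers, my understanding of what the slice is for (a worked toy example, not the repo's code): the model is trained with teacher forcing, so the inputs are the caption minus its last token and the targets are the caption shifted by one, which means <EOS> is still learned — as a target, just never as an input:

# caption:               <SOS> a    dog  runs <EOS>
# input  = caption[:-1]  <SOS> a    dog  runs
# target (next tokens):  a     dog  runs <EOS>
# at each position the model predicts the NEXT token, so nothing is lost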

U-Net model's accuracy decreases when using model.eval()

Hi,
I used the code from
https://github.com/aladdinpersson/Machine-Learning-Collection/tree/master/ML/Pytorch/image_segmentation/semantic_segmentation_unet

to build a U-Net, trained it, and it works great. But I found that in the inference stage, if I use model.eval(), the accuracy strongly decreases; once I remove that line and run the model in train mode, it performs well.
I couldn't find the reason. I have read some websites which said the cause might be invoking the same BatchNorm layer in different positions, but I can't see that issue in this code.

Anyone have any ideas?
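One common cause, for what it's worth: model.eval() switches BatchNorm from batch statistics to its running averages, and with small batch sizes or a train/test distribution shift those running stats can be badly off. A sketch of two things worth trying:

import torch

model.eval()

# option 1: keep BatchNorm layers in train mode (use batch stats at inference)
for m in model.modules():
    if isinstance(m, torch.nn.BatchNorm2d):
        m.train()

# option 2: re-estimate the running stats with a few forward passes in train
# mode over the training data (no backward pass), then call model.eval() again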

Small mistakes in the DCGAN code

In the Discriminator(nn.Module) and Generator(nn.Module) classes:

  • the first and last Conv2d should have bias=False and stride=1,
  • remove the comment from nn.BatchNorm2d(out_channels).

Unable to perform inference on pretrained weights

Hi Aladdin, great tutorials you have here. I was really able to understand for the first time how to code YOLOv3, but I couldn't find the code for inference, so I decided to write my own and stumbled across the following issues:

  1. I tried to use your plot_couple_examples function, but always got a CUDA out-of-memory error, which was really weird because I could train the model with batch size 6 on my 4 GB GTX 1650 GPU.
  2. Then I tried to manually pass an image by reading it through OpenCV and expanding its dimensions to get a batch size of 1, but got the following error:
  3. RuntimeError: Given groups=1, weight of size [32, 3, 3, 3], expected input[1, 416, 416, 3] to have 3 channels, but got 416 channels instead
  4. Can you please suggest a fix? I am using your repo as the starting point for my college project.

Number Parameters EfficientNet-B0

Hey Aladdin, thanks for the awesome YouTube videos.

I was checking the implementation of EfficientNet that you provided, and I noticed that in the final example, the total number of parameters for EfficientNet-B0 is 14,047,366.

Is this correct?
According to Table 2 of the original EfficientNet paper, I thought that Version B0 was supposed to have 5.3M parameters.
Thanks in advance,

PS. In order to calculate the total number of parameters, I inserted the following line of code into the test() function:
print(f'Total Number of Parameters: {sum( p.numel() for p in model.parameters() if p.requires_grad ):,}')

PROGAN ISSUE

I am using my own grayscale image dataset. On the loop below I am getting an issue:

loop = tqdm(loader, leave=True)
for batch_idx, (real, _) in enumerate(loop):
    real = real.to(config.DEVICE)
    cur_batch_size = real.shape[0]

ValueError: too many values to unpack (expected 2)
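That error means the dataset yields a single tensor per item rather than an (image, label) pair, so the (real, _) unpacking fails. Either drop the unpacking or return a dummy label from __getitem__ (a sketch):

for batch_idx, real in enumerate(loop):  # no (real, _) unpacking
    real = real.to(config.DEVICE)
    cur_batch_size = real.shape[0]

# or, in the dataset:
# def __getitem__(self, idx):
#     return self.load_image(idx), 0     # dummy label keeps (real, _) working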

Clarification - Inference

Hi @aladdinpersson

thanks for the work you share. Could you please provide a clear explanation of how inference works?
I have watched your videos and still don't understand 100%:
1. how the sequence is produced at training time,
2. how the sequence is produced at test time.

I saw your inference script, but honestly the whole thing is still very blurry to me.

def translate_sentence(model, sentence, german, english, device, max_length=50):
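For anyone else stuck here, the shape of the answer is: at training time the decoder sees the whole shifted target sentence at once (teacher forcing); at test time you generate one token at a time, feeding each prediction back in until <eos>. A minimal greedy sketch of the test-time loop (paraphrasing what translate_sentence does, assuming a sequence-first model(src, trg) with batch size 1; details simplified):

import torch

def greedy_decode(model, src_tensor, sos_idx, eos_idx, max_length=50):
    model.eval()
    outputs = [sos_idx]  # start the target with <sos>
    with torch.no_grad():
        for _ in range(max_length):
            trg = torch.LongTensor(outputs).unsqueeze(1).to(src_tensor.device)
            logits = model(src_tensor, trg)  # re-run on the tokens so far
            next_token = logits.argmax(2)[-1, :].item()  # best token, last step
            outputs.append(next_token)
            if next_token == eos_idx:  # stop once <eos> is produced
                break
    return outputs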

need dev

Hi, I need a dev to build an app with Flutter & TensorFlow. Are you available?

The midpoint of the bounding boxes

Hi, I was watching your video on intersection over union, which helped me a lot. I tried to break the code down and learn it. When I computed the corner from the midpoint with one of the tensors in iou_test.py, t1_box1 = torch.tensor([0.8, 0.1, 0.2, 0.2]), I got different results:
box1_x1 = t1_box1[..., 0:1] - t1_box1[..., 2:3] / 2 gave me 0.7000, while box1_x1 = (t1_box1[..., 0:1] - t1_box1[..., 2:3]) / 2 gave me 0.3000. Which one is recommended?
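Working through the two expressions with that tensor (x_center = 0.8, width = 0.2) shows which one is the box corner:

# midpoint format is (x_center, y_center, width, height)
# left edge = x_center - width / 2
0.8 - 0.2 / 2    # = 0.7  -> division binds first, this is the left corner
(0.8 - 0.2) / 2  # = 0.3  -> halves the wrong quantity, not a corner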
