
pytorch-classification's People

Contributors

dusty-nv


pytorch-classification's Issues

ValueError: Invalid backend: ''nccl''

I am trying to train on a single node with multiple GPUs, but I get this error even though I installed NCCL. It is installed under /usr, so I don't think a PATH needs to be set. Do you know why this happens?


train.py -a resnet50 --dist-url 'tcp://127.0.0.1:9999' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 --model-dir=models/cat_dog data/cat_dog

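The doubled quotes in the error message (`''nccl''`) suggest that the backend string reached PyTorch with its single quotes still attached. This can happen when the command is launched from a shell or IDE run configuration that does not treat single quotes as quoting characters. As a hedged sketch, a hypothetical argparse type function such as `normalize_backend` below (not part of train.py) would strip stray quotes before the value is used; the accepted set of backends is an assumption based on the backends `torch.distributed` documents (gloo, nccl, mpi). Separately, `torch.distributed.is_nccl_available()` reports whether the installed PyTorch build includes NCCL support.

```python
import argparse

# Assumption: the backend names torch.distributed accepts
VALID_BACKENDS = {"gloo", "nccl", "mpi"}


def normalize_backend(value: str) -> str:
    """Strip whitespace and stray quote characters, then validate the backend name."""
    cleaned = value.strip().strip("'\"").lower()
    if cleaned not in VALID_BACKENDS:
        raise argparse.ArgumentTypeError(f"invalid backend: {value!r}")
    return cleaned


parser = argparse.ArgumentParser()
parser.add_argument("--dist-backend", type=normalize_backend, default="nccl")

# Simulate the literal quotes that appear in the reported error message
args = parser.parse_args(["--dist-backend", "'nccl'"])
print(args.dist_backend)  # prints "nccl"
```

With this type function, `--dist-backend 'nccl'` and `--dist-backend nccl` both resolve to `nccl`; simply passing the value without quotes would likely avoid the problem as well.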

Evaluation/Testing script

Hey, the training script works perfectly fine on my server. But when I try to test the model on my server machine (x86-64), it doesn't work. I wrote a custom script for testing:
```python
import multiprocessing

import torch
import torch.utils.data as data
from sklearn.metrics import confusion_matrix
from torchvision import datasets, transforms

EVAL_DIR = "/home/ajithbalakrishnan/vijnalabs/My_Learning/my_workspace/pytorch-image-classification_1/dataset/gps_lock/"
EVAL_MODEL = '/home/ajithbalakrishnan/vijnalabs/My_Learning/my_workspace/pytorch-image-classification_1/checkpoint.pth.tar'

model = torch.load(EVAL_MODEL)
model.eval()

num_cpu = multiprocessing.cpu_count()
bs = 8

# Resize, center-crop and normalize with the ImageNet mean/std
eval_transform = transforms.Compose([
    transforms.Resize(size=256),
    transforms.CenterCrop(size=224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225])])

eval_dataset = datasets.ImageFolder(root=EVAL_DIR, transform=eval_transform)
eval_loader = data.DataLoader(eval_dataset, batch_size=bs, shuffle=True,
                              num_workers=num_cpu, pin_memory=True)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

num_classes = len(eval_dataset.classes)
dsize = len(eval_dataset)

class_names = ["baseballdiamond", "forest", "golfcourse", "harbor", "overpass", "river", "storagetanks"]

# Accumulate predictions and labels on the CPU for the confusion matrix
predlist = torch.zeros(0, dtype=torch.long, device='cpu')
lbllist = torch.zeros(0, dtype=torch.long, device='cpu')

correct = 0
total = 0
with torch.no_grad():
    for images, labels in eval_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

        predlist = torch.cat([predlist, predicted.view(-1).cpu()])
        lbllist = torch.cat([lbllist, labels.view(-1).cpu()])

overall_accuracy = 100 * correct / total
print('Accuracy of the network on the {:d} test images: {:.2f}%'.format(dsize,
      overall_accuracy))

conf_mat = confusion_matrix(lbllist.numpy(), predlist.numpy())
print('Confusion Matrix')
print('-' * 16)
print(conf_mat, '\n')

# Per-class accuracy = correct predictions / number of samples in each true class
class_accuracy = 100 * conf_mat.diagonal() / conf_mat.sum(1)
print('Per class accuracy')
print('-' * 18)
for label, accuracy in zip(eval_dataset.classes, class_accuracy):
    class_name = class_names[int(label)]
    print('Accuracy of class %8s : %0.2f %%' % (class_name, accuracy))
```
Please help me here.

Error:
```
Traceback (most recent call last):
  File "eval.py", line 30, in <module>
    model.eval()
AttributeError: 'dict' object has no attribute 'eval'
```
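The traceback indicates that `torch.load()` returned a dictionary rather than a model object, i.e. `checkpoint.pth.tar` is a training checkpoint. A minimal sketch of rebuilding the network from it, assuming the checkpoint was written by this repository's train.py and stores the keys `arch`, `num_classes`, and `state_dict` (worth confirming with `print(checkpoint.keys())`), and that `reshape.py` from the training directory is importable with the `reshape_model(model, arch, num_classes)` call that train.py uses. `load_model_from_checkpoint` and `strip_module_prefix` are hypothetical helpers; the latter only matters if the weights were saved from a model wrapped in `DataParallel`/`DistributedDataParallel`, whose parameter names start with `module.`.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict):
    """Drop the 'module.' prefix that DataParallel/DistributedDataParallel add to parameter names."""
    prefix = "module."
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


def load_model_from_checkpoint(path, device="cpu"):
    """Rebuild the network from a checkpoint (assumed keys: arch, num_classes, state_dict)."""
    # Imported lazily; torch is only required when a checkpoint is actually loaded
    import torch
    from torchvision import models
    from reshape import reshape_model  # assumption: reshape.py is on sys.path

    checkpoint = torch.load(path, map_location=device)
    arch = checkpoint["arch"]
    model = models.__dict__[arch]()  # architecture only; weights come from the checkpoint
    model = reshape_model(model, arch, checkpoint["num_classes"])
    model.load_state_dict(strip_module_prefix(checkpoint["state_dict"]))
    return model.to(device).eval()
```

In the evaluation script, the two lines `model = torch.load(EVAL_MODEL)` and `model.eval()` would then become `model = load_model_from_checkpoint(EVAL_MODEL, device)`, placed after `device` is defined.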

Classification: Data-preprocessing for Much Higher Accuracy and Confidence Level

Hi Dusty, I'm using a Jetson Nano 2GB with the classification pipeline. I struggled with accuracy and confidence levels for quite some time. I'm trying to classify 3 classes, and most of the time the prediction was wrong, or right but with low confidence. I knew it was NOT related to training, because after training it showed Acc@1 of 97.xx.

At one point I suspected an issue with the model conversion, but there was almost nothing I could do about that.

In the end I reckoned that the data preprocessing for the inference data and for the training data might differ, so I resized the image and cropped its center before feeding it to the network, and things improved DRASTICALLY!!!

This is what I changed in imagenet.py:

...

```python
# process frames until the user exits
while True:
  # capture the next image
  img_input = input.Capture()

  # allocate a buffer whose height is 224 while keeping the aspect ratio
  # (use integer dimensions for the allocation)
  img_intermediate = jetson.utils.cudaAllocMapped(width=int(img_input.width / img_input.height * 224),
                                                  height=224,
                                                  format=img_input.format)

  # rescale the image (the dimensions are taken from the image capsules)
  jetson.utils.cudaResize(img_input, img_intermediate)

  # center-crop a 224x224 region given as (left, top, right, bottom);
  # this assumes a landscape frame (width >= height)
  left = (img_intermediate.width - 224) // 2
  crop_roi = (left, 0, left + 224, 224)
  img = jetson.utils.cudaAllocMapped(width=224,
                                     height=224,
                                     format=img_intermediate.format)

  jetson.utils.cudaCrop(img_intermediate, img, crop_roi)

  # classify the image
  class_id, confidence = net.Classify(img)
```

...
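The tweak above only handles landscape frames. A small, hypothetical pure-Python helper (`center_crop_roi` is not part of jetson-utils) can compute the intermediate size and a centered crop box for either orientation by scaling the shorter side to 224, mirroring the resize-then-center-crop idea described in this issue. The function name and the use of `round()` for the scaled size are assumptions.

```python
def center_crop_roi(src_width, src_height, crop=224):
    """Scale so the shorter side equals `crop`; return the scaled (width, height)
    and a centered (left, top, right, bottom) box of size crop x crop."""
    scale = crop / min(src_width, src_height)
    width = max(crop, round(src_width * scale))
    height = max(crop, round(src_height * scale))
    left = (width - crop) // 2
    top = (height - crop) // 2
    return (width, height), (left, top, left + crop, top + crop)


print(center_crop_roi(1280, 720))  # ((398, 224), (87, 0, 311, 224))
```

The returned size would go into `cudaAllocMapped(width=..., height=...)` for the intermediate buffer, and the box would be passed to `cudaCrop()` as `crop_roi`.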

I'm not sure why your S3E3 video worked so well while mine needed a little tweak, and I'd like to know. Was it because of different versions of the code, or different machines (Jetson Nano vs. something else)?

Not able to retrain on mobilenet_v2 network

Hello,

I am trying to retrain a classification model (mobilenet_v1 or mobilenet_v2), following the instructions at:

https://github.com/dusty-nv/jetson-inference/blob/master/docs/pytorch-cat-dog.md

The help output of train.py lists the supported architectures as follows:

  --model-dir MODEL_DIR
                        path to desired output directory for saving model
                        checkpoints (default: current directory)
  -a ARCH, --arch ARCH  model architecture: alexnet | densenet121 |
                        densenet161 | densenet169 | densenet201 | googlenet |
                        inception_v3 | mnasnet0_5 | mnasnet0_75 | mnasnet1_0 |
                        mnasnet1_3 | mobilenet_v2 | resnet101 | resnet152 |
                        resnet18 | resnet34 | resnet50 | resnext101_32x8d |
                        resnext50_32x4d | shufflenet_v2_x0_5 |
                        shufflenet_v2_x1_0 | shufflenet_v2_x1_5 |
                        shufflenet_v2_x2_0 | squeezenet1_0 | squeezenet1_1 |
                        vgg11 | vgg11_bn | vgg13 | vgg13_bn | vgg16 | vgg16_bn
                        | vgg19 | vgg19_bn | wide_resnet101_2 |
                        wide_resnet50_2 (default: resnet18)

We can see that mobilenet_v2 is listed as supported. But when I execute the command below:

python3 train.py -a mobilenet_v2 --model-dir=models/cat_dog data/cat_dog

I get the following error:

Use GPU: 0 for training
=> dataset classes:  2 ['cat', 'dog']
=> using pre-trained model 'mobilenet_v2'
Traceback (most recent call last):
  File "train.py", line 506, in <module>
    main()
  File "train.py", line 135, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "train.py", line 205, in main_worker
    model = reshape_model(model, args.arch, num_classes)
  File "/jetson-inference/python/training/classification/reshape.py", line 55, in reshape_model
    print("classifier reshaping not supported for " + args.arch)
NameError: name 'args' is not defined

Can anybody please guide me on how to retrain a mobilenet classification network for NVIDIA Jetson devices?
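The traceback appears to combine two problems. First, `reshape_model()` apparently has no branch for `mobilenet_v2`, so it falls through to its "not supported" message. Second, that fallback on line 55 of reshape.py refers to `args.arch`, but `args` is not defined inside the function (train.py passes the architecture in as the second argument), so a `NameError` is raised instead of the intended message. As a hedged sketch, not code from reshape.py, a hypothetical helper for torchvision's MobileNetV2 could replace the last layer of its classifier:

```python
def reshape_mobilenet_v2(model, num_classes):
    """Replace the final Linear layer of a torchvision MobileNetV2 so it
    outputs `num_classes` logits (a sketch, not taken from reshape.py)."""
    import torch.nn as nn  # imported lazily; torch is only required when this is called

    # torchvision's MobileNetV2 classifier is Sequential(Dropout, Linear)
    old_fc = model.classifier[-1]
    model.classifier[-1] = nn.Linear(old_fc.in_features, num_classes)
    return model
```

Inside `reshape_model()`, a branch like this would presumably be selected when the architecture name starts with `mobilenet`. Note that `mobilenet_v2` is the only MobileNet in the help output listed above, so `mobilenet_v1` would need a different model source.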
