
Comments (3)

bdzyubak commented on July 28, 2024

A basic tutorial style CNN + FCN gets to 0.69 train acc in one epoch and then stays there.
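For reference, a minimal sketch of the kind of tutorial-style CNN + FCN meant here (the exact architecture is an assumption, not taken from the script):

```python
import torch
import torch.nn as nn

# Assumed tutorial-style network: two conv blocks feeding dense layers.
# dermaMNIST images are 28x28 RGB with 7 classes.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                      # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                      # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 128), nn.ReLU(),
    nn.Linear(128, 7),                    # raw logits for CrossEntropyLoss
)

x = torch.randn(100, 3, 28, 28)
print(model(x).shape)  # torch.Size([100, 7])
```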

ResNet50 from torchvision gets:
Fresh weights: [98/100] train_loss: 0.333 - train_acc: 0.884 - eval_loss: 1.127 - eval_acc: 0.704
Pretrained weights: [100/100] train_loss: 0.207 - train_acc: 0.932 - eval_loss: 1.095 - eval_acc: 0.754

Using pretrained rather than random weights as a starting point is much better for optimization and generalizability, as expected. Another way to help generalizability would be to train only some of the layers (e.g., freeze most of the backbone and add additional heads).

from torch-control.

bdzyubak commented on July 28, 2024

Some fun with basics
D:\Source\torch-control\projects\ComputerVision\dermMNIST\train_basic_network.py

A) A network bottlenecked to 1x1 by conv + maxpooling layers (i.e., torch.Size([100, 4096, 1, 1]) going into the dense layers) will still train okay.

B) But a network with another conv layer (no maxpool) following this bottleneck won't, likely because a 1x1 feature map must be almost entirely zero-padded for a 3x3 Conv2d to run on it.
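The padding issue is easy to reproduce in isolation (channel count reduced from 4096 to 64 to keep the sketch light; this is an illustration, not the project's layer config):

```python
import torch
import torch.nn as nn

# Feature map already bottlenecked to 1x1, as in (A).
x = torch.randn(100, 64, 1, 1)

# With padding=1 the layer runs, but 8 of the 9 taps in every 3x3 window
# land on zero padding, so almost no real signal reaches the output.
padded = nn.Conv2d(64, 64, kernel_size=3, padding=1)
print(padded(x).shape)  # torch.Size([100, 64, 1, 1])

# With no padding, a 3x3 kernel cannot fit on a 1x1 map and PyTorch raises.
unpadded = nn.Conv2d(64, 64, kernel_size=3)
try:
    unpadded(x)
except RuntimeError as e:
    print("RuntimeError:", e)
```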

C) Maxpooling with increasing channel counts, versus raw conv layers, helps optimization but has little impact on final accuracy. The latter observation may not hold on a more difficult dataset.
Maxpool:
[22/100] train_loss: 0.129 - train_acc: 0.956 - eval_loss: 1.831 - eval_acc: 0.699

No maxpool:
[22/100] train_loss: 0.510 - train_acc: 0.809 - eval_loss: 1.175 - eval_acc: 0.620
[52/100] train_loss: 0.012 - train_acc: 0.996 - eval_loss: 2.166 - eval_acc: 0.737
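A shape-only sketch of the two variants being compared (layer sizes are assumptions for illustration, not the script's exact configuration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 28, 28)

# Variant 1: maxpool halves spatial dims while channels grow, so later
# layers see small maps with many channels and the dense head stays small.
with_pool = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 16x14x14
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 32x7x7
)

# Variant 2: raw conv layers keep the full 28x28 resolution throughout,
# leaving a much larger feature map for the dense head to digest.
no_pool = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)

print(with_pool(x).shape)  # torch.Size([1, 32, 7, 7])
print(no_pool(x).shape)    # torch.Size([1, 32, 28, 28])
```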

D) Without activation layers, the network takes longer to optimize. It still fits the train data okay but fails to generalize, even with dropout:
[22/100] train_loss: 0.609 - train_acc: 0.769 - eval_loss: 0.725 - eval_acc: 0.727
[70/100] train_loss: 0.143 - train_acc: 0.936 - eval_loss: 2.066 - eval_acc: 0.686
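One way to see why dropping activations hurts: stacked linear layers with no nonlinearity between them collapse into a single affine map, so depth adds parameters without adding expressive power. A quick check of that identity (illustrative, not the project's network):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 16)

# Two Linear layers with no activation in between...
stack = nn.Sequential(nn.Linear(16, 32), nn.Linear(32, 4))

# ...compose to one affine map: W = W2 @ W1, b = W2 @ b1 + b2.
W1, b1 = stack[0].weight, stack[0].bias
W2, b2 = stack[1].weight, stack[1].bias
merged = nn.Linear(16, 4)
with torch.no_grad():
    merged.weight.copy_(W2 @ W1)
    merged.bias.copy_(W2 @ b1 + b2)

print(torch.allclose(stack(x), merged(x), atol=1e-5))  # True
```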


bdzyubak commented on July 28, 2024

The issue of the network underfitting this data was caused by a bug in the basic implementation. PyTorch's CrossEntropyLoss applies log_softmax internally and must be passed raw logits. If it is passed nn.LogSoftmax output it still works, but passing it nn.Softmax output really hurts optimization.
With nn.Softmax() activation - stuck at 0.67 train acc:
[22/100] train_loss: 1.499 - train_acc: 0.670 - eval_loss: 1.467 - eval_acc: 0.669

No nn.Softmax() layer (just pass raw logits):
[22/100] train_loss: 0.129 - train_acc: 0.956 - eval_loss: 1.831 - eval_acc: 0.699
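The effect is easy to reproduce in isolation; the numbers below are from this toy example, not the training script:

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
logits = torch.tensor([[4.0, 1.0, 0.0]])  # confident, correct prediction
target = torch.tensor([0])

# Correct usage: raw logits straight into the loss
# (it applies log_softmax internally).
print(loss_fn(logits, target))  # ~0.066

# Bug: applying softmax first squashes everything into [0, 1], so the
# internal log_softmax sees nearly-uniform inputs; the loss can never
# approach zero and the gradients are tiny.
probs = torch.softmax(logits, dim=1)
print(loss_fn(probs, target))  # ~0.59
```

Even a perfect prediction (probability 1.0 on the true class) cannot drive the buggy loss to zero, which matches training getting stuck at a fixed accuracy.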

In the end, fitting the train data in dermaMNIST turned out to be very easy. There is still a class imbalance and generalizability issue to val data, which will be addressed in a future issue.

