Comments (3)
A basic tutorial-style CNN + FCN reaches 0.69 train accuracy within one epoch and then plateaus there.
ResNet50 from torchvision gets:
Fresh weights: [98/100] train_loss: 0.333 - train_acc: 0.884 - eval_loss: 1.127 - eval_acc: 0.704
Pretrained weights: [100/100] train_loss: 0.207 - train_acc: 0.932 - eval_loss: 1.095 - eval_acc: 0.754
Using pretrained rather than random weights as a starting point is much better for both optimization and generalization, as always. Another way to help generalization would be to train only some of the layers (e.g. freeze the backbone and add additional heads).
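A minimal sketch of how those two starting points (and a frozen backbone) can be set up with torchvision, assuming torchvision >= 0.13 for the weights enum; the function name and the 7-class head are illustrative, not the repo's actual code:

```python
import torch.nn as nn
from torchvision.models import resnet50, ResNet50_Weights

def build_resnet50(num_classes=7, pretrained=True, freeze_backbone=False):
    # weights=None gives fresh (random) weights; DEFAULT gives ImageNet weights.
    model = resnet50(weights=ResNet50_Weights.DEFAULT if pretrained else None)
    if freeze_backbone:
        # Optionally train only the new classification head to help generalization.
        for p in model.parameters():
            p.requires_grad = False
    # Replace the 1000-class ImageNet head with a task-sized one (7 classes for DermaMNIST).
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model
```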
from torch-control.
Some fun with basics
D:\Source\torch-control\projects\ComputerVision\dermMNIST\train_basic_network.py
A) A network bottlenecked down to 1x1 by conv + maxpooling layers, i.e. torch.Size([100, 4096, 1, 1]) going into the dense layers, will still train okay.
B) But a network with another conv layer (no maxpool) after that bottleneck won't, likely because of issues with padding a 1x1 feature map for a 3x3 Conv2d to run on it (see the shape sketch after D's results below).
C) Maxpooling/increasing channels vs. raw conv layers helps optimization but has little impact on the final accuracy. The latter may not hold on a more difficult dataset.
Maxpool:
[22/100] train_loss: 0.129 - train_acc: 0.956 - eval_loss: 1.831 - eval_acc: 0.699
No maxpool:
[22/100] train_loss: 0.510 - train_acc: 0.809 - eval_loss: 1.175 - eval_acc: 0.620
[52/100] train_loss: 0.012 - train_acc: 0.996 - eval_loss: 2.166 - eval_acc: 0.737
D) Without activation layers, the network takes longer to optimize. It still fits the train data okay but fails to generalize, even with dropout:
[22/100] train_loss: 0.609 - train_acc: 0.769 - eval_loss: 0.725 - eval_acc: 0.727
[70/100] train_loss: 0.143 - train_acc: 0.936 - eval_loss: 2.066 - eval_acc: 0.686
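A toy sketch of the shape issue in A)/B), with assumed layer sizes rather than the exact model from train_basic_network.py: repeated conv + maxpool collapses a 28x28 DermaMNIST image into a [100, 4096, 1, 1] tensor, and a further padded 3x3 conv then sees mostly zero padding:

```python
import torch
import torch.nn as nn

x = torch.randn(100, 3, 28, 28)  # batch of RGB 28x28 DermaMNIST-sized images

down = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),       # -> 14x14
    nn.Conv2d(64, 256, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),     # -> 7x7
    nn.Conv2d(256, 1024, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 3x3
    nn.Conv2d(1024, 4096, 3, padding=1), nn.ReLU(), nn.MaxPool2d(3),  # -> 1x1
)
feat = down(x)
print(feat.shape)  # torch.Size([100, 4096, 1, 1]) -- case A, still trains okay

# Case B: another padded 3x3 conv on the 1x1 map still runs, but 8 of its
# 9 receptive-field positions are padding zeros, which is the suspected problem.
extra = nn.Conv2d(4096, 4096, 3, padding=1)
print(extra(feat).shape)  # torch.Size([100, 4096, 1, 1])
```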
from torch-control.
The issue of the network underfitting this data was caused by a bug in the basic implementation. PyTorch's CrossEntropyLoss applies log_softmax internally and needs to be passed raw logits. If it is passed log_softmax outputs it still works, but if it is passed nn.Softmax outputs, this really hurts optimization.
With nn.Softmax() activation - stuck at 0.67 train acc:
[22/100] train_loss: 1.499 - train_acc: 0.670 - eval_loss: 1.467 - eval_acc: 0.669
No nn.Softmax() layer - just pass raw logits:
[22/100] train_loss: 0.129 - train_acc: 0.956 - eval_loss: 1.831 - eval_acc: 0.699
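A minimal repro of the difference, with illustrative shapes (7 classes as in DermaMNIST):

```python
import torch
import torch.nn as nn

# Fake batch: 100 samples, 7 classes.
logits = torch.randn(100, 7)
targets = torch.randint(0, 7, (100,))
criterion = nn.CrossEntropyLoss()

# Correct: CrossEntropyLoss applies log_softmax itself, so pass raw logits.
loss_ok = criterion(logits, targets)

# Buggy variant: an extra nn.Softmax squashes the inputs into [0, 1],
# flattening the gradients and stalling training around 0.67 train accuracy.
loss_bad = criterion(nn.Softmax(dim=1)(logits), targets)

print(loss_ok.item(), loss_bad.item())
```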
In the end, fitting the train data in DermaMNIST turned out to be very easy. There is still a class imbalance problem and a generalization gap to the val data, which will be addressed in a future issue.
from torch-control.