Comments (5)
Hi @bfialkoff
The weights from the backbone only files are from the teacher, and results from our paper are obtained from the teacher weights as well. We have indeed shown in our paper that the teacher is performing better than the student in general.
Therefore, the video_generation script loads the teacher weights (though the visualizations are nearly the same if you use the student weights in that case).
For any of our evaluation scripts, if you want to evaluate the student weights instead, you can do so by specifying the path to the full checkpoint with the --pretrained_weights argument and specifying --checkpoint_key student.
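As a rough illustration of what selecting a checkpoint key amounts to, here is a minimal sketch using plain dictionaries instead of real tensors. It assumes the full checkpoint is a dictionary with "student" and "teacher" entries, and that parameter names may carry "module.", "backbone." and "head." prefixes; the function name select_backbone_weights and the parameter names in the example are hypothetical, not taken from the repository.

```python
def strip_prefix(name, prefix):
    """Return name without prefix if it starts with it, else unchanged."""
    return name[len(prefix):] if name.startswith(prefix) else name


def select_backbone_weights(checkpoint, checkpoint_key="teacher"):
    """Pick one network from a full checkpoint and keep only its backbone.

    Assumed layout (hypothetical): checkpoint[checkpoint_key] maps
    parameter names to values, with optional "module." / "backbone."
    prefixes and a separate "head." group for the projection head.
    """
    state_dict = checkpoint[checkpoint_key]
    cleaned = {}
    for name, value in state_dict.items():
        name = strip_prefix(name, "module.")
        if name.startswith("head."):
            continue  # the projection head is not part of the backbone
        cleaned[strip_prefix(name, "backbone.")] = value
    return cleaned


# Toy checkpoint with made-up parameter names and integer "weights".
full_checkpoint = {
    "student": {"module.backbone.cls_token": 1, "module.head.last_layer": 2},
    "teacher": {"backbone.cls_token": 3, "head.last_layer": 4},
}

student_backbone = select_backbone_weights(full_checkpoint, "student")
teacher_backbone = select_backbone_weights(full_checkpoint, "teacher")
```

With a real checkpoint the values would be tensors and the cleaned dictionary would be passed to the model's state-loading routine; the key selection itself would work the same way under these assumptions.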
Hope that helps
from dino.
@woctezuma thanks for helping to reply to this issue.
I have a minor remark. Our ultimate goal is to obtain the best model possible in an unsupervised way. We train the student with SGD and the teacher is an EMA of that student. We've found that the teacher is performing better than the student and that is why our final model used in downstream tasks is the teacher.
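To make "the teacher is an EMA of that student" concrete, here is a minimal sketch of an exponential moving average update on plain floats. The function name ema_update, the dictionary layout, and the momentum values are assumptions for illustration only; the actual training code operates on tensors and may schedule the momentum over time.

```python
def ema_update(teacher, student, momentum=0.996):
    """Move each teacher parameter a small step towards the student's value.

    teacher and student are dicts mapping parameter names to floats.
    The teacher is never trained by gradient descent; it only tracks
    a smoothed history of the student's parameters.
    """
    for name, student_value in student.items():
        teacher[name] = momentum * teacher[name] + (1.0 - momentum) * student_value
    return teacher


# Toy example: the student sits at 0.0 and the teacher starts at 1.0.
teacher = {"w": 1.0}
student = {"w": 0.0}
for _ in range(3):
    ema_update(teacher, student, momentum=0.9)
# After three updates the teacher has decayed towards the student: 0.9 ** 3.
```

A momentum close to 1 makes the teacher change slowly, so it behaves like an average over many recent student states rather than a copy of the latest one.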
from dino.
Oops, it looks like I was confused about that! Thanks for clearing that up!
Hopefully I have not confused others! Sorry about that, @bfialkoff!
from dino.
Thanks for the clarification. I guess what I meant was: in the video_generation script, when we load a model, are we loading the student or the backbone? Does "backbone" refer to the base model, and "head" to the part of the architecture that turns it into the student model?
from dino.
I don't understand which of the two models is later used for inference: is it the student or the teacher?
The goal is to train a student. Same as in real life. The teacher is only an expendable means towards that goal.
Edit: See the answer by the first author below!
Are the pretrained weights provided from the teacher or the student network?
Everything is provided.
from dino.