Comments (2)
Thanks for reporting.
In general, we have observed that the MAX loss can be harder to optimize than the SUM loss. There are a few ways to reduce the difficulty:
- Train with the SUM loss first and make sure it reaches reasonable performance. This is mainly a sanity check for the rest of your code.
- Stage-wise optimization. First, start from pretrained encoders and train only the embedding, then switch to fine-tuning and train end-to-end.
- Tune the batch size. I expect the optimal batch size for the MAX loss to be dataset-dependent. If the batch size is too large, the hard negatives can be outliers. If it is too small, training can take longer, or the best result we end up with is bounded by what the SUM loss achieves.
- We have unpublished studies on the effect of a separate negative set. On MSCOCO, it happened that the optimal negative set size was the same as the batch size used (128). Depending on your setting, a large negative set could be cheaper. You can trade off speed for a larger negative set: choose a large negative set, compute all embeddings, discard the activations to save memory, match each example in your mini-batch to a single example in the negative set, then do another forward pass only for the selected pairs.
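To make the difference between the two losses concrete, here is a minimal NumPy sketch of a hinge-based ranking loss over a batch similarity matrix. It is an illustration under assumptions, not the repository's code: the function name `contrastive_loss`, the `max_violation` flag, and the margin value of 0.2 are chosen here for the example.

```python
import numpy as np

def contrastive_loss(scores, margin=0.2, max_violation=False):
    """Hinge-based ranking loss over a square similarity matrix.

    scores[i, j] is the similarity between image i and caption j;
    the diagonal holds the matching (positive) pairs.
    The name, signature, and default margin are illustrative assumptions.
    """
    diag = np.diag(scores)
    # Caption retrieval: for image i, every caption j != i is a negative.
    cost_cap = np.clip(margin + scores - diag[:, None], 0, None)
    # Image retrieval: for caption j, every image i != j is a negative.
    cost_img = np.clip(margin + scores - diag[None, :], 0, None)
    # A positive pair must not count as its own negative.
    np.fill_diagonal(cost_cap, 0.0)
    np.fill_diagonal(cost_img, 0.0)
    if max_violation:
        # MAX loss: keep only the hardest negative for each query.
        return cost_cap.max(axis=1).sum() + cost_img.max(axis=0).sum()
    # SUM loss: accumulate the violations of all negatives.
    return cost_cap.sum() + cost_img.sum()

# Toy 3x3 batch: row = image, column = caption.
scores = np.array([[1.00, 0.90, 0.85],
                   [0.20, 1.00, 0.30],
                   [0.95, 0.10, 1.00]])
print("SUM:", contrastive_loss(scores, max_violation=False))
print("MAX:", contrastive_loss(scores, max_violation=True))
```

Because every hinge term is non-negative, the MAX value never exceeds the SUM value on the same batch. With MAX, only one negative per query contributes a gradient, which is one way to see why a single outlier in a large batch can dominate the update.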
Here is a plot for varying the negative set size on MSCOCO.
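The matching step of the separate-negative-set procedure described above can be sketched as follows. This is a rough illustration assuming that embeddings are already computed as L2-normalised NumPy arrays and that each item carries an integer id identifying its positive pair; `hardest_negatives` and its arguments are hypothetical names, not part of the repository.

```python
import numpy as np

def hardest_negatives(query_emb, query_ids, neg_emb, neg_ids):
    """For each query, pick the most similar non-matching item in the negative set.

    query_emb: (batch, dim) embeddings of the mini-batch queries.
    neg_emb:   (neg_set, dim) embeddings of the negative set, computed
               once without keeping activations (assumption of this sketch).
    query_ids / neg_ids: integer ids; equal ids mark a positive pair.
    """
    # Dot products equal cosine similarities when rows are L2-normalised.
    sims = query_emb @ neg_emb.T
    # Exclude items that are actually the positive of a query.
    is_positive = query_ids[:, None] == neg_ids[None, :]
    sims = np.where(is_positive, -np.inf, sims)
    # One selected negative per query.
    return sims.argmax(axis=1)

# Toy example: one query (id 7) against a negative set of three items.
query_emb = np.array([[1.0, 0.0]])
neg_emb = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
selected = hardest_negatives(query_emb, np.array([7]), neg_emb, np.array([7, 3, 5]))
print(selected)
```

Only the selected pairs would then go through a second forward pass with gradients enabled, so the memory cost of the backward pass stays tied to the mini-batch size rather than to the size of the negative set.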
from vsepp.
Thank you very much for these hints. Actually, I think that stage-wise optimization is the way to go. If I first optimize with the SUM loss and then, after 10 epochs, resume training with the MAX loss, the problem disappears and the validation metrics keep increasing smoothly.
However, I will also pay attention to the batch size, as you suggested.
Thanks again
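For anyone reading later, the SUM-then-MAX schedule described in this comment can be sketched as a small helper. It assumes zero-based epoch indices, so "after 10 epochs" means switching at epoch index 10; `loss_for_epoch` and `switch_epoch` are hypothetical names used only for this illustration.

```python
def loss_for_epoch(epoch, switch_epoch=10):
    """Return which loss to use at a given zero-based epoch.

    Epochs before `switch_epoch` use the SUM loss as a warm-up;
    later epochs use the MAX (hardest-negative) loss.
    """
    return "max" if epoch >= switch_epoch else "sum"

# Print the schedule for the first 12 epochs.
for epoch in range(12):
    print(epoch, loss_for_epoch(epoch))
```

The right switch point is likely dataset-dependent, so the value 10 here only mirrors the setting reported in this thread.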