Comments (4)
Hi @ShenXianwen, thanks for your interest.
Did you set a small batch size? What was the exact number? And when did the NaN appear: after several epochs of training, or right in the first epoch?
Sorry, I cannot try it myself, since I have no access to the servers until next week.
from mar.
Hello, thanks for your reply. I set the batch size to 64. The NaN appeared during training, after 2 epochs. I didn't use your prepared data (MSMT17.mat and Market.mat); instead, I followed your steps and ran construct_dataset_Market.m and construct_dataset_MSMT17.m in MATLAB. I did use prepared_weight.pth, though.
OK. Let me try it next week, when I have access to the servers.
Hi @ShenXianwen, it turns out that the NaN appears because the default learning rate is too large for a small batch size like 64. A small batch gives a stronger, sharper gradient (a large batch averages over more samples and so smooths the gradient), so the lr needs to be turned down. I did not experiment much, but dividing the lr by 10 should get rid of the problem.
However, note that performance will probably drop, since the distribution estimation is less precise with a small batch size.
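
For anyone hitting the same problem, here is a minimal sketch of the two steps discussed above: dividing the learning rate by 10 and finding the point where the loss first becomes non-finite. The helper names `reduce_lr` and `first_nan_step` are hypothetical, and the base learning rate in the usage lines is a placeholder, not the repository's actual default.

```python
import math


def reduce_lr(base_lr: float, factor: float = 10.0) -> float:
    """Return base_lr divided by factor (the thread suggests a factor of 10)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return base_lr / factor


def first_nan_step(losses):
    """Return the index of the first non-finite loss (NaN or inf), or None."""
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            return step
    return None


# Placeholder value for illustration only; use the lr from your own config.
base_lr = 0.01
small_batch_lr = reduce_lr(base_lr)

# Hypothetical loss history where the loss diverges at the fourth step.
history = [2.3, 1.9, 1.7, float("nan"), float("nan")]
bad_step = first_nan_step(history)
```

Logging the step returned by `first_nan_step` makes it easier to report whether the NaN shows up in the first epoch or only later, which is the first thing asked in this thread.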