utsaslab / monet
MONeT framework for reducing memory consumption of DNN training
Home Page: https://arxiv.org/abs/2010.14501
License: MIT License
Hi, thanks for the awesome library.
How do you measure the memory used? Is it measured empirically or computed theoretically?
I measured the memory usage with nvidia-smi and found that MONeT does not reduce the memory used by PyTorch.
First, I ran the 10GB solution with python3 imagenet.py ~/imagenet -a resnet50 --gpu 0 --epochs 1 --batch-size 184 --sol ../data/monet_r50_184_24hr/solution_resnet50_184_inplace_conv_multiway_newnode_10.00.pkl. The peak memory reported by nvidia-smi was around 12GB.
Then, I ran the 6GB solution with python3 imagenet.py ~/imagenet -a resnet50 --gpu 0 --epochs 1 --batch-size 184 --sol ../data/monet_r50_184_24hr/solution_resnet50_184_inplace_conv_multiway_newnode_6.00.pkl. The peak memory reported by nvidia-smi was still around 12GB.
How can I use MONeT to actually reduce the memory used by PyTorch?
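For comparison with the nvidia-smi readings, here is a minimal sketch of reading PyTorch's own peak-memory counters. It is not MONeT's measurement method, which this thread does not state. nvidia-smi shows the total memory held by the process, including blocks cached by PyTorch's allocator and the CUDA context, so it can be higher than the peak memory occupied by tensors. The helper names format_mib, reset_peak and peak_memory_report are hypothetical; the torch.cuda calls are standard PyTorch functions.

```python
def format_mib(num_bytes):
    """Render a byte count as mebibytes with one decimal place."""
    return f"{num_bytes / 2**20:.1f} MiB"


def _cuda_torch():
    """Return the torch module if it is installed and CUDA is usable, else None."""
    try:
        import torch
    except ImportError:
        return None
    return torch if torch.cuda.is_available() else None


def reset_peak(device=0):
    """Reset PyTorch's peak-memory counters; returns False when CUDA is unavailable."""
    torch = _cuda_torch()
    if torch is None:
        return False
    torch.cuda.reset_peak_memory_stats(device)
    return True


def peak_memory_report(device=0):
    """Return peak allocated/reserved bytes, or None when CUDA is unavailable.

    max_allocated: peak bytes occupied by tensors.
    max_reserved:  peak bytes held by the caching allocator, which is
                   closer to (but still below) what nvidia-smi shows.
    """
    torch = _cuda_torch()
    if torch is None:
        return None
    return {
        "max_allocated": torch.cuda.max_memory_allocated(device),
        "max_reserved": torch.cuda.max_memory_reserved(device),
    }


if __name__ == "__main__":
    reset_peak()
    # ... run the training iterations to be measured here ...
    report = peak_memory_report()
    if report is None:
        print("CUDA not available; nothing to report")
    else:
        for name, value in report.items():
            print(name, format_mib(value))
```

Calling reset_peak() before the training loop and peak_memory_report() after it, once for each solution file, would show whether the two runs differ in peak allocated memory even when nvidia-smi reports the same figure for both.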
Hi, thanks for open-sourcing this. I am wondering whether MONeT works with mixed-precision training?
The usage in the README is all about CNN models; can it run with Transformer models?
And why does creating a solution require obtaining a Gurobi academic license?
Hi, thanks for the awesome library.
One of the main purposes of saving memory is to enable training with larger batch sizes.
How can I use MONeT to do this? Specifically,
Thanks for your work!
Currently we have two questions:
load function when running examples/training.py. We also tried PyTorch 1.5.0 with CUDA 10.1; the previous error did not occur, but we got "cuDNN error: CUDNN_STATUS_EXECUTION_FAILED" in the forward function in monet/lm_ops/bn.py, and the program (examples/training.py) took a long time to initialize. Can you post the detailed configuration, including the PyTorch, CUDA, and g++ versions, etc.?
python cvxpy_solver.py MODEL ..., and the model format should be "torchvision.models.<model>()". Can we use MONeT to generate solutions for our own models?