Comments (11)
I didn't do any modification on the settings.
You can access it through
https://colab.research.google.com/drive/1BBB6J0JBj1_6nxOd2OxqARe3v226nI69?usp=sharing
Thanks for sharing the Colab notebook. I also tried it and got the same error. The interesting part is that the max GPU usage (including the GPU cache) was only around 1.2 GB just before the training process crashed.
I can't remember which GPU I got the last time I ran the code. This issue could be caused by the GPUtil package and its compatibility with Tesla-series GPUs, but I'm still not sure. We haven't had this problem with local GPUs so far. For now, you can manually increase the max GPU allocation threshold and run your experiments until we find the reason. The competition threshold is pretty lenient, so the true GPU allocation shouldn't exceed it in most cases with batch size 64, unless you want to implement a strategy that requires frequent parameter replications or huge batch sizes.
from clvision-challenge-2023.
Hi @Cklwanfifa
Did you increase the batch size or the other settings?
I tried the same code on Colab before and it worked fine. At which epoch did this happen, and which GPU was assigned to you on Colab?
from clvision-challenge-2023.
I didn't do any modification on the settings.
You can access it through
https://colab.research.google.com/drive/1BBB6J0JBj1_6nxOd2OxqARe3v226nI69?usp=sharing
from clvision-challenge-2023.
I found a local A100 GPU and ran the example code. At the first checkpoint it showed that
MAX GPU MEMORY ALLOCATED: 119 MB MAX RAM ALLOCATED: 5874 MB
I assume that:
- The RAMchecker (not the GPUMemoryChecker) got the incorrect RAM data.
- The error information provided by competition_plugins might be wrong.
from clvision-challenge-2023.
Thanks for the update. For me, the RAM usage is between 2 and 2.5 GB for the first few epochs on both Mac and Linux. As with the GPU limit, you can manually change the RAM limit for your experiments. We will try to find a solution for the hardware usage inconsistency.
from clvision-challenge-2023.
@HamedHemati Hi, I also have a question about the GPUMemoryChecker. The memory usage reported by the GPUMemoryChecker does not match what I see with the gpustat command. Which one ultimately prevails?
The first image is the output of GPUMemoryChecker, and the second image is the gpu usage displayed using the gpustat command.
from clvision-challenge-2023.
@ShiWuxuan Thanks for sharing the usage report. It seems that the only consistent way to get the actual GPU memory usage is the nvidia-smi package, but relying on it would cause issues if someone uses a shared GPU. The RAM usage is also not consistent across different operating systems.
Therefore, we will most probably remove the RAM and GPU usage plugins and ask participants to check the GPU memory usage manually (against the current limits). We will keep only the time checker plugin, to have an approximate training time limit for the strategies.
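For anyone who wants to do the manual check described above, here is a minimal sketch, not the organizers' tooling. It assumes the nvidia-smi command-line tool is on the PATH and uses its CSV query output. The function names and the limit_mb parameter are hypothetical and chosen only for illustration.

```python
import subprocess
from typing import List


def parse_memory_used(csv_output: str) -> List[int]:
    """Parse per-GPU used memory (in MiB) from nvidia-smi CSV output
    produced with the noheader and nounits format options."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]


def query_gpu_memory_used() -> List[int]:
    """Ask nvidia-smi for the memory currently used on each visible GPU (MiB).

    Assumption: nvidia-smi is installed and at least one NVIDIA GPU is visible.
    """
    output = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        text=True,
    )
    return parse_memory_used(output)


def exceeds_limit(used_mb: List[int], limit_mb: int) -> List[bool]:
    """Return, for each GPU, whether its used memory is above limit_mb."""
    return [used > limit_mb for used in used_mb]
```

On a machine with a GPU, calling exceeds_limit(query_gpu_memory_used(), limit_mb) with the competition limit gives one flag per GPU. Note that nvidia-smi reports the memory used by all processes on the device, so on a shared GPU the number includes other users' jobs.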
from clvision-challenge-2023.
Thank you for your detailed reply. The GPU memory usage reported by the nvidia-smi package is shown in the picture below. This is the result without changing the code, i.e. Naive strategy + EWCPlugin + LwFPlugin with batch_size=64. Even with such a simple strategy, the GPU memory used already exceeds the given limit of 1000 MB. Perhaps the previous limit was set based on the output of the GPUMemoryChecker. May I ask whether the GPU memory limit will be relaxed? This is important for method design.
from clvision-challenge-2023.
That's correct, sorry for the confusion. I mixed up the current RAM usage limit and the GPU memory usage limit. The new GPU usage limit will be 4000 MB, and the limit for the RAM usage will be removed. I hope this solves the hardware restriction issues across different platforms.
from clvision-challenge-2023.
I get it, thanks for the explanation.
from clvision-challenge-2023.
GPU and RAM usage plugins are removed in the latest version of the code. I'm closing this issue.
from clvision-challenge-2023.