Comments (9)
I figured it out!
A lock file was generated at some point but never cleared. The steps to fix it are as follows:
When the Python script hangs, press Ctrl-C to kill the process. A stack trace should be printed; mine was the following:
  File "", line 1, in <module>
  File "/home/lost/.pyenv/versions/3.9.13/lib/python3.9/site-packages/qtorch/quant/__init__.py", line 1, in <module>
    from .quant_function import *
  File "/home/lost/.pyenv/versions/3.9.13/lib/python3.9/site-packages/qtorch/quant/quant_function.py", line 20, in <module>
    quant_cuda = load(
  File "/home/lost/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1202, in load
    return _jit_compile(
  File "/home/lost/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1439, in _jit_compile
    baton.wait()
  File "/home/lost/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/utils/file_baton.py", line 42, in wait
    time.sleep(self.wait_seconds)
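The bottom two frames (baton.wait() followed by time.sleep(self.wait_seconds)) look like a polling loop on a lock file. Below is a minimal sketch of that pattern, to show why a leftover lock file makes the import hang forever. This is not PyTorch's actual code: the class name SimpleBaton and the timeout argument are hypothetical, added only for illustration.

```python
import os
import time


class SimpleBaton:
    """Hypothetical, simplified file-based lock (not PyTorch's implementation)."""

    def __init__(self, lock_file_path, wait_seconds=0.1):
        self.lock_file_path = lock_file_path
        self.wait_seconds = wait_seconds
        self.fd = None

    def try_acquire(self):
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # process can create the lock file and "hold the baton".
        try:
            self.fd = os.open(self.lock_file_path, os.O_CREAT | os.O_EXCL)
            return True
        except FileExistsError:
            return False

    def wait(self, timeout=None):
        # Poll until the lock file disappears. With timeout=None this loops
        # forever if the holder crashed and never removed the file.
        deadline = None if timeout is None else time.monotonic() + timeout
        while os.path.exists(self.lock_file_path):
            if deadline is not None and time.monotonic() >= deadline:
                return False
            time.sleep(self.wait_seconds)
        return True

    def release(self):
        # Close our handle (if we hold one) and delete the lock file.
        if self.fd is not None:
            os.close(self.fd)
            self.fd = None
        os.remove(self.lock_file_path)
```

If the process that created the lock file is killed before release() runs, every later process that finds the file ends up in wait() with nothing left to remove it, which matches the hang described here.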
Analyzing this trace, we see that the process is stuck waiting on a file lock. I used pdb to debug the program like so:
python3 -m pdb my_file.py
Within pdb, I set a breakpoint at the line where it hangs:
b /home/lost/.pyenv/versions/3.9.13/lib/python3.9/site-packages/torch/utils/file_baton.py:42
Press c to continue until the breakpoint is hit.
I then opened the file lock code and noticed an attribute called "self.lock_file_path".
I printed it by typing "self.lock_file_path" in pdb.
Navigate to the directory part of this path (the path without the trailing "lock" file name) and delete the lock file.
Your script should now run again :)
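For anyone who would rather not step through pdb, the search-and-delete step can be sketched as a small script. This is a sketch under assumptions, not an official tool: it assumes the lock file is literally named "lock" and sits somewhere under the extension build root, which is commonly ~/.cache/torch_extensions on Linux or whatever TORCH_EXTENSIONS_DIR points to. The function names find_stale_locks and remove_stale_locks are made up for this example.

```python
import os
from pathlib import Path


def find_stale_locks(root):
    """Return all regular files named 'lock' under root (assumed layout)."""
    root = Path(root).expanduser()
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("lock") if p.is_file())


def remove_stale_locks(root, dry_run=True):
    """List lock files under root; delete them only when dry_run is False."""
    found = find_stale_locks(root)
    if not dry_run:
        for path in found:
            path.unlink()
    return found


if __name__ == "__main__":
    # Assumed default location; override with TORCH_EXTENSIONS_DIR if set.
    build_root = os.environ.get("TORCH_EXTENSIONS_DIR", "~/.cache/torch_extensions")
    for lock in remove_stale_locks(build_root, dry_run=True):
        print("would remove:", lock)
```

Run it with dry_run=True first and check the printed paths, then switch to dry_run=False. Only delete a lock when no other process is currently compiling the extension, since a live build legitimately holds it.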
from qpytorch.
Hi @hasnainnaeem,
What's your environment? Which PyTorch and CUDA versions are you using?
Environment Details:
Torch: 1.11.0
CUDA: 11.3
Ubuntu: 22.7
Python: 3.8
GCC: 9.3
I ran into this exact problem. It seems to hang during the just-in-time compilation. I am not sure yet how to fix it; it might require reinstalling PyTorch to clear out a cache or something similar.
I ran into this exact problem. it seems it is hanging during the just-in-time compilation. I am not sure yet how to fix it... it might require reinstalling PyTorch to clear out some cache or something
Unfortunately, that doesn't fix the issue. I tried reinstalling multiple times and also reinstalled the Linux subsystem. Then I tried again on dual-booted Ubuntu, but the issue persisted.
Right now I am working on Colab, where the issue does not occur.
I think it has something to do with the graphics card or its drivers.
Awesome! Thanks for letting me know.
I knew it had something to do with a lock file, but I couldn't find the lock file.
I'm glad I could help :)
Thank you very much for the solution! I have no idea why I suddenly ran into the same situation, but the solution fixed the problem! (The code had worked normally for weeks, then suddenly froze...)
Hi all on this thread,
Thank you all for sharing the knowledge here. I have become too busy to maintain this repo and have not tested it on more recent environments.
Sorry about this!
Best,
Tianyi