tiiiger / qpytorch
Low Precision Arithmetic Simulation in PyTorch
License: MIT License
Importing the module with
from qtorch.quant import Quantizer, quantizer
raises an error:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x92 in position 16034: illegal multibyte sequence
What could be causing this? I suspect the code opens a file without specifying its encoding.
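One plausible reading of this error, offered as a sketch rather than a diagnosis, is that some text file is read with the platform's default codec (here gbk) even though it was written as UTF-8. The snippet below reproduces that situation with a temporary file; the sample text and the helper name `try_read` are illustrative assumptions and have nothing to do with QPyTorch's own files.

```python
import os
import tempfile

def try_read(path, encoding):
    """Return the decoded file contents, or None if decoding fails."""
    try:
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError:
        return None

# Illustrative text: U+2012 is encoded in UTF-8 as the bytes E2 80 92.
text = "a\u2012 b"

fd, path = tempfile.mkstemp(suffix=".txt")
try:
    # Write the file as UTF-8, as most source files are.
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    utf8_result = try_read(path, "utf-8")  # explicit, matching encoding
    gbk_result = try_read(path, "gbk")     # mimics a gbk default locale
finally:
    os.remove(path)

print(utf8_result == text)  # the explicit UTF-8 read round-trips
print(gbk_result == text)   # the gbk read does not recover the text
```

If this is indeed the cause, passing `encoding="utf-8"` wherever the affected file is opened would avoid depending on the system locale.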
Beginner question: Why do we not need to quantize the non-convolutional, non-linear layers in the VGG network (e.g. `BatchNorm2d`, `ReLU`, `Dropout`, etc.)? In other words, why do we not pass all of the layer types into `layer_types` to ensure that every tensor used is quantized?
QPyTorch/examples/SWALP/train.py
Lines 118 to 125 in 176bf2a
If I am understanding correctly, SWALP involves quantizing everything. Is the reason that those other layers do not have parameters, so the forward/backward pass through them will preserve the low precision? (Although `BatchNorm2d` has parameters for the affine transformation.)
Currently, the CUDA kernel loads one 32-bit float at a time. Using 128-bit vectorized loads may be faster. We should try this.
Hi everyone,
I used QPyTorch to do a naive cast from float (1, 8, 23) to half (1, 5, 10). However, I found that the result differs from PyTorch's own cast.
Here is the test code.
`
import torch
from qtorch.quant import Quantizer
from qtorch import FloatingPoint
torch.manual_seed(25)
bit_16 = FloatingPoint(exp=5, man=10)
weight_quant = Quantizer(forward_number=bit_16, backward_number=None,
forward_rounding="nearest", backward_rounding="nearest").cuda()
a = torch.rand([1000]).cuda()
b = torch.rand([1000]).cuda()
print("round:..............")
a = a[314]
res1 = weight_quant(a)
res2 = a.half().float()
assert res1.equal(res2)
`
It seems the error occurs when I cast a value that is normal in float32 but would be subnormal in my custom precision.
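To make the subnormal case concrete, here is a small standard-library sketch (it does not call QPyTorch or PyTorch). It rounds a value below 2^-14 to IEEE half precision with `struct`'s `"e"` format, which keeps subnormals, and compares it with a hypothetical quantizer that flushes such values to zero. The function names and the sample value are assumptions for illustration; whether QPyTorch's kernel behaves like the second function is not something this sketch establishes.

```python
import struct

MIN_NORMAL_HALF = 2.0 ** -14  # smallest normal binary16 magnitude
SUBNORMAL_STEP = 2.0 ** -24   # spacing between binary16 subnormals

def to_half_and_back(x: float) -> float:
    """Round x to IEEE binary16 (round-to-nearest-even), then widen again."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def flush_subnormal(x: float) -> float:
    """Hypothetical alternative: treat anything below the normal range as 0."""
    return 0.0 if abs(x) < MIN_NORMAL_HALF else to_half_and_back(x)

x = 3.0e-5  # illustrative value: normal in float32, subnormal in half
ieee = to_half_and_back(x)
flushed = flush_subnormal(x)

print(x < MIN_NORMAL_HALF)   # the value lies below the half normal range
print(ieee / SUBNORMAL_STEP) # IEEE result is a whole multiple of 2**-24
print(ieee == flushed)       # the two treatments disagree for this input
```

A mismatch of this kind would only show up for inputs smaller than 2^-14, which matches the observation that a single element of the random tensor triggers the assertion.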
Currently, the CUDA kernel only supports float, i.e., it uses float to simulate low-precision numbers. We should make the CUDA kernels type-agnostic using ATen's AT_DISPATCH_FLOATING_TYPES.
Do we have any way to export to ONNX? Can we simply remove quant layers for exporting to ONNX?
Line 23 in 176bf2a
Should this be [-2^{wl-fl-1}+2^{-fl}, 2^{wl-fl-1}-2^{-fl}]? This seems to be how symmetric fixed-point is used in the WAGE paper (bottom of page 3).
For the current formula, if wl=8 and fl=7, then the min value is -2^{8-7-1}^{-7} = -2^{0}^{-7} = -2^{0} = -1, which is the same as if symmetric=False.
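The proposed range can be checked numerically. The sketch below evaluates the bounds suggested in this issue for wl=8 and fl=7 (the values used in the arithmetic above, 8-7-1); `fixed_point_range` is a hypothetical helper written for this illustration, not a QPyTorch function.

```python
def fixed_point_range(wl: int, fl: int, symmetric: bool):
    """Clamping range for a wl-bit fixed-point number with fl fractional bits.

    symmetric=True uses the range proposed in this issue:
        [-2^(wl-fl-1) + 2^(-fl), 2^(wl-fl-1) - 2^(-fl)]
    symmetric=False uses the usual two's-complement range:
        [-2^(wl-fl-1),           2^(wl-fl-1) - 2^(-fl)]
    """
    upper = 2.0 ** (wl - fl - 1) - 2.0 ** -fl
    lower = -upper if symmetric else -(2.0 ** (wl - fl - 1))
    return lower, upper

print(fixed_point_range(8, 7, symmetric=False))
print(fixed_point_range(8, 7, symmetric=True))
```

With the proposed formula the symmetric minimum is -1 + 2^-7 rather than -1, so the two modes would no longer coincide.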
Hi, I want to print the error before and after quantization in the backward pass. I am wondering whether QPyTorch can do this.
Thanks~
Hi. I was wondering if you were aware that quantizing gradients doesn't work for sparse tensors? This is because x.contiguous() is used and sparse tensors do not have "is_contiguous". Is there a reason for using contiguous? Do you have any suggestions for making this work for sparse tensors? Thanks in advance.
There are many potential ways to do quantization for a neural network, for example, different bits for forward and backward, different number type for forward and backward. However, sometimes we also prefer simple quantization. We should add APIs to support:
There is a lot of benchmarking to do; the list is organized by priority:
Benchmark auto low by comparing auto-lowered networks with networks that have quantization layers inserted by hand
Benchmark old and new implementation on GCP P100
Time a typical fully connected layer and convolutional layer with quantization, time ratio of matrix multiplication and quantization
Benchmark quantization CUDA kernel bandwidth
Benchmark quantization speed on quantizing inputs of different sizes
Useful links:
https://devblogs.nvidia.com/how-implement-performance-metrics-cuda-cc/
Hello,
I am trying to run fixed_baseline.sh for the lp_train configuration on my Linux machine, with the following setup.
Python = 3.8
CUDA = 11.0
PyTorch = 1.8.0
The code runs fine on the CPU, but if I change the device to CUDA I get a Segmentation fault (core dumped). If you have seen this before, please help me resolve it.
Dear Sir/Madam,
Thanks for your open-source code. I want to use it for some experiments with 16-bit fixed point, but I got the following errors:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1673, in _run_ninja_build
env=env)
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "alexnet16fix14.py", line 15, in
from qtorch.quant import fixed_point_quantize, block_quantize, float_quantize
File "/home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/init.py", line 1, in
from .quant_function import *
File "/home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_function.py", line 29, in
os.path.join(current_path, "quant_cuda/quant.cu"),
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1091, in load
keep_intermediates=keep_intermediates)
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1302, in jit_compile
is_standalone=is_standalone)
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1407, in write_ninja_file_and_build_library
error_prefix=f"Error building extension '{name}'")
File "/home/ubuntu/.local/lib/python3.6/site-packages/torch/utils/cpp_extension.py", line 1683, in run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'quant_cuda': [1/7] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output bit_helper.cuda.o.d -DTORCH_EXTENSION_NAME=quant_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/TH -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS -D__CUDA_NO_BFLOAT16_CONVERSIONS -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_cuda/bit_helper.cu -o bit_helper.cuda.o
FAILED: bit_helper.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output bit_helper.cuda.o.d -DTORCH_EXTENSION_NAME=quant_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/TH -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_cuda/bit_helper.cu -o bit_helper.cuda.o
nvcc fatal : Unknown option '-generate-dependencies-with-compile'
[2/7] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output sim_helper.cuda.o.d -DTORCH_EXTENSION_NAME=quant_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/TH -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_cuda/sim_helper.cu -o sim_helper.cuda.o
FAILED: sim_helper.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output sim_helper.cuda.o.d -DTORCH_EXTENSION_NAME=quant_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/TH -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_cuda/sim_helper.cu -o sim_helper.cuda.o
nvcc fatal : Unknown option '-generate-dependencies-with-compile'
[3/7] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output block_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=quant_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/TH -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_cuda/block_kernel.cu -o block_kernel.cuda.o
FAILED: block_kernel.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output block_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=quant_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/TH -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_cuda/block_kernel.cu -o block_kernel.cuda.o
nvcc fatal : Unknown option '-generate-dependencies-with-compile'
[4/7] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output float_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=quant_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/TH -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_cuda/float_kernel.cu -o float_kernel.cuda.o
FAILED: float_kernel.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output float_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=quant_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/TH -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_cuda/float_kernel.cu -o float_kernel.cuda.o
nvcc fatal : Unknown option '-generate-dependencies-with-compile'
[5/7] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output fixed_point_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=quant_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/TH -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_cuda/fixed_point_kernel.cu -o fixed_point_kernel.cuda.o
FAILED: fixed_point_kernel.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output fixed_point_kernel.cuda.o.d -DTORCH_EXTENSION_NAME=quant_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/TH -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_cuda/fixed_point_kernel.cu -o fixed_point_kernel.cuda.o
nvcc fatal : Unknown option '-generate-dependencies-with-compile'
[6/7] /usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output quant.cuda.o.d -DTORCH_EXTENSION_NAME=quant_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/TH -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_cuda/quant.cu -o quant.cuda.o
FAILED: quant.cuda.o
/usr/local/cuda/bin/nvcc --generate-dependencies-with-compile --dependency-output quant.cuda.o.d -DTORCH_EXTENSION_NAME=quant_cuda -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE="gcc" -DPYBIND11_STDLIB="libstdcpp" -DPYBIND11_BUILD_ABI="cxxabi1011" -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/TH -isystem /home/ubuntu/.local/lib/python3.6/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.6m -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS -D__CUDA_NO_HALF_CONVERSIONS_ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -std=c++14 -c /home/ubuntu/.local/lib/python3.6/site-packages/qtorch/quant/quant_cuda/quant.cu -o quant.cuda.o
nvcc fatal : Unknown option '-generate-dependencies-with-compile'
ninja: build stopped: subcommand failed.
I used a GPU on an AWS server running Ubuntu 18.04. Here is the detailed information:
ubuntu@ip-172-31-47-61:~/src/train$ python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.8.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.18.4
Python version: 3.6 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: Tesla V100-SXM2-16GB
GPU 1: Tesla V100-SXM2-16GB
GPU 2: Tesla V100-SXM2-16GB
GPU 3: Tesla V100-SXM2-16GB
Nvidia driver version: 450.119.03
cuDNN version: Probably one of the following:
/usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudnn.so.7.6.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.0/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.0.5
/usr/local/cuda-11.1/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.0.5
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] qtorch==0.3.0
[pip3] torch==1.8.0
[pip3] torchvision==0.9.0
[conda] blas 1.0 mkl
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py37he8ac12f_0
[conda] mkl_fft 1.2.1 py37h54f3939_0
[conda] mkl_random 1.1.1 py37h0573a6f_0
[conda] numpy 1.19.2 py37h54aff64_0
[conda] numpy-base 1.19.2 py37hfa32c7d_0
[conda] numpydoc 1.1.0 pyhd3eb1b0_1
Could you provide me any suggestions? Thank you very much.
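The environment report lists "CUDA used to build PyTorch: 10.2" but "CUDA runtime version: 10.0.130", and nvcc rejects the `--generate-dependencies-with-compile` option. One plausible reading, not confirmed in this thread, is that the nvcc found on the PATH is older than the toolkit PyTorch expects. The sketch below compares the two versions; the helper names are hypothetical, the sample string only imitates the usual `nvcc --version` release line, and in practice the build version would come from `torch.version.cuda`.

```python
import re
import subprocess
from typing import Optional, Tuple

def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from the 'release X.Y' part of nvcc --version."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None

def parse_version(version: str) -> Tuple[int, int]:
    """Turn a string such as '10.2' into (10, 2)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def toolkit_matches(nvcc_output: str, torch_cuda: str) -> bool:
    """True if the local nvcc release equals the CUDA version PyTorch used."""
    return parse_nvcc_release(nvcc_output) == parse_version(torch_cuda)

def local_nvcc_output(nvcc: str = "nvcc") -> str:
    """Run nvcc --version (only call this where the CUDA toolkit is installed)."""
    result = subprocess.run([nvcc, "--version"], capture_output=True,
                            text=True, check=True)
    return result.stdout

# Illustrative sample built from the versions quoted in the report above.
sample = "Cuda compilation tools, release 10.0, V10.0.130"
print(toolkit_matches(sample, "10.2"))
```

If the versions differ, pointing the PATH (and CUDA_HOME, where it is used) at the toolkit that matches the PyTorch build is a reasonable first thing to try.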
Hi again! I'm experiencing some weird results in `fixed_point_quantize` that I think might not be correct behavior. Here is an example:
import torch
from qtorch.quant import fixed_point_quantize
tensor = torch.Tensor([1, 2, -1, -2])
fixed_point_tensor = fixed_point_quantize(tensor, wl=8, fl=8, rounding='nearest')
print('full precision tensor:', tensor)
print('fixed point quantized tensor:', fixed_point_tensor)
The output of this is:
full precision tensor: tensor([ 1., 2., -1., -2.])
fixed point quantized tensor: tensor([ 0.4961, 0.4961, -0.5000, -0.5000])
However, with 8 integer bits and 8 fractional bits, the original tensor values should definitely be representable, so the result should be the exact same. Would be great to take a look at this and see what's going on!
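The reported output is consistent with `wl` being the total word length rather than the number of integer bits, which is how the range formula [-2^{wl-fl-1}, 2^{wl-fl-1}-2^{-fl}] quoted elsewhere on this page reads. The pure-Python reference below is a sketch under that assumption; `fixed_point_nearest` is a hypothetical stand-in, not the library's kernel, and its tie-breaking may differ from the real implementation.

```python
def fixed_point_nearest(x: float, wl: int, fl: int) -> float:
    """Round x to the nearest multiple of 2^-fl, then clamp to the
    two's-complement range of a wl-bit word with fl fractional bits."""
    step = 2.0 ** -fl
    lower = -(2.0 ** (wl - fl - 1))
    upper = 2.0 ** (wl - fl - 1) - step
    rounded = round(x / step) * step
    return min(max(rounded, lower), upper)

values = [1.0, 2.0, -1.0, -2.0]

# wl=8, fl=8: all 8 bits are fractional, so the range is [-0.5, 0.5 - 2^-8].
print([fixed_point_nearest(v, wl=8, fl=8) for v in values])

# wl=16, fl=8: 8 fractional bits plus 8 bits for sign and integer part.
print([fixed_point_nearest(v, wl=16, fl=8) for v in values])
```

Under this reading, asking for 8 integer-plus-sign bits and 8 fractional bits corresponds to wl=16, fl=8, for which the four inputs are representable exactly.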
Hi,
when I run "from qtorch.quant import Quantizer, quantizer", it reports an error:
/usr/lib/python3.6/imp.py in find_module(name, path)
295 break # Break out of outer loop when breaking out of inner loop.
296 else:
--> 297 raise ImportError(_ERR_MSG.format(name), name=name)
298
299 encoding = None
ImportError: No module named 'quant_cpu'
The same problem occurs in 0.1.1 and 0.2.0. How can I fix it?
Hi. As I understand it, for a k-bit exponent the maximal exponent value should be calculated as follows: (2^k - 1) - (2^(k - 1) - 1) = 2^(k-1).
In your code I see the max exponent calculated as ((1 << (exp_bits-1))-1) + 127, which is equivalent to 2^(k-1) - 1.
For instance, with the (1,4,3) configuration (sign, exp, mantissa), the maximal representable number should be:
2^8 * 1.875 = 480
but with qtorch I get
2^7 * 1.875 = 240
Could you please clarify whether this is a bug, or explain the rationale behind the reduced upper limit?
Thanks.
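The two numbers in this question correspond to two conventions for the largest exponent code. In IEEE 754 formats the all-ones exponent code is reserved for infinities and NaNs, which lowers the largest usable exponent by one. Whether that is the rationale in QPyTorch is not stated here; the sketch below only shows the arithmetic for both conventions, with `max_normal` as a hypothetical helper.

```python
def max_normal(exp_bits: int, man_bits: int, reserve_top_exponent: bool) -> float:
    """Largest finite value of a (1, exp_bits, man_bits) floating-point format.

    reserve_top_exponent=True follows the IEEE 754 convention, where the
    all-ones exponent code encodes inf/NaN and is not available for numbers.
    """
    bias = 2 ** (exp_bits - 1) - 1
    top_code = 2 ** exp_bits - 1
    max_code = top_code - 1 if reserve_top_exponent else top_code
    max_exponent = max_code - bias
    max_significand = 2.0 - 2.0 ** -man_bits  # 1.111...1 in binary
    return max_significand * 2.0 ** max_exponent

print(max_normal(4, 3, reserve_top_exponent=True))
print(max_normal(4, 3, reserve_top_exponent=False))
```

For the (1,4,3) configuration the first call gives the 240 observed with qtorch and the second gives the 480 expected in the question.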
I wonder if it is possible to simulate integers using QPyTorch.
When I call qtorch.quant, I receive this message:
Traceback (most recent call last):
File "/home/.../anaconda3/envs/qtorch/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
`subprocess.run(`
File "/home/.../anaconda3/envs/qtorch/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/.../anaconda3/envs/qtorch/lib/python3.10/site-packages/qtorch/quant/__init__.py", line 1, in <module>
from .quant_function import *
File "/home/.../anaconda3/envs/qtorch/lib/python3.10/site-packages/qtorch/quant/quant_function.py", line 20, in <module>
quant_cuda = load(
File "/home/.../anaconda3/envs/qtorch/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
return _jit_compile(
File "/home/.../anaconda3/envs/qtorch/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1508, in _jit_compile
_write_ninja_file_and_build_library(
File "/home/.../anaconda3/envs/qtorch/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1623, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/home/.../anaconda3/envs/qtorch/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'quant_cuda': [1/1] c++ quant_cuda.o bit_helper.cuda.o sim_helper.cuda.o block_kernel.cuda.o float_kernel.cuda.o fixed_point_kernel.cuda.o quant.cuda.o -shared -L/home/.../anaconda3/envs/qtorch/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/home/.../anaconda3/envs/qtorch/lib64 -lcudart -o quant_cuda.so
FAILED: quant_cuda.so
c++ quant_cuda.o bit_helper.cuda.o sim_helper.cuda.o block_kernel.cuda.o float_kernel.cuda.o fixed_point_kernel.cuda.o quant.cuda.o -shared -L/home/.../anaconda3/envs/qtorch/lib/python3.10/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/home/.../anaconda3/envs/qtorch/lib64 -lcudart -o quant_cuda.so
/home/.../anaconda3/envs/qtorch/bin/../lib/gcc/x86_64-conda-linux-gnu/9.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: cannot find -lcudart
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
Versions of requirements:
cuda: 11.6
python :3.10
pytorch: 1.13
gcc: 9.3.0
sphinx: 5.3.0
In the `step()` method here, if `p.grad` is `None`, then this line will break, causing the optimizer to crash. However, in many deep learning applications it is common to freeze some parameters within a layer, or even whole layers. QPyTorch's optimizer is not applicable to these cases.
PyTorch's default optimizer has a simple and elegant solution: skip these parameters, treating `None` as equivalent to a `0` gradient. We have implemented this solution here. I would like to propose this change to QPyTorch as well.
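The skip-on-None pattern described above can be sketched without PyTorch. The `Param` class and `sgd_step` function below are hypothetical stand-ins for a parameter tensor and an optimizer step; they are not QPyTorch's or PyTorch's code and only illustrate the control flow.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Param:
    """Minimal stand-in for a parameter: a value and an optional gradient."""
    data: float
    grad: Optional[float] = None

def sgd_step(params: List[Param], lr: float) -> None:
    """Plain SGD update that leaves parameters without a gradient untouched."""
    for p in params:
        if p.grad is None:
            # Frozen parameter: skipping is equivalent to a zero gradient.
            continue
        p.data -= lr * p.grad

params = [Param(data=1.0, grad=0.5), Param(data=2.0, grad=None)]
sgd_step(params, lr=0.1)
print(params[0].data, params[1].data)
```

In a low-precision optimizer the same guard would sit before any gradient quantization, so that frozen parameters never reach the quantizer.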
Right now auto_low and low-precision optimizer either take a quantization module or a quantization function. We should change this by having auto_low and low-precision optimizer take arguments and parse inside the framework.
Hi,
I've been working on low-precision CIFAR-10 training using Posit numbers (bit_8 = Posit(nsize, es)). However, I found that setting nsize to an odd number leads to an unexpected crash. It works for nsize = 10, 8, and 6, but whenever I try nsize = 9 or 7, training crashes.
[I am using Google Colab, and everything is fine for floating-point numbers.]
Such great work! Thanks.
Thanks for creating this! I've been trying to use it for some experiments, and found that the `qtorch/quant/quant_cpu` and `qtorch/quant/quant_cuda` libraries are missing from the pip installation.
For example, in python, when I write
from qtorch.quant import Quantizer
I get the error
FileNotFoundError: [Errno 2] No such file or directory: '/Users/calvinq/miniconda3/envs/envname/lib/python3.7/site-packages/qtorch/quant/quant_cpu/quant_cpu.cpp'
And when I list what's in the directory with `ls users/calvinq/.../qtorch/quant/`, all I get is:
__init__.py __pycache__ quant_function.py quant_module.py
Any idea on how to fix this? Thank you so much :)
>>> half_precision_tensor = torch.rand(5).half()
>>> low_precision_tensor = float_quantize(half_precision_tensor, exp=5, man=2, rounding="nearest")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/data/users/heyangqin/anaconda3/envs/deepspeed_fp8/lib/python3.9/site-packages/qtorch/quant/quant_function.py", line 283, in float_quantize
out = quant_module.float_quantize_nearest(x.contiguous(), man, exp)
RuntimeError: expected scalar type Float but found Half
`from qtorch.quant import Quantizer, quantizer` doesn't work; it reports "which : no hipcc in /PATH". I use CentOS 7.6.
It seems that the Quantizer for the CPU version has another name, or I am missing something.
Do you have a solution for this problem?
Is there a way to do non-uniform quantization with qtorch?
Rewrite block floating point kernel with bitwise operation.
Question:
Should we add the random number before the exponent bias shift or after?
Block-offset floating point can be implemented in almost the same way as block floating point, so they should reuse the same kernel.
I ran the example I found in the documentation: (https://qpytorch.readthedocs.io/en/latest/examples/tutorial/CIFAR10_Low_Precision_Training_Example.html)
But the accuracy stays at 10% and does not increase even if I increase the number of epochs.
Is there something wrong in the example, or am I missing something?
Thanks
Hi! I encountered a bit of a bug. When trying to use OptimLP with Adam, line 97 in `optim_low.py` tries to access the key 'momentum' in the parameter group, but Adam does not have this field, so an error occurs. Maybe this could include a check to see `if 'momentum' in group`?
Here are examples:
For SGD,
optimizer = SGD(model.parameters(), lr=0.05)
optimizer = OptimLP(optimizer,
weight_quant=weight_quant,
grad_quant=grad_quant,
momentum_quant=momentum_quant,
acc_quant=acc_quant
)
Output: dict_keys(['params', 'lr', 'betas', 'eps', 'weight_decay', 'amsgrad'])
For Adam,
optimizer = Adam(model.parameters())
optimizer = OptimLP(optimizer,
weight_quant=weight_quant,
grad_quant=grad_quant,
momentum_quant=momentum_quant,
acc_quant=acc_quant
)
Output: dict_keys(['params', 'lr', 'betas', 'eps', 'weight_decay', 'amsgrad'])
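The suggested check can be illustrated with plain dictionaries standing in for optimizer parameter groups. The Adam-like keys below are the ones printed in this issue; the values, the SGD-like group, and the helper `momentum_of` are illustrative assumptions, not code from `optim_low.py`.

```python
def momentum_of(group: dict) -> float:
    """Return the group's momentum, or 0.0 when the optimizer has no such key."""
    return group.get("momentum", 0.0)

# Illustrative parameter groups (values chosen for the example only).
sgd_like_group = {"lr": 0.05, "momentum": 0.9}
adam_like_group = {"lr": 1e-3, "betas": (0.9, 0.999), "eps": 1e-8,
                   "weight_decay": 0, "amsgrad": False}

for name, group in (("SGD-like", sgd_like_group), ("Adam-like", adam_like_group)):
    momentum = momentum_of(group)
    if momentum != 0:
        print(f"{name}: momentum={momentum}, quantize the momentum buffer")
    else:
        print(f"{name}: no momentum in this group, skip momentum quantization")
```

Using `dict.get` with a default keeps the SGD path unchanged while letting optimizers without a 'momentum' key pass through without a KeyError.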
I've tried to install QPyTorch, and at the beginning of the Functionality_Overview example I got this error:
CalledProcessError Traceback (most recent call last)
~/anaconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py in _run_ninja_build(build_directory, verbose, error_prefix)
1666 stdout_fileno = 1
-> 1667 subprocess.run(
1668 command,
~/anaconda3/lib/python3.8/subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
511 if check and retcode:
--> 512 raise CalledProcessError(retcode, process.args,
513 output=stdout, stderr=stderr)
CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
in
1 import torch
----> 2 import qtorch
~/tfm/QPyTorch/qtorch/init.py in
1 from .number import *
----> 2 from .posit_activation import *
3 all = ["FixedPoint", "BlockFloatingPoint", "FloatingPoint", "Posit", "PositTanhModule","PositTanhModuleEnhanced","RefTanhModule"]
~/tfm/QPyTorch/qtorch/posit_activation.py in
2 #Todo : implement sigmoid, rarely used in modern DNN
3 import torch
----> 4 from qtorch.quant import posit_sigmoid, posit_tanh, posit_tanh_enhanced
5 class PositTanhModule(torch.nn.Module):
6 def forward(self, input):
~/tfm/QPyTorch/qtorch/quant/init.py in
----> 1 from .quant_function import *
2 from .quant_module import *
3
4 all = [
5 "fixed_point_quantize",
~/tfm/QPyTorch/qtorch/quant/quant_function.py in
20
21 if torch.cuda.is_available():
---> 22 quant_cuda = load(
23 name="quant_cuda",
24 sources=[
~/anaconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py in load(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
1077 verbose=True)
1078 '''
-> 1079 return _jit_compile(
1080 name,
1081 [sources] if isinstance(sources, str) else sources,
~/anaconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py in _jit_compile(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_python_module, is_standalone, keep_intermediates)
1290 clean_ctx=clean_ctx
1291 )
-> 1292 _write_ninja_file_and_build_library(
1293 name=name,
1294 sources=sources,
~/anaconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py in _write_ninja_file_and_build_library(name, sources, extra_cflags, extra_cuda_cflags, extra_ldflags, extra_include_paths, build_directory, verbose, with_cuda, is_standalone)
1402 if verbose:
1403 print(f'Building extension module {name}...')
-> 1404 _run_ninja_build(
1405 build_directory,
1406 verbose,
~/anaconda3/lib/python3.8/site-packages/torch/utils/cpp_extension.py in _run_ninja_build(build_directory, verbose, error_prefix)
1681 if hasattr(error, 'output') and error.output: # type: ignore
1682 message += f": {error.output.decode()}" # type: ignore
-> 1683 raise RuntimeError(message) from e
1684
1685
RuntimeError: Error building extension 'quant_cuda'`
Provide equivalent quantization kernel on CPU
Please do.
Hi again! Just wondering: in the CIFAR10_Low_Precision_Training_Example notebook, why is backward_number set to None for the weights, gradients, and momentum, while the activations/errors have low-precision backward passes? Shouldn't they all be in low precision during training, according to the paper?
Maybe I am misunderstanding the roles of the different parameters.
Thank you!
Hi, I use QPyTorch on a V100 with CUDA 9 and PyTorch 1.1.0. I ran a simple test, but the assertion fails.
Is my understanding of how to use QPyTorch wrong? Thanks!
`
import torch
from qtorch.quant import Quantizer
from qtorch.optim import OptimLP
from qtorch import FloatingPoint
import torch.nn as nn
bit_16 = FloatingPoint(exp=5, man=10)
weight_quant = Quantizer(forward_number=bit_16, backward_number=None,
forward_rounding="nearest", backward_rounding="nearest").cuda()
a = torch.rand([10]).cuda()
b = torch.rand([10]).cuda()
print("add:.................")
res1 = a.half() + b.half()
res2 = weight_quant(weight_quant(a) + weight_quant(b))
print(res1)
print(res2)
assert res1.equal(res2.half())
`
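When the two results disagree, an independent reference can help decide which side deviates. The following is a minimal sketch that uses only Python's `struct` module (its `"e"` format is IEEE binary16) and does not call QPyTorch or PyTorch. It assumes that round-to-nearest-even half precision is the intended behavior; `to_half` and `half_add` are hypothetical helper names.

```python
import struct

def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE binary16 value (ties to even)."""
    return struct.unpack("e", struct.pack("e", x))[0]

def half_add(a: float, b: float) -> float:
    """Reference for a half-precision add: round inputs, add, round once more.

    The sum of two binary16 values is exact in a Python float (binary64),
    so the final rounding is the only rounding applied to the sum.
    """
    return to_half(to_half(a) + to_half(b))

if __name__ == "__main__":
    for a, b in [(0.1, 0.2), (1.0, 2.0 ** -11), (1e-5, 3e-5)]:
        print(a, b, half_add(a, b))
```

Comparing each element of `res1` and `res2` against this reference shows whether the mismatch comes from the quantizer or from the native half-precision add.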
Implement WAGE using our fixed point quantization kernel
The code file does not execute and freezes indefinitely. When I remove the below-mentioned line from the code, the project runs as expected:
from qtorch.quant import float_quantize, fixed_point_quantize, block_quantize
I wonder if anyone encountered a similar issue. @Tiiiger Any ideas why this is happening?
We are unable to proceed with our research because of this issue. If you could please spare some time to look into the code, I can send you the files (setup will take ~2 mins).
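One possible cause, which is an assumption and not confirmed anywhere in this thread, is a stale lock file left behind by an interrupted JIT build: PyTorch's extension loader waits on a file named `lock` in the extension build directory. A minimal sketch for locating such files is below; `find_lock_files` is a hypothetical helper, and the default cache path is the usual Linux location, which may differ on your system.

```python
import os

def find_lock_files(root: str) -> list:
    """Return paths of files named 'lock' anywhere under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "lock" in filenames:
            hits.append(os.path.join(dirpath, "lock"))
    return sorted(hits)

if __name__ == "__main__":
    # TORCH_EXTENSIONS_DIR overrides the build location when it is set.
    root = os.environ.get(
        "TORCH_EXTENSIONS_DIR",
        os.path.expanduser("~/.cache/torch_extensions"),
    )
    for path in find_lock_files(root):
        print("possible stale lock:", path)
```

If a lock file shows up while no build is running, removing it (or the whole build directory for that extension) and importing again is worth a try.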
Hi,
I have tried the following code
a=torch.tensor([3.0])
out=float_quantize(a,8,23,"nearest")
The output is printed as -3.0.
This happens only when the rounding is "nearest". I do not understand why this happens. Could you please explain what I am missing?
Hi there,
As the WAGE paper mentions, they use integer values for W, A, G, E with 2, 8, 8, 8 bits respectively. However, your implementation uses fixed-point values for those numbers. Is there an explanation for that, or am I misunderstanding something?
Thanks,
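For context, a fixed-point number with `fl` fractional bits is an integer code multiplied by 2^-fl, so the two views differ only by a constant scale. A minimal sketch of that relationship follows; `to_fixed` is a hypothetical helper and is not how QPyTorch or the WAGE authors implement it.

```python
def to_fixed(x: float, wl: int, fl: int):
    """Quantize x to signed fixed point with wl total bits and fl fractional bits.

    Returns (integer code, real value); value == code / 2**fl.
    """
    scale = 2 ** fl
    lo, hi = -(2 ** (wl - 1)), 2 ** (wl - 1) - 1  # two's-complement code range
    code = min(max(round(x * scale), lo), hi)      # round, then saturate
    return code, code / scale

if __name__ == "__main__":
    print(to_fixed(0.3, wl=8, fl=7))  # 8-bit code and the value it represents
    print(to_fixed(2.0, wl=8, fl=7))  # out of range, saturates at the top code
```

Under this view, an 8-bit integer and an 8-bit fixed-point number carry the same information once the scale factor is fixed.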
What are the steps to run the lp_train test with IMAGENET12?
QPyTorch/qtorch/optim/optim_low.py
Line 81 in ed0d8b1
In OptimLP, the gradient scaling factor is multiplied in before quantization. However, gradient scaling is meant to prevent underflow of low-precision quantized gradient values, so I think the current implementation cannot prevent underflow.
Maybe the correct implementation is to multiply the scaling factor after quantization.
p.grad.data = self.grad_quant(p.grad.data) * self.grad_scaling
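The difference between the two orderings can be illustrated with a small numeric sketch. It assumes that the stored gradient was computed from a loss scaled by 1024 and that the scaling factor applied in the optimizer is 1/1024; IEEE half precision (via Python's `struct`) stands in for the gradient quantizer. None of this calls QPyTorch, and `to_half` is a hypothetical helper.

```python
import struct

def to_half(x: float) -> float:
    """Stand-in quantizer: round to the nearest IEEE binary16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]

true_grad = 1e-8              # below the smallest binary16 subnormal (about 6e-8)
loss_scale = 1024.0
scaled_grad = true_grad * loss_scale   # what backward() would produce
grad_scaling = 1.0 / loss_scale        # assumed factor that undoes the loss scale

# Ordering described in the issue: unscale first, then quantize.
unscale_then_quantize = to_half(scaled_grad * grad_scaling)

# Proposed ordering: quantize the scaled gradient, then unscale in full precision.
quantize_then_unscale = to_half(scaled_grad) * grad_scaling

if __name__ == "__main__":
    print(unscale_then_quantize, quantize_then_unscale)
```

In this sketch the first ordering flushes the gradient to zero, while the second keeps it within about 1% of the true value.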
Right now I have only implemented the stochastic rounding CUDA kernel. We need to add a nearest rounding CUDA kernel.
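As a reference for what the two rounding modes compute, here is a minimal pure-Python sketch for fixed point with `fl` fractional bits. It is not the CUDA kernel, and the helper names are hypothetical.

```python
import math
import random

def round_nearest(x: float, fl: int) -> float:
    """Round to the nearest multiple of 2**-fl (ties round up)."""
    return math.floor(x * 2 ** fl + 0.5) / 2 ** fl

def round_stochastic(x: float, fl: int, rng=random) -> float:
    """Round up with probability equal to the fractional remainder.

    The expected value of the result equals x, so the rounding is unbiased.
    """
    return math.floor(x * 2 ** fl + rng.random()) / 2 ** fl

if __name__ == "__main__":
    rng = random.Random(0)
    samples = [round_stochastic(0.3, 2, rng) for _ in range(10000)]
    print(round_nearest(0.3, 2), sum(samples) / len(samples))
```

Nearest rounding always returns the same grid point, whereas stochastic rounding alternates between the two neighbors so that the average stays close to the input.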
Hi.
I am trying to reproduce this paper.
Is there any way to modify QPyTorch's exponent bias (the default seems to be 2^(exp-1)-1)? Thanks in advance.
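One workaround that does not require changing the library is to note that a different exponent bias only rescales the representable set by a power of two, so scaling the input before quantization and undoing the scale afterwards emulates it. The sketch below uses IEEE half precision (bias 15) from Python's `struct` as a stand-in quantizer; `to_half_with_bias` is a hypothetical helper, and whether the same trick carries over exactly to QPyTorch's `float_quantize` depends on how it handles the ends of the range.

```python
import struct

def to_half(x: float) -> float:
    """Round to the nearest IEEE binary16 value (exponent bias 15)."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_half_with_bias(x: float, bias: int) -> float:
    """Emulate a 5-bit-exponent, 10-bit-mantissa format with a custom bias.

    A larger bias shifts the whole representable range toward zero.
    Note: struct raises OverflowError if the scaled input exceeds the
    binary16 range.
    """
    shift = bias - 15
    return to_half(x * 2.0 ** shift) / 2.0 ** shift

if __name__ == "__main__":
    tiny = 2.0 ** -26
    print(to_half(tiny), to_half_with_bias(tiny, bias=17))
```

With bias 17, the value 2^-26 survives quantization, whereas the standard bias flushes it to zero.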
Hi,
The minimum and maximum values you can achieve with float numbers in this implementation seem to be different from IEEE-754-like representations. I assume that you are sacrificing some precision to avoid the computation of subnormal values. Is that correct?
Could you please clarify if this is a problem in the implementation or if this was a design decision (which causes values close to zero to be poorly represented)?
Here is my understanding of this issue:
Let's assume that our exponent is 2 bits and our mantissa is also 2 bits
Exponent:
Usually, the exponent should be able to go from -1 (00) to 2 (11).
In this QPytorch implementation, it goes from -2 (00) to 1(11).
Usually, if the exponent is zero, we will use a denormalized mantissa.
In this QPytorch implementation, if the exponent is zero, the output is also zero (regardless of the mantissa).
Mantissa:
The mantissa has an invisible leading bit, meaning that it varies between 1 and ~2.
For both implementations, if we have 2 bits in the mantissa, our mantissa should vary between 1 (1.00) and 1.75 (1.11) in steps of 0.25.
Here are the numbers that would be representable using the usual representation:
[-0.0, -0.25, -0.5, -0.75, -1.0, -1.25, -1.5, -1.75, -2.0, -2.5, -3.0, -3.5, -4.0, -5.0, -6.0, -7.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0, 7.0]
Here are the numbers representable in this QPytorch implementation (without subnormal values):
[-3.5, -3.0, -2.5, -2.0, -1.75, -1.5, -1.25, -1.0, -0.875, -0.75, -0.625, -0.5, 0.0, 0.5, 0.625, 0.75, 0.875, 1.0, 1.25, 1.5, 1.75, 2.0, 2.5, 3.0, 3.5]
Code:
import torch
import numpy as np
from qtorch.quant import *

def test_float():
    # Configurations
    exp, man = 2, 2
    d = "cpu"
    r = "nearest"
    # Get min and max values representable
    a_max_pos = (2 ** (2 ** (exp - 1))) * (1 - 2 ** (-man - 1))
    a_max_neg = -(2 ** (2 ** (exp - 1))) * (1 - 2 ** (-man - 1))
    a_min_pos = (2 ** (-(2 ** (exp - 1)) + 1))
    a_min_neg = -(2 ** (-(2 ** (exp - 1)) + 1))
    print(f'min: ±{a_min_pos}\tmax: ±{a_max_pos}')
    # Black-box test to get representable numbers
    prev_val = None
    quant_vals = []
    for i in np.arange(-10, 10, 0.001):
        a = torch.Tensor([i]).to(device=d)
        quant_a = float_quantize(a, exp=exp, man=man, rounding=r)
        if prev_val != quant_a[0]:
            quant_vals.append(quant_a[0].item())
        prev_val = quant_a
    print("Values representable in QPytorch")
    print(quant_vals)
    # IEEE-like implementation
    quant_vals = []
    for sign in [-1, 1]:
        exp_offset = (2 ** (exp - 1) - 1)
        for e in np.arange(0, 2 ** (exp)):
            m_step = 2 ** (-man)
            for m in np.arange(1, 2, m_step):
                if e == 0:
                    # Denormalized numbers: no implicit leading 1
                    v = sign * (m - 1)
                else:
                    v = sign * 2. ** (e - exp_offset) * m
                #print(f"sign: {sign} \texp:{e}\t man:{m}\tvalue:{v}")
                quant_vals.append(v)
    print("Values representable in an IEEE-like implementation:")
    print(quant_vals)
    # Implementation in QPytorch
    quant_vals = []
    for sign in [-1, 1]:
        exp_offset = (2 ** (exp - 1))
        for e in np.arange(0, 2 ** (exp)):
            m_step = 2 ** (-man)
            for m in np.arange(1, 2, m_step):
                # No denormalized numbers
                if e == 0:
                    v = 0
                else:
                    v = sign * 2. ** (e - exp_offset) * m
                #print(f"sign: {sign} \texp:{e-exp_offset}\t man:{m}\tvalue:{v}")
                quant_vals.append(v)
    print("Recreation of values representable in QPytorch")
    print(list(np.sort(quant_vals)))

if __name__ == '__main__':
    test_float()
Assert input type is single precision floating point.
I wouldn't call this an issue. It's more a question and I'm not sure where else to ask. Does sequential_lower also quantize the layer weights or does it only quantize the outputs of corresponding layers? My particular use case would be to train a network in full precision and then convert the model to low precision for inference. I know I can manually convert the weights myself, but it would be nice if sequential_lower did that. By looking at the code and printing out weights, it looks like it doesn't but please correct me if I'm wrong. Thanks in advance for your reply.
Hello, thanks for your great work! Recently, I ran into a problem with low-precision training. When training the IBM8 model, I found that training with quantization takes more time than training without it. Training the vgg16 model in low precision takes 14 seconds per epoch, but training it in full precision takes only 8 seconds! Have you noticed this as well, or am I doing something wrong?
Hi! I've encountered an issue when importing this package on certain CentOS machines.
Installing qtorch works fine, but whenever I try to import a particular function, such as
from qtorch.quant import fixed_point_quantize
I get the following long error:
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was
built with for this platform, which is g++ on linux. Please
use g++ to to compile your extension. Alternatively, you may
compile PyTorch from source using c++, and then you can also use
c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help
with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
Then none of the library's functions work and they'll result in a segfault+crash. Any idea how this might be fixed? On my ubuntu machine this error does not come up, but for certain experiments I'm running it has to be on the CentOS systems so there's no way to avoid this error for me.
Thank you!
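The warning names `c++` as the compiler, which is the default that PyTorch's extension builder falls back to when the `CXX` environment variable is unset. A minimal sketch of pointing it at `g++` before the import is below. `prefer_gxx` is a hypothetical helper, and whether this resolves the segfault on the CentOS machines is an assumption that has not been verified here.

```python
import os
import shutil

def prefer_gxx(environ, which=shutil.which):
    """Set CXX to the g++ found on PATH, unless CXX is already set.

    Returns the resulting CXX value, or None if nothing could be set.
    """
    path = which("g++")
    if path and "CXX" not in environ:
        environ["CXX"] = path
    return environ.get("CXX")

if __name__ == "__main__":
    # Call this before `from qtorch.quant import fixed_point_quantize`,
    # since the extension is compiled at import time.
    print("CXX =", prefer_gxx(os.environ))
```

If an earlier build with the wrong compiler is cached, the cached extension may need to be removed so that it is rebuilt.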
Add a kernel that quantizes a floating-point number to an arbitrary number of exponent and mantissa bits.
Test:
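A pure-Python reference can be handy for testing such a kernel. The sketch below assumes IEEE-like semantics (bias 2^(exp-1)-1, subnormals, top exponent code reserved, round-to-nearest-even) and saturates instead of producing infinity; these are assumptions and may differ from what the QPyTorch kernel does. `float_quantize_ref` is a hypothetical name.

```python
import math

def float_quantize_ref(x: float, exp: int, man: int) -> float:
    """Round x to the nearest value with `exp` exponent and `man` mantissa bits."""
    if x == 0.0 or math.isnan(x) or math.isinf(x):
        return x
    bias = 2 ** (exp - 1) - 1
    e_min = 1 - bias                    # exponent of the smallest normal number
    e_max = (2 ** exp - 2) - bias       # top exponent code is reserved, as in IEEE
    max_val = (2.0 - 2.0 ** -man) * 2.0 ** e_max
    _, e = math.frexp(abs(x))           # abs(x) = m * 2**e with 0.5 <= m < 1
    e -= 1                              # so abs(x) lies in [2**e, 2**(e+1))
    e = max(e, e_min)                   # subnormals share the exponent e_min
    step = 2.0 ** (e - man)             # spacing of representable values here
    q = round(abs(x) / step) * step     # Python's round() rounds halves to even
    return math.copysign(min(q, max_val), x)

if __name__ == "__main__":
    for v in (0.1, 3.0, 1e-8, 1e6):
        print(v, float_quantize_ref(v, exp=5, man=10))
```

For `exp=5, man=10` and inputs within range, the results can be cross-checked against IEEE half precision, for example with `struct.pack("e", v)`.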
I just tried to run the sample fixed_baseline and got Segmentation Fault!
Any thoughts?
which: no nvcc in (xxx/.linuxbrew/bin:xxx/.linuxbrew/sbin:xxx/anaconda3/bin:xxx/anaconda3/bin:xxx/anaconda3/condabin:xxx/.local/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/data/shared/Tools/bin)
Loading dataset CIFAR10 from ./data
Files already downloaded and verified
Files already downloaded and verified
weight : FixedPoint (wl=12, fl=12)
activate : FloatingPoint (exponent=8, mantissa=23)
grad : FixedPoint (wl=12, fl=12)
error : FloatingPoint (exponent=8, mantissa=23)
momentum : FixedPoint (wl=12, fl=12)
Model: VGG16LP
sample_scripts/fixed_baseline.sh: line 20: 48552 Segmentation fault (core dumped) python train.py --dataset CIFAR10 --data_path ./data --model VGG16LP --epochs=200 --lr_init=0.05 --wd=5e-4 --wl-weight 12 --wl-grad 12 --wl-momentum 12 --fl-weight 12 --fl-grad 12 --fl-momentum 12 --seed 100 --batch_size 128 --weight-rounding stochastic --grad-rounding stochastic --momentum-rounding stochastic --weight-type fixed --grad-type fixed --momentum-type fixed