
brevitas's Introduction

Brevitas


Brevitas is a PyTorch library for neural network quantization, with support for both post-training quantization (PTQ) and quantization-aware training (QAT).

Please note that Brevitas is a research project and not an official Xilinx product.

If you like this project please consider ⭐ this repo, as it is the simplest and best way to support it.

Requirements

  • Python >= 3.8.
  • PyTorch >= 1.9.1, <= 2.1 (more recent versions are untested).
  • Windows, Linux or macOS.
  • GPU training-time acceleration (Optional but recommended).

Installation

You can install the latest release from PyPI:

pip install brevitas

Getting Started

Brevitas currently offers quantized implementations of the most common PyTorch layers used in DNNs under brevitas.nn, such as QuantConv1d, QuantConv2d, QuantConvTranspose1d, QuantConvTranspose2d, QuantMultiheadAttention, QuantRNN, QuantLSTM, etc., for adoption within PTQ and/or QAT. For each of these layers, quantization of the different tensors involved (inputs, weights, biases, outputs, etc.) can be individually tuned according to a wide range of quantization settings.
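
For instance, a minimal sketch of dropping quantized layers into a model (assuming a recent Brevitas version; exact keyword arguments such as weight_bit_width and bit_width may differ across releases):

import torch
import brevitas.nn as qnn

# 4-bit weight quantization for the convolution, 4-bit activation quantization for the ReLU;
# during training, quantization is simulated in floating point (fake quantization).
conv = qnn.QuantConv2d(3, 16, kernel_size=3, padding=1, weight_bit_width=4)
relu = qnn.QuantReLU(bit_width=4, return_quant_tensor=True)

x = torch.randn(1, 3, 32, 32)
y = relu(conv(x))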

As a reference for PTQ, Brevitas provides an example user flow for ImageNet classification models under brevitas_examples.imagenet_classification.ptq that quantizes an input torchvision model using PTQ under different quantization configurations (e.g. bit-width, granularity of scale, etc).

For more info, check out our getting started guide.

Cite as

If you adopt Brevitas in your work, please cite it as:

@software{brevitas,
  author       = {Alessandro Pappalardo},
  title        = {Xilinx/brevitas},
  year         = {2023},
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.3333552},
  url          = {https://doi.org/10.5281/zenodo.3333552}
}

History

  • 2024/02/19 - Minor release version 0.10.2, see the release notes.
  • 2024/02/15 - Minor release version 0.10.1, see the release notes.
  • 2023/12/08 - Release version 0.10.0, see the release notes.
  • 2023/04/28 - Minor release version 0.9.1, see the release notes.
  • 2023/04/21 - Release version 0.9.0, see the release notes.
  • 2023/01/10 - Release version 0.8.0, see the release notes.
  • 2021/12/13 - Release version 0.7.1, fixing a bunch of issues. Added a TVMCon 2021 tutorial notebook.
  • 2021/11/03 - Re-release version 0.7.0 (build 1) on PyPI to fix a packaging issue.
  • 2021/10/29 - Release version 0.7.0, see the release notes.
  • 2021/06/04 - Release version 0.6.0, see the release notes.
  • 2021/05/24 - Release version 0.5.1, fixing a bunch of minor issues. See the release notes.
  • 2021/05/06 - Release version 0.5.0, see the release notes.
  • 2021/03/15 - Release version 0.4.0, adding support for __torch_function__ to QuantTensor.
  • 2021/03/04 - Release version 0.3.1, fixing a bug with activation initialization from statistics when IGNORE_MISSING_KEYS=1.
  • 2021/03/01 - Release version 0.3.0, implementing enum and shape solvers within extended dependency injectors. This allows declarative quantizers to be self-contained.
  • 2021/02/04 - Release version 0.2.1, including various bugfixes of QuantTensor with zero-point.
  • 2021/01/30 - First release version 0.2.0 on PyPI.

brevitas's People

Contributors

andrei-stoian-zama, derpda, fabianandresgrob, fd0r, giuseppe5, i-colbert, jinchen62, jmduarte, maltanar, michalmachura, nickfraser, omarperacha, rgb000000, saadulkh, spontaneousduck, vfdev-5, volcacius


brevitas's Issues

Bug when setting ENABLE_BIAS_QUANT = True

Hi @volcacius, thanks for your implementation. I hit a bug when setting ENABLE_BIAS_QUANT = True. Can you help me debug it?

/usr/local/lib/python3.6/dist-packages/brevitas/nn/quant_conv.py in forward(self, input)
    224 
    225         if self.compute_output_bit_width:
--> 226             assert input_bit_width is not None
    227             output_bit_width = self.max_output_bit_width(input_bit_width, quant_weight_bit_width)
    228         if self.compute_output_scale:

AssertionError: 

Optimizer with QuantTensor Error

Hello, thank you for the awesome tool!
I'm working on combining DARTS [ICLR 2019] with Brevitas.
It seems that QuantTensor cannot be optimized by the usual optimizers in torch (torch.optim.Adam, SGD, and so on; please see the following error):

TypeError: optimizer can only optimize Tensors, but one of the params is brevitas.quant_tensor.QuantTensor

Is there a way to train a QuantTensor with an optimizer?

My code is:

net_crit = nn.CrossEntropyLoss().to(device)
model = SearchCNNController(C_in=input_channels,
                            C=config.init_channels,
                            n_classes=n_classes,
                            n_layers=config.layers,
                            criterion=net_crit,
                            n_nodes=config.n_nodes,
                            device_ids=config.gpus,
                            imagenet_mode=config.dataset.lower() in utils.LARGE_DATASETS,
                            quant=config.quant)
model = model.to(device)

# weights optimizer
w_optim = torch.optim.SGD(model.weights(), config.w_lr, momentum=config.w_momentum,
                          weight_decay=config.w_weight_decay)
# alphas optimizer
alpha_optim = torch.optim.Adam(model.alphas(), config.alpha_lr, betas=(0.5, 0.999),
                               weight_decay=config.alpha_weight_decay)

Power of two scaling float to int

Currently, power-of-two scaling floors the floating-point scale factors. This is not ideal; rounding should be performed instead. To preserve backward compatibility, an option to control the float-to-int transformation should be introduced.
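
As a minimal numeric illustration of the difference (not the Brevitas implementation), with a hypothetical scale factor of 0.03:

import math

scale = 0.03                                         # hypothetical floating-point scale factor
floor_scale = 2.0 ** math.floor(math.log2(scale))    # current behaviour: 0.015625
round_scale = 2.0 ** round(math.log2(scale))         # proposed behaviour: 0.03125
# Flooring always shrinks the scale, while rounding picks the nearest power of two,
# which generally keeps the quantized range closer to the original one.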

Enforcing style and formatting

Currently we are not checking PEP8 compliance in any way, nor enforcing formatting. I really appreciate the fact that Black enforces a single uniform style, but I cannot accept dangling parentheses; it's not something I can look at every day. So yapf should be explored as an option.
Additionally, pre-commit hooks should be set up to perform import reordering, as well as the usual set of checks around newlines etc.

How can I get INT8 weight in my model?

Hi there,
I'm trying to train a VGG16 model (using the vgg16 provided in brevitas/examples/imagenet_classification/models/vgg.py, with settings following common.py) on our own dataset, and the model has trained well.
I looked at the code in brevitas/examples/imagenet_classification/models/common.py
and found at line 7: QUANT_TYPE = QuantType.INT.
Can I assume the weights in qnn.QuantConv2d will be of INT type?
But when I load the model.pt (my saved model weights, using torch.load), the weights look like this:
[screenshot of floating-point weight values omitted]

How can I get the INT8 weights in my model, and how do I use those weights for inference on an FPGA? Can I directly port my weights to my VGG design on the FPGA, or do I need to add a scaling step or something?
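
A minimal sketch of how the integer weights and their scale could be pulled out of a trained layer, assuming a Brevitas version of that era where int_weight and quant_weight_scale are exposed on the layer (whether these are properties or methods varies between releases), and assuming model.features[0] is the first QuantConv2d:

import torch

quant_conv = model.features[0]            # hypothetical handle to a trained QuantConv2d

int_w = quant_conv.int_weight             # integer-valued weight codes (e.g. INT8)
scale = quant_conv.quant_weight_scale     # per-tensor or per-channel scale factor

# The float weights stored in the checkpoint are the dequantized values,
# i.e. float_weight ≈ int_w * scale, which is why torch.load shows floats.
recovered = int_w.float() * scale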

Questions about model storage

Hello,
       how do I save the model after quantization?
       Using torch.save(model, 'model.pth') reports an error; how can I solve it?

How to verify the weights are INT8?

Hi, I directly replaced the Conv2d and ReLU in my model with qnn.QuantConv2d and qnn.QuantReLU. However, after training, when I read the weights, they still do not seem to be of type INT8. Can you help me out?

Quantization for speeding up

Hi, thanks for your code. Could you help me with the following question? I have incorporated your provided layers into a DenseUNet model; I have:

conv = qnn.QuantConv2d(in_channels=params['num_channels'],
                       out_channels=params['num_filters'],
                       kernel_size=(params['kernel_h'], params['kernel_w']),
                       padding=(padding_h, padding_w),
                       stride=params['stride_conv'],
                       weight_quant_type=QuantType.INT,
                       weight_bit_width=8)

batchnorm = qnn.BatchNorm2dToQuantScaleBias(num_features=params['num_channels'],
                                            weight_quant_type=QuantType.INT,
                                            weight_bit_width=8)

relu = qnn.QuantReLU(quant_type=QuantType.INT, bit_width=8, max_val=6)

sigmoid = qnn.QuantSigmoid(bit_width=8, quant_type=QuantType.INT)

Those functions were replaced by their qnn counterparts and I did not change anything else. The model trains successfully, but the running time on both GPU and CPU is actually slower than the plain PyTorch nn implementation. Did I do anything wrong? Shouldn't the model speed up training and inference by about 4x?

Runtime error when moving the quantized network to GPU

Hi, when I trained the QNN on the CPU, it performed normally.
However, when I use the GPU to train, the following runtime error is raised:
[screenshot of the runtime error omitted]

I define the model as follows:

class QuantLeNet(Module):
    def __init__(self):
        super(QuantLeNet, self).__init__()
        self.conv1 = qnn.QuantConv2d(1, 6, 5,
                                     weight_quant_type=QuantType.INT, weight_bit_width=6,
                                     bias_quant_type=QuantType.INT, bias_bit_width=10,
                                     compute_output_scale=True, compute_output_bit_width=True,
                                     weight_restrict_scaling_type=RestrictValueType.POWER_OF_TWO)
        self.relu1 = qnn.QuantReLU(quant_type=QuantType.INT, bit_width=4, max_val=6,
                                   return_quant_tensor=True,
                                   restrict_scaling_type=RestrictValueType.POWER_OF_TWO)
        self.conv2 = qnn.QuantConv2d(6, 16, 5,
                                     weight_quant_type=QuantType.INT, weight_bit_width=6,
                                     bias_quant_type=QuantType.INT, bias_bit_width=10,
                                     compute_output_scale=True, compute_output_bit_width=True,
                                     weight_restrict_scaling_type=RestrictValueType.POWER_OF_TWO)
        self.relu2 = qnn.QuantReLU(quant_type=QuantType.INT, bit_width=4, max_val=6,
                                   return_quant_tensor=True,
                                   restrict_scaling_type=RestrictValueType.POWER_OF_TWO)
        self.fc1 = qnn.QuantLinear(16 * 5 * 5, 200, bias=True,
                                   weight_quant_type=QuantType.INT, weight_bit_width=6,
                                   bias_quant_type=QuantType.INT, bias_bit_width=10,
                                   compute_output_scale=True, compute_output_bit_width=True,
                                   weight_restrict_scaling_type=RestrictValueType.POWER_OF_TWO)
        self.relu3 = qnn.QuantReLU(quant_type=QuantType.INT, bit_width=4, max_val=6,
                                   return_quant_tensor=True,
                                   restrict_scaling_type=RestrictValueType.POWER_OF_TWO)
        self.fc2 = qnn.QuantLinear(200, 84, bias=True,
                                   weight_quant_type=QuantType.INT, weight_bit_width=6,
                                   bias_quant_type=QuantType.INT, bias_bit_width=10,
                                   compute_output_scale=True, compute_output_bit_width=True,
                                   weight_restrict_scaling_type=RestrictValueType.POWER_OF_TWO)
        self.relu4 = qnn.QuantReLU(quant_type=QuantType.INT, bit_width=4, max_val=6,
                                   return_quant_tensor=True,
                                   restrict_scaling_type=RestrictValueType.POWER_OF_TWO)
        self.fc3 = qnn.QuantLinear(84, 10, bias=False,
                                   weight_quant_type=QuantType.INT, weight_bit_width=6,
                                   bias_quant_type=QuantType.INT, bias_bit_width=10,
                                   compute_output_scale=True, compute_output_bit_width=True,
                                   weight_restrict_scaling_type=RestrictValueType.POWER_OF_TWO)

    def forward(self, x):
        out = self.conv1(x)
        relu1_tensor, relu1_scale, relu1_bit = self.relu1(out)
        out = F.max_pool2d(relu1_tensor, 2)
        out = self.conv2(pack_quant_tensor(out, relu1_scale, relu1_bit))
        relu2_tensor, relu2_scale, relu2_bit = self.relu2(out)
        out = F.max_pool2d(relu2_tensor, 2)
        out = out.view(out.size(0), -1)
        out = self.fc1(pack_quant_tensor(out, relu2_scale, relu2_bit))
        relu3_tensor, relu3_scale, relu3_bit = self.relu3(out)
        out = self.fc2(pack_quant_tensor(relu3_tensor, relu3_scale, relu3_bit))
        relu4_tensor, relu4_scale, relu4_bit = self.relu4(out)
        out = self.fc3(pack_quant_tensor(relu4_tensor, relu4_scale, relu4_bit))
        return out

Fix affine stats

Affine stats should be applied after e.g. taking the log, not before, and absolute values should be taken separately over the two coefficients.

Question about training

Hi,

I am trying to train a new model using Brevitas; however, I can't find any example or documentation on how to actually train a network.
Using the standard PyTorch method yields errors about size mismatches.

RuntimeError: size mismatch, m1: [96 x 32], m2: [3072 x 2048] at /Users/distiller/project/conda/conda-bld/pytorch_1579022061893/work/aten/src/TH/generic/THTensorMath.cpp:136

class QuantNet(Module):
    def __init__(self):
        super(QuantNet, self).__init__()
        self.fc1   = qnn.QuantLinear(3072, 2048, bias=False, 
                                     weight_quant_type=QuantType.INT, 
                                     weight_bit_width=8)
        self.relu1 = qnn.QuantReLU(quant_type=QuantType.INT, bit_width=2, max_val=6)
        
        self.fc2   = qnn.QuantLinear(2048, 1024, bias=False, 
                                     weight_quant_type=QuantType.INT, 
                                     weight_bit_width=2)
        self.relu2 = qnn.QuantReLU(quant_type=QuantType.INT, bit_width=4, max_val=6)
        
        self.fc3   = qnn.QuantLinear(1024, 512, bias=False, 
                                     weight_quant_type=QuantType.INT, 
                                     weight_bit_width=2)
        self.relu3 = qnn.QuantReLU(quant_type=QuantType.INT, bit_width=4, max_val=6)
        
        self.fc4   = qnn.QuantLinear(512, 256, bias=False, 
                                     weight_quant_type=QuantType.INT, 
                                     weight_bit_width=2)
        self.relu4 = qnn.QuantReLU(quant_type=QuantType.INT, bit_width=4, max_val=6)
        
        self.fc5   = qnn.QuantLinear(256, 10, bias=False, 
                                     weight_quant_type=QuantType.INT, 
                                     weight_bit_width=8)

    def forward(self, x):
        out = x.view(32*32*3*1)
        out = self.relu1(self.fc1(x))
        out = self.relu2(self.fc2(out))
        out = self.relu3(self.fc3(out))
        out = self.relu4(self.fc4(out))
        out = self.fc5(out)
        return out
transform = transforms.Compose([transforms.ToTensor()])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=1,
                                          shuffle=True, num_workers=8)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=1,
                                         shuffle=False, num_workers=8)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        
        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

MAC clamping

A major feature currently missing from Brevitas is the ability to clamp an accumulator to a specified bit width during an operation such as conv2d or linear.
This is an issue to track progress on that. The current plan is to leverage CUTLASS to do it for anything that maps to GEMM, meaning linear, 1x1 conv, and im2col-based conv.
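
As a rough conceptual sketch (plain PyTorch, not the planned CUTLASS-based implementation), clamping an integer accumulator to a target bit width after a matmul could look like:

import torch

def clamp_accumulator(acc: torch.Tensor, bit_width: int, signed: bool = True) -> torch.Tensor:
    # Saturate accumulator values to the representable range of `bit_width` bits.
    if signed:
        min_val, max_val = -(2 ** (bit_width - 1)), 2 ** (bit_width - 1) - 1
    else:
        min_val, max_val = 0, 2 ** bit_width - 1
    return acc.clamp(min_val, max_val)

# e.g. 8-bit inputs and weights accumulated into a wide integer, then clamped to 16 bits
x_int = torch.randint(-128, 128, (4, 64))
w_int = torch.randint(-128, 128, (64, 8))
acc = x_int @ w_int
acc16 = clamp_accumulator(acc, bit_width=16)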

Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults

line 111, in __init__
weight_quant_type=QuantType.INT, weight_bit_width=8)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/brevitas/nn/quant_conv.py", line 179, in __init__
override_pretrained_bit_width=weight_override_pretrained_bit_width)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/brevitas/proxy/parameter_quant.py", line 357, in __init__
self.re_init_tensor_quant()
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/brevitas/proxy/parameter_quant.py", line 360, in re_init_tensor_quant
self.tensor_quant = self.lazy_tensor_quant_init(tracked_parameter_list=self._tracked_parameter_list)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/brevitas/proxy/parameter_quant.py", line 146, in _weight_quant_init_impl
affine=scaling_impl_type == ScalingImplType.AFFINE_STATS)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/__init__.py", line 1386, in init_then_register
original_init(self, *args, **kwargs)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/brevitas/core/scaling.py", line 246, in __init__
stats_output_shape=stats_output_shape)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/__init__.py", line 1386, in init_then_register
original_init(self, *args, **kwargs)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/brevitas/core/scaling.py", line 171, in __init__
self.restrict_scaling = RestrictValue(restrict_scaling_type, FloatToIntImplType.CEIL, scaling_min_val)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/__init__.py", line 1386, in init_then_register
original_init(self, *args, **kwargs)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/brevitas/core/restrict_val.py", line 82, in __init__
float_to_int_impl = CeilSte()
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/__init__.py", line 1391, in init_then_register
_create_methods_from_stubs(self, methods)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/__init__.py", line 1347, in _create_methods_from_stubs
self._c._create_methods(self, defs, rcbs, defaults)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/__init__.py", line 1010, in _try_compile_fn
return _compile_function(fn, qualified_name=qualified_name, _frames_up=1, _rcb=rcb)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/__init__.py", line 1077, in _compile_function
script_fn = torch._C._jit_script_compile(qualified_name, ast, _rcb, get_default_args(fn))
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/__init__.py", line 1058, in _compile_and_register_class
ast = get_jit_class_def(obj, obj.__name__)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/frontend.py", line 143, in get_jit_class_def
self_name=self_name) for method in methods]
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/frontend.py", line 143, in <listcomp>
self_name=self_name) for method in methods]
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/frontend.py", line 166, in get_jit_def
return build_def(ctx, py_ast.body[0], type_line, self_name)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/frontend.py", line 195, in build_def
param_list = build_param_list(ctx, py_def.args, self_name)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/jit/frontend.py", line 215, in build_param_list
raise ValueError(_vararg_kwarg_err)
ValueError: Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults

Extend QuantTensor

This is an issue to track extensions to the implementation of QuantTensor.

Attributes that should be added:

  • signed/unsigned bool
  • quantized/dequantized bool
  • zero point (which so far is always at 0; see the sketch below)

Methods that should be added:

  • convert to a pytorch quantized tensor
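
Regarding the zero-point attribute listed above, a minimal sketch of affine quantization with a non-zero zero-point (generic quantization math, not the QuantTensor API):

import torch

def quantize(x: torch.Tensor, scale: float, zero_point: int, bit_width: int = 8) -> torch.Tensor:
    # Unsigned affine quantization: q = clamp(round(x / scale) + zero_point, 0, 2**bit_width - 1)
    q = torch.round(x / scale) + zero_point
    return q.clamp(0, 2 ** bit_width - 1)

def dequantize(q: torch.Tensor, scale: float, zero_point: int) -> torch.Tensor:
    return (q - zero_point) * scale

x = torch.tensor([-0.1, 0.0, 0.25, 0.6])
q = quantize(x, scale=0.005, zero_point=20)
x_hat = dequantize(q, scale=0.005, zero_point=20)   # ≈ x, up to quantization error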

Extract the bias scale and activation scale?

Hi, thanks for the great work. I have a few questions.
How can I

  1. extract the activation scale, e.g. from QuantReLU? QuantReLU.quant_act_scale does not seem to work.
  2. extract the bias scale in QuantConv2d?

Question about weight_scaling_impl_type

Hi, thanks for the awesome work!
Could you give a short explanation of the various strategies in weight_scaling_impl_type?
For example, "AFFINE_STATS", "STATS", "PARAMETER_FROM_STATS".

Simple example?

I am trying to use this code but have no idea how to start. Can you provide some simple examples?

Some questions about how to do inference on real INT8 weights

Hi there,
First, I have trained a net and I know that I can access the real INT8 weights of the model by using the property int_weight and quant_weight_scale to extract the integer weights and their scale factor; activations have the method quant_act_scale() to extract the scale factor.
However, I have no idea how to use these to do real INT8 inference.
Can anyone give me a toy example of how to use those parameters for inference?
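
A toy, framework-agnostic sketch of integer inference for a single linear layer (assuming per-tensor scales and ignoring bias and zero-points for simplicity; this is not the Brevitas export path, and the numbers are made up):

import torch

w_int = torch.tensor([[3, -7], [5, 2]], dtype=torch.int32)   # from int_weight
w_scale = 0.02                                               # from quant_weight_scale
x_int = torch.tensor([10, -4], dtype=torch.int32)            # quantized input codes
x_scale = 0.05                                               # from quant_act_scale()

# Integer MAC: everything stays integer until the final rescale.
acc_int = w_int @ x_int                                      # wide integer accumulator
y = acc_int.float() * (w_scale * x_scale)                    # dequantize with the product of scales

# Sanity check against the floating-point reference computation.
y_ref = (w_int.float() * w_scale) @ (x_int.float() * x_scale)
assert torch.allclose(y, y_ref)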

Pretrained model in example folder

Hi,
Thanks for the great work. I have a few questions:

  1. Are the pretrained models in the examples folder (MobileNet, VGG) fine-tuned from a floating-point model or trained from scratch? I also wonder how long fine-tuning/training takes. Example training code would be appreciated.

Load a pre-trained model from the original MobileNet

Hi Alessandro,

Thank you so much for the amazing work. I have changed the PyTorch official MobileNet-v2 code to a quantized version by referring to your MobileNet-v1 code. Is there any possibility that I could load the pre-trained model PyTorch officially released (the original float MobileNet-v2) into my quantized MobileNet-v2 so as to fine-tune from it?

Thank you so much!

Best,
Tracy

Quantize output of conv layer

Hello, for an FPGA implementation we need the output of the convolution layer to be fixed point.

I noticed that at lines 115 and 116 of "brevitas/nn/quant_conv.py" there are two parameters of QuantConv2d: compute_output_scale and compute_output_bit_width.
Can I tune these two parameters to quantize the output of the conv layer?

End to end tests

End to end tests should be performed on the available pretrained models. This has a dependency on the pretrained models being installable (#76), as well as on the test infrastructure being parametrized (#77).
Test vectors should be generated based on a few examples. Weights and intermediate activations (both quantized and dequantized) and scale factors should be checked against. Values can be captured through hooks.
Test vectors should be stored as a release, tagged depending on the pretrained model, and possibly cached efficiently through Github Actions cache.
Appropriate tasks should be set up through Nox.

Standard ONNX export

Track discussions around ONNX compliant export. Nothing is planned at the moment, so no assignees.

Recurrent layers

This is an issue to track discussions around quantized RNN/GRU/LSTM implementation.

Question about output

Hello
The weights have been quantized to INT, but why is the output of each layer still float? Shouldn't it be INT?
I look forward to your reply.

LSTM with brevitas

Hi,
I am currently trying to implement an LSTM on an FPGA using the previous version of FINN.
Having seen that the new version of FINN will use networks trained with Brevitas, I was wondering whether Brevitas will support LSTM networks in the future.
Thank you,

Phillip Geier

Question about quantization

Hello,
      please let me ask the following questions:
      I quantized a network and got the file (.pth).
Question 1:
      The weights saved in this file are still float, not INT. When I access the property (QuantConv2d.int_weight), I can extract the integer representation of the weights. At inference time, the first step is then to convert the weight data type (float to INT). Is my understanding correct?
Question 2:
      Can you tell us about your quantization approach?

Add max_value to RestrictValue

Right now there is no constraint on the possible max value that the scaling factor may assume.
It may be useful to add a max_value implementation similar to the current min_value one.

merge_bn_in() in QuantConv2d

Hi,
I found that there is a merge_bn_in() in QuantConv2d. How can I use it for merging BN factors into QuantConv2d?
Do I just use nn.BatchNorm2d after qnn.QuantConv2d during training and then merge them in the testing phase? Also, does the scale factor in QuantConv2d change after merging BN?

Template tests.yaml

The current tests.yaml should be split into multiple actions, to test e.g. the develop installation separately from the test suite, with separate matrix parametrization. Given that a lot of the logic around cache creation would be shared, a Python-driven flow for templating it should be set up. noxfile.py has to be able to read the template in order to match the matrix configuration that would be generated for the different scenarios.

How should I give max_val in QuantReLU

Hi,
Thanks for sharing the awesome code. I have a question about the max_val parameter in QuantReLU. In the LeNet example and in examples/imagenet_classification/models/common.py, it is set to 6. If I want to construct my own net on images, or change bit_width from 8 to 4, do I need to change this parameter?

Thanks,

Implement Q1.7 format

Reposted as a new issue. From the author:
Hi, I want an 8-bit fixed-point implementation of weights and activations with 7 bits for the fractional part. What value should I pass for weight_restrict_scaling_type?

Originally posted by @MinahilRaza in #4 (comment)
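
As a quick numeric illustration of Q1.7 (independent of which Brevitas option selects it): the scale factor is fixed at 2**-7, so an INT8 code maps to a value in roughly [-1, 1):

scale = 2.0 ** -7        # Q1.7: 1 sign/integer bit, 7 fractional bits
code = 77                # an INT8 code in [-128, 127]
value = code * scale     # 0.6015625
# Representable range: [-128 * 2**-7, 127 * 2**-7] = [-1.0, 0.9921875]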

Move enums out of core

Start pushing enums up by moving any dependency on them out of the core package.

DecoupledIntQuant

This is an issue to track progress on refactoring weight scaling into normalization/pre-scaling and post-scaling.

avg_pool causes an error

out, scale, bit_width = self.final_pool(out)

File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in call
result = self.forward(*input, **kwargs)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/brevitas/nn/quant_avg_pool.py", line 92, in forward
output_bit_width = self.max_output_bit_width(input_bit_width)
File "/root/anaconda3/envs/python367/lib/python3.6/site-packages/brevitas/nn/quant_avg_pool.py", line 99, in max_output_bit_width
max_uint_input = max_uint(bit_width=input_bit_width, narrow_range=False)
RuntimeError: max_uint() Expected a value of type 'Tensor' for argument 'bit_width' but instead found type 'NoneType'.
Position: 1
Value: None
Declaration: max_uint(bool narrow_range, Tensor bit_width) -> (Tensor)

PyTorch 1.2
Python 3.6
when trying this code:
self.final_pool = make_quant_avg_pool(kernel_size=7,
                                      stride=1,
                                      signed=False,
                                      bit_width=4)

Migration of FINN-Brevitas tests into Brevitas

This is an issue to track discussion around migrating Brevitas-based tests in FINN into Brevitas. It should be on Brevitas to check that nothing breaks during the FINN export flow. Additionally, end-to-end inference tests on LFC and CNV should be moved together with the rest of the end-to-end tests (#83).
The main issue is how to set up the FINN test environment within Github Actions.

Tagging: @maltanar

BN quantization

Hi, thanks for the great work.
For the BN layer, is it possible to merge its parameters into the conv layer at inference time?
I noticed that there is a BatchNorm2dToQuantScaleBias layer in this repo; how can I use it for BN quantization?

What's wrong?

C:\Users\86188\brevitas>pip install -e .
Obtaining file:///C:/Users/86188/brevitas
Requirement already satisfied: torch>=1.1.0 in d:\programdata\anaconda346\lib\site-packages (from Brevitas==0.2.0a0) (1.3.0)
Requirement already satisfied: docrep in d:\programdata\anaconda346\lib\site-packages (from Brevitas==0.2.0a0) (0.2.7)
Requirement already satisfied: scipy in d:\programdata\anaconda346\lib\site-packages (from Brevitas==0.2.0a0) (1.0.0)
Requirement already satisfied: packaging in d:\programdata\anaconda346\lib\site-packages (from Brevitas==0.2.0a0) (16.8)
Requirement already satisfied: numpy in d:\programdata\anaconda346\lib\site-packages (from torch>=1.1.0->Brevitas==0.2.0a0) (1.14.0)
Requirement already satisfied: six in d:\programdata\anaconda346\lib\site-packages (from docrep->Brevitas==0.2.0a0) (1.11.0)
Requirement already satisfied: pyparsing in d:\programdata\anaconda346\lib\site-packages (from packaging->Brevitas==0.2.0a0) (2.2.0)
Installing collected packages: Brevitas
Running setup.py develop for Brevitas
ERROR: Command errored out with exit status 1:
command: 'd:\programdata\anaconda346\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\Users\86188\brevitas\setup.py'"'"'; __file__='"'"'C:\Users\86188\brevitas\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' develop --no-deps
cwd: C:\Users\86188\brevitas
Complete output (18 lines):
INFO: Could not find files for the given pattern(s).
running develop
running egg_info
writing Brevitas.egg-info\PKG-INFO
writing dependency_links to Brevitas.egg-info\dependency_links.txt
writing requirements to Brevitas.egg-info\requires.txt
writing top-level names to Brevitas.egg-info\top_level.txt
reading manifest file 'Brevitas.egg-info\SOURCES.txt'
writing manifest file 'Brevitas.egg-info\SOURCES.txt'
running build_ext
building 'brevitas._C' extension
d:\programdata\anaconda346
