
QKeras

github.com/google/qkeras

Introduction

QKeras is a quantization extension to Keras that provides drop-in replacements for some Keras layers, especially the ones that create parameters and activation layers and perform arithmetic operations, so that we can quickly create a deep quantized version of a Keras network.

According to the TensorFlow documentation, Keras is a high-level API to build and train deep learning models. It's used for fast prototyping, advanced research, and production, with three key advantages:

  • User friendly

Keras has a simple, consistent interface optimized for common use cases. It provides clear and actionable feedback for user errors.

  • Modular and composable

Keras models are made by connecting configurable building blocks together, with few restrictions.

  • Easy to extend

Write custom building blocks to express new ideas for research. Create new layers, loss functions, and develop state-of-the-art models.

QKeras is designed to extend the functionality of Keras following Keras' design principles, i.e. being user friendly, modular and extensible, while remaining "minimally intrusive" to native Keras functionality.

In order to successfully quantize a model, users need to replace the variable-creating layers (Dense, Conv2D, etc.) with their quantized counterparts (QDense, QConv2D, etc.), and any layers that perform math operations need to be quantized afterwards.

Publications

  • Claudionor N. Coelho Jr, Aki Kuusela, Shan Li, Hao Zhuang, Jennifer Ngadiuba, Thea Klaeboe Aarrestad, Vladimir Loncar, Maurizio Pierini, Adrian Alan Pol, Sioni Summers, "Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors", Nature Machine Intelligence (2021), https://www.nature.com/articles/s42256-021-00356-5

  • Claudionor N. Coelho Jr., Aki Kuusela, Hao Zhuang, Thea Aarrestad, Vladimir Loncar, Jennifer Ngadiuba, Maurizio Pierini, Sioni Summers, "Ultra Low-latency, Low-area Inference Accelerators using Heterogeneous Deep Quantization with QKeras and hls4ml", http://arxiv.org/abs/2006.10159v1

  • Erwei Wang, James J. Davis, Daniele Moro, Piotr Zielinski, Claudionor Coelho, Satrajit Chatterjee, Peter Y. K. Cheung, George A. Constantinides, "Enabling Binary Neural Network Training on the Edge", https://arxiv.org/abs/2102.04270

Layers Implemented in QKeras

  • QDense

  • QConv1D

  • QConv2D

  • QDepthwiseConv2D

  • QSeparableConv1D (depthwise + pointwise convolution, without quantizing the activation values after the depthwise step)

  • QSeparableConv2D (depthwise + pointwise convolution, without quantizing the activation values after the depthwise step)

  • QMobileNetSeparableConv2D (extended from MobileNet SeparableConv2D implementation, quantizes the activation values after the depthwise step)

  • QConv2DTranspose

  • QActivation

  • QAdaptiveActivation

  • QAveragePooling2D (in fact, an AveragePooling2D stacked with a QActivation layer for quantization of the result)

  • QBatchNormalization (still in its experimental stage, as we have not seen the need to use it yet due to the normalization and regularization effects of stochastic activation functions)

  • QOctaveConv2D

  • QSimpleRNN, QSimpleRNNCell

  • QLSTM, QLSTMCell

  • QGRU, QGRUCell

  • QBidirectional

It is worth noting that not all functionality is currently safe to use with other high-level operations, such as layer wrappers. For example, Bidirectional layer wrappers are used with RNNs. If this is required, we encourage users, as a workaround, to specify quantization functions as strings rather than as function instances, though we may change that implementation in the future (see the sketch below).
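A minimal sketch of this workaround, assuming the argument names kernel_quantizer, recurrent_quantizer and bias_quantizer for the QLSTM layer listed above (these names are our assumption, not a confirmed signature):

import tensorflow as tf
from tensorflow.keras.layers import Input, Bidirectional
from tensorflow.keras.models import Model
from qkeras import QLSTM

# Quantizers are passed as strings so the wrapper can serialize the config.
x = x_in = Input((16, 8))
x = Bidirectional(QLSTM(32,
        kernel_quantizer="quantized_bits(4,0,1)",
        recurrent_quantizer="quantized_bits(4,0,1)",
        bias_quantizer="quantized_bits(4,0,1)"))(x)
model = Model(x_in, x)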

A first attempt to create a safe mechanism in QKeras is the adoption of QActivation, a wrapper that encapsulates the activation functions so that we can save and restore the network architecture, and duplicate it using the Keras interface, but this interface has not been fully tested yet.
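For instance, a hedged sketch of that save/restore path (load_qmodel is assumed to live in qkeras.utils, as referenced in the issues further below; `model` is any quantized model, such as the one built in the Example section):

from qkeras.utils import load_qmodel

# String quantizers such as "quantized_relu(3)" serialize with the model
# config, so the architecture can be saved and restored.
model.save("qmodel.h5")
restored_model = load_qmodel("qmodel.h5")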

Activation Layers Implemented in QKeras

  • smooth_sigmoid(x)

  • hard_sigmoid(x)

  • binary_sigmoid(x)

  • binary_tanh(x)

  • smooth_tanh(x)

  • hard_tanh(x)

  • quantized_bits(bits=8, integer=0, symmetric=0, keep_negative=1)(x)

  • bernoulli(alpha=1.0)(x)

  • stochastic_ternary(alpha=1.0, threshold=0.33)(x)

  • ternary(alpha=1.0, threshold=0.33)(x)

  • stochastic_binary(alpha=1.0)(x)

  • binary(alpha=1.0)(x)

  • quantized_relu(bits=8, integer=0, use_sigmoid=0, negative_slope=0.0)(x)

  • quantized_ulaw(bits=8, integer=0, symmetric=0, u=255.0)(x)

  • quantized_tanh(bits=8, integer=0, symmetric=0)(x)

  • quantized_po2(bits=8, max_value=-1)(x)

  • quantized_relu_po2(bits=8, max_value=-1)(x)

The stochastic_* functions and bernoulli, as well as quantized_relu and quantized_tanh, rely on stochastic versions of the activation functions. They draw a random number with uniform distribution from _hard_sigmoid of the input x, and the result is based on the expected value of the activation function. Please refer to the papers, or to the documentation in qkeras/qlayers.py, if you want to understand the underlying theory.

The parameters "bits" specify the number of bits for the quantization, and "integer" specifies how many bits of "bits" are to the left of the decimal point. Finally, our experience in training networks with QSeparableConv2D, both quantized_bits and quantized_tanh that generates values between [-1, 1), required symmetric versions of the range in order to properly converge and eliminate the bias.

Every time we use a quantization for weights and biases that can generate numbers outside the range [-1.0, 1.0], we need to adjust the corresponding *_range parameter. For example, if we have quantized_bits(bits=6, integer=2) as the weight quantizer of a layer, we need to set the weight range to 2**2, which is equivalent to Catapult HLS ac_fixed<6, 3, true>. Similarly, for quantization functions that accept an alpha parameter, we need to specify the range of alpha, and for po2-type quantizers, we need to specify the range of max_value.
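As a quick sanity check of these semantics, the sketch below (our addition, not from the original text) quantizes a few values with quantized_bits(bits=6, integer=2): with one sign bit, two integer bits and three fractional bits, values snap to multiples of 2**-3 and clip into [-4.0, 3.875].

import numpy as np
from qkeras import quantized_bits

q = quantized_bits(bits=6, integer=2, keep_negative=1)
x = np.array([-4.2, -1.3, 0.01, 3.9], dtype=np.float32)
# Expected: [-4.0, -1.25, 0.0, 3.875]
print(np.array(q(x)))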

Example

An example of a very simple network is given below in Keras.

from keras.layers import *
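# `shape` (the input shape) and NB_CLASSES (the number of classes) are
# assumed to be defined elsewhere, e.g. shape = (28, 28, 1), NB_CLASSES = 10.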

x = x_in = Input(shape)
x = Conv2D(18, (3, 3), name="first_conv2d")(x)
x = Activation("relu")(x)
x = SeparableConv2D(32, (3, 3))(x)
x = Activation("relu")(x)
x = Flatten()(x)
x = Dense(NB_CLASSES)(x)
x = Activation("softmax")(x)

You can easily quantize this network as follows:

from keras.layers import *
from qkeras import *

x = x_in = Input(shape)
x = QConv2D(18, (3, 3),
        kernel_quantizer="stochastic_ternary",
        bias_quantizer="ternary", name="first_conv2d")(x)
x = QActivation("quantized_relu(3)")(x)
x = QSeparableConv2D(32, (3, 3),
        depthwise_quantizer=quantized_bits(4, 0, 1),
        pointwise_quantizer=quantized_bits(3, 0, 1),
        bias_quantizer=quantized_bits(3),
        depthwise_activation=quantized_tanh(6, 2, 1))(x)
x = QActivation("quantized_relu(3)")(x)
x = Flatten()(x)
x = QDense(NB_CLASSES,
        kernel_quantizer=quantized_bits(3),
        bias_quantizer=quantized_bits(3))(x)
x = QActivation("quantized_bits(20, 5)")(x)
x = Activation("softmax")(x)

The last QActivation is advisable if you want to compare results later on. Please find more examples under the examples directory.
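Once the quantized model is built, a quick way to inspect it is print_qstats, which prints per-layer operation counts and types (a sketch; `x_in` and `x` are the tensors from the quantized example above):

from tensorflow.keras.models import Model
from qkeras import print_qstats

model = Model(inputs=[x_in], outputs=[x])
model.summary()
print_qstats(model)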

QTools

The purpose of QTools is to assist with the hardware implementation of quantized models and with estimating model energy consumption. QTools has two functions: data type map generation and energy consumption estimation.

  • Data Type Map Generation: QTools automatically generates the data type map for the weights, biases, multipliers, adders, etc. of each layer. The data type map includes operation type, variable size, quantizer type and bits, etc. The inputs to QTools are:
  1. a given quantized model;
  2. a list of input quantizers for the model.
  The output of QTools is a json file that lists the data type map of each layer (stored in qtools_instance._output_dict). Output methods include qtools_stats_to_json, which writes the data type map to a json file, and qtools_stats_print, which prints the data type map.
  • Energy Consumption Estimation: The other function of QTools is to estimate a model's energy consumption in picojoules (pJ). It provides a tool for QKeras users to quickly estimate the energy consumed by memory accesses and MAC operations in a quantized model derived from QKeras, especially when comparing the power consumption of two models running on the same device.

As with any high-level model, it should be used with caution when attempting to estimate the absolute energy consumption of a model for a given technology, or when attempting to compare different technologies.

This tool also provides a metric for model tuning that considers both accuracy and model energy consumption. The energy cost provided by this tool can be integrated into a total loss function that combines energy cost and accuracy.

  • Energy Model: The most widely referenced work on energy consumption in the literature is Horowitz, M., "1.1 Computing's energy problem (and what we can do about it)", IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014.

In this work, the author estimated the energy consumption of accelerators for a 45 nm process, and the data points he presented have since been used whenever someone wants to compare accelerator performance. QTools' energy estimates for a 45 nm process are based on the data published in this work.

  • Examples: An example of how to generate a data type map can be found in qkeras/qtools/examples/example_generate_json.py. An example of how to generate an energy consumption estimate can be found in qkeras/qtools/examples/example_get_energy.py.
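The sketch below is modeled loosely on those example scripts; the argument names and values shown are assumptions taken from the examples directory and may differ between versions:

from qkeras import quantized_bits
from qkeras.qtools import run_qtools
from qkeras.qtools import settings as qtools_settings

# `model` is assumed to be a quantized QKeras model built elsewhere.
q = run_qtools.QTools(
    model,
    process="horowitz",                        # 45 nm energy model
    source_quantizers=[quantized_bits(8, 0, 1)],
    is_inference=False,
    weights_path=None,
    keras_quantizer="fp32",
    keras_accumulator="fp32",
    for_reference=False)

q.qtools_stats_print()                         # print the data type map

energy_dict = q.pe(                            # per-layer energy estimate
    weights_on_memory="sram",
    activations_on_memory="sram",
    min_sram_size=8 * 16 * 1024 * 1024,
    rd_wr_on_io=False)
total_energy = q.extract_energy_sum(
    qtools_settings.cfg.include_energy, energy_dict)
print("total energy (pJ):", total_energy)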

AutoQKeras

AutoQKeras allows the automatic quantization and rebalancing of deep neural networks by treating the quantization and rebalancing of an existing deep neural network as a hyperparameter search in Keras Tuner, using random search, Hyperband or Gaussian processes.

In order to contain the explosion of hyperparameters, users can group tasks by patterns and perform distributed training using available resources.

Extensive documentation is present in notebook/AutoQKeras.ipynb.
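A minimal sketch of the flow (hedged: the run_config keys mirror the longer AutoQKeras example quoted in the issues further below, and only a subset is shown; `model`, `x_train` and `y_train` are assumed to exist):

import tempfile
from qkeras.autoqkeras import AutoQKeras

run_config = {
    "output_dir": tempfile.mkdtemp(),
    "mode": "random",            # or "bayesian" / "hyperband"
    "seed": 42,
    "max_trials": 10,
}

autoqk = AutoQKeras(model, **run_config)
autoqk.fit(x_train, y_train, validation_split=0.1, epochs=10)
qmodel = autoqk.get_best_model()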

Related Work

QKeras has been implemented based on the work of B. Moons et al., "Minimum Energy Quantized Neural Networks", Asilomar Conference on Signals, Systems and Computers, 2017, and S. Zhou et al., "DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients", but the framework should be easily extensible. The original code from QNN can be found below.

https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow

QKeras extends QNN by providing a richer set of layers (including SeparableConv2D, DepthwiseConv2D, ternary and stochastic ternary quantizations), as well as some functions to aid in estimating accumulator sizes and in converting non-quantized networks to quantized ones. Finally, our main goal is ease of use, so we attempt to make QKeras layers true drop-in replacements for Keras, so that users can easily exchange non-quantized layers for quantized ones.

Acknowledgements

Portions of QKeras were derived from QNN.

https://github.com/BertMoons/QuantizedNeuralNetworks-Keras-Tensorflow

Copyright (c) 2017, Bert Moons where it applies

Issues

QBatchNormalization: Progress

First: Thanks a lot for this wonderful package, it is really interesting!

Secondly, I wanted to ask you about the progress on implementing QBatchNormalization. In readme.md, you were stating that:

Finally, QBatchNormalization is still in its experimental stage, as we have not seen the need to use this yet due to the normalization and regularization effects of stochastic activation functions.

Did you implement a beta version of this experimental feature? I know that most of the time there are not many parameters in a BatchNormalization layer; nonetheless, it would be interesting to study the effects of quantizing the BatchNormalization layers of a pretrained model like MobileNetV2, which includes them.

Thanks a lot,
asti205

QDense_batchnorm?

I see you have a qconv2d_batchnorm layer, which folds the weights of the two layers and then quantizes them.

We're bringing support for that to hls4ml, and it should help us save some resources & latency.

I'm wondering, do you plan to add the equivalent combined QDense + BatchNormalization layer to QKeras?

Quantization parameters

Hello, thanks a lot for this QKeras library. I really like it. I have a question related to the quantization parameters.

How can one really observe the difference in the weights and activation values when using these parameters (i.e. kernel_quantizer=quantized_bits(5,0,1), bias_quantizer=quantized_bits(3), quantized_relu(2))?

For example, the function quantized_model_debug(qmodel, X_test) returns some stats, i.e. the max and min values of the activation functions and weights of the model. For the following layer in a model with these parameters:

qx = QDense(8, kernel_quantizer=quantized_bits(3,0,1), bias_quantizer=quantized_bits(3), name="qdense2")(qx)
qx = QActivation("quantized_relu(2)", name="act_2")(qx)

Output of the function:
qdense2 : -1.5000 1.7500 ( -0.7500 0.7500) ( 0.0000 0.2500) a( 1.000000 1.000000)
act_2 : 0.0000 0.7500

But when I manually check the layer's weights using model.get_weights(), the maximum and minimum values turn out to be:
np.max(qmodel.get_weights()[2])
np.min(qmodel.get_weights()[2])

max = 0.9998701
min = -1.0

So I would like to know how to really tell what difference these parameters make to the weights and activation outputs.

Thanks a lot

QDepthwiseConv2D can't work

The code to reproduce is below. It is based on mnist.py; the import path may be different in your case.

QConv2D

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""

from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import DepthwiseConv2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
# from lo import *
import numpy as np
from qkeras.qkeras import *
# from qkeras import print_qstats
# from qkeras import QActivation
# from qkeras import QConv2D
# from qkeras import QDense
# from qkeras import quantized_bits
# from qkeras import ternary


np.random.seed(42)
OPTIMIZER = Adam(lr=0.002)
NB_EPOCH = 10
BATCH_SIZE = 32
VERBOSE = 1
NB_CLASSES = 10
N_HIDDEN = 100
VALIDATION_SPLIT = 0.1
RESHAPED = 784


def QConv2DModel(load_weights=False):
  """Construct QConv2DModel."""

  x = x_in = Input((28,28,1), name="input")
  x = QActivation("quantized_relu(2)", name="act_i")(x)

  x = Conv2D(32, (3, 3), strides=(2, 2), name="conv2d_0_m")(x)
  x = BatchNormalization(name="bn0")(x)
  x = QActivation("quantized_relu(2)", name="act0_m")(x)

  x = Conv2D(64, (3, 3), strides=(2, 2), name="conv2d_1_m")(x)
  x = BatchNormalization(name="bn1")(x)
  x = QActivation("quantized_relu(2)", name="act1_m")(x)

  x = Conv2D(64, (3, 3), strides=(2, 2), name="conv2d_2_m")(x)
  x = BatchNormalization(name="bn2")(x)
  x = QActivation("quantized_relu(2)", name="act2_m")(x)

  x = Flatten(name="flatten")(x)

  x = QDense(
      NB_CLASSES,
      kernel_quantizer=quantized_bits(4, 0, 1),
      bias_quantizer=quantized_bits(4, 0, 1),
      name="dense2")(x)
  x = Activation("softmax", name="softmax")(x)

  model = Model(inputs=[x_in], outputs=[x])
  model.summary()
  model.compile(loss="categorical_crossentropy",
                optimizer=OPTIMIZER, metrics=["accuracy"])


  return model



"""Use DenseModel.
Args:
  weights_f: weight file location.
  load_weights: load weights when it is True.
"""
model = QConv2DModel()
batch_size = BATCH_SIZE
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 28, 28, 1)
x_test = x_test.reshape(10000, 28, 28, 1)
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_train /= 256.
x_test /= 256.
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")
y_train = to_categorical(y_train, NB_CLASSES)
y_test = to_categorical(y_test, NB_CLASSES)

model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=NB_EPOCH,
    verbose=VERBOSE,
    validation_split=VALIDATION_SPLIT)

score = model.evaluate(x_test, y_test, verbose=False)
print("Test score:", score[0])
print("Test accuracy:", score[1])

QDepthwiseConv2D

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = ""

from tensorflow.keras.datasets import mnist
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import DepthwiseConv2D
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
# from lo import *
import numpy as np
from qkeras.qkeras import *
# from qkeras import print_qstats
# from qkeras import QActivation
# from qkeras import QConv2D
# from qkeras import QDense
# from qkeras import quantized_bits
# from qkeras import ternary


np.random.seed(42)
OPTIMIZER = Adam(lr=0.002)
NB_EPOCH = 10
BATCH_SIZE = 32
VERBOSE = 1
NB_CLASSES = 10
N_HIDDEN = 100
VALIDATION_SPLIT = 0.1
RESHAPED = 784


def QConv2DModel(load_weights=False):
  """Construct QConv2DModel."""

  x = x_in = Input((28,28,1), name="input")
  x = QActivation("quantized_relu(2)", name="act_i")(x)

  x = QConv2D(32, (3, 3), strides=(1,1), name="conv2d_0_m")(x)
  x = BatchNormalization(name="bn0")(x)
  x = QActivation("quantized_relu(2)", name="act0_m")(x)

  x = QDepthwiseConv2D((3, 3), name="dwconv2d_1_m")(x)
  x = BatchNormalization(name="bn1")(x)
  x = QActivation("quantized_relu(2)", name="act1_m")(x)

  x = QDepthwiseConv2D((3, 3), name="dwconv2d_2_m")(x)
  x = BatchNormalization(name="bn2")(x)
  x = QActivation("quantized_relu(2)", name="act2_m")(x)

  x = Flatten(name="flatten")(x)

  x = QDense(
      NB_CLASSES,
      kernel_quantizer=quantized_bits(4, 0, 1),
      bias_quantizer=quantized_bits(4, 0, 1),
      name="dense2")(x)
  x = Activation("softmax", name="softmax")(x)

  model = Model(inputs=[x_in], outputs=[x])
  model.summary()
  model.compile(loss="categorical_crossentropy",
                optimizer=OPTIMIZER, metrics=["accuracy"])


  return model



"""Use DenseModel.
Args:
  weights_f: weight file location.
  load_weights: load weights when it is True.
"""
model = QConv2DModel()
batch_size = BATCH_SIZE
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 28, 28, 1)
x_test = x_test.reshape(10000, 28, 28, 1)
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
x_train /= 256.
x_test /= 256.
print(x_train.shape[0], "train samples")
print(x_test.shape[0], "test samples")
y_train = to_categorical(y_train, NB_CLASSES)
y_test = to_categorical(y_test, NB_CLASSES)

model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=NB_EPOCH,
    verbose=VERBOSE,
    validation_split=VALIDATION_SPLIT)

score = model.evaluate(x_test, y_test, verbose=False)
print("Test score:", score[0])
print("Test accuracy:", score[1])

The error shown when running the QDepthwiseConv2D version:

Traceback (most recent call last):
  File "mnist.py", line 123, in <module>
    validation_split=VALIDATION_SPLIT)
  File "/home/training/r08943133/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/home/training/r08943133/venv/lib/python3.7/site-packages/tensorflow/python/keras/engine/training.py", line 1098, in fit
    tmp_logs = train_function(iterator)
  File "/home/training/r08943133/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "/home/training/r08943133/venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "/home/training/r08943133/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/training/r08943133/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1848, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/home/training/r08943133/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1924, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/training/r08943133/venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 550, in call
    ctx=ctx)
  File "/home/training/r08943133/venv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input 'pred' passed float expected bool while building NodeDef 'functional_1/dwconv2d_1_m/cond/switch_pred/_2' using Op<name=Switch; signature=data:T, pred:bool -> output_false:T, output_true:T; attr=T:type> [Op:__inference_train_function_2682]

In Tensorflow 1.X, can't use depthwise_conv2d

In TensorFlow 1.x, using depthwise_conv2d raises the error below:

AttributeError: in converted code:
relative to C:\Users\user\Anaconda3\envs\py3_7\lib\site-packages:

qkeras\qlayers.py:1404 call  *
    outputs = tf.keras.backend.depthwise_conv2d(
tensorflow_core\python\util\module_wrapper.py:193 __getattr__
    attr = getattr(self._tfmw_wrapped_module, name)

AttributeError: module 'tensorflow.python.keras.api._v2.keras.backend' has no attribute 'depthwise_conv2d'

So, is there any chance this bug will be fixed in the future?

Missing quantized_sigmoid activation layer in QKeras?

This is more of a doubt than an issue I face with QKeras. I wanted to add a quantized sigmoid layer to my neural network model. I was rather surprised to find no such activation layer mentioned, either in the Readme or in the research paper on QKeras published by Google and CERN, particularly because I see a quantized_tanh and a quantized_relu. It seemed only natural to have an analogous quantized_sigmoid as well.

Is a built-in quantized Sigmoid activation layer truly missing in QKeras? Or am I missing something?

TinyYOLOv3

Is this quantization-aware training suitable for any "higher" architectures like TinyYOLOv3?
I am asking because I tried to make it work and got stuck rather early, not achieving satisfying results.

Is this package on pypi?

Thanks for the awesome work! Is this package available for installation from PyPI via pip (I couldn't find it under qkeras)? Or does one need to use pip to install directly from this GitHub repo?

Vivado HLS ap_fixed emulation

Hi! First, I want to thank the QKeras contributors for developing it; I appreciate it so much.

I'm working on the implementation of a CNN on an FPGA, and the QKeras+hls4ml workflow seems to be a great option.

Since I want to perform QAT that emulates the ap_fixed types of Vivado HLS for representing the weights, biases and internal values, I found that the correct way to do it with QKeras is using the following quantizer:

quantizer = quantized_bits(nbitsTot, nbitsInt-1, keep_negative=True)

Before training, I checked whether it works as I thought by doing some tests, including the following, which returned True:

x = (2**nbitsInt + 0.2)*np.random.random((1000))-(2**(nbitsInt-1)+0.1)
np.array_equal(quantizer(x)%2**-(nbitsTot-nbitsInt), np.zeros((1000)))

Then I set up a toy model and briefly trained it in order to check whether the trained weights are quantized:

y = to_categorical(np.random.randint(1), 10)

model = Sequential()
model.add(Input(1000))
model.add(QDense(10,
    kernel_quantizer=quantizer,
    bias_quantizer=quantizer,
    name='Dense'))

model.compile('adam', 'categorical_crossentropy', metrics=['accuracy'])

model.fit(np.expand_dims(x,0), np.expand_dims(y,0), epochs=100)

qmodel = model_save_quantized_weights(model)

dense_W = qmodel['Dense']['weights'][0]

But I found that:

np.array_equal(dense_W%2**-(nbitsTot-nbitsInt), np.zeros((dense_W.shape)))

False

I'm unsure, but it should be True, shouldn't it?

I checked the number of bits of resolution with the following code, and it yielded 14.

i = 0
while not np.array_equal(dense_W%2**-(nbitsTot-nbitsInt+i), np.zeros((dense_W.shape))):
    i+=1
else:
    print(nbitsTot-nbitsInt+i)

The imports I used are:

import numpy as np
from qkeras import quantized_bits, QDense
from qkeras.utils import model_save_quantized_weights
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical

I'm running:

  • QKeras 0.9.0
  • TF 2.3.0
  • Ubuntu 20.04.2 LTS

QGRU Error when compiling with input features not the same as units

Building a simple GRU model using Keras:

gru = Sequential(GRU(16, input_shape=(2,4)))
gru.compile(loss='mse', optimizer='adam')
gru.summary()

Produces output:

Model: "sequential_332"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
gru_3 (GRU)                  (None, 16)                1056      
=================================================================
Total params: 1,056
Trainable params: 1,056
Non-trainable params: 0

Trying to build the same model with a QGRU layer:

gru = Sequential(QGRU(16, input_shape=(2,4)))
gru.compile(loss='mse', optimizer='adam')
gru.summary()

Produces the following error:

ValueError: in user code:

    /lib/python3.7/site-packages/qkeras/qrecurrent.py:1304 call  *
        inputs, mask=mask, training=training, initial_state=initial_state)
    /lib/python3.7/site-packages/qkeras/qrecurrent.py:1129 call  *
        recurrent_z = K.dot(h_tm1_z, quantized_recurrent[:, :self.units])
    /lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:201 wrapper  **
        return target(*args, **kwargs)
    /lib/python3.7/site-packages/tensorflow/python/keras/backend.py:1898 dot
        out = math_ops.matmul(x, y)
    /lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:201 wrapper
        return target(*args, **kwargs)
    /lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3315 matmul
        a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
    /lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py:5550 mat_mul
        name=name)
    /lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:750 _apply_op_helper
        attrs=attr_protos, op_def=op_def)
    /lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py:592 _create_op_internal
        compute_device)
    /lib/python3.7/site-packages/tensorflow/python/framework/ops.py:3536 _create_op_internal
        op_def=op_def)
    /lib/python3.7/site-packages/tensorflow/python/framework/ops.py:2016 __init__
        control_input_ops, op_def)
    /lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1856 _create_c_op
        raise ValueError(str(e))

    ValueError: Dimensions must be equal, but are 16 and 4 for '{{node qgru_13/qgru_cell_13/MatMul_3}} = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false](qgru_13/zeros, qgru_13/qgru_cell_13/strided_slice_6)' with input shapes: [?,16], [4,16].

Setting units equal to the number of input features produces no error. The issue is only present in QGRU; other recurrent layers (QLSTM, QSimpleRNN) do not have it.

Kernel and Bias Quantizers not quantizing weights and biases

I trained a QKeras model with the kernel and bias quantizers for every QDense layer set to quantized_bits(8,0). After training, I printed out the weights and biases of the QDense layers.

I expect them to have 7 or 8 bits in their binary representation according to the documentation. However, I find all of them to exceed that by a lot.

A sample of the weights that I find upon printing them out:

-1.18689373e-01, 7.44902715e-03, -1.58425614e-01, 7.54895657e-02,
-4.10564430e-03, 2.46057995e-02

Clearly, they do not have 7 or 8 bits in their binary representation.

So, my question is: are the kernel and bias quantizers not functioning properly? Or am I missing something?

quantized_relu is not zeroing negative numbers

I was dumping the feature maps across all layers of a small custom convolutional network and found that the negative values after quantized_relu are quantized, but still remain negative. The setup for quantized_relu is the default one, i.e. quantized_relu(4,0).
I'm using QKeras 0.9.0.

I think that this line of code

p = x * m / m_i

should become:
p = x_u * m / m_i
(and the other lines that use x instead of x_u should be changed as well).

Did someone experience this problem, too?
Thanks

Assert np.mod(np.log2(negative_slope), 1) == 0 AssertionError

I tried this in my code:
x = QActivation("quantized_relu(8,0,1,0.1)")(x)
to make a LeakyReLU with alpha = 0.1, but the result is:

File "qkeras/quantizers.py", line 1264, in init
assert np.mod(np.log2(negative_slope), 1) == 0
AssertionError

This assertion does not allow parameters like 0.1, 0.2, etc. Am I using quantized_relu wrong?

Instantiate quantizers by name

Quantizers in layers can be specified either as an instance, or as a string that gets evaluated. For example, QDense(10, kernel_quantizer=ternary(), ...)(x) or QDense(10, kernel_quantizer='ternary()', ...)(x) works, but not QDense(10, kernel_quantizer='ternary', ...)(x). Making sure that an instance is returned in safe_eval.py seems to fix this, but I am wondering: does this have any downsides?

Int4 Quantization support

Hi!
I want to get started with this library, specifically for int4 quantization support. I have already built a model using Keras and want to quantize it to int4. Does this library have something similar to the TFLite converter?
Also, I used representative datasets with TFLite for int8 quantization; does this library also require something of that sort?

Thanks

Release a new stable version

I see that there are lots of great commits since v0.8. However, people can't enjoy them without a stable release (or they have to deal with some annoying issues). Considering that more and more people will be discovering this project, releasing a new stable version would be nice!

Qkeras and ternary network (-1,0,1)

Is it possible to use QKeras to train a network where the kernels are -1, 0 and 1?

For example, in the layer below, if I set precision to 2 then I get weights -2, -1, 0, 1, 2 and decent accuracy in the network, but if I set precision to 1 all the weights become 0 and I get no accuracy at all.

    sub_model = QConv1D(512, kernel_size, input_shape=(128, 3),      
                kernel_quantizer=quantized_bits(precision,0,symetry),
                bias_quantizer=quantized_bits(precision,0,symetry),
                padding='same')(main_input)

Thanks,

max method of quantized_bits returns incorrect values

Hi,

I've noticed that the max method of quantized_bits doesn't return correct values.
According to the documentation, the max method should return the largest value which can be represented by the quantizer.
Defining an unsigned 8-bit quantization with zero integer bits, the quantizer correctly quantizes the value 1.0 to 0.99609375, which is the largest number that can be represented in this configuration.
But the max method returns 1.0.

Minimal example:

import qkeras as qk

quantizer = qk.quantized_bits(8, 0, 0, False)
x = 1.0
xq = quantizer(x)
q_max = quantizer.max()
print('x: {0}, xq: {1}, q_max: {2}'.format(x, xq, q_max))

Output:

x: 1.0, xq: 0.99609375, q_max: 1.0

Expected Output:

x: 1.0, xq: 0.99609375, q_max: 0.99609375

Quantized operation during inference?

Hello,

I just have a question about the qlayers at inference time. After training, the weights of the layers are quantized. However, are the operations between the different layers quantized accordingly?

For example, you can have low-bit weights while the operations between them are still 32 bits, which gives quite different hardware performance than low-bit weights combined with low-bit operations.

Thanks,

Duc.

Print activation issue

Good morning, I am working with your library to quantize networks.
However, it is not clear to me how to print the activations coming out of the layers. I tried the quantized_model_debug function, but it doesn't seem to output activations.

AutoQKeras define goal

Hello, and thank you for your work on the project!

I'm trying to quantize the encoder stage of a convolutional autoencoder with AutoQKeras. My goal is to reduce the number of bits in those layers. Now I'm not sure how to correctly call AutoQKeras in order to minimize the loss and get the best model. My current code:

import keras
from keras import layers
from keras.layers import *   # Conv2D, Conv2DTranspose, Dense, Input used below
from keras.optimizers import Adam  # used for retraining further below
import numpy as np                 # used for the TPU check below
import tensorflow as tf

from qkeras.autoqkeras import *
from qkeras import *
from qkeras.utils import model_quantize
from qkeras.qtools import run_qtools
from qkeras.qtools import settings as qtools_settings

import tempfile

[...] # load the dataset

IMG_SHAPE = datagen[0][0][0].shape # shape = (96, 96, 1)

z_dim = 16

# Build Autoencoder
autoencoder = keras.Sequential(name='Autoencoder')

autoencoder.add(Input(shape=IMG_SHAPE))

autoencoder.add(Conv2D(32, 4, strides=2, activation='relu', padding='same'))
autoencoder.add(Conv2D(64, 4, strides=2, activation='relu', padding='same'))
autoencoder.add(Conv2D(128, 4, strides=2, activation='relu', padding='same'))
autoencoder.add(Conv2D(256, 4, strides=2, activation='relu', padding='same'))
autoencoder.add(Dense(z_dim))

autoencoder.add(Dense(z_dim))
autoencoder.add(Conv2DTranspose(128, 5, strides=2, activation='relu', padding='same'))
autoencoder.add(Conv2DTranspose(64, 5, strides=2, activation='relu', padding='same'))
autoencoder.add(Conv2DTranspose(32, 6, strides=2, activation='relu', padding='same'))
autoencoder.add(Conv2DTranspose(1, 6, strides=2, activation='sigmoid', padding='same'))

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

physical_devices = tf.config.list_physical_devices()
for d in physical_devices:
    print(d)
    

has_tpus = np.any([d.device_type == "TPU" for d in physical_devices])

if has_tpus:
    TPU_WORKER = 'local'

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
        tpu=TPU_WORKER, job_name='tpu_worker')
    if TPU_WORKER != 'local':
        tf.config.experimental_connect_to_cluster(resolver, protocol='grpc+loas')
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.experimental.TPUStrategy(resolver)
    print('Number of devices: {}'.format(strategy.num_replicas_in_sync))

    cur_strategy = strategy
else:
    cur_strategy = tf.distribute.get_strategy()
    
custom_objects = {}

quantization_config = {
        "kernel": {
                "binary": 1,
                "stochastic_binary": 1,
                "ternary": 2,
                "stochastic_ternary": 2,
                "quantized_bits(2,1,1,alpha=1.0)": 2,
                "quantized_bits(4,0,1,alpha=1.0)": 4,
                "quantized_bits(8,0,1,alpha=1.0)": 8
        },
        "bias": {
                "quantized_bits(4,0,1)": 4,
                "quantized_bits(8,3,1)": 8
        },
        "activation": {
                "binary": 1,
                "ternary": 2,
                "quantized_relu(3,1)": 3,
                "quantized_relu(4,2)": 4,
                "quantized_relu(8,2)": 8,
                "quantized_relu(8,4)": 8,
                "quantized_relu(16,8)": 16
        },
        "linear": {
                "binary": 1,
                "ternary": 2,
                "quantized_bits(4,1)": 4,
                "quantized_bits(8,2)": 8,
                "quantized_bits(16,10)": 16
        }
}

limit = {
    "Dense": [8, 8, 4],
    "Conv2D": [4, 8, 4],
    "DepthwiseConv2D": [4, 8, 4],
    "Activation": [4]
}

run_config = {
  "output_dir": tempfile.mkdtemp(),
  "quantization_config": quantization_config,
  "learning_rate_optimizer": False,
  "transfer_weights": False,
  "mode": "random",
  "seed": 42,
  "limit": limit,
  "tune_filters": "layer",
  "tune_filters_exceptions": "^dense",
  "distribution_strategy": cur_strategy,
  # first layer is input; the last two layers are softmax and flatten
  "layer_indexes": range(0, 4),
  "max_trials": 20
}

print("quantizing layers:", [autoencoder.layers[i].name for i in run_config["layer_indexes"]])

autoqk = AutoQKeras(autoencoder, custom_objects=custom_objects, **run_config)
autoqk.fit(x_data, x_data, validation_data=(x_data_val, x_data_val), batch_size=1, epochs=20)

qmodel = autoqk.get_best_model()

optimizer = Adam(learning_rate=0.02)
qmodel.compile(optimizer=optimizer, loss="binary_crossentropy")
qmodel.fit(x_data, x_data, epochs=20, batch_size=1, validation_data=(x_data_val, x_data_val))

score = qmodel.evaluate(x_data_val, x_data_val, verbose=1)
print(score)

When I train the autoencoder on the dataset without using AutoQKeras, it works fine; however, after quantizing and retrieving the best model, the output of the prediction is all black.

I suspect I need to pass some argument so that AutoQKeras knows it should minimize the loss?

Best Regards,
Lukas

Requirements to implement new quantization classes

I would like to ask what the requirements are for a new quantizer class: is it mandatory to inherit from BaseQuantizer? What should the __call__() method return? What is _set_trainable_parameter() intended for? And what are the differences between using traditional Keras layers with (kernel|bias)_constraint set to a quantization class, versus QKeras layers where you can specify (kernel|bias)_quantizer?
Thank you in advance
Thank you in advance

QSeparableConv2D: 'Keyword argument not understood:', 'depthwise_activation'

For the QSeparableConv2D layer, an error is raised saying that the keyword argument depthwise_activation is not understood. Following the sample code from the readme produces this error. Below is the code that triggered it.

tensorflow==2.5.0
Qkeras==0.9.0
import tensorflow as tf  # tf.keras.Model is used below
from tensorflow.keras.layers import *
from qkeras import *

def getQModel(INPUT_SHAPE,N_CLASSES):
    inputs = Input(INPUT_SHAPE)

    x = QConv2D(18, (3, 3),
            kernel_quantizer="stochastic_ternary",
            bias_quantizer="ternary", name="first_conv2d")(inputs)
    x = QActivation("quantized_relu(3)")(x)
    x = QSeparableConv2D(32, (3, 3),
            depthwise_quantizer=quantized_bits(4, 0, 1),
            pointwise_quantizer=quantized_bits(3, 0, 1),
            bias_quantizer=quantized_bits(3),
            depthwise_activation=quantized_tanh(6, 2, 1))(x)
    x = QActivation("quantized_relu(3)")(x)
    x = Flatten()(x)
    x = QDense(N_CLASSES,
            kernel_quantizer=quantized_bits(3),
            bias_quantizer=quantized_bits(3))(x)
    x = QActivation("quantized_bits(20, 5)")(x)
    yh = Activation("softmax")(x)
    
    model = tf.keras.Model(inputs, yh)
    print(model.summary())
    return model

qmodel = getQModel(INPUT_SHAPE,N_CLASSES)

The error itself:

TypeError                                 Traceback (most recent call last)
/var/folders/1l/1j39gqlj2373rny0fddzmbgw0000gn/T/ipykernel_35211/3806686309.py in <module>
     26     return model
     27 
---> 28 qmodel = getQModel(INPUT_SHAPE,N_CLASSES)

/var/folders/1l/1j39gqlj2373rny0fddzmbgw0000gn/T/ipykernel_35211/3806686309.py in getQModel(INPUT_SHAPE, N_CLASSES)
      9             bias_quantizer="ternary", name="first_conv2d")(inputs)
     10     x = QActivation("quantized_relu(3)")(x)
---> 11     x = QSeparableConv2D(32, (3, 3),
     12             depthwise_quantizer=quantized_bits(4, 0, 1),
     13             pointwise_quantizer=quantized_bits(3, 0, 1),

~/opt/anaconda3/envs/golden/lib/python3.8/site-packages/qkeras/qconvolutional.py in __init__(self, filters, kernel_size, strides, padding, data_format, dilation_rate, depth_multiplier, activation, use_bias, depthwise_initializer, pointwise_initializer, bias_initializer, depthwise_regularizer, pointwise_regularizer, bias_regularizer, activity_regularizer, depthwise_constraint, pointwise_constraint, bias_constraint, depthwise_quantizer, pointwise_quantizer, bias_quantizer, **kwargs)
    766       activation = get_quantizer(activation)
    767 
--> 768     super(QSeparableConv2D, self).__init__(
    769         filters=filters,
    770         kernel_size=kernel_size,

~/opt/anaconda3/envs/golden/lib/python3.8/site-packages/tensorflow/python/keras/layers/convolutional.py in __init__(self, filters, kernel_size, strides, padding, data_format, dilation_rate, depth_multiplier, activation, use_bias, depthwise_initializer, pointwise_initializer, bias_initializer, depthwise_regularizer, pointwise_regularizer, bias_regularizer, activity_regularizer, depthwise_constraint, pointwise_constraint, bias_constraint, **kwargs)
   2205                bias_constraint=None,
   2206                **kwargs):
-> 2207     super(SeparableConv2D, self).__init__(
   2208         rank=2,
   2209         filters=filters,

~/opt/anaconda3/envs/golden/lib/python3.8/site-packages/tensorflow/python/keras/layers/convolutional.py in __init__(self, rank, filters, kernel_size, strides, padding, data_format, dilation_rate, depth_multiplier, activation, use_bias, depthwise_initializer, pointwise_initializer, bias_initializer, depthwise_regularizer, pointwise_regularizer, bias_regularizer, activity_regularizer, depthwise_constraint, pointwise_constraint, bias_constraint, trainable, name, **kwargs)
   1784                name=None,
   1785                **kwargs):
-> 1786     super(SeparableConv, self).__init__(
   1787         rank=rank,
   1788         filters=filters,

~/opt/anaconda3/envs/golden/lib/python3.8/site-packages/tensorflow/python/keras/layers/convolutional.py in __init__(self, rank, filters, kernel_size, strides, padding, data_format, dilation_rate, groups, activation, use_bias, kernel_initializer, bias_initializer, kernel_regularizer, bias_regularizer, activity_regularizer, kernel_constraint, bias_constraint, trainable, name, conv_op, **kwargs)
    127                conv_op=None,
    128                **kwargs):
--> 129     super(Conv, self).__init__(
    130         trainable=trainable,
    131         name=name,

~/opt/anaconda3/envs/golden/lib/python3.8/site-packages/tensorflow/python/training/tracking/base.py in _method_wrapper(self, *args, **kwargs)
    520     self._self_setattr_tracking = False  # pylint: disable=protected-access
    521     try:
--> 522       result = method(self, *args, **kwargs)
    523     finally:
    524       self._self_setattr_tracking = previous_value  # pylint: disable=protected-access

~/opt/anaconda3/envs/golden/lib/python3.8/site-packages/tensorflow/python/keras/engine/base_layer.py in __init__(self, trainable, name, dtype, dynamic, **kwargs)
    345     }
    346     # Validate optional keyword arguments.
--> 347     generic_utils.validate_kwargs(kwargs, allowed_kwargs)
    348 
    349     # Mutable properties

~/opt/anaconda3/envs/golden/lib/python3.8/site-packages/tensorflow/python/keras/utils/generic_utils.py in validate_kwargs(kwargs, allowed_kwargs, error_message)
   1135   for kwarg in kwargs:
   1136     if kwarg not in allowed_kwargs:
-> 1137       raise TypeError(error_message, kwarg)
   1138 
   1139 

TypeError: ('Keyword argument not understood:', 'depthwise_activation')

Requirements file

I would like to know if QKeras is architecture agnostic (following TF requirements), since it lists the standard tensorflow Python library as an import (I assume, if that is correct, that tensorflow-gpu would be supported as well).

INT8 or INT16 for output feature map

Hello, I am wondering whether it is possible to quantize the output feature maps to INT8 or INT16 using QKeras, so that the memory footprint can be compressed as well.

Warning: QActivation could not be transformed and will be executed as-is

Hello,

first I have to thank you for this great repository. However, I always receive the strange warning mentioned below. I am using the following package versions with Spyder 4.1. Is this a warning that can be ignored and, if so, how can I suppress this message?

Packages

tensorflow version: 2.0.0
keras version: 2.2.4-tf
numpy version: 1.17.4
opencv version: 4.1.2

Warning

WARNING: Entity <bound method QActivation.call of <Utils.qkeras.qlayers.QActivation object at 0x000001C0816E5390>> could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, export AUTOGRAPH_VERBOSITY=10) and attach the full output. Cause:

How to save and load quantized models ?

I am using model_save_quantized_weights() and load_qmodel(), as suggested in: https://notebook.community/google/qkeras/notebook/QKerasTutorial
But, I keep getting:
NameError: name 'model_save_quantized_weights' is not defined

I installed qkeras in Anaconda using: !pip install git+git://github.com/google/qkeras
And I import stuff as: from qkeras import *
Quantized models themselves work fine. It's saving and loading that's got me stumped.

I worked around the NameError by basically copy-pasting the module definition from the source code into my notebook. But I can't find the code for load_qmodel().

Why am I getting this error? Am I missing something?

(s[0] == "'" and s[-1] == "'") AssertionError

I am trying to use "use_stochastic_rounding" like

config = {"QConv2D": {
                "kernel_quantizer":"quantized_bits()",
                "bias_quantizer":"quantized_bits(use_stochastic_rounding=True)"
				
  } 

but I get this error

keras-YOLOv3-model-set/qkeras/safe_eval.py", line 42, in Num
(s[0] == "'" and s[-1] == "'")

Change The kernel_range of QLayers

Thanks for your work; it's quite convenient for those of us who want to quantize models.
But some things confuse me when I use it, so I hope to get your help if possible.

That is, I find that the config of QLayers like QDense has an item named "kernel_range=1.0"; does it constrain the weights within that range, or something else?

https://github.com/google/qkeras/blob/92ec6d37c97c27a5ac9d59e0629ced0ddc432a20/qkeras/qlayers.py#L736
kernel_range=1.0,

Second is quantized_bits. The code makes max(x) = 1 and min(x) = -1, but my max(x) and min(x) are much smaller than 1 or -1, and they vary by layer. I think setting them to 1 and -1 for quantization would leave many quantization levels unused, so I want to know whether there is any way to change max(x) or min(x) per layer, e.g. by passing it in q_dict as in /examples/example_keras_to_qkeras.py?

https://github.com/google/qkeras/blob/92ec6d37c97c27a5ac9d59e0629ced0ddc432a20/qkeras/qlayers.py#L202
1) max(x) = +1, min(x) = -1 2) max(x) = -min(x)

Thanks again!

I think I have solved the issue, thanks again for your amazing library

QConv1D error

Hello, I tried QConv1D in the following way:

def Test_quant(bit_width):
    input_points = Input(shape=((32,32)))
    g = QConv1D(64, 1,
                kernel_quantizer=quantized_bits(bit_width,0,1),
                bias_quantizer=quantized_bits(bit_width,0,1),
                name="conv1")(input_points)

    model = Model(inputs=input_points, outputs=g)

    return model

net = Test_quant(8)
net.summary()

But I get the error below:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\Anaconda3\envs\tf36\lib\site-packages\tensorflow\python\framework\tensor_shape.py in merge_with(self, other)
    670       try:
--> 671         self.assert_same_rank(other)
    672         new_dims = []

~\Anaconda3\envs\tf36\lib\site-packages\tensorflow\python\framework\tensor_shape.py in assert_same_rank(self, other)
    715         raise ValueError("Shapes %s and %s must have the same rank" % (self,
--> 716                                                                        other))
    717 

ValueError: Shapes (1, 1) and (?,) must have the same rank

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
~\Anaconda3\envs\tf36\lib\site-packages\tensorflow\python\framework\tensor_shape.py in with_rank(self, rank)
    745     try:
--> 746       return self.merge_with(unknown_shape(ndims=rank))
    747     except ValueError:

~\Anaconda3\envs\tf36\lib\site-packages\tensorflow\python\framework\tensor_shape.py in merge_with(self, other)
    676       except ValueError:
--> 677         raise ValueError("Shapes %s and %s are not compatible" % (self, other))
    678 

ValueError: Shapes (1, 1) and (?,) are not compatible

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
~\Anaconda3\envs\tf36\lib\site-packages\tensorflow\python\ops\nn_ops.py in __init__(self, input_shape, dilation_rate, padding, build_op, filter_shape, spatial_dims, data_format)
    396     try:
--> 397       rate_shape = dilation_rate.get_shape().with_rank(1)
    398     except ValueError:

~\Anaconda3\envs\tf36\lib\site-packages\tensorflow\python\framework\tensor_shape.py in with_rank(self, rank)
    747     except ValueError:
--> 748       raise ValueError("Shape %s must have rank %d" % (self, rank))
    749 

ValueError: Shape (1, 1) must have rank 1

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
<ipython-input-57-fdfe4b11dc04> in <module>
     10     return model
     11 
---> 12 net = Test_quant(8)
     13 net.summary()

<ipython-input-57-fdfe4b11dc04> in Test_quant(bit_width)
      4                 kernel_quantizer=quantized_bits(bit_width,0,1),
      5                 bias_quantizer=quantized_bits(bit_width,0,1),
----> 6                 name="conv1")(input_points)
      7 
      8     model = Model(inputs=input_points, outputs=g)

~\Anaconda3\envs\tf36\lib\site-packages\keras\engine\base_layer.py in __call__(self, inputs, **kwargs)
    455             # Actually call the layer,
    456             # collecting output(s), mask(s), and shape(s).
--> 457             output = self.call(inputs, **kwargs)
    458             output_mask = self.compute_mask(inputs, previous_mask)
    459 

~\Desktop\quantization\PointNet-Keras\qkeras\qlayers.py in call(self, inputs)
   1014         padding=self.padding,
   1015         data_format=self.data_format,
-> 1016         dilation_rate=self.dilation_rate)
   1017 
   1018     if self.use_bias:

~\Anaconda3\envs\tf36\lib\site-packages\keras\backend\tensorflow_backend.py in conv1d(x, kernel, strides, padding, data_format, dilation_rate)
   3609         strides=(strides,),
   3610         padding=padding,
-> 3611         data_format=tf_data_format)
   3612 
   3613     if data_format == 'channels_first' and tf_data_format == 'NWC':

~\Anaconda3\envs\tf36\lib\site-packages\tensorflow\python\ops\nn_ops.py in convolution(input, filter, padding, strides, dilation_rate, name, data_format)
    777         dilation_rate=dilation_rate,
    778         name=name,
--> 779         data_format=data_format)
    780     return op(input, filter)
    781 

~\Anaconda3\envs\tf36\lib\site-packages\tensorflow\python\ops\nn_ops.py in __init__(self, input_shape, filter_shape, padding, strides, dilation_rate, name, data_format)
    854         filter_shape=filter_shape,
    855         spatial_dims=spatial_dims,
--> 856         data_format=data_format)
    857 
    858   def _build_op(self, _, padding):

~\Anaconda3\envs\tf36\lib\site-packages\tensorflow\python\ops\nn_ops.py in __init__(self, input_shape, dilation_rate, padding, build_op, filter_shape, spatial_dims, data_format)
    397       rate_shape = dilation_rate.get_shape().with_rank(1)
    398     except ValueError:
--> 399       raise ValueError("rate must be rank 1")
    400 
    401     if not dilation_rate.get_shape().is_fully_defined():

ValueError: rate must be rank 1

My Keras version is 2.2.4 and my TensorFlow version is 1.12.0.

The accuracy is very poor, am I doing something wrong?

Code

  input_audio = Input(
      shape=flags.desired_samples,
      batch_size=flags.batch_size,
      dtype=tf.float32)
  net = input_audio

  frame_size = int(flags.sample_rate * flags.window_size_ms / 1000)
  frame_stride = int(flags.sample_rate * flags.window_stride_ms / 1000)
  net = dataframe.DataFrame(
      inference_batch_size=1,
      frame_size=frame_size,
      frame_step=frame_stride)(net)
  net = windowing.Windowing(
      window_size=frame_size,
      window_type=flags.window_type)(net)
  net = magnitude_rdft_mel.MagnitudeRDFTmel(
      num_mel_bins=flags.mel_num_bins,
      lower_edge_hertz=flags.mel_lower_edge_hertz,
      upper_edge_hertz=flags.mel_upper_edge_hertz,
      sample_rate=flags.sample_rate,
      mel_non_zero_only=flags.mel_non_zero_only)(net)
  net = Lambda(lambda x: tf.math.log(tf.math.maximum(x, flags.log_epsilon)))(net)
  net = dct.DCT(num_features=flags.dct_num_features)(net)

  time_size, feature_size = net.shape[1:3]
  channels = utils.parse(flags.channels)
  net = tf.keras.backend.expand_dims(net)
  net = tf.reshape(
      net, [-1, time_size, 1, feature_size])  # [batch, time, 1, feature]
  first_kernel = utils.parse(flags.first_kernel)
  conv_kernel = utils.parse(flags.kernel_size)
  groups = flags.groups
  layer = 0

  net = qkeras.QConv2D(
      filters=channels[0],
      kernel_size=first_kernel,
      strides=1,
      padding='same',
      use_bias=True,
      activation='linear',
      kernel_quantizer='quantized_bits(8)',
      name='first_conv')(net)
  net = qkeras.QBatchNormalization(
      momentum=0.997,
      name='first_bn')(net)
  net = qkeras.QActivation('quantized_bits(8)', name='first_quantize')(net)
  net = Activation('relu', name='first_act')(net)

  channels = channels[1:]

  for n in channels:
    if flags.mobile:
      net = qkeras.QDepthwiseConv2D(
          kernel_size=conv_kernel,
          strides=1,
          padding='same',
          use_bias=True,
          activation='linear',
          depthwise_quantizer='quantized_bits(8)',
          name='dw' + str(layer))(net)
      net = qkeras.QActivation(
          'quantized_bits(8)', name='quantize_mobile' + str(layer))(net)
      net = qkeras.QConv2D(
          filters=n,
          # groups=groups,
          kernel_size=1,
          strides=1,
          padding='same',
          use_bias=True,
          activation='linear',
          kernel_quantizer='quantized_bits(8)',
          name='pw' + str(layer))(net)
    else:
      net = qkeras.QConv2D(
          filters=n,
          # groups=groups,
          kernel_size=conv_kernel,
          strides=1,
          padding='same',
          use_bias=True,
          activation='linear',
          kernel_quantizer='quantized_bits(8)',
          name='conv' + str(layer))(net)

    net = qkeras.QBatchNormalization(
        momentum=0.997,
        name='bn' + str(layer))(net)
    net = qkeras.QActivation('quantized_bits(8)', name='quantize' + str(layer))(net)
    net = Activation('relu', name='act' + str(layer))(net)
    layer = layer + 1

  net = AveragePooling2D(pool_size=net.shape[1:3], strides=1, name='pool')(net)

  # net = tf.keras.layers.Dropout(rate=flags.dropout)(net)

  # fully connected layer
  net = qkeras.QConv2D(
      filters=flags.label_count,
      kernel_size=1,
      strides=1,
      padding='same',
      use_bias=True,
      activation='linear',
      name='last_conv')(net)

  net = tf.reshape(net, shape=(-1, net.shape[3]), name='reshape')
  return tf.keras.Model(input_audio, net)

The first few layers are preprocessing layers and don't affect the quantization.
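For reference, one thing worth double-checking here, as an assumption rather than a confirmed diagnosis: quantized_bits(8) defaults to integer=0, so the QActivation layers above saturate everything outside roughly [-1, 1), and post-batch-norm activations outside that range would be clipped away. Below is a sketch of one conv block that makes the integer bits explicit and uses a single quantized ReLU in place of the QActivation + Activation('relu') pair; quantized_relu(8, 4) is an illustrative choice, not a known fix.

from qkeras import QActivation, QBatchNormalization, QConv2D

def quantized_conv_block(net, filters, idx):
  # Hypothetical variant of the conv block above; all names are
  # illustrative. quantized_relu(8, 4) keeps 4 integer bits so
  # activations larger than 1.0 are not saturated away.
  net = QConv2D(
      filters=filters, kernel_size=3, strides=1, padding='same',
      use_bias=True, kernel_quantizer='quantized_bits(8, 0, alpha=1)',
      name='conv' + str(idx))(net)
  net = QBatchNormalization(momentum=0.997, name='bn' + str(idx))(net)
  return QActivation('quantized_relu(8, 4)', name='act' + str(idx))(net)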

Error when quantizing MobileNetV2

I am trying to quantize MobileNetV2 with a 4-bit width, but when I run print_qstats(model) I get the error "A merge layer should be called on a list of inputs".

Additionally, is there a way to implement ReLU6 in QKeras? I am also trying to build a 2-bit quantized model with the same architecture, and the accuracy is very low (fluctuating between 15% and 20%). I was wondering if you have any tips for a fully 2-bit quantized model. So far I have been using "QConv2D(kernel_quantizer=quantized_bits(2,2), bias_quantizer=quantized_po2(2))" with the corresponding activation functions and batch normalization.
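For reference, one possible ReLU6 workaround, sketched under the assumption that a plain Keras ReLU can be mixed freely with QKeras quantizers (QKeras does not appear to ship a dedicated ReLU6 layer): clip with Keras' ReLU(max_value=6.0) and quantize the clipped output with a QActivation. quantized_bits(8, 3) is an illustrative choice; 3 integer bits are needed so values up to 6 stay representable.

import tensorflow as tf
from qkeras import QActivation

def quantized_relu6(x, name):
  # Hypothetical helper: Keras does the clipping at 6, and QKeras
  # quantizes the clipped output.
  x = tf.keras.layers.ReLU(max_value=6.0, name=name + '_relu6')(x)
  return QActivation('quantized_bits(8, 3)', name=name + '_q')(x)

This mirrors the common QKeras pattern of pairing a float activation with an explicit QActivation.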

print_qstats(): operation type issue with Sequential() model

When applying quantization to a Keras Sequential() model, I found what may be an issue with the operation type reported by the print_qstats() function.

For example, with the model from example_mnist.py, but written with the Sequential() API, I got the output below. The operation type for the first conv2d layer is unull_4_-1, whereas it is smult_4_8 with the functional API.

Based on my experiments with some other models, this only happens to the first layer of the Sequential() model.
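For reference, a minimal Sequential sketch of the kind of model involved; the layer settings are assumptions loosely following example_mnist.py, not the exact code:

import tensorflow as tf
from qkeras import QActivation, QConv2D, print_qstats

model = tf.keras.Sequential([
    QConv2D(32, (3, 3), input_shape=(28, 28, 1),
            kernel_quantizer='quantized_bits(4, 0, 1)',
            bias_quantizer='quantized_bits(4, 0, 1)',
            name='conv2d_0_m'),
    QActivation('quantized_relu(4, 0)', name='act0_m'),
    QConv2D(64, (3, 3),
            kernel_quantizer='quantized_bits(4, 0, 1)',
            bias_quantizer='quantized_bits(4, 0, 1)',
            name='conv2d_1_m'),
])
# Per the report, the first conv layer shows up as unull_4_-1 here,
# while the functional equivalent reports smult_4_8.
print_qstats(model)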

Also, for smult_4_8, I would like to know what the 8 stands for here.

I am on:
tensorflow-gpu 2.2.0
tensorflow-model-optimization 0.4.1

Number of operations in model:
    conv2d_0_m                    : 25088 (unull_4_-1)
    conv2d_1_m                    : 663552 (smult_4_4)
    conv2d_2_m                    : 147456 (smult_4_4)
    dense                         : 5760  (smult_4_4)

Number of operation types in model:
    smult_4_4                     : 816768
    unull_4_-1                    : 25088

Weight profiling:
    conv2d_0_m_weights             : 128   (4-bit unit)
    conv2d_0_m_bias                : 32    (4-bit unit)
    conv2d_1_m_weights             : 18432 (4-bit unit)
    conv2d_1_m_bias                : 64    (4-bit unit)
    conv2d_2_m_weights             : 16384 (4-bit unit)
    conv2d_2_m_bias                : 64    (4-bit unit)
    dense_weights                  : 5760  (4-bit unit)
    dense_bias                     : 10    (4-bit unit)

Weight sparsity:
... quantizing model
    conv2d_0_m                     : 0.1812
    conv2d_1_m                     : 0.1345
    conv2d_2_m                     : 0.1156
    dense                          : 0.1393
    ----------------------------------------
    Total Sparsity                 : 0.1278

Quantized tanh does not correspond to the tanh used in TensorFlow

The quantized tanh function, which is defined in terms of the sigmoid, does not correspond very well to the tanh function used in tf.keras.activations.tanh. An option for a better-matching tanh function would be useful in some use cases.

import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
import qkeras

qkeras.set_internal_sigmoid("real")
x = np.arange(-5, 5, 0.01)
plt.plot(x, qkeras.quantized_tanh(32)(x).numpy())
plt.plot(x, tf.keras.activations.tanh(x).numpy())

[plot: qkeras quantized_tanh(32) vs. tf.keras.activations.tanh]

Converting regular Keras weights to QKeras

Hello,

First I wanted to say: kudos for creating this library; I'm really excited to try it out on different models!

I saw in the readme:

QKeras extends QNN by providing a richer set of layers (including SeparableConv2D, DepthwiseConv2D, ternary and stochastic ternary quantizations), besides some functions to aid the estimation for the accumulators and conversion between non-quantized to quantized networks.

Is there any documentation on using those tools to convert pretrained weights (e.g. ImageNet) to the quantized versions?

Thanks!
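For reference, the utility that appears closest to this is qkeras.utils.model_quantize, which clones a float model into a quantized one and can carry the trained weights over. A minimal sketch follows; the quantizer configuration is illustrative, and transfer_weights=True is assumed to be available in the installed QKeras version.

import tensorflow as tf
from qkeras.utils import model_quantize

# Float model with pretrained ImageNet weights.
float_model = tf.keras.applications.MobileNetV2(weights='imagenet')

# Illustrative per-layer-type quantizer configuration.
quantizer_config = {
    'QConv2D': {
        'kernel_quantizer': 'quantized_bits(8, 0, 1)',
        'bias_quantizer': 'quantized_bits(8, 0, 1)',
    },
    'QActivation': {'relu': 'quantized_relu(8)'},
}

qmodel = model_quantize(float_model, quantizer_config, activation_bits=8,
                        transfer_weights=True)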

LSTM weights not quantized after model_save_quantized_weights

I'm trying to quantize the LSTM network from the notebook you have: https://github.com/google/qkeras/blob/eb6e0dc86c43128c6708988d9cb54d1e106685a4/notebook/QRNNTutorial.ipynb.
After seeing this issue, I've changed the config to look like this:

bits = 8
quantizer_config = {
  "bidirectional": {
      'activation' : f"quantized_tanh({bits}, 0, alpha=1)",
      'recurrent_activation' : f"quantized_relu({bits}, 0, alpha=1)",
      'kernel_quantizer' : f"quantized_bits({bits}, 0, alpha=1)",
      'recurrent_quantizer' : f"quantized_bits({bits}, 0, alpha=1)",
      'bias_quantizer' : f"quantized_bits({bits}, 0, alpha=1)",
  },
  "dense": {
      'kernel_quantizer' : f"quantized_bits({bits}, 0, alpha=1)",
      'bias_quantizer' : f"quantized_bits({bits}, 0, alpha=1)",
  },
  "embedding_act": f"quantized_bits({bits}, 0, alpha=1)",
}

I train this model and apply the model_save_quantized_weights function. But when I print the weights, they are still in floating point:

model_save_quantized_weights(qmodel, "quant_weights.h5")
for layer in qmodel.layers:
  for i, weights in enumerate(layer.get_weights()):
    print(weights)

An example of the printed weights:

[[ 0.08662941 -0.05719738 -0.05291974 ... -0.6543944   0.13776235
   0.39616233]
 [ 0.1125139  -0.09429312  0.16143066 ...  0.12786183  0.1350617
  -0.02886106]
 [ 0.14597955  0.11171963  0.14480615 ...  0.31972137  0.17480904
  -0.15030576]
 ...
 [ 0.03954179 -0.01506722 -0.09103195 ... -0.11322258  0.07701313
  -0.12551346]
 [-0.02650027  0.0823105  -0.01624984 ...  0.2262283   0.08772285
  -0.17474762]
 [-0.11531919 -0.02932754  0.1707585  ...  0.18108878  0.03475188
  -0.16486846]]

Could you please guide me on what I should do to get int8 weights?
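For reference, one diagnostic worth running, under the assumption that QKeras represents quantized weights as floating-point values lying on the quantization grid rather than as int8: for quantized_bits(8, 0, alpha=1) the grid step is 1/2**7, so genuinely quantized weights should scale to whole numbers.

import numpy as np

# Check whether the weights lie on the 8-bit grid k / 2**7; if they
# do, the int8 codes are simply w * 2**7.
for layer in qmodel.layers:
    for w in layer.get_weights():
        codes = np.asarray(w) * 2**7
        print(layer.name, np.allclose(codes, np.round(codes)))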

AttributeError: 'stochastic_ternary' object has no attribute 'shape' when using QConv1D

Hi,

I was trying to use the QConv1D layer and encountered this error. Does anyone have any idea why?

Traceback (most recent call last):
  File "quantize-Conv1D.py", line 179, in <module>
    UseNetwork(args.weight_file, save_model = args.save_model, load_weights=lw)
  File "quantize-Conv1D.py", line 142, in UseNetwork
    model = QConv1D_model(weights_f, load_weights)
  File "quantize-Conv1D.py", line 101, in QConv1D_model
    x = QConv1D(filters=128, kernel_size=3, kernel_quantizer="stochastic_ternary", bias_quantizer="ternary", name="conv1d_1")(x)
  File "/home/duchstf/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/keras/engine/base_layer.py", line 842, in __call__
    outputs = call_fn(cast_inputs, *args, **kwargs)
  File "/home/duchstf/miniconda3/lib/python3.7/site-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
    raise e.ag_error_metadata.to_exception(e)
AttributeError: in converted code:
    relative to /home/duchstf/miniconda3/lib/python3.7/site-packages:

    qkeras/qconvolutional.py:127 call  *
        outputs = tf.keras.backend.conv1d(
    tensorflow_core/python/keras/backend.py:4804 conv1d
        kernel_shape = kernel.shape.as_list()

    AttributeError: 'stochastic_ternary' object has no attribute 'shape'

This is how the QConv1D layer was used in my code:

x = x_in = Input(IN_SHAPE, name="input")
x = QConv1D(filters=128, kernel_size=3, kernel_quantizer="stochastic_ternary", bias_quantizer="ternary", name="conv1d_1")(x)

Thanks,

Duc.

Recurrent networks quantized_bits with alpha

quantized_bits should behave the same with the default alpha=None and with alpha=1.0.

scale set as 1.0 with self.alpha = None:
https://github.com/google/qkeras/blob/master/qkeras/quantizers.py#L550

scale set as 1.0 with self.alpha = 1.0:
https://github.com/google/qkeras/blob/master/qkeras/quantizers.py#L597

And they do return the same values, as demonstrated by this example:

import numpy as np
from qkeras import quantized_bits

q_noalpha = quantized_bits(14, 4, 1)
q_alpha = quantized_bits(14, 4, 1, alpha=1.0)
testvalues = np.arange(-30, 30, 0.00001, dtype='float32')
np.argwhere((q_noalpha(testvalues) == q_alpha(testvalues)) == False)

This returns an empty array, meaning both quantizers produce identical values.

However, with recurrent networks, switching between these two changes the result. I have attached a small reproducible example, based on the test code from qrecurrent_test.py, to demonstrate the behavior. QLSTM and QSimpleRNN both give results that do not match.

import numpy as np
import tensorflow as tf
import qkeras
from qkeras import quantized_bits, quantized_tanh
from tensorflow.keras.layers import Activation, Input
from tensorflow.keras.models import Model

np.random.seed(31)
tf.random.set_seed(31)

inputs = 2 * np.random.rand(10, 2, 4)
rnn = qkeras.QSimpleRNN

# Model with explicit alpha=1.0 quantizers.
x = x_in = Input((2, 4), name='input')
x = rnn(16,
        activation=quantized_tanh(bits=8),
        kernel_quantizer=quantized_bits(8, 0, 1, alpha=1.0),
        # recurrent_quantizer=quantized_bits(8, 0, 1, alpha=1.0),
        bias_quantizer=quantized_bits(8, 0, 1, alpha=1.0),
        state_quantizer=quantized_bits(4, 0, 1, alpha=1.0),
        name='qrnn_0')(x)
x = qkeras.QDense(4,
                  kernel_quantizer=quantized_bits(6, 2, 1, alpha=1.0),
                  bias_quantizer=quantized_bits(4, 0, 1),
                  name='dense')(x)
x = Activation('softmax', name='softmax')(x)
model = Model(inputs=[x_in], outputs=[x])

# Save weights so both models start from identical parameters.
save_weights = model.get_weights()
original_output = model.predict(inputs).astype(np.float16)

# The same model with the default alpha=None quantizers.
x = x_in = Input((2, 4), name='input')
x = rnn(16,
        activation=quantized_tanh(bits=8),
        kernel_quantizer=quantized_bits(8, 0, 1),
        # recurrent_quantizer=quantized_bits(8, 0, 1),
        bias_quantizer=quantized_bits(8, 0, 1),
        state_quantizer=quantized_bits(4, 0, 1),
        name='qrnn_0')(x)
x = qkeras.QDense(4,
                  kernel_quantizer=quantized_bits(6, 2, 1),
                  bias_quantizer=quantized_bits(4, 0, 1),
                  name='dense')(x)
x = Activation('softmax', name='softmax')(x)
model = Model(inputs=[x_in], outputs=[x])
model.set_weights(save_weights)

output_no_alpha = model.predict(inputs).astype(np.float16)
print(original_output - output_no_alpha)

Expected output:

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

Actual output:

[[-0.02148  -0.0864   -0.01233   0.12036 ]
 [ 0.02441  -0.03333   0.01453  -0.00537 ]
 [ 0.08765  -0.0879   -0.003296  0.003418]
 [ 0.02734  -0.1011   -0.006287  0.0801  ]
 [-0.005615 -0.0481    0.03137   0.02222 ]
 [ 0.004883 -0.0454    0.001862  0.03857 ]
 [-0.04565  -0.03235  -0.00908   0.0874  ]
 [-0.007324 -0.04956  -0.002075  0.05884 ]
 [ 0.02588  -0.04425   0.01611   0.001953]
 [ 0.02344  -0.04398   0.0426   -0.02197 ]]

`keep_negative` parameter in `quantizers.quantized_bits` should be boolean

In qkeras/qkeras/quantizers.py, lines 318 to 319 (commit a55548e):

def __init__(self, bits=8, integer=0, symmetric=0, keep_negative=1,
             alpha=None, use_stochastic_rounding=False):

keep_negative is a boolean parameter by its context and documentation. In

self.keep_negative = (keep_negative > 0)

it's also converted to one. Is there a reason why its default value is 1, of type int? I might be missing something since I'm new to the package.
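For reference, a quick check of the equivalence; the clipping behavior for keep_negative=0 is an assumption based on the docstring:

import numpy as np
from qkeras import quantized_bits

x = np.array([-0.7, -0.2, 0.2, 0.7], dtype='float32')

# int 1 and bool True behave identically, since the constructor only
# tests keep_negative > 0.
print(quantized_bits(4, 0, keep_negative=1)(x))
print(quantized_bits(4, 0, keep_negative=True)(x))
# With keep_negative=0, negative inputs are expected to clip to 0.
print(quantized_bits(4, 0, keep_negative=0)(x))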

print_qstats(sho_fitter.model)

Hello, I tried to quantize a simple harmonic oscillator fitter model, but when I tried to view the stats of the layers in the model, an error occurred. I was hoping you might have a solution. The method seems to have trouble primarily with the quantizers of the dense layers (quantized_bits specifically), but I do not understand much more than that.

I have attached the error message below:

WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/qkeras/estimate.py:314: Tensor.experimental_ref (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use ref() instead.

ValueError                                Traceback (most recent call last)
in <module>()
----> 1 print_qstats(sho_fitter.model)

3 frames
/usr/local/lib/python3.6/dist-packages/qkeras/estimate.py in print_qstats(model)
    491   """Prints quantization statistics for the model."""
    492 
--> 493   model_ops = extract_model_operations(model)
    494 
    495   ops_table = defaultdict(lambda: 0)

/usr/local/lib/python3.6/dist-packages/qkeras/estimate.py in extract_model_operations(model)
    469     operations[layer.name] = {
    470         "type":
--> 471             get_operation_type(layer, cache_q),
    472         "number_of_operations":
    473             number_of_operations if isinstance(number_of_operations, int) else

/usr/local/lib/python3.6/dist-packages/qkeras/estimate.py in get_operation_type(layer, output_cache)
    293   if output_cache.get(layer.input.experimental_ref(), None) is not None:
    294     x_mode, x_bits, x_sign = get_quant_mode(
--> 295         output_cache.get(layer.input.experimental_ref()))
    296   else:
    297     print("cannot determine presently model for {}".format(layer.name))

/usr/local/lib/python3.6/dist-packages/qkeras/estimate.py in get_quant_mode(quant)
    253       mode = 4
    254       return (mode, bits, sign)
--> 255   raise ValueError("Quantizer {} Not Found".format(quant))
