EMEA 2021 Student Forum
Squeeze-and-Threshold based quantization for Low-Precision Neural Networks
Binyi WU, PhD Student, Infineon Technologies AG
Problem statement: Deep Convolutional Neural Networks (DCNNs), widely used for image recognition, require large amounts of computation and memory, making them infeasible to run on embedded devices. Among various effective techniques such as quantization and pruning, 8-bit quantization is the most widely used method. However, it is not sufficient for embedded devices with extremely limited hardware resources. Prior work has already demonstrated that lower-precision quantization is feasible, but it uses different schemes for 1-bit and multi-bit quantization. In this work, we propose a new quantization method based on the attention mechanism, which unifies the binarization and multi-bit quantization of activations into one scheme and demonstrates state-of-the-art performance.
Relevance to tinyML: The proposed low-precision (1-, 2-, 3-, 4-bit) quantization method is a neural network optimization method for low-power applications.
Novelty: 1. First application of the attention mechanism to quantization. 2. A consistent method for both 1-bit and multi-bit quantization.
The floating-point convolution operation (left) is replaced with a quantized convolution operation.
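To illustrate how 1-bit and multi-bit activation quantization can share one formulation, the sketch below uniformly quantizes activations to k bits: with k = 1 the operation collapses to simple thresholding (binarization), and with k > 1 it maps values onto 2^k - 1 levels. This is a minimal illustrative example with an assumed fixed clipping range, not the paper's Squeeze-and-Threshold method.

```python
import numpy as np

def quantize_activations(x, bits, scale=1.0):
    """Uniformly quantize non-negative activations to `bits` bits.

    For bits == 1 this reduces to thresholding (binarization);
    for bits > 1 it snaps values to 2**bits - 1 uniform levels.
    Illustrative sketch only; the clipping range `scale` is assumed fixed,
    whereas learned-quantization methods typically train such parameters.
    """
    levels = 2 ** bits - 1
    # Clip to [0, scale], map onto the integer grid, round, rescale back.
    x_clipped = np.clip(x, 0.0, scale)
    q = np.round(x_clipped / scale * levels)
    return q / levels * scale

x = np.array([-0.2, 0.1, 0.4, 0.9, 1.3])
print(quantize_activations(x, bits=1))  # snaps to {0, 1} (binarization)
print(quantize_activations(x, bits=2))  # snaps to {0, 1/3, 2/3, 1}
```

Replacing the floating-point convolution with its quantized counterpart then amounts to convolving such quantized activations with quantized weights, so the multiply-accumulate can run in low-precision integer arithmetic.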