tinyML Summit 2021 https://www.tinyml.org/event/summit-2021
tinyTalks Algorithms and Tools
“Low-precision Winograd Convolution over Residue Number System”
Zhi-Gang LIU, Research Engineer, Arm
Low-precision (8-bit or sub-8-bit) convolutional neural networks consume a fraction of the memory footprint and power of high-precision models running on mobile or embedded devices. The classical fast Winograd convolution algorithm requires high-precision floating-point operations and thus fails to accelerate low-precision CNNs. As a result, the current state-of-the-art low-precision convolution is a GEMM-based approach that relies on im2col or im2row transformations to convert the convolution into a GEMM operation, and each output demands 9 MAC operations for the popular 3×3 filter and 25 for a 5×5 filter. This work extends the Winograd algorithm to modular arithmetic and explores an optimized implementation of fast low-precision convolution for ultra-low-power machine learning (ML) at the edge. The new approach achieves an arithmetic reduction of up to 6.8x, corresponding to 16×16 transformation tiles, and relies only on int8 or int16 operations, which are well supported by commodity edge devices. We evaluated the proposal with sub-8-bit VGG16 and ResNet50v1 models on the ImageNet dataset using an Arm Cortex-A53 CPU and a Cortex-M7 MCU, and observed a convolution latency reduction of more than 2x.
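To make the idea of Winograd convolution in modular arithmetic concrete, here is a minimal Python sketch of the smallest textbook case, 1D Winograd F(2,3), evaluated in a residue number system. It is an illustration under stated assumptions, not the implementation presented in the talk: the moduli 251 and 241, the function names, and the 1D tile size are all chosen here for brevity. F(2,3) produces two outputs with 4 multiplications instead of the 6 a direct 3-tap filter needs; the talk's larger 2D tiles are not reproduced.

```python
from math import prod

# Assumed for illustration: two pairwise-coprime odd moduli below 256,
# so every residue fits in 8 bits. Not taken from the talk.
MODULI = (251, 241)


def winograd_f23_mod(d, g, m):
    """1D Winograd F(2,3) modulo an odd m: 2 outputs from 4 inputs, 3 taps."""
    inv2 = pow(2, -1, m)  # modular inverse of 2 replaces the fraction 1/2
    # Filter transform (can be precomputed once per filter).
    u = [g[0] % m,
         (g[0] + g[1] + g[2]) * inv2 % m,
         (g[0] - g[1] + g[2]) * inv2 % m,
         g[2] % m]
    # Input transform: additions and subtractions only.
    v = [(d[0] - d[2]) % m,
         (d[1] + d[2]) % m,
         (d[2] - d[1]) % m,
         (d[1] - d[3]) % m]
    # The only 4 multiplications per pair of outputs.
    p = [a * b % m for a, b in zip(u, v)]
    # Output transform.
    return [(p[0] + p[1] + p[2]) % m, (p[1] - p[2] - p[3]) % m]


def crt_signed(residues, moduli):
    """Recombine residues with the Chinese remainder theorem into a signed integer."""
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        part = big_m // m
        x += r * part * pow(part, -1, m)
    x %= big_m
    return x - big_m if x > big_m // 2 else x


def conv_f23_rns(d, g, moduli=MODULI):
    """Run F(2,3) independently per modulus, then recombine each output."""
    per_mod = [winograd_f23_mod(d, g, m) for m in moduli]
    return [crt_signed([res[i] for res in per_mod], moduli) for i in range(2)]


def conv_direct(d, g):
    """Reference: direct 3-tap correlation, 6 multiplications for 2 outputs."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    d, g = [3, -7, 5, 2], [1, -2, 4]
    print(conv_f23_rns(d, g), conv_direct(d, g))
```

The result is exact only while the true output magnitude stays below half the product of the moduli (251 × 241 = 60491 here), which is why the sketch suits small, sub-8-bit style operands; wider operands would need more or larger moduli.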