EMEA 2021 https://www.tinyml.org/event/emea-2021
The model efficiency pipeline, enabling deep learning inference at the edge
Bert MOONS, Research Scientist, Qualcomm
Today, most deep learning and AI applications are developed on and for high-performance computing systems in the cloud. To make them suitable for real-time deployment on low-power edge devices and wearable platforms, they have to be specifically optimized. This talk gives an overview of a model-efficiency pipeline that achieves this goal: automatically optimizing deep learning models through hardware-aware neural architecture search, compressing them and pruning redundant layers, and then converting them to low-bitwidth integer representations with state-of-the-art data-free and training-based quantization tools. Finally, we take a sneak peek at what’s next in efficient deep learning at the edge: mixed-precision hardware-aware neural architecture search and conditional processing.
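As a rough illustration of the last pipeline stage, converting floating-point values to a low-bitwidth integer representation, the sketch below shows plain uniform affine quantization in pure Python. It is not the tooling presented in the talk: the function names `quantize` and `dequantize`, the min/max range estimate, and the sample weights are all assumptions made for this example, and the data-free and training-based methods the abstract refers to are considerably more involved.

```python
def quantize(values, num_bits=8):
    """Map floats to unsigned integers with a shared scale and zero point.

    Hypothetical helper for illustration only; uses a simple min/max range.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Fall back to a scale of 1.0 when every value is zero.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    # Round to the integer grid, shift by the zero point, clip to the valid range.
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return [scale * (x - zero_point) for x in q]


# Made-up example weights, quantized to 8 bits and reconstructed.
weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zero_point = quantize(weights, num_bits=8)
restored = dequantize(q, scale, zero_point)
```

With 8 bits, each reconstructed value differs from the original by at most about one quantization step (`scale`). Lowering `num_bits` enlarges the step, which is the accuracy-versus-cost trade-off that quantization tools try to manage.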