tinyML EMEA – Mart van Baalen: Advances in quantization for efficient on-device inference



Advances in quantization for efficient on-device inference
Mart VAN BAALEN
Staff Engineer/Manager
Qualcomm AI Research in Amsterdam

Today's deep neural networks consume too much memory, compute, and energy. To make AI truly ubiquitous, it needs to run on edge devices within tight power and thermal budgets. Quantization is particularly important because it enables automated reduction of the bit-width of weights and activations, improving power efficiency and performance while maintaining accuracy. This talk will cover:

– FP8 vs INT8 formats for efficient inference
– Oscillations in quantization-aware training
– Removing outliers for improved quantization of transformers and LLMs
– New mixed-precision methods
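To ground the topics above, a minimal sketch of symmetric per-tensor INT8 quantization, the basic operation underlying the techniques in the talk. All names here are illustrative; this is not Qualcomm's implementation.

```python
# Illustrative symmetric per-tensor INT8 quantization (hypothetical helper
# names; a sketch, not a production or Qualcomm AI Research implementation).

def quantize_int8(values):
    """Map floats to int8 codes using a single symmetric scale.

    Returns (quantized integers, scale). The scale is chosen so the
    largest-magnitude value maps to +/-127; outliers therefore stretch
    the scale and waste precision on the rest of the tensor, which is
    why outlier removal matters for transformers and LLMs.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the int8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Per-element rounding error is bounded by scale / 2.
```

Mixed-precision methods generalize this by picking a different bit-width per layer or per tensor, trading accuracy against memory and compute.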
