tinyML Talks: Demoing the world’s fastest inference engine for Arm Cortex-M



Cedric Nugteren
Deep learning software engineer
Plumerai

Recently we announced Plumerai’s inference engine for 8-bit deep learning models on Arm Cortex-M microcontrollers. We showed that it is the world’s most efficient on MobileNetV2, beating TensorFlow Lite for Microcontrollers with CMSIS-NN kernels by 40% in latency and 49% in RAM usage, with no loss in accuracy. However, those results covered only a single network, which might have been cherry-picked. Therefore, we will give a live demonstration of a new service that lets you test your own models with our inference engine. In this talk we will explain how we achieved these speed and memory improvements, and we will show benchmarks for the most important publicly available neural network models.
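As a quick aside on how such percentages are typically computed, the sketch below shows the usual "reduction relative to the baseline" formula. The measurement values here are hypothetical placeholders for illustration only; the talk reports the actual MobileNetV2 numbers measured on Cortex-M hardware.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Percentage improvement over a baseline measurement
    (e.g. latency in ms or RAM usage in bytes)."""
    return 100.0 * (baseline - ours) / baseline

# Hypothetical numbers, chosen only to illustrate the formula:
# if the baseline (TFLM + CMSIS-NN) took 100 ms and a faster
# engine took 60 ms, that is a 40% latency reduction.
baseline_latency_ms = 100.0
our_latency_ms = 60.0

print(relative_reduction(baseline_latency_ms, our_latency_ms))  # → 40.0
```

The same formula applied to peak RAM usage gives the quoted 49% figure; a reduction is relative to the baseline, so a 40% latency reduction means the engine runs in 60% of the baseline time, not that it is "40% as fast".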

