tinyML Summit 2023: Multi-Lingual Digital Assistance on Edge Devices



Mahesh GODAVARTI, Engineering Technical Leader, Cisco Systems

Advances in speech-recognition technology over the past decade have led to the ubiquitous deployment of natural-language, speech-based digital assistants, albeit in the cloud, where ASR latency can be high. In a system built to mimic normal conversation, this latency can make the user experience feel unnatural. With a cloud-based ASR model, there is no single source of latency that we can reliably target, because factors outside our immediate control (round-trip transmission time, end-of-speech detection, server location, etc.) vary from user to user. At the same time, hardware and software compute limitations on edge devices make offloading an entire multilingual ASR model to the device infeasible for recognition of domain-agnostic natural-language input. Developing a hybrid model that resides partly in the cloud and partly on the edge device is therefore key to natural interactions between digital assistants and end users. This is where the tinyML approach plays a crucial role in developing a hybrid system whose on-device part can be supported by the edge device’s compute capabilities.
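The hybrid split described above can be pictured as a simple routing step. The sketch below is illustrative only and is not taken from the talk: the function `route_utterance`, the confidence threshold of 0.8, and the two stub recognizers are all hypothetical, and the "audio" is a plain string standing in for a real waveform. It assumes the on-device recognizer returns a command together with a confidence score, and that anything it cannot handle confidently is sent to the cloud.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Result:
    command: str
    handled_by: str  # "edge" or "cloud"


# Hypothetical signatures: the local recognizer returns (command or None, confidence),
# the cloud recognizer returns the recognized command text.
LocalRecognizer = Callable[[str], Tuple[Optional[str], float]]
CloudRecognizer = Callable[[str], str]


def route_utterance(
    audio: str,
    local_recognizer: LocalRecognizer,
    cloud_recognizer: CloudRecognizer,
    threshold: float = 0.8,  # assumed value, not from the talk
) -> Result:
    """Try the on-device recognizer first; fall back to the cloud otherwise."""
    command, confidence = local_recognizer(audio)
    if command is not None and confidence >= threshold:
        # Short, known command recognized locally: no network round trip needed.
        return Result(command, "edge")
    # Open-ended or low-confidence input goes to the larger cloud system.
    return Result(cloud_recognizer(audio), "cloud")


def demo_local(audio: str) -> Tuple[Optional[str], float]:
    """Stub on-device recognizer with a tiny, made-up command set."""
    known = {"volume up": 0.95, "mute": 0.6}
    if audio in known:
        return audio, known[audio]
    return None, 0.0


def demo_cloud(audio: str) -> str:
    """Stub cloud recognizer that just tags the input."""
    return f"cloud:{audio}"
```

With these stubs, a confidently recognized short command stays on the device, while a low-confidence or unknown utterance is forwarded to the cloud. A real system would also need to decide what happens when the network is unavailable, which this sketch does not address.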

We present our approach to building a multilingual natural-language hybrid model that combines expertise in linguistics, modeling, and implementation to create a system that mimics one running wholly on the edge device. Our hybrid model consists of a short-phrase “local command-recognition” system running on the edge device and a larger natural-language command-recognition system running in the cloud. We describe the process for selecting the commands to be offloaded to the local device and the training regimen for developing the tinyML “command-recognition” model. The model’s network architecture comprises a common embedding network, shared by all languages, followed by a per-language decision network that captures the language-specific variations in output commands. Finally, we share the improvement in user experience, measured as reduced latency, achieved with our hybrid approach.
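To make the "shared embedding plus per-language decision network" structure concrete, here is a minimal sketch in plain Python. It is not the authors' model: the class name `MultilingualCommandModel`, the layer sizes, the single-layer networks, the random untrained weights, and the example command lists are all assumptions chosen for illustration. It shows only the structural idea that one embedding is computed regardless of language and that each language has its own output layer sized to its own command set.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[Vector]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Small random weights; a real model would be trained instead."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z: Vector) -> Vector:
    peak = max(z)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


class MultilingualCommandModel:
    """One shared embedding layer, one decision layer per language."""

    def __init__(self, feature_dim: int, embed_dim: int,
                 commands_per_language: Dict[str, List[str]], seed: int = 0):
        rng = random.Random(seed)
        self.commands = commands_per_language
        # Shared by every language.
        self.embed = random_matrix(embed_dim, feature_dim, rng)
        # Language-specific: output size equals that language's command count.
        self.heads: Dict[str, Matrix] = {
            lang: random_matrix(len(cmds), embed_dim, rng)
            for lang, cmds in commands_per_language.items()
        }

    def embed_features(self, features: Vector) -> Vector:
        return [math.tanh(x) for x in matvec(self.embed, features)]

    def class_probabilities(self, features: Vector, language: str) -> Vector:
        embedding = self.embed_features(features)
        return softmax(matvec(self.heads[language], embedding))

    def predict(self, features: Vector, language: str) -> Tuple[str, float]:
        probs = self.class_probabilities(features, language)
        best = max(range(len(probs)), key=probs.__getitem__)
        return self.commands[language][best], probs[best]


# Hypothetical command sets and input features, for illustration only.
model = MultilingualCommandModel(
    feature_dim=4,
    embed_dim=3,
    commands_per_language={
        "en": ["volume up", "volume down", "mute"],
        "es": ["subir volumen", "silenciar"],
    },
)
features = [0.1, -0.2, 0.3, 0.05]
command, prob = model.predict(features, "es")
```

One consequence of this layout is that supporting a new language only adds a small decision layer, while the embedding parameters are reused, which may help keep the on-device footprint small. Whether the embedding is frozen or fine-tuned when a language is added is not stated in the abstract.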
