EMEA 2021 Student Forum
Enabling autonomous navigation on UAVs with onboard MCU based camera running TinyML models at the edge
Andrea ALBANESE, Fellow Researcher, University of Trento
Drones are becoming essential in everyday life: they assist experts in many fields with challenging tasks and have proven feasible and successful. However, drones that act autonomously in their environment would be more efficient and reliable, and would enable scalable, cheaper solutions that need no human assistance. On-board cameras can be combined with deep learning (DL) models for increasingly complex tasks such as monitoring, autonomous navigation, rescue, and aerial surveillance. Many drone applications rely on cloud computing. This approach is limited by data transmission, which introduces latency (making real-time requirements hard to satisfy) and considerable energy consumption. Drone flight time is a bottleneck; thus, an edge computing approach that executes all vision and control algorithms on the vehicle is preferable, improving reactiveness and efficiency even on small UAVs. Recent papers have shown that micro-computers can be used for edge AI computing; however, their power consumption is still on the order of watts. MCU-based cameras with TinyML inference on board are also being investigated for smaller drones. Common DL algorithms for detection and classification cannot be used as they are because they exceed the memory available in MCU-based cameras; pruning and quantization are essential to fit complex models on MCU devices.
We present a particular approach for assisting autonomous navigation. The idea is to use arrows or other symbols drawn on the ground to indicate high-level targets and actions (example in Figure 1). The drone finds the “written messages”, classifies them, decodes the action, and plans the path for its autonomous navigation towards the target detected by ad-hoc CNNs. In this way, drones do not need the supervision of an expert pilot, and several vehicles can navigate in a shared space by interpreting indications on the ground. Human operators are no longer in a real-time control loop but provide only high-level goals by writing on the floor or with gestures.
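The decoding step can be illustrated with a minimal sketch. It assumes the classifier returns one score per arrow direction and that the directions are compass headings in 45° steps, measured clockwise from north. The class ordering, the function name `decode_heading` and the north/east vector convention are assumptions made for illustration, not details of the presented system.

```python
import math

# Assumed class order: compass headings in 45-degree steps, clockwise from north.
HEADINGS_DEG = [0, 45, 90, 135, 180, 225, 270, 315]


def decode_heading(scores):
    """Turn per-class scores into a heading and a unit direction vector.

    Returns (heading_deg, (north, east)), where the vector components
    follow the usual compass convention: north = cos, east = sin.
    """
    if len(scores) != len(HEADINGS_DEG):
        raise ValueError("expected one score per direction class")
    # Index of the highest-scoring class (argmax without external libraries).
    best = max(range(len(scores)), key=scores.__getitem__)
    heading = HEADINGS_DEG[best]
    rad = math.radians(heading)
    return heading, (math.cos(rad), math.sin(rad))
```

For example, a score vector whose largest entry is at index 2 decodes to a heading of 90° (east). A flight controller could then use the returned vector as a high-level velocity direction.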
We used the OpenMV Cam H7 Plus, which draws only 240 mA at 3.3 V in active mode. It is based on the STM32H7 ARM Cortex-M7 processor running at 480 MHz with 1 MB of internal SRAM and 2 MB of internal flash. The arrow direction is predicted with a custom DL algorithm, a classifier over eight classes: 0° (north), 45° (north-east), 90° (east), 135° (south-east), 180° (south), 225° (south-west), 270° (west) and 315° (north-west). The dataset was generated semi-automatically, starting from raw videos acquired off-line that show arrows of different fonts and sizes, so that the network can generalize. The videos were processed into a dataset of 26600 training images and 6700 test images, split equally among the 8 classes. The DL model was trained and tested with 64 x 64 input images on three CNN architectures, chosen to compare small (LeNet-5), medium (SqueezeNet) and large (MobileNetV2) structural complexity. The resulting models were then analyzed and optimized to fit the camera memory constraint, which is about 500 KB because of the footprint of the main firmware and other libraries. In particular, model optimization techniques were assessed both before and after training. Before training, the model architectures were optimized by reducing their convolutional layers; a trial-and-error approach was used to find the best trade-off between model size and accuracy. After training, cascade pruning and float-fallback quantization were applied. The three architectures are compared in terms of memory optimization and of the accuracy lost across their intermediate representations.
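The figures above lend themselves to a quick back-of-the-envelope check. The sketch below computes the active power from the stated current and voltage, and gives a rough estimate of whether a model fits the roughly 500 KB budget at a given weight precision. The parameter count in the usage note is hypothetical, the budget is taken as 500 KiB, and the estimate counts weights only, ignoring activations and model file overhead.

```python
# "About 500KB" from the text, interpreted here as 500 KiB (an assumption).
MEMORY_BUDGET_BYTES = 500 * 1024


def active_power_watts(current_ma=240.0, voltage_v=3.3):
    """Active-mode power draw from the stated current and supply voltage."""
    return current_ma / 1000.0 * voltage_v


def weights_size_bytes(n_params, bytes_per_weight):
    """Rough storage needed for the weights alone."""
    return n_params * bytes_per_weight


def fits_budget(n_params, bytes_per_weight, budget=MEMORY_BUDGET_BYTES):
    """True if the weights alone fit the budget (a necessary condition only)."""
    return weights_size_bytes(n_params, bytes_per_weight) <= budget
```

With these numbers the camera draws about 0.79 W in active mode. A hypothetical model with 300,000 parameters would need about 1.2 MB as 32-bit floats and would not fit, while the same model quantized to 8-bit weights would take about 300 KB and pass this check.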