tinyML Summit 2021 https://www.tinyml.org/event/summit-2021
“Real-World Performance Analysis of Visual Wake Words”
Luke BERNDT, Senior Director, In-Q-Tel
The Google Visual Wake Words paper (Chowdhery et al., 2019) proposes techniques for creating object recognition models appropriate for microcontrollers. The paper demonstrates an accurate person detection model trained using the Microsoft Common Objects in Context (COCO) dataset. Because the COCO dataset is built from photographs found on internet photography sites, and because these images are composed by a photographer, we hypothesize that it may be ill-suited for tinyML visual sensors. Deployed visual sensors often view objects from unusual perspectives, which can result in poor object recognition.
We therefore investigated model performance on classes other than persons, evaluated performance by deploying the model on hardware in the wild, and then built a novel dataset for real-world testing. In certain real-world environments, we found a decrease in accuracy of over 50%. Additionally, we investigated transfer learning and techniques for identifying blind spots in models to better target the augmentation of objects in the dataset. We conclude that extra care is needed when using general-purpose image datasets, like COCO, to train models for tinyML-based visual sensors.