“Cracking a 600 million year old secret to fit computer vision on the edge”
Dr. Shivy Yohanandan
Co-founder and Chief Technology Officer
The ultimate goal of AI IoT is to sense our surroundings and respond in real time, so that we can be more selective about how we use and manage our limited resources, which reduces both business and environmental costs. But the big problem with AI IoT is a paradox: current AI often uses more energy to process IoT data than the energy it’s trying to save. The main cause is expensive algorithm families like YOLO, SSD, R-CNN, and their derivatives, which underlie most of the computer vision algorithms in use today!
YOLO and SSD models do object detection (a staple of most computer vision) by shrinking the full-resolution image to 416×416 or 300×300 pixels and then doing both localization and classification on this shrunken image. For a high-resolution camera frame, that can discard over 95% of the original pixels, which is why accuracy, robustness, and generalizability tend to be poor, especially when trying to scale across many IoT sensors (e.g. cameras). In addition to this inherent design flaw, these models are huge and computationally expensive, which is why everyone is trying to fit them on the edge by shrinking the models themselves. However, this often means losing even more accuracy on a model that was already inaccurate to begin with!
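As a rough check on the downscaling figure above, the short sketch below computes the fraction of pixels dropped when a frame is resized to a detector's input size. The source resolutions (1080p and 4K) are assumptions chosen for illustration, not figures from the talk, and pixel count is only a proxy for information.

```python
def fraction_discarded(src_w: int, src_h: int, dst_w: int, dst_h: int) -> float:
    """Fraction of source pixels with no counterpart after resizing."""
    return 1.0 - (dst_w * dst_h) / (src_w * src_h)


# Assumed camera resolutions (illustrative only).
SOURCES = {"1080p": (1920, 1080), "4K": (3840, 2160)}
# Detector input sizes mentioned in the text.
TARGETS = [(416, 416), (300, 300)]

for name, (w, h) in SOURCES.items():
    for tw, th in TARGETS:
        lost = fraction_discarded(w, h, tw, th)
        print(f"{name} -> {tw}x{th}: {lost:.1%} of pixels discarded")
```

Under these assumptions the loss ranges from roughly 92% (1080p to 416×416) to roughly 99% (4K to 300×300), so the exact number depends on the camera and on the input size the detector uses.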
Xailient solved this problem by cracking a 600 million year old secret in biological vision: selective attention and salience. The secret mechanism shows us how to split object detection into two separate models: one for detection (localization) and one for classification. This results in Xailient’s detector being only 44 KB, about 5000x smaller than YOLO! You can then use your own flavor of classifier to process each detected region of interest (ROI) one by one, except now using a crop from the original full-resolution image, thus preserving more information for better accuracy. So we’ve solved both model size and accuracy in one hit!
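The two-stage idea can be sketched in a few lines. This is a minimal illustration, not Xailient’s implementation: the detector and classifier are hypothetical stand-in callables, the image is a plain list of pixel rows, and the detector is assumed to return boxes in normalized coordinates so that they can be mapped back onto the original image.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]                      # grayscale image as rows of pixels
NormBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1), each in [0, 1]


def to_pixel_box(box: NormBox, width: int, height: int) -> Tuple[int, int, int, int]:
    """Scale a normalized box to pixel coordinates, clamped to the image bounds."""
    x0, y0, x1, y1 = box
    left = max(0, min(width, int(x0 * width)))
    top = max(0, min(height, int(y0 * height)))
    right = max(0, min(width, int(x1 * width)))
    bottom = max(0, min(height, int(y1 * height)))
    return left, top, right, bottom


def crop(image: Image, box: NormBox) -> Image:
    """Cut the region of interest out of the original, full-resolution image."""
    height, width = len(image), len(image[0])
    left, top, right, bottom = to_pixel_box(box, width, height)
    return [row[left:right] for row in image[top:bottom]]


def detect_then_classify(
    image: Image,
    detector: Callable[[Image], List[NormBox]],
    classifier: Callable[[Image], str],
) -> List[Tuple[NormBox, str]]:
    """Stage 1 proposes ROIs; stage 2 labels each full-resolution crop in turn."""
    return [(box, classifier(crop(image, box))) for box in detector(image)]


# Hypothetical stand-ins for the two models (assumptions for illustration).
def toy_detector(image: Image) -> List[NormBox]:
    return [(0.5, 0.0, 1.0, 0.5)]  # pretend one object sits in the top-right


def toy_classifier(roi: Image) -> str:
    pixels = [p for row in roi for p in row]
    return "bright" if sum(pixels) / len(pixels) > 127 else "dark"


# 8x4 test image: top-right quadrant is bright, everything else is dark.
frame = [[255 if (x >= 4 and y < 2) else 0 for x in range(8)] for y in range(4)]
results = detect_then_classify(frame, toy_detector, toy_classifier)
print(results)
```

Because the boxes are normalized, the detector could run on a heavily downscaled copy of the frame while the classifier still sees original-resolution pixels, which is the property the paragraph above relies on.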