A Low-Power and High-Performance Artificial Intelligence Inference Approach for Embedded Data Processing
Mandar HARSHE, Senior Developer
Klepsydra Technologies GmbH
Machine learning techniques, when used in the automotive, space, and IoT fields, require a large number of high-quality sensors and correspondingly high computational power. New sensors available on the market produce this high-quality data at the desired high rates, while new processors substantially increase the available computational power. Current mobile phones are also equipped with cameras of higher resolution than those traditionally used in the automotive or space sectors. This combination of increased computational power and better high-quality sensors makes advanced embedded artificial intelligence algorithms feasible.
However, the use of advanced AI algorithms with increased sensor data and processor power brings new challenges: low determinism, excessive power consumption, large amounts of potentially redundant data, parallel data processing, and cumbersome software development. Current approaches to address these challenges and to increase the throughput of data processed by AI algorithms rely on FPGAs or GPUs. However, these solutions introduce other technical problems, including increased power consumption and programming complexity and, in the case of space applications, radiation-hardening limitations.
We present a novel approach to AI inference that produces deterministic and optimized code for the scenarios presented above and runs on a CPU. The approach uses advanced lock-free programming techniques coupled with high-performance event loops to optimize the data flow through neural networks during inference. Lock-free techniques reduce the context switching caused by traditional mutex-based approaches, lowering CPU usage, and they sustain higher throughput when combined with fast event loops. The approach presented here uses these ideas to stream data between the layers of deep neural networks, increasing data-processing throughput. It enables state-of-the-art AI algorithms to process data onboard, allowing the target application to operate with limited connectivity to the cloud and reducing the costs of storing and transferring redundant or low-value data.
The approach is also highly configurable for different target scenarios and can be tuned easily to address different constraints such as throughput, latency, CPU consumption, or memory footprint. The solution presented is used in aerospace and robotics applications to produce and process data at rates substantially higher than other available products. The experimental setup tests the AlexNet and MobileNet V1 architectures on an Intel machine and on a Raspberry Pi running Ubuntu. The results show that our implementation is not only high-performance and scalable but also has substantially low power consumption, which makes it suitable for a variety of applications and allows targeting different constraints as the application requires.