Integrating Embedded Machine Learning into Real-World Applications

AI models have produced impressive results in testing; however, deploying them in real-world applications requires integrating neural networks with pre- and post-processing stages. That, in turn, calls for versatile hardware and firmware platforms.

Key Points

  • Choosing adaptable hardware for DNN architectures in applications such as ATM camera systems.
  • How “embeddings” can aid reidentification in image-recognition systems.
  • How Arcturus Networks used NXP’s i.MX8 architecture to create a configurable vision pipeline.

Machine learning based on deep-neural-network (DNN) architectures has shown excellent results in a variety of studies, notably in tasks such as object and person detection in images. Many of these experiments, as well as the first real-world deployments, were carried out on high-performance cloud-server hardware capable of delivering the necessary compute throughput.

Processing on cloud-server hardware is impractical for a wide range of applications. Communications latency and limited bandwidth necessitate local intelligence for processing.

Consider a camera system used to monitor behaviour at an automated teller machine (ATM) under conditions such as social distancing during the COVID-19 pandemic. Banks have found it critical to use automated monitoring at each ATM to ensure that individuals in the queue or in the surrounding area do not stand too close together. Furthermore, they may wish to restrict access to a lobby-based ATM, or to the machines themselves, to customers wearing masks.

One method of processing the data is to send live footage from security cameras to the cloud. In all but dense metropolitan areas, however, this can be prohibitively expensive and difficult to implement. Furthermore, communications latency hampers the system’s ability to respond to arrivals and crowd movements.

Processing video data locally can deliver much-improved responsiveness, provided the hardware can meet the computing needs of a suitable image-recognition pipeline.

Model Options

Integrators seeking edge-computing processors designed to tackle the performance demands of DNNs have many options. For throughput, some use customized graphics-processing-unit (GPU) designs. However, specialized neural-processing-unit (NPU) architectures built for DNN processing can provide better performance-per-energy ratios.

The most essential factor is to select hardware that provides genuine flexibility rather than merely strong results on off-the-shelf benchmarks such as ImageNet or standard models such as MobileNet. Preprocessing steps are frequently required to transform image data into a format suitable for the application, and the DNNs involved must be fine-tuned to accommodate the application’s unique needs.
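As an illustration of the kind of preprocessing involved, the sketch below resizes and normalizes a camera frame to the input format a typical detection model might expect. The 300x300 input size, the [-1, 1] scaling, and the function name are assumptions chosen for illustration, not details of any particular vendor pipeline.

```python
import numpy as np
import cv2  # OpenCV for image handling

def preprocess_frame(frame_bgr, input_size=(300, 300)):
    """Resize and normalize a BGR camera frame for a typical detection model.
    The input size and scaling range are assumptions for illustration."""
    # Convert OpenCV's BGR ordering to the RGB ordering most models expect.
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    # Resize to the model's fixed input resolution.
    resized = cv2.resize(rgb, input_size, interpolation=cv2.INTER_LINEAR)
    # Scale pixel values to [-1, 1]; many MobileNet-style models use this range.
    normalized = (resized.astype(np.float32) / 127.5) - 1.0
    # Add a batch dimension: (1, H, W, 3).
    return np.expand_dims(normalized, axis=0)
```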

The necessity to deal with masks adds to the complexity. It is not as simple as teaching the network that a subset of people wear masks. Personal protective equipment (PPE) can also take the form of helmets or other face coverings. The network must be able to account for several different forms of face or head covering, each with its own detection class.

Other criteria may be imposed on the system, such as the ability to identify suspicious activity (e.g., loitering), even when the subject is not constantly visible; people may move in and out of the camera’s field of view at various times. This requires the capacity to track individuals over time rather than just recognizing mask wearers, as well as to check that the people in view are adequately separated, as in the distance-check sketch below.
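A minimal sketch of the separation check compares pairwise distances between the centroids of detected people. The pixel-to-meter scale and the 2-meter threshold here are placeholder assumptions; a real deployment would derive the scale from camera calibration or a ground-plane homography.

```python
from itertools import combinations

def too_close_pairs(boxes, pixels_per_meter=100.0, min_distance_m=2.0):
    """Return index pairs of detections whose centroids are closer than the
    minimum separation. Boxes are (x_min, y_min, x_max, y_max) in pixels;
    the scale factor is a stand-in for proper camera calibration."""
    centroids = [((x0 + x1) / 2.0, (y0 + y1) / 2.0) for x0, y0, x1, y1 in boxes]
    violations = []
    for (i, (xa, ya)), (j, (xb, yb)) in combinations(enumerate(centroids), 2):
        distance_m = ((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5 / pixels_per_meter
        if distance_m < min_distance_m:
            violations.append((i, j))
    return violations
```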

Each of these demands necessitates changes to the model’s operation as well as to the preprocessing steps. Because partial occlusion and body posture produce variability in detection results, determining whether a subject is wearing PPE from live video is more challenging than in controlled studies.

To improve accuracy, the judgment must be made using the results of many frames (Fig. 2). This, in turn, necessitates motion tracking of each individual within the field of view; a minimal multi-frame voting sketch follows below.
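One common way to combine the results of many frames is a sliding-window majority vote per tracked person. The sketch below is a minimal illustration of that idea; the window length and class labels are arbitrary choices, not values from the source.

```python
from collections import defaultdict, deque, Counter

class MaskVoteTracker:
    """Aggregate per-frame mask/no-mask classifications for each tracked
    person and report a majority-vote decision over a sliding window."""

    def __init__(self, window=15):
        # One bounded history per track ID; old frames fall off automatically.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def update(self, track_id, frame_label):
        """Record one frame's classification (e.g., 'mask' or 'no_mask')."""
        self.history[track_id].append(frame_label)

    def decision(self, track_id):
        """Return the most frequent label seen in the recent window."""
        votes = self.history[track_id]
        if not votes:
            return "unknown"
        return Counter(votes).most_common(1)[0][0]
```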

In principle, motion tracking alone is attractive because it requires fewer computing resources. It does, however, rely on continuous detections. With a simple image-recognition system, occlusions, obstructions, or a person leaving and reentering the field of view would cause that person to be treated as a new subject rather than being reidentified.

Using embeddings, which are representations of the objects in the field of view with additional data encoded into them, is a successful technique for handling reidentification. Embeddings are used frequently in language processing, where they encode words and phrases as vectors so that items with similar meanings cluster together in vector space.

In the case of the ATM-monitoring application, an embedding represents not only the visual appearance of the object, but also where the object was last seen in a frame and which class it was assigned. The pixels inside the localization bounding box are sampled to produce a feature vector that can be used for later comparisons.
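The sketch below illustrates the general idea: an appearance feature vector combined with the last known location, matched against known tracks by cosine similarity plus a simple proximity term. The feature extractor, weighting, and data layout are hypothetical placeholders, not the Arcturus implementation; a production system would use a dedicated re-identification network for the appearance descriptor.

```python
import numpy as np
import cv2

def crop_features(frame, box, size=(64, 128)):
    """Hypothetical appearance descriptor: downsample the bounding-box crop
    and flatten it into a unit-length vector."""
    x0, y0, x1, y1 = box
    crop = cv2.resize(frame[y0:y1, x0:x1], size)
    vec = crop.astype(np.float32).ravel()
    return vec / (np.linalg.norm(vec) + 1e-8)

def match_embedding(query_vec, query_box, tracks, appearance_weight=0.8):
    """Score the query against stored tracks using cosine similarity of
    appearance plus a proximity term on the last-seen centroid."""
    qx = (query_box[0] + query_box[2]) / 2.0
    qy = (query_box[1] + query_box[3]) / 2.0
    best_id, best_score = None, -1.0
    for track_id, track in tracks.items():
        appearance = float(np.dot(query_vec, track["vec"]))  # cosine (unit vectors)
        tx, ty = track["last_centroid"]
        proximity = 1.0 / (1.0 + np.hypot(qx - tx, qy - ty) / 100.0)
        score = appearance_weight * appearance + (1 - appearance_weight) * proximity
        if score > best_score:
            best_id, best_score = track_id, score
    return best_id, best_score
```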

Embeddings have the advantage of being shareable across multiple-camera systems, which can improve accuracy and scalability over larger areas. They can also be used for archive searches, such as building active watch lists for offline analysis.
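For archive searches such as watch lists, stored embeddings can be queried with a simple nearest-neighbor lookup. The sketch below assumes unit-normalized vectors and an arbitrary similarity threshold; both are illustrative choices.

```python
import numpy as np

def search_archive(query_vec, archive_vecs, archive_ids, top_k=5, threshold=0.7):
    """Return IDs of archived embeddings most similar to the query.
    archive_vecs is an (N, D) array of unit-normalized vectors."""
    similarities = archive_vecs @ query_vec            # cosine similarity per row
    order = np.argsort(similarities)[::-1][:top_k]     # best matches first
    return [(archive_ids[i], float(similarities[i]))
            for i in order if similarities[i] >= threshold]
```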

Flexible Architecture

A flexible architecture that can manage the many aspects of a real-world machine-learning deployment is required. Arcturus’ work takes advantage of the NXP i.MX 8M Plus’s versatile mix of processing elements to build an easily configurable vision pipeline. In the Arcturus approach, different phases of processing are represented by nodes.
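A node-based structure can be pictured as a chain of processing stages, each consuming the previous stage’s output. The sketch below is a generic illustration of that pattern with hypothetical node names; it is not the Arcturus microservices API.

```python
class Node:
    """One stage in a vision pipeline; subclasses implement process()."""
    def process(self, data):
        raise NotImplementedError

class CaptureNode(Node):
    def process(self, data):
        # In a real pipeline this would pull a frame from the camera.
        return {"frame": data}

class DetectNode(Node):
    def process(self, data):
        # Placeholder for DNN inference producing bounding boxes.
        data["detections"] = []
        return data

class TrackNode(Node):
    def process(self, data):
        # Placeholder for embedding-based tracking and reidentification.
        data["tracks"] = {}
        return data

def run_pipeline(nodes, frame):
    """Push one frame through the chain of nodes in order."""
    data = frame
    for node in nodes:
        data = node.process(data)
    return data

pipeline = [CaptureNode(), DetectNode(), TrackNode()]
```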

Arcturus offers a library of prebuilt models with varying precisions. These models have been pre-validated to work with all major edge runtimes, including Arm NN, TensorFlow Lite, and TensorRT, with CPU, GPU, and NPU support. Software is also available for training and fine-tuning models, as well as for dataset curation, image scraping, and augmentation.

Compared to other publicly accessible systems running the same model, the combination of an optimized runtime, a quantized model, and NPU hardware can deliver a 40X performance gain. In a library like this, comprehensiveness is essential: edge-runtime versions frequently do not support all of the layers required by every type of network.

Newer models that may perform better are less widely supported than the older types used in frequently cited benchmarks.
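The quantized models mentioned above are typically produced with post-training quantization. The sketch below shows how that could be done with the standard TensorFlow Lite converter; the saved-model path and the representative dataset generator are placeholders to be supplied by the user.

```python
import tensorflow as tf

def quantize_saved_model(saved_model_dir, representative_data):
    """Convert a float SavedModel to a fully integer-quantized TFLite model,
    which is generally what integer NPUs expect."""
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # representative_data yields sample inputs used to calibrate activation ranges.
    converter.representative_dataset = representative_data
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    return converter.convert()
```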

The last component is a runtime inference engine capable of loading the DNN model onto the i.MX 8M Plus. Ported and verified versions of the Arm NN and TensorFlow Lite inference engines are available in NXP’s eIQ machine-learning software development environment.
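As an illustration of loading a model through TensorFlow Lite on the board, the sketch below uses the tflite_runtime interpreter with an external delegate for NPU acceleration. The delegate library path and model filename shown are assumptions; the actual delegate name and location depend on the eIQ/BSP release in use.

```python
import tflite_runtime.interpreter as tflite

# Path to the NPU delegate library is board/BSP dependent (assumption).
DELEGATE_PATH = "/usr/lib/libvx_delegate.so"

interpreter = tflite.Interpreter(
    model_path="detector_int8.tflite",   # placeholder model file
    experimental_delegates=[tflite.load_delegate(DELEGATE_PATH)],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def infer(frame_uint8):
    """Run one frame (already resized/quantized to the model's input shape)."""
    interpreter.set_tensor(input_details[0]["index"], frame_uint8)
    interpreter.invoke()
    return [interpreter.get_tensor(d["index"]) for d in output_details]
```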

Summary

Flexibility and performance scalability are critical in real-world machine-learning applications. Each application is unique and will affect not only the choice of DNN, but also the processing that takes place around it. A structure that supports this demand for flexibility is essential.

That is one of the main reasons why the combination of an Arcturus microservices environment and processing hardware such as the NXP i.MX 8M Plus may be a powerful tool in the migration of machine learning to the edge.
