Edge Analytics and Machine Learning: The Truth and the Fallacies

Author – Tim Smith

Integrating data from machines, OT systems, and enterprise systems on the factory floor is a big topic within Industry 4.0. There is a school of thought that intelligent integrations with logic, including machine learning, can be done on the shop floor.

The only legitimate reason that edge analytics is even a consideration in data collection is as a response to latency, connectivity and bandwidth costs inherent in the collection and analysis of big data. Rachael Taylor states in her article “What’s holding back edge analytics” ( https://thenewstack.io/whats-holding-back-edge-analytics ), “By 2025, it’s predicted there will be 41.6 billion Internet of Things (IoT) devices generating 79.4 zettabytes of data. While this explosion of connected devices will benefit businesses by providing access to more data to derive better insights, it is also going to put immense pressure on enterprise architecture. Moving massive amounts of data from devices and sensors to a data center or the cloud introduces issues with latency, bandwidth, connectivity, and cost. Businesses are turning to edge computing as a strategy to handle this influx of data and applying analytics at the edge to gain actionable insights in real-time. Rather than try to bring data to the data center or the cloud faster, an efficient approach is to bring processing and analytics to the devices that create the data.”

There is truth in this: adding a level of compute outside the enterprise or cloud engine will, where necessary, offload work from the engine to a point closer to the edge. Beyond that statement, however, there are a number of fallacies attached to edge analytics.

The “Edge Camp” believes that everything should be done at the edge, including analytics and machine learning. They believe that machine learning models should be rewritten into small containers and applied at the edge to predict events such as catastrophic failures. They have taken a device that was deployed to manage the collection and preprocessing of data, so as to minimize the data load on the enterprise, and shoe-horned in the notion that the device should also provide analytics and predictive model responses. Furthermore, the Edge Camp expects its devices to be deployed on all machine assets, even where native protocols already exist to output data. With machines that output MTConnect, FOCAS, or OPC UA, an edge device trades a data-rich protocol for whatever the device can accept (see the sketch below). IoT deployments around utility metering and other low-data applications do not come close to the demands placed on IIoT devices. The paper “Dependability in Edge Computing” by Purdue University ( https://engineering.purdue.edu/dcsl/publications/papers/2019/dependability_in_edge_computing.pdf ) cites a number of critical considerations related to IIoT edge devices.
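
To make the protocol point concrete, here is a minimal sketch, in Python, of reading a machine's native MTConnect stream directly, with no vendor edge device in the path. The agent address is hypothetical; the /current endpoint is part of the MTConnect standard.

```python
# Minimal sketch: polling a machine's native MTConnect agent directly.
# The agent host/port is hypothetical.
import requests
import xml.etree.ElementTree as ET

AGENT_URL = "http://mtconnect-agent.local:5000"  # hypothetical agent address

def read_current_observations():
    """Fetch the latest observations from the agent's standard /current endpoint."""
    response = requests.get(f"{AGENT_URL}/current", timeout=5)
    response.raise_for_status()
    root = ET.fromstring(response.text)
    # MTConnect documents are namespaced XML; match items in any namespace.
    for item in root.iter():
        if item.tag.endswith("}Execution") or item.tag.endswith("}Availability"):
            print(item.tag.split("}")[-1], "=", item.text)

read_current_observations()
```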

This author contends that deploying a fog computing infrastructure, where IIoT gateways are placed near the edge to process the data, will address the limitations of attempting to move all the raw data to a central processing engine. I am an advocate of preprocessing data to support validation, normalization, and data output rules. But the stored dataset must remain intact, so that any metric associated with the events can be recreated.
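
As an illustration of that gateway role, here is a minimal sketch, with hypothetical field names and validation limits, in which readings are validated and normalized near the edge while the raw record is preserved intact.

```python
# Minimal sketch of the fog/gateway approach described above: validate and
# normalize readings near the edge, but keep the intact raw record alongside
# the derived values. Field names and limits are hypothetical.
RAW_ARCHIVE = []       # stands in for durable storage at the enterprise/cloud
PROCESSED_STREAM = []  # stands in for the normalized feed consumed by analytics

def preprocess(raw_reading: dict) -> None:
    # Validation rule: reject physically implausible spindle temperatures.
    temp_c = raw_reading.get("spindle_temp_c")
    valid = temp_c is not None and -40.0 <= temp_c <= 200.0

    # Normalization: convert to a canonical unit and shape.
    normalized = {
        "asset_id": raw_reading.get("asset_id"),
        "spindle_temp_k": temp_c + 273.15 if valid else None,
        "valid": valid,
    }

    # Key point: the raw record is kept intact, so any metric can be recreated.
    RAW_ARCHIVE.append(raw_reading)
    PROCESSED_STREAM.append(normalized)

preprocess({"asset_id": "mill-07", "spindle_temp_c": 61.4})
```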

Fallacy #1: Edge computing eliminates the scalability issue. Wrong; it is only as powerful as the device. Not all hardware supports analytical processes. Simply put, not every IIoT device has the memory, CPU, and storage required to perform deep analytics onboard. Scalable redundancy can be built into the cloud or enterprise solution; the same cannot be said for the devices on the shop floor.
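
A minimal sketch of what that constraint means in practice, using the psutil library and hypothetical thresholds: whether analytics can run locally is gated on whatever headroom the device actually has.

```python
# Minimal sketch of why device capacity caps edge analytics: check available
# memory and CPU headroom before running a heavy job locally, otherwise defer
# it to the enterprise engine. Thresholds are hypothetical; requires psutil.
import psutil

MIN_FREE_MEM_MB = 512   # hypothetical floor for an onboard analytics job
MAX_CPU_PERCENT = 70.0  # hypothetical ceiling for sustained CPU load

def can_run_locally() -> bool:
    free_mb = psutil.virtual_memory().available / (1024 * 1024)
    cpu = psutil.cpu_percent(interval=1.0)
    return free_mb >= MIN_FREE_MEM_MB and cpu <= MAX_CPU_PERCENT

target = "edge device" if can_run_locally() else "enterprise engine"
print(f"Running the analytics job on the {target}")
```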

Fallacy #2: The edge can do all of the analytics. Wrong; there are five areas of critical data, and the automated data from the device is only one of them. Treating it as the only required dataset is myopic. Critical data comes from the machine, the job, the shop, maintenance, and the operator. Without data from each of these areas, any analytics will be incomplete. The convergence of data sources is key to proper analytics: data must flow from the machine, the operator, and the backend business systems. An analytical process built solely on the device's data is not valid, given what is missing.
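
A minimal sketch of that convergence, with entirely hypothetical records and keys: the machine reading only becomes analytically useful once it is joined with job, shop, and maintenance context.

```python
# Minimal sketch of converging the data areas named above: machine telemetry
# alone is joined with job, shop, and maintenance context before analysis.
# All records and keys are hypothetical.
machine = {"asset_id": "mill-07", "spindle_load_pct": 92}
job = {"asset_id": "mill-07", "part_number": "P-1138", "tolerance_mm": 0.01}
shop = {"asset_id": "mill-07", "shift": "2nd", "operator_badge": "4421"}
maintenance = {"asset_id": "mill-07", "hours_since_pm": 310}

def converge(*records: dict) -> dict:
    """Merge records that share an asset_id into one analytical row."""
    merged: dict = {}
    for record in records:
        merged.update(record)
    return merged

row = converge(machine, job, shop, maintenance)
# The merged row can answer questions no single source can, e.g. whether high
# spindle load correlates with overdue preventive maintenance.
print(row)
```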

Fallacy #3: The edge is sufficient for machine learning. That premise is wrong; the edge does not perform machine learning, it only applies a model. The machine learning is still done at the enterprise level, where there is sufficient compute power and access to the aggregated data to build and test models.
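
A minimal sketch of that division of labor, using scikit-learn and hypothetical training data: the model is built where the aggregated data lives, and the device only scores new readings against the shipped artifact.

```python
# Minimal sketch of the division of labor described above: the model is built
# and tested centrally, then the edge only applies it. Requires scikit-learn;
# the training data here is hypothetical.
from sklearn.linear_model import LogisticRegression
import pickle

# --- Enterprise side: learning happens where the aggregated data lives ---
X = [[55, 120], [60, 180], [88, 300], [95, 410]]  # temp_c, vibration_hz
y = [0, 0, 1, 1]                                   # 0 = healthy, 1 = failing
model = LogisticRegression().fit(X, y)
artifact = pickle.dumps(model)  # serialized model shipped down to the device

# --- Edge side: no learning, just applying the shipped model ---
edge_model = pickle.loads(artifact)
print("failure risk:", edge_model.predict([[90, 350]])[0])
```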

Fallacy #4: The edge is more secure. Claiming security because the data is processed at the edge and does not need to be sent across the wire is a fallacy. With respect to gateway and engine connectivity, encrypted tunnels are the norm. The Edge Camp is trying to position as a feature something that neither mitigates an issue nor adds value.
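
A minimal sketch of why in-transit protection is already routine, with a hypothetical engine URL: an ordinary HTTPS post from the gateway gives an encrypted, certificate-verified channel with nothing edge-specific about it.

```python
# Minimal sketch of the point above: moving data off the gateway over an
# encrypted channel is routine, so "the data never leaves the device" is not
# a security feature in itself. The engine URL is hypothetical.
import requests

reading = {"asset_id": "mill-07", "spindle_temp_c": 61.4}

# HTTPS provides the encrypted tunnel: the TLS handshake verifies the
# engine's certificate before any data moves across the wire.
response = requests.post(
    "https://engine.plant.example.com/ingest",
    json=reading,
    timeout=5,
)
response.raise_for_status()
```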

Fallacy #5: Edge computing reduces cost. Wrong; the cost reduction only applies to the ingress and egress costs of cloud computing, and that claim only stands if you stop sending data to the cloud and somehow manage and store it locally. Since storage usually lives at the enterprise or cloud engine anyway, the claim doesn't hold water.

The concept of leveraging critical analytics on edge devices is purely a commercial play. If a vendor can get a customer to depend on its devices, it is assured of being part of the solution for the foreseeable future. For digital manufacturing, some type of connectivity is required, but it makes sense to seek a more agnostic approach to protect oneself against device obsolescence, price gouging, and single-source shortages.

The Edge Camp claims near-real-time analysis, but no one has challenged the need for real-time response. If the Edge Camp means millisecond response to avert catastrophic failure, that is a different application: the former is a model for prescriptive response, and the latter is a predictive response easily solved at the PLC level. Additionally, these devices rarely report back to the enterprise how often the modelled decision has been executed, losing the ability to refine the model.
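
A minimal sketch of the feedback loop those devices omit, with hypothetical names and an in-memory stand-in for the upstream feed: every model evaluation is reported, not just the local action it triggers.

```python
# Minimal sketch of the missing feedback loop noted above: every time the
# on-device model is evaluated, the decision is also reported upstream so the
# enterprise can refine the model. Names and the transport are hypothetical.
import json
from datetime import datetime, timezone

DECISION_LOG = []  # stands in for the upstream feed to the enterprise engine

def trigger_local_response():
    print("local response triggered")  # e.g. alarm or controlled stop

def apply_model_and_report(features: dict, score: float, threshold: float = 0.8):
    fired = score >= threshold
    if fired:
        trigger_local_response()
    # Report every evaluation, fired or not, so the model can be retrained
    # against what it actually saw and did in the field.
    DECISION_LOG.append(json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "features": features,
        "score": score,
        "fired": fired,
    }))

apply_model_and_report({"vibration_hz": 410}, score=0.91)
```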

Further, with more complicated edge devices the potential for failure is higher. In the past, sensors with a reputation for longevity were employed for data collection, pushing raw data to local gateways where store-and-forward capability is standard (sketched below). When a more complex device fails at the machine edge, data is permanently lost. The Edge Camp is myopic in its understanding of what data is important, since it is focused only on machine-specific analytics. It does not consider that the machine data is critical to drive system responses in relation to the other four data areas and two data sources detailed earlier.
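
A minimal sketch of store-and-forward at a gateway, with a hypothetical spool path: readings are persisted locally before any transmission attempt, so a dead uplink delays data instead of destroying it.

```python
# Minimal sketch of the store-and-forward behavior described above: readings
# are appended to a durable local log first, then drained when the uplink is
# healthy. The spool path is hypothetical.
import json
import os

SPOOL_PATH = "/var/spool/gateway/readings.jsonl"  # hypothetical spool file

def store(reading: dict) -> None:
    """Persist the reading locally before any attempt to transmit it."""
    os.makedirs(os.path.dirname(SPOOL_PATH), exist_ok=True)
    with open(SPOOL_PATH, "a") as spool:
        spool.write(json.dumps(reading) + "\n")

def forward(send) -> None:
    """Drain the spool through `send`; stop (and retry later) on any failure."""
    if not os.path.exists(SPOOL_PATH):
        return
    with open(SPOOL_PATH) as spool:
        pending = [json.loads(line) for line in spool]
    sent = 0
    for reading in pending:
        try:
            send(reading)
            sent += 1
        except OSError:
            break  # uplink is down; keep the unsent tail for the next pass
    with open(SPOOL_PATH, "w") as spool:
        for reading in pending[sent:]:
            spool.write(json.dumps(reading) + "\n")

store({"asset_id": "mill-07", "spindle_temp_c": 61.4})
forward(lambda r: print("sent", r))
```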

Finally, IIoT devices do not aggregate their data; in fact, the original raw data is replaced by the processed output, which makes it unavailable for other related analytical processes.

Edge computing has its place in the emerging Industry 4.0 data-driven manufacturing approach. However, the prime reason for an edge device is to capture data not otherwise available from the machine asset. The device must be designed for industrial use with optical isolation. It must be easily swapped out in case of failure. It must have a track record with an MTBF of thousands of hours. It should have no moving parts (fans, spinning rust) that can fail. It must speak a standard protocol. And it must be configurable to point at, or be available to, any target data collector, such as a gateway.

Just as not everyone is a data scientist, no one can expect a limited-compute device to provide the power for high-level analytical processing. Anything beyond data collection, and potentially running a model for machine health, is an unreasonable expectation. So before you invest in deploying high-priced edge devices, give thought to the enterprise system they should be talking to. The human interface and its wealth of tools is far more important than the machine interface. That is the truth.