A number of data scientists reached out to me about data storage and processing after my last blog post on IoT. Their questions largely fell into the same bucket: they are puzzled about what to do with their data. Should they store or discard their enterprise data, and if they store it, what is the best approach to making that data a strategic asset for their company?
Despite the widespread proliferation of sensors, the majority of industrial Internet of Things ('IIoT') data collected is never analysed, which is tragic. Many existing IoT platform solutions are painfully slow, expensive and a drain on resources, which makes analysing the rest extremely difficult.
Gartner has said that 90% of deployed data will be useless, and Experian has estimated that about 32% of the data held by US firms is inaccurate. The key takeaway, even so, is that data is one of the most valuable assets any company owns, so it would be a shame to discard it completely or let it lie dormant in an abandoned data lake somewhere. It is imperative that data scientists tap into their swelling pools of IoT data, make sense of the information arriving from the various endpoints, and draw conclusions that ultimately deliver business outcomes. I am firmly against discarding data without processing it.
As mentioned in my IoT blog post, within a few years there will be an additional 15 to 40 billion devices generating data at the edge compared with what we have today. That brings new challenges. Just imagine an infrastructure transferring all of this data to data lakes and processing hubs. The load will continue to rise exponentially over the coming months and years, creating yet another problem: stretching the limits of your infrastructure.
The value of this data comes only from analysis, whether it is traffic data from "things" or footage from surveillance cameras. In time-critical situations, delayed analysis may arrive "too late". The delay can have many causes, such as limited network availability or overloaded central systems.
A relatively new approach, "edge analytics", is being used to address these issues. The idea is simple: perform the analysis at the point where the data is generated, in real time and on site. The architectural design of "things" should include built-in analysis. For example, sensors on trains or at stop lights that provide intelligent monitoring and management of traffic should be powerful enough to raise an alarm to nearby fire or police departments based on their own analysis of the local surroundings.

Security cameras are another good example. Transmitting the live video unchanged is largely wasteful. There are algorithms that can detect a change between frames and, where the new image can be reconstructed from the previous one, transmit only the differences. Events like these make far more sense to process locally than to send over the network for analysis. It is important to understand where edge analytics makes sense and, where "devices" do not support local processing, how to architect a connected network that makes sense of sensor and device data at the nearest possible location.

Companies such as Cisco and Intel are proponents of edge computing and promote their gateways as edge computing devices. IBM Watson IoT, a joint IBM and Cisco project, is reshaping analytics architecture by offering powerful analytics anywhere. Dell, traditionally a server hardware vendor, has developed dedicated devices (the Dell Edge Gateway) to support analytics at the edge, and has built a complete hardware and software system that allows an analytics model to be created in one location, or in the cloud, and then deployed to other parts of the ecosystem.
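To make the security-camera idea concrete, here is a minimal sketch of edge-side change detection. It is illustrative only: frames are modelled as flat lists of grayscale pixel values rather than real camera output, and the threshold is an assumption, not taken from any particular product. A frame is transmitted only when it differs meaningfully from the last frame sent.

```python
# Minimal sketch of edge-side change detection for a camera feed.
# Frames are flat lists of grayscale pixel values (0-255); in a real
# deployment these would come from the camera driver.

CHANGE_THRESHOLD = 10.0  # mean absolute pixel difference that counts as "change"

def frame_changed(prev, curr, threshold=CHANGE_THRESHOLD):
    """Return True if the mean absolute difference between frames exceeds the threshold."""
    diff = sum(abs(a - b) for a, b in zip(prev, curr)) / len(curr)
    return diff > threshold

def filter_frames(frames):
    """Yield only frames that differ meaningfully from the last transmitted frame."""
    last_sent = None
    for frame in frames:
        if last_sent is None or frame_changed(last_sent, frame):
            last_sent = frame
            yield frame

# Example: a static scene with one changed frame in the middle.
static = [100] * 16
moved = [100] * 8 + [200] * 8
sent = list(filter_frames([static, static, moved, static]))
# Only 3 of the 4 frames are transmitted: the first, the change, and the return to static.
```

A production system would of course use a proper vision library and send compressed deltas rather than whole frames, but the principle is the same: the decision about what is worth transmitting is made on the device itself.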
However, edge analytics involves compromises that must be considered. Only a subset of the data is processed and analysed, and only the result of that analysis is transmitted over the network. In effect we are discarding some of the raw data, and with it potentially some insights. The question is whether this loss is bearable. Do we need the whole data set, or is the result of the local analysis enough? What impact will the loss have? There is no generalised answer. An aircraft system cannot afford to miss any data, so all of it should eventually be analysed to detect any pattern that could indicate an abnormality; yet transferring data during flight is impractical, so collecting the raw data for offline analysis while running edge analytics in flight is the better approach. Domains with more fault tolerance, by contrast, can accept that not everything will be analysed. This is an area where organisations will have to learn by experience as they enter this new field of IoT analytics and review the results.
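The trade-off above can be sketched in a few lines: the edge device keeps raw readings locally, transmits only a periodic summary, and still sends an immediate alert when a reading crosses a critical threshold. The window size and threshold here are illustrative assumptions, not values from any specific product.

```python
# Sketch of the edge trade-off: raw readings stay on the device, the network
# carries only summaries plus any time-critical alerts. Thresholds are
# hypothetical examples.

from statistics import mean

WINDOW = 5          # readings per transmitted summary
CRITICAL = 90.0     # e.g. a temperature that must be reported immediately

def summarise(window):
    """Reduce a window of raw readings to the few numbers worth transmitting."""
    return {"min": min(window), "max": max(window), "mean": round(mean(window), 2)}

def process_stream(readings):
    """Return (summaries_sent, alerts_sent); raw readings never leave the device."""
    summaries, alerts, window = [], [], []
    for r in readings:
        if r >= CRITICAL:
            alerts.append(r)            # time-critical: send at once
        window.append(r)
        if len(window) == WINDOW:
            summaries.append(summarise(window))
            window = []
    return summaries, alerts

# Ten raw readings become two small summaries and one alert on the wire.
summaries, alerts = process_stream([70, 71, 69, 95, 72, 68, 70, 71, 69, 70])
```

The "loss" the text describes is visible here: the summaries preserve range and average but not the exact sequence of readings, which is exactly the kind of detail a fault-intolerant system like an aircraft would still need to retain locally for later offline analysis.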
Again, data is valuable. It should be analysed to detect patterns and support market analysis, and data-driven companies are making far more progress than traditional ones. IoT edge analytics is an exciting space and, as the investment by many big companies shows, a real answer to the maintenance and usability of data. An IDC FutureScape report for IoT predicted that by 2018, 40 percent of IoT data will be stored, processed, analysed and acted upon where it is created, before it is ever transferred over the network. Transmitting data costs money, and we need to cut that cost without compromising the quality or timeliness of decisions. Edge analytics is definitely an answer to that.
-  “The Data of Things: How Edge Analytics and IoT go Hand in Hand,” September 2015.
-  Forbes article by Bernard Marr, “Will Analytics on the Edge be the Future of Big Data?”, Aug 2016.