Building a Data Sharehouse – Agile Data Management and Industrial Data Space (IDS)
It's no secret that manufacturing companies face enormous challenges when it comes to managing their data. After all, it takes skill and time to integrate an array of IT systems (such as CAD, CAPP, MES, PPS and ERP) into a unified whole.
The IoT, cyber-physical systems, and Industry 4.0 make the situation even more complex. The amount of data that's available to manufacturers is already vast, and the volume continues to increase. Consider that a single aircraft engine on a flight from Munich to Frankfurt/Main produces more than 1 TB of data, or that a standard vibration sensor generates approximately 1.3 TB of raw data in one year.
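To put the sensor figure in perspective, the short calculation below converts 1.3 TB per year into an average data rate. It is a rough sketch that assumes decimal terabytes (10^12 bytes), a 365-day year, and a sensor that samples continuously; none of these assumptions come from a sensor specification.

```python
# Rough conversion of a yearly data volume into an average data rate.
# Assumptions (not from the article): 1 TB = 10**12 bytes, a 365-day year,
# and a sensor that produces data continuously.
BYTES_PER_TB = 10**12
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def average_rate_bytes_per_second(terabytes_per_year: float) -> float:
    """Return the average number of bytes produced per second."""
    return terabytes_per_year * BYTES_PER_TB / SECONDS_PER_YEAR


rate = average_rate_bytes_per_second(1.3)
print(f"{rate / 1000:.1f} kB/s")
```

Under these assumptions a single sensor averages roughly 41 kB per second, a modest rate per device that adds up quickly once many sensors are installed.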
In order to stay competitive, manufacturers must be able to navigate the challenges that result from a constant stream of incoming data. Three strategies in particular are critical for success:
- Manufacturers must specify their data requirements with respect to timeliness, accuracy and relevance.
- Companies must identify and deploy tools that make it easier (and more cost-effective) to capture, store, and prepare their data.
- Manufacturers need to reduce the time to value for all of their data.
Data and Industry 4.0
Data enables manufacturers to unlock their full potential and increase their ROI. That's because data can be turned into information, and that information can be turned into action. Widely recognized technology innovators such as Google, Facebook and Amazon provide some of the most well-known examples of how data can be used to develop new products, connect with customers, and hone a competitive edge.
The same is true for manufacturers and the B2B market. Companies striving for vertical and horizontal integration of their processes depend on seamless data exchange. This is why they have to address data management from a strategic perspective, in order to open the gates for exploratory analysis, self-learning processes and intelligent decision-making.
Industry 4.0 complicates matters. It makes an enormous difference whether the sensor data from an elevator or a welding machine is owned by the buyer or by the supplier of the machine. The value of the data for the supplier is obvious: they can compare it with similar data from installations worldwide. This information can increase their competitiveness, help reduce costs through predictive maintenance, or provide general information for further optimization of the equipment.
Suppliers of mechanical parts gain a similar advantage. The machine owner also benefits, through better service or, possibly, the next generation of the machine. But in this scenario, it is the owner who has made the investment, who might upgrade the machine with sensors, and who bears the burden of integrating sensor data into their MES, PPS, ERP or logistics systems.
This leads us to some fundamental questions:
- How can companies leverage the value of their data in a technically volatile and interconnected world?
- What is necessary to guarantee interoperability, claim ownership, and ensure a certain level of data quality?
- How can we organize a collaborative vocabulary?
- How can we subscribe to a data source and report on the use of data?
Industrial Data Space Initiative
These questions are being addressed by the Industrial Data Space initiative (IDS). This initiative was launched in Germany at the end of 2014 by representatives from business, politics, and research. The overall goal was to provide a reference architecture for the safe, secure and transparent exchange of data between the producers and (possible) consumers of industrial data. The research results are currently being transferred to an association of the same name, which now has more than 100 members, many of them from the top league of German industry.
The main focuses of IDS are:
- Data sovereignty — the data owner must be able to specify the terms and conditions of use for their data
- Easy linkage of data — a linked-data concept and common vocabularies will facilitate the integration of data between participants
- Trust — all participants, data sources, and data services of the IDS will be certified according to defined rules
- Secure data supply chain — data exchange will be secure across the entire data supply chain, i.e. from data creation and data capture to data usage
- Data governance — participants will jointly decide on data management processes as well as on applicable rights and duties.
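The data sovereignty principle can be illustrated with a small sketch in which a data owner attaches terms of use to a data set and every request is checked against them. The class `UsagePolicy`, its fields, and the example party names are hypothetical and chosen for illustration only; they are not taken from the IDS reference architecture.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class UsagePolicy:
    """Hypothetical terms of use that a data owner attaches to a data set."""

    owner: str
    allowed_consumers: frozenset
    allowed_purposes: frozenset
    valid_until: date

    def permits(self, consumer: str, purpose: str, on: date) -> bool:
        """Return True only if consumer, purpose, and date all match the terms."""
        return (
            consumer in self.allowed_consumers
            and purpose in self.allowed_purposes
            and on <= self.valid_until
        )


# Example: a machine owner lets the supplier use sensor data for maintenance only.
policy = UsagePolicy(
    owner="machine-owner",
    allowed_consumers=frozenset({"machine-supplier"}),
    allowed_purposes=frozenset({"predictive-maintenance"}),
    valid_until=date(2030, 12, 31),
)

print(policy.permits("machine-supplier", "predictive-maintenance", date(2030, 1, 1)))
print(policy.permits("machine-supplier", "benchmarking", date(2030, 1, 1)))
```

The first request is permitted and the second is refused, because benchmarking is not among the purposes the owner agreed to. A real implementation would also need certified identities and enforcement on the consumer side, which this sketch leaves out.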
The IDS approach requires a solid architectural foundation, which rests on four dimensions:
- Business architecture: addresses questions regarding the economic value of data, the quality of data, applicable rights and duties (data governance), and data management processes
- Data and service architecture: specifies, in an application- and technology-independent format, the functionality of the IDS, especially the functionality of the data services, on the basis of existing standards (vocabularies, semantic standards, etc.)
- Security architecture: addresses questions concerning the secure execution of application software, the secure transfer of data, and the prevention of data misuse
- Software architecture: specifies the software components required for the IDS pilot tests
If we confine our view to the data and service architecture, the diagram gives a rough overview. The concept distinguishes three components: the connector for the exchange of data (request handling, data transformation, data preprocessing, etc.), the broker (support and version control of the sources, source search, exchange agreements, monitoring, etc.), and an app store (services for data transformation, quality support, etc.). Through the app store, third parties can offer software that can be injected into the connector to enrich the data with additional value (for instance, from metadata or analytics).
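How the three components might interact can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the class names `AppStore`, `Connector`, and `Broker`, their methods, and the sample app `add_celsius` are hypothetical and do not reflect the actual IDS software components or interfaces.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
App = Callable[[Record], Record]


class AppStore:
    """Hypothetical catalog of third-party apps, keyed by name."""

    def __init__(self) -> None:
        self._apps: Dict[str, App] = {}

    def publish(self, name: str, app: App) -> None:
        self._apps[name] = app

    def get(self, name: str) -> App:
        return self._apps[name]


class Connector:
    """Hypothetical connector that serves records and applies installed apps."""

    def __init__(self, source_id: str, records: List[Record]) -> None:
        self.source_id = source_id
        self._records = records
        self._apps: List[App] = []

    def install(self, app: App) -> None:
        self._apps.append(app)

    def handle_request(self) -> List[Record]:
        """Return a copy of each record after running it through every app."""
        results = []
        for record in self._records:
            enriched = dict(record)  # work on a copy, not the owner's original
            for app in self._apps:
                enriched = app(enriched)
            results.append(enriched)
        return results


class Broker:
    """Hypothetical registry that lets consumers search for data sources."""

    def __init__(self) -> None:
        self._descriptions: Dict[str, str] = {}

    def register(self, connector: Connector, description: str) -> None:
        self._descriptions[connector.source_id] = description

    def search(self, keyword: str) -> List[str]:
        keyword = keyword.lower()
        return [
            source_id
            for source_id, text in self._descriptions.items()
            if keyword in text.lower()
        ]


def add_celsius(record: Record) -> Record:
    """Example third-party app: enrich a record with a Celsius reading."""
    fahrenheit = float(record["temp_f"])
    return {**record, "temp_c": round((fahrenheit - 32) * 5 / 9, 1)}


store = AppStore()
store.publish("add-celsius", add_celsius)

connector = Connector("welding-machine-7", [{"temp_f": 212.0}])
connector.install(store.get("add-celsius"))

broker = Broker()
broker.register(connector, "Welding machine temperature sensor")

print(broker.search("welding"))
print(connector.handle_request())
```

In this sketch the broker only knows descriptions of sources, while the data itself stays with the connector and is enriched there by an app obtained from the store. That separation mirrors the division of roles described above, but the concrete interfaces are invented for the example.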
The IDS is an ambitious and unique approach. Currently, proof-of-concept projects for “collaborative supply chain risk management”, “intelligent inventory information” and “dynamic time slot management and tracking in cross-enterprise supply chains”, among others, are being carried out to demonstrate the technical viability of IDS.
The Future of Innovation with Third Party Data
From the perspective of data management, IDS can help companies to open their Enterprise Data Warehouses, Data Lakes or Hadoop-based storage to third parties. Digitally transformed companies must be able to guarantee the quality of their data. No one would accept defective machines, so why should anyone accept invalid or erroneous data? The data provider (or their agent) has to prepare, merge, and cleanse their data quickly, flexibly and without programming.
And this is where Talend’s talent comes in. Talend and IDS complement each other well. Talend’s Data Fabric can support the IDS stakeholders (Data Producer, Data Broker and Data User) in preparing, cleansing, transforming, enriching, and integrating data assets. The Data Fabric’s ability to assure the quality of data and to create native code for integration into the Hadoop ecosystem is particularly valuable. While IDS is still in its infancy, Data Fabric is a sophisticated market-leading product. It remains to be seen whether IDS will be able to gain traction in Germany and beyond and become a de facto standard for the exchange of data.
At QuinScape, an ambitious partner of Talend and a member of the IDS community, we are working on integrating Talend’s agile data management capabilities with the opportunities of the IDS for the benefit of our customers.
About the Author — Dr. Norbert Jesse