Gartner has recently released its 2019 Market Guide for Data Preparation, the fourth edition of a guide that was first published in the early days of the market, back in 2015, when Data Preparation was mostly intended to support self-service use cases. Compared to Magic Quadrants, the Market Guide series generally covers early, mature or smaller markets, with less detailed information about competitive positioning between vendors, but more information about the market itself and how it evolves over time.
While everyone’s priority with these kinds of documents might be to check the vendor profiles (where you’ll find Talend Data Preparation listed with a detailed profile), I would recommend focussing on the thought leadership and market analysis that the report provides. Customers should consider the commentary delivered by the authors, Ehtisham Zaidi and Sharat Menon, on how to successfully expand the reach and value of Data Preparation within their organization.
After reading through the report myself, I thought I’d share three takeaways that address our customers’ requirements in this exciting market.
Data Preparation Turns Data Management into a Team Sport
Self-Service was the trend that started the data preparation market. This happened at a time when business users had no efficient way to discover new data sources before they could get insights, even after they were empowered with modern data discovery tools such as Tableau or Power BI. They had to depend on IT… or alternatively create data silos by using tools like Microsoft Excel in an ungoverned way.
Data Preparation tools addressed these productivity challenges. Reports have shown that data professionals and business analysts spend 80% of their time searching for, preparing and protecting data before they can actually turn it into insights. Data Preparation came to the rescue by opening data integration and data quality management to a larger audience.
This was the challenge in the early days of the 21st century, but since that time data has turned into a bigger game. It is not only about personal productivity but also about creating a corporate culture for data-driven insights. Gartner’s Market Guide does a great job of highlighting that trend: as disciplines and tools mature, the main challenge is now to turn data preparation into a team sport where everybody in the business and IT can collaborate to reap the benefits of data.
As a result, what’s critical is operationalization: capturing what line-of-business users, business analysts, data scientists or data engineers are doing ad hoc and turning it into an enterprise-ready asset that can run repeatedly in production in a governed way. Ultimately, this approach can benefit enterprise-wide initiatives such as data integration, analytics and Business Intelligence, data science, data warehousing or data quality management.
Smarter people with smarter tools… and vice-versa
Gartner’s market report also highlights how tools are embedding the most modern technologies, such as data cataloging, pattern recognition, schema on read or machine learning. This empowers less skilled users to carry out complex activities with their data, while automating tasks such as transformation, integration, reconciliation or remediation as soon as they become repetitive.
What’s even more interesting is that Gartner relates those technology innovations to a market convergence, as stated in this prediction: “By 2024, machine-learning-augmented data preparation, data catalogs, data unification and data quality tools will converge into a consolidated modern enterprise information management platform”.
In fact, it may have been a misconception to consider Data Preparation a separate discipline geared towards a targeted audience of business users. Rather, it should be envisioned as a game-changing technology for information management due to its ability to enable potentially anyone to participate. Armed with innovative technologies, enterprises can organize their data value chain in a new collaborative way, a discipline that we refer to at Talend as collaborative data management, and that some analysts, including Gartner in the Market Guide, also refer to as DataOps.
Take Data Quality management as an example. Many companies are struggling to address their Data Quality issues because their approach relies too heavily on a small number of data quality experts from a central organization such as central IT or the office of the CDO. Although those experts can play a key role in orchestrating data quality profiling and remediation, they are not the ones in the organization who know the data best. They need to delegate some of the data cleansing effort to colleagues who work closer to where the data is sourced. Empowering those people with simple data preparation tools makes data quality management much more efficient.
The value of the hybrid cloud
Gartner also heard growing customer demand for Data Preparation to be delivered through innovative Platform as a Service deployment models. What they highlight are requirements for much more sophisticated deployment models that go beyond basic SaaS. The report notes that “organizations need the flexibility to perform data preparations where it makes the best sense, without necessarily having to move data first”. They need a hybrid model to meet their constraints, both technical (such as pushing down the data preparation so that it runs where the data resides) and business (such as limiting cross-border data transfers for data privacy compliance).
This is a brilliant highlight, and one that we are seeing very concretely at Talend, where we are hearing sophisticated requirements for hybrid deployments of our Data Preparation tool. Some of our cloud customers need to run their preparations on premises. Others want a cloud deployment, but with the ability to remotely access data inside the company’s firewalls through our remote engines. Others want to be able to operationalize their data preparations so they can run natively inside big data clusters.
Are you ready for Data Preparation? Why don’t you give it a try?
Enabling a wider audience to collaborate on data has been a major focus for Talend over the last 3 years. We introduced Talend Data Preparation in 2016 to address the needs of business analysts and line-of-business workers. One year later, we released Talend Data Stewardship, the brother in arms of Data Preparation for data certification and remediation. Both applications were delivered as part of Talend Cloud in 2017. In Fall 2018, we brought out a new application, Talend Data Catalog, to foster collaborative data governance, data curation and search-based access to meaningful data.
And now we are launching Pipeline Designer. As we see more and more roles from central organizations or lines of business wanting to collaborate on data, we want to empower those new data heroes with a whole set of applications on top of a unified platform. Those applications are designed for the needs of each of those roles in a governed way, from analysts to engineers, from developers to business users, and from architects to stewards.
2019 is an exciting year for Data Preparation and Data Stewardship. We added important smart features in the Spring release, for example splitting a name into its respective sub-parts with machine learning, or splitting a field composed of several parts into its respective sub-parts based on semantic type definitions. We improved the data masking capabilities, a set of functions in high demand now that GDPR, CCPA and other regulations are raising the bar for privacy management. Stay tuned for other innovations coming this year that leverage machine learning, deliver more options for hybrid deployment and operationalization, or engage a wider range of data professionals and business users to collaborate on trusted data in a governed way.