Gartner has just released its annual “Magic Quadrant for Data Quality Tools.”
While everyone’s first priority might be to check out the various recognitions, I would also recommend taking the time to review the market overview section. I found the views shared by analysts Saul Judah and Ted Friedman on the overall data quality market and major trends both interesting and inspiring.
Hence this blog post to share my takeaways.
In every enterprise software submarket, reaching the $1 billion threshold is a significant milestone. According to Gartner estimates, the market for data quality tools reached it a couple of months ago, and its growth “will accelerate during the next few years, to almost 16% by 2017, bringing the total to $2 billion”.
Although data quality is already a significant market, its growth pace indicates that it has yet to reach the mainstream. Other signs point in the same direction: continued consolidation on the vendor side and, on the demand side, a growing call for democratization (in particular, lower entry costs and shorter implementation times).
Data quality is gaining popularity across data domains and use cases. In particular, “party data” (data related to customers, prospects, citizens, patients, suppliers, employees, etc.) is highlighted as the most frequent category. I believe demand for data quality is growing in this area because customer-facing lines of business increasingly realize that poor data quality is jeopardizing their customer-relationship capabilities. As further illustration, consider the proliferation of press articles citing data quality as a key success factor for data-driven marketing activities (such as this one, titled “Data quality, the secret assassin of CRM”). In addition, media coverage appears to reinforce the idea that data quality, together with MDM of customer data, is a “must have” for CRM and digital marketing initiatives (see, for example, this survey from eMarketer).
The Gartner survey referenced in the Data Quality Magic Quadrant also reveals that data quality is gaining ground in domains beyond party data. Three other domains are now considered priorities: financial/quantitative data, transaction data and product data, which was not the case in last year’s survey.
In my view, this finding also indicates that data quality is gaining ground as a function that needs to be delivered across lines of business. Some organizations are looking to establish a shared service for managing data assets across the enterprise, rather than solving the problem case by case for each activity, domain or use case. However, this appears to be an emerging practice found only in the most mature organizations (and we at Talend would advise considering it only once you have demonstrated the value of data quality on a few well-targeted use cases). Typically, these are also the organizations that have appointed a Chief Data Officer to orchestrate information management across the enterprise.
In terms of roles, Gartner sees an increasing number of people involved with data quality, especially within lines of business, and states: “This shift in balance toward data quality roles in the business is likely to increase demand for self-service capabilities for data quality in the future.”
This is consistent with other research: for example, at a recent MDM and data governance event in Paris, Henri Peyret of Forrester Research elaborated on the idea of Data Citizenship.
Our take at Talend is that data quality should be applied where the data resides or is exchanged. So, in our opinion, the deployment model should depend on the use case: data quality should be able to move to the cloud together with the business applications or integration platforms that process or store the data. Data quality should not, however, require moving data from on premises to the cloud, or vice versa, for its own purposes.
Lastly, the Gartner survey shows some interest in data quality for big data and for the Internet of Things, although neither is yet a key consideration for buyers.
“Inquiries from Gartner clients about data quality in the context of big data and the Internet of Things remain few, but they have increased since 2013. A recent Gartner study of data quality ("The State of Data Quality: Current Practices and Evolving Trends") showed that support for big data issues was rarely a consideration for buyers of data quality tools.”
In my opinion, this is a surprising yet very interesting finding, given that other surveys show data governance and quality becoming some of the biggest challenges in big data projects. See, for example, this article by Mark Smith of Ventana Research, showing that most of the time spent on big data projects relates to data quality and data preparation. The topic is also discussed in a must-watch webinar on Big Data and Hadoop trends (registration required) by Gartner analysts Merv Adrian and Nick Heudecker.
An alternative to the heavily promoted data lake approach is gaining ground, referred to as the “data reservoir” approach. The difference: while a data lake gathers data in a big data environment without further preparation or cleansing, a reservoir focuses on making the data consumption-ready for a wider audience, not only for a limited number of highly skilled data scientists. Under that vision, data quality becomes a building block of big data initiatives rather than a separate discipline.
I cannot end this post without personally thanking our customers for their support in developing our analyst relations program.
Gartner, Inc., “Magic Quadrant for Data Quality Tools” by Saul Judah and Ted Friedman, November 26, 2014