This week, Talend did to data quality (DQ) what it did to data integration (DI) almost two years ago: we announced the first open source data quality solution. And the same way open source has stormed the data integration market, it will also storm the data quality market.
Proprietary vendors in this field command extremely high license prices. Very few of them are actually independent: a few years ago most DQ vendors were taken over by DI vendors. In turn, these DI vendors have been acquired by database or ERP vendors. As a result, clients buy DQ as part of a full IT stack: it will come along with your WebSphere or your SAP package, and if you spend $10m with the vendor maybe you’ll get DQ as a free add-on. But if you don’t have $10m to spend - well, you’d better be ready to spend $250,000 or more for DQ.
Here is why Talend’s solution is different:
- It’s open source, and thus commands a much lower “acquisition” price and TCO than competing products
- It’s open source, and thus it’s open. For users, that essentially means extensible. Want to add lookups against an industry repository, or public domain information? Yes you can. Want to customize the solution? Yes you can. Try that with a proprietary solution, whose bread and butter consists of selling you add-ons to connect to this or that.
- It was grown from the same code base as our DI platform. That means that DQ can actually be embedded into DI (by design and not as an afterthought). And since it’s the responsibility of the DI guys to make sure that they don’t propagate bad data, putting DQ features at their fingertips brings tremendous value.
We have briefed a number of industry analysts and members of the press this week, and the feedback has been overwhelmingly positive. I’ll post separately about some of the results from these discussions.
Yves

This open source solution has been around for some time also:
http://eobjects.org/trac/wiki/DataCleaner
The $250,000 price stated in the blog is only valid for some of the US-based solutions. Some other commercial solutions around will do the job for a much smaller investment.
Henrik - thanks for the comment. Yes, a number of open source projects around data quality have been reported to us after we issued this announcement. However, despite their merits, I don’t think any is as comprehensive as Talend Data Quality - and none is backed by a vendor. I am not saying this to be dismissive: open source projects and commercial open source complement one another perfectly, and work together.
As far as commercial software being available for a much smaller investment: yes, the same holds true of commercial data integration software. However, this is not who we are going after. Talend is competing head to head with the heavyweight market leaders. In data integration, that means Informatica, IBM, Business Objects - not the small products of the world (such as Pervasive - and this time I am being dismissive!). You will see the same happening in data quality.
Yves
I second Henrik that DataCleaner has been around for some time, and I disagree that it is not as mature as Talend Open Profiler (TOP). TOP is only for MySQL and Oracle (and I’ve heard that you’re adding PostgreSQL support soon, fine) whereas DataCleaner supports almost all databases (e.g. MS SQL, Oracle, MySQL, PostgreSQL, Derby, OpenOffice Base, HSQLDB, Firebird and more) and also supports a lot of file formats such as CSV/TSV files, Excel spreadsheets, XML files etc. The metrics that the two applications offer are pretty much the same (and I even believe that DataCleaner offers some more advanced metrics than yours, for example the Pattern Finder, which helps identify patterns in values) and DataCleaner has shown considerable growth (I think we’re up to 4 releases in the last half year). I also believe that your “integrated suite” argument is a bit wrong since, as far as I can see, TOP and TOS are not integrated in any way other than having somewhat similar UI components? This of course is not meant as a “TOP bash” but just a notice that there’s a long way to go before I think you can call yourself the best open source data quality application available…
Beno, that’s because you are looking only at TOP and TOS, and no, they are not fully integrated. The subject of this post was the announcement of a full data quality solution that includes not only data profiling but also data cleansing. As stated in the press release, it will be available later in September.
And as far as “call[ing ourselves] the best open source data quality application available” - you said it, not me!
But at the end of the day, it’s the proprietary vendors we are after - not one another in the open source field.
Yves
Yves, the discussion about open source versus closed source and mainstream products versus best of breed is long and perhaps never-ending.
I will say that ROI rules – this means that you measure your total benefit versus your total investment.
I welcome you to the DQ battlefield – actually I think there is room – the need out there is hidden (or rather neglected) but enormous.