We have just released Talend Open Profiler - the first open source data profiler.
Put simply, data profiling is the process of examining the data available in existing data sources and collecting statistics and information about this data. Data profiling - while an interesting discipline in its own right - is especially valuable when executed as part of a data quality strategy. In other words: know your data before you attempt to fix it.
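To make "collecting statistics" a little more concrete, here is a minimal sketch of what profiling a single column can look like. It is only an illustration in plain Python, not how Talend Open Profiler itself works: the `profile_column` function, the chosen indicators, and the sample email list are all hypothetical.

```python
from collections import Counter


def profile_column(values):
    """Collect a few basic indicators about one column of raw values.

    Hypothetical helper for illustration only. None and empty strings
    are both treated as missing values.
    """
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        # Number of rows that repeat a value already seen elsewhere.
        "duplicates": sum(c - 1 for c in counts.values()),
        # Most frequent value with its count, or None for an empty column.
        "most_common": counts.most_common(1)[0] if counts else None,
    }


# Made-up sample data: one blank, one missing and one repeated value.
emails = ["a@example.com", "b@example.com", "", None, "a@example.com"]
report = profile_column(emails)
print(report)
```

Even indicators this simple already tell you where to look first: missing values point to enrichment work, repeated values to deduplication.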
Talend has always considered data quality to be an integral part of data integration. From day one, we have been building data quality and data cleansing components: deduplication, enrichment, fuzzy logic matching… Data profiling clearly takes us to the next step and allows us to introduce a product suite focused on data quality: a suite that can - and should - be used wherever data integration is used, but that can also be used standalone to deal with data quality issues outside the realm of data integration.
Like human viruses, poor quality data travels faster when applications are integrated. In the 19th century, epidemics tended to stay local. In the 21st century, the SARS infection spread worldwide in days… As information systems are no longer standalone, and all applications and databases communicate and exchange data, being certain of the quality of your data is key, before you send erroneous or incomplete data to corrupt other systems…
So - a couple of things. First, join me in congratulating the Talend data quality development team, led by Sebastiao, for a terrific product. Second, download Talend Open Profiler, test it, use it, post in the forums, report bugs or feature requests, and tell us what you think!
Yves











