Talend Data Quality

Data quality entails more than helping companies get correct data into their information systems; it also means getting rid of bad, corrupted, or duplicate data. Clean data is a key element when integrating information across systems, because misinformation can proliferate quickly, both internally and to business partners. In today's interconnected information systems, poor-quality data spreads the way travelers spread viruses: an error in one application soon reaches the others. The cost of compromised data is hard to quantify, but it includes lost sales, wasted productivity, loss of reputation or goodwill, and missed opportunities.

Want to learn more about open source Talend Data Quality? Then watch an online demo or check out our users' testimonials.

Not sure if you need open source Talend Open Profiler or Talend Data Quality? Check out the features comparison matrix.

Data Profiling

The first step in improving the quality of an enterprise's data is to profile, or evaluate, that data. Sophisticated yet easy to use, the data profiler does not require an understanding of database engines and file structures. Through the open source data profiling tool, business analysts or other non-technical personnel can define a set of indicators, patterns and business rules for each data element that needs to be analyzed or monitored. The indicators range from simple or advanced statistics to pattern and soundex frequencies, as well as text string and numeric analysis, including summary data and statistical distributions of records. The patterns are preset or customized expressions that define the expected form of the analyzed data, and the business rules define custom thresholds and value ranges.
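To make the idea of indicators and pattern frequencies concrete, here is a minimal sketch in plain Python. It is not Talend's API or implementation; the function names (`value_pattern`, `profile_column`) and the sample phone values are hypothetical and chosen only for illustration. Each value is reduced to a "shape" (letters become `A` or `a`, digits become `9`), and the shapes are counted alongside a few simple statistics.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: uppercase -> A, other letters -> a, digit -> 9.

    Any other character (punctuation, spaces) is kept unchanged.
    """
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Compute simple indicators for one column (None or "" counts as missing)."""
    present = [v for v in values if v is not None and v != ""]
    return {
        "row_count": len(values),
        "null_count": len(values) - len(present),
        "distinct_count": len(set(present)),
        "pattern_frequency": Counter(value_pattern(v) for v in present),
    }


# Hypothetical sample column, for illustration only.
phones = ["555-0100", "555-0101", "5550102", None, "555-0100"]
report = profile_column(phones)
```

In this sample, the dominant shape is `999-9999`; the value with the shape `9999999` is the kind of outlier a pattern indicator is meant to surface.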

By reviewing the metrics on a regular basis and following their trends, a company can track how the quality of its data evolves, whether it improves or degrades.

Talend Data Quality includes other profiling and reporting functionalities:

  • History of data profiling analyses
  • Batch analysis
  • Report stylesheet customization
  • Various report formats including PDF, HTML and XML

Data Cleansing

Once the problem areas have been identified, the data must be corrected. For data that does not conform to your standards, Talend Data Quality has powerful tools for repairing and cleansing it. Talend Data Quality allows you to use reference data to set the standards for values, regular expressions to set standards for data shape and size, and matching algorithms to find and repair duplicates and near duplicates in your data.
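The three techniques named above (reference data for values, regular expressions for shape, and matching for near duplicates) can be sketched with the Python standard library. This is an illustration under stated assumptions, not how Talend's components work: the reference table, the five-digit postal code rule, the 0.85 similarity threshold, and all function names are hypothetical, and `difflib.SequenceMatcher` stands in for whatever matching algorithm a real tool would use.

```python
import re
from difflib import SequenceMatcher

# Hypothetical reference data: accepted variants mapped to one standard value.
COUNTRY_REFERENCE = {
    "usa": "US",
    "u.s.": "US",
    "united states": "US",
    "france": "FR",
}

# Hypothetical shape rule: a postal code must be exactly five digits.
POSTAL_CODE = re.compile(r"\d{5}")


def standardize_country(raw: str):
    """Return the standard code, or None if the value is not in the reference data."""
    return COUNTRY_REFERENCE.get(raw.strip().lower())


def is_valid_postal_code(raw: str) -> bool:
    """Check that the trimmed value matches the expected shape."""
    return POSTAL_CODE.fullmatch(raw.strip()) is not None


def near_duplicates(names, threshold=0.85):
    """Return index pairs whose normalized names are at least `threshold` similar."""
    # Normalize case and collapse repeated whitespace before comparing.
    normalized = [" ".join(n.lower().split()) for n in names]
    pairs = []
    for i in range(len(normalized)):
        for j in range(i + 1, len(normalized)):
            ratio = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
            if ratio >= threshold:
                pairs.append((i, j))
    return pairs
```

For example, `near_duplicates(["Acme Corp", "acme  corp", "Globex"])` pairs the first two entries, because they are identical once case and spacing are normalized. The pairwise comparison is quadratic in the number of records, so a sketch like this suits small samples only.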

Set up cleansing processes using a wide range of dedicated data integration and quality components. These components, such as name and address cleansing and fuzzy deduplication, are natively available in Talend Data Quality.

Data Enrichment

Data Enrichment fills in the missing pieces in your data so that you can reach your business goals. The range of information that can be added is broad: it can include incorporating a company's Dun & Bradstreet information or a consumer's credit score, getting the longitude and latitude of an address to help plan delivery routes, or collecting census data to target demographics or income categories. The intuitive development environment helps users develop seamless processes in a single environment, to consolidate, merge or simply insert data into any target system.
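As a rough illustration of the longitude and latitude case, enrichment amounts to joining each record against an external reference source. The sketch below is an assumption-laden example in plain Python, not a Talend component: `enrich_with_coordinates`, the `postal_code` key, and the lookup table are hypothetical, and the coordinates are placeholder numbers rather than real locations.

```python
# Hypothetical lookup table keyed by postal code. The coordinates are
# placeholder values, not real locations; a real process would query a
# geocoding service or a licensed dataset instead.
GEO_REFERENCE = {
    "00001": (10.0, 20.0),
    "00002": (11.5, 21.5),
}


def enrich_with_coordinates(records, geo_reference):
    """Return copies of the records with latitude and longitude added.

    Records whose postal code is missing from the reference get None for
    both fields, so that gaps stay visible instead of being silently dropped.
    """
    enriched = []
    for record in records:
        lat, lon = geo_reference.get(record.get("postal_code"), (None, None))
        enriched.append({**record, "latitude": lat, "longitude": lon})
    return enriched


customers = [
    {"name": "A", "postal_code": "00001"},
    {"name": "B", "postal_code": "99999"},  # not in the reference data
]
enriched_customers = enrich_with_coordinates(customers, GEO_REFERENCE)
```

Returning new dictionaries rather than modifying the input keeps the original records intact, which makes it easier to compare data before and after enrichment.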

Analytical Portal

The Data Quality Portal provides customizable web-based data quality monitoring and reporting to help organizations keep watch over crucial data quality metrics that may impact important business processes.

The Data Quality Portal delivers customized key quality indicators (KQIs) to a web-based portal where teams can collaborate on the process of improving data quality across the enterprise. It includes PDF report generation, user-customized dashboards, ad-hoc queries and time-based monitoring of KQIs. The Data Quality Portal also provides access to a predefined set of reports and global quality gauges that watch for the violation of data quality thresholds.
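A quality gauge of this kind can be thought of as a named threshold evaluated against the history of one indicator. The following is a minimal sketch of that idea in Python, not the portal's actual data model: the `Gauge` class, the `evaluate` function, and the assumption that a KQI is a fraction between 0 and 1 where higher is better are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gauge:
    name: str
    minimum: float  # lowest acceptable KQI value (assumed to be a fraction, 0..1)


def evaluate(gauge: Gauge, history):
    """Evaluate a gauge against a KQI history ordered from oldest to newest.

    Returns the latest value, its change since the previous measurement,
    and whether the latest value violates the gauge's threshold.
    """
    if not history:
        raise ValueError("history must contain at least one measurement")
    latest = history[-1]
    change = latest - history[-2] if len(history) > 1 else 0.0
    return {
        "name": gauge.name,
        "latest": latest,
        "change": change,
        "violated": latest < gauge.minimum,
    }


# Hypothetical measurements of a completeness indicator over three runs.
status = evaluate(Gauge("completeness", minimum=0.95), [0.98, 0.97, 0.91])
```

Here `status["violated"]` is true because the latest measurement fell below the minimum, and the negative `change` shows that quality is degrading, which is what time-based monitoring is meant to reveal.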

Data Quality and Data Integration

Since all Talend products are part of the same unified platform, all data quality functionality is seamlessly integrated with Talend Integration Suite and with Talend MDM, giving users consistent ergonomics, a short learning curve and a high level of reusability. This brings benefits in terms of resource optimization and utilization, and project consistency.

Key features of this integrated platform include:

  • A single development studio based on Eclipse: objects such as transformation and validation rules, business rules, expressions, variables, joblets, etc. can be easily reused from one project to another.
  • A common metadata repository that promotes sharing of vital information assets including user data, application metadata, business models, business rules, transformation and validation rules, connectors, data validation and workflows.
  • A unified deployment environment that includes a distributed and high-availability execution paradigm, a single monitoring console and real-time execution reporting.