
Talend Data Quality Basics

Talend Studio for Data Quality enables business users and data management teams to assess the quality of data in any data source. This product also lets you verify data completeness, accuracy, and integrity in preparation for data migration, instance consolidation, and data integration.

This course is designed to help you start using Talend Studio for Data Quality right away. It teaches you how to evaluate the quality of data in an information system against a set of metrics and thresholds, which are based on indicators, models, and rules defined for each data item to be analyzed or monitored.

Duration: 2 days (14 hours)
Target audience: Anyone who wants to use Talend Studio for Data Quality to assess data quality
Prerequisites: Completion of Talend Data Integration Basics, familiarity with SQL
Course objectives
After completing this course, you will be able to:
  • Connect to a database or file delimited data source and run an analysis on it
  • Examine the contents of a connection to a data source
  • Run a data analysis using catalog and schema analysis tools
  • Create, configure, run, and analyze results for every type of data quality analysis offered in the Studio on several sample data sets. This includes profiling data based on these categories of analysis: structural, column, table, cross-table, and correlation
  • Generate regular expressions for pattern matching within an analysis to test data quality
  • Define indicator thresholds that are flagged in analysis results when violated
  • Create and apply a set of business rules to separate compliant data from noncompliant data
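
To make the last three objectives concrete, here is a minimal plain-Python sketch of the underlying ideas: a pattern-match indicator, a threshold check, and a business rule that splits rows. It is not Talend Studio code and does not use any Talend API. The sample rows, the email pattern, the 0.9 threshold, and all function names are assumptions chosen for illustration.

```python
import re

# Hypothetical sample rows; not taken from the course data sets.
ROWS = [
    {"id": 1, "email": "ann@example.com", "age": 34},
    {"id": 2, "email": "bob_at_example.com", "age": 27},
    {"id": 3, "email": "cy@example.org", "age": -5},
    {"id": 4, "email": None, "age": 41},
]

# Deliberately simple email pattern (an assumption, not a full validator).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def pattern_match_ratio(rows, column, pattern):
    """Fraction of non-null values in `column` that match `pattern`."""
    values = [r[column] for r in rows if r[column] is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if pattern.match(v)) / len(values)


def null_count(rows, column):
    """Number of rows whose `column` value is missing."""
    return sum(1 for r in rows if r[column] is None)


def violates_threshold(value, lower=None, upper=None):
    """True when an indicator value falls outside the allowed range."""
    too_low = lower is not None and value < lower
    too_high = upper is not None and value > upper
    return too_low or too_high


def split_by_rule(rows, rule):
    """Separate rows that satisfy a business rule from those that do not."""
    compliant = [r for r in rows if rule(r)]
    noncompliant = [r for r in rows if not rule(r)]
    return compliant, noncompliant


if __name__ == "__main__":
    ratio = pattern_match_ratio(ROWS, "email", EMAIL_PATTERN)
    print("email match ratio:", ratio)
    print("below 0.9 threshold:", violates_threshold(ratio, lower=0.9))
    ok, bad = split_by_rule(ROWS, lambda r: r["age"] >= 0)
    print("noncompliant ids:", [r["id"] for r in bad])
```

In the Studio, the equivalent steps are configured through analyses rather than written by hand. The sketch only shows what an indicator, a threshold, and a rule each compute.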
Course agenda

Connections

  • Creating database and file delimited connections

Structural analysis

  • Using connection overview analysis
  • Using catalog overview analysis

Column analysis

  • Performing a basic column analysis
  • Adding regular expressions
  • Defining indicator thresholds
  • Running additional basic column analyses
  • Running and reconfiguring predefined column analyses

Semantic discovery analysis

  • Configuring and using a semantic discovery analysis

Table analysis

  • Using a column set analysis
  • Using a match analysis
  • Using a business rule analysis
  • Using a functional dependency analysis

Cross-table analysis

  • Using redundancy analysis

Correlation analysis

  • Using numerical correlation analysis
  • Using time correlation analysis
  • Using nominal correlation analysis

Tasks

  • Defining and managing tasks in the profiling perspective