Data profiling is the process of examining the data available in existing data sources (e.g. databases, applications and files) and collecting statistics and information about this data. Data profiling makes it possible to assess the quality of the data contained in the information system against a defined set of metrics and goals.
Talend Open Profiler is a sophisticated, yet simple-to-use open source data profiling tool that describes the content, structure, and quality of highly complex data structures. The open source data profiler allows business users and data management staff to perform a large variety of analyses using a set of indicators, patterns and rules for each data element being analyzed or monitored. It analyzes data on an ongoing basis and tracks changes to source data over time to help improve data quality.
Want to learn more about the open source data quality tool Talend Open Profiler? Then watch an online demo or check out our users' testimonials.
Not sure if you need Talend Open Profiler or Talend Data Quality? Check out the features comparison matrix.
Metadata discovery
Talend Open Profiler connects to databases to introspect their structures and stores the description of their metadata in its Metadata Manager. A filtering system lets users select only some of the tables or columns for the analysis, which optimizes connection performance when there is a large number of tables and helps data analysts focus their analysis on the most relevant data.
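To make the idea of introspection with a table filter more concrete, here is a minimal sketch in Python. It does not use Talend Open Profiler or its Metadata Manager; it assumes an in-memory SQLite database, and the function name `discover_metadata`, the `table_filter` parameter and the sample tables are all hypothetical.

```python
import sqlite3


def discover_metadata(conn, table_filter="%"):
    """Return {table_name: [(column_name, declared_type), ...]}.

    Only tables whose name matches the SQL LIKE pattern `table_filter`
    are introspected, which keeps the work small on large schemas.
    """
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE ? ORDER BY name",
        (table_filter,),
    ).fetchall()
    metadata = {}
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        metadata[table] = [(col[1], col[2]) for col in columns]
    return metadata


# Hypothetical schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT, age INTEGER)")
conn.execute("CREATE TABLE customer_address (id INTEGER, city TEXT)")
conn.execute("CREATE TABLE audit_log (id INTEGER, message TEXT)")

# Restrict discovery to the tables relevant to the analysis.
catalog = discover_metadata(conn, table_filter="customer%")
for table, columns in catalog.items():
    print(table, columns)
```

Filtering at discovery time, rather than after loading the whole schema, is what keeps the connection cheap when a database holds many tables.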
The metadata is then used by data analysts to perform database comparisons and analyses, and to set up data quality metrics and indicators that help users assess the quality of the analyzed data and make decisions about possible data cleansing, data integration or data stewardship measures. In addition, an embedded data explorer allows users to drill down directly into the tables of the analyzed databases and browse the data using industry-standard SQL queries.

Custom business rules
Business rules are specific criteria, thresholds or ranges of values that are used to identify matching records, illogical records (e.g. an age entered as less than 0 or as a decimal) or records that do not match the expected values. A dedicated wizard makes it easy to set up custom data quality business rules, using industry-standard SQL to define them and allowing advanced use of join conditions for more complex needs. The data quality rules are used to define expected thresholds on a data quality indicator's value. The range or statement defined is then used to measure the data quality of the selected table in the data profiling tool.

Patterns
Patterns are master data against which the analyzed data is checked during data profiling. A library of predefined patterns is available natively; it covers the most frequent data quality issues and helps define the most commonly expected forms of the analyzed data. In addition, fully customized patterns can be built from regular expressions or SQL statements for optimized and more detailed inspection of the data. Profiling users can also share their home-grown patterns, and leverage patterns developed by other users of the open source Talend Community, through the Talend Exchange platform, which is directly accessible in the Talend Open Profiler studio. Regular expression or SQL patterns can also be imported from a CSV file when the number of patterns to handle is very large.
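The rule and pattern checks described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Talend Open Profiler's implementation: the `person` table, the helper names `check_rule` and `check_pattern`, the 10% threshold and the simplified email regular expression are all assumptions made for the example.

```python
import re
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, age REAL, email TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("Ann", 34, "ann@example.com"),
        ("Bob", -2, "bob@example"),       # negative age, malformed email
        ("Cy", 41.5, "cy@example.org"),   # decimal age
        ("Dee", 28, None),                # missing email
    ],
)


def check_rule(conn, table, rule_sql, max_violation_ratio):
    """Count rows breaking a SQL business rule and compare with a threshold."""
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    violations = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE NOT ({rule_sql})"
    ).fetchone()[0]
    ratio = violations / total if total else 0.0
    return violations, total, ratio <= max_violation_ratio


def check_pattern(values, regex):
    """Return (matching, non_matching) counts among the non-null values."""
    pattern = re.compile(regex)
    non_null = [v for v in values if v is not None]
    matching = sum(1 for v in non_null if pattern.fullmatch(v))
    return matching, len(non_null) - matching


# Business rule: an age must be non-negative and a whole number.
AGE_RULE = "age >= 0 AND age = CAST(age AS INTEGER)"
# Deliberately simplified email pattern (assumption, not a full validator).
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[^@\s]+"

rule_result = check_rule(conn, "person", AGE_RULE, max_violation_ratio=0.10)
emails = [row[0] for row in conn.execute("SELECT email FROM person")]
pattern_result = check_pattern(emails, EMAIL_PATTERN)

print(rule_result)     # (violations, total rows, within threshold?)
print(pattern_result)  # (matching, non-matching) among non-null emails
```

In this sketch the rule is a plain SQL condition and the threshold is applied to the resulting violation ratio, which mirrors the idea of a rule defining an expected range for an indicator's value. Null values are left out of the pattern counts here; a real profile would usually report them separately.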
Indicators
Indicators are the results of applying the different patterns. They describe the content, structure and quality of the analyzed data, and can result from simple to highly complex operations based on data matching and other data-related operations. A number of system indicators are available natively in Talend Open Profiler to help users get started with data profiling.
Dedicated wizards help users define their own customized indicators, based on industry-standard SQL or Java statements, to track new data quality metrics or specific data characteristics.

Rendering