Data Profiling
The first step in improving the quality of an enterprise's data is to profile, or evaluate, that data. Sophisticated yet easy to use, the data profiler is a UI-based system that does not require an understanding of database engines or file structures. Through the open source data profiling tool, business analysts and other non-technical personnel can define a set of indicators, patterns and business rules for each data element that needs to be analyzed or monitored. The indicators range from simple or advanced statistics to pattern and soundex frequencies, as well as text string and numeric analysis, including summary data and statistical distributions of records. The patterns are preset or customized expressions that define the expected form of the data being analyzed, and the open source data quality business rules define custom business thresholds and value ranges. By reviewing these metrics on a regular basis and tracking their trend, a company can see whether the quality of its data is improving or degrading. Other functionalities include data cleansing, data enrichment and an analytical portal.
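To make the idea of indicators, patterns and threshold rules more concrete, here is a minimal Python sketch of profiling a single column. It is not the tool's implementation: the function names (`to_pattern`, `profile_column`), the pattern notation (`A`, `a`, `9`), the sample ZIP codes and the 25% null threshold are all assumptions chosen for illustration.

```python
import re
from collections import Counter


def to_pattern(value: str) -> str:
    """Map a value to its shape: uppercase -> 'A', lowercase -> 'a', digit -> '9'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # keep punctuation and spaces as-is
    return "".join(out)


def profile_column(values, expected_pattern, max_null_ratio):
    """Compute simple indicators, pattern frequencies and one threshold rule."""
    total = len(values)
    non_null = [v for v in values if v not in (None, "")]
    null_count = total - len(non_null)
    regex = re.compile(expected_pattern)
    matching = sum(1 for v in non_null if regex.fullmatch(v))
    null_ratio = null_count / total if total else 0.0
    return {
        "row_count": total,
        "null_count": null_count,
        "distinct_count": len(set(non_null)),
        "pattern_frequencies": dict(Counter(to_pattern(v) for v in non_null)),
        "pattern_match_ratio": matching / len(non_null) if non_null else 0.0,
        # Hypothetical business rule: at most max_null_ratio of values may be empty.
        "null_rule_passed": null_ratio <= max_null_ratio,
    }


# Hypothetical sample column of ZIP codes, expected to be exactly five digits.
zip_codes = ["75011", "9021", None, "75011", "AB123", ""]
report = profile_column(zip_codes, r"\d{5}", max_null_ratio=0.25)
print(report["pattern_frequencies"])
print(report["pattern_match_ratio"], report["null_rule_passed"])
```

Running the same profile on a schedule and storing each report would give the trend over time that the paragraph above describes.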
Data Cleansing
Once the problem areas are identified, the data must be corrected. All data goes through a "data quality firewall". Records with missing values, values that are improperly formatted or that do not match other values in the same record or in other data sources, duplicates, duplicates with synonyms, and even simple typos all need to be brought into alignment. This is done by cross-checking against other databases and reference data.
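The following Python sketch illustrates one way such a firewall step could work: it rejects records with missing values and uses reference data to catch duplicates that differ only by synonyms, case or punctuation. The `SYNONYMS` table, the function names and the sample records are hypothetical, and a real cleansing step would also need fuzzy matching to catch typos.

```python
import re

# Hypothetical reference data: abbreviation -> canonical form.
SYNONYMS = {"st": "street", "ave": "avenue", "inc": "incorporated"}


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and expand known synonyms."""
    words = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(SYNONYMS.get(w, w) for w in words)


def cleanse(records, required_fields):
    """Split records into clean, rejected (missing values) and duplicates."""
    clean, rejected, duplicates = [], [], []
    seen = set()
    for rec in records:
        # Reject any record where a required field is missing or empty.
        if any(not rec.get(f) for f in required_fields):
            rejected.append(rec)
            continue
        # Records with the same normalized key are treated as duplicates.
        key = tuple(normalize(rec[f]) for f in required_fields)
        if key in seen:
            duplicates.append(rec)
        else:
            seen.add(key)
            clean.append(rec)
    return clean, rejected, duplicates


# Hypothetical sample records.
records = [
    {"name": "Acme Inc.", "address": "12 Main St"},
    {"name": "ACME Incorporated", "address": "12 Main Street"},
    {"name": "Globex", "address": ""},
]
clean, rejected, duplicates = cleanse(records, ["name", "address"])
print(len(clean), len(rejected), len(duplicates))  # 1 1 1
```

Keeping the rejected and duplicate records in separate lists, rather than dropping them, leaves them available for manual review or correction.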
Data Enrichment
Open Source Data Enrichment adds value-added information to the data. The range of possible information is wide: it can include incorporating a company's Dun & Bradstreet information or a consumer's credit score, getting the longitude and latitude of an address to help plan delivery routes, or collecting census data to target demographics or income categories.
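As a rough illustration of enrichment, the Python sketch below adds coordinates to customer records by looking them up in a reference table. Everything in it is assumed for the example: the `GEO_REFERENCE` table, its approximate coordinate values, the `postal_code` field and the `enrich` function. In practice the reference data would come from an external provider or geocoding service.

```python
# Hypothetical reference table: postal code -> approximate, illustrative coordinates.
GEO_REFERENCE = {
    "10001": {"latitude": 40.75, "longitude": -73.99},
    "94103": {"latitude": 37.77, "longitude": -122.41},
}


def enrich(record, reference):
    """Return a copy of the record with coordinates added when a match exists."""
    enriched = dict(record)  # never mutate the source record
    match = reference.get(record.get("postal_code"))
    if match:
        enriched.update(match)
        enriched["enriched"] = True
    else:
        enriched["enriched"] = False  # flag for follow-up instead of guessing
    return enriched


# Hypothetical customer records.
customers = [
    {"name": "Acme", "postal_code": "10001"},
    {"name": "Globex", "postal_code": "00000"},
]
enriched_customers = [enrich(c, GEO_REFERENCE) for c in customers]
for row in enriched_customers:
    print(row["name"], row["enriched"])
```

Flagging unmatched records rather than filling in a default keeps the enriched data honest about which values came from the reference source.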
Analytical Portal
Data Quality Portal provides customizable web-based data quality monitoring and reporting to help organizations apply tangible data profiling and data quality metrics and to support data quality reporting enterprise-wide. It offers specialized portlets for the different types of users and supports several categories of analytical tools: Reporting, TDQ Dashboard, User Dashboard, Analytical Processing (OLAP), and Adhoc Query. It also provides access to a predefined set of reports and global quality gauges.












