5 best practices to deliver trust in your data project: Tip #1 Master Data Quality
This summer, the Talend Blog Team will share practical tips to help you securely kick off your data project. This week, we start with the first capability: making sure the data you create, develop and share within your organization stays clean and governed.
Master data quality with trustworthy, complete and up-to-date data assets
According to Harvard Business Review, 47% of newly created data records have at least one critical error. Poor data quality hurts organizations on many levels, while good data quality is a strategic asset and a competitive advantage. Mastering data quality can make the difference: it is a key component for any organization that wants to gain value from its data.
Bad data can come from every department in your organization (sales, marketing or engineering) and in many forms.
Some examples of bad data include:
- First names and surnames with missing marks
- National ID numbers with an invalid suffix
- Credit card numbers exposed to unauthorized persons
- Obsolete post codes or incomplete billing addresses
- Heavily abbreviated names, surnames and addresses
- Miscellaneous remarks not stored in designated fields
- Wrong or missing product references
If nothing is done, all of these examples can hurt your income statement in the long term.
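To make a few of these examples concrete, here is a minimal Python sketch of rule-based checks. It is not Talend code: the field names, the five-digit post code and the `PRD-` product reference format are hypothetical assumptions. The card check uses the standard Luhn checksum to spot card-like numbers left in free-text remarks.

```python
import re

# Hypothetical rules for illustration only: real post code, ID and
# product reference formats vary by country and by business.
POSTCODE_RE = re.compile(r"^\d{5}$")         # assumed 5-digit post codes
PRODUCT_REF_RE = re.compile(r"^PRD-\d{6}$")  # assumed product reference format


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum used by card numbers."""
    digits = [int(d) for d in number if d.isdigit()]
    if len(digits) < 12:
        return False
    total = 0
    # Double every second digit from the right; subtract 9 when the result exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_issues(record: dict) -> list[str]:
    """Return a list of human-readable data quality issues found in one record."""
    issues = []
    first, last = record.get("first_name") or "", record.get("last_name") or ""
    if not first or not last:
        issues.append("missing name")
    elif len(last.rstrip(".")) <= 1:
        issues.append("abbreviated surname")
    if not POSTCODE_RE.match(record.get("postcode") or ""):
        issues.append("invalid post code")
    if not PRODUCT_REF_RE.match(record.get("product_ref") or ""):
        issues.append("wrong or missing product reference")
    # Flag free-text remarks that appear to contain a card number.
    for candidate in re.findall(r"\b(?:\d[ -]?){13,19}\b", record.get("remarks") or ""):
        if luhn_valid(candidate):
            issues.append("possible card number in free text")
            break
    return issues
```

Rules like these are cheap to run on every incoming record; the hard part in practice is agreeing on which formats are valid for your organization.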
Our recent data trust readiness report reveals that only 43% of data professionals believe their organization’s data is always accurate and up-to-date. That figure falls to 29% for data practitioners. This suggests that the problem appears to be under control at the top level, while the people working with the data every day are less confident. Download the Guide to learn more.
When is Data Quality needed?
Data Quality is required at every stage of the data lifecycle.
Data quality is a process that must be pervasive throughout your data lifecycle: all the time, for all users and all projects. You will need to provide in-flight, self-service data quality tools to business experts, along with stewardship applications that empower business users to resolve missing data over time.
What are the key capabilities to look for?
When we talk about data quality, several key capabilities rise to the surface: profiling, deduplication, matching, classification, standardization, remediation and masking. All of them are integrated into the Talend Platform, making them accessible to a wide range of technical and business users.
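As a rough illustration of three of these capabilities (profiling, standardization and deduplication), the sketch below uses only the Python standard library. It is a simplified stand-in, not how the Talend Platform implements them, and the function names are hypothetical. Deduplication here is exact matching on standardized values, whereas real matching usually relies on fuzzy comparison.

```python
import re
import unicodedata
from collections import Counter


def profile_column(values):
    """Basic profile: row count, null rate, distinct count and most common format patterns."""
    non_null = [v for v in values if v not in (None, "")]
    # Map letters to 'A' and digits to '9' to expose formats, e.g. "75001" -> "99999".
    patterns = Counter(re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", v)) for v in non_null)
    return {
        "rows": len(values),
        "null_rate": (len(values) - len(non_null)) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "top_patterns": patterns.most_common(3),
    }


def standardize(name: str) -> str:
    """Lowercase, strip accents and collapse whitespace so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def deduplicate(names):
    """Keep the first occurrence of each standardized name (exact match after standardization)."""
    seen, unique = set(), []
    for n in names:
        key = standardize(n)
        if key not in seen:
            seen.add(key)
            unique.append(n)
    return unique
```

For example, `deduplicate(["José  Silva", "jose silva", "Ana Costa"])` keeps only the first spelling of José Silva, because both variants standardize to `"jose silva"`.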
How to get started:
Regardless of its size, function, or market, every organization needs to care about data quality. Data quality cannot be an afterthought; otherwise it will soon become the main obstacle to your data-driven transformation. Start by discovering your data assets and understanding their data quality issues and how those issues can hurt your decisions and operations. Then cleanse your data as soon as it enters your information chain, with the right stakeholders on board and tools that automate the work wherever possible.
How Talend tools can help
Data quality is everywhere in the Talend Platform and shared with everyone. It starts with Data Catalog, which can automatically and systematically profile and categorize any data set, present data samples and profiling results as part of the data discovery process, and assign accountability, so that data owners can be held responsible for the most critical data sets and take action when data quality issues are highlighted.
Profiling is also delivered across tools and personas: from the business analyst using Data Prep, to the data engineer using Pipeline Designer and its new Trust Index (a new feature in the Summer ’18 release), to the IT developer using Talend Studio.
Once data quality issues have been identified, they can be handled automatically as part of Talend data pipelines in Talend Studio or Data Prep. Data protection rules, such as data masking, can be applied there as well.
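A masking rule can be as simple as hiding all but the last four digits of a card number, or replacing an identifier with a deterministic token. The sketch below is a generic Python illustration with hypothetical function names, not a Talend component. Note that a salted hash is pseudonymization, not full anonymization: anyone holding the salt can re-derive tokens for known values.

```python
import hashlib


def mask_card(number: str) -> str:
    """Replace every digit except the last four with '*', keeping separators in place."""
    total_digits = sum(c.isdigit() for c in number)
    seen = 0
    out = []
    for c in number:
        if c.isdigit():
            seen += 1
            # Only the final four digits stay visible.
            out.append(c if seen > total_digits - 4 else "*")
        else:
            out.append(c)
    return "".join(out)


def pseudonymize(value: str, salt: str) -> str:
    """Deterministic token: the same value and salt always give the same token, so joins still work."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]
```

Keeping the token deterministic is a design choice: it lets masked tables still be joined on the pseudonymized key, at the cost of weaker protection than random tokens.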
In some cases, data remediation requires manual intervention for arbitration or correction. This is where collaborative workflows in which anyone can participate matter, and this is where Data Stewardship comes into play.
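The split between automated and manual remediation can be sketched as follows: a rule fixes what it can, and everything else is routed to a review queue for a data steward. This is a hypothetical Python illustration, not a Talend API. The phone rule assumes North American 10-digit numbers, and the queue is a plain list standing in for a stewardship application.

```python
import re
from typing import Optional


def try_fix_phone(raw: str) -> Optional[str]:
    """Assumed rule: normalize North American phone numbers to 10 digits, or return None."""
    digits = re.sub(r"\D", "", raw)
    # Drop a leading country code "1" if present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None


def remediate(records):
    """Auto-fix records the rule can handle; send the rest to a manual review queue."""
    fixed, review_queue = [], []
    for rec in records:
        phone = try_fix_phone(rec.get("phone") or "")
        if phone is not None:
            fixed.append({**rec, "phone": phone})
        else:
            # Keep the original value so a steward can see what arrived.
            review_queue.append({**rec, "reason": "unparseable phone"})
    return fixed, review_queue
```

The automated path handles the bulk of predictable errors, while the review queue keeps humans in the loop only for the cases a rule cannot safely decide.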
What happens next?
Once you have established a data quality culture in every department, with data literacy programs and modern, accessible data quality tools, behaviors will change: people will start fixing bad data and stop polluting data systems with inaccurate or incomplete data.
Some questions to ask yourself, your IT team and your organization:
1. How do you discover your data?
You cannot solve data quality issues without a global view of the state of your data quality.
2. How do you measure the cost of bad data and the ROI of data quality?
Tracking progress will help you highlight the problems and the gains associated with data quality.
3. How do you engage data stewards for data consistency and accountability?
Experts need to be part of the data quality loop. Talend Data Stewardship helps them correct and reduce human errors in data pipelines.
4. How do you automate data quality remediation?
Data quality does not have to be a set of repetitive manual tasks: much of it can be automated. This short video shows what Talend Studio can do with its Data Quality components.
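For question 2, one simple way to start measuring is to track the share of failing records per period and translate it into an estimated cost. In this hypothetical Python sketch, the cost per bad record and the program cost are assumptions you would need to estimate for your own organization, and the ROI formula is the plain (savings minus cost) over cost ratio.

```python
from dataclasses import dataclass


@dataclass
class QualitySnapshot:
    """Counts of checked and failing records for one period (e.g. a quarter)."""
    period: str
    total_records: int
    failing_records: int

    @property
    def quality_rate(self) -> float:
        # Share of records that pass all checks; 1.0 when nothing was checked.
        if self.total_records == 0:
            return 1.0
        return 1 - self.failing_records / self.total_records


def estimated_cost(snapshot: QualitySnapshot, cost_per_bad_record: float) -> float:
    # cost_per_bad_record is an assumed figure (rework time, failed deliveries, etc.).
    return snapshot.failing_records * cost_per_bad_record


def quality_roi(before: QualitySnapshot, after: QualitySnapshot,
                cost_per_bad_record: float, program_cost: float) -> float:
    """Return (savings - program cost) / program cost for the data quality effort."""
    savings = (estimated_cost(before, cost_per_bad_record)
               - estimated_cost(after, cost_per_bad_record))
    return (savings - program_cost) / program_cost


q1 = QualitySnapshot("Q1", total_records=10_000, failing_records=1_200)
q2 = QualitySnapshot("Q2", total_records=10_000, failing_records=300)
print(f"{q1.quality_rate:.0%} -> {q2.quality_rate:.0%}")
print(quality_roi(q1, q2, cost_per_bad_record=5.0, program_cost=2_000.0))
```

With these illustrative numbers, the quality rate rises from 88% to 97% and the estimated savings of 4,500 yield an ROI of 1.25 on a program cost of 2,000. The figures are made up; the value lies in tracking the same measures period over period.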
Want to explore more capabilities?
This is the first of ten trust and speed capabilities. Can’t wait to discover the second one?