In an earlier Talend blog, I shared some key tips that a company needs to keep in mind while designing data quality (DQ) processes in Hadoop. The data processes and frameworks I mentioned in that blog are important because they affect not only your data quality program but also your data governance program (if you have one, of course!). So do you have a data governance program? This question is not easy to answer because data governance as a concept is not fully understood. Your company is probably already doing some data governance without realizing it, which raises the question: what is data governance (DG)?
The Data Governance Institute defines data governance as “a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods.” That’s a loaded answer, but at the end of the day, remember that data governance is really about standards, policies and reusable models.
If you have a Data Warehouse (DW), which is the traditional approach to getting insights from data, you probably already have some data governance frameworks in place in the form of standards for your dimensional tables. So when we talk about best practices in DG, the first step is to understand what DG really means for your company.
Data Governance for YOUR company
I have often heard people say that DG is equivalent to Master Data Management (MDM). There is nothing wrong with that notion, but it is incomplete. Data governance doesn’t need to be just one platform or one concept. In fact, a sound data governance approach can and should involve more than one platform or project. DG is a program in your company that sets rules and standards for data-related matters. For example, if your business needs a sales reporting solution, there will be some governance questions, such as:
- Which internal databases have this information?
- Who has access to it?
- Have we defined what we call a ‘customer’ or a ‘vendor’?
- Are the structures of sales data already defined?
- What is the quality of the source data?
- Are there any metrics around data sizes?
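One way to make these questions concrete is to record the answers per dataset, for example in a lightweight catalog entry. The sketch below is purely illustrative: the `DatasetRecord` class, its field names and the sample values are assumptions made for this example, not part of any specific governance tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry that captures the governance questions above."""
    name: str
    source_databases: List[str] = field(default_factory=list)       # which internal databases hold the data
    authorized_roles: List[str] = field(default_factory=list)       # who has access to it
    entity_definitions: Dict[str, str] = field(default_factory=dict)  # e.g. what we call a "customer"
    schema_defined: bool = False                                     # are the data structures defined?
    quality_score: Optional[float] = None                            # 0.0 to 1.0, None if not yet profiled
    approx_row_count: Optional[int] = None                           # a simple data size metric

    def open_questions(self) -> List[str]:
        """Return the governance questions that are still unanswered."""
        gaps = []
        if not self.source_databases:
            gaps.append("source databases")
        if not self.authorized_roles:
            gaps.append("access rights")
        if not self.entity_definitions:
            gaps.append("entity definitions")
        if not self.schema_defined:
            gaps.append("data structures")
        if self.quality_score is None:
            gaps.append("source data quality")
        if self.approx_row_count is None:
            gaps.append("size metrics")
        return gaps


# Sample values are made up for illustration.
sales = DatasetRecord(
    name="sales_orders",
    source_databases=["crm_db"],
    authorized_roles=["sales_analyst"],
    entity_definitions={"customer": "A party with at least one invoiced order"},
)
print(sales.open_questions())
# ['data structures', 'source data quality', 'size metrics']
```

A record like this gives the governance team a simple checklist of what still has to be decided before the IT teams start building.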
The IT teams will be responsible for delivering the solution and for providing development and infrastructure services, but it would be the responsibility of the data governance team to guide the IT teams on data-related policies and standards. This brings us to the next key consideration.
The Data Governance Council
The council would be responsible for setting the data governance framework for the organization. The DG framework itself should be customized for your company’s specific needs, but in general it could include strategic planning tasks such as determining data needs, developing data policies and guidelines, and planning data management projects. The framework could also include ongoing control tasks such as managing and resolving data-related issues, monitoring data policies, and promoting the value of data assets.
Similar to IT project leadership teams, the Data Governance Council needs to include members from both the business and IT. It is critical to get business buy-in for this program, because business members will be actively involved in the DG tasks.
It is also important to have a flexible org structure for the council. A good practice is to follow a top-down approach in which the council’s leadership drives governance while the business analysts and Data Stewards implement the policies. The Data Stewards are also responsible for providing the necessary feedback to the leadership.
Implementing DG brings a huge change to the organization. That’s why it is imperative for the DG Council to come up with a mission that aligns with the business interests and takes into consideration the strengths of the implementation teams. The mission of the DG program needs to articulate, clearly and succinctly, the main drivers for DG within the organization, and it needs to be communicated repeatedly and consistently through various channels.
Focus area: A data governance program can include a multitude of focus areas, and it is important to pick the area that provides the most value to your company. These initiatives can be at the enterprise level or at the project level. Below are some focus areas, each with a brief description:
- Standards and Policy: This sort of program would collect standards, review existing ones, and check them against the corporate standards. Another main activity is to define a data strategy for the company and provide support for any siloed projects trying to join the enterprise landscape.
- Data Quality (DQ): This kind of program deals with finding, correcting and monitoring data quality issues in the enterprise. These programs normally involve software for profiling, cleansing and matching. DQ initiatives also lead to Master Data Management (MDM) projects, which define the master data and give a 360-degree view of domains such as customer or vendor.
- Data Security and Privacy: Every company has compliance and regulatory requirements, and this program would address them by setting access management rights, information security controls, data privacy procedures, etc., particularly for sensitive data.
- Architecture/Integration: This focus area aims to achieve operational efficiency by simplifying Data Integration Architecture components such as Data Modeling, Master Data Modeling, Service Oriented Architecture etc.
- DW and Business Intelligence (BI): This program promotes building Data Warehouses and Data Marts to support both historical and forward-looking reporting.
- Self-service Architectures: This kind of program takes into consideration the stewardship and Data Preparation challenges and aims to build workflows limiting the ‘shadow IT’ paradigm, which happens so often in organizations.
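To illustrate the profiling step mentioned under the Data Quality focus area, here is a minimal sketch in plain Python. The `profile_column` helper and the sample customer records are hypothetical and exist only for this example; commercial DQ tools go much further (pattern analysis, cleansing, matching).

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def profile_column(rows: Iterable[Dict[str, Any]], column: str) -> Dict[str, Any]:
    """Compute basic profiling statistics for one column.

    None and empty or whitespace-only strings count as missing.
    Assumes the column values are hashable (strings, numbers, ...).
    """
    values: List[Any] = [row.get(column) for row in rows]
    total = len(values)
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "total": total,
        "missing": total - len(present),
        "completeness": len(present) / total if total else 0.0,
        "distinct": len(counts),
        # values that occur more than once are candidates for matching/deduplication
        "duplicates": [v for v, n in counts.items() if n > 1],
    }


# Made-up sample data for illustration.
customers = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "ann@example.com"},
    {"id": 4, "email": None},
]
print(profile_column(customers, "email"))
# {'total': 4, 'missing': 2, 'completeness': 0.5, 'distinct': 1, 'duplicates': ['ann@example.com']}
```

Numbers such as completeness and duplicate counts are also a natural starting point for the metrics a DQ-focused governance program tracks over time.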
It’s a journey, not a destination
It is important to understand that, just like corporate governance, data governance is not a project but an ongoing process. Any ongoing process needs defined goals and a method to measure progress. A recommended approach is to assess progress against a Data Governance Maturity Model. Depending on the focus area you chose for the DG program, your company should also define metrics to measure the success of the program. It is also recommended to use agile practices. Agile methods such as continuous delivery, constant collaboration between IT and business, welcoming change, and continuous attention to technical excellence and good design fit perfectly into data governance practices.
Just like processes and people, technology is a big part of data governance, and technology is ever changing. Whether you are in business or IT, it is recommended to embrace technological innovation. New innovations in machine learning, cloud, and the big data space can make data governance initiatives more effective. For example, building a data lake on Hadoop could make storage of master data and DW data cheaper and processing faster.
There is a very good chance your company already has one of the data governance initiatives successfully implemented. My recommendation is to use that as a base for implementing other focus areas. Have a vision for data governance in your company, get buy-in from business and IT leadership, and improve collaboration between IT and the business. With these initial steps in place, DG can successfully evolve and provide true value to the organization!