The Fundamentals of Data Governance – Part 2

  • Jeff Tyzzer
    Jeff joined Talend in 2017 and has spent the entirety of his career on data. His competencies encompass data engineering, metadata, data quality, and data catalogs. Jeff lives in Sacramento, CA.

Introduction

In part 1 of my post on data governance fundamentals, I introduced the “5 Ws and 1 H” of problem-solving (“What”, “Why”, “Who”, “When”, “Where”, and “How”) and applied the first three to data governance. This part covers how you can apply the last three and suggests some next steps. Let's get started!

The "When" of Data Governance

In 2001 Doug Laney, then an analyst at META Group (later acquired by Gartner), defined the now-ubiquitous “3-Vs” of data: volume, velocity, and variety. Laney’s original context was e-commerce, but most discussions of the 3-Vs today are within the milieu of big data. Laney argued that increases in each of the Vs necessitate changes in how data is managed. Since he wrote his paper, other “Vs” have been added, such as value and veracity. I’ve included this last one since it speaks to data quality.

When organizations reach a threshold in data volumes and varieties where existing ad hoc, reactive methods no longer work, a more disciplined approach to data becomes necessary: in this case, data governance.

Volumes and varieties grow as companies gain market share, acquire companies, or offer additional products and services. In this way, a company’s success may compel a governance practice. As to veracity, or truth, data quality can also be a driver of data governance. As more data is generated and combined, there is more opportunity for data quality issues to arise. If a lot of people within your organization are dedicated to fixing data, that’s a sure sign that data governance is warranted. The right time to start governing your data is likely now.

The "Where" of Data Governance

There are many opinions as to who should own data governance and where it should reside. Some are:

  1. The office of the Chief Data Officer: After all, data strategy is their remit.
  2. The business: Having the business own data governance necessitates that they take an active role in it and makes it more strategically aligned. But which "business"? Compliance?
  3. IT: Data Governance typically originates with them, and they’ll likely administer its enabling technologies, to say nothing of the systems and data stores they already run. Maybe they should do it?
  4. It doesn't matter: Establishing who does "what" is more important than "where" as data governance can succeed irrespective of who manages it.

The best answer, though, is that once data governance is in steady state, the data governance “organization” should be a federation of business and IT personnel ubiquitous throughout the organization with no single owner. Consider a fabric metaphor: To make fabric, you need two threads—the warp and the weft.

In like fashion (see what I did there?), the best kind of data governance is the one woven into the organization chart. You won’t see a data governance department on there, nor will you see human resources titles like “data steward.” Data governance is most successful when its functions are put into place within the existing organizational hierarchy, as an overlay on people’s “regular” jobs. As the DMBoK puts it, data governance should be embedded within existing organizational practices.

The "How" of Data Governance

Data governance exists at the intersection of people, processes, and technologies (fig. 2). As I said earlier, governance is not achieved through technology alone, but technology is critical to its success.

Figure 2

People

There are two principal roles in data governance, stewards and owners. Data stewards are responsible and accountable for data, particularly its control and use as pertains to data’s fitness for its intended purpose(s). Again, data steward isn’t a position on an organization chart. Rather, it’s a function people perform as part of their daily work, with stewards assigned based on a person’s existing relationship to data. Further, there might be different levels of stewards, for example, domain data stewards. I can’t say too much beyond that as details of stewardship are going to be highly specific to the organization.

Data owners are subject matter experts and approve what data stewards do. By subject, I mean data-subject areas such as customer, product, loan, and location: the things against which transactions are executed. Another term sometimes used for a data owner is a data custodian.

Finally, there’s the data governance council, which may sound ominous but needn’t be. The council consists of stewards, owners, and IT staff who meet regularly to discuss and resolve escalated data issues. Think of it as governance’s “supreme court.”

The roles just described together form the data governance organization framework (fig. 3).

Figure 3

Its hierarchy isn’t an aggregation but consists of escalation paths for approvals in cases where consensus can’t be reached at lower levels. Let me hasten to add that the DMBoK estimates that 80-85% of data governance issues can be resolved at these lower levels, with the council needing to arbitrate only around 5% of all issues.

There are many ways to layer the framework roles; I’ve just shown a particularly flexible one. Often, the leaf levels are lines of business (LoBs).
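
As a rough illustration of escalation paths, here is a minimal Python sketch. The level names, the lobs_affected field, and the thresholds are all hypothetical; your own framework may layer the roles quite differently.

```python
# Hypothetical escalation path, lowest level first; a real framework
# may layer the roles differently.
ESCALATION_PATH = ["LoB data steward", "Data owner", "Data governance council"]


def resolve(issue, can_resolve):
    """Return the first level on the path that can resolve the issue.

    can_resolve is a callable (level, issue) -> bool supplied by the caller.
    Returns None if no level on the path can resolve the issue.
    """
    for level in ESCALATION_PATH:
        if can_resolve(level, issue):
            return level
    return None


# Toy rule: each level can settle issues spanning up to a given number of
# lines of business. The thresholds are made up for the example.
MAX_LOBS = {
    "LoB data steward": 1,
    "Data owner": 3,
    "Data governance council": float("inf"),
}


def by_scope(level, issue):
    return issue["lobs_affected"] <= MAX_LOBS[level]
```

Calling resolve({"lobs_affected": 2}, by_scope) walks past the steward and stops at the data owner; only issues no lower level can settle reach the council.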

Federation is a key component of the data governance organization framework. According to Ladley, federation describes the extent to which data governance permeates a given subject area. It is the means by which you blend and stratify the various governance roles and functions across the organization. This matters because some aspects of some subjects, for example, the creation of a new product entity, will likely be more centrally controlled than others. Greater central control might also be justified by an organization’s relative lack of data management maturity.

Processes

When I say “processes” I’m using the term rather loosely as a catch-all for things people need to codify and document. These “things” are, in fact, principles, policies, and standards (fig. 4).

Referring again to Ladley:

  • Principles are statements that guide conduct and are the foundation of the others and of the behaviors they’re meant to guide. An example of a principle might be “Data is an asset that needs to be formally managed.”
  • Policies are a type of process. If principles answer “why”, then policies address “how”. The DMBoK defines policies as “codify[ing] requirements by describing…guidelines for action.” Policies operationalize principles and are enforceable; that is, following them is required.
  • Standards, a kind of policy, establish norms or criteria against which to be evaluated, such as a business glossary term definition standard.

Figure 4

Having established roles and processes, the next step is to map the two. A RACI matrix is ideal for this (fig. 5). The letters stand for, respectively, responsible, accountable, consulted, and informed. The responsible role owns the process, the accountable role signs off on the work, the consulted role has the necessary information, and the informed role must be notified of results.

Figure 5
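
If you'd rather keep the matrix somewhere you can check it automatically, a RACI matrix can be sketched as a simple mapping. This is a minimal Python sketch, not a prescribed implementation: the process and role assignments are hypothetical, and the "exactly one accountable role per process" check is a common RACI convention that I'm assuming here.

```python
from collections import Counter

# Hypothetical processes and role assignments, for illustration only.
RACI = {
    "Define glossary term": {
        "Data steward": "R",
        "Data owner": "A",
        "IT": "C",
        "Data governance council": "I",
    },
    "Resolve data quality issue": {
        "Data steward": "R",
        "Data owner": "A",
        "IT": "C",
    },
}

VALID_CODES = {"R", "A", "C", "I"}


def validate_raci(matrix):
    """Return a list of problems found in the matrix (empty if none)."""
    problems = []
    for process, assignments in matrix.items():
        codes = Counter(assignments.values())
        unknown = set(codes) - VALID_CODES
        if unknown:
            problems.append(f"{process}: unknown codes {sorted(unknown)}")
        if codes["R"] < 1:
            problems.append(f"{process}: no responsible role")
        # Assumed convention: exactly one role signs off on each process.
        if codes["A"] != 1:
            problems.append(f"{process}: expected one accountable role, found {codes['A']}")
    return problems


def roles_with(matrix, process, code):
    """List the roles holding a given RACI code for a process."""
    return sorted(role for role, c in matrix[process].items() if c == code)
```

An empty list from validate_raci(RACI) means every process has someone doing the work and exactly one role signing off on it.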

Tools and Technology

I’ve mentioned on several occasions that data governance cannot be achieved through technology alone. Even so, a governance program is often precipitated by a sponsoring effort that has a significant technology component in the form of data quality and/or metadata tools. These two technology suites are not only complementary but also instrumental in furthering data governance’s goals. (Remember, part of control, a key aspect of DG, is monitoring.) Talend has best-in-class offerings in both of these spaces, which I’ll be covering in my future sessions.

Making the Case 

Recall the principal determinants of business value: revenue, cost, and risk exposure. When making the case for governance, keep these in mind. Having said that, don’t just say “we should govern our data because doing so will increase revenue, reduce costs, and mitigate risk” (although it will :-) and expect to leave it at that. What’s critical to making the case for governance is tying it to business goals. Data governance is intended to be an enabling function, and you may recall that Talend’s definition of data governance includes the phrase “enabling an organization to achieve its goals.”

A good way to begin tying governance to business goals is to do a gap analysis. Seiner suggests keeping in mind the question “what can’t the organization do with the data it has now?” when looking for gaps. Ladley echoes this by recommending a business alignment exercise, which links business processes to data requirements and makes a good next step. If you can tie data issues to business needs, and cash flows to those business needs, then transitively you can put a number on the business value added by governance.
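
To make the transitive step concrete, here is a small Python sketch of the arithmetic. Every issue, business need, and dollar figure below is an invented placeholder, not data from any real organization; the point is only the chain from issue to need to cash flow.

```python
# Illustrative placeholders only: none of these names or figures are real.
issue_to_needs = {
    "duplicate customer records": ["accurate billing", "targeted marketing"],
    "missing product attributes": ["online catalog search"],
}

need_to_annual_cash_flow = {
    "accurate billing": 120_000,
    "targeted marketing": 80_000,
    "online catalog search": 50_000,
}


def value_at_stake(issue):
    """Sum the cash flows of the business needs a data issue touches."""
    return sum(need_to_annual_cash_flow[need] for need in issue_to_needs[issue])


def total_value_at_stake():
    """Sum cash flows over distinct needs so a shared need isn't counted twice."""
    needs = {need for linked in issue_to_needs.values() for need in linked}
    return sum(need_to_annual_cash_flow[need] for need in needs)
```

With these placeholder figures, value_at_stake("duplicate customer records") comes to 200,000 and the total across both issues to 250,000. In practice the hard part is agreeing on the two mappings, not the addition.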

Conclusion

Successful governance is achievable only if an organization is committed to changing its data management behaviors. Once this commitment is made, the thoughtful orchestration of people, business processes, and tools operationalizes the new behavior. This two-part blog introduced the fundamentals of data governance by addressing what it is, why to do it, who should do it, when it should be done, where it should live, and how to do it. Thank you for reading.

References

Data Management Association. Data Management Body of Knowledge. Basking Ridge: Technics Publications, 2017.

Ladley, John. Data Governance: How to Design, Deploy and Sustain an Effective Data Governance Program. Waltham: Morgan Kaufmann, 2012.

Sarsfield, Steve. The Data Governance Imperative. Cambridgeshire: IT Governance Publishing, 2009.

Seiner, Robert. Non-Invasive Data Governance. Basking Ridge: Technics Publications, 2014.

Talend. The Definitive Guide to Data Governance. https://www.talend.com/resources/definitive-guide-data-governance/.
