The Value of Data Quality in Healthcare

Modern cloud data processing and management systems are rapidly being adopted and scaled across the healthcare industry. This has brought invaluable improvements, such as personalized treatment, more effective policy creation, and the ability to share valuable information between complementary organizations. However, it has also introduced significant administrative and operational problems in organizations that are unable to source and manage consistent, high-quality data.

Improving data quality in healthcare begins with understanding the core tenets of data quality management, the value it offers, and the most common problems to avoid. Recognizing the characteristics of good data quality, studying examples from other healthcare companies, and understanding where the discipline is headed will help any healthcare organization master data quality.

What is data quality management?

Data quality management is a set of procedures and technologies for effectively integrating and validating data sources, securely collaborating between trusted parties, handling lifecycle processes such as aggregation and deduplication, and safely sharing the results while protecting sensitive customer information.

Effectively managing data quality is particularly important in the healthcare industry. Electronic health records and reports are not only governed by strict regulations such as HIPAA, but also inform physical treatments and policies. They have a very real and tangible impact on people’s lives.

The value of data quality management in healthcare

Healthcare organizations must source quality data and build strong, well-structured processes to manage it over the long term. By doing so, they can expect both to speed up existing processes and to accumulate insights that support smarter policy decisions for all stakeholders.

Clean health records and downstream datasets can also compound in value over time, as parties across and between organizations share more data and gain more insight into their systems than the siloed data stores of the past allowed. The impact of quality data and management therefore lies not only in performance and efficiency gains, but also in the ability to extract novel insights that manual analysis could never have produced.

4 major problems caused by poor data quality

While maintaining strict data quality may seem like a lower-level operational or implementation concern, it must be taken seriously by all stakeholders. Poor data quality causes problems throughout an organization, affecting everything from treatment quality to policy-level decision making.

  1. Patient frustration and mistreatment. When source data has been improperly entered into an electronic healthcare system, a human has to intervene manually to resolve discrepancies or inaccuracies. This can lead to delays and even mistreatment, both of which result in a poor experience for patients.
  2. Employee distrust of critical technology. When human intervention is prevalent in an organization with poor data quality, employees grow frustrated and begin to distrust the data they are given. Beyond basic morale issues, this can lead employees to fall back on manual reporting and analysis, which increases the likelihood of error and prevents the effective collection of valuable data.
  3. Decrease in efficiency and increase in bottlenecks. When manual interventions to clean incorrect data regularly slow or stop otherwise automatable jobs, employee efficiency suffers dramatically. This can lead to inconsistent procedures and backlogs of work that could have been avoided with more systematic approaches.
  4. Poor and ineffective policy decisions. As healthcare policy-makers and management increasingly rely on large datasets to make smarter, more informed decisions, inaccuracies can lead to decisions that are flawed or statistically unsound. Because decision making is often done on aggregated and transformed datasets, even small inaccuracies in a data source can compound across multiple downstream transformations or joins with larger datasets.

As these problems continue to occur at the many healthcare organizations that have failed to modernize their systems and processes, it is essential that decision makers begin to develop and procure better systems to handle these cases.

Characteristics of good data quality management

A proper, well-maintained data quality management system can help prevent the common issues that arise from poor data quality. A number of characteristics define such a system:

  1. Ensures structure at the source. When data is entered into a system, particularly by a human, it is critical that it be properly structured to allow for cleaning and validation. Unstructured fields add complexity and significant room for error. Good data quality management ensures that data is structured appropriately.
  2. Validates data formatting and preconditions. With input data in a structured format, the system must validate not only the structure and data types of the input but also the higher-level preconditions that dependent systems and aggregations rely upon (see the sketch after this list). These procedures limit unexpected processing failures and reduce logical issues such as duplicated records.
  3. Allows for secure collaboration. Without proper sharing and collaboration capabilities, knowledge of a dataset’s context, provenance, and semantics becomes siloed, which prevents effective long-term maintenance and creates single points of failure. It is also important that these collaboration tools be properly access-controlled and monitored to prevent misuse and leaks of sensitive information.
  4. Has tools for ongoing maintenance and profiling. As systems and data inputs evolve over time, it is critical that monitoring is set up to catch unexpected behavior and to provide the ability to quickly triage and resolve issues.
  5. Properly accounts for the data lifecycle. As the use and governance of data change, the system must handle common procedures such as deduplication, access control changes, retention, and deletion. While a dataset may remain semantically valid over time, the policies surrounding it can and do change, and the dataset must be managed to accommodate these changes in its context.
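
To make the validation idea concrete, here is a minimal sketch in Python. It assumes a simple patient-record feed; the record shape, field names, and precondition rules are all hypothetical, chosen only to illustrate the pattern of checking both structure and higher-level invariants before data flows downstream.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical structured record; these fields are illustrative only.
@dataclass
class PatientRecord:
    record_id: str
    birth_date: date
    visit_date: date

def validate(records: list[PatientRecord]) -> list[tuple[str, str]]:
    """Check structural types plus the higher-level preconditions
    that downstream aggregations rely on."""
    errors = []
    seen_ids = set()
    for r in records:
        # Structural check: date fields must actually be dates.
        if not isinstance(r.birth_date, date) or not isinstance(r.visit_date, date):
            errors.append((r.record_id, "malformed date field"))
            continue
        # Precondition: a visit cannot precede the patient's birth.
        if r.visit_date < r.birth_date:
            errors.append((r.record_id, "visit_date precedes birth_date"))
        # Precondition: record IDs must be unique (guards against duplication).
        if r.record_id in seen_ids:
            errors.append((r.record_id, "duplicate record_id"))
        seen_ids.add(r.record_id)
    return errors
```

In practice, checks like these run at ingestion time, so bad records are rejected or flagged before they ever reach reporting datasets.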

It is important to note that these are not just theoretical concepts. A number of organizations have successfully implemented systems that are governed by these ideas and are reaping the benefits of more stable and robust technology. 

Example: Data quality management at Omaha Children’s Hospital & Medical Center

The Omaha Children’s Hospital handles nearly 300,000 outpatients and 7,000 to 8,000 inpatients per year. As they frequently run incremental updates to their data warehouse to track orders and charges for each and every patient, they must process and manage a vast amount of complex data.

Prior to setting up a robust data quality management system, the Children’s Hospital struggled to keep track of their numerous data transformation job completions and failures. By setting up proper monitoring systems, the staff is now able to determine exactly where such jobs fail. According to Kevin Sherlock, Data Warehouse Administrator at the Children’s Hospital, the hospital is now able to “configure data integration jobs so that if there’s a problem or exception somewhere in the data processing chain, email alerts or other error conditions are created.”
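
The exact configuration is specific to the hospital’s tooling, but the underlying alert-on-exception pattern is easy to sketch. The following Python fragment, with hypothetical host and address names, wraps a single integration job so that any failure produces an email alert before the error propagates to the scheduler.

```python
import smtplib
from email.message import EmailMessage

# Hypothetical settings; a real deployment would pull these from config.
SMTP_HOST = "smtp.example.org"
ALERT_TO = "data-team@example.org"

def run_with_alert(job_name, job_fn):
    """Run one integration job; email the data team if it raises."""
    try:
        job_fn()
    except Exception as exc:
        msg = EmailMessage()
        msg["Subject"] = f"Data job failed: {job_name}"
        msg["From"] = "etl-monitor@example.org"
        msg["To"] = ALERT_TO
        msg.set_content(f"Job '{job_name}' failed with: {exc!r}")
        with smtplib.SMTP(SMTP_HOST) as smtp:
            smtp.send_message(msg)
        raise  # re-raise so the scheduler also records the failure
```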

In addition to monitoring their transformations, the Children’s Hospital has set up systems that allow executives, physicians, and other clinicians to easily extract data and perform any transformation needed for reporting. By enabling collaboration and combining heterogeneous systems, they can now identify unique records across different systems and normalize the data into aggregated secondary datastores for validation and reporting. This has enabled analysis that would otherwise have been unimaginable given the combinatorial complexity of these large systems.
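
A compressed sketch of that normalize-and-deduplicate step might look like the following, using pandas with invented column names and toy data; the real systems and schemas are of course far larger.

```python
import pandas as pd

# Hypothetical extracts from two source systems with different schemas.
orders = pd.DataFrame(
    {"PatientID": ["A1", "A2"], "OrderDate": ["2023-01-05", "2023-01-06"]}
)
charges = pd.DataFrame(
    {"patient_id": ["a1", "A3"], "charge_date": ["2023-01-05", "2023-01-07"]}
)

# Normalize each system's schema to one shared shape.
orders = orders.rename(columns={"PatientID": "patient_id", "OrderDate": "event_date"})
charges = charges.rename(columns={"charge_date": "event_date"})

combined = pd.concat([orders, charges], ignore_index=True)
combined["patient_id"] = combined["patient_id"].str.upper()  # uniform key format
combined["event_date"] = pd.to_datetime(combined["event_date"])

# One row per (patient, date): duplicates across systems collapse here,
# yielding an aggregated secondary store for validation and reporting.
secondary_store = combined.drop_duplicates(subset=["patient_id", "event_date"])
```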

The cloud and the future of data quality management

As healthcare organizations collect ever more data over time, their processing and storage needs expand at a rapid pace. The cloud enables scalability on both fronts: organizations no longer have to continuously reinvest in expensive hardware, and can instead adapt to such change with the economies of scale of cloud computing and storage.

Furthermore, by using the cloud not only to store and process data but also to share and report on it efficiently, organizations can collaborate on and manage their processes more effectively. Cloud data warehouses also lift from IT departments the burden of handling the security and performance implications of managing such complexity on a traditional on-premises system.

While the cloud in and of itself affords flexible and cost-effective scalability, data quality management becomes ever more important as users multiply and data logic grows more complex at that scale. Without proper validation and monitoring, for instance, growing data volume simply renders systems unusable as manual resolution of errors becomes infeasible. These issues will only continue to grow in number and scale, making investment in data quality management an ever more important aspect of a healthy IT system in any large organization.

Getting started with data quality management in healthcare

Data quality management has become essential for healthcare organizations of all kinds. As data processing systems become key components of operational decision making and individualized treatment, poor data quality and management has become a primary inhibitor of operational success and a significant strain on those processes.

Building a comprehensive and trusted data integration system that can deliver with speed is essential to resolving the strains that come from poor data management. Talend Data Fabric offers a suite of applications to help healthcare organizations properly manage their data in all environments – multi-cloud and on-premises – by providing a unified and collaborative system for securely collecting, processing, and governing large amounts of data.

To see how Talend can benefit your organization, try Talend Data Fabric and learn for yourself how data quality management can create a more comprehensive and scalable data system for your entire organization.

Ready to get started with Talend?