From Data Lake to Data Swamp – How the Legacy Trap Stifles Innovation

Companies depend on the rapid evolution of technology to meet their data integration needs. But many find themselves stuck in what is known as the “legacy trap”: the attempt to manage new technology with existing or outdated processes. The result is an inability to implement new data tools effectively and a tremendous waste of resources, especially when it comes to data lakes.

When it comes to data storage, the legacy trap transforms agile data lakes into languishing data swamps. In this scenario, companies are unable to adapt or innovate their data storage strategy as quickly as their competitors. As a result, efficiencies are reduced and competitive edges lost. In this article, we’ll look at exactly how the legacy trap affects data lakes and how cloud integration can prevent data lakes from becoming data swamps.

Data Lakes and the Legacy Trap

The pace of technology innovation today is dizzying. Businesses and organizations trying to stay ahead of the curve may find that their latest data management solutions are outdated upon arrival.

Consider a company that spends two years building a new data storage solution. After 24 months, the company unveils its new data warehouse only to find that:

  1. It’s no longer on the leading edge of innovation
  2. A great deal of time, money, and effort have been spent with little gained in return
  3. The knowledge and skills used to manage the previous data warehouse are no longer effective

This is what we mean by “legacy trap.”

The legacy trap puts businesses and organizations into a frustrating position in which the majority of their resources are spent trying to maintain existing systems and infrastructure. As a result, there’s little time or money left for true innovation. The legacy trap is a no-win situation from which it can be difficult to escape. Nowhere is this more apparent than in the realm of data storage.

Data Lakes — Where Data Flows Clear and Free

The concept of the data lake emerged in response to the dramatic increase in the volume and complexity of data over the past few years. IT departments struggled to keep pace with the vast amount of data that was available to them, and traditional data warehouses were no longer able to meet capacity. Instead of letting data languish or go unused, a new approach to collection and storage was needed.

Data lakes are large, centralized repositories that house massive amounts of data. Data lakes can receive and store data of any type including raw, structured, semi-structured, and unstructured formats. Data lakes are typically used to store data generated by high-velocity, high-volume sources in a continuous stream, such as the Internet of Things (IoT), product logs, or internet transactions. Data lakes are a relatively new phenomenon and not as well known as data warehouses or data marts.

In a data lake, data of all types continually flows into the repository from a variety of sources. Because data is not processed or refined before it enters the lake, it is tagged and managed with metadata, which makes locating specific data much more efficient.
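To illustrate how metadata can make raw data findable, here is a minimal sketch of a catalog that records a few descriptive fields for each object as it lands in the lake. Everything in it is an assumption made for illustration: the class names, the metadata fields, and the file paths are hypothetical, and a real data lake would use a dedicated metadata store rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class CatalogEntry:
    """Metadata describing one raw object stored in the lake (hypothetical fields)."""
    path: str          # where the raw object lives
    source: str        # system that produced the data
    data_format: str   # e.g. "json", "csv", "text"
    tags: set = field(default_factory=set)
    ingested_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class MetadataCatalog:
    """In-memory stand-in for a real metadata store."""

    def __init__(self):
        self._entries = []

    def register(self, path, source, data_format, tags=()):
        """Record metadata at ingestion time; the raw data itself is untouched."""
        entry = CatalogEntry(path, source, data_format, set(tags))
        self._entries.append(entry)
        return entry

    def find(self, source=None, tag=None):
        """Return entries that match every filter that was supplied."""
        results = []
        for entry in self._entries:
            if source is not None and entry.source != source:
                continue
            if tag is not None and tag not in entry.tags:
                continue
            results.append(entry)
        return results


catalog = MetadataCatalog()
catalog.register("raw/iot/device-17.json", "iot", "json", tags={"telemetry"})
catalog.register("raw/web/orders.csv", "web", "csv", tags={"transactions"})
catalog.register("raw/app/service.log", "app", "text", tags={"logs", "telemetry"})

# Locate data by what it is, not by where it happens to be stored.
telemetry_paths = [entry.path for entry in catalog.find(tag="telemetry")]
```

The point of the sketch is that the lookup runs against the metadata alone. A lake whose objects are never registered this way offers nothing to search, which is how it starts turning into a swamp.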

Unfortunately, many data lakes don't succeed. One major factor in these failures is the legacy trap: instead of a smoothly flowing lake filled with useful data, many organizations find their data in the dark and dirty waters of a data swamp.

Data Swamps — Where Good Data Goes to Die

Companies find data lakes attractive because they require little structure and can accept virtually any type of data. However, data lakes that are poorly designed or neglected become more of a liability than an asset. This is what we refer to as a “data swamp.”

A swamp is the perfect analogy for a mismanaged data lake: the murky water makes it impossible to see what’s beneath the surface, and uncontrolled plant growth and debris make navigation difficult. This is essentially what happens to a neglected data lake. It becomes nearly impossible to find the data you’re looking for or to mine it for actionable business insights.

Data Lake Failure — 5 Reasons

One of the main reasons that data lakes become data swamps is the legacy trap. Companies go to great effort to implement a data lake solution, but continue clinging to policies and processes that weren’t designed to manage the volume and variety of data contained within a data lake. Here are 5 reasons why the legacy trap causes data lakes to fail.

Skills Gap

The skills needed to build or maintain a data warehouse are not the same skills needed to build a data lake. A data lake project will require someone with experience in big data engineering, as well as a team with the capacity for coding, building, and managing the data lake throughout the implementation process. Otherwise, you risk overwhelming your IT and developer teams and creating a data lake that never reaches its full potential.

Data Architecture

The legacy trap makes it difficult to create a data architecture that is agile and portable. In the era of cloud computing, platforms are updated and new applications emerge every day. A data lake without flexibility and adaptability can miss out on new services, and implementing changes later may bring unnecessary transition costs. Even worse, users may find that their data lakes are failing because they were not designed to scale.
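One common way to keep an architecture portable is to have ingestion code depend on a small storage interface rather than on a specific platform. The sketch below is only an illustration of that idea under stated assumptions: the `ObjectStore` interface, the `InMemoryStore` backend, and the `ingest` function are hypothetical names, not part of any particular product.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface that any storage backend must satisfy."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend; a cloud or on-premises store would offer the same methods."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


def ingest(store: ObjectStore, key: str, payload: bytes) -> int:
    """Write a raw payload and return the number of bytes stored.

    The function only knows about the interface, so swapping the backend
    does not require changing the ingestion logic.
    """
    store.put(key, payload)
    return len(store.get(key))


store = InMemoryStore()
stored_bytes = ingest(store, "raw/web/orders.csv", b"id,total\n1,9.99\n")
```

Because `ingest` never names a concrete platform, moving to a different storage service means writing one new backend class rather than rewriting every pipeline that touches the lake.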

Data Quality

The size of the global data sphere is projected to reach 163 ZB by the year 2025. That’s roughly the equivalent of watching the entire Netflix catalog 489 million times. With so much data being produced, processed, and stored, opportunities for errors are everywhere. In legacy trap situations, the tools needed to cleanse, profile, and process data may lag behind the data itself. Dirty data can generate enormous waste in terms of both time and money, as companies struggle to solve data quality issues.
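As a rough illustration of what profiling data involves, the sketch below counts two of the most common quality problems, missing values and exact duplicate records, in a small batch. The function name, the field names, and the sample records are all hypothetical, and dedicated data quality tools cover far more than this.

```python
from collections import Counter


def profile_records(records, required_fields):
    """Summarize basic quality problems in a list of dict records."""
    missing = Counter()   # how many records lack a usable value, per field
    seen = set()          # fingerprints of records already encountered
    duplicates = 0

    for record in records:
        for name in required_fields:
            value = record.get(name)
            if value is None or value == "":
                missing[name] += 1

        # Sorting the items makes the fingerprint independent of key order.
        fingerprint = tuple(sorted(record.items()))
        if fingerprint in seen:
            duplicates += 1
        else:
            seen.add(fingerprint)

    return {
        "rows": len(records),
        "missing": dict(missing),
        "duplicates": duplicates,
    }


# Hypothetical sensor readings with typical defects.
records = [
    {"device_id": "a1", "temp_c": 21.5},
    {"device_id": "a1", "temp_c": 21.5},   # exact duplicate
    {"device_id": "", "temp_c": 19.0},     # empty identifier
    {"device_id": "b2", "temp_c": None},   # missing reading
]

report = profile_records(records, ["device_id", "temp_c"])
```

Running a check like this at ingestion time, and storing the report alongside the data, gives teams an early signal that a source has started producing dirty data instead of discovering it during analysis.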

Security and Compliance

A data lake is a great option for storing data from many different sources, but securing it can be a challenge. Under GDPR and other regulatory compliance standards, organizations are liable for how they handle that data, and failures carry the risk of financial penalties and damaged reputations.

The legacy trap creates security and compliance vulnerabilities because it hinders the implementation of the most current data security protocols.

Undefined Goals

Many data lake projects fail because they don’t begin with a business goal in mind and because buy-in has not been secured from all the stakeholders. Instead, the data lake is created, raw data is dumped, and no plan is established for how the data will be used or how insights will be gleaned. Without a clear course of action and strong partnerships between IT and business teams, the data lake will never live up to its full potential.

Cloud Integration — How to Solve Your Legacy Trap Problem

The legacy trap is a real problem for many companies, but it’s also avoidable. Cloud integration offers platforms, applications, and solutions whose adaptability and elasticity address the legacy trap directly. For cloud data lakes, that means continual access to evolving technology at a much faster pace than on-premises or legacy systems can achieve.

Cloud-native data lakes can take advantage of the scalability, security, and data management features built into cloud platforms. And because those platforms and applications are managed by third-party vendors, your internal team’s talent won’t be diverted to damage control: they’ll remain free to make the most of the data in your lake and to keep driving the innovation that moves your company forward.

Talend Cloud Integration Platform provides a full range of ETL, data processing, and data management tools to help your data lake thrive. And with over 900 connectors, you’ll have total control of your data in any format, no matter where it’s stored. Download a free trial of Talend Cloud Integration Platform to see what it’s like to escape the legacy trap for good.

Last Updated: January 28th, 2019