What is a Data Mart?

In a market dominated by big data and analytics, data marts are one key to efficiently transforming information into insights. Data warehouses typically deal with large datasets, but data analysis requires readily available data that’s easy to find. Should a business person have to perform complex queries just to access the data they need for their reports? No — and that’s why so many smart companies use data marts.

A data mart is a subject-oriented database that is often a partitioned segment of an enterprise data warehouse. The subset of data held in a data mart typically aligns with a particular business unit like sales, finance, or marketing. Data marts accelerate business processes by allowing access to relevant information in a data warehouse or operational data store within days — as opposed to months or longer. Because a data mart only contains the data applicable to a certain business area, it is a cost-effective way to gain actionable insights quickly.

Data mart vs. data warehouse

Data marts and data warehouses are both highly structured repositories where data is stored and managed until it’s needed. However, they differ in the scope of data stored: data warehouses are built to serve as the central store of data for the entire business, whereas a data mart fulfils the needs of a specific division or business function. Because a data warehouse contains data for the entire company, best practice is to strictly control who can access it. Additionally, finding and querying just the data you need in a data warehouse can be a difficult task for business users. Thus, the primary purpose of a data mart is to isolate, or partition, a smaller set of data from the whole to provide easier data access for the end consumers.

A data mart can be created from an existing data warehouse (the top-down approach) or from other sources, such as internal operational systems or external data. Similar to a data warehouse, it is a relational database that stores transactional data (time value, numerical order, reference to one or more objects) in columns and rows, making it easy to organise and access.

On the other hand, separate business units may create their own data marts based on their own data requirements. If business needs dictate, multiple data marts can be merged together to create a single data warehouse. This is the bottom-up development approach.

Characteristic      Data Mart                 Data Warehouse
Size                < 100 GB                  100 GB +
Subject             Single Subject            Multiple Subjects
Scope               Line-of-Business          Enterprise-wide
Data Sources        Few Sources               Many Source Systems
Data Integration    One Subject Area          All Business Data
Time to Build       Minutes, Weeks, Months    Many Months to Years

Data mart vs. data lake

A data mart is also different from a data lake. Data lakes serve as central repositories for raw, unstructured, semi-structured, or structured data that can be stored, then accessed and processed later. A data lake applies schema-on-read: structure is imposed at the time of analysis rather than when the data is written. Its data may or may not be curated, which means its quality is not ensured.
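To make schema-on-read concrete, here is a minimal Python sketch. The raw records, the field names (order_id, amount) and the read_orders helper are all hypothetical, invented for illustration; the point is only that no structure is enforced when the data is stored, and the reader decides which fields and types it needs at analysis time.

```python
import json

# Raw records as they might sit in a data lake: nothing was validated on write.
# The field names and values are hypothetical, for illustration only.
RAW_EVENTS = [
    '{"order_id": 1, "amount": "19.99", "region": "EMEA"}',
    '{"order_id": 2, "amount": 5, "channel": "web"}',
    'not valid json',
]


def read_orders(raw_lines):
    """Apply a schema while reading: keep order_id and amount, coerce their types."""
    orders = []
    for line in raw_lines:
        try:
            record = json.loads(line)
            orders.append(
                {"order_id": int(record["order_id"]), "amount": float(record["amount"])}
            )
        except (ValueError, KeyError, TypeError):
            # Uncurated data: rows that do not fit the schema are skipped.
            continue
    return orders


orders = read_orders(RAW_EVENTS)
```

A data mart works the other way round: the structure is fixed before the data is loaded, so business users can query it without this kind of defensive parsing.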

Typically, data scientists, data developers, data engineers, and data architects use data lakes. Typical uses include machine learning, exploratory analytics, streaming, operational analytics, big data, profiling, and data discovery.

Types of data marts

There are three types of data marts: dependent, independent, and hybrid. They are categorised based on their relation to the data warehouse and the data sources that are used to create the system.

1. Dependent data marts

A dependent data mart is created from an existing enterprise data warehouse. It is the top-down approach that begins with storing all business data in one central location, then extracting a clearly defined portion of the data when needed for analysis.
To form a data mart, a specific set of data is aggregated (formed into a cluster) from the warehouse, restructured, then loaded to the data mart where it can be queried. The data mart can be a logical view or physical subset of the data warehouse:

  • Logical view: A virtual table/view that is logically — but not physically — separated from the data warehouse
  • Physical subset: Data extract that is a physically separate database from the data warehouse

Granular data — the lowest level of data in the target set — in the data warehouse serves as the single point of reference for all dependent data marts that are created.
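The difference between a logical view and a physical subset can be sketched with Python's built-in sqlite3 module. The table and column names (warehouse_orders, sales_mart_view, sales_mart_copy) are assumptions made for this example, and for brevity the physical copy lives in the same in-memory database, whereas a real physical subset would usually be a separate database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the enterprise data warehouse (hypothetical table).
    CREATE TABLE warehouse_orders (
        order_id INTEGER PRIMARY KEY, department TEXT, amount REAL
    );
    INSERT INTO warehouse_orders VALUES
        (1, 'sales', 120.0), (2, 'finance', 80.0), (3, 'sales', 45.5);

    -- Logical view: no data is copied; queries run against the warehouse table.
    CREATE VIEW sales_mart_view AS
        SELECT order_id, amount FROM warehouse_orders WHERE department = 'sales';

    -- Physical subset: the matching rows are copied into their own table.
    CREATE TABLE sales_mart_copy AS
        SELECT order_id, amount FROM warehouse_orders WHERE department = 'sales';
""")

# A new sales order arrives in the warehouse after the mart was built.
conn.execute("INSERT INTO warehouse_orders VALUES (4, 'sales', 10.0)")

view_rows = conn.execute("SELECT COUNT(*) FROM sales_mart_view").fetchone()[0]
copy_rows = conn.execute("SELECT COUNT(*) FROM sales_mart_copy").fetchone()[0]
```

In this sketch the view sees the new order immediately (three sales rows), while the physical copy still holds two rows until it is reloaded. That is the usual trade-off: a view is always current but puts its query load on the warehouse, while a physical subset isolates the workload at the cost of a refresh step.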

2. Independent data marts

An independent data mart is a stand-alone system — created without the use of a data warehouse — that focuses on one subject area or business function. Data is extracted from internal or external data sources (or both), processed, then loaded to the data mart repository where it is stored until needed for business analytics.

Independent data marts are not difficult to design and develop. They are beneficial for achieving short-term goals but may become cumbersome to manage — each with its own ETL tool and logic — as business needs expand and become more complex.
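As a rough illustration of the extract, process, and load flow behind an independent data mart, the sketch below reads a small CSV export, cleans it, and loads it into a stand-alone SQLite database. The source data, the sales_mart table, and the three helper functions are hypothetical; a production mart would typically rely on a dedicated ETL tool rather than hand-written code.

```python
import csv
import io
import sqlite3

# Hypothetical export from an operational system, standing in for a real source.
SOURCE_CSV = """order_id,region,amount
1,emea,100.50
2,apac,20.00
3,emea,
"""


def extract(text):
    """Read the raw CSV rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows without an amount, normalise the region, and coerce types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append((int(row["order_id"]), row["region"].upper(), float(row["amount"])))
    return cleaned


def load(conn, rows):
    """Create the mart table if needed and insert the cleaned rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_mart "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_mart VALUES (?, ?, ?)", rows)
    conn.commit()


mart = sqlite3.connect(":memory:")  # the stand-alone data mart
load(mart, transform(extract(SOURCE_CSV)))
total_by_region = dict(
    mart.execute("SELECT region, SUM(amount) FROM sales_mart GROUP BY region")
)
```

Each independent mart carries its own version of this logic, which is why maintenance effort tends to grow as the number of marts increases.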

3. Hybrid data marts

A hybrid data mart combines data from an existing data warehouse and other operational source systems. It brings together the enterprise-level integration of the top-down approach with the speed and end-user focus of the bottom-up method.
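One way to picture a hybrid data mart is a load step that enriches rows from an operational system with reference data already held in the warehouse. The following sketch assumes a hypothetical warehouse_customers table and a hypothetical list of campaign click records; neither comes from a real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Curated customer data that already lives in the warehouse (hypothetical).
    CREATE TABLE warehouse_customers (customer_id INTEGER PRIMARY KEY, segment TEXT);
    INSERT INTO warehouse_customers VALUES (1, 'enterprise'), (2, 'smb');

    -- The hybrid mart combines warehouse attributes with operational facts.
    CREATE TABLE marketing_mart (
        customer_id INTEGER, segment TEXT, campaign TEXT, clicks INTEGER
    );
""")

# Hypothetical rows from an operational campaign system, not yet in the warehouse.
operational_clicks = [
    {"customer_id": 1, "campaign": "spring", "clicks": 12},
    {"customer_id": 2, "campaign": "spring", "clicks": 3},
    {"customer_id": 9, "campaign": "spring", "clicks": 7},  # unknown to the warehouse
]

for row in operational_clicks:
    match = conn.execute(
        "SELECT segment FROM warehouse_customers WHERE customer_id = ?",
        (row["customer_id"],),
    ).fetchone()
    segment = match[0] if match else "unknown"
    conn.execute(
        "INSERT INTO marketing_mart VALUES (?, ?, ?, ?)",
        (row["customer_id"], segment, row["campaign"], row["clicks"]),
    )
conn.commit()

clicks_by_segment = dict(
    conn.execute("SELECT segment, SUM(clicks) FROM marketing_mart GROUP BY segment")
)
```

Labelling unmatched customers as "unknown" is a design choice made for this example; a real pipeline might instead reject such rows or queue them for review.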

Structure of a data mart

Similar to a data warehouse, a data mart may be organised using a star, snowflake, vault, or other schema as a blueprint. IT teams typically use a star schema consisting of one or more fact tables (a set of metrics relating to a specific business process or event) referencing dimension tables (primary key joined to a fact table) in a relational database.

The benefit of a star schema is that fewer joins are needed when writing queries, as there is no dependency between dimensions. This simplifies both queries and the ETL process, making the data easier for analysts to access and navigate.
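A small star schema can be sketched as follows, again using sqlite3. The fact_sales, dim_product, and dim_date tables and their contents are invented for illustration. Each dimension joins directly to the fact table, so a typical report needs only one join per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one row per key.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: metrics for one business process, keyed by the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        units INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO fact_sales VALUES
        (1, 20240101, 3, 30.0), (2, 20240101, 1, 12.0), (1, 20240201, 2, 20.0);
""")

# Revenue by category and month: one join per dimension, no dimension-to-dimension joins.
revenue_by_category = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
```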

In a snowflake schema, dimensions are not held in single, flat tables. They are normalised into multiple related tables to help reduce data redundancy and protect data integrity. The dimension tables take less space to store, but the structure is more complicated (multiple tables to populate and synchronise) and can be difficult to maintain.
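For comparison, here is a hedged sketch of a snowflaked product dimension. The table names are hypothetical. The category name is stored once in dim_category instead of being repeated on every product row, and the query pays for that with an extra join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The product dimension is normalised: categories move to their own table.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        revenue REAL
    );

    INSERT INTO dim_category VALUES (1, 'Hardware'), (2, 'Books');
    INSERT INTO dim_product VALUES (10, 'Widget', 1), (11, 'Gadget', 1), (12, 'Manual', 2);
    INSERT INTO fact_sales VALUES (10, 30.0), (11, 15.0), (12, 12.0);
""")

# Reaching the category name now takes two joins: fact to product, product to category.
snowflake_revenue = conn.execute("""
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
```

Renaming a category here means updating a single row in dim_category, which is the integrity benefit of normalisation; the cost is one more table to populate and keep in sync.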

Advantages of a data mart

Managing big data and gaining valuable business insights from it is a challenge all companies face — one that most are answering with strategic data marts. Here’s why:

  • Efficient access — A data mart is a time-saving solution for accessing a specific set of data for business intelligence.
  • Inexpensive data warehouse alternative — Data marts can be an inexpensive alternative to developing an enterprise data warehouse, where required datasets are smaller. An independent data mart can be up and running in a week or less.
  • Improved data warehouse performance — Dependent and hybrid data marts can improve the performance of a data warehouse by taking on the burden of processing to meet the needs of the analyst. When dependent data marts are placed in a separate processing facility, they significantly reduce analytics processing costs as well.

Other advantages of a data mart include:

  • Data maintenance — Different departments can own and control their data.
  • Simple setup — The simple design requires less technical skill to set up.
  • Analytics — Key performance indicators (KPIs) can be easily tracked.
  • Easy entry — Data marts can be the building blocks of a future enterprise data warehouse project.

The future of data marts is in the cloud

Even with the improved flexibility and efficiency that data marts offer, big data — and big business — is still becoming too big for many on-premises solutions. As data warehouses and data lakes move to the cloud, so do data marts.

With a shared, cloud-based platform to create and house data, access and analytics become much more efficient. Transient data clusters can be created for short-term analysis, or long-lived clusters can come together for more sustained work. Modern technologies are also separating data storage from compute, allowing ultimate scalability for querying data.

Other advantages of cloud-based dependent and hybrid data marts include:

  • Flexible architecture with cloud-native applications
  • Single repository containing all data marts
  • Resources consumed on-demand
  • Immediate real-time access to information
  • Increased efficiency
  • Consolidation of resources that lowers costs
  • Real-time, interactive analytics

Getting started with data marts

Companies are facing an endless, ever-growing amount of information and a constantly evolving need to parse that information into manageable chunks for analytics and insights. Data marts in the cloud provide a long-term, scalable solution. To create a data mart, be sure to find an ETL tool that will allow you to connect to your existing data warehouse or other essential data sources that your business users need to draw insights from. In addition, make sure that your data integration tool can regularly update the data mart to ensure that your data, and the resulting analytics, are up to date.
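What a regular update of the data mart might involve can be sketched as an incremental refresh. The example below is only an illustration under stated assumptions: the tables are hypothetical, and it assumes order_id values only ever increase, so the highest order_id already in the mart can serve as a watermark. An integration tool would normally run such a job on a schedule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_orders (
        order_id INTEGER PRIMARY KEY, department TEXT, amount REAL
    );
    CREATE TABLE sales_mart (order_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO warehouse_orders VALUES (1, 'sales', 120.0), (2, 'finance', 80.0);
""")


def count_rows(connection):
    """Return the current number of rows in the mart."""
    return connection.execute("SELECT COUNT(*) FROM sales_mart").fetchone()[0]


def refresh_sales_mart(connection):
    """Copy sales rows newer than the mart's watermark; return how many were added."""
    before = count_rows(connection)
    # Watermark: highest order_id already loaded (assumes ids only increase).
    watermark = connection.execute(
        "SELECT COALESCE(MAX(order_id), 0) FROM sales_mart"
    ).fetchone()[0]
    connection.execute(
        "INSERT INTO sales_mart "
        "SELECT order_id, amount FROM warehouse_orders "
        "WHERE department = 'sales' AND order_id > ?",
        (watermark,),
    )
    connection.commit()
    return count_rows(connection) - before


first_run = refresh_sales_mart(conn)   # initial load
conn.execute("INSERT INTO warehouse_orders VALUES (3, 'sales', 45.5)")
second_run = refresh_sales_mart(conn)  # picks up only the new order
third_run = refresh_sales_mart(conn)   # nothing new, so nothing is copied
```

A watermark only catches new rows. If existing warehouse rows can change or be deleted, the refresh would need a different strategy, such as comparing timestamps or rebuilding the mart.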

Talend Data Management Platform helps teams work smarter with an open, scalable architecture and simple, graphical tools to help transform and load applicable data sources to create a new data mart. Additionally, Talend Data Management Platform simplifies maintaining existing data marts by automating and scheduling integration jobs needed to update the data mart.

With Talend Open Studio for Data Integration, you can connect to technologies like Amazon Web Services Redshift, Snowflake, and Azure Data Warehouse to create your own data marts, leveraging the flexibility and scalability of the cloud.
