Data Warehouse Tools and Optimization

Leverage Hadoop to improve operational performance

Data warehouses are an important tool for gaining business insight into your customers and operations. However, as companies increase the amount of information they store and analyze, the operational and licensing costs of an enterprise data warehouse (EDW) can become prohibitive. For example, if transactional volumes in your business double, if new regulations require data to be retained twice as long, or if the amount of data (structured, semi-structured, unstructured) you need to analyze doubles, your data warehouse licensing costs could double as you scale out systems.

To reduce operational costs and improve data warehouse performance, firms are optimizing how they use and archive data: infrequently used and aged data can be offloaded to less expensive Hadoop systems (20X cheaper or more) for storage, yet still be quickly retrieved for analysis. Companies determine how much "data in use" they need, e.g. 30, 45, 90 or 180 days, and then employ a sliding window that archives anything older to Hadoop.
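As a rough illustration of that sliding window, the sketch below builds a Sqoop import that offloads rows older than the "data in use" cutoff from the warehouse to HDFS. The table, date column, JDBC URL and retention window are hypothetical placeholders, and Sqoop options can vary by version; treat it as an outline rather than a finished job.

"""Minimal sketch of a sliding-window archive job (hypothetical names).

Rows older than the "data in use" window are exported from the warehouse
to HDFS with Sqoop; the warehouse keeps only the fresh window.
"""
import subprocess
from datetime import date, timedelta

DATA_IN_USE_DAYS = 90                                # e.g. 30, 45, 90 or 180 days
JDBC_URL = "jdbc:oracle:thin:@edw-host:1521/EDW"     # hypothetical warehouse
TABLE = "orders"                                     # hypothetical fact table
DATE_COLUMN = "order_date"

def archive_older_than(days: int) -> None:
    cutoff = date.today() - timedelta(days=days)
    target_dir = f"/archive/{TABLE}/cutoff={cutoff.isoformat()}"
    cmd = [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", "etl_user",
        "--password-file", "/user/etl/.edw_password",
        "--table", TABLE,
        "--where", f"{DATE_COLUMN} < '{cutoff.isoformat()}'",
        "--target-dir", target_dir,
        "--num-mappers", "4",
    ]
    subprocess.run(cmd, check=True)
    # After the export is verified, the same cutoff would drive a delete
    # (or partition drop) in the warehouse so only fresh data remains.

if __name__ == "__main__":
    archive_older_than(DATA_IN_USE_DAYS)

In practice a job like this would be scheduled (for example nightly via Oozie or cron) and paired with a matching purge of the exported rows from the warehouse.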

The benefits of optimizing your data warehouse with Hadoop are:
  • A dramatically lower cost per terabyte of stored data.
  • The ability to store more information across the data warehouse and Hadoop while keeping costs flat (a rough cost comparison follows this list).
  • Better data warehouse performance for queries, reporting and analytics, since the warehouse holds only “fresh” data.
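To make those cost claims concrete, here is a back-of-the-envelope comparison. The per-terabyte figures and data volumes are illustrative assumptions, not vendor quotes; the only relationship carried over from above is Hadoop storage being roughly 20X cheaper.

# Back-of-the-envelope blended storage cost (illustrative figures only).
EDW_COST_PER_TB = 20_000.0    # assumed fully loaded $/TB/year for the EDW
HADOOP_COST_PER_TB = 1_000.0  # assumed $/TB/year for Hadoop (~20X cheaper)

total_tb = 100                # total data under management
fresh_tb = 20                 # "data in use" kept in the warehouse
archived_tb = total_tb - fresh_tb

all_in_edw = total_tb * EDW_COST_PER_TB
blended = fresh_tb * EDW_COST_PER_TB + archived_tb * HADOOP_COST_PER_TB

print(f"All data in the EDW:          ${all_in_edw:,.0f}/year")
print(f"Fresh in EDW, rest in Hadoop: ${blended:,.0f}/year")
# With these assumptions the blended approach costs roughly a quarter as
# much per year, even though the total volume under management is unchanged.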

 

Obstacles: Quickly Integrating Hadoop

There are numerous challenges to quickly integrating Hadoop into a data warehouse optimization project:

  • Limited big data resources. Although it is growing, the pool of big data developers is still relatively small.
  • Lack of tooling. Even with big data developers on staff, you do not want to take on the development and maintenance costs of complex big data code, or of manually wiring systems together for reliable, secure and scalable connectivity.
  • Project governance. You need a way to manage and govern the data that moves between the systems.

 

Solution: Talend Big Data

Talend simplifies the process of moving data between traditional data sources, Hadoop and enterprise data warehouses. Talend’s big data solution provides:

  • Comprehensive big data support, including Hadoop HDFS, HBase, Hive, Pig, Sqoop and BigQuery, certified for all major Hadoop distributions – Amazon EMR, Cloudera, Hortonworks, MapR, Greenplum/Pivotal. Talend extends this with over 800 components that allow integration with nearly any application, warehouse or database.
  • Easy-to-use, graphical code-generation tools that simplify big data integration without writing or maintaining complex big data code, reducing time-to-market through drag-and-drop creation and configuration, prebuilt packages and documented examples based on real-world experience.
  • Built-in data quality and governance. Administer and manage even the most complex teams and projects, whose members have different roles and responsibilities.
  • Openness. Talend Big Data is powered by the most widely used open source projects in the Apache community. A large, collaborative community and software created through open standards and development processes eliminate vendor lock-in.

 

Talend Products

Talend Big Data

Talend Open Studio for Big Data combines big data components for MapReduce, Hadoop, HBase, Hive, HCatalog, Oozie, Sqoop and Pig into a unified open source environment so you can quickly load, extract, transform and process large and diverse data sets from disparate systems. Talend Enterprise Big Data adds team collaboration, advanced management features, indemnification and support.


Talend Platform for Big Data

Talend Platform for Big Data is a powerful and versatile big data integration and data quality solution that simplifies the loading, extraction and processing of large and diverse data sets so you can make more informed and timely decisions. This offering expands on the Enterprise version with data quality, clustering and advanced support.
