DM Review | Information Is Your Business
What is Really Behind the Success of Data Warehouse Appliances?

  Article published in DM Direct Newsletter
February 2, 2007 Issue
 
  By Phil Francisco

Over the past four years, data warehouse appliances have become a disruptive force in the data warehousing market, increasingly displacing systems built on traditional computing architectures. The market is growing rapidly, with a projected compound annual growth rate (CAGR) of 70 percent through 2010, and its success is borne out by well over 100 large brand-name companies worldwide that are implementing data warehouse appliances as a key component of their strategic business intelligence initiatives.[1] According to Gartner, data warehouse appliances are projected to reach mainstream market adoption within two to five years.[2]

But what is behind all this growth? Why are data warehouse appliances consistently able to outperform general-purpose systems to uncover deeply buried customer and operational trends? What accounts for their low total cost of ownership and ease of use in the data center?

In this article, you will see why data warehouse appliances are a proven long-term enterprise solution with advantages in performance, cost, administrative simplicity, space, power and cooling - key factors for enterprises that depend on high-performance analytics. I'll show what's "under the hood" of a true data warehouse appliance - how a design based on data streaming rather than general-purpose architectures is ideal for data analytics. And you'll see why an integrated appliance built with commodity components provides a sustainable and ever-increasing performance advantage.

Meeting a Market Need

Data warehouse appliances emerged in response to a market need: general-purpose processing architectures (those based on Intel chips, for example) are not designed for the type of processing required in data warehousing. As the data volumes, analytic demands and number of users grow, query performance slows - not a good thing for businesses that depend on fast, complex analytics. And because of the limitations of the underlying architecture for data warehousing, the systems are expensive to scale, complex to administer and time-consuming to deploy.

General-purpose servers operate by reading data off disk, bringing it across an I/O connection and loading it into memory for processing. This approach has been used by generations of PCs and general-purpose servers for standard computing applications - whether processing an invoice or accessing a Web site - that are characterized by operations on individual data elements. But the method works poorly when shuttling huge blocks of data back and forth across backplanes and I/O channels for analysis, with bottlenecks in these shared resources adversely impacting performance. Continuing advances in chip technology have not been able to resolve this limitation of the underlying architecture. As a result, data warehouse systems bound to general-purpose architectures can't keep pace with today's BI demands.

Unlike general-purpose computing, analytic applications require examining massive volumes of data as smoothly and efficiently as possible. Data warehouse appliances were created specifically to meet this need, with an architecture designed for quickly querying terabytes of data and integrating components best suited for this task.

What Constitutes a Data Warehouse Appliance?

Since the data warehouse appliance approach began receiving widespread attention in 2003, a number of new entrants (from industry behemoths to the smallest startups) have tried to stake their claims to it. One way to recognize a true data warehouse appliance is to see if it meets a few basic criteria - which, in many ways, reflect the qualities of any good appliance:

Purpose-built for performance. A data warehouse appliance is a fully integrated device built for a single purpose: to enable real-time BI and analytics on terabytes of data. As such, it is not bound to general-purpose computing architectures, but starts with a clean slate to meet the challenge in the most effective and efficient way.

Based on commodity components. The architecture makes the best use of commodity hardware for powerful and economic deployment. Off-the-shelf processors, storage and other modules can be replaced as more powerful versions emerge, allowing continuing increase in performance without being bound to a particular vendor.

Simple to install and use. A true data warehouse appliance requires no tuning, indexing, partitioning or aggregations. Like any good household appliance, it's easy to deploy and maintain, with installation in hours and the ability to have a large data warehouse up and running in a day or so.

Low acquisition and ongoing costs. Appliances cost less to buy, own and maintain - even for a large enterprise data warehouse implementation of 100 terabytes or more.

Enterprise compatibility. A data warehouse appliance uses standards-based interfaces and plug-and-play integration with all major BI and data integration vendors.

Low power, cooling and space consumption. A true appliance delivers high performance in a compact footprint with modest consumption of electrical power. Heat generation is a fraction of that of conventional architectures, eliminating the need for skip-a-row equipment patterns to keep the system cool.

Under the Hood: Where Does the Performance Edge Come From?

Like many conventional large data warehouse systems, a data warehouse appliance derives its processing power from a massively parallel processing (MPP) array of nodes. The nodes are deployed in a "shared nothing" arrangement, which is a very efficient way of combining many nodes in a highly parallel environment. In traditional MPP solutions, the cost of each node and the complexity added by every additional node prevent any high degree of parallelism-by-hardware. A data warehouse appliance, by contrast, commonly deploys dozens, hundreds or even more query processing nodes in a single package.

At the front end of an appliance, one or more Linux hosts are responsible for managing and prioritizing the workload among the nodes and aggregating the results. In addition to optimizing overall query performance, the host gives the data warehouse appliance broad enterprise compatibility, running a powerful MPP architecture within a standard Linux box that is simple to integrate into a company's IT infrastructure.
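The division of labor described above, with shared-nothing nodes each working on their own slice of data and a host combining the results, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular appliance is implemented: the function names (`partition`, `node_partial_sum`, `host_merge`, `run_query`), the round-robin distribution and the (region, amount) row layout are all hypothetical, and the nodes run one after another here rather than in parallel.

```python
from collections import defaultdict


def partition(rows, num_nodes):
    """Spread rows round-robin so each node owns its own slice (shared nothing)."""
    slices = [[] for _ in range(num_nodes)]
    for index, row in enumerate(rows):
        slices[index % num_nodes].append(row)
    return slices


def node_partial_sum(local_rows):
    """Work done on one node: total the amounts per region using only local rows."""
    totals = defaultdict(int)
    for region, amount in local_rows:
        totals[region] += amount
    return dict(totals)


def host_merge(partials):
    """Work done on the host: combine the small partial results from every node."""
    merged = defaultdict(int)
    for partial in partials:
        for region, amount in partial.items():
            merged[region] += amount
    return dict(merged)


def run_query(rows, num_nodes):
    """Hypothetical end-to-end flow: distribute, compute locally, merge on the host."""
    slices = partition(rows, num_nodes)
    # Sequential in this sketch; on real hardware each node would work concurrently.
    partials = [node_partial_sum(local_rows) for local_rows in slices]
    return host_merge(partials)


if __name__ == "__main__":
    sales = [("east", 10), ("west", 5), ("east", 7), ("north", 1), ("west", 3)]
    print(run_query(sales, num_nodes=3))
```

The point of the sketch is that only the per-node summaries travel to the host, so the merge step stays small no matter how many rows each node scans.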

The goal of a data warehouse appliance should be to eliminate the traditional bottlenecks of business analytic systems - I/O, memory, processing and network. While some data warehouse appliance architectures may still separate storage from processing, the optimal design for a high-performance MPP node has a very low ratio of disks to CPU and memory, with high effective transfer bandwidth from the disks. Ideally, each node would have a 1:1 ratio (one disk drive per CPU) in a direct-attached configuration to simplify data movement.
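A rough back-of-the-envelope model shows why a 1:1 direct-attached layout helps with full scans. The sketch below assumes data is spread evenly across nodes, every disk streams at the same sustained rate, and nothing else limits throughput. The function names and all figures in the example are hypothetical and are not measurements of any product.

```python
def scan_seconds_shared(total_gb, channel_gb_per_s):
    """Time to scan everything when all data crosses one shared I/O channel."""
    return total_gb / channel_gb_per_s


def scan_seconds_mpp(total_gb, num_nodes, disk_gb_per_s):
    """Time to scan when each node reads only its own direct-attached disk.

    Assumes an even data distribution, so every node scans total_gb / num_nodes.
    """
    return (total_gb / num_nodes) / disk_gb_per_s


if __name__ == "__main__":
    # Hypothetical figures chosen only to make the arithmetic easy to follow.
    total_gb = 1000.0
    print("shared channel:", scan_seconds_shared(total_gb, channel_gb_per_s=1.0), "s")
    print("100 nodes, 1 disk each:", scan_seconds_mpp(total_gb, 100, 0.05), "s")
```

In this simplified model, scan time falls in proportion to the number of nodes, because aggregate disk bandwidth grows with every node added instead of being capped by a shared channel.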

In general, data warehouse appliances offer a much better CPU-to-disk ratio than conventional solutions, providing more processing power per unit of user data at a lower cost. One potential cost-saving element in the MPP node is the use of low-cost, commodity, embedded CPU technology instead of more expensive high-end processors such as those used in blades and conventional servers. These CPUs make sense in the purpose-built design of a data warehouse appliance for two reasons: the nodes do not need to run full operating systems or other applications, and embedded CPUs are better suited to the data streaming requirements of a data warehouse than to general-purpose computing. These devices also tend to use as little as one-twentieth of the power of high-end CPUs, which allows for much denser, power-efficient packaging.

An even more effective, high-performance design of a data warehouse appliance may include a field-programmable gate array (FPGA) in the parallel processing node for query performance acceleration. In this architecture, each query-processing node contains an FPGA together with the CPU, memory and direct-attached storage device.

The approach can be seen as bringing the query to the data. It recognizes that a streaming architecture, which moves processing intelligence to the data stream as it flows off disk, produces results much faster than the conventional approach of moving vast amounts of data across expensive I/O interconnects into memory. It's a built-in performance advantage for powering the complex queries at the heart of business analytics. A common off-the-shelf device about the size of a thumbnail, the FPGA filters and performs processing operations on data streaming through the device at high speed, without interrupting the flow. On the query processing node, the FPGA can filter out more than 90 percent of the initial data as it streams off the disk, greatly accelerating application performance over "brute force" CPU-based processing. In addition, the performance gains of FPGAs are outpacing CPU technology: where Moore's Law suggests a doubling of CPU performance approximately every 18-24 months, FPGAs are progressing much faster.[3]
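The idea of filtering data while it streams, so that only relevant rows and columns ever reach the CPU, can be illustrated with Python generators. This is a software analogy only, assuming records arrive as dictionaries. An FPGA does this work in hardware, and the names `read_from_disk`, `stream_filter` and `streaming_query` are hypothetical.

```python
def read_from_disk(records):
    """Stand-in for a disk: hands over one record at a time as a stream."""
    for record in records:
        yield record


def stream_filter(stream, predicate, columns):
    """Plays the role of the filter stage: drop unwanted rows and columns in flight."""
    for record in stream:
        if predicate(record):
            yield {name: record[name] for name in columns}


def streaming_query(records, predicate, columns, sum_column):
    """Sum one column over the rows that survive the filter.

    Returns the total and the number of rows that actually reached the CPU stage.
    """
    total = 0
    rows_reaching_cpu = 0
    for row in stream_filter(read_from_disk(records), predicate, columns):
        rows_reaching_cpu += 1
        total += row[sum_column]
    return total, rows_reaching_cpu


if __name__ == "__main__":
    # Hypothetical table of 100 rows in which one row in ten belongs to "east".
    table = [
        {"id": i, "region": "east" if i % 10 == 0 else "west", "amount": i}
        for i in range(100)
    ]
    total, seen = streaming_query(
        table, lambda r: r["region"] == "east", ["amount"], "amount"
    )
    print(f"total={total}, rows reaching CPU={seen} of {len(table)}")
```

Because the stages are generators, no intermediate copy of the full table is ever built; each record is examined once as it passes through, which mirrors the streaming behavior the article describes.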

This fully integrated architecture, built with inexpensive commodity components, provides a dramatic performance advantage - 10 to 100 times faster than data warehousing systems based on general-purpose architectures. The architecture accounts for the low purchase price of the data warehouse appliance as well as its administrative simplicity because there's no indexing, partitioning or other traditional tuning required to tweak performance. It also accounts for the low power and cooling requirements because processors are not straining to handle overwhelming amounts of data.

Can Other Approaches Catch Up?

How have other data warehouse vendors responded to this disruptive force? While multipurpose servers continue to increase in performance, their technology path for data warehousing remains hindered by I/O and memory bottlenecks. Many vendors are trying to apply the latest innovations in general-purpose computing - from higher-speed, multi-core processors to faster interconnect technologies - to squeeze greater query performance from the underlying architecture. Other common approaches combine server clusters with rack-mounted storage in the same cabinet or some other "hybrid" blade approach combining CPUs with disk drives.

Whether these systems are marketed as appliances or not, they are simply not designed to handle deep analysis of massive amounts of data - and remain at an inherent disadvantage to a true data warehouse appliance. Furthermore, by relying on multiple cores and ever-faster clock rates as a "brute-force" answer, these attempted solutions are further limited by growing power and cooling concerns in the data center.

New processing technologies continue to emerge, but none have been able to overcome these basic handicaps.

Staying Power for Unconstrained Analytics

Data warehouse appliances are rapidly growing their share of the data warehousing systems market. The model of an integrated appliance built with commodity components has also shown that it can sustain its performance advantage, easily incorporating new technology for continuous improvement to keep up with growing BI demands. Since the first models were released, query performance has already increased by orders of magnitude as new components for streaming, processing and storage have come on the market. And because the development of faster FPGAs is outpacing CPUs, the performance gap between appliances and systems built with general-purpose architectures appears to be widening.
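The claim that the gap widens follows from simple compounding: if one component doubles in performance on a shorter cycle than another, the ratio between them grows over time. The sketch below works through that arithmetic. The 24-month period is the upper end of the Moore's Law range cited earlier, while the 12-month period is purely a hypothetical stand-in for "much faster" and is not a figure from this article or its references.

```python
def relative_performance(months, doubling_period_months):
    """Performance relative to today for a component that doubles every period."""
    return 2 ** (months / doubling_period_months)


def performance_gap(months, fast_period_months, slow_period_months):
    """Ratio of the faster-improving component to the slower one after some months."""
    fast = relative_performance(months, fast_period_months)
    slow = relative_performance(months, slow_period_months)
    return fast / slow


if __name__ == "__main__":
    # Hypothetical: 12-month doubling versus the 24-month end of the cited range.
    for years in (1, 2, 4):
        gap = performance_gap(years * 12, 12, 24)
        print(f"after {years} year(s) the gap is {gap:.2f}x")
```

This models component speed only; how much of that gain shows up in end-to-end query performance depends on the rest of the system.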

What really matters is the impact that a true data warehouse appliance can have on an enterprise, allowing users to perform unconstrained analytics on all their business data, even in extremely busy mixed-workload environments. Companies can run existing queries faster and more deeply, but even more importantly, they can perform new, previously impossible analyses to drive business growth. The impact goes even further: from changing the way companies think about staffing their data warehouse to helping mid-tier businesses solve critical BI needs that were previously out of reach. Today, data warehouse appliances are fundamentally changing the way people operate their businesses, allowing them to fully leverage BI for competitive advantage - because now they can.

References:

  1. IDC Report: "Business Analytics Appliances Are Here to Stay." June 2006.
  2. Ted Friedman, et al. "Hype Cycle for Data Management, 2006." Gartner Research, July 2006.
  3. Aussie Schnore & Malachy Devlin. OpenFPGA BOF presentation at SC05, GE Global Research & Nallatech, 16 Nov 2005.


Phil Francisco is the director of Product Marketing for Netezza. He may be reached at pfrancisco@netezza.com.



SourceMedia (c) 2007 DM Review and SourceMedia, Inc. All rights reserved.