Defining the Data Warehouse Appliance

By Philip Russom

What do you think a data warehouse appliance is? We asked this question in a recent survey of attendees at the TDWI World Conference in San Diego, and the answers were revealing (see Figure 1):

  • It’s a familiar term. Only 19 percent said they don’t know what a data warehouse appliance is, meaning that the large majority of survey respondents (81 percent) had some idea, though their definitions vary. Hence, the data warehouse community now has a solid awareness of the data warehouse appliance and what it can be.
  • It’s purpose-built hardware and software. More than half of survey respondents think the server hardware and database software must be built specifically for data warehousing. Let’s consider this the strictest and best definition of an appliance.
  • Or maybe it’s a bundle of hardware and software. The data warehouse bundle typically combines components designed for other purposes, like transaction processing. Strictly speaking, a bundle isn’t an appliance, although it has similar characteristics: it’s preintegrated and tuned for data warehousing.
  • To some, it’s either. A few folks feel a data warehouse appliance can fit either the first, strict definition or the second, looser one.

Figure 1: Survey results. Source: TDWI Tech Survey, August 2005, 139 respondents.

Appliance and Bundle Vendors

In response to demand, vendors now supply data warehouse appliances and bundles:

Data warehouse appliances. Better-known vendors include Netezza, DATAllegro, and Teradata. The term data warehouse appliance was coined by Foster Hinshaw, a founder of Netezza, and the company has blazed a trail by proving the concept and educating the market. DATAllegro is even newer than Netezza but shows considerable promise, having already developed two lines of appliances (focused on capacity and performance, respectively).

In many ways, Teradata’s core product line is the mother of all data warehouse appliances. Teradata representatives don’t approve of the term, although the product has for many years provided appliance-like characteristics: hardware and software purpose-built for high-end data warehousing that requires little system integration and administration.

Data warehouse bundles. These bundles offer advantages—tuned for warehousing, fast queries, preintegrated—but not to the degree of a true appliance. Hardware and software bundles regularly applied to data warehousing include the IBM DB2 Integrated Cluster Environment (ICE) for Linux, Sun-Sybase iForce Solutions, and Unisys ES7000 Business Intelligence Solutions.

Other appliances. Appliances are not unique to data warehousing; they are well established in other areas of IT, as seen in Network Appliance’s storage systems, the Google Search Appliance, and the Thunderstone Search Appliance, not to mention the many blade servers, network storage devices, and other rack-mounted systems that resemble appliances.


Sweet Spots for the Data Warehouse Appliance

Most data warehouse appliances support “large data marts,” where the mart is part of an analytic application that is typically focused on the analysis of a single subject such as call-level detail, customers, shopping baskets, and so on. This isn’t just my opinion. Vendors describe their customers and prospects this way, and users I’ve interviewed describe their appliance implementations this way, too. Hence, the large data mart is a sweet spot for data warehouse appliances, in the sense that users have succeeded with this kind of project.

In fact, some users see this sweet spot as a strategy for bringing a data warehouse appliance into their data center with minimal risk. In these situations, the appliance satisfies the analytic application’s requirements and also serves as a vehicle for proving the concept of the data warehouse appliance. If the concept is proved, users can consider whether to apply appliances in other projects and organizations.

Large data marts aside, a few users have deployed an enterprise data warehouse (EDW) on an appliance. There are even more EDWs on data warehouse bundles. So these platforms can support EDWs, despite the large data mart being the current sweet spot.

Another way to describe the sweet spot is to note that these large data marts usually manage between 1 TB and 10 TB of live, queryable data. This range will shift upward as users manage more data and vendors come out with models of greater capacity. Typically, the appliance users I’ve interviewed start with 1–3 TB and work toward 10 TB over a couple of years. But a handful of appliance users have exceeded 10 TB, and I know of one with over 20 TB. So the appliance sweet spot will soon shift up toward 20 TB.

Recommendations to Users

You should consider a data warehouse appliance for projects that involve a terabyte-size data mart supporting an autonomous analytic application. The fit is even better when the application involves intense ad hoc queries that would put an undue load on the organization’s enterprise data warehouse. Furthermore, this kind of isolated analytic application can be a low-risk project for trying out a data warehouse appliance in your data center before deciding whether to use appliances more broadly. The data warehouse appliance also seems to work well for projects that need a short time to deployment, a low price per terabyte, and minimal system integration and administration.

Dozens of early adopter users have succeeded with data warehouse appliances in these situations—you can, too.

Philip Russom is senior manager of research and services at TDWI, where he oversees many of TDWI’s research-oriented publications, services, and events. He’s been an industry analyst researching BI issues at Forrester Research, Giga Information Group, and Hurwitz Group. You can reach him at prussom@tdwi.org.
