Data Warehouse Boost on a Budget
Start-ups are challenging established data warehouse vendors with products that increase performance for ad hoc queries but cost less.





News Story by Robert L. Mitchell

APRIL 11, 2005 (COMPUTERWORLD) - When Premier Inc.'s medical databases began bogging down last year, the San Diego-based provider of clinical data put its data warehouse in a box—literally.

Premier sells access to clinical data it gathers from 400 hospitals to pharmaceutical manufacturers. Last year, the company's IBM Red Brick data warehouse had grown to 3TB, and one table included 3 billion entries. "When you go through 3 billion rows of data, you get long runtimes," says Chris Stewart, director of data warehouse architecture.

The problem wasn't just the size of the database, however, but how clients used the data. "Our users want to access all of the data from top to bottom," says Stewart, and the complex, multipass queries created by Premier's 4,000 users each week were slowing performance. Some wouldn't run at all.

Instead of adding to its 24-processor Solaris server infrastructure or making further attempts to optimize the database, Stewart brought in an all-inclusive data warehouse appliance from Netezza Corp. in Framingham, Mass. Some calculations that took one or two days now finish in six to eight minutes on the appliance's 108 processors. Premier still uses Red Brick for most queries, but the NPS 8150 appliance handles the "really, really ugly questions" that weren't possible to process before, he says. "We couldn't offer the product offerings we do today" without the appliance, Stewart says.

As data warehouses continue to grow, more users are demanding access to business intelligence (BI) tools to conduct data-mining exercises across large data sets. "We're talking about using every single call-detail record generated in the last three years," says Claudia Imhoff, president of Intelligent Solutions Inc., a consulting firm in Boulder, Colo. It's hard for database administrators (DBAs) to create aggregations of data, such as summarizations, that can facilitate the processing of these complex queries because users often don't know in advance what they're looking for. "These unplanned questions are the ones that knock the stuffing out of databases," she says.

But such queries are increasingly seen as business-critical, says William Fellows, an analyst at The 451 Group in New York. "The problem of querying data sets that are growing at over 100% a year has led to what might be called a data warehouse capability gap," he says. While market leaders like Teradata, a division of NCR Corp. in Dayton, Ohio, offer integrated systems to address this for high-end applications, Netezza and others are jumping in with moderately priced systems that don't require the same high-end hardware and software investments as those from IBM, Oracle Corp. and Teradata. It's an interesting trend but still a small part of the $16 billion market for data warehouse hardware and software, says Dan Vesset, an analyst at IDC.

Small Players, Big Databases

Some start-ups offer only software, while others include software and hardware in a single bundle or appliance. But all use a parallelization scheme that involves symmetric multiprocessing or a massively parallel processing architecture. Designs vary, but all are based on the partitioning of data across servers, something Teradata has been doing for years, says Fellows. "There's nothing new under the sun in terms of approach here except packaging and price," he adds. And while Netezza and its competitors like to position themselves against Teradata, Teradata still dominates on the high end, he says.
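The partitioning idea Fellows describes can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the row layout, the `partition_rows` and `parallel_query` helpers and the thread pool standing in for separate servers are all assumptions made for illustration. Rows are hashed to a node, each node scans only its own slice, and the partial results are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical detail rows; the column names are illustrative only.
rows = [
    {"hospital_id": i % 7, "drug": ["A", "B", "C"][i % 3], "doses": i % 5}
    for i in range(1000)
]

def partition_rows(rows, num_nodes, key):
    """Assign each row to a node by hashing its partitioning key."""
    partitions = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row[key]) % num_nodes].append(row)
    return partitions

def scan_partition(partition, predicate, group_key, value_key):
    """Scan every row on one node (no index) and build a partial aggregate."""
    partial = Counter()
    for row in partition:
        if predicate(row):
            partial[row[group_key]] += row[value_key]
    return partial

def parallel_query(partitions, predicate, group_key, value_key):
    """Run the same scan on every node, then merge the partial aggregates."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(
            lambda p: scan_partition(p, predicate, group_key, value_key),
            partitions,
        )
        for partial in partials:
            total.update(partial)  # Counter.update adds the counts together
    return total

partitions = partition_rows(rows, num_nodes=4, key="hospital_id")
doses_by_drug = parallel_query(
    partitions, lambda r: r["doses"] > 1, "drug", "doses"
)
print(dict(doses_by_drug))
```

Threads in one Python process only show the structure of the approach. In the systems described here, each partition would sit on its own server or processor with its own disks, which is where the speedup comes from.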

Netezza's NPS appliance abandons database indexes in favor of direct table scans, using brute-force processing to get the job done. The system includes its own database, with specialized field-programmable gate array (FPGA) logic that links processors and storage to speed up I/O. A system comparable to Premier's, with 4.5TB of disk space, sells for "a little more than a million dollars," says Netezza CEO Jit Saxena.

By dumping the indexes, Premier's database dropped from 3TB to 1TB. The system is sufficiently fast that Stewart now uses the appliance to both process queries and build the data-aggregation tables that he loads into the Red Brick data warehouse.
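As a rough illustration of what building a data-aggregation table involves, the sketch below summarizes detail rows into a smaller table that later queries can read instead of rescanning the detail. It uses Python's built-in `sqlite3` module purely as a stand-in; the table and column names are hypothetical, and nothing here reflects how Premier's Netezza or Red Brick systems are actually configured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical detail table standing in for the multibillion-row fact table.
conn.execute(
    "CREATE TABLE encounters "
    "(hospital_id INTEGER, drug TEXT, month TEXT, doses INTEGER)"
)
detail = [
    (1, "drug_a", "2005-01", 10),
    (1, "drug_a", "2005-01", 5),
    (1, "drug_b", "2005-02", 7),
    (2, "drug_a", "2005-01", 3),
    (2, "drug_b", "2005-02", 4),
    (2, "drug_b", "2005-02", 6),
]
conn.executemany("INSERT INTO encounters VALUES (?, ?, ?, ?)", detail)

# Build the aggregation table once, with a single pass over the detail rows.
conn.execute(
    """
    CREATE TABLE doses_by_drug_month AS
    SELECT drug, month, COUNT(*) AS row_count, SUM(doses) AS total_doses
    FROM encounters
    GROUP BY drug, month
    """
)

# Later queries read the small summary table rather than the detail table.
summary = conn.execute(
    "SELECT drug, month, row_count, total_doses "
    "FROM doses_by_drug_month ORDER BY drug, month"
).fetchall()
for row in summary:
    print(row)
```

The trade-off is the one Imhoff points to: a summary table only helps with questions that were anticipated when it was built, so unplanned queries still have to go back to the detail rows.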

Start-up Calpont Corp. in Rockwall, Texas, is developing a similar appliance that hard-codes the database on an FPGA chip. Because it will store the data on a solid-state disk, or synchronous dynamic RAM, however, it will be targeted at smaller data sets. A 128GB box capable of supporting 40GB to 50GB of data will have a price tag in the "couple hundred-thousand dollar range," says CEO Jim Janicki. "We wanted a brute-force engine to handle everything we could throw at it," he says of the device, which is scheduled to ship by midyear.

Datallegro Inc. in Aliso Viejo, Calif., is rolling out a turnkey system that functions much like the Netezza appliance, but it's built using off-the-shelf components. "We're taking standard, commodity servers with an open-source database," says CEO Stuart Frost. Datallegro's 3TB P3000 includes 21 dual-Xeon-processor servers, each connected to 12 Western Digital Corp. Raptor drives, and will sell for $450,000 when released this month. Frost is targeting Oracle customers with databases in the 1TB to 5TB range and up to 300 concurrent users.

Metapa Inc. takes a similar approach but lets users buy their own components based on its specification, rather than bundling everything together. Users "can assemble systems that are just as fast as the high-end data warehouses at a fraction of the cost. We don't believe you need a specialized ASIC chip to get there," says Scott Yara, founder and president of the San Mateo, Calif., start-up. The total price, including Metapa's Cluster DataBase—due to ship in the second quarter—and required hardware, will be half the cost of a Netezza appliance, he claims.
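For a rough sense of scale, the figures quoted above can be turned into back-of-the-envelope numbers. The short Python calculation below is only a sketch under stated assumptions: the Netezza price is quoted as "a little more than a million dollars," so $1 million is used as an assumed floor, and the Metapa figure simply applies Yara's "half the cost" claim to that floor. The two capacities are quoted on different bases (disk space for Netezza, a "3TB" system for Datallegro), so the per-terabyte figures are not a like-for-like comparison.

```python
# Figures as quoted in the article. The Netezza price is an assumed floor,
# because the article gives only "a little more than a million dollars".
NETEZZA_PRICE_FLOOR_USD = 1_000_000
NETEZZA_DISK_TB = 4.5

DATALLEGRO_PRICE_USD = 450_000
DATALLEGRO_CAPACITY_TB = 3
DATALLEGRO_SERVERS = 21
DRIVES_PER_SERVER = 12
PROCESSORS_PER_SERVER = 2  # "dual-Xeon-processor servers"

def usd_per_tb(price_usd, capacity_tb):
    """Quoted price divided by quoted capacity, in dollars per terabyte."""
    return price_usd / capacity_tb

netezza_per_tb = usd_per_tb(NETEZZA_PRICE_FLOOR_USD, NETEZZA_DISK_TB)
datallegro_per_tb = usd_per_tb(DATALLEGRO_PRICE_USD, DATALLEGRO_CAPACITY_TB)

total_drives = DATALLEGRO_SERVERS * DRIVES_PER_SERVER
total_processors = DATALLEGRO_SERVERS * PROCESSORS_PER_SERVER

# Yara's claim of "half the cost of a Netezza appliance", applied to the floor.
metapa_claimed_price_usd = NETEZZA_PRICE_FLOOR_USD / 2

print(f"Netezza (assumed floor): ${netezza_per_tb:,.0f} per TB of disk")
print(f"Datallegro P3000: ${datallegro_per_tb:,.0f} per TB")
print(f"Datallegro P3000: {total_drives} drives, {total_processors} processors")
print(f"Metapa (claimed): about ${metapa_claimed_price_usd:,.0f}")
```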

Clareos Inc.'s CrossCut software, now available, adds yet another twist. Instead of using database tables, it combines a BI reporting tool with a spreadsheetlike data model that creates a single, flat file of rows and columns.

"The next generation of BI tools will have a flat file structure that will be very fast," predicts Steve Foley, CEO of Herndon, Va.-based Clareos. CrossCut software and recommended hardware to process 146GB of data costs about $65,000. But the product differs from products like Netezza's in one key respect: CrossCut is a read-only database that doesn't provide update capability, Foley says. Competitors that use vector-based processing to support a real-time decision-making application include Alterion Inc. and Aleri Inc., says Fellows at The 451 Group.

By contrast, Teradata's integrated systems connect clusters of high-performance servers using a proprietary high-speed interconnect called Bynet and store data in a Fibre Channel storage-area network. The vendor focuses on allowing large numbers of concurrent queries in a mixed-workload environment and supports "active data warehousing," where databases are continuously updated, says Stephen Brobst, chief technology officer. He sees the start-ups' products as best suited for single-function, low-end data marts and cautions that "data marts end up replicating data."

But that's a trade-off users may be willing to make when cost is a factor. "With an IBM or Teradata solution, your scalability is in large chunks," says the vice president of infrastructure at a large financial services company that's beta-testing a Datallegro system. The incremental cost for adding capacity to an appliance can be a small fraction of what it costs to upgrade his Sun Microsystems Inc. system. He is cautious about buying from a small vendor, but adds, "If they can deliver the same or better performance at 20% of the cost of an IBM or Teradata solution, then you have to do it."

Most of these systems take a black-box approach to optimization, which means DBAs can't do any tuning. That paradigm shift may be the toughest sell, says Intelligent Solutions' Imhoff, and it's definitely a weakness for Michael Benillouche, director of technology at ACNielsen Corp., who prefers to optimize his Oracle data marts.

But Premier's Stewart sees that as an advantage. "My DBA staff has more time for development instead of hand-holding a database. We don't need to build in cycles to make queries go faster," he says.

In traditional systems, ad hoc queries that bog down the data warehouse are restricted, says Imhoff. Now IT can spin off a subset of data to more groups for business analytics without supplying DBA resources. "If I can bring in a technology that doesn't require an army of DBAs, great Scott, what a boost," she says.




Copyright © 2005 Computerworld Inc. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.