Data Warehouse Boost on a Budget

Start-ups are challenging established data warehouse vendors with products that increase performance for ad hoc queries but cost less.
News Story by Robert L. Mitchell
APRIL 11, 2005

Premier sells access to clinical data it gathers from 400 hospitals to pharmaceutical manufacturers. Last year, the company's IBM Red Brick data warehouse had grown to 3TB, and one table included 3 billion entries. "When you go through 3 billion rows of data, you get long runtimes," says Chris Stewart, director of data warehouse architecture.

The problem wasn't just the size of the database, however, but how clients used the data. "Our users want to access all of the data from top to bottom," says Stewart, and the complex, multipass queries created by Premier's 4,000 users each week were slowing performance. Some wouldn't run at all.

Instead of adding to its 24-processor Solaris server infrastructure or making further attempts to optimize the database, Stewart brought in an all-inclusive data warehouse appliance from Netezza Corp. in Framingham, Mass. Some calculations that took one or two days now finish in six to eight minutes on the appliance's 108 processors.

Premier still uses Red Brick for most queries, but the NPS 8150 appliance handles the "really, really ugly questions" that weren't possible to process before, he says. "We couldn't offer the product offerings we do today" without the appliance, Stewart says.

As data warehouses continue to grow, more users are demanding access to business intelligence (BI) tools to conduct data-mining exercises across large data sets. "We're talking about using every single call-detail record generated in the last three years," says Claudia Imhoff, president of Intelligent Solutions Inc., a consulting firm in Boulder, Colo.

It's hard for database administrators (DBAs) to create aggregations of data, such as summarizations, that can speed up these complex queries, because users often don't know in advance what they're looking for. "These unplanned questions are the ones that knock the stuffing out of databases," she says.
But such queries are increasingly seen as business-critical, says William Fellows, an analyst at The 451 Group in New York. "The problem of querying data sets that are growing at over 100% a year has led to what might be called a data warehouse capability gap," he says.

While market leaders like Teradata, a division of NCR Corp. in Dayton, Ohio, offer integrated systems to address this for high-end applications, Netezza and others are jumping in with moderately priced systems that don't require the same high-end hardware and software investments as those from IBM, Oracle Corp. and Teradata. It's an interesting trend but still a small part of the $16 billion market for data warehouse hardware and software, says Dan Vesset, an analyst at IDC.

Small Players, Big Databases

Some start-ups offer only software, while others include software and hardware in a single bundle or appliance. But all use a parallelization scheme that involves symmetric multiprocessing or a massively parallel processing architecture. Designs vary, but all are based on partitioning data across servers, something Teradata has been doing for years, says Fellows. "There's nothing new under the sun in terms of approach here except packaging and price," he adds. While Netezza and competitors like to position themselves against Teradata, Teradata still dominates the high end, he notes.

Netezza's NPS appliance abandons database indexes in favor of direct table scans, using brute-force processing to get the job done. The system includes its own database, with specialized field-programmable gate array (FPGA) logic that links processors and storage to speed up I/O. A system comparable to Premier's, with 4.5TB of disk space, sells for "a little more than a million dollars," says Netezza CEO Jit Saxena. By dumping the indexes, Premier's database shrank from 3TB to 1TB.
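The partitioned, brute-force approach described above can be illustrated with a small sketch. It is a generic illustration in Python, not Netezza's, Teradata's or any other vendor's implementation: the (key, amount) row layout, the function names and the use of threads to stand in for separate servers are all assumptions made for the example. Rows are hash-partitioned across simulated nodes, each node scans its entire slice with no index, and the partial sums are merged.

```python
# Hypothetical sketch of a hash-partitioned, index-free parallel scan.
# Threads stand in for the separate servers of an MPP system; this is
# not modeled on any specific vendor's product.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical row layout: (group_key, amount)
Row = Tuple[str, float]


def partition(rows: Iterable[Row], num_nodes: int) -> List[List[Row]]:
    """Hash-partition rows by key so each simulated node owns one slice."""
    slices: List[List[Row]] = [[] for _ in range(num_nodes)]
    for row in rows:
        slices[hash(row[0]) % num_nodes].append(row)
    return slices


def scan_slice(rows: List[Row], predicate: Callable[[Row], bool]) -> Dict[str, float]:
    """Brute-force scan: no index, so every row in the slice is tested."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        if predicate(row):
            totals[row[0]] += row[1]
    return dict(totals)


def parallel_query(rows: Iterable[Row], num_nodes: int,
                   predicate: Callable[[Row], bool]) -> Dict[str, float]:
    """Scan all slices concurrently, then merge the partial sums."""
    slices = partition(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(lambda s: scan_slice(s, predicate), slices))
    merged: Dict[str, float] = defaultdict(float)
    for part in partials:
        for key, value in part.items():
            merged[key] += value
    return dict(merged)


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 7.0)]
    result = parallel_query(sales, num_nodes=3, predicate=lambda r: r[1] > 3.0)
    print(sorted(result.items()))  # [('east', 10.0), ('north', 7.0), ('west', 5.0)]
```

Because each key hashes to exactly one slice here, no group is split across nodes; the summing merge simply keeps the sketch correct if rows were partitioned on a different column. In Python, threads do not give true CPU parallelism, which is why they are only a stand-in for separate machines.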
The system is fast enough that Stewart now uses the appliance both to process queries and to build the data-aggregation tables that he loads into the Red Brick data warehouse.

Start-up Calpont Corp. in Rockwall, Texas, is developing a similar appliance that hard-codes the database on an FPGA chip. Because it will store the data on a solid-state disk built from synchronous dynamic RAM, however, it will be aimed at smaller data sets. A 128GB box capable of supporting 40GB to 50GB of data will have a price tag in the "couple hundred-thousand dollar range," says CEO Jim Janicki. "We wanted a brute-force engine to handle everything we could throw at it," he says of the device, which is scheduled to ship by midyear.

Datallegro Inc. in Aliso Viejo, Calif., is rolling out a turnkey system that functions much like the Netezza appliance but is built from off-the-shelf components. "We're taking standard, commodity servers with an open-source database," says CEO Stuart Frost. Datallegro's 3TB P3000 includes 21 dual-Xeon-processor servers, each connected to 12 Western Digital Corp. Raptor drives, and will sell for $450,000 when released this month. Frost is targeting Oracle customers with databases in the 1TB to 5TB range and up to 300 concurrent users.

Metapa Inc. takes a similar approach but lets users buy their own components based on its specification rather than bundling everything together. Users "can assemble systems that are just as fast as the high-end data warehouses at a fraction of the cost. We don't believe you need a specialized ASIC chip to get there," says Scott Yara, founder and president of the San Mateo, Calif., start-up. The total price, including Metapa's Cluster DataBase (due to ship in the second quarter) and the required hardware, will be half the cost of a Netezza appliance, he claims.

Clareos Inc.'s CrossCut software, now available, adds yet another twist.
Instead of using database tables, it combines a BI reporting tool with a spreadsheetlike data model that creates a single, flat file of rows and columns. "The next generation of BI tools will have a flat file structure that will be very fast," predicts Steve Foley, CEO of Herndon, Va.-based Clareos. CrossCut software and recommended hardware to process 146GB of data cost about $65,000. But the product differs from Netezza's in one key respect: CrossCut is a read-only database that doesn't provide update capability, Foley says.

Competitors that use vector-based processing to support real-time decision-making applications include Alterion Inc. and Aleri Inc., says Fellows at The 451 Group.

By contrast, Teradata's integrated systems connect clusters of high-performance servers using a proprietary high-speed interconnect called Bynet and store data in a Fibre Channel storage-area network. The vendor focuses on allowing large numbers of concurrent queries in a mixed-workload environment and supports "active data warehousing," where databases are continuously updated, says Stephen Brobst, Teradata's chief technology officer. He sees the start-ups' products as best suited for single-function, low-end data marts and cautions that "data marts end up replicating data."

But that's a trade-off users may be willing to make when cost is a factor. "With an IBM or Teradata solution, your scalability is in large chunks," says the vice president of infrastructure at a large financial services company that's beta-testing a Datallegro system. The incremental cost of adding capacity to an appliance can be a small fraction of what it costs to upgrade his Sun Microsystems Inc. system. He is cautious about buying from a small vendor but adds, "If they can deliver the same or better performance at 20% of the cost of an IBM or Teradata solution, then you have to do it."

Most of these systems take a black-box approach to optimization, which means DBAs can't do any tuning.
That paradigm shift may be the toughest sell, says Intelligent Solutions' Imhoff, and it's a clear drawback for Michael Benillouche, director of technology at ACNielsen Corp., who prefers to optimize his Oracle data marts. But Premier's Stewart sees it as an advantage. "My DBA staff has more time for development instead of hand-holding a database. We don't need to build in cycles to make queries go faster," he says.

In traditional systems, ad hoc queries that bog down the data warehouse are restricted, says Imhoff. With these systems, IT can spin off a subset of data to more groups for business analytics without supplying DBA resources. "If I can bring in a technology that doesn't require an army of DBAs, great Scott, what a boost," she says.

Copyright © 2005 Computerworld Inc. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.