ComputerLetter, Volume 18, Number 28, September 23, 2002
DBMS in a box

A company called Netezza is launching an appliance dedicated to speedily answering complex business-intelligence queries.

Sometimes, it's hard to imagine just how much the computer has changed the business of business. The cumulative productivity gains have been so enormous that it's difficult to comprehend how thriving global enterprises and even empires were built and maintained by clerks marking up paper records and exchanging them through the post. But it doesn't take too much imagination to realize that the amount of data being gathered every day is now beyond human capacity to analyze. All the clerks in all the counting houses of Victorian England, or all those at the adding machines of early-20th-century American companies, couldn't hope to keep up with the torrents of data that ordinary commerce generates in a single 24-hour period today.

As a result, one of the most important and difficult challenges in computing is to get the data back out of the databases and digital repositories where it's stored and make sense of it. Or, put another way, to distill masses of data records into usable insight and information. The vast investments in online transaction processing, customer relationship management, enterprise resource planning, and other data-intensive tasks are only one factor. A day's worth of cell phone traffic, the click streams and credit card transactions of Web-based e-commerce, the unceasing shuttle of assets on the world's financial markets, the event logs at data centers, the analysis of shopping baskets at supermarkets and stores all across the planet: all of these activities are generating terabytes upon terabytes of new data. And that, as some wag once put it, is a heck of alottabytes.

Thanks for the memory

As we've noted recently in these pages, database technology is the focus of a great deal of attention from venture investors right now (CL 18/11, April 29, 2002).
Companies such as Aleri, Alterian, Required Technologies, and TimesTen are pushing new approaches to building fast, flexible database software for a variety of applications. For instance, Aleri and Alterian have come up with new ways of organizing data to speed up complex queries. TimesTen, among others, aims to turbocharge database performance by making much greater use of main memory, which, though once very costly, is cheap and plentiful today. Coherity, eXcelon, XML Global, and others are focused on the challenges of organizing and retrieving XML-tagged data.

While we've little doubt that some of the new database management software schemes will prove their worth and that some very significant advances may emerge, we're particularly intrigued by a startup called Netezza, based in Framingham, Mass. It's blasting away at the database problem from a very different direction, namely the hardware side. Netezza's strategy is to build a specialized, high-performance appliance that will do nothing but run analytic queries against massive, multi-terabyte databases. Rather than a general-purpose machine, Netezza's box represents a fresh approach to the business intelligence and data analytics problem, with an innovative architecture that promises some stunning performance gains: Netezza says its machine can deliver ten to twenty times the performance of standard database software running on standard processors, and at half the cost.

Move it

As one might imagine, Netezza's appliance gains its speed from parallelization, both in its processors and in disk storage. One of the central problems in grinding through large databases is the heavy traffic involved in retrieving records from storage and moving them to the central processor for sorting and so-called table joins: the classic von Neumann problem writ large. Instead of moving large amounts of data from disk to the processor and back again, Netezza in effect moves the processing to the data.
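The difference between hauling every record to a central processor and doing the filtering where the data lives can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Netezza's design: the `StorageNode` class and the `central_scan` and `pushdown_scan` functions are hypothetical names, a Python list stands in for a disk, the node-side work is assumed to be a simple filter plus column projection, and "rows moved" is only a crude proxy for I/O traffic. In the appliance itself, the per-disk work is done by FPGAs, not by Python.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]


@dataclass
class StorageNode:
    """Hypothetical disk-attached node holding one slice of a table."""
    rows: List[Row]

    def read_all(self) -> List[Row]:
        # Conventional path: every stored row is shipped to the host.
        return list(self.rows)

    def filter_and_project(self, pred: Predicate, cols: List[str]) -> List[Row]:
        # "Processing at the data" (assumed): apply the WHERE-style predicate
        # and keep only the requested columns before anything leaves the node.
        return [{c: r[c] for c in cols} for r in self.rows if pred(r)]


def central_scan(nodes: List[StorageNode], pred: Predicate,
                 cols: List[str]) -> Tuple[List[Row], int]:
    """Host pulls every row, then filters. Returns (result, rows_moved)."""
    moved, result = 0, []
    for node in nodes:
        batch = node.read_all()
        moved += len(batch)
        result.extend({c: r[c] for c in cols} for r in batch if pred(r))
    return result, moved


def pushdown_scan(nodes: List[StorageNode], pred: Predicate,
                  cols: List[str]) -> Tuple[List[Row], int]:
    """Each node filters locally; the host only gathers the survivors."""
    moved, result = 0, []
    for node in nodes:
        batch = node.filter_and_project(pred, cols)
        moved += len(batch)
        result.extend(batch)
    return result, moved


if __name__ == "__main__":
    # Toy table: 4 nodes x 1,000 rows, 1% of rows flagged as high risk.
    nodes = [
        StorageNode([{"id": n * 1000 + i, "amount": i, "risky": i % 100 == 0}
                     for i in range(1000)])
        for n in range(4)
    ]

    def risky(r: Row) -> bool:
        return bool(r["risky"])

    a, moved_a = central_scan(nodes, risky, ["id"])
    b, moved_b = pushdown_scan(nodes, risky, ["id"])
    assert a == b  # same answer either way
    print(moved_a, moved_b)  # 4000 40
```

Both strategies return the same answer; only the traffic between storage and host differs, which is why the bottleneck in this kind of workload is I/O rather than raw CPU speed.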
Concretely, Netezza puts the processor silicon, in the form of field-programmable gate arrays (FPGAs), right at the disk drives holding the data records being searched, and does as much processing as possible there. The result is a vast reduction in the crippling input/output (I/O) overhead that typically hampers traditionally architected database setups.

Running in place

Netezza calls its approach "intelligent query streaming." In its box, data is moved or copied only when absolutely necessary, and only the most relevant information gets pulled off disk. Running the show at the front end are one or two powerful SMP (symmetric multiprocessor) hosts, which also take advantage of parallelism. Netezza says its architecture, which it dubs asymmetric massively parallel processing, or AMPP, provides significant performance gains and scales well for attacking large problems. Its initial servers, called the NPS 8000 series, can be configured to store as much as 18 terabytes of data.

In key elements of the design, Netezza relies heavily on standard components: an SMP Linux server as the host for compiling queries, distributing them to the FPGAs, and aggregating results; off-the-shelf processors at the data storage arrays; and gigabit-Ethernet switches for communication between the host and the storage arrays. In its largest, 18-terabyte configuration, the Netezza system incorporates 450 processors, all but two of them FPGAs located right up against the disk drives. This setup would take up four standard equipment racks.

Less pain, more gain

As Netezza describes it, several trends have laid the groundwork for such a hardware-based approach. The market demand for better, faster database setups is unquestionably real, and growing, even as general spending on information technology is severely constrained. The cost of disk storage continues to plummet.
And the astonishing growth in the amount of data that must be put through this kind of analysis (rates on the order of 30% to 50% a year) is a source of considerable pain for large enterprises, particularly those in transaction-rich businesses such as retail, financial services, and telecommunications. The data they gather and need to analyze goes right to the heart of their businesses: cell phone call data, detailed website logs, point-of-sale data, customer transaction records, and so forth. As Netezza CEO Jit Saxena points out, the growth in data volumes significantly exceeds the general gains in computing performance that Moore's law has so reliably delivered. What's more, the data itself is changing at a very fast rate, adding to the burden.

Only connect

Another key element of Netezza's hardware appliance strategy is the standardization that has shaped the relational database marketplace. Because it adheres to standards such as SQL (structured query language) and ODBC (open database connectivity), Netezza doesn't require customers to reformat their data or scrap the software or files they're already using. Indeed, the company has moved specifically to align itself with popular business intelligence applications from firms such as Business Objects, MicroStrategy, SAS Institute, and others. Its appliance is also designed to work with existing data warehousing and application integration tools so that customers can rapidly load and process new data in real time. In short, Netezza has taken a plug-and-play approach on a grand scale.

Naturally enough, Netezza is targeting large enterprises in transaction-heavy markets. Its database machines start at $625,000 and range to more than $2 million apiece. The most direct competition Netezza faces comes not primarily from other startups but from Teradata (a unit of NCR, which we understand is considering spinning Teradata out into a separate company) and giants like IBM.
Many customers are running analytics on clusters of Sun Microsystems servers and other general-purpose machines. Teradata has been notably successful in high-end data warehousing with a machine that gangs together scores or even hundreds of Pentium processors and dedicated disk drives. A so-called Y-net connection architecture aggregates results pulled from the disks. Teradata, active since the late 1980s, has done particularly well in retail and telecommunications markets, with companies such as Wal-Mart and AT&T among its early customers.

Virgin territories

Netezza's design, employing software and hardware standards that didn't exist when Teradata started out, should be extremely competitive in certain markets. Still, it's unlikely that Teradata or Sun users would simply unplug their current installations in favor of new machines from a startup, no matter how impressive. So Netezza is banking on initially winning customers who could make good use of, but have not been able to afford, the levels of performance its boxes can deliver. Even its entry-level machines start out beyond the point at which many traditional solutions hit a crunch. In early tests with an unidentified financial services outfit, for example, Netezza says that merging 250 million records for a credit-risk analysis took less than 5 minutes, versus nearly 12 hours on a setup centered on a cluster of Sun servers.

Now, we tend to discount database performance figures as so much marketing talk. Database benchmark tests are notoriously susceptible to tweaking and fine-tuning by vendors determined to make their products shine. But if Netezza can come even close to that kind of performance gain, it would appear to be quite well positioned in a tough market. It also helps that the company has some good money behind it. The firm closed a series A round of more than $8 million from Matrix Partners and Charles River Ventures in December 2000.
The B round, of $20 million in January 2002, was led by Battery Ventures, with Matrix and Charles River also investing. Mr. Saxena, a co-founder as well as CEO, was previously founder and CEO of CRM vendor Applix, which he took public in 1994. Before that, he worked at Data General. Last May, Netezza named Ed Zander, Sun's just-departed president and chief operating officer and another Data General veteran, to its board. The company has also hired the former head of Compaq's High Performance Technical Computing unit as senior VP of product development, and its vice president of worldwide sales has joined from Teradata.

Echoes of the past

We've seen numerous database machines proposed and even brought to market since as far back as the late 1970s, but none of them has done particularly well. (Only a company called Britton-Lee managed to survive for more than a year or two.) Although the hardware appliance strategy is well accepted by now, there's a clear risk in building special-purpose machines. The history of computing is littered with companies, like Silicon Graphics (SGI), that staked out a high-performance market with proprietary hardware, enjoyed stunning success for a time, but eventually got overtaken by the inexorable gains of general-purpose designs. Indeed, the family tree of Netezza's own founders and technical team includes Data General, one of several Massachusetts-based victims of the now-ubiquitous, general-purpose microprocessor.

Standards time

Netezza may be different. Investor and board member Ollie Curme, of Battery Ventures, notes that there's a new set of factors at play in the database game. Technologies solving CPU-bound problems, like SGI's graphics processing, were directly in the line of fire as the standard microprocessor advanced. In Netezza's case, the primary issue is not the CPU but disk I/O and the piping of data between disk storage, memory, and processors.
The internal connection of these elements is the biggest potential bottleneck in any machine that attempts to handle so-called table joins and process complex queries against terabyte-scale databases. In fact, Netezza's use of standard processors and other components keeps its manufacturing costs to a minimum. Netezza has shipped two of its machines so far, including one to an outfit called Epsilon that uses high-performance analytics to manage large-scale marketing campaigns for companies. The product is being made generally available starting today.

Flood insurance

In any case, this is one market where there's no turning back. International Data Corp. (IDC) reckons that the business intelligence market will reach $7.5 billion in revenues by 2006. But simple numbers don't tell the whole story. Nor does the old label of "business intelligence," which always struck us as a bit of a misnomer. (Where does that leave everything else?) Whether Netezza succeeds or fails, someone has to solve the problems brought on by the confluence of several key trends: ferocious declines in the price of storage and matching gains in its availability (even ordinary PCs can now store hundreds of gigabytes); the instrumentation of just about everything that moves, clicks, or blinks; and the advent of cheap bandwidth and processing power for collecting huge volumes of data. By some estimates, enterprises are accumulating 100,000 times more data per month today than they did just a decade ago. Indeed, Netezza points to a major enterprise that was trying to cope with the fact that it took 26 hours to analyze 24 hours' worth of data. That's a recipe for failure, and an opportunity for any vendor that can provide a solid, dependable solution.

In short, this isn't a once-in-500-years flood on the Mississippi River, with the water levels expected eventually to recede. The water isn't going back down. For instance, the story of intelligence failures prior to the Sept. 11 attacks includes the painful fact that many potentially meaningful clues were buried in the mountains of data that U.S. agencies had previously collected, but no one had properly analyzed. Whether it's streaming multimedia entertainment, profiling airline passengers, collecting intelligence on communications traffic, or cross-selling financial products or toothpaste, virtually every technical and market trend points to collecting and analyzing more and more data, not less.

Whether Netezza can deliver on its promise and break away from the competition won't be clear for some time. But it is already clear that the opportunity it has identified is huge and growing huger.
VentureWire is a trademark and service mark of Technologic Partners. ©2002 Technologic Partners