DM Review | Covering Business Intelligence, Integration & Analytics

How Appliances Benefit Grid Computing:
A Data Center Greater than the Sum of Its Parts

Article published in DM Direct Newsletter
March 24, 2006 Issue

By Bill Blake

A grid computing architecture is an optimally shared pool of hardware and software resources. The appeal of the grid approach to the enterprise is that it dynamically allocates - or "virtualizes" - computing resources so that they may be provisioned for specific tasks as demands require. By allocating tasks across all systems, the grid aligns computing power with business priorities and ensures that all resources are fully used.

As more data centers adopt grid computing, the anticipated benefits are increasingly being realized. These benefits include more effective use of IT resources and lower operating costs because of better capacity utilization, with a significant reduction in resource management and maintenance. However, organizations are finding that grids made up of only general-purpose components are inadequate to meet performance needs and user expectations for data-intensive initiatives such as business intelligence (BI).

Limitations of a General-Purpose Grid

A grid of standard systems works well for general-purpose computing. Unfortunately, these same standard systems have a basic disadvantage when querying massive amounts of data using a BI application: data must be moved from storage to server memory before any processing can be done on it. The result is a huge I/O bottleneck as billions or even hundreds of billions of rows of data shuttle back and forth across the grid - particularly when queries require iterative processing.

Thus, a grid of only general-purpose components provides no performance benefit for complex queries, because the basic processing approach is unchanged. However good the individual components that the grid virtualizes to handle complex analyses, the sum total is still slower than a system designed specifically for the task.
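
A rough back-of-envelope sketch makes the bottleneck concrete. The row count, row width, link speed and share of matching rows below are illustrative assumptions, not measurements from any real system, and the model ignores latency and protocol overhead.

```python
def transfer_seconds(rows: int, bytes_per_row: int, link_bytes_per_sec: float) -> float:
    """Time to ship `rows` rows over a link, ignoring latency and protocol overhead."""
    return rows * bytes_per_row / link_bytes_per_sec


# All values are hypothetical, chosen only to show the shape of the problem.
TOTAL_ROWS = 10_000_000_000         # 10 billion rows in the table being queried
MATCHING_ROWS = TOTAL_ROWS // 1000  # assume 1 row in 1,000 satisfies the query
BYTES_PER_ROW = 100                 # assumed average row width
LINK_BYTES_PER_SEC = 1e9            # assumed ~1 GB/s between storage and server

# General-purpose approach: every row crosses the network before it is filtered.
move_everything = transfer_seconds(TOTAL_ROWS, BYTES_PER_ROW, LINK_BYTES_PER_SEC)

# Filtering where the data lives: only the qualifying rows cross the network.
move_matches = transfer_seconds(MATCHING_ROWS, BYTES_PER_ROW, LINK_BYTES_PER_SEC)

print(f"ship all rows:      {move_everything:,.0f} s")
print(f"ship matching rows: {move_matches:,.0f} s")
```

Under these assumptions the transfer time scales with the number of rows shipped, so the gap between the two cases is set by the query's selectivity rather than by how fast the individual servers are.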

The Benefits of an Appliance Approach

In order to optimize query performance for massive data sets, virtualization of resources must be done using an architecture that moves processing to where the data is stored, eliminating unnecessary data movement. This is accomplished by a data warehouse appliance, an innovative system approach that architecturally integrates database, server and storage in a "grid in a box," putting processing power right next to the data in the form of hundreds of "smart disks" operating in parallel.

This ability to bring processing to the data - rather than the other way around - eliminates the traditional bottlenecks that occur when moving sets of records through a network to a processor. Bringing this optimized appliance into a grid environment gives complex BI queries the right computing power, frees up standard systems on the grid for general-purpose computing, and dramatically increases the performance and flexibility of the entire grid. For example, applications such as clickstream and telecom call detail record (CDR) analysis show significant performance improvements when run on a data warehouse appliance. The performance benefit that a data warehouse appliance (or other purpose-built appliance) offers over general-purpose components is magnified when it is incorporated into a grid, where the specialized resources can be further leveraged by many applications and users.
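
The idea of filtering in parallel next to the data can be sketched in a few lines. This is a toy simulation, not how any particular appliance is implemented: Python threads stand in for the "smart disks," and the record layout, partition count and function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition, predicate):
    """Runs 'next to the data': filter one partition locally, return only matches."""
    return [row for row in partition if predicate(row)]


def parallel_query(partitions, predicate):
    """Send the same predicate to every partition and merge the small result sets."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        results = pool.map(lambda p: scan_partition(p, predicate), partitions)
        return [row for part in results for row in part]


# Hypothetical call detail records, spread round-robin over four simulated disks.
records = [{"call_id": i, "duration_sec": (i * 37) % 600} for i in range(1000)]
partitions = [records[i::4] for i in range(4)]

long_calls = parallel_query(partitions, lambda r: r["duration_sec"] > 590)
print(f"{len(long_calls)} of {len(records)} rows returned to the caller")
```

Only the rows that satisfy the predicate leave each partition, so the caller receives a small merged result instead of the full data set.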

Additionally, appliances in a grid environment take the grid's inherent management simplicity one step further. By definition, a "grid-enabled" appliance supports standard grid interfaces, allowing the appliance to be discovered, allocated and managed like any other component. Because of the appliance architecture, the encapsulated set of appliance components looks like a single resource, simplifying grid operations because there are fewer total components that need to be managed.

For example, a conventional data warehouse deployed across general-purpose components requires a grid management entity (GME) to allocate and manage the necessary servers, networks and disk space. The data warehouse appliance handles all this activity automatically. Therefore, using appliances results in what is effectively a grid of self-managed sub-grids. The simplicity of the appliance means that the GME is left to manage a much smaller number of components, easing overall grid administration.
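
One way to picture the "grid of self-managed sub-grids" is as a composite: the GME sees only top-level components, and an appliance keeps its internals to itself. The class names and the component counts below are hypothetical, chosen only to illustrate the difference in what the GME must track.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Anything the GME can discover, allocate and manage."""
    name: str


@dataclass
class Appliance(Component):
    """Appears to the GME as one component; its internals are self-managed."""
    internals: list = field(default_factory=list)


def managed_by_gme(grid):
    """The GME tracks only top-level components, never appliance internals."""
    return len(grid)


# Hypothetical conventional warehouse: every server, switch and disk is exposed.
conventional = (
    [Component(f"server-{i}") for i in range(8)]
    + [Component(f"switch-{i}") for i in range(2)]
    + [Component(f"disk-{i}") for i in range(16)]
)

# Hypothetical appliance: the same kinds of parts, encapsulated behind one entry.
appliance_grid = [Appliance("dw-appliance", internals=conventional)]

print(managed_by_gme(conventional), "components vs", managed_by_gme(appliance_grid))
```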

The ability to optimize resource allocation and management via grid technologies allows organizations to cost-effectively deploy larger numbers of components than was previously possible. As a result, current computations are solved faster, enabling organizations to tackle larger and more complex problems. Data warehouse appliances further enhance this benefit with their ability to handle and analyze the massive amounts of data often associated with these complex business problems.

The Right Tool for the Job

A grid that combines appliances and general-purpose components provides an opportunity for best-of-breed resource allocation, as well as simplified resource management. That is, when deploying a multitier application made up of many processes, each individual process can be assigned to the resource best suited for the job. For example, when deploying a BI application on a grid, the GME would assign the massive data storage to a data warehouse appliance of an appropriate size and set of capabilities. Similarly, the GME would assign the client and middle application tier functions to one or more SMP servers and an associated amount of disk storage. As the BI application executes, the data warehouse appliance performs the analysis on massive amounts of data, quickly returning only the data relevant to the query.
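
The assignment step described above might look something like the following minimal sketch. It assumes a simple best-fit rule (the smallest free resource of the preferred kind that is large enough); the article does not specify how a real GME chooses, and every resource name, kind and capacity here is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str           # e.g. "dw_appliance" or "smp_server" (hypothetical labels)
    capacity_tb: float


@dataclass(frozen=True)
class Task:
    name: str
    preferred_kind: str
    needs_tb: float


def assign(tasks, resources):
    """Give each task the smallest free resource of its preferred kind that fits."""
    free = list(resources)
    plan = {}
    for task in tasks:
        candidates = [
            r for r in free
            if r.kind == task.preferred_kind and r.capacity_tb >= task.needs_tb
        ]
        if not candidates:
            raise LookupError(f"no suitable resource for {task.name}")
        chosen = min(candidates, key=lambda r: r.capacity_tb)
        free.remove(chosen)          # a resource serves one task at a time here
        plan[task.name] = chosen.name
    return plan


resources = [
    Resource("appliance-small", "dw_appliance", 10),
    Resource("appliance-large", "dw_appliance", 100),
    Resource("smp-1", "smp_server", 1),
    Resource("smp-2", "smp_server", 1),
]
tasks = [
    Task("warehouse", "dw_appliance", 25),
    Task("middle-tier", "smp_server", 0.5),
    Task("client-tier", "smp_server", 0.2),
]

plan = assign(tasks, resources)
print(plan)
```

In this sketch the 25 TB warehouse skips the 10 TB appliance and lands on the larger one, while the two application tiers are spread across the SMP servers.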

A Model Deployment

An example of the grid approach outlined above might include a collection of interconnected (networked) components under the control of a GME. Within this model, grid components can include traditional resources (e.g., servers, network devices and disk arrays), purpose-built appliances, applications and services (e.g., databases and enterprise resource planning services) and the meta-resources and subcomponents that make up higher-level components. The GME is the logical entity that manages the grid components, the relationships between various grid components and their life cycles.

Figure 1 shows a high-level representation of this grid structure, including the service-oriented architecture (SOA) layers for applications and grid resources, the GME and the physical grid resources. Deployment of an application across such a grid involves allocating and provisioning resources, performing the actual work and then potentially decommissioning the resources so they can be used for other tasks.

Figure 1: Grid Structure

Consider, for example, the BI analysis of customer purchase transactions. Terabytes of data are collected in an operational database, extracted, cleansed and then loaded into an analytic data warehouse. Once the data is stored in the warehouse, follow-on tasks are invoked to analyze the data as well as back it up.

The GME uses service-level requirements and deployment parameters to commission the most appropriate resources for the tasks to be performed. For this particular example, the operational database might be deployed on general-purpose servers and storage while the data warehouse could be assigned to a data warehouse appliance. The BI application, cleansing tool and backup storage are similarly allocated to available and suitable resources. Once all tasks are complete, the grid resources are returned to the shared pool for allocation to other tasks/applications as needed.
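
The commission, use and return cycle can be sketched with a shared pool and a context manager. This is a simplified illustration under stated assumptions: the `ResourcePool` class, its method names and the resource names are hypothetical, and a real GME would also handle provisioning, service levels and failures.

```python
from contextlib import contextmanager


class ResourcePool:
    """A shared pool from which resources are commissioned and later returned."""

    def __init__(self, names):
        self.free = set(names)
        self.in_use = set()

    @contextmanager
    def commission(self, *names):
        wanted = set(names)
        missing = wanted - self.free
        if missing:
            raise RuntimeError(f"not available: {sorted(missing)}")
        self.free -= wanted
        self.in_use |= wanted
        try:
            yield names
        finally:
            # Decommission: resources go back to the pool even if a task fails.
            self.in_use -= wanted
            self.free |= wanted


pool = ResourcePool(["general-servers", "dw-appliance", "backup-storage"])
completed = []

with pool.commission("general-servers", "dw-appliance", "backup-storage"):
    busy_during_run = set(pool.in_use)
    # Hypothetical task sequence following the purchase-transaction example.
    for step in ("extract", "cleanse", "load", "analyze", "back up"):
        completed.append(step)

print("in use during run:", sorted(busy_during_run))
print("in use afterwards:", sorted(pool.in_use))
```

Using `try`/`finally` inside the context manager means the resources are released for other tasks regardless of whether the work inside the block succeeds.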

At Home in the Grid

While there is much work ahead in developing standards and grid-enabling components, grid computing is clearly gaining acceptance and delivering benefits within the enterprise. However, the specific components that are assembled into a grid are as important as the grid itself, as they can have a big impact on performance and administration - especially for data-intensive operations. Building a grid with appliances - each a grid itself - and general-purpose components leverages the different technologies in a way that improves overall performance and resource management. The approach is cost-effective, easy to manage and provides an enterprise computing environment greater than the sum of its parts.

Bill Blake is the senior vice president of product development at Netezza, the market leader in enterprise-class data warehouse appliances. He has more than two decades of industry experience in the product management and development of high-performance computing systems.

SourceMedia (c) 2006 DM Review and SourceMedia, Inc. All rights reserved.
SourceMedia is an Investcorp company.
Use, duplication, or sale of this service, or data contained herein, is strictly prohibited.