Indeed, Picciano said, large ISVs (independent software vendors)
such as Nextance and Justsystems have shifted to using XML as an
internal representation format over the
past year or two. "As we've introduced Viper to them, they've said,
'This is exactly what we were hoping somebody would step up and
do,'" he said.
If people's interest in XML has flagged, it's not because XML
isn't out there; rather, it's the fault of inelegant XML handling
databases, Picciano said. "Because today's generation of XML
handling databases has been woefully inefficient in handling XML
data, many customers have kept it separate," he said. "It's been
spread out across disk systems, not centrally managed as a data
asset. If you talk to some people in IT, they're unaware of the
amount of XML their organizations are using. … But I really have yet
to go to a client and talk to them about XML as a data technology
and have them say, 'We don't have any need for that.'"
Viper's XML power has gotten all the press. But two other biggies
catch the eye of Bloor Research's Howard: new compression
technologies, brought with Viper's "Venom" technology, and the fact
that Viper is IBM's stake in the ground in the data warehousing
space.
He pointed to IBM's Data Warehousing BCU (Balanced Configuration
Unit) as the first positive move by a data warehousing company in
response to appliance vendors starting to muscle in. "Appliance
vendors are
starting to get a lot of traction," he said. "It's hurting Teradata,
[and] it will hurt Oracle. IBM is the only mainstream vendor taking
steps to compete with the likes of DATAllegro … and Netezza."
The appliance companies offer bundled data warehouse appliances,
delivered as preinstalled software on hardware platforms, that
practically eliminate administrative costs, Howard said.
To deliver the BCU, IBM drew on its data warehousing experience to
assemble a set of best practices, along with multidimensional
clustering, summary tables and more, for preinstalling DB2 in a data
warehousing environment.
It's a turnkey solution, Howard said, and it could help IBM grab
market share from its arch competitor, Oracle. "[IBM is trying] to
minimize the management overhead," he said. "I don't think that goes
the whole way to answer the threat of Netezza, but Oracle hasn't
moved at all to compete with [that threat]. IBM is bundling BCU with
a hardware platform, [whereas] Oracle doesn't have a hardware
platform. This is potentially a threat to Oracle."
Independent analyst and eWEEK contributing columnist Charlie
Garry disagreed with Howard, however, saying that the BCU is, in
fact, another instance of IBM playing catch-up.
"In the past, IBM simply had not standardized on a subset of
hardware and storage for their warehouse implementations with DB2 on
Unix/Win/Linux," Garry wrote in an e-mail exchange. "This meant a
great deal of configuration on-site and delayed successful
implementations of DB2 as a warehouse database. The BCU is simply a
way for IBM to sell a bundled set of hardware and software that they
have great experience with and can more accurately predict
performance for across a range of workloads. This helps to speed up
implementations."
IBM is not, in fact, the first vendor to come up with such a
solution, Garry said—Teradata got there first, and IBM is, wisely
enough, following suit. "Teradata has always operated in this
fashion," Garry wrote. "It should be pointed out that Teradata
competitors spread FUD about this approach, claiming the proprietary
nature of Teradata systems. No one could argue with the fast time to
value, however, and the share Teradata has taken over the past five
years. Now IBM is doing it and it is a good approach, but it is not
in any way an answer to the warehouse appliance vendors. We are not
talking about a single piece of hardware containing server, storage,
and software that you create tables on and load data. To put it more
succinctly, a warehouse appliance could be up and running in a
matter of hours after delivery while an IBM BCU or a Teradata system
would take much longer. The point is that the BCU is not an answer
to the appliance vendors."
"[It] is certainly not a turn-key solution," Garry wrote. "But then
again, no data warehouse is, appliance or not."
The separate XML storage engine is an interesting approach,
similar to the modular approach that MySQL has taken with its
storage engines. While IBM will attempt to convince customers they
need this, it is more likely that this technology is in DB2 to
support IBM's own content management and data integration strategy.
I see DB2 becoming an increasingly embedded solution for IBM in the
future versus a stand-alone database offering.
Many of the features are old relative to most other databases as
pointed out in the article. Multidimensional clustering was known as
a clustering index back in the day when I supported DB2 on the
mainframe. Range partitioning has been in the mainframe version for
perhaps 15 years. Now IBM trumpets the combination of those
technologies with the hash partitioning they already sold as the
data partitioning offering for DB2 on Unix/Win/Linux. These things
have improved performance and the reason we know this is because
other databases (even DB2 on zSeries) have used them before. The key
to success will be in the Design Advisor which helps administrators
make physical design decisions after the fact. If those
recommendations are good, if they can be implemented without
creating a great deal of effort and added expense, then IBM will
have something.
Beyond BCU, Viper packs loads of features that are
warehouse-friendly, Picciano said. The list includes improved large
database management and table partitioning. Table partitioning is a
data organization scheme in which table data is divided across
multiple storage objects called table partitions or ranges according
to values in one or more table columns. These storage objects can be
in different table spaces, in the same table space or a combination
of both.
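
As a rough illustration of the idea, the sketch below routes rows to
range partitions by the value of one column. It is not DB2 code; the
BOUNDARIES list, the "year" column and the function names are
assumptions made up for this example.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical split points on a "year" column. They define four ranges:
# partition 0: year < 2004, 1: 2004, 2: 2005, 3: year >= 2006.
BOUNDARIES = [2004, 2005, 2006]


def partition_for(year: int) -> int:
    """Return the index of the range partition that holds `year`."""
    return bisect_right(BOUNDARIES, year)


def load(rows):
    """Place each row in the storage object for its range."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_for(row["year"])].append(row)
    return partitions


rows = [
    {"id": 1, "year": 2003},
    {"id": 2, "year": 2004},
    {"id": 3, "year": 2005},
    {"id": 4, "year": 2005},
    {"id": 5, "year": 2007},
]
partitions = load(rows)
```

Each list in `partitions` stands in for one storage object; in a real
database these could live in the same table space or in different ones.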
The benefits of table partitioning include the ability to create
very large tables: because its data is divided across multiple
storage objects, a partitioned table can hold vastly more data than
an ordinary table.
Other warehouse-friendly capabilities include more-flexible
administration. Users can now perform administrative
tasks on individual data partitions, breaking down time-consuming
maintenance operations into a series of smaller operations.
Viper also comes with more granular control of index placement.
Indexes can be placed in different table spaces and managed
individually.
In addition, Viper brings fast, easy roll-in or roll-out of data.
This ability can be particularly useful in a data warehouse
environment where you often move data in and out to run
decision-support queries, Picciano said.
Meanwhile, Viper comes with improved query performance.
Separating data with table partitioning allows users to improve
query processing performance by avoiding scans of irrelevant data.
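
The sketch below shows, under simplified assumptions, how a query can
skip partitions whose range cannot contain matching rows. The
PARTITIONS layout and the query_total function are hypothetical and
are not drawn from DB2 itself.

```python
# Hypothetical partitions keyed by (low, high) year range, high exclusive.
PARTITIONS = {
    (2003, 2004): [{"id": 1, "year": 2003, "amount": 10}],
    (2004, 2005): [{"id": 2, "year": 2004, "amount": 20}],
    (2005, 2006): [
        {"id": 3, "year": 2005, "amount": 30},
        {"id": 4, "year": 2005, "amount": 40},
    ],
}


def query_total(lo: int, hi: int):
    """Sum `amount` for rows with lo <= year < hi.

    Returns the total and the number of partitions actually scanned.
    """
    total = 0
    scanned = 0
    for (p_lo, p_hi), rows in PARTITIONS.items():
        if p_hi <= lo or p_lo >= hi:
            continue  # range does not overlap the query, so skip the scan
        scanned += 1
        total += sum(r["amount"] for r in rows if lo <= r["year"] < hi)
    return total, scanned


total, scanned = query_total(2005, 2006)
```

Here a query restricted to 2005 reads one partition out of three; the
saving grows with the number of partitions the predicate rules out.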
As far as Venom compression technology goes, Bloor Research's
Howard described IBM's approach as tokenization. The software looks
for patterns that occur in the data. So if you're looking at a
customer record and you see Michigan occur, you store a token that
indicates the string "Michigan." The token is stored in a lookup
table in the data dictionary, thus saving storage space.
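
A minimal sketch of tokenization as Howard describes it follows. It is
a generic dictionary encoding written for illustration, not IBM's
implementation, and the compress and decompress names are assumptions.

```python
def compress(rows):
    """Replace each distinct value with a small integer token."""
    dictionary = []  # token -> original string (the lookup table)
    index = {}       # original string -> token
    encoded = []
    for row in rows:
        encoded_row = []
        for value in row:
            if value not in index:
                index[value] = len(dictionary)
                dictionary.append(value)
            encoded_row.append(index[value])
        encoded.append(encoded_row)
    return dictionary, encoded


def decompress(dictionary, encoded):
    """Rebuild the original rows by looking each token up."""
    return [[dictionary[token] for token in row] for row in encoded]


rows = [
    ["Smith", "Michigan"],
    ["Jones", "Michigan"],
    ["Lee", "Ohio"],
]
dictionary, encoded = compress(rows)
```

The string "Michigan" is stored once in the lookup table, and every
row that mentions it carries only the token. The more often a value
repeats, the more space this saves.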
IBM claims between 30 and 70 percent storage savings. That
depends on the application, of course, Howard pointed out, and on
how much repetitive data you're talking about.
Venom also raises one immediate issue, Howard said: Namely, if
you have to compress and then decompress data to access it, there
will be an overhead involved. Will that then lead to a performance
hit?
As it turns out, it doesn't, given Venom's reliance on in-buffer
data storage, as opposed to disk storage, along with its
compressed run-time. There's less back-and-forth to the disk, which
actually can result in slight performance improvements, Picciano
said.
"We have seen some modest performance gains, in the transactional
and analytical spaces," he said. "Mostly [performance is] the same,
with maybe a bit of advantage."
As far as storage savings go, Picciano said IBM is seeing
"tremendous results from customers and analysts," on the order of 55
percent direct disk savings.