 
InfoWorld
 

Dems score with better data

DNC's Linux warehousing project delivered on '50-state strategy'


November 15, 2006
 

Behind every big success these days, there's probably some darned good IT making it happen. That appears to be the case in the surprising electoral victory by the Democratic Party last week.

New data warehouse solutions commissioned by the Democratic National Committee (DNC) and also by Catalist, a for-profit group backed by a faction of leading Democratic players, are being credited for their part in the Party's strong performance in nationwide midterm elections. Those solutions may have helped Democrats close the gap with tech-savvy Republicans, according to people involved with the projects and with the party's countrywide get-out-the-vote operation.

The DNC solution, which was commissioned one year ago by DNC Chairman Howard Dean, tapped a new generation of low-cost, Linux-based data warehouse technology to improve the quantity, quality, and availability of voter information used by state Democratic parties during the election turn-out effort. Those close to the project say the new system, part of Dean's so-called 50-state strategy, helped tip close races in the House and Senate in favor of the Democrats.

The solution was developed by Intelligent Integration Systems (IISi) of Boston, a company that develops datacenter solutions and uses a Netezza Performance Server data warehouse appliance to integrate information provided by 45 state-level Democratic parties on about 200 million voters, according to Paul Davis, IISi's CEO.

In addition to the Netezza back end and IISi code, the system uses data quality and cleansing tools from FirstLogic, enterprise integration software from Sunopsis, and data modeling tools from SPSS, according to a Netezza statement.

The new solution was hosted at a datacenter in Virginia and allowed the DNC to rapidly update so-called "voter files" as state-level party workers provided them with new information. The data was then cleaned up by comparing it to lists of known phone numbers and addresses. The DNC was also able to "overlay" the information and match it to data about individuals in the lists culled from various consumer data stores, Davis said.
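The cleansing and overlay steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the DNC's or IISi's actual code: the record fields (`name`, `zip`, `phone`), the `known_phones` set, and the `consumer_data` lookup keyed on name and ZIP code are all hypothetical.

```python
import re


def normalize_phone(raw):
    """Reduce a US phone number to 10 digits, or return None if that fails."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    return digits if len(digits) == 10 else None


def cleanse_and_overlay(voters, known_phones, consumer_data):
    """Validate each phone against a known list, then attach overlay data.

    voters        -- list of dicts with hypothetical keys name, zip, phone
    known_phones  -- set of 10-digit numbers believed to be valid
    consumer_data -- dict keyed on (lowercased name, zip) holding extra fields
    """
    cleaned = []
    for voter in voters:
        record = dict(voter)  # leave the caller's data untouched
        phone = normalize_phone(record.get("phone"))
        # Keep the number only if it appears in the reference list.
        record["phone"] = phone if phone in known_phones else None
        key = (record["name"].strip().lower(), record["zip"])
        # The overlay is stored separately so it cannot overwrite voter fields.
        record["overlay"] = consumer_data.get(key, {})
        cleaned.append(record)
    return cleaned
```

Matching on name and ZIP code alone is a deliberate simplification; a production system would need far more careful record matching.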

Netezza, which makes the technology used by the DNC, is part of a new generation of data warehousing companies that are using commodity hardware such as Seagate hard drives, Intel processors, and hardened Linux operating systems to create low-cost, fast data warehouse appliances, according to Donald Feinberg, of Gartner.

Like incumbent data warehouse players such as Teradata (part of NCR), Netezza uses distributed database intelligence, in which data filtering, processing, and analysis are done on the same device that stores the data.

"They have code running on the hard drive, so you can parallelize the queries and do them as fast as you can lift the data off the hard drive. Fundamentally it results in a two order of magnitude improvement in speed," said Rich Zimmerman, IISi's CTO.
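The general idea of filtering data where it is stored, and running that filter on many storage units at once, can be sketched as follows. This is a conceptual illustration only and says nothing about how Netezza implements it: the `scan_slice` and `parallel_query` names are hypothetical, and Python threads stand in for independent storage units, so the sketch shows the structure of the approach rather than its speed.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_slice(rows, predicate):
    """Filter one data slice locally, returning only the matching rows."""
    return [row for row in rows if predicate(row)]


def parallel_query(slices, predicate, workers=4):
    """Run the same filter on every slice concurrently and merge the results.

    Each slice stands in for the data held by one storage unit; only rows
    that pass the filter are handed back to the coordinator.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda rows: scan_slice(rows, predicate), slices)
        # pool.map preserves slice order, so the merge is deterministic.
        return [row for partial in partials for row in partial]
```

Because each unit discards non-matching rows before returning anything, the coordinator handles far less data than it would if it pulled every record and filtered centrally.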

Parallelizing queries to databases is nothing new. However, running parallel queries on inexpensive hardware and software, like Linux and PostgreSQL, and being able to match what high-end vendors like Teradata can offer is new, said Feinberg. Appliance-based products like Netezza's Performance Server are also easier to maintain, requiring less staff and keeping the cost to implement and run the data warehouse low, he said.

Motivating the DNC's data warehouse project was an effort to improve on the organization's 2004 voter targeting project, which was roundly criticized for providing state-level organizations with inaccurate data in a close race against a well-organized Republican opposition.

Gus Bickford, a DNC National Committeeman and voter file expert, says that in the 2004 contest, the DNC had reams of data on voters, but had done little "modeling" to piece it together. "It was like buying all the pieces for a jet engine, but not telling anyone how to put it together."

For example, the DNC outsourced data "cleansing" of state-level voter information in 2004, but got shoddy results, with many records having incorrect phone numbers and lacking multiple addresses that are often necessary to locate voters. Some state parties that tried to use that data afterward abandoned it altogether because it was unreliable, he said.

Following the DNC playbook for that election year, the group also stopped cleansing data for all but about 18 "swing states," writing off the rest to focus resources where the party felt they mattered most, according to Bickford.

In comparison to 2004, the new system was fast enough to digest updated voter information from all 45 participating states and cleanse it two or three times before Election Day. That meant that state-level operatives got usable data for all 45 participating states, according to Bickford and Mark Sullivan, founder of Voter Activation Network (VAN), in Cambridge, Mass., which makes voter file tools that are used by Democrats and other progressive groups.

Better voter file data from the DNC made a huge difference in the final days of the campaign, Sullivan said.

"The thing I noticed while I was driving around from state to state is that the volunteers were so much happier," said Sullivan. "The difference between phone bank volunteers having, say, 45 percent of the phone numbers accurate versus 70 percent of the phone numbers being accurate is enormous."

"There were vast improvements in phone number quality, address quality, and large amounts of consumer data," Sullivan said.

While it's unclear whether the DNC data warehouse was a deciding factor in any race, Davis and others cite at least one example of where it came into play: The Florida State Democratic party's efforts to target "Snow Bird" voters from New York and New England states were a direct product of having more detailed voter information with multiple addresses.

"They were able to communicate with Florida voters earlier in the cycle and not wait until people came back in October," said Davis. They could search for people who had addresses in northern states. They never would have been able to do that before," he said.

"[The DNC] put people in areas where we didn't expect big battles to take place, and gave them the tools and [voter] modeling data," said Sullivan. "That played no small part in what happened, especially in some of those states that weren't on the radar."

The DNC database has been a source of controversy in recent months. Following the 2004 debacle, factions within the Democratic Party decided to start their own voter information project, dubbed Data Warehouse and now known as Catalist, under the guidance of former Clinton administration deputy chief of staff Harold Ickes. Ickes was beaten out by Dean to head up the Democratic National Committee.

The Catalist project also bore fruit last week, according to its CTO Vijay Ravindran, who worked for Amazon.com before joining Catalist.

The system, which relies on EMC storage hardware on the back end, was built using open source components such as Linux and MySQL and development frameworks such as Hibernate and Spring. Like the DNC voter file project, Catalist's Data Warehouse is used by, and designed to support, third-party applications such as VAN, he said.

As for the role Catalist's Data Warehouse played in last week's midterm election, Ravindran said the database of 150 million people was used for a variety of activities including mailings, phone banks, and get-out-the-vote and canvassing efforts in 26 states, affecting around 60 million voters.

Groups using the Catalist data included Emily's List, The Sierra Club, MoveOn.org, labor organizations, and America Votes, an umbrella group representing 250 different organizations. Like those involved with the DNC project, Ravindran pointed to races where the data may have swung outcomes: A group called Women's Voices, Women Vote used modeled data on single, unregistered women in Missouri to target a voter drive in support of Claire McCaskill's successful candidacy against Senator Jim Talent.

The Sierra Club used the organization's data to target 310,000 infrequent voters who support environmental issues in 33 races across the country, including the successful effort to unseat California Congressman Richard Pombo, Ravindran said.

In general, the system performed well, though Catalist's CTO saw room for improvement.

"Having done this before at Amazon, I know that it doesn't mean anything until you go through your first Christmas. We did scale up for [2006], and we learned how to improve the system for 2007 and 2008," Ravindran said.

In published reports, Ickes has said the Data Warehouse project amounted to a vote of "no confidence" in the DNC effort, and that the newly funded Catalist organization would provide Democrats with voter targeting and data mining capabilities on par with the Republican Party's program (Findit/4689).

Despite the history between powerful figures like Ickes and Dean, Ravindran said that he doesn't pay much attention to the "political dynamic" or history between Catalist and the DNC. He anticipates cooperation in the future.

"We've done the first-tier job of getting [Data Warehouse] off the ground. My fond hope is that the DNC and other progressive groups with valuable data share that data," he said.

Bickford also said he doesn't see the two projects competing with each other. Two voter databases -- one associated with the Democratic Party, the other commercial -- may be fine.

"There are like-minded democratic organizations that will always be extremely happy to have organizations like [Catalist's] Data Warehouse, but would not be able to get information directly from the Democratic Party."

What all those involved with the Democrats' get-out-the-vote effort agree on is that the Party has closed the technology gap that previously existed between it and Republicans.

"The quality of the data has substantially improved. It was a huge step up. And with the overlay data, the state parties can do more, but it's not automated. What we want in the future is to automate that and be able to make intelligent decisions about who to contact," Davis said.

Sullivan agreed, and said that the media underreported the sophistication of the Democratic turnout effort in 2006 and overestimated the abilities of the GOP.

While Democrats now have detailed and accurate enough information to look at sub-areas within individual precincts, they lack the ability to target individuals in the way that Republicans can -- a process called "micro targeting," Davis said.

"The idea that Democrats are doing micro targeting is a myth," Davis said. "If you look at the close races, Republicans were able to do things to narrow the margins, even in this cycle. Their performance was impressive."

However, micro targeting is within reach, and Davis said that the data warehousing solution his company helped develop could work as a platform for it -- for example, by using profiles of known voters to find other individuals who may be sympathetic but are infrequent poll-goers.

As both Democrats and Republicans reach parity on the technology front, the battle will shift to integrating the various data sources in a seamless manner, said Sullivan.

"We'd like to get the data as soon as its refreshed at DNC, then migrate it into our systems. That would reduce the number of human interventions from what we currently have," he said.

While platforms like Netezza are great for extracting data from huge numbers of records, they aren't well suited to the variety of tasks that enterprise databases perform, Gartner's Feinberg said, noting that Democrats may want to harmonize their competing data warehouse projects.

"The DNC does a lot more than identify voters in Florida or DC, or run a program every two years for an election," he said. "If you have a complete enterprise data warehouse, maybe you can take a subset of that data out and put it on the Netezza box for special functions, like doing targeted campaigns. That way you can run queries against it all day long and not hurt the DNC," he said.

"We're really scratching the surface with what we've done with techniques and technologies," said Catalist's Ravidran. "A decade from now, we're going to look at these first few years where we cleaned up voter lists just so we could do simple queries as the stone age. And they are."

 



