Parallel SQL databases perform up to 6.5 times faster than Google’s MapReduce data-crunching technology, concludes a new research paper by Microsoft technical fellow David DeWitt and Vertica Systems chief technology officer Michael Stonebraker. The paper, “A Comparison of Approaches to Large-Scale Data Analysis,” will be published by ACM in the June 29-July 2 issue of the SIGMOD Record.

Google developed MapReduce to index the World Wide Web on its network of low-end PC servers, and as of January 2008 it was using MapReduce to process 20 petabytes of data a day. In-house tests published in November showed Google using MapReduce on 1,000 servers to process 1TB of data in only 68 seconds.

MapReduce and Hadoop, an open source implementation of the technology, have gained wide industry support. However, DeWitt and Stonebraker have argued that MapReduce lacks many important features available in databases and is, in general, a “major step backward.” Their paper is expected to stir controversy over the technical merits of each approach.

DeWitt and Stonebraker tested two 100-node parallel, “shared-nothing” database clusters against a MapReduce cluster of the same size and configuration. They found that the databases were significantly faster and required less code to implement each task, though the databases took longer to tune and to load data into. The researchers also note that MapReduce requires developers to write by hand features or tasks that most SQL databases handle automatically.
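To make the “less code” point concrete, here is a minimal, single-machine sketch of one common analysis task (summing revenue per source IP) written two ways: once in the MapReduce style, with hand-written map, shuffle, and reduce steps, and once as a single SQL `GROUP BY` query run through Python's built-in `sqlite3` module. The table name `visits`, its columns, and the sample rows are hypothetical illustrations, not data or code from the paper, and the MapReduce half only simulates the programming model in one process; it does not use the Hadoop API.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (source_ip, revenue) pairs, for illustration only.
VISITS = [
    ("10.0.0.1", 1.5),
    ("10.0.0.2", 2.0),
    ("10.0.0.1", 3.0),
    ("10.0.0.3", 0.5),
    ("10.0.0.2", 1.0),
]


def map_phase(record):
    """Map: emit one (key, value) pair per input record."""
    source_ip, revenue = record
    yield source_ip, revenue


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (a real framework does this across nodes)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share one key."""
    return key, sum(values)


def mapreduce_totals(records):
    """Run the simulated map -> shuffle -> reduce pipeline in a single process."""
    pairs = (pair for record in records for pair in map_phase(record))
    groups = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


def sql_totals(records):
    """Express the same aggregation declaratively; the engine plans grouping itself."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (source_ip TEXT, revenue REAL)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT source_ip, SUM(revenue) FROM visits GROUP BY source_ip"
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(mapreduce_totals(VISITS))
    print(sql_totals(VISITS))
```

Both functions should return the same per-IP totals. The difference is where the work lives: in the MapReduce version the developer spells out how records are keyed and combined, while in the SQL version a one-line query states what is wanted and the database decides how to group and aggregate.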
This entry was posted on Thursday, April 16th, 2009 at 9:34 pm and is filed under Computer Science and Engineering News.

