This is a collaborative study conducted by MIT, the University of Wisconsin, and Yale University. The three universities are using a National Science Foundation CLuE grant for a comparative study of approaches to cluster-based, large-scale data analysis. Both MapReduce and parallel database systems provide scalable data processing over hundreds to thousands of nodes, so researchers need to understand how the two approaches differ in performance and scalability in order to judge which is more suitable when designing new data-intensive computing applications.
This project is engaged in systems research, much of which requires the ability to change the operating environment. Since this is not possible on the IBM/Google cluster, the project is also hosted on the Cloud Computing Testbed (CCT), operated by researchers at the University of Illinois at Urbana-Champaign (UIUC). The CCT is partially funded by the CLuE program in conjunction with the HP, Intel, Yahoo, and UIUC Open Cirrus Program.