A Comparative Study of Approaches to Cluster-Based Large Scale Data Analysis


This collaborative study is being conducted by MIT, the University of Wisconsin, and Yale University. The three universities are using National Science Foundation CLuE grant funding to compare approaches to cluster-based, large-scale data analysis. Both MapReduce and parallel database systems provide scalable data processing over hundreds to thousands of nodes, but researchers need to understand how the two approaches differ in performance and scalability in order to judge which is more suitable when designing new data-intensive computing applications.

This project is engaged in systems research, much of which requires the ability to change the operating environment. Since this is not possible on the IBM/Google cluster, the project is also hosted on the Cloud Computing Testbed (CCT) operated by researchers at the University of Illinois, Urbana-Champaign (UIUC). The CCT is partially funded by the CLuE program in conjunction with HP, Intel, Yahoo and UIUC Open Cirrus Program.

Resources
Presentation: A Performance and Usability Comparison of Hadoop and Relational Database Systems PDF


Presentation: HadoopDB: An Architectural Hybrid of MapReduce and DBMS Technologies for Analytical Workloads PDF


Paper: A Comparison of Approaches to Large-Scale Data Analysis PDF

