Purdue University
The Purdue University team is investigating linguistic extensions to MapReduce abstractions for programming modern, large-scale systems, with special focus on applications that manipulate large, unstructured graphs. They also provide a cloud computing testbed called Wispy to TeraGrid users.
Eager Maps and Lazy Folds for Graph Structured Applications
This project is investigating linguistic extensions to MapReduce abstractions for programming modern, large-scale systems, with special focus on applications that manipulate large, unstructured graphs. The work is expected to benefit a broad class of scientific applications.
Wispy
Purdue University provides a cloud computing testbed called Wispy to TeraGrid users. It consists of one frontend and VM image storage nodes, and four dual-CPU VM host machines. The machines have one and a half gigabytes of available memory. The cloud supports virtual machines with real Internet addresses, so researchers can get running in the cloud with minimal complications.
Presentation: Relaxed Synchronization and Eager Scheduling in MapReduce
October 22, 2009
Relaxed Synchronization and Eager Scheduling in MapReduce
Paper: Towards Optimizing Hadoop Provisioning in the Cloud
July 22, 2009
By Karthik Kambatla and Abhinav Pathak, Purdue University; Himabindu Pucha, IBM Research Almaden.
Abstract: Data analytics is becoming increasingly prominent in a variety of application areas ranging from extracting business intelligence to processing data from scientific studies. The MapReduce programming paradigm lends itself well to these data-intensive analytics jobs, given its ability to scale out and leverage several machines to process data in parallel. In this work we argue that such MapReduce-based analytics are particularly synergistic with the pay-as-you-go model of a cloud platform. However, a key challenge facing end-users in this environment is the ability to provision MapReduce applications to minimize the incurred cost while obtaining the best performance. This paper first motivates the importance of optimally provisioning a MapReduce job, and demonstrates that existing approaches can result in far-from-optimal provisioning. We then present a preliminary approach that improves MapReduce provisioning by analyzing and comparing resource consumption of the application at hand with a database of similar resource consumption signatures of other applications.
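The signature-matching idea in the abstract can be illustrated with a small sketch. The abstract does not say how signatures are represented or compared, so everything here is an assumption made for illustration: a signature is taken to be a tuple of average CPU, disk I/O, and network utilization, similarity is plain Euclidean distance, and the database entries, the configuration fields, and the function name `closest_match` are all hypothetical with made-up values.

```python
import math

# Hypothetical signature database. Each signature is an assumed
# (cpu, disk_io, network) utilization triple in [0, 1]; the values and
# the configurations attached to them are invented for illustration
# and are not taken from the paper.
SIGNATURE_DB = {
    "cpu_heavy_job": {
        "signature": (0.9, 0.2, 0.1),
        "config": {"map_slots": 4, "reduce_slots": 2},
    },
    "io_heavy_job": {
        "signature": (0.3, 0.9, 0.4),
        "config": {"map_slots": 2, "reduce_slots": 2},
    },
    "network_heavy_job": {
        "signature": (0.2, 0.3, 0.9),
        "config": {"map_slots": 2, "reduce_slots": 4},
    },
}


def closest_match(signature, db=SIGNATURE_DB):
    """Return (name, config) of the stored job whose signature is nearest.

    Nearness is Euclidean distance between utilization tuples, which is
    an assumed metric rather than the one used by the authors.
    """
    name = min(db, key=lambda k: math.dist(signature, db[k]["signature"]))
    return name, db[name]["config"]


if __name__ == "__main__":
    # A new application measured as mostly CPU-bound.
    match, config = closest_match((0.85, 0.25, 0.15))
    print(match, config)
```

Under these assumptions, a new job would reuse the provisioning of whichever known application it most resembles. A real system would need a richer signature and a way to handle jobs that resemble nothing in the database.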