Purdue University


The Purdue University team is investigating linguistic extensions to MapReduce abstractions for programming modern, large-scale systems, with special focus on applications that manipulate large, unstructured graphs. They also provide a cloud computing testbed called Wispy to TeraGrid users.


Research Projects

Eager Maps and Lazy Folds for Graph Structured Applications
This project is investigating linguistic extensions to MapReduce abstractions for programming modern, large-scale systems, with special focus on applications that manipulate large, unstructured graphs. This will impact a broad class of scientific applications.
Wispy
Purdue University provides a cloud computing testbed called Wispy to TeraGrid users. It consists of one frontend and VM image storage nodes, and four dual-CPU VM host machines. The machines have one and a half gigabytes of available memory. The cloud supports virtual machines with real Internet addresses, so researchers can get running in the cloud with minimal complications.

Resources

Presentation: Relaxed Synchronization and Eager Scheduling in MapReduce PDF


Paper: Towards Optimizing Hadoop Provisioning in the Cloud PDF
By Karthik Kambatla and Abhinav Pathak, Purdue University; Himabindu Pucha, IBM Research Almaden.
Abstract: Data analytics is becoming increasingly prominent in a variety of application areas ranging from extracting business intelligence to processing data from scientific studies. The MapReduce programming paradigm lends itself well to these data-intensive analytics jobs, given its ability to scale out and leverage several machines to process data in parallel. In this work we argue that such MapReduce-based analytics are particularly synergistic with the pay-as-you-go model of a cloud platform. However, a key challenge facing end-users in this environment is the ability to provision MapReduce applications to minimize the incurred cost, while obtaining the best performance. This paper first motivates the importance of optimally provisioning a MapReduce job, and demonstrates that existing approaches can result in far from optimal provisioning. We then present a preliminary approach that improves MapReduce provisioning by analyzing and comparing resource consumption of the application at hand with a database of similar resource consumption signatures of other applications.
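
The signature-comparison idea in the abstract can be illustrated with a minimal sketch. This is not the paper's method: it assumes a signature is a tuple of (CPU, disk I/O, network) utilization fractions, that similarity is plain Euclidean distance, and that each stored signature carries a provisioning setting. The names `SIGNATURES`, `closest_match`, and `map_slots`, and all of the numbers, are hypothetical.

```python
from math import sqrt

# Hypothetical signature database: application name -> (signature, settings).
# A signature is assumed to be (CPU, disk I/O, network) utilization fractions.
# All names and values are illustrative and are not taken from the paper.
SIGNATURES = {
    "cpu_heavy_job": ((0.90, 0.20, 0.10), {"map_slots": 2}),
    "io_heavy_job": ((0.30, 0.85, 0.40), {"map_slots": 6}),
    "network_heavy_job": ((0.25, 0.30, 0.90), {"map_slots": 4}),
}


def distance(a, b):
    """Euclidean distance between two equal-length signatures."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_match(signature, database=SIGNATURES):
    """Return (name, settings) for the stored signature nearest to `signature`."""
    name = min(database, key=lambda n: distance(signature, database[n][0]))
    return name, database[name][1]
```

For example, calling `closest_match((0.85, 0.25, 0.15))` selects the stored entry whose signature is nearest to the measured one and returns its settings as a starting point for provisioning. A real system would likely need richer signatures and a more careful similarity measure than this sketch assumes.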