University of Maryland College Park


The research team at the University of Maryland's Cloud Computing Center at College Park is working on a range of projects funded by the National Science Foundation (NSF). They include: A Hadoop Toolkit for Distributed Text Retrieval; Data-Intensive Text Processing; Commodity Computing in Genomic Research; and a series of other independent studies.


Research Projects

A Hadoop Toolkit for Distributed Text Retrieval
Text search is a technology that is vital for modern information-based societies. Today's systems face the daunting challenge of handling quantities of text previously unimaginable. Cluster computing is the only practical solution for addressing the issue of scale. This project leverages the MapReduce framework (via the open-source Hadoop implementation) to tackle issues of robustness and scalability in processing large amounts of data for information retrieval applications.
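
As a rough illustration of how the MapReduce model applies to text retrieval, the sketch below builds a small inverted index (a mapping from each term to the documents that contain it) in plain Python. It is not the project's toolkit and does not use Hadoop; it only simulates the map, shuffle, and reduce steps in memory. The function names and the two-document collection are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit one (term, doc_id) pair for every token in a document."""
    for token in text.lower().split():
        yield token, doc_id


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce: collapse the document ids for one term into a sorted postings list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs):
    """Run map, shuffle, and reduce over an in-memory {doc_id: text} collection."""
    pairs = (pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text))
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical toy collection; a real deployment would read from a distributed file system.
    docs = {
        "d1": "hadoop scales text retrieval",
        "d2": "text search with hadoop",
    }
    index = build_inverted_index(docs)
    print(index["hadoop"])  # ['d1', 'd2']
    print(index["search"])  # ['d2']
```

On a cluster, the map and reduce functions would run in parallel across many machines, and the framework would handle the grouping step and recover from machine failures, which is where the robustness and scalability benefits come from.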
Cloud Computing Center
The Cloud Computing Center at the University of Maryland, which started in the fall of 2009, consists of individuals drawn from multiple disciplines interested in collectively shaping the future of cloud computing through research, education, and outreach. Their work focuses on applications of cloud computing, cloud architectures and infrastructure, as well as broader societal issues.
Commodity Computing in Genomic Research
This NSF CLuE project focuses on developing parallel algorithms for analyzing the next generation of sequencing data. Scientists can now generate the rough equivalent of an entire human genome in just a few days with a single sequencing instrument. Analyzing these data is complicated by their size: a single run of a sequencing instrument yields terabytes of information, often requiring a significant scale-up of the existing computational infrastructure.
Data-Intensive Text Processing
The NSF CLuE initiative is funding a machine translation project that promises to bridge the language divide in today's multi-cultural and multi-faceted society. Systems capable of converting text from one language into another have the potential to transform how diverse individuals and organizations communicate.

Resources

Presentation: Research and Education with MapReduce/Hadoop: Data-Intensive Text Processing and Beyond PDF


Presentation: Commodity Computing in Genomic Research PDF


Other: Cloud9: A MapReduce Library for Hadoop
Cloud9 was designed to serve both as a teaching tool and as support for research in text processing. It was used in cloud computing courses at the University of Maryland. The library is available as a tarball or via anonymous Subversion checkout. Like Hadoop itself, Cloud9 is distributed under the Apache License.


Video: An Interview with Mihai Pop about his CLuE Initiative-Supported Research
Mihai Pop, assistant professor of computer science at the University of Maryland, discusses his research as part of the CLuE initiative among Google, IBM, and NSF. His research focuses on developing parallel algorithms for analyzing the next generation of sequencing data. Scientists can now generate the rough equivalent of an entire human genome in just a few days with a single sequencing instrument. Analyzing these data is complicated by their size: a single run of a sequencing instrument yields terabytes of information, often requiring a significant scale-up of the existing computational infrastructure.


Video: Research and Education in the Clouds: Experience at the University
In this talk, Jimmy Lin, Associate Professor in the iSchool at the University of Maryland, presents cloud computing activities at the University of Maryland, carried out in collaboration with Google and IBM under the Academic Cloud Computing Initiative and in partnership with Amazon Web Services. Efforts include semester-long Hadoop courses as well as research projects in text processing and bioinformatics. He focuses on his attempts to integrate research with education, to the benefit of both.