Paper: Self-Tuning Virtual Machines for Predictable eScience
May 19 2009
By Sang-Min Park and Marty Humphrey.
Abstract: Unpredictable access to batch-mode HPC resources is a significant problem for emerging dynamic data-driven applications. Although efforts such as reservation and queue-time prediction have partially addressed this problem, approaches based strictly on space sharing impose fundamental limits on real-time predictability. In contrast, our earlier work investigated the use of feedback-controlled virtual machines (VMs), a time-sharing approach, to deliver predictable execution. However, that work did not fully address usability and implementation efficiency. This paper presents an online, software-only version of the feedback-controlled VM, called the self-tuning VM, which we argue is a practical approach for predictable HPC infrastructure. Our evaluation using five widely used applications shows that our approach is both predictable and practical: by simply running time-dependent jobs with our tool, we meet a job’s deadline typically within 3% error, and within 8% error for the more challenging applications.

Paper: Feedback-Controlled Resource Sharing for Predictable eScience
October 22 2008
By Sang-Min Park and Marty Humphrey.
Abstract: The emerging class of adaptive, real-time, data-driven applications poses a significant problem for today’s HPC systems. In general, it is extremely difficult for queuing-system-controlled HPC resources to make and guarantee a tightly bounded prediction of the time at which a newly submitted application will execute. While a reservation-based approach partially addresses the problem, it can create severe resource under-utilization (unused reservations, necessary scheduled idle slots, underutilized reservations, etc.) that resource providers are eager to avoid. In contrast, this paper presents a fundamentally different approach to guaranteeing predictable execution. By creating a virtualized application layer called the performance container, and opportunistically multiplexing concurrent performance containers through the application of formal feedback control theory, we regulate the job’s progress such that the job meets its deadline without requiring exclusive access to resources, even in the presence of a wide class of unexpected disturbances. Our evaluation using two widely used applications, WRF and BLAST, on an 8-core server shows that our approach is predictable and meets deadlines with 3.4% error on average while achieving high overall utilization.