GSIC

Address: 2-12-1 Ookayama, Meguro-ku, Tokyo 152-8550, Japan
Email: contact us at this address

Announcement: Seminar Series by Associate Professor Rich Wolski

2006.08.17

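Associate Professor Rich Wolski of UC Santa Barbara is visiting Japan for the first time. He is well known for the Network Weather Service and other work on monitoring and scheduling in large-scale distributed environments such as grids, and for his work on performance prediction. He will stay at this center at Tokyo Tech for one week and give two seminars, on 8/22 and 8/23. Attendance is open to everyone, including students, and all are warmly encouraged to attend.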

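The batch queue prediction system covered in the first seminar is also planned for experimental operation on TSUBAME and the Tokyo Tech Campus Grid, so technical staff such as the center's systems engineers are strongly encouraged to attend.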

Venue

Session 1 Schedule

  • Date and time: 2006/8/22 (Tue) 11:00-12:30
  • Title: Predicting Bounds on the Batch Queuing Delay Experienced by Individual TeraGrid User Jobs in Real Time
  • Abstract

Session 2 Schedule

  • Date and time: 2006/8/23 (Wed) 11:00-12:30
  • Title: Modeling and Predicting Resource Availability in Federated Distributed Computing Environments
  • Abstract

Session 1 Abstract


Predicting Bounds on the Batch Queuing Delay
Experienced by Individual TeraGrid User Jobs in Real Time


Rich Wolski
University of California, Santa Barbara

In this talk, we present a new method for providing end-users with real-time predictions of the bounds on queuing delay individual jobs will experience when waiting to be scheduled to a machine partition. Predicting the delay users will experience while waiting for their jobs to be scheduled is a problem that has been studied by both the academic and commercial HPC communities for some time. Our approach, based on a new statistical methodology, predicts bounds on the waiting time (upper or lower) that individual jobs will experience with quantified confidence measures. Thus the predictions made by this system constitute a statistical guarantee of best-case and worst-case waiting delay where the confidence measure quantifies the quality of the guarantee.
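To make the idea of a bound with a quantified confidence measure concrete, the following is a minimal sketch of one textbook non-parametric construction. It is not necessarily the methodology presented in the talk. It assumes the historical wait times behave like independent draws from a single continuous distribution, which real batch queues may violate. The function names `binom_cdf` and `upper_bound_on_quantile` are hypothetical and chosen for illustration only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(B <= k) for B ~ Binomial(n, p); adequate for modest n."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound_on_quantile(waits, quantile=0.95, confidence=0.95):
    """Return a historical wait time that is, with the given confidence,
    at least as large as the requested quantile of the wait distribution.

    The k-th smallest observation is >= the quantile exactly when fewer
    than k observations fall below the quantile, an event with probability
    P(Binomial(n, quantile) <= k - 1). We pick the smallest such k.
    Returns None when the history is too short to support the bound.
    """
    xs = sorted(waits)
    n = len(xs)
    for k in range(1, n + 1):
        if binom_cdf(k - 1, n, quantile) >= confidence:
            return xs[k - 1]
    return None


# Hypothetical history of queue waits, in seconds.
history = [float(s) for s in range(1, 60)]
bound = upper_bound_on_quantile(history, quantile=0.95, confidence=0.95)
```

Under these assumptions, a job submitted now would wait no longer than `bound` with probability at least 0.95, and that statement itself holds with 95% confidence. A lower bound can be built symmetrically from the small order statistics.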

We have implemented this new methodology as part of the Network Weather Service and deployed it on several large-scale systems (TeraGrid, Datastar at SDSC, Lonestar at TACC, etc.) where it currently provides real-time bounds predictions. In the talk we will report on the effectiveness of the system, which has been in operation as a prototype for approximately 8 months. We will discuss the methodology and its evaluation using batch-queue logs spanning 10 years at the NSF and open DOE supercomputer centers. We will also demonstrate the web interface to the system and make "live" predictions of delay bounds during the presentation from the web page located at

http://nws.cs.ucsb.edu/batchq

and we will detail the operation of a set of command-line tools that are portable among all national Extensible Terascale Facility (ETF) architectures.

Our results show that it is possible to predict delay bounds with specified confidence levels for individual jobs in different queues, and for jobs requesting different ranges of processor counts and different maximum execution delays. Using these predictions, users with roaming allocations or with allocations at multiple sites can choose the machine that is most likely to minimize turn-around time. Users can also determine the probability that a job will meet a specified deadline in a particular queue. Finally, the system is portable to all ETF architectures, making it possible for users to consider the use of heterogeneous resources, and to predict which is most likely to impose the shortest waiting time for their jobs.
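As a rough illustration of how a user might act on such predictions, the sketch below estimates the chance of meeting a deadline as the plain empirical fraction of past waits that fell within it, then picks the queue with the highest estimate. This is a deliberately naive stand-in, not the system's actual predictor. The queue names, the sample wait times, and the functions `deadline_probability` and `best_queue` are all hypothetical.

```python
def deadline_probability(waits, deadline):
    """Empirical fraction of past jobs whose queue wait was <= deadline."""
    if not waits:
        raise ValueError("no wait-time history for this queue")
    return sum(1 for w in waits if w <= deadline) / len(waits)


def best_queue(history, deadline):
    """Return (queue_name, estimated_probability) for the queue whose
    history gives the highest estimated chance of starting by the deadline."""
    scored = ((name, deadline_probability(waits, deadline))
              for name, waits in history.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical wait-time histories in seconds, keyed by site and queue.
history = {
    "site_a/normal": [120, 300, 45, 900, 60],
    "site_b/normal": [30, 40, 2000, 35, 50],
}
choice, probability = best_queue(history, deadline=100)
```

A more careful version would replace the raw fraction with a lower confidence bound on it, so that a short history does not look more reliable than it is.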

Session 2 Abstract


Modeling and Predicting Resource Availability in
Federated Distributed Computing Environments


Rich Wolski
University of California, Santa Barbara

Computational Grids, global Internet computing, autonomic systems, and peer-to-peer systems promise new levels of computing performance and storage capacity by allowing users to harness globally distributed resources dynamically. To succeed, these systems require effective models of resource behavior that foster new algorithm designs, new simulation capabilities, and new dynamic scheduling techniques. In particular, modeling resource availability is of critical importance, both to the design and to the implementation of new global computing systems.

In this talk, we discuss the problems of modeling and predicting resource availability in federated and distributed computing environments (Grids, P2P systems, etc.) as well as a new approach we have developed for addressing these problems. Using data from an enterprise-wide compute setting, the Condor cycle-harvesting compute infrastructure, and an Internet host availability survey, we describe how accurately we can fit a member of the Weibull family of statistical distributions to the empirical behavior. We compare our results, in terms of modeling power, to the use of both exponential and Pareto distributions -- two of the currently prevalent methodologies -- and find that our fitting techniques provide significantly greater accuracy. We also compare the accuracy of these parametric approaches to non-parametric techniques for predicting bounds on availability. Finally, we discuss the ramifications for simulation, distributed algorithm design, and on-line predictions that these results may have.
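The comparison between Weibull and exponential models can be sketched as follows. This is a generic maximum-likelihood fit written for illustration, not the fitting technique used in the work described above. It assumes availability durations are positive, independent, and not all identical. The function names are hypothetical.

```python
import math


def fit_weibull(xs, tol=1e-9):
    """Maximum-likelihood Weibull fit; returns (shape, scale).

    Solves the profile-likelihood equation for the shape k by bisection:
        sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0
    Weights are rescaled by the largest observation to avoid overflow.
    """
    logs = [math.log(x) for x in xs]  # requires every x > 0
    top = max(logs)
    if top == min(logs):
        raise ValueError("need at least two distinct durations")
    mean_log = sum(logs) / len(logs)

    def g(k):
        w = [math.exp(k * (l - top)) for l in logs]
        return sum(wi * l for wi, l in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while g(hi) < 0:  # g is increasing in k, so expand until the sign flips
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = (lo + hi) / 2
    mean_w = sum(math.exp(k * (l - top)) for l in logs) / len(logs)
    return k, math.exp(top + math.log(mean_w) / k)


def weibull_loglik(xs, shape, scale):
    """Log-likelihood of the data under Weibull(shape, scale)."""
    n, ls = len(xs), math.log(scale)
    logs = [math.log(x) for x in xs]
    return (n * math.log(shape) - n * shape * ls + (shape - 1) * sum(logs)
            - sum(math.exp(shape * (l - ls)) for l in logs))


def exponential_loglik(xs):
    """Log-likelihood at the exponential MLE (rate = 1 / sample mean)."""
    mean = sum(xs) / len(xs)
    return -len(xs) * (math.log(mean) + 1.0)
```

Because the exponential distribution is the Weibull with shape 1, the fitted Weibull log-likelihood can never be lower than the exponential one on the same data. A fitted shape well below 1 suggests that a resource which has already been available for a long time is likely to remain available, which a memoryless exponential model cannot express.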
