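Distinguished Lecture Series in Computer Science and Technology
(2013-37)
Lecture title: Hierarchical MapReduce: Towards Simplified Cross-domain Data Processing
Speaker: Dr. Yuan Luo (駱遠)
School of Informatics and Computing, Indiana University, USA
Time: October 21, 2013, 15:30-17:00
Venue: Lecture Hall A521, Computer Building, Qianwei South Campus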
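Hosts: 伟德国际BETVlCTOR
伟德国际BETVlCTOR Institute of Computer Science and Technology
伟德国际BETVlCTOR College of Software
Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education
All faculty and students are welcome to attend!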
Abstract:
MapReduce is a programming model well suited to processing large datasets with high-throughput parallelism across a large number of compute resources. While it has proven useful for data-intensive, high-throughput applications, the conventional MapReduce model is limited to scheduling jobs within a single cluster. As job sizes grow, single-cluster solutions become increasingly inadequate. Additionally, the input dataset may be very large and widely distributed across multiple clusters, so repeatedly feeding large datasets to remote computing resources becomes a bottleneck. When mapping such data-intensive tasks to compute resources, scheduling algorithms need to determine whether to bring data to computation or computation to data. We present a Hierarchical MapReduce framework that gathers computation resources from different clusters and runs MapReduce jobs across them. Applications implemented in this framework adopt the Map-Reduce-GlobalReduce model, in which computations are expressed as three functions: Map, Reduce, and GlobalReduce. Two scheduling algorithms are introduced: Compute Capacity Aware Scheduling for compute-intensive jobs and Data Location Aware Scheduling for data-intensive jobs. Experimental evaluations using AutoDock, a molecule binding prediction tool, and grep demonstrate promising results for our framework.
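
To make the Map-Reduce-GlobalReduce model described in the abstract more concrete, the following is a minimal single-process sketch in Python. It is not the speaker's framework or its API: every name here (`split_by_capacity`, `local_mapreduce`, `hierarchical_mapreduce`, the cluster names, and the capacity numbers) is hypothetical, and the proportional split is only an assumed stand-in for what a compute-capacity-aware scheduler might do. Each "cluster" runs Map and Reduce on its own partition, and a final GlobalReduce step merges the per-cluster outputs.

```python
from collections import defaultdict


def split_by_capacity(items, capacities):
    """Partition items across clusters in proportion to capacity.

    Assumed policy for illustration only, not the scheduling algorithm
    from the talk.
    """
    total = sum(capacities.values())
    names = list(capacities)
    parts, start = {}, 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = len(items)  # the last cluster takes the remainder
        else:
            share = round(len(items) * capacities[name] / total)
            end = min(start + share, len(items))
        parts[name] = items[start:end]
        start = end
    return parts


def group_by_key(pairs):
    """Collect (key, value) pairs into a key -> list of values mapping."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def local_mapreduce(records, map_fn, reduce_fn):
    """Run Map and Reduce over one cluster's partition."""
    pairs = (pair for record in records for pair in map_fn(record))
    return {k: reduce_fn(k, vs) for k, vs in group_by_key(pairs).items()}


def hierarchical_mapreduce(records, capacities, map_fn, reduce_fn, global_reduce_fn):
    """Map and Reduce per cluster, then GlobalReduce across clusters."""
    parts = split_by_capacity(records, capacities)
    local_outputs = [local_mapreduce(p, map_fn, reduce_fn) for p in parts.values()]
    merged = (pair for out in local_outputs for pair in out.items())
    return {k: global_reduce_fn(k, vs) for k, vs in group_by_key(merged).items()}


def word_map(line):
    """Map: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]


def sum_values(_key, values):
    """Reduce and GlobalReduce: sum the counts for one key."""
    return sum(values)


# Hypothetical input and cluster capacities.
lines = ["a b", "b c", "a a", "c"]
capacities = {"cluster_a": 3, "cluster_b": 1}
result = hierarchical_mapreduce(lines, capacities, word_map, sum_values, sum_values)
print(result)  # {'a': 3, 'b': 2, 'c': 2}
```

In this sketch only the small per-cluster summaries cross cluster boundaries in the GlobalReduce step, which is the motivation the abstract gives for avoiding repeated transfers of large inputs.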