MapReduce is a project funded by the French National Research Agency (ANR), ARPEGE 2010 call. Project number: ANR-10-SEGI-001.
Map-Reduce is a parallel programming paradigm successfully used by large Internet service providers to perform computations on massive amounts of data. The key strength of the Map-Reduce model is its inherently high degree of potential parallelism: it enables processing petabytes of data in a couple of hours, on large clusters consisting of several thousand nodes.
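As an illustration of the paradigm, here is a minimal single-machine sketch of a word count written in Map-Reduce style. It is not code from the project or from any particular framework: the names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and a thread pool stands in for the cluster nodes. The point it shows is that each input chunk is mapped independently of the others, which is where the potential parallelism comes from.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(chunk):
    """Map: turn one input chunk into (key, value) pairs, independently of other chunks."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share a key into one result."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # The map calls share no state, so they can run concurrently.
    # Here a thread pool plays the role that cluster nodes play in a real deployment.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        mapped = list(executor.map(map_phase, chunks))
    groups = shuffle(chain.from_iterable(mapped))
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

In a real framework the chunks would be blocks of files held by the storage layer, and the map and reduce tasks would be scheduled across many machines rather than across threads.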
The storage layer is a key component of Map-Reduce frameworks. To enable massively parallel data processing over a large number of nodes, it must meet a series of specific requirements: it is expected to provide efficient fine-grain access to files while sustaining a high throughput under heavy access concurrency.
This project aims to overcome the limitations of current Map-Reduce frameworks such as Hadoop, thereby enabling highly scalable Map-Reduce-based data processing on various physical platforms such as clouds, desktop grids, or hybrid infrastructures that combine the two.
To meet this global goal, several critical aspects will be investigated:
Our global goal is to explore how combining these techniques can improve the behavior of Map-Reduce-based applications on the target large-scale infrastructures. To this end, we will rely on recent preliminary contributions of the project partners, illustrated through the following main building blocks:
Project coordinator:
INRIA Rennes - Bretagne Atlantique
Campus de Beaulieu
35042 Rennes cedex
e-mail: gabriel(dot)antoniu(at)inria(dot)fr