Date Archives July 13, 2016

Hadoop – Mapper & Reducer

Definition

MapReduce is a processing technique and a program model for distributed computing based on java. The MapReduce algorithm contains two important tasks, namely Map and Reduce.

  • Map takes a set of data and converts it into another set of data, where individual elements are broken down into tuples (key/value pairs).
  • Reduce task takes the output from a map as an input and combines those data tuples into a smaller set of tuples.

As the sequence of the name MapReduce implies, the reduce task is always performed after the map job.

diagram

Example code is in my project.