MapReduce Example

Let’s use the MovieLens dataset as an example and find out how many movies each user rated.

The MAPPER converts raw source data into key/value pairs.

This is what the MovieLens data looks like:

  • Map users to movies they watched:

  • Extract and organize data we care about.
  • The less data we put on the cluster, the better.
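As a minimal sketch of the mapping step, the function below emits one (user ID, movie ID) pair per rating and drops everything else. It assumes each input line is tab-separated as user ID, movie ID, rating, timestamp, which is the layout of the u.data file in the MovieLens 100K release; other releases use different layouts. The name `mapper` and the sample rows are made up for illustration.

```python
from typing import Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Emit one (user_id, movie_id) pair per rating line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumed layout: user_id <TAB> movie_id <TAB> rating <TAB> timestamp
        user_id, movie_id, _rating, _timestamp = line.split("\t")
        # Keep only the two fields this job cares about
        yield user_id, movie_id


# Made-up sample rows in the assumed layout
sample_lines = [
    "1\t101\t5\t881250949",
    "2\t101\t3\t881250950",
    "1\t102\t4\t881250951",
]

mapped = list(mapper(sample_lines))
print(mapped)
```

Dropping the rating and timestamp at this stage keeps the amount of data passed to the later steps small.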

  • MapReduce sorts and groups the mapped data (“Shuffle and Sort”)
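On a real cluster the framework performs this step for you. The sketch below only imitates it in memory so the effect is visible: values that share a key are collected together, and the keys come out in sorted order. The function name `shuffle_and_sort` and the sample pairs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shuffle_and_sort(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, List[str]]]:
    """Group values by key, then order the groups by key."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Keys are compared as strings here, so "10" would sort before "2"
    return sorted(grouped.items(), key=lambda item: item[0])


# Made-up (user_id, movie_id) pairs, as a mapper might emit them
pairs = [
    ("1", "101"),
    ("2", "101"),
    ("1", "102"),
    ("3", "102"),
    ("2", "104"),
    ("1", "103"),
]

grouped = shuffle_and_sort(pairs)
print(grouped)
```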

  • The REDUCER processes each key’s values
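For this example, a reducer only has to count how many movie IDs arrived for each user. The following is a small sketch under that assumption; the name `reducer` and the grouped sample input are illustrative, not part of any particular framework.

```python
from typing import Iterable, Tuple


def reducer(user_id: str, movie_ids: Iterable[str]) -> Tuple[str, int]:
    """Collapse all values for one key into a single count."""
    count = sum(1 for _ in movie_ids)  # works for lists and for one-pass iterators
    return user_id, count


# Made-up grouped input: one entry per user, with the movies that user rated
grouped_input = [
    ("1", ["101", "102", "103"]),
    ("2", ["101", "104"]),
    ("3", ["102"]),
]

counts = dict(reducer(user_id, movie_ids) for user_id, movie_ids in grouped_input)
print(counts)
```

The reducer is called once per key, so each call sees only one user's movies.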

  • To summarize:
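The three stages can be chained in a single in-memory sketch: map each line to a (user ID, movie ID) pair, sort the pairs by user so equal keys sit next to each other, then reduce each run of equal keys to a count. This is a single-process illustration, not how a cluster executes the job. It assumes tab-separated lines that start with user ID and movie ID, and `count_ratings_per_user` and the sample rows are made up.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable


def count_ratings_per_user(lines: Iterable[str]) -> Dict[str, int]:
    # Map: keep only (user_id, movie_id) from each line
    pairs = []
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pairs.append((fields[0], fields[1]))

    # Shuffle and sort: bring equal keys together
    pairs.sort(key=itemgetter(0))

    # Reduce: count the values in each run of equal keys
    return {
        user_id: sum(1 for _ in group)
        for user_id, group in groupby(pairs, key=itemgetter(0))
    }


# Made-up sample rows: user_id, movie_id, rating, timestamp
rows = [
    "1\t101\t5\t881250949",
    "2\t101\t3\t881250950",
    "1\t102\t4\t881250951",
    "3\t102\t2\t881250952",
    "2\t104\t1\t881250953",
    "1\t103\t5\t881250954",
]

result = count_ratings_per_user(rows)
print(result)
```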

  • Example on a cluster:

