Powered by GitBook

MapReduce Example

Let’s use the MovieLens dataset as an example and find out how many movies did each user rated.

The MAPPER converts raw source data into key/value pairs

This is how the MovieLens u.data looks like:

Map users to movies they watched:

Extract and organize data we care about.
The less data we put on the cluster, the better.

MapReduce sorts and groups the mapped data (“Shuffle and Sort”)

The REDUCER Processes each key’s values

To summarize:

Example on a cluster:

results matching ""

No results matching ""