This becomes exceptionally clean with chaining.
The last map in the map chain builds a single value out of the key and the value, such that it can be decomposed later.
The output key is the MD5 or other stable hash of the original key.
The reducer, or a map in the reduce chain, can decompose the value into the original key and value, and process them. The results come out in effectively random order (sorted by hash rather than by the original key), AND items with the same key are still grouped together into the same reduce call.
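
Here is a rough sketch of the idea in plain Python, simulating the map, sort/shuffle and reduce steps in memory rather than using the Hadoop API. The names wrap, unwrap and run, and the tab separator, are my own assumptions for illustration; a real job would do the same packing in the last chained mapper and the unpacking in the reducer.

```python
import hashlib
from itertools import groupby

SEP = "\t"  # assumption: original keys never contain a tab


def wrap(key, value):
    """Last map in the chain: emit (md5(key), key + SEP + value)."""
    hashed = hashlib.md5(key.encode("utf-8")).hexdigest()
    return hashed, key + SEP + value


def unwrap(packed):
    """Reduce side: split the packed value back into (key, value)."""
    key, value = packed.split(SEP, 1)
    return key, value


def run(records):
    """Simulate map -> sort/shuffle -> reduce for (key, value) string pairs."""
    # Sorting by the hashed key stands in for the framework's shuffle.
    mapped = sorted((wrap(k, v) for k, v in records), key=lambda kv: kv[0])
    output = []
    for _, group in groupby(mapped, key=lambda kv: kv[0]):
        # Each group stands in for one reduce call.
        pairs = [unwrap(packed) for _, packed in group]
        # Assumes no hash collisions, so every pair shares one original key.
        original_key = pairs[0][0]
        output.append((original_key, [v for _, v in pairs]))
    return output
```

Calling run([("apple", "1"), ("banana", "2"), ("apple", "3")]) gives one entry per original key, with both "apple" values in the same group, and the entries ordered by the hash of the key instead of by the key itself.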