Hello all,
I am going through the source code of the MapReduce part of Hadoop. As an experiment, I am trying to keep the temporary directories that are created on each node's local file system while a MapReduce job is running, i.e. "mapred.local.dir" + mapred/local/tasktracker/jobcache/job_xxxxxxxx/ .
To do that, I commented out some of the cleanup functions, such as TaskTracker.TaskInProgress.cleanup(), TaskTracker.startcleanupThreads() and Task.taskcleanup(). With those disabled, I can keep all the attempt folders, the jars folder, the work folder and job.xml.
The problem is that the output folder inside each reduce attempt folder is always empty when the job finishes. That folder is supposed to contain all the map outputs pulled by the reduce task. I dug into the source code and found that the deletion happens while the reduce function is executing. In ReduceTask.runOldReducer(), there is a while loop that goes through all the keys in ReduceValuesIterator and executes the reduce function I defined. If I comment this loop out, the map output files stay in the output folder. Otherwise they are deleted...
This seems weird to me. I have no idea how this folder gets cleaned while reduce is running rather than in the cleanup phase, and I couldn't find any code that does it. Does anyone have a better idea about this?
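In case it helps frame the question, my only guess so far is that the iterator itself removes each input file once it has been fully read. Below is a minimal, self-contained sketch of that idea. It is not Hadoop code: the class ConsumingSegmentDemo, its ConsumingIterator and the helper filesLeftAfter() are all hypothetical names I made up, and the "delete when exhausted" behaviour is an assumption, not something I have found in the source.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.stream.Stream;

public class ConsumingSegmentDemo {

    /** Hypothetical iterator: reads files lazily, line by line, and deletes each file once it is exhausted. */
    static final class ConsumingIterator implements Iterator<String> {
        private final Iterator<Path> segments;
        private Path current;
        private BufferedReader reader;
        private String nextLine;

        ConsumingIterator(List<Path> files) {
            this.segments = files.iterator(); // nothing is opened until hasNext() is called
        }

        @Override
        public boolean hasNext() {
            try {
                while (nextLine == null) {
                    if (reader == null) {
                        if (!segments.hasNext()) {
                            return false; // every file has been consumed
                        }
                        current = segments.next();
                        reader = Files.newBufferedReader(current);
                    }
                    nextLine = reader.readLine();
                    if (nextLine == null) {
                        // Assumed behaviour: a fully read file is removed immediately.
                        reader.close();
                        Files.delete(current);
                        reader = null;
                    }
                }
                return true;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            String result = nextLine;
            nextLine = null;
            return result;
        }
    }

    /** Writes two stand-in "map output" files, optionally runs the loop, and counts the files that remain. */
    static int filesLeftAfter(boolean runLoop) throws IOException {
        Path dir = Files.createTempDirectory("output");
        Path first = dir.resolve("map_0.out");
        Path second = dir.resolve("map_1.out");
        Files.write(first, Arrays.asList("a 1", "b 2"));
        Files.write(second, Arrays.asList("c 3"));

        Iterator<String> it = new ConsumingIterator(Arrays.asList(first, second));
        if (runLoop) {
            while (it.hasNext()) {
                it.next(); // stands in for calling the user-defined reduce function
            }
        }
        try (Stream<Path> remaining = Files.list(dir)) {
            return (int) remaining.count();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("loop executed:      " + filesLeftAfter(true) + " file(s) left");
        System.out.println("loop commented out: " + filesLeftAfter(false) + " file(s) left");
    }
}
```

In this sketch, running the loop leaves 0 files in the folder and skipping it leaves both files, which matches what I observe. Whether Hadoop really works like this is exactly what I am asking.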
Cheers,
Elton
Tags: cleanup, map, output