Hadoop Professionals

A Community for Hadoop Users

How does Hadoop MR delete map output files on the reducer's local disk during cleanup?

Hello all,

I am going through the source code of the MapReduce part. As an experiment, I am trying to retain the temporary directories created on each node's local file system while a MapReduce job is running, i.e. "mapred.local.dir" + mapred/local/tasktracker/jobcache/job_xxxxxxxx/ .

So I commented out some of the cleanup functions, such as TaskTracker.TaskInProgress.cleanup(), TaskTracker.startcleanupThreads() and Task.taskcleanup(). With those disabled, I can retain all the attempt folders, the jars folder, the work folder and job.xml.

The problem is that the output folder inside each reduce attempt folder is always empty when the job finishes. That folder is supposed to contain all the map outputs pulled by the reduce task. I dug into the source code and found that the deletion is tied to the execution of the reduce function. In ReduceTask.runOldReducer(), there is a while loop that goes through all the keys in the ReduceValuesIterator and executes the reduce function I defined. If I comment this loop out, the map output files stay in the output folder. Otherwise they are deleted...

This seems weird. I have no idea how this folder gets cleaned while the reduce is running rather than in the cleanup phase, and I couldn't find any code that does it. Does anyone have a better idea about this?

Cheers,
Elton


Tags: cleanup, map, output


Replies to This Discussion

PS: I have tried setting "keep.task.files.pattern" to "attemp_*" in order to keep all the attempt folders under local/tasktracker/jobcache. Still the same...
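A quick way to sanity-check a value for "keep.task.files.pattern" is to test it as a regular expression outside Hadoop. The sketch below assumes the value is treated as a Java regex that must match the whole task attempt ID; that is an assumption about the TaskTracker, not something confirmed in this thread. The class name, the keeps() helper and the attempt ID are made up for illustration.

```java
import java.util.regex.Pattern;

public class KeepPatternCheck {
    // Hypothetical helper: returns true if the pattern matches the ENTIRE attempt ID.
    // Assumption: the keep pattern is applied as a full-match Java regex.
    static boolean keeps(String keepPattern, String attemptId) {
        return Pattern.matches(keepPattern, attemptId);
    }

    public static void main(String[] args) {
        // Made-up attempt ID in the usual attempt_<jobtracker>_<job>_<type>_<task>_<try> shape.
        String attemptId = "attempt_201201010000_0001_r_000000_0";
        for (String pattern : new String[] {"attemp_*", "attempt_.*", ".*"}) {
            System.out.println(pattern + " -> " + keeps(pattern, attemptId));
        }
    }
}
```

Under that assumption, "attemp_*" reads as "attemp" followed by zero or more underscores, so it would not match any attempt ID, while "attempt_.*" and ".*" would. A matching pattern may still not be enough if the files are removed by something other than the cleanup path the pattern controls.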

I am having a similar problem. I'd like to simply keep the output of the map tasks during a MapReduce job instead of having it deleted, because I want to look at the flow of data coming into my reducer. I've tried setting "keep.task.files.pattern"=".*" in mapred-site.xml, just as an attempt to grab everything, but it doesn't appear to work. There should be a simple switch to stop the _temporary files from being deleted, but I can't find it. If you've solved this problem, any help would be great, thanks.
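One mechanism that would produce the behaviour described in the original question, where files vanish while the reduce loop in ReduceTask.runOldReducer() runs and not during cleanup, is a merge iterator that deletes each input file as soon as it has been fully read. Whether Hadoop's merge code does exactly this is an assumption here and has not been confirmed in this thread. The toy model below is not Hadoop code: DrainingMerge, Segment, the preserve flag and the temp files are all hypothetical names used only to show the idea.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class DrainingMerge {
    /** One sorted input file; a hypothetical stand-in for a fetched map output. */
    static final class Segment implements Comparable<Segment> {
        final Path file;
        final BufferedReader reader;
        final boolean preserve;
        String current;

        Segment(Path file, boolean preserve) throws IOException {
            this.file = file;
            this.preserve = preserve;
            this.reader = Files.newBufferedReader(file);
            this.current = reader.readLine();
        }

        // Moves to the next record; returns false once the file is exhausted.
        boolean advance() throws IOException {
            current = reader.readLine();
            return current != null;
        }

        // Closing an exhausted segment removes its file unless preserve is set.
        void close() throws IOException {
            reader.close();
            if (!preserve) {
                Files.deleteIfExists(file);
            }
        }

        @Override
        public int compareTo(Segment other) {
            return current.compareTo(other.current);
        }
    }

    /** K-way merge that closes (and possibly deletes) each input once drained. */
    static List<String> mergeAll(List<Path> inputs, boolean preserve) throws IOException {
        PriorityQueue<Segment> queue = new PriorityQueue<>();
        for (Path p : inputs) {
            Segment s = new Segment(p, preserve);
            if (s.current != null) queue.add(s); else s.close();
        }
        List<String> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            Segment s = queue.poll();
            merged.add(s.current); // this is where a reduce function would consume the record
            if (s.advance()) queue.add(s); else s.close();
        }
        return merged;
    }

    // Writes the given sorted lines to a fresh temp file.
    static Path write(String... lines) throws IOException {
        Path p = Files.createTempFile("segment", ".txt");
        Files.write(p, Arrays.asList(lines));
        return p;
    }

    public static void main(String[] args) throws IOException {
        List<Path> inputs = Arrays.asList(write("a", "c"), write("b", "d"));
        List<String> merged = mergeAll(inputs, false);
        System.out.println(merged + ", first input still exists: " + Files.exists(inputs.get(0)));
    }
}
```

In this model, skipping the consuming loop leaves every input on disk, and running it removes them one by one, which is the same pattern reported above. If Hadoop behaves similarly, the place to look would be where the merge inputs are opened and closed, not the task cleanup functions.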


© 2012   Created by Jason Venner.