A user asked me a question today.
He has a cluster with 16 reduce slots spread over a number of machines. When he runs a job with 12 reduce tasks, several of them end up on the same machine while other machines sit idle.
At present, the only workaround I am aware of is to set the cluster-level parameter mapred.tasktracker.reduce.tasks.maximum to 1 and restart the cluster, so that each TaskTracker can run at most one reduce task at a time.
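As a rough sketch, the setting might look like the fragment below in each TaskTracker's configuration. It assumes a release where MapReduce properties live in mapred-site.xml (older releases kept them in hadoop-site.xml), so adjust the file name to match your installation.

```xml
<!-- Assumed location: conf/mapred-site.xml on every TaskTracker node -->
<configuration>
  <property>
    <!-- Maximum number of reduce tasks a single TaskTracker runs at once -->
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
```

Note that this also caps the cluster's total reduce slots at the number of TaskTracker machines, so a job with more reduces than machines would run them in more than one wave.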