Hadoop Professionals

A Community for Hadoop Users

For a job where I know the data size, how many reducers should I set? How do I work out a suitable number of reducers for my program? Is there a formula for it? Thanks!


Replies to This Discussion

Generally speaking, the number of reducers you choose depends on what you are going to do with the final output, the reduce capacity of your cluster, the amount of data that needs to be reduced, and the time needed to perform the reduce.

I usually set the number of reducers to the reduce capacity of my cluster, unless the cluster is very large or I need a specific number of output files (each reducer writes one output file). The most common specific number is 1.

At present there are only rough guidelines, because people's hardware and data flows vary so substantially.
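As a rough illustration of sizing reducers from cluster capacity, here is a minimal sketch in plain Java. The 0.95 and 1.75 factors are the rule of thumb given in the Hadoop MapReduce tutorial (one wave of reduces versus roughly two waves). Everything else is an assumption made for illustration: the class `ReducerEstimate`, the methods `fromCapacity` and `fromDataSize`, and especially the `bytesPerReducer` target, which is a hypothetical tuning knob and not a Hadoop setting. Treat the result as a starting point to measure against, not a formula.

```java
// Hypothetical helper for illustration only; it is not part of Hadoop.
class ReducerEstimate {

    // Capacity-based estimate: total reduce slots in the cluster times a factor.
    // 0.95 lets all reduces start in a single wave; 1.75 gives about two waves,
    // which tends to balance load better when nodes differ in speed.
    static int fromCapacity(int nodes, int reduceSlotsPerNode, double factor) {
        long slots = (long) nodes * reduceSlotsPerNode;
        return (int) Math.max(1L, Math.round(slots * factor));
    }

    // Data-size-based estimate (an assumption, not an official guideline):
    // one reducer per bytesPerReducer of input, capped at maxReducers.
    static int fromDataSize(long inputBytes, long bytesPerReducer, int maxReducers) {
        long needed = (inputBytes + bytesPerReducer - 1) / bytesPerReducer; // ceiling division
        return (int) Math.max(1L, Math.min(needed, (long) maxReducers));
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024L * 1024L;

        int oneWave = fromCapacity(10, 2, 0.95);   // 10 nodes, 2 reduce slots each
        int twoWaves = fromCapacity(10, 2, 1.75);
        int bySize = fromDataSize(5 * gib, gib, oneWave);

        System.out.println("one wave:  " + oneWave);
        System.out.println("two waves: " + twoWaves);
        System.out.println("by size:   " + bySize);
    }
}
```

Whichever number you settle on would then be passed to `setNumReduceTasks(int)` on your `JobConf` (or `Job` in the newer API). If you need exactly one output file, skip the estimate and set it to 1.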


© 2010 Created by Jason Venner.
