Hadoop Professionals

A Community for Hadoop Users

wang zhengkui's Discussions

How many reducers should I set?
1 Reply

For a job where I already know the data size, how many reducers do I need to set? How do I calculate a suitable number of reducers in my program? Is there a formula for it? Thanks!

Started this discussion. Last reply by Jason Venner Oct. 30, 2009.

How to disable sorting in Hadoop
3 Replies

Dear all, in my application I do not need Hadoop to sort the intermediate results for me. How can I disable the sort in the application? Sorting takes time, but actually I don't want…

Started this discussion. Last reply by Jason Venner Oct. 9, 2009.

Two requirements for Hadoop
2 Replies

There are two requirements that I want to implement on top of Hadoop, but so far I do not think Hadoop supports them. I am looking forward to your suggestions on how to implement them. First…

Started this discussion. Last reply by amogh vasekar Sep. 23, 2009.

 

wang zhengkui's Page

Latest Activity

A group for HBase users to share use cases, solutions and problems.
February 3
Generally speaking, the number of reducers you choose depends on what you are going to do with the final output, the reduce capacity of your cluster, the amount of data that needs to be reduced, and the time needed to perform the reduce. For me I…
October 29, 2009
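
The "reduce capacity of your cluster" factor in the reply above is often turned into a rule of thumb. The Hadoop MapReduce tutorial of this era suggests 0.95 or 1.75 multiplied by (number of nodes × mapred.tasktracker.reduce.tasks.maximum). The following is only a sketch of that arithmetic: the class name, method name, and the example cluster size are hypothetical, and it does not take data size or reduce time into account.

```java
// Sketch of the reducer-count rule of thumb from the Hadoop MapReduce tutorial.
// Class and method names are hypothetical; nothing here calls Hadoop itself.
final class ReducerCountSketch {

    // About one wave: all reduces can launch immediately as the maps finish.
    static final double SINGLE_WAVE = 0.95;
    // About two waves: faster nodes run a second round, which balances load better.
    static final double TWO_WAVES = 1.75;

    /**
     * @param workerNodes        number of nodes that run reduce tasks
     * @param reduceSlotsPerNode value of mapred.tasktracker.reduce.tasks.maximum
     * @param factor             SINGLE_WAVE or TWO_WAVES
     */
    static int suggestedReducers(int workerNodes, int reduceSlotsPerNode, double factor) {
        int totalSlots = workerNodes * reduceSlotsPerNode;
        // Truncating and clamping to 1 are assumptions of this sketch;
        // 0 reducers would turn the job into a map-only job.
        return Math.max(1, (int) (factor * totalSlots));
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 10 nodes with 2 reduce slots each.
        System.out.println(suggestedReducers(10, 2, SINGLE_WAVE));
        System.out.println(suggestedReducers(10, 2, TWO_WAVES));
    }
}
```

The resulting number would then be handed to the job, for example through the setNumReduceTasks method on JobConf. Treat it as a starting point and adjust for the size of the data and for what will consume the output files.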
wang zhengkui added a discussion
October 22, 2009
If your number of reduce tasks is not 0, the Hadoop framework will sort your results. There is no way around it.
October 8, 2009
Thanks Jason. Does this mean that if my numReduceTask is not 0, Hadoop must sort the intermediate results? If my number of reducers is not 0, is there any way to keep it from sorting my intermediate results?
October 8, 2009
If you set the number of reduce tasks to 0, there will be no sorting. There will also be no reduce phase. In Hadoop through 0.19, the JobConf object provides a method setNumReduceTasks, and the parameter behind it is mapred.reduce.tasks. I do not kno…
October 8, 2009
wang zhengkui added a discussion
Dear all, in my application I do not need Hadoop to sort the intermediate results for me. How can I disable the sort in the application? Sorting takes time, and I don't actually want the results to be sorted. Thanks!
October 8, 2009
To write into multiple partitions, please look at the partitioner in Pig's skewed join implementation. I believe they do something pretty similar. However, from 0.20 onwards the reducers will have to be set. Hence, it might break your implementation. Coming back…
September 23, 2009
What you can do is open additional files in your mapper, besides the input, and write your output to them anywhere you like. As an alternative, you could write all of your map outputs via a MultipleFileOutput format in the map task, and only output the filenames to the…
September 21, 2009
wang zhengkui added a blog post
There are two requirements that I want to implement on top of Hadoop, but so far I do not think Hadoop supports them. I am looking forward to your suggestions on how to implement them. Firstly, if I want to let the reducers fetch more pa…
September 16, 2009
wang zhengkui added a discussion
September 16, 2009
Jason Venner and wang zhengkui are now friends
September 16, 2009
wang zhengkui updated their profile photo
September 15, 2009
wang zhengkui is now a member of Hadoop Professionals
September 15, 2009

Profile Information

Hadoop Experience Level
Intermediate
Available for Consulting
Yes

Wang zhengkui's Blog

wang zhengkui

Two requirements on Hadoop

There are two requirements that I want to implement on top of Hadoop, but so far I do not think Hadoop supports them. I am looking forward to your suggestions on how to implement them.

Firstly, is it possible to let a reducer fetch more than one partition file from the map output? For instance, reducer 1 currently fetches partition 1 from all the mappers. How can I make both partition 1 and partition 2 go to reducer 1? If that is possible, how could I implement it?…

Posted on September 16, 2009 at 7:04am

Comment Wall (1 comment)

At 11:50pm on October 5, 2009, Jason Venner said…
Is there a reason you want to get multiple map outputs to a single reduce task?
Do you want the data to be fully sorted and grouped by key?

The simplest way is to change the partitioner class so that all of the data you want ends up in a single map output partition.
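
As a minimal sketch of the partitioner idea, assume the goal is for one reducer to receive what would otherwise be two adjacent partitions. The code below does not import Hadoop. It reproduces the arithmetic of Hadoop's default HashPartitioner and then folds neighbouring partitions onto the same reducer. In a real job the same logic would sit in the getPartition method of a custom Partitioner. The class name, method names, and the "partitions per reducer" parameter are all hypothetical.

```java
// Sketch: route several logical partitions to one reduce task.
// Plain Java, no Hadoop imports; all names here are hypothetical.
final class MergedPartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // mask off the sign bit, then take the remainder.
    static int logicalPartition(Object key, int numLogicalPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numLogicalPartitions;
    }

    // Fold adjacent logical partitions together, e.g. with
    // partitionsPerReducer = 2, partitions 0 and 1 go to reducer 0,
    // partitions 2 and 3 go to reducer 1, and so on.
    static int reducerFor(Object key, int numLogicalPartitions, int partitionsPerReducer) {
        return logicalPartition(key, numLogicalPartitions) / partitionsPerReducer;
    }

    // Number of reduce tasks the job must be configured with (rounded up).
    static int reducersNeeded(int numLogicalPartitions, int partitionsPerReducer) {
        return (numLogicalPartitions + partitionsPerReducer - 1) / partitionsPerReducer;
    }

    public static void main(String[] args) {
        int logical = 4;
        int perReducer = 2;
        System.out.println("reduce tasks: " + reducersNeeded(logical, perReducer));
        // Integer keys hash to their own value, which keeps the demo easy to follow.
        for (int key = 0; key < 8; key++) {
            System.out.println("key " + key
                    + " -> partition " + logicalPartition(key, logical)
                    + " -> reducer " + reducerFor(key, logical, perReducer));
        }
    }
}
```

Because the merging happens before the shuffle, the framework still sorts and groups the keys for each reducer, which the side-file approaches do not give you.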

There is nothing stopping you from creating multiple output files in HDFS, or another shared file system, in your map tasks, and passing the names of these files to your reducer via the output collector or some other mechanism.

You would lose out on the framework handling the sorting for you.

Another alternative, which is somewhat I/O expensive, is to run two map/reduce jobs: one has only a map output, and the other has only a reduce, where the partitioner assigns the reduce task based on the output file from the previous job, so that you get all of the outputs you want in each of your reduces.

The downside of this is that the data goes from HDFS through the map task back to HDFS, then back into the identity mapper, through the local disk, and then over HTTP to the reducers. So you have two extra passes through HDFS and an extra pass through the identity mapper.
 
 
 

© 2010   Created by Jason Venner on Ning.