A Community for Hadoop Users
Scenario:
I have a subset of a database and a data warehouse, and I have brought both onto HDFS. I want to analyse results based on the subset and the data warehouse. (In short, for each record in the subset I have to scan every record in the data warehouse.)
Question:
I want to do this task with MapReduce. I do not understand how to take both files as input to the mapper, or how to handle both files in the map phase. Please suggest some ideas for how I can do this.
Comment
Comment by Bhavesh Shah on January 5, 2012 at 8:52pm
Hello Jason,
I have tried it that way, but it takes a long time to process.
Comment by Jason Venner on January 5, 2012 at 5:53pm
If I understand you, you have two data sets, A and B, and for each record of A, you have to operate on every record of B.
The simplest way would be to use A as the input data set for your MapReduce job, and to open and scan through B inside your map task, once for each input record.
Depending on the size of B and the resources available to you, caching B locally will help improve throughput.
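A rough sketch of that idea, in plain Java with no Hadoop dependency so it can be run anywhere. The class names (`CrossScanSketch`, `SideDataMapper`), the sample records, and the prefix-match rule are all made up for illustration; they are not from the original poster's data. The `setup`/`map` split only imitates the lifecycle of a Hadoop `Mapper`: B is loaded once per task, then every record of A scans the cached copy. It assumes B fits in the memory of a map task; in a real job `setup` would read B from HDFS or from a locally cached copy (for example via Hadoop's DistributedCache).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;

// Plain-Java sketch (no Hadoop classes) of the map-side pattern:
// A is the job input; B is side data that every map task reads.
class CrossScanSketch {

    /** Imitates the setup()/map() lifecycle of a Hadoop Mapper (hypothetical class). */
    static class SideDataMapper {
        private final List<String> cachedB = new ArrayList<>();
        private final BiPredicate<String, String> matches;

        SideDataMapper(BiPredicate<String, String> matches) {
            this.matches = matches;
        }

        // Called once per task: load B into memory (assumes B fits).
        void setup(Iterable<String> recordsOfB) {
            for (String b : recordsOfB) {
                cachedB.add(b);
            }
        }

        // Called once per record of A: scan every cached record of B.
        List<String> map(String a) {
            List<String> out = new ArrayList<>();
            for (String b : cachedB) {
                if (matches.test(a, b)) {
                    out.add(a + "\t" + b);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the subset (A) and the warehouse (B).
        List<String> subsetA = Arrays.asList("k1", "k3");
        List<String> warehouseB = Arrays.asList("k1,x", "k2,y", "k3,z", "k1,w");

        // Illustrative rule: a B record matches when it starts with the A key.
        SideDataMapper mapper = new SideDataMapper((a, b) -> b.startsWith(a + ","));
        mapper.setup(warehouseB);

        for (String a : subsetA) {
            for (String line : mapper.map(a)) {
                System.out.println(line);
            }
        }
    }
}
```

The work is still |A| x |B| comparisons in total, so caching B only removes the repeated reads of B from HDFS; it does not reduce the number of comparisons.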
© 2012 Created by Jason Venner.