Hadoop Professionals

A Community for Hadoop Users

Scenario:

I have a subset of a database and a data warehouse, and I have brought both onto HDFS. I want to analyse the subset against the data warehouse. (In short, for each record in the subset I have to scan every record in the data warehouse.)

Question:

I want to do this using MapReduce. I do not understand how to take both files as input to the mapper, or how to handle both files in the map phase. Please suggest an approach.

Tags: Hadoop, MapReduce

Comment by Bhavesh Shah on January 5, 2012 at 8:52pm

Hello Jason,

I have tried it that way, but it is taking a long time to process.

Comment by Jason Venner on January 5, 2012 at 5:53pm

If I understand you, you have two data sets, A and B, and for each record of A you have to operate on every record of B.

The simplest way would be to use A as the input data set for your MapReduce job, and to open and scan through B inside your map task, once for each input record.

Depending on the size of B and the resources available to you, caching B locally will help improve throughput.
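
As a rough illustration of the caching idea, here is a minimal plain-Java sketch rather than a real Hadoop job. It assumes B is small enough to hold in memory, which may not be true for a full data warehouse. The class name CrossScanSketch, the comma-separated record layout, and the "match on first field" comparison are all hypothetical placeholders. In an actual job, the setup and map methods below would roughly correspond to the setup() and map() methods of org.apache.hadoop.mapreduce.Mapper, with B read from HDFS or a local cached copy.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of "load B once per task, then scan it for every record of A".
// This is not Hadoop code; it only mirrors the shape of a map task.
class CrossScanSketch {
    // In-memory copy of data set B (assumption: B fits in the task's heap).
    private final List<String> cachedB = new ArrayList<>();

    // Analogue of a mapper's setup step: read B a single time.
    void setup(BufferedReader readerForB) {
        try {
            String line;
            while ((line = readerForB.readLine()) != null) {
                cachedB.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Analogue of a mapper's map step: called once per record of A,
    // scanning every cached record of B. The comparison used here
    // (equal first comma-separated field) is only a placeholder.
    List<String> map(String recordA) {
        String keyA = recordA.split(",", 2)[0];
        List<String> out = new ArrayList<>();
        for (String recordB : cachedB) {
            String keyB = recordB.split(",", 2)[0];
            if (keyA.equals(keyB)) {
                out.add(recordA + "|" + recordB);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        CrossScanSketch task = new CrossScanSketch();
        // Stand-in for B; a real task would open the file instead.
        task.setup(new BufferedReader(new StringReader("1,x\n2,y\n1,z\n")));
        // Stand-in for the records of A handed to the map task.
        for (String a : new String[] {"1,alice", "3,carol"}) {
            System.out.println(a + " -> " + task.map(a));
        }
    }
}
```

The point of the split is that B is read once per task instead of once per record of A. The scan itself is still proportional to the size of A times the size of B, so this reduces I/O but not the number of comparisons.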


© 2012   Created by Jason Venner.
