Hadoop Professionals

A Community for Hadoop Users

Jason,

What is the best practice for using Hadoop to migrate data from a remote Oracle database to a local Oracle database, assuming we have a dblink to the remote database?

I have seen examples that read data from a database. But I feel we should be able to read the data from the remote database, process it in the map, and write it to the local database in the reduce, all within one MapReduce program. Do you have a sample program for this scenario?

Thanks for the help in advance.

Regards

Ram

Please do let me know your thoughts.


Replies to This Discussion

In my limited experience, using Hadoop map tasks to read from or write to a database made the database fall over almost immediately from excessive load. To be fair, the db was MySQL running on a single 8-way Linux box.

There has been a series of patches for people using DBInputFormat, to support various databases.

This mail thread has Oracle DBInputFormat references: http://markmail.org/message/eomox6uvleyhf62e
This patch has some fixes for Oracle in 0.19: https://issues.apache.org/jira/browse/HADOOP-5616
The Cloudera blog also has an article on it: http://www.cloudera.com/blog/tag/dbinputformat/

From my perspective, once the mechanics of Oracle are sorted out as listed above, the key is ensuring that the db doesn't get totally hammered and that contention is managed.

The default nature of input splits in Hadoop implies that multiple tasks will be beating on sections of one table or join at the same time, and if they are transforming and writing, this opens the door to hideous lock contention.
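
To make the splitting concrete, here is a rough sketch of how a table's rows might be divided among map tasks: the row count is cut into equal chunks, and the last task picks up the remainder. This is an illustration only, not Hadoop's actual DBInputFormat code; the class name `SplitSketch` and the method `computeSplits` are made up for this example. Every range it prints would be read by a separate task at roughly the same time, so the number of map tasks is effectively the number of concurrent database connections.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of Hadoop.
class SplitSketch {

    /**
     * Divides rowCount rows into numTasks half-open ranges [start, end).
     * Each range stands for the slice of the table one map task would read.
     */
    static List<long[]> computeSplits(long rowCount, int numTasks) {
        if (rowCount < 0 || numTasks < 1) {
            throw new IllegalArgumentException("rowCount must be >= 0 and numTasks >= 1");
        }
        List<long[]> splits = new ArrayList<>();
        long chunk = rowCount / numTasks;
        for (int i = 0; i < numTasks; i++) {
            long start = i * chunk;
            // The last task also takes whatever rows are left over.
            long end = (i == numTasks - 1) ? rowCount : start + chunk;
            splits.add(new long[] {start, end});
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed numbers: a one-million-row table read by four map tasks.
        for (long[] s : computeSplits(1_000_000L, 4)) {
            System.out.println("task reads rows [" + s[0] + ", " + s[1] + ")");
        }
    }
}
```

Lowering the number of map tasks is the simplest lever for keeping the load on the db down, at the cost of a slower job.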
Cloudera just announced an open source package for bulk loading of data from databases for use by MapReduce applications. The application is called Sqoop, written primarily by Aaron Kimball.
http://www.cloudera.com/hadoop-sqoop
Yes, Jason. I did attend the summit and came to know about it there. Thanks for the information.

Regards

Ram


© 2012   Created by Jason Venner.