Hadoop Professionals

A Community for Hadoop Users

Hi,


I'm trying to move my legacy data processing code to Hadoop. My issue is that the legacy code relies on the local file system: it both reads and writes metadata. When the code accesses local data, it typically uses a relative path, like this: "meta-dir/group/my-meta.xml". Based on Tom White's O'Reilly book, I'm thinking of using the distributed cache to copy the local files to the task nodes. For example, I could zip the entire metadata directory tree and use


-archives mymeta.zip
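Whether the path information survives starts with how the archive is built: zip entries store paths relative to the directory the tree was zipped from. A minimal Java sketch, assuming the metadata lives under a top-level directory `meta-dir` (as in the example path); `MetaArchiver.zipTree` is a hypothetical helper, and running `zip -r mymeta.zip meta-dir` from the parent directory would produce an equivalent layout.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper: zips baseDir/topDirName, keeping paths relative to baseDir. */
final class MetaArchiver {

    private MetaArchiver() {}

    static void zipTree(Path baseDir, String topDirName, Path zipFile) throws IOException {
        Path topDir = baseDir.resolve(topDirName);

        // Collect the regular files first so the output zip is never part of the walk.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(topDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Entry names keep the relative path, e.g. "meta-dir/group/my-meta.xml";
                // zip entries always use '/' as the separator.
                String entryName = baseDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(entryName));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }
}
```

Called as `zipTree(parentOfMetaDir, "meta-dir", Paths.get("mymeta.zip"))`, the archive holds `meta-dir/group/my-meta.xml`. Only regular files are added, so empty directories are not preserved.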


My question is: how do I make Hadoop keep the path information, so that when the legacy code accesses a local file such as:


a/relative/path/to/my/file.xml


Hadoop can still find the file (presumably in HDFS)?


Many thanks in advance,

Tags: cache, distributed

Replies to This Discussion

If you pass -archives mymeta.zip, each map or reduce task gets a symbolic link named mymeta.zip in its current working directory, pointing to the directory the archive was unpacked into.
So if you open ./mymeta.zip/path_in_archive/file.xml, it should work.

You may have to experiment with the paths a little.
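Following this suggestion means prefixing every legacy relative path with the archive's symlink. A minimal Java sketch, assuming the symlink is named `mymeta.zip` as described and that the top level of the zip mirrors the legacy layout; `CachedMetaPaths` is a hypothetical helper name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: maps a legacy relative path onto the unpacked archive. */
final class CachedMetaPaths {

    // Assumed name of the symlink in the task's working directory; it matches
    // the archive name passed to -archives.
    static final String ARCHIVE_LINK = "mymeta.zip";

    private CachedMetaPaths() {}

    /** "meta-dir/group/my-meta.xml" becomes "./mymeta.zip/meta-dir/group/my-meta.xml". */
    static Path resolve(String legacyRelativePath) {
        Path legacy = Paths.get(legacyRelativePath);
        if (legacy.isAbsolute()) {
            // Path.resolve would return an absolute argument unchanged, skipping the archive.
            throw new IllegalArgumentException("expected a relative path: " + legacyRelativePath);
        }
        return Paths.get(".", ARCHIVE_LINK).resolve(legacy);
    }
}
```

The drawback is that every hard-coded path in the legacy code must be routed through a helper like this.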

So that means I'd need to modify the legacy code, i.e., change the hard-coded path:

"a/relative/path/to/my/file.xml"

to:

"./mymeta.zip/a/relative/path/to/my/file.xml"

Is there any way to avoid changing the legacy code at all?

You can create your own symbolic link by running a command from Java. You just need to verify where the data is unpacked and then create a link to it.

A quick search turned up the following page with sample Java code: http://www.giannistsakiris.com/index.php/2009/01/04/creating-symbol...
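The symbolic-link suggestion leaves the legacy code alone: a link in the task's working directory makes the legacy relative path resolve into the unpacked archive. The reply suggests running a command (such as `ln -s`) from Java; this sketch swaps in `java.nio.file.Files.createSymbolicLink` (Java 7 and later) instead of spawning a process. It assumes the `mymeta.zip` symlink described earlier and a top-level `meta-dir` inside the archive; `LegacyMetaLinker.linkLegacyDir` is a hypothetical helper.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: links a legacy top-level directory into the unpacked archive. */
final class LegacyMetaLinker {

    private LegacyMetaLinker() {}

    /**
     * Creates workDir/linkName -> archiveLink/linkName so that legacy code opening
     * "linkName/..." relative to workDir reaches the unpacked archive.
     * Returns the link, or the existing path if something already has that name.
     */
    static Path linkLegacyDir(Path workDir, String archiveLink, String linkName) throws IOException {
        Path link = workDir.resolve(linkName);
        // A relative target is interpreted against the directory holding the link (workDir).
        Path target = Paths.get(archiveLink, linkName);

        // Verify where the data was unpacked before linking to it.
        Path unpacked = workDir.resolve(target);
        if (!Files.isDirectory(unpacked)) {
            throw new IOException("unpacked metadata not found at " + unpacked);
        }
        if (Files.exists(link, LinkOption.NOFOLLOW_LINKS)) {
            return link; // already present, e.g. from an earlier call in the same task
        }
        return Files.createSymbolicLink(link, target);
    }
}
```

In a task, `workDir` would be the current working directory (for example `Paths.get("").toAbsolutePath()`), and the call would run once before the legacy code touches `meta-dir/...`. Because the link points into the unpacked archive, the legacy code's metadata writes land there too.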


© 2012   Created by Jason Venner.
