Hadoop Professionals

A Community for Hadoop Users

Elton Tian

Elton Tian's Discussions

Set LOG.Info in FileInputFormat.java doesn't work!?
1 Reply

Hello everyone, problem here. I am trying to get some runtime info out of FileInputFormat.java, so I tried using LOG.info(). However, it never seems to write anything to the log file. I tried to grep the logs dir…

Started this discussion. Last reply by Elton Tian Jun 15, 2010.

How Hadoop MR delete maps output files in reduce's local disk when clean up?
1 Reply

Hello all, I am going through the source code of the MapReduce part. For experimental purposes, I am trying to retain the tmp directories created on a node's local file system while a MapReduce job is running, i.e.…

Tags: output, map, cleanup

Started this discussion. Last reply by Elton Tian Jun 2, 2010.

What it's gonna happend when it comes to large number of maps?
2 Replies

Hello everyone, I know that when the map function generates intermediate output, the reduce function pulls the data directly from the local disks of all the maps. Although we can use a combiner function to minimize the amount of data,…

Started this discussion. Last reply by Elton Tian Apr 19, 2010.

Question: One instance of Hadoop can only run one Map Reduce Job at once?

I have a question, as in the title. It just popped into my head, and I think it's right. Unless a Hadoop instance can start multiple JobTrackers, and there's some resource manager that keeps the JobTrackers from…

Started Mar 17, 2010

 

Elton Tian's Page

Latest Activity

Elton Tian replied to Elton Tian's discussion 'Set LOG.Info in FileInputFormat.java doesn't work!?'
So... has no one else run into this before? Is there any other way to get the values of variables at runtime? Please advise. Cheers
Jun 15, 2010

Set LOG.Info in FileInputFormat.java doesn't work!?

Hello everyone, problem here. I am trying to get some runtime info out of FileInputFormat.java, so I tried using LOG.info(). However, it never seems to write anything to the log file. I tried to grep the logs dir and could not find the info that was supposed to be printed. Any ideas on this? Regards, Elton
Discussion posted by Elton Tian Jun 4, 2010
Elton Tian replied to Elton Tian's discussion 'How Hadoop MR delete maps output files in reduce's local disk when clean up?'
PS: I have tried setting "keep.task.files.pattern" to "attemp_*" in order to keep all the attempt folders in the local/tasktracker/jobcache folder. Still the same...
Jun 2, 2010
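
One possible explanation for the behaviour described in the PS above, offered only as a sketch: if the value of "keep.task.files.pattern" is interpreted as a Java regular expression that must match the whole task attempt ID (which is how I read the JobConf documentation; this is an assumption, not something confirmed in the thread), then a glob-style value such as "attemp_*" would match nothing. The class name, the helper `keeps`, and the attempt ID below are hypothetical and use only java.util.regex, not Hadoop itself.

```java
import java.util.regex.Pattern;

// Hypothetical, Hadoop-free illustration of glob-style vs regex-style patterns.
class KeepPatternSketch {

    // Assumption: the property value is a regex matched against the WHOLE attempt ID.
    static boolean keeps(String pattern, String taskAttemptId) {
        return Pattern.matches(pattern, taskAttemptId);
    }

    public static void main(String[] args) {
        // Made-up attempt ID in the usual attempt_<job>_<m|r>_<task>_<try> shape.
        String id = "attempt_201006010000_0001_m_000000_0";

        // As a regex, "_*" means "zero or more underscores", not "anything".
        System.out.println(keeps("attemp_*", id));   // false
        System.out.println(keeps("attempt_*", id));  // false
        // ".*" is the regex spelling of "anything".
        System.out.println(keeps("attempt_.*", id)); // true
    }
}
```

If that assumption holds, a value such as "attempt_.*" or ".*" would be the regex equivalent of the intended wildcard.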

How Hadoop MR delete maps output files in reduce's local disk when clean up?

Hello all, I am going through the source code of the MapReduce part. For experimental purposes, I am trying to retain the tmp directories created on a node's local file system while a MapReduce job is running, i.e. "map.local.dir" + mapred/local/tasktracker/jobcache/job_xxxxxxxx/ . So I commented out some of the cleanup functions, such as TaskTracker.TaskInProgress.cleanup(), TaskTracker.startcleanupThreads(), and Task.taskcleanup(). With that, I can retain all the attempt folders, the jars folder, the work folder, and job.xml. The problem is…
Discussion posted by Elton Tian Jun 1, 2010
Elton Tian replied to Elton Tian's discussion 'What it's gonna happend when it comes to large number of maps?'
Thanks for the reply, Jason. Hmmm... so I think that point is worth some optimization. Maybe the combiner could be extended a bit to, say, rack level, so that intermediate output produced by nodes on the same rack can be merged and stored (somewhere?) before…
Apr 19, 2010
Jason Venner replied to Elton Tian's discussion 'What it's gonna happend when it comes to large number of maps?'
I regularly run jobs with 20k map tasks. The shuffle can take quite a while, and if the jobs pass a lot of data to the reduce phase, it loads down the networking layer substantially. It does just work, though.
Apr 16, 2010

What it's gonna happend when it comes to large number of maps?

Hello everyone, I know that when the map function generates intermediate output, the reduce function pulls the data directly from the local disks of all the maps. Although we can use a combiner function to minimize the amount of data, when we have many mappers, say 10,000, that will be a crazy IO headache. And that doesn't seem right. Can anyone enlighten me on this? Regards, Elton
Discussion posted by Elton Tian Apr 16, 2010
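
To make the combiner's role in the question above concrete, here is a minimal, framework-free sketch of a word count. It does not use the Hadoop API; the class `CombinerSketch` and its methods `map` and `combine` are hypothetical names chosen for illustration. It only shows that combining on the map side shrinks the number of records each map hands to the shuffle.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of map-side combining; not Hadoop code.
class CombinerSketch {

    // "Map" step: emit one (word, 1) pair per token in this mapper's input split.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // "Combine" step: sum the counts per key locally, before anything is shuffled.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            combined.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> raw = map("a b a c a b");
        Map<String, Integer> combined = combine(raw);
        System.out.println("records without combiner: " + raw.size());      // 6
        System.out.println("records with combiner:    " + combined.size()); // 3
        System.out.println(combined);                                       // {a=3, b=2, c=1}
    }
}
```

Note that this only reduces the volume of data per map. Each reducer still has to fetch its partition from every map, so the number of fetches grows with maps times reducers, which is the cost the question is really about.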

Question: One instance of Hadoop can only run one Map Reduce Job at once?

I have a question, as in the title. It just popped into my head, and I think it's right. Unless a Hadoop instance can start multiple JobTrackers, and there's some resource manager that keeps the JobTrackers from running into each other. Please correct me if I am wrong. Cheers, Elton
Discussion posted by Elton Tian Mar 17, 2010

When should I jump on HBase rather than RDBMS?

Hello everyone, I read through some literature and ended up with some ideas on HBase and RDBMS. Please correct me if I am wrong:
* Use HBase if the application is going to handle large datasets, on the order of petabytes, that is, when scalability is a big concern. Also, because HDFS replicates data automatically, we get reliability.
* Correspondingly, we can just use an RDBMS when the dataset is not large enough for scalability to be a frequent concern. After all, an RDBMS has more functionality we can take…
Discussion posted by Elton Tian Mar 16, 2010
Elton Tian is now a member of Hadoop Professionals Mar 16, 2010

Profile Information

Hadoop Experience Level
Beginner
Interests
Distributed Computing, Parallel Computing, Cloud Computing
Available for Consulting
Yes

© 2012   Created by Jason Venner.