Hadoop Professionals

A Community for Hadoop Users

I set up a Hadoop cluster following chapters 3-4 of Pro Hadoop. I use 4 machines (1 master and 3 slaves) running Ubuntu and Hadoop 0.19.2. I run bin/hadoop namenode -format and bin/start-all.sh, and jps shows NameNode, DataNode, JobTracker, TaskTracker, SecondaryNameNode and Jps. I copied hadoop-site.xml from the book. But when I try to run wordcount, it does not run. I don't understand how to configure the .xml files. Could you please explain how to configure a Hadoop cluster step by step?

Thank you.


Replies to This Discussion

My very first question is:
As the user that the Hadoop server processes run as, on the machine that runs the NameNode/JobTracker,
can you ssh to each of your DataNode/TaskTracker machines without needing to enter a password?
Yes, I have set up ssh to access each machine. On the master node I run bin/hadoop namenode -format and bin/start-all.sh, and jps shows Jps, NameNode, JobTracker and SecondaryNameNode. At http:masternode:50030 the Cluster Summary table shows State RUNNING, Nodes 3, Map Task Capacity 3, Reduce Task Capacity 9 and Avg. Tasks/Node 4.00 (I use 1 master and 3 slaves), and at http:masternode:50030 the Cluster Summary shows Live Nodes 0. When I log in to each slave after running bin/start-all.sh, jps shows Jps and TaskTracker but no DataNode. You are my hope, because I am using Hadoop to analyze firewall log files for my computer project, and I will graduate this term, around February. I am also interested in developing applications with Hadoop. Please help me. Thank you.
What is the error when you run the word count example?
Now I can run word count on the Hadoop cluster successfully, but it takes a long time. Running word count on a single node is faster: the single node took 14 seconds and the multi-node cluster took 2 minutes 17 seconds. (The .txt file has about 38,000 words.)
Did you have many task failures?

No task failures. After the wordcount run succeeds, the job shows up in the Completed table.
I'm sorry. There is an error when the wordcount input file is about 800 MB, but running wordcount on a 1 MB file gives no problems. I have attached the error output. Please also check the configuration in the conf folder.
Now I have tried running word count on an 800 MB file: on a single node it takes 5.3 minutes, but on a cluster (1 master and 1 slave) it takes 9.18 minutes. I want to know how to develop applications that run on a Hadoop cluster. I also want to know whether the WordCount example can run faster on a Hadoop cluster than on a single node.
It appears that the reduce tasks are unable to fetch the map outputs.

10/01/07 14:39:29 WARN mapred.JobClient: Error reading task outputSlaves-02
10/01/07 14:39:29 WARN mapred.JobClient: Error reading task outputSlaves-02
10/01/07 14:39:29 INFO mapred.JobClient: Task Id : attempt_201001071329_0004_m_000003_0, Status : FAILED
Too many fetch-failures


Is there perhaps a firewall rule in place that prevents connections on ports in the 50000 range?
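
One way to test the firewall theory is to try opening a TCP connection from each node to the TaskTracker HTTP port on every other node, since that is where reduce tasks fetch map outputs from. The following is only a minimal sketch: it assumes the port is 50060 (the value of mapred.task.tracker.http.address may differ on your cluster), and the hostnames passed on the command line are whatever appears in your conf/slaves file.

```python
import socket
import sys

# Assumption: 50060 is the TaskTracker HTTP port on this cluster.
# Check mapred.task.tracker.http.address in your configuration to confirm.
TASKTRACKER_HTTP_PORT = 50060


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False


def check_hosts(hosts, port=TASKTRACKER_HTTP_PORT):
    """Map each hostname to True/False reachability on the given port."""
    return {host: port_is_reachable(host, port) for host in hosts}


if __name__ == "__main__":
    # Usage: python check_ports.py <host1> <host2> ...
    for host, ok in check_hosts(sys.argv[1:]).items():
        state = "reachable" if ok else "blocked or not listening"
        print(f"{host}:{TASKTRACKER_HTTP_PORT} {state}")
```

Run it on every slave, not only on the master, because the fetches happen between the TaskTracker machines themselves. A result of "blocked or not listening" does not by itself prove a firewall problem; the TaskTracker may simply not be running on that host.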

I'll check the firewall again. Are the files in my conf folder configured correctly or not? I thought the wordcount program suits a single node because it has low complexity, so it takes less time to run there, but when run on multiple nodes it takes much longer? Also, how do I implement a MapReduce program in Hadoop for a high-complexity problem, and what tools are used for development?
The other reason this can happen is if the hostname-to-IP-address translation for the various TaskTracker nodes returns incorrect IP addresses.
Verify that on each TaskTracker, the lookup of the hostnames of the other TaskTrackers actually returns the correct IP address.
Sometimes people end up with an external (outside the firewall) IP address, or 127.0.0.1, or simply a wrong address.
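
The lookup described above can be sketched with a small script run on each TaskTracker machine. This is an illustration rather than a Hadoop tool: the function names are made up for the example, and the hostnames are whatever you pass on the command line. It flags loopback results such as 127.0.0.1 (Ubuntu's default /etc/hosts often maps the machine's own hostname to 127.0.1.1, which is also loopback) and public addresses, which are the cases mentioned above.

```python
import ipaddress
import socket
import sys


def classify_address(ip):
    """Label an IP address string as 'loopback', 'private' or 'public'."""
    addr = ipaddress.ip_address(ip)
    # Test loopback first: 127.0.0.0/8 also counts as private in ipaddress.
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "private"
    return "public"


def resolve(host):
    """Return (ip, label) for host, or (None, 'unresolved') if lookup fails."""
    try:
        ip = socket.gethostbyname(host)
    except OSError:
        return None, "unresolved"
    return ip, classify_address(ip)


if __name__ == "__main__":
    # Usage: python check_hosts.py <host1> <host2> ...
    for host in sys.argv[1:]:
        ip, label = resolve(host)
        print(f"{host} -> {ip} ({label})")
```

On a cluster behind a single switch you would normally expect every hostname to come back as "private" and to show the same address on every machine. A "loopback" or "public" label, or different answers on different machines, is worth investigating.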

I have run the wordcount program on a single-node setup and a multi-node setup. The file size was approximately 800 MB. The multi-node setup comprises 1 master and 3 slave machines connected to a single L2 switch with 10 Mbps links. Each machine has a 3 GHz Pentium 4 CPU with 1 GB RAM.

The multi-node run was slower than the single-node run by approximately 4 minutes. From your experience, does this seem correct/normal? If this seems incorrect, what would be the main reasons for it? Thank you in advance for your answer.
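
One factor that may be worth checking with the hardware described is the 10 Mbps network. As a rough back-of-the-envelope sketch (ideal throughput, no protocol overhead), the time to move the 800 MB file once across a single link can be estimated as below. How much data actually crosses the network depends on block placement and on the size of the map output, so treat this as an illustration of scale, not a measurement; the faster link speeds are included only for comparison.

```python
def transfer_seconds(size_megabytes, link_megabits_per_second):
    """Ideal time to move size_megabytes once over a link, ignoring overhead."""
    megabits = size_megabytes * 8  # 1 byte = 8 bits
    return megabits / link_megabits_per_second


if __name__ == "__main__":
    # 800 MB is the file size from the post; 10 Mbps is the link speed given.
    for label, mbps in [("10 Mbps", 10), ("100 Mbps", 100), ("1 Gbps", 1000)]:
        secs = transfer_seconds(800, mbps)
        print(f"{label}: {secs:.0f} s ({secs / 60:.1f} min)")
```

At 10 Mbps, moving the whole file once takes about 640 seconds under these assumptions, so even a fraction of the data crossing the network could plausibly account for several minutes.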

© 2012   Created by Jason Venner.
