Hey there, I have a Hadoop cluster built on 2 servers (2 laptops). One node (A)
contains the namenode, a datanode, the jobtracker and a tasktracker.
The other node (B) just has a datanode and a tasktracker.
HDFS starts correctly with ./start-dfs.sh.
When I try to start MapReduce with ./start-mapred.sh, the
tasktracker of node (B) cannot connect to the namenode. The tasktracker log
keeps showing:
INFO org.apache.hadoop.ipc.Client: Retrying conne…
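The retry message means node B could not open a TCP connection to the address it was configured with. One quick way to separate a network problem (a firewall, name resolution, or the wrong port, for example) from a Hadoop configuration problem is to try the same connection by hand from node B. The Java sketch below does only that. `ReachabilityCheck` is a hypothetical helper, and the host `nodeA` and port `9000` are placeholders rather than values from this post, so substitute whatever `fs.default.name` and `mapred.job.tracker` say in node B's configuration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class ReachabilityCheck {
    // Try to open a plain TCP connection to host:port, giving up after timeoutMs.
    // Returns true on success; any I/O failure (refused, timed out, unknown host)
    // is reported as false.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "nodeA" and 9000 are placeholders: pass the host and port that node B's
        // fs.default.name (and, separately, mapred.job.tracker) point at.
        String host = args.length > 0 ? args[0] : "nodeA";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

Run it on node B once with the namenode's host and port and once with the jobtracker's. If both connections succeed, the problem is more likely in the Hadoop configuration than in the network.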
Added by Marc Sturlese on February 15, 2010 at 7:00am
Added by nijil on February 14, 2010 at 6:40am
Hello all,
Hope all is well in the community. I am asking how to apply Hadoop to retrieve information from various blogs, news feeds, etc., in a particular fashion.
I have identified three groups of word pairs that are valuable to me. I would like to explore the clustering patterns among the particular URLs where these word pairs appear in their respective blog spaces, news feeds, etc.
So, given that I have an expec
…
Added by Mark Cejas on February 13, 2010 at 10:41am
Hello all,
I hope that the holidays are going well,
I finally have my graduate school work behind me and have more time to learn about this wonderful Hadoop tool. I work on a Fedora 11 distribution, and after setting my JAVA_HOME and HADOOP_HOME paths, I started to encounter the following error. It appears when I switch to the root user, as follows:
[rasaan@rasaan ~]$ su
Password:
bash: /root/.bashrc: line 9: unexpected EOF while looking for matching `)'
bash: /root/.bashrc: line 14…
Added by Mark Cejas on December 31, 2009 at 12:23pm
Jason Rutherglen will be providing the in-depth Lucene/Solr pieces.
Hope to see you there.
Added by Jason Venner on November 17, 2009 at 12:57pm
Hi Hadoopers,
You are welcome to join us for the next Bay Area Hadoop User Group meetup at the Yahoo! Sunnyvale campus, Wed, Nov 18th at 6PM.
We have some interesting talks planned:
* Katta, Solr, Lucene and Hadoop - Searching at scale, Jason Rutherglen and Jason Venner
* Walking through the New File system API, Sanjay Radia, Yahoo!
* Keep your data in Jute but still use it in Python, Paul Tarjan, Yahoo!
Please RSVP here:
http://www.meetup.com/hadoop/calendar/11724002/
See you there,
Dekel
Added by dekel tankel on November 9, 2009 at 9:29am
There were good discussions on Katta, Solr, machine learning and general machine performance.
Added by Jason Venner on September 30, 2009 at 7:29am
Per Michael Stack:
Our Andrew Purtell, working with Chad Metcalf over at Cloudera, has added HBase to the CDH2 Cloudera distribution. Andrew has a guest blog post over on Cloudera here: http://su.pr/27zIMw
St.Ack
Enjoy!
Added by Jason Venner on September 29, 2009 at 8:22am
There are two requirements I want to implement on top of Hadoop, but I do not think Hadoop currently supports them. I would appreciate your suggestions on how to implement them.
First, can the reducers fetch more than one partition file from the map output? For instance, reducer 1 currently fetches partition 1 from all the mappers; how can I make reducer 1 fetch both partition 1 and partition 2? If that is possible, how could I implement it?…
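In Hadoop each reduce task fetches exactly one partition number from every map output: reducer i receives partition i, and the partition is chosen on the map side by the job's Partitioner. So one way to get "partitions 1 and 2 into reducer 1" is to have the partitioner send both groups of keys to the same index. The plain-Java sketch below models this without Hadoop on the classpath. `hashPartition` uses the same formula as Hadoop's default `HashPartitioner`; `mergedPartition` and the `bucketToReducer` table are hypothetical names for illustration. In a real job this logic would live in a Partitioner's `getPartition(key, value, numPartitions)`, and the job would run with as many reduce tasks as the table has distinct reducer indexes.

```java
class PartitionSketch {
    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder modulo the number of reduce tasks.
    static int hashPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical "merged" partitioner: hash keys into logical buckets, then
    // route several buckets to one reducer. bucketToReducer[b] is the reducer
    // that receives logical bucket b.
    static int mergedPartition(Object key, int[] bucketToReducer) {
        int bucket = hashPartition(key, bucketToReducer.length);
        return bucketToReducer[bucket];
    }

    public static void main(String[] args) {
        // Hypothetical layout: 4 logical buckets, 3 reducers;
        // buckets 1 and 2 both go to reducer 1.
        int[] bucketToReducer = {0, 1, 1, 2};
        for (String key : new String[] {"apple", "banana", "cherry", "date"}) {
            System.out.println(key + " -> bucket " + hashPartition(key, bucketToReducer.length)
                    + " -> reducer " + mergedPartition(key, bucketToReducer));
        }
    }
}
```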
Added by wang zhengkui on September 16, 2009 at 7:04am
I somehow left out the Perl scripts for aggregate streaming in chapter 8, along with various shell scripts from earlier chapters.
I have attached them in scripts.zip.
Added by Jason Venner on July 24, 2009 at 6:48am
I am overbooked and not getting back to people.
Added by Jason Venner on July 21, 2009 at 10:58pm
Hi all,
I have the Fedora 11 distribution and am having problems installing the Java SE Development Kit. If there is anyone out there who can point me in the right direction, I would greatly appreciate it.
Thanks, Mark
Added by Mark Cejas on July 18, 2009 at 3:14pm
This block was written by Aaron Kimball of Cloudera to core-user.
A FileSplit is merely a description of the boundaries, e.g., "bytes 0 to 9999" and "bytes 10000 to 19999". The Mapper then interprets the boundaries described by a FileSplit in a way that makes sense at the data level. The FileSplit does not actually physically contain the data to be mapped over. So mapper 1 will open a file via the InputFormat and start reading at byte 0, and…
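To make the "description, not data" point concrete, here is a plain-Java model of a split as nothing more than (path, start offset, length), together with a helper that cuts a file into fixed-size byte ranges matching the "bytes 0 to 9999" and "bytes 10000 to 19999" example. The class and method names, the 10,000-byte split size and the example path are assumptions for illustration; Hadoop's real FileInputFormat also takes the HDFS block size and other settings into account when choosing split boundaries.

```java
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
    // A split only describes a byte range of a file; it holds no record data.
    // A record reader later opens the file and seeks to 'start'.
    static final class Split {
        final String path;
        final long start;   // offset of the first byte in the range
        final long length;  // number of bytes in the range

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        long lastByte() {
            return start + length - 1;
        }

        @Override
        public String toString() {
            return path + ": bytes " + start + " to " + lastByte();
        }
    }

    // Cut a file of fileLength bytes into consecutive ranges of at most splitSize bytes.
    static List<Split> computeSplits(String path, long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Split(path, start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical 25,000-byte file and 10,000-byte split size.
        for (Split split : computeSplits("/logs/example.log", 25_000, 10_000)) {
            System.out.println(split);
        }
    }
}
```

This prints three ranges: bytes 0 to 9999, 10000 to 19999 and 20000 to 24999. None of the Split objects ever touches the file's contents.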
Added by Jason Venner on June 11, 2009 at 7:00am
This is from a post on the HBase mailing list:
The key piece is that you must have a Cygwin installation on the machine, and include the Cygwin installation's bin directory in your Windows system PATH environment variable (Control Panel|System|Advanced|Environment Variables|System variables|Path).
There is constant confusion between the paths on the Windows side (as seen by the JVM) and the paths seen by the Hadoop scripts through Cygwin.
You have to run the Hadoop scripts from the…
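The two forms of the same path look quite different. With Cygwin's default `/cygdrive` prefix, a Windows path such as `C:\hadoop\conf` (a hypothetical install location) is seen by Cygwin shell scripts as `/cygdrive/c/hadoop/conf`, while the JVM expects the Windows form. The toy converter below, under the hypothetical name `CygwinPaths`, shows that mapping; it assumes the default prefix and simple drive-letter paths only. In practice Cygwin's own `cygpath` tool (`cygpath -u`, `cygpath -w`) does this conversion and also handles mount points and other cases this sketch ignores.

```java
class CygwinPaths {
    private static final String PREFIX = "/cygdrive/";

    // Windows form (what the JVM sees) -> Cygwin form (what the shell scripts see).
    // Assumes a drive-letter path such as C:\dir\file and Cygwin's default prefix.
    static String toCygwin(String windowsPath) {
        if (windowsPath.length() < 2 || windowsPath.charAt(1) != ':') {
            throw new IllegalArgumentException("expected a drive-letter path: " + windowsPath);
        }
        char drive = Character.toLowerCase(windowsPath.charAt(0));
        String rest = windowsPath.substring(2).replace('\\', '/');
        return PREFIX + drive + rest;
    }

    // Cygwin form -> Windows form; the inverse of toCygwin for the same simple cases.
    static String toWindows(String cygwinPath) {
        if (!cygwinPath.startsWith(PREFIX) || cygwinPath.length() <= PREFIX.length()) {
            throw new IllegalArgumentException("expected a /cygdrive path: " + cygwinPath);
        }
        char drive = Character.toUpperCase(cygwinPath.charAt(PREFIX.length()));
        String rest = cygwinPath.substring(PREFIX.length() + 1).replace('/', '\\');
        return drive + ":" + rest;
    }

    public static void main(String[] args) {
        // Hypothetical install location, for illustration only.
        String windows = "C:\\hadoop\\conf";
        String cygwin = toCygwin(windows);
        System.out.println(windows + "  <->  " + cygwin);
        System.out.println(toWindows(cygwin));
    }
}
```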
Added by Jason Venner on June 11, 2009 at 6:57am
A user is running into an interesting problem.
hadoop jar ../contrib/streaming/hadoop-0.19.1-streaming.jar -mapper "/usr/bin/perl /home/hadoop/scripts/map_parse_log_r2.pl" -reducer "/usr/bin/perl /home/hadoop/scripts/reduce_parse_log.pl" -input /logs/*.log -output test9
The code I have works when given a small set of input files. However, I get the following error when attempting to run the code on a large set of input files:
hadoop-hadoop-jobtracker-testdw0b00.log.2009-06-09:2009-06-09 15:43…
Added by Jason Venner on June 10, 2009 at 11:14pm
I hung out with some friends and made some new ones.
I ran into my old buddies Htin and Sagar from Attributor.com; it was really good to see them.
I also got to hang out with a bunch of friends: Cloudera folks, Ted Dunning of DeepDyve, Michael Stack of Powerset, and some friends from Ning. A very good time was had.
I listened to a number of fun presentations and passed out lots of flyers for my book.
There were over 700 people at the summit, including people from overseas. This is a big jump fr…
Added by Jason Venner on June 10, 2009 at 11:10pm
A user asked me a question today:
he has a cluster with 16 reduce slots spread over a number of machines, and when he runs a job with 12 reduces, multiple reduces end up on single machines while some machines sit idle.
At present, the only workaround I am aware of is to set the cluster-level parameter mapred.tasktracker.reduce.tasks.maximum to 1 and restart the cluster.
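The arithmetic behind this workaround can be shown with a toy model. Assume, purely for illustration, 8 machines with 2 reduce slots each (16 slots, as in the question) and a scheduler that fills each tasktracker's free slots before moving on to the next one. This is a deliberate simplification, not the logic of Hadoop's default scheduler, and `SlotSketch` and its methods are hypothetical names. Under that assumption, 12 reduces land on 6 machines and leave 2 idle. Capping mapred.tasktracker.reduce.tasks.maximum at 1 spreads them one per machine, at the cost of only 8 reduces running at once while the remaining 4 wait for a second wave.

```java
import java.util.Arrays;

class SlotSketch {
    // Toy model only, not Hadoop's scheduler: visit trackers in order and let each
    // one take as many pending reduces as it has free slots.
    // Returns how many reduces each tracker runs in the first wave.
    static int[] greedyFill(int trackers, int slotsPerTracker, int reduces) {
        int[] placed = new int[trackers];
        int remaining = reduces;
        for (int t = 0; t < trackers && remaining > 0; t++) {
            placed[t] = Math.min(slotsPerTracker, remaining);
            remaining -= placed[t];
        }
        return placed;
    }

    // Number of trackers that received no reduce at all.
    static int idleTrackers(int[] placed) {
        int idle = 0;
        for (int p : placed) {
            if (p == 0) idle++;
        }
        return idle;
    }

    // Total reduces running concurrently in the first wave.
    static int running(int[] placed) {
        return Arrays.stream(placed).sum();
    }

    public static void main(String[] args) {
        int reduces = 12;
        // Assumed cluster: 8 machines x 2 reduce slots = 16 slots.
        int[] twoSlots = greedyFill(8, 2, reduces);
        // Same machines with mapred.tasktracker.reduce.tasks.maximum = 1.
        int[] oneSlot = greedyFill(8, 1, reduces);
        for (int[] placed : new int[][] {twoSlots, oneSlot}) {
            System.out.println(Arrays.toString(placed)
                    + " idle=" + idleTrackers(placed)
                    + " waiting=" + (reduces - running(placed)));
        }
    }
}
```

The two lines printed are `[2, 2, 2, 2, 2, 2, 0, 0] idle=2 waiting=0` and `[1, 1, 1, 1, 1, 1, 1, 1] idle=0 waiting=4`.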
Added by Jason Venner on May 26, 2009 at 8:34pm
You have three basic ways to set the defaults.
First, you can set them in your -site.xml files (hadoop-site.xml, or core-site.xml, hdfs-site.xml and mapred-site.xml for 0.20+).
Second, you can set them on the command line:
if you are using the bin/hadoop script of your distribution and running your jobs via the jar command,
-fs file_system_url or -jt jobtracker_host:port, placed on the command line after jar and before the actual jar_file, will override whatever the default is.
Finally you can set them on the configu…
Added by Jason Venner on May 25, 2009 at 12:55pm
Cloudera has posted a very nice summary:
http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/
The big user-visible change is the addition of the new APIs for MapReduce, where a context object is passed in to setup, map/reduce and cleanup (the new-API counterparts of configure and close). The context contains the JobConf equivalent, the output collector, the reporter, and any other data people pack in.
There are some issues with this at present, so it is not ready for prime time.
The preparation for the split of MapRe…
Added by Jason Venner on May 22, 2009 at 8:30pm
This becomes exceptionally clean with chaining.
The last map in the map chain builds a value out of the key and the value, such that it can be decomposed later.
The output key is the MD5 or other stable hash of the original key.
The reducer, or a map in the reduce chain, can decompose the value into the original key and value and process them. The results are in random order, AND items with the same key are actually grouped together into the same reduce call.
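The trick can be modelled in plain Java without a cluster. The sketch assumes string keys and values and a hypothetical `<keyLength>:<key><value>` packing format (any reversible encoding would do). `md5Hex`, `pack`, `unpack` and `shuffle` are illustrative names, and a TreeMap stands in for the framework's sort-and-group step. Because the output key is a stable hash, equal original keys still meet in one group, while the order of the groups follows the hash rather than the original keys.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class HashedKeySketch {
    // Hex MD5 of the original key: stable, so equal keys give equal output keys.
    static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide MD5
        }
    }

    // Pack key and value so they can be split apart later: "<keyLength>:<key><value>".
    static String pack(String key, String value) {
        return key.length() + ":" + key + value;
    }

    // Inverse of pack: returns {key, value}. The first ':' always follows the length digits.
    static String[] unpack(String packed) {
        int colon = packed.indexOf(':');
        int keyLength = Integer.parseInt(packed.substring(0, colon));
        int keyStart = colon + 1;
        return new String[] {
            packed.substring(keyStart, keyStart + keyLength),
            packed.substring(keyStart + keyLength)
        };
    }

    // Map side (last map in the chain): emit (md5(key), pack(key, value)).
    // The TreeMap plays the role of the framework's sort and group-by-key.
    static Map<String, List<String>> shuffle(String[][] records) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] record : records) {
            groups.computeIfAbsent(md5Hex(record[0]), k -> new ArrayList<>())
                  .add(pack(record[0], record[1]));
        }
        return groups;
    }

    public static void main(String[] args) {
        String[][] records = {{"b", "2"}, {"a", "1"}, {"b", "3"}, {"c", "4"}};
        // Reduce side: each group holds the records of one original key; unpack and process.
        for (Map.Entry<String, List<String>> group : shuffle(records).entrySet()) {
            StringBuilder line = new StringBuilder(group.getKey().substring(0, 8) + ":");
            for (String packed : group.getValue()) {
                String[] kv = unpack(packed);
                line.append(' ').append(kv[0]).append('=').append(kv[1]);
            }
            System.out.println(line);
        }
    }
}
```

Both "b" records end up in the same group, and the groups come out in hash order rather than in key order.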
Added by Jason Venner on May 21, 2009 at 7:52pm