Hadoop Professionals

A Community for Hadoop Users

All Blog Posts (25)

Louisa Landry Problem with activation

Hi there, I don't know if I am posting on the right board, but I have a problem with activation: the link I receive by email is not working...
http://www.prohadoopbook.com/?09a262b1d40cab9a32dd7416e68

Added by Louisa Landry on March 23, 2010 at 8:45am — No Comments

Marc Sturlese datanode cannot connect to the namenode in a small Hadoop cluster

Hey there, I have a Hadoop cluster built on 2 servers (2 laptops). One node (A)
contains the namenode, a datanode, the jobtracker and a tasktracker.

The other node (B) just has a datanode and a tasktracker.
I set up HDFS correctly with ./start-hdfs.sh

When I try to start MapReduce with ./start-mapred.sh, the tasktracker on node (B) cannot connect to the namenode. The tasktracker log
keeps throwing:

INFO org.apache.hadoop.ipc.Client: Retrying conne…
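One quick check, sketched here in Java as an illustration: from node (B), try to open a plain TCP connection to the namenode and jobtracker ports on node (A). If the probe fails from (B) but succeeds on (A) itself, one common cause is that the daemons on (A) are bound to localhost, or that the configured host names do not resolve on (B). The host name `nodeA` and the ports 9000 and 9001 are hypothetical placeholders; substitute the host:port values from your own fs.default.name and mapred.job.tracker settings.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Minimal connectivity probe: can this machine open a TCP connection
 * to host:port within the given timeout? Run it on node (B) against the
 * addresses the daemons on node (A) are supposed to listen on.
 */
public class PortProbe {

    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, unknown host, or timeout: treat as unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults: replace with the host and ports from your
        // fs.default.name and mapred.job.tracker configuration.
        String host = args.length > 0 ? args[0] : "nodeA";
        int[] ports = {9000, 9001};
        for (int port : ports) {
            System.out.println(host + ":" + port + " reachable = "
                    + isReachable(host, port, 2000));
        }
    }
}
```

Run it on both machines with the same arguments; a port that answers on (A) but not from (B) points at binding or name resolution rather than at Hadoop itself.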

Added by Marc Sturlese on February 15, 2010 at 7:00am — 3 Comments

Mark Cejas seeking advice on word vectors

Hello all,

Hope all is well in the community. I am asking how to apply Hadoop to retrieve information from various blogs, news feeds, etc., in a particular fashion.

I have identified three groups of word pairs that are valuable to me. I would like to explore the clustering patterns of these word pairs across particular URLs in their respective blog spaces, news feeds, etc.

So, given that I have an expec…
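The counting step behind this kind of question maps naturally onto a single MapReduce pass. Below is a minimal in-memory sketch in plain Java (not the Hadoop API), assuming that a "word pair" means two adjacent words and that each document arrives as a (URL, text) record: the map step emits a `pair<TAB>url` key for every match, and the reduce step sums the occurrences. The sample URLs and pairs are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * In-memory sketch of a map/reduce pass that counts, per URL, how often each
 * target word pair appears as two adjacent words. Plain Java, not Hadoop; the
 * (url, text) record shape and the adjacency rule are assumptions.
 */
public class WordPairCounts {

    /** Map step: emit one "pair<TAB>url" key per adjacent target pair found in the text. */
    static List<String> map(String url, String text, List<String> targetPairs) {
        List<String> emitted = new ArrayList<>();
        String[] words = text.toLowerCase().split("\\W+");
        for (int i = 0; i + 1 < words.length; i++) {
            String pair = words[i] + " " + words[i + 1];
            if (targetPairs.contains(pair)) {
                emitted.add(pair + "\t" + url);
            }
        }
        return emitted;
    }

    /** Reduce step: each emitted key counts as 1; sum them per key. */
    static Map<String, Integer> reduce(List<String> keys) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String key : keys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> pairs = List.of("machine learning", "data mining");
        List<String> emitted = new ArrayList<>();
        emitted.addAll(map("http://blog.example/a",
                "Machine learning meets data mining. More machine learning!", pairs));
        emitted.addAll(map("http://news.example/b", "Data mining at scale", pairs));
        reduce(emitted).forEach((key, count) -> System.out.println(key + "\t" + count));
    }
}
```

Per-URL counts like these are one possible input to whatever clustering step comes next; in a real job the map and reduce methods would become a Mapper and a Reducer.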

Added by Mark Cejas on February 13, 2010 at 10:41am — 2 Comments

Mark Cejas .bashrc file error

Hello all, I hope the holidays are going well. I finally have my graduate school work behind me and have more time to learn about this wonderful Hadoop tool. I work on a Fedora 11 distribution, and after setting my JAVA_HOME and HADOOP_HOME paths I started to encounter the following error. It appears when I switch to the root user:

[rasaan@rasaan ~]$ su
Password:
bash: /root/.bashrc: line 9: unexpected EOF while looking for matching `)'
bash: /root/.bashrc: line 14…

Added by Mark Cejas on December 31, 2009 at 12:23pm — 1 Comment

Jason Venner I am giving a talk at the HUG on Wednesday: scaling search with Hadoop, Katta and Solr

Jason Rutherglen will be covering the in-depth Lucene/Solr pieces. Hope to see you there.

Added by Jason Venner on November 17, 2009 at 12:57pm — No Comments

dekel tankel Hadoop Bay Area User Group - November 18th at Yahoo!

Hi Hadoopers,

You are welcome to join us for the next Bay Area Hadoop User Group meetup at the Yahoo! Sunnyvale campus on Wed, Nov 18th at 6PM. We have some interesting talks planned:

* Katta, Solr, Lucene and Hadoop - Searching at scale, Jason Rutherglen and Jason Venner
* Walking through the New File System API, Sanjay Radia, Yahoo!
* Keep your data in Jute but still use it in Python, Paul Tarjan, Yahoo!

Please RSVP here: http://www.meetup.com/hadoop/calendar/11724002/

See you there,
Dekel

Added by dekel tankel on November 9, 2009 at 9:29am — No Comments

Jason Venner Thanks to Stephane for a fun Katta Meetup last night.

There were good discussions on Katta, Solr, machine learning and general machine performance.

Added by Jason Venner on September 30, 2009 at 7:29am — No Comments

Jason Venner Cloudera folds HBase into their 0.20 Hadoop distribution

Per Michael Stack:

"Our Andrew Purtell, working with Chad Metcalf over at Cloudera, has added HBase to the CDH2 Cloudera distribution. Andrew has a guest blog post over on Cloudera here: http://su.pr/27zIMw
St.Ack"

Enjoy!

Added by Jason Venner on September 29, 2009 at 8:22am — No Comments

wang zhengkui Two requirements on Hadoop

There are two requirements I want to implement on top of Hadoop, but I do not think Hadoop currently supports them. I am looking forward to your suggestions on how to implement them. First, can I let a reducer fetch more than one partition file from the map output? For instance, reducer 1 currently fetches all of partition 1 from the mappers; how can I make reducer 1 fetch all of partition 1 and also partition 2? If this is possible, how could I implement it?…
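As far as I know, a reduce task cannot be told to fetch an extra partition; the lever Hadoop gives you is the partition function itself. If one reducer should see what would otherwise be two partitions, make the partitioner send both sets of keys to the same partition. A minimal plain-Java sketch of that folding rule, assuming the default hash-style assignment: compute the partition a HashPartitioner-style rule would pick over twice as many partitions, then map logical partitions 2k and 2k+1 to reducer k. In a real job this arithmetic would go in the getPartition method of your own Partitioner class; the class name here is hypothetical.

```java
/**
 * Sketch of a "folding" partition rule: reducer k receives every key that a
 * hash partitioner over 2 * numReducers partitions would have sent to logical
 * partition 2k or 2k+1. Plain Java; the class name is hypothetical.
 */
public class FoldingPartitioner {

    /** HashPartitioner-style arithmetic: non-negative hash modulo the partition count. */
    static int hashPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Fold logical partitions 2k and 2k+1 onto reducer k. */
    static int getPartition(Object key, int numReducers) {
        int logical = hashPartition(key, 2 * numReducers);
        return logical / 2;
    }

    public static void main(String[] args) {
        int numReducers = 3;
        for (String key : new String[] {"apple", "banana", "cherry", "date"}) {
            System.out.println(key + " -> logical " + hashPartition(key, 2 * numReducers)
                    + ", reducer " + getPartition(key, numReducers));
        }
    }
}
```

Any grouping can be expressed the same way (for example a lookup table from logical partition to reducer); the only requirement is that getPartition returns a value in [0, numReducers).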

Added by wang zhengkui on September 16, 2009 at 7:04am — No Comments

Jason Venner Scripts that are missing from the source code bundle

I somehow forgot to include the Perl scripts for aggregate streaming in chapter 8, as well as various shell scripts from earlier chapters. I have attached them in scripts.zip.

Added by Jason Venner on July 24, 2009 at 6:48am — No Comments

Jason Venner Slow responding this week

I am overbooked this week and slow to get back to people.

Added by Jason Venner on July 21, 2009 at 10:58pm — No Comments

Mark Cejas Installing JDK 6 Update 14 with yum

Hi all, I am running the Fedora 11 distribution and am having problems installing the Java SE Development Kit. If anyone out there can point me in the right direction, I would greatly appreciate it. Thanks, Mark

Added by Mark Cejas on July 18, 2009 at 3:14pm — 8 Comments

Jason Venner A little more detail on how line oriented FileSplits work

This block was written by Aaron Kimball of Cloudera to core-user:

A FileSplit is merely a description of the boundaries, e.g., "bytes 0 to 9999" and "bytes 10000 to 19999". The Mapper then interprets the boundaries described by a FileSplit in a way that makes sense at the data level. The FileSplit does not actually physically contain the data to be mapped over. So mapper 1 will open a file via the InputFormat and start reading at byte 0, and…
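The rule a line-oriented reader applies at split boundaries can be sketched in a few lines. The following plain-Java sketch is modeled on the approach Hadoop's LineRecordReader takes, but it is not that class: a reader whose split does not start at byte 0 backs up one byte and discards everything through the next newline, because that partial line belongs to the previous split; it then emits every line that starts before the end of its split, reading past the boundary to finish the last one. Every line therefore lands in exactly one split.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of how a line-oriented reader honors FileSplit boundaries without
 * cutting a line in two. Works on an in-memory byte array of ASCII text.
 */
public class SplitLines {

    /** Returns the lines a reader for the byte range [start, start + length) would emit. */
    static List<String> linesForSplit(byte[] data, long start, long length) {
        long end = start + length;
        int pos = (int) start;
        if (start != 0) {
            // Back up one byte and discard everything up to and including the next
            // newline. If start sits exactly on a line boundary, only that newline
            // is skipped, so the line beginning at start is still read here.
            pos = (int) start - 1;
            while (pos < data.length && data[pos] != '\n') pos++;
            pos++; // step past the newline
        }
        List<String> lines = new ArrayList<>();
        // Emit every line that *starts* before end, even if it runs past end.
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            lines.add(new String(data, lineStart, pos - lineStart, StandardCharsets.US_ASCII));
            pos++; // step past the newline (or past the end of the data)
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(linesForSplit(data, 0, 8)); // [alpha, beta]
        System.out.println(linesForSplit(data, 8, 9)); // [gamma]
    }
}
```

In the example, the first split ends in the middle of "beta" but still emits it whole, and the second split, which starts mid-line, skips the tail of "beta" and begins at "gamma".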

Added by Jason Venner on June 11, 2009 at 7:00am — No Comments

Jason Venner Running Hadoop and Friends under Windows.

This is from a post on the HBase mailing list: The key piece is that you must have a Cygwin installation on the machine and include the Cygwin installation's bin directory in your Windows system PATH environment variable (Control Panel|System|Advanced|Environment Variables|System variables|Path). There is constant confusion between the paths on the Windows side (as seen by the JVM) and the paths seen by the Hadoop scripts through Cygwin. You have to run the Hadoop scripts from the…
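Much of that path confusion comes from two spellings of the same location: the JVM sees `C:\hadoop\conf`, while the Cygwin shell scripts see `/cygdrive/c/hadoop/conf`. A hypothetical helper for translating between the two, assuming Cygwin's default `/cygdrive` prefix, is sketched below. Cygwin's own `cygpath` utility is the authoritative converter and also handles paths under the Cygwin root, which this sketch only approximates.

```java
/**
 * Hypothetical helper for the two spellings of one location: the Windows form
 * the JVM sees and the Cygwin form the Hadoop shell scripts see.
 * Assumes Cygwin's default "/cygdrive" prefix; cygpath is the real tool for this.
 */
public class CygwinPaths {

    /** Windows form to Cygwin form, e.g. C:\hadoop\conf to /cygdrive/c/hadoop/conf. */
    static String toCygwin(String windowsPath) {
        if (windowsPath.length() >= 2 && windowsPath.charAt(1) == ':') {
            char drive = Character.toLowerCase(windowsPath.charAt(0));
            String rest = windowsPath.substring(2).replace('\\', '/');
            return "/cygdrive/" + drive + rest;
        }
        // No drive letter (relative or UNC path): only flip the separators.
        return windowsPath.replace('\\', '/');
    }

    /** Cygwin form to Windows form, e.g. /cygdrive/c/hadoop/conf to C:\hadoop\conf. */
    static String toWindows(String cygwinPath) {
        String prefix = "/cygdrive/";
        if (cygwinPath.startsWith(prefix) && cygwinPath.length() > prefix.length()) {
            char drive = Character.toUpperCase(cygwinPath.charAt(prefix.length()));
            String rest = cygwinPath.substring(prefix.length() + 1).replace('/', '\\');
            return drive + ":" + (rest.isEmpty() ? "\\" : rest);
        }
        // Paths such as /usr/local live under the Cygwin install root; this sketch
        // only flips the separators there, which is NOT a faithful translation.
        return cygwinPath.replace('/', '\\');
    }

    public static void main(String[] args) {
        System.out.println(toCygwin("C:\\hadoop\\conf"));       // /cygdrive/c/hadoop/conf
        System.out.println(toWindows("/cygdrive/d/data/logs")); // D:\data\logs
    }
}
```

The practical rule that follows: anything handed to the JVM (for example a -D property or a classpath entry) wants the Windows form, while anything the bash scripts read wants the Cygwin form.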

Added by Jason Venner on June 11, 2009 at 6:57am — No Comments

Jason Venner A Streaming Question from the Hadoop Core list.

A user is running into an interesting problem:

hadoop jar ../contrib/streaming/hadoop-0.19.1-streaming.jar -mapper "/usr/bin/perl /home/hadoop/scripts/map_parse_log_r2.pl" -reducer "/usr/bin/perl /home/hadoop/scripts/reduce_parse_log.pl" -input /logs/*.log -output test9

The code I have works when given a small set of input files. However, I get the following error when attempting to run the code on a large set of input files:

hadoop-hadoop-jobtracker-testdw0b00.log.2009-06-09:2009-06-09 15:43…

Added by Jason Venner on June 10, 2009 at 11:14pm — No Comments

Jason Venner I had a wonderful time at the Hadoop Summit

I hung out with some friends and made some new ones. I ran into my old buddies Htin and Sagar from Attributor.com; it was really good to see them. I also got to hang out with a bunch of friends: the Cloudera folks, Ted Dunning of DeepDyve, Michael Stack of Powerset, and some friends from Ning. A very good time was had. I listened to a number of fun presentations and passed out lots of flyers for my book. There were over 700 people at the summit, including people from overseas. This is a big jump fr…

Added by Jason Venner on June 10, 2009 at 11:10pm — No Comments

Jason Venner Ensuring that your reduces are distributed across all of your machines

A user asked me a question today: he has a cluster with 16 reduce slots spread over a number of machines, and when he runs a job with 12 reduces, multiple reduces end up on single machines while some machines sit idle. At present, the only workaround I am aware of is to set the cluster-level parameter mapred.tasktracker.reduce.tasks.maximum to 1 and restart the cluster.

Added by Jason Venner on May 26, 2009 at 8:34pm — No Comments

Jason Venner Setting the default file system or job tracker for a hadoop job.

You have 3 basic ways to set the defaults. You can set them in your -site.xml files (hadoop-site.xml, or core-site.xml, mapred-site.xml and hdfs-site.xml for 0.20+). You can set them on the command line: if you are using the bin/hadoop script of your distribution and running your jobs via the jar command, -fs file_system_url or -jt jobtracker_host:port will override whatever the default is when placed on the command line after jar and before the actual jar_file. Finally, you can set them on the configu…
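To make the override behavior concrete, here is a toy model in plain Java. It is not GenericOptionsParser, and the class name is hypothetical: leading `-fs` and `-jt` options replace the defaults that came from the site files, and everything after them is passed through to the job. The property names `fs.default.name` and `mapred.job.tracker` are the 0.19/0.20-era keys these options set; the host names and ports in the example are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of how -fs and -jt override configured defaults. Not Hadoop's
 * GenericOptionsParser; it only mimics the effect on the two properties.
 */
public class GenericOptionsSketch {

    /** Applies leading -fs/-jt overrides to conf in place; returns the remaining arguments. */
    static List<String> applyOverrides(String[] args, Map<String, String> conf) {
        int i = 0;
        while (i < args.length) {
            if ("-fs".equals(args[i]) && i + 1 < args.length) {
                conf.put("fs.default.name", args[i + 1]);
                i += 2;
            } else if ("-jt".equals(args[i]) && i + 1 < args.length) {
                conf.put("mapred.job.tracker", args[i + 1]);
                i += 2;
            } else {
                break; // first non-generic argument: stop parsing options
            }
        }
        List<String> remaining = new ArrayList<>();
        for (; i < args.length; i++) {
            remaining.add(args[i]);
        }
        return remaining;
    }

    public static void main(String[] args) {
        // Defaults as they might come from the *-site.xml files (hypothetical values).
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.default.name", "hdfs://namenode:9000");
        conf.put("mapred.job.tracker", "jobtracker:9001");
        // file:/// and local select the local file system and the local job runner.
        List<String> rest = applyOverrides(
                new String[] {"-fs", "file:///", "-jt", "local", "input", "output"}, conf);
        System.out.println(conf + " remaining=" + rest);
    }
}
```

The same precedence holds in spirit for the third approach: whatever is set last on the configuration before the job is submitted is what the job sees.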

Added by Jason Venner on May 25, 2009 at 12:55pm — No Comments

Jason Venner What is coming in Hadoop 20.

Cloudera has posted a very nice summary: http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/ The big user-visible change is the addition of the new APIs for MapReduce, where a context object is passed in to the configure, map/reduce and close methods. It contains the JobConf equivalent, the output collector, the reporter, and any other data people pack in. There are some issues with this at present, so it is not ready for prime time. The preparation for the split of MapRe…
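For readers who have not seen the context-object style, here is a toy imitation in plain Java; the classes are stand-ins, not Hadoop's. One Context object carries the configuration, the output collector and the status reporter, and the framework passes it to every hook. In the org.apache.hadoop.mapreduce.Mapper class that shipped in 0.20, the hooks are named setup, map and cleanup, and a run method drives them in that order, which this sketch mirrors. The `min.length` property is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy imitation of the 0.20 "context object" style, NOT Hadoop's own classes.
 * One Context bundles the job configuration, the output collector and the
 * status reporter, and is handed to every hook of the mapper.
 */
public class ContextStyleDemo {

    static class Context {
        final Map<String, String> conf;                // stands in for the JobConf
        final List<String> output = new ArrayList<>(); // stands in for the OutputCollector
        String status = "";                            // stands in for the Reporter

        Context(Map<String, String> conf) { this.conf = conf; }

        String get(String name) { return conf.get(name); }
        void write(String key, String value) { output.add(key + "\t" + value); }
        void setStatus(String message) { status = message; }
    }

    static abstract class Mapper {
        void setup(Context context) {}
        abstract void map(long offset, String line, Context context);
        void cleanup(Context context) {}

        /** Drives one task: setup once, map every record, cleanup once. */
        void run(List<String> lines, Context context) {
            setup(context);
            long offset = 0;
            for (String line : lines) {
                map(offset, line, context);
                offset += line.length() + 1; // assumes one-byte characters and "\n" separators
            }
            cleanup(context);
        }
    }

    /** Example mapper: emits words at least "min.length" (hypothetical property) long. */
    static class LongWordMapper extends Mapper {
        private int minLength;

        @Override
        void setup(Context context) {
            minLength = Integer.parseInt(context.get("min.length"));
        }

        @Override
        void map(long offset, String line, Context context) {
            for (String word : line.split("\\s+")) {
                if (word.length() >= minLength) {
                    context.write(word, "1");
                }
            }
        }

        @Override
        void cleanup(Context context) {
            context.setStatus("done, " + context.output.size() + " records");
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("min.length", "5");
        Context context = new Context(conf);
        new LongWordMapper().run(List.of("hadoop is great", "context objects"), context);
        System.out.println(context.output);
        System.out.println(context.status); // done, 4 records
    }
}
```

The design benefit is that new capabilities can be added to the context later without changing every map and reduce signature.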

Added by Jason Venner on May 22, 2009 at 8:30pm — No Comments

© 2010 Created by Jason Venner.