Hadoop Professionals

A Community for Hadoop Users

Jason Venner
  • United States

Jason Venner's Discussions

Please post job openings here.

Started Nov. 17, 2009

Katta and Solr
12 Replies

Started this discussion. Last reply by Saju K K May 2.

 

Jason Venner's Page

Latest Activity

A group for discussing various distributed random-access datastores that work well with the Hadoop ecosystem tools.
on Wednesday
There does not appear to be any simple way to split the pools. I have had similar job constraints: building Solr indexes in the reduce step. There may be a way to do this with the scheduling system. The first solution that comes to mind is to run…
July 27
I don't believe there is a way to customize the memory use per task within a job on an individual task tracker using the existing Hadoop framework; however, I am not that familiar with 20.2 at present. I took a quick scan of http://hadoop.apache.…
July 1
The default reducer is the Identity Reducer, which simply passes through the sorted output of your map phase. Is it possible you have a misconfiguration and the Reducer class you have specified is not being engaged? You could put some log messages into your red…
July 1
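To see why the default reducer's output looks like "ordered map output", here is a toy single-partition model of the map, shuffle/sort and reduce flow. It is a hypothetical Python simulation, not Hadoop's API: the sorting is done by the framework between the phases, and the identity reduce function just re-emits every (key, value) pair it receives. The mapper shown is a made-up example.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator

Pair = tuple[str, str]


def run_job(records: Iterable[str],
            mapper: Callable[[str], Iterator[Pair]],
            reducer: Callable[[str, list[str]], Iterator[Pair]]) -> list[Pair]:
    """Toy model: map every record, sort by key (the shuffle), then reduce per key."""
    map_output = [pair for record in records for pair in mapper(record)]
    # The framework, not the reducer, sorts map output by key.
    map_output.sort(key=itemgetter(0))
    result: list[Pair] = []
    for key, group in groupby(map_output, key=itemgetter(0)):
        result.extend(reducer(key, [value for _, value in group]))
    return result


def identity_reducer(key: str, values: list[str]) -> Iterator[Pair]:
    # Re-emits each pair unchanged, like the default reducer.
    for value in values:
        yield key, value


def word_mapper(line: str) -> Iterator[Pair]:
    # Hypothetical mapper: emit (word, "1") for each word in the line.
    for word in line.split():
        yield word, "1"


print(run_job(["b a", "c a"], word_mapper, identity_reducer))
# [('a', '1'), ('a', '1'), ('b', '1'), ('c', '1')]
```

If a job's output looks exactly like this (sorted, with one line per map output record and no aggregation), the custom reducer is probably not the one running. A mapper that emits nothing gives an empty result, which is the "Map output records=0" situation.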
Your choices are to:
* pass it via the job conf: great for small things
* pass it via the distributed config services: can take a little setup and slows job start down; for large things
* embed it in one of your jars: decent for things that don't…
June 28
If you are getting an empty output file, your final step (map or reduce) is not producing any output. The "Map output records=0" line in your output clearly indicates that the map task is not producing any output records. It is clear that 34 records w…
June 22
I haven't tried the code with the newer versions of Katta. Due to changing personal requirements, I haven't been active with this code in some time. It looks like Thomas Koche updated the Katta patches to work with 0.6 on 2010-03-26 09:43 AM. check…
May 20
Jason Venner updated their profile
May 18
The map-side join code in Hadoop is a merge join. You may wish to check the Pig or Hive code for hash join support.
May 16
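The difference between the two strategies can be sketched in plain Python. This is a toy model, not the Hadoop, Pig or Hive code: a merge join walks two inputs that are already sorted by key (which is why Hadoop's map-side join needs its inputs sorted and partitioned the same way), while a hash join loads one side into an in-memory table and streams the other side past it with no sort requirement. Function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable

Row = tuple[str, str]          # (join key, value)
Joined = tuple[str, str, str]  # (join key, left value, right value)


def merge_join(left: list[Row], right: list[Row]) -> list[Joined]:
    """Inner join of two inputs that are both already sorted by key."""
    out: list[Joined] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


def hash_join(build: Iterable[Row], probe: Iterable[Row]) -> list[Joined]:
    """Inner join with no sort requirement: the build side must fit in memory."""
    table: dict[str, list[str]] = defaultdict(list)
    for key, value in build:
        table[key].append(value)
    return [(key, bv, pv) for key, pv in probe for bv in table.get(key, [])]
```

Both produce the same rows for the same inputs; only the merge join depends on the inputs being sorted, and only the hash join depends on one side fitting in memory.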
I don't speak Nutch, but perhaps nutch crawl url -dir crawl2 -depth 2 might work.
May 15
When I use Hadoop as a web crawler, I use the multithreaded mapper (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html). As each individual web get takes some real wall clock time while…
May 15
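The reason a multithreaded mapper helps a crawler is that each fetch spends most of its wall-clock time waiting on the network, so several fetches can be in flight inside one map task. A rough Python analogue using concurrent.futures is below; it is not Hadoop's MultithreadedMapper, and fake_fetch is a hypothetical stand-in for an HTTP GET that only sleeps.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url: str) -> tuple[str, int]:
    """Hypothetical stand-in for an HTTP GET: sleeps to mimic network latency."""
    time.sleep(0.2)
    return url, len(url)


def crawl(urls: list[str], threads: int) -> list[tuple[str, int]]:
    # Each worker thread blocks on its own fetch, so total wall-clock time is
    # roughly (len(urls) / threads) * latency instead of len(urls) * latency.
    # pool.map returns results in the same order as the input URLs.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fake_fetch, urls))


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(10)]
    start = time.perf_counter()
    results = crawl(urls, threads=10)
    print(f"{len(results)} pages in {time.perf_counter() - start:.2f}s")
```

In Hadoop itself the corresponding knobs are MultithreadedMapper.setMapperClass and MultithreadedMapper.setNumberOfThreads; the thread count trades politeness and memory against throughput.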
There are a number of articles on using ZooKeeper as a distributed queue manager: http://www.cloudera.com/blog/2009/05/building-a-distributed-concurrent-queue-with-apache-zookeeper/ 100,000 per day is pretty trivial with ZooKeeper. If the individu…
May 15
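For scale, 100,000 items per day averages 100,000 / 86,400 s, or about 1.16 items per second. The standard ZooKeeper queue recipe (producers create sequentially numbered child nodes; consumers list the children and claim the lowest one by deleting it) can be modeled in a few lines. This is an in-memory toy, not the ZooKeeper client API; the class name and the qn- prefix are hypothetical, and the lock stands in for the per-operation atomicity the ZooKeeper server provides.

```python
import itertools
import threading
from typing import Optional


class ToyZkQueue:
    """In-memory model of the ZooKeeper queue recipe (NOT the ZooKeeper client API)."""

    def __init__(self) -> None:
        self._children: dict[str, bytes] = {}
        self._counter = itertools.count()
        self._lock = threading.Lock()  # stands in for server-side atomic operations

    def offer(self, data: bytes) -> str:
        """Like creating a sequential child node: a zero-padded counter is appended."""
        with self._lock:
            name = f"qn-{next(self._counter):010d}"
            self._children[name] = data
            return name

    def _try_delete(self, name: str) -> Optional[bytes]:
        """Returns the data if this caller deleted the child, None if another did."""
        with self._lock:
            return self._children.pop(name, None)

    def poll(self) -> Optional[bytes]:
        """Claims and returns the oldest item, or None if the queue is empty."""
        while True:
            with self._lock:
                names = sorted(self._children)  # list children, then sort by sequence
            if not names:
                return None
            for name in names:
                data = self._try_delete(name)
                if data is not None:  # only one consumer's delete can succeed
                    return data
            # Every listed child was claimed by another consumer; list again.


q = ToyZkQueue()
q.offer(b"first")
q.offer(b"second")
print(q.poll(), q.poll(), q.poll())  # b'first' b'second' None
```

Because the claim is a delete, two consumers racing for the same child cannot both succeed; the loser simply moves on to the next child.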
Tuning the reduce phase on a cluster is not a trivial problem. The common case is that the reduce phase is primarily disk I/O bound, and you run roughly one reduce per seek arm on a machine. If disk I/O is not the bounding point, you have a coup…
May 15
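The "one reduce per seek arm" rule of thumb reduces to simple arithmetic. The helper below is a hypothetical sketch, assuming every data disk on a node is its own spindle and that the reduce phase really is disk I/O bound; it computes the per-machine concurrency and the number of reduces that fill the cluster in one wave.

```python
from typing import NamedTuple


class ReducePlan(NamedTuple):
    per_node_slots: int  # concurrent reduce tasks per machine
    total_reduces: int   # reduce tasks for one full wave across the cluster


def plan_reduces(nodes: int, disks_per_node: int) -> ReducePlan:
    """Rule of thumb for a disk-I/O-bound reduce phase: run roughly one
    concurrent reduce per seek arm (disk spindle) on each machine."""
    if nodes < 1 or disks_per_node < 1:
        raise ValueError("nodes and disks_per_node must be at least 1")
    return ReducePlan(per_node_slots=disks_per_node,
                      total_reduces=nodes * disks_per_node)


print(plan_reduces(nodes=20, disks_per_node=4))
# ReducePlan(per_node_slots=4, total_reduces=80)
```

In Hadoop 0.20 the per-node figure corresponds to mapred.tasktracker.reduce.tasks.maximum and the job-level figure to mapred.reduce.tasks. If disk I/O is not the bottleneck, this rule no longer applies.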
Jason Venner added 2 events
May 13
A unique fetch failure occurs when a reduce task is unable to retrieve its specific partition of a map task's output. The job tracker gives the reduce task the list of map tasks, including the name of the slave each map ran on. The reduce task will, for eac…
May 5
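The copy step described above can be modeled as follows. This is a toy Python sketch, not Hadoop's ReduceTask code: for each completed map, the reduce fetches only its own partition from the host that map ran on, and a host that cannot be reached produces a fetch failure for that map. The class names and the reachable_hosts parameter are hypothetical. In Hadoop, repeated fetch failures for the same map are reported back to the job tracker, which can then re-run that map elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class MapOutput:
    map_id: str
    host: str                          # slave the map task ran on
    partitions: dict[int, list[str]]   # partition number -> records


@dataclass
class FetchResult:
    records: list[str] = field(default_factory=list)
    failed_maps: list[str] = field(default_factory=list)


def shuffle_fetch(reduce_partition: int, map_outputs: list[MapOutput],
                  reachable_hosts: set[str]) -> FetchResult:
    """Toy copy phase: pull this reduce's partition from every map's host."""
    result = FetchResult()
    for mo in map_outputs:
        if mo.host not in reachable_hosts:
            # Could not retrieve this map's output: record a fetch failure.
            result.failed_maps.append(mo.map_id)
            continue
        # A map with no records for this partition contributes nothing.
        result.records.extend(mo.partitions.get(reduce_partition, []))
    return result
```

Each failure is tied to a specific (reduce, map) pair, which is why one bad slave can cause fetch failures for many reduces at once.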

Profile Information

Hadoop Experience Level
Expert
Interests
Science Fiction, Spirituality, Aviation, Physics, Biology
Expertise
Hadoop, Java, Linux, Performance Tuning, Scaling, Architecture
Past Projects
Web-scale media crawling, fingerprinting and matching.
Current Project
Search Performance & Stability at Scale
Available for Consulting
No
Your Website
http://www.brokerage.com
Search Expertise
Intermediate
HBase Expertise
Novice
Machine Learning Expertise
Novice

Jason Venner's Blog

Jason Venner

I am giving a talk at the HUG on Wednesday: scaling search with Hadoop, Katta and Solr.

Jason Rutherglen will be providing the in-depth Lucene/Solr pieces.

Hope to see you there.

Posted on November 17, 2009 at 12:57pm

Jason Venner

Thanks to Stephane for a fun Katta meetup last night.

There were good discussions on Katta, Solr, machine learning and general machine performance.

Posted on September 30, 2009 at 7:29am

Jason Venner

Cloudera folds HBase into their 0.20 Hadoop distribution

Per Michael Stack,

Our Andrew Purtell, working with Chad Metcalf over at Cloudera, has added HBase to the CDH2 Cloudera distribution. Andrew has a guest blog post over on Cloudera here: http://su.pr/27zIMw
St.Ack

Enjoy!

Posted on September 29, 2009 at 8:22am

Jason Venner

Scripts that are missing from the source code bundle

I somehow missed including the Perl scripts for the aggregate streaming in chapter 8 and various shell scripts from earlier chapters.
I have attached them as scripts.zip.

Posted on July 24, 2009 at 6:48am

Jason Venner

Slow to respond this week

I am overbooked and not getting back to people.

Posted on July 21, 2009 at 10:58pm

Comment Wall (20 comments)


At 10:16am on January 26, 2010, G Sondeep said…
Thank you so much, Jason.
I really appreciate your response.
At 10:09am on January 26, 2010, Jason Venner said…
Not looking, Sondeep.
At 10:08am on January 26, 2010, G Sondeep said…
Please accept my apologies in case you feel that you have received this message in error.

Thank you
At 10:06am on January 26, 2010, G Sondeep said…
Hi Jason

Good day!

I am Sandeep from a staffing company, and I would like to speak with you regarding a job opportunity with my direct client.

Sandeep
510-493-2104X625
At 12:29pm on December 13, 2009, sheeraz mughal said…
Hi,
Thank you very much for your reply; it was really informative. I have been given the responsibility of creating and heading a research group, with university funding, at one of the leading universities in Pakistan. I am very interested in Hadoop and related products, so I am thinking of doing research work in Hadoop and related technologies such as Hive and MapReduce.

In this regard, could you give me some solid ideas or directions toward problem areas in Hadoop or related technologies, so that I can jump in and start the research? Thank you very much; it would be an honour for me if you could reply as early as possible.
At 10:50pm on December 7, 2009, Wade Xiao said…
Thank you! I'm a student doing some research on Hadoop. Currently I'm interested in the storage of MapReduce applications, including HDFS and HBase.
At 10:03am on November 13, 2009, sheeraz mughal said…
Hi,
Can we encrypt a data file in HDFS with Triple DES or another compatible algorithm, and then decrypt the input in the map method before passing it on to the business logic and then to the reducer? I am a beginner in Hadoop, so if my question sounds stupid ;) please do comment and correct me. Could you also guide me on the security model Hadoop has?
Thanks
sheeraz
At 9:54pm on November 8, 2009, sheeraz mughal said…
hi,
To all Hadoop professionals and others: I am working in an organization with the world's largest biometric database. With this in mind, please let me know what could be built on top of it using Hadoop, where Hadoop's core strengths would shine compared to other technologies. I request you all to suggest ideas, as I have to submit my research proposal in a few days. Thanks a lot.
sheeraz
At 11:30pm on October 5, 2009, Stefan Groschupf said…
Thanks! ... sure I will. :)
At 8:33pm on September 15, 2009, wang zhengkui said…
After reading your book, I have some questions. First, can a reducer fetch more than one partition of the map output? For instance, reducer 1 currently fetches partition 1 from every mapper; how would I make reducer 1 fetch both partition 1 and partition 2? If that is possible, how could I implement it?
Second, in the map phase, a record can only be written to one partition file, as determined by the partitioner function. If I want to write one record to multiple partition files, how can I do that? For example, with M reducers there are M partition files in the map phase, and each record is output to only one of them. Is there any way to output one record to several partition files?
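The constraint in the second question comes from the partition function: each map output record is assigned exactly one partition number, and reduce task i fetches only partition i. Hadoop's default HashPartitioner computes that number as (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. Below is a Python sketch of the formula, assuming the key hashes like Java's String.hashCode() (an assumption: Hadoop's Text keys hash their UTF-8 bytes differently), with characters limited to the Basic Multilingual Plane.

```python
def java_string_hash(s: str) -> int:
    """Java's String.hashCode(): s[0]*31^(n-1) + ... + s[n-1] in 32-bit signed
    arithmetic. Assumes one UTF-16 code unit per Python character (BMP only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap like a Java int
    return h - (1 << 32) if h >= (1 << 31) else h


def hash_partition(key: str, num_reduce_tasks: int) -> int:
    # Same formula as the default HashPartitioner: clear the sign bit,
    # then take the remainder, so the result is always in [0, num_reduce_tasks).
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


print(java_string_hash("hello"), hash_partition("hello", 4))  # 99162322 2
```

The function is deterministic and returns a single number, so every map sends a given key to the same one reduce task; changing that behavior means changing the key or supplying a different Partitioner.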
 
 
 

© 2010 Created by Jason Venner.