Hadoop Professionals

A Community for Hadoop Users

Jason Venner
  • United States

Jason Venner's Discussions

Please post job openings here.

Started Nov. 17, 2009

Katta and Solr
11 Replies

Started this discussion. Last reply by Jason Venner Oct. 2, 2009.

Jason Venner's Page

Latest Activity

You can build your own symbolic link by running a command from Java. You just need to verify where the data is unpacked, and then build a link to it. A quick search turned up the following page with sample Java code for you: http://www.giannistsakir
10 hours ago
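A minimal sketch of that approach, assuming a Unix-like task node where ln is on the PATH. The class and method names (SymlinkFromJava, linkTo) are hypothetical, and this is not the code from the page linked above. On Java 7 and later, java.nio.file.Files.createSymbolicLink can do the same job without spawning a process.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: link to an unpacked archive directory via "ln -s". */
final class SymlinkFromJava {

    private SymlinkFromJava() {}

    /**
     * Creates link -> unpackedDir by running "ln -s" and returns the link.
     * Assumes a Unix-like node where "ln" is on the PATH.
     */
    static Path linkTo(Path unpackedDir, Path link) {
        // Verify where the data was actually unpacked before linking to it.
        if (!Files.isDirectory(unpackedDir)) {
            throw new IllegalArgumentException("Not a directory: " + unpackedDir);
        }
        ProcessBuilder pb = new ProcessBuilder(
                "ln", "-s", unpackedDir.toAbsolutePath().toString(), link.toString());
        pb.redirectErrorStream(true); // fold stderr into stdout so errors are captured
        try {
            Process process = pb.start();
            String output = new String(process.getInputStream().readAllBytes());
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IllegalStateException("ln -s failed (" + exitCode + "): " + output);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while running ln -s", e);
        }
        return link;
    }
}
```

For example, SymlinkFromJava.linkTo(unpackedDir, Paths.get("mymeta")) would give the task a ./mymeta link to the unpacked data, where both names are placeholders.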
If your data is highly relational, your users will have a simpler time accessing it if it is stored in a more traditional data warehouse. The sizes you are talking about are very small; I have some of the higher-end solid state devices for storage,…
10 hours ago
If you pass -archives mymeta.zip, there will be a symbolic link named mymeta.zip in the current working directory of the map or reduce task, which points to the directory the archive was unpacked in. So if you use ./mymeta.zip/path_in_archive/file.xm…
on Sunday
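A sketch of reading a file through that link from inside a task, in plain Java. The helper names and the in-archive path conf/meta.xml are hypothetical stand-ins for your real archive layout; inside a real task the working directory is simply the task's current directory, Paths.get(".").

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for reading files shipped with "-archives <name>". */
final class ArchiveFileReader {

    private ArchiveFileReader() {}

    /** Resolves <workingDir>/<archiveLink>/<pathInArchive>. */
    static Path resolve(Path workingDir, String archiveLink, String pathInArchive) {
        return workingDir.resolve(archiveLink).resolve(pathInArchive).normalize();
    }

    /** Reads a UTF-8 text file through the archive link. */
    static String readText(Path workingDir, String archiveLink, String pathInArchive) {
        Path file = resolve(workingDir, archiveLink, pathInArchive);
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read " + file, e);
        }
    }

    /** Inside a map or reduce task, the link sits in the task's current directory. */
    static String readFromTaskCwd(String archiveLink, String pathInArchive) {
        return readText(Paths.get("."), archiveLink, pathInArchive);
    }
}
```

In a mapper's setup code, ArchiveFileReader.readFromTaskCwd("mymeta.zip", "conf/meta.xml") would read ./mymeta.zip/conf/meta.xml.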
The code bundle for the book Pro Hadoop includes a full Eclipse environment for running with Hadoop 0.18.3, 0.19.0, or 0.19.1. At present I typically use that, plus Maven, for building my production code.
on Sunday
Hadoop could use some help in a number of areas. The number one tool would be a setup verification tool that actually launches tasks on all of the cluster machines and verifies that all of the required communication paths are open a…
February 21
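As an illustration of one piece of such a tool, here is a sketch of the per-node check: try a TCP connection to each required host:port with a timeout. The class name and endpoint list are hypothetical, and a real tool would also need to run this inside tasks on every machine and gather the results, which this sketch does not do.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-node reachability check for a setup verification tool. */
final class PortReachability {

    private PortReachability() {}

    /** True if a TCP connection to host:port succeeds within the timeout. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or host unknown: the path is not open.
            return false;
        }
    }

    /** Checks each "host:port" entry and reports reachability in input order. */
    static Map<String, Boolean> checkAll(List<String> endpoints, int timeoutMillis) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String endpoint : endpoints) {
            int colon = endpoint.lastIndexOf(':');
            String host = endpoint.substring(0, colon);
            int port = Integer.parseInt(endpoint.substring(colon + 1));
            results.put(endpoint, isReachable(host, port, timeoutMillis));
        }
        return results;
    }
}
```

Each task would pass in the NameNode, JobTracker, and DataNode addresses from its own configuration and report any false entries.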
Jason Venner added a group
A group for users of Karmasphere to share tips and help
February 20
My inclination is to think this is a PATH environment variable issue or an issue with the command string you are trying to execute. When I have to debug this class of problem, I pass a simple shell script via the distributed cache (-file test.sh) whi…
February 20
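The same debugging idea can be sketched in Java: run the command through ProcessBuilder, merge stderr into stdout, and record the exit code along with the PATH the child process sees. A command that cannot be started at all (for example, because the executable is not on the PATH) surfaces as an IOException, which this sketch reports as exit code -1. The class name is hypothetical.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical debugging helper: run a command and capture what happened. */
final class CommandDebugger {

    private CommandDebugger() {}

    /** What a task could log when an external command misbehaves. */
    static final class Result {
        final int exitCode;  // -1 if the process could not be started
        final String output; // merged stdout/stderr, or the startup error message
        final String path;   // PATH as seen by the child process

        Result(int exitCode, String output, String path) {
            this.exitCode = exitCode;
            this.output = output;
            this.path = path;
        }

        @Override
        public String toString() {
            return "exit=" + exitCode + " PATH=" + path + "\n" + output;
        }
    }

    static Result run(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true);
        String path = pb.environment().get("PATH");
        try {
            Process process = pb.start();
            String output = new String(process.getInputStream().readAllBytes());
            return new Result(process.waitFor(), output, path);
        } catch (IOException e) {
            // Typically "No such file or directory" when the executable is not found.
            return new Result(-1, String.valueOf(e.getMessage()), path);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Result(-1, "interrupted", path);
        }
    }
}
```

Writing Result.toString() to stderr from the task puts it in the task logs, where it can be compared with what the same command does in an interactive shell.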
I doubt that the hadoop script is on your PATH at task execution time. You can check the value of the PATH environment variable. You may also try giving an explicit absolute path, starting from /. In addition, most of the commands you are trying to invoke via…
February 19
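To check the PATH from inside a task, here is a sketch of a PATH lookup in Java. It follows the shell's rule of taking the first matching executable in PATH order, except that it skips empty PATH entries (which a shell treats as the current directory). The names are hypothetical.

```java
import java.io.File;

/** Hypothetical helper: find an executable the way a shell searches PATH. */
final class PathLookup {

    private PathLookup() {}

    /** Returns the first executable named exe on the given PATH value, or null. */
    static File findOnPath(String exe, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return null;
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // this sketch ignores empty entries
            }
            File candidate = new File(dir, exe);
            if (candidate.isFile() && candidate.canExecute()) {
                return candidate;
            }
        }
        return null;
    }
}
```

In a task, PathLookup.findOnPath("hadoop", System.getenv("PATH")) returning null would confirm that the hadoop script is not on the task's PATH.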
Hadoop runs quite happily on top of any shared file system. If the file system provides information about the locality of data, Hadoop will attempt to schedule tasks close to the source of the data. I have personally run Hadoop on top of a Lustre f…
February 19
Can you verify that the NameNode and DataNode server processes are running?
February 15
I don't really understand your question.
February 15
For web crawling for your data collection, perhaps Nutch or Heritrix. The Mahout project provides a rich set of tools for clustering. Carrot2 provides some decent visualization tools for small data sets.
February 15
Jason Venner and karthik are now friends
February 10
Jason Venner added an event
Bay Area Hadoop User Group (HUG) February Meetup at Yahoo! Sunnyvale Campus Building E - Classrooms 9 and 10
February 17, 2010 from 6pm to 9pm
Hello Hadoopers, RSVPs are open for the February Bay Area Hadoop user group at Yahoo!'s Sunnyvale campus. Agenda:
6:00 - 6:15 - Socializing and Beers
6:15 - 7:00 - LZO Compression and Protocol Buffers: Efficient, Flexible Data Processing with Had…
February 8
Unless you explicitly set it, you will get TextInputFormat as your input format, and its keys are LongWritable. If you want Text keys, call job.setInputFormat(KeyValueTextInputFormat.class) in your main, or change the key that your mapper t…
February 7
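To make the difference concrete, here is a plain-Java illustration (not Hadoop's own record readers) of the key/value pairs a mapper would see from each format. It assumes \n line endings and the default tab separator for KeyValueTextInputFormat; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of record shapes; not Hadoop's actual readers. */
final class InputFormatShapes {

    private InputFormatShapes() {}

    /** TextInputFormat-style records: (byte offset of line start, line). */
    static List<Map.Entry<Long, String>> textInputFormat(String contents) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long byteOffset = 0;
        int start = 0;
        while (start < contents.length()) {
            int end = contents.indexOf('\n', start);
            if (end < 0) {
                end = contents.length(); // last line without a trailing newline
            }
            String line = contents.substring(start, end);
            records.add(new SimpleEntry<>(byteOffset, line));
            // Advance past the line's bytes plus the '\n' separator.
            byteOffset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return records;
    }

    /** KeyValueTextInputFormat-style records: split each line at the first tab. */
    static List<Map.Entry<String, String>> keyValueTextInputFormat(String contents) {
        List<Map.Entry<String, String>> records = new ArrayList<>();
        for (Map.Entry<Long, String> record : textInputFormat(contents)) {
            String line = record.getValue();
            int tab = line.indexOf('\t');
            if (tab < 0) {
                records.add(new SimpleEntry<>(line, "")); // no separator: whole line is the key
            } else {
                records.add(new SimpleEntry<>(line.substring(0, tab), line.substring(tab + 1)));
            }
        }
        return records;
    }
}
```

For the input "apple\tred\nbanana\tyellow\n", TextInputFormat-style keys are the offsets 0 and 10, while KeyValueTextInputFormat-style keys are "apple" and "banana".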
A group for HBase users to share use cases, solutions and problems.
February 3

Profile Information

Hadoop Experience Level
Expert
Interests
Science Fiction, Spirituality, Aviation, Physics, Biology
Expertise
Hadoop, Java, Linux, Performance Tuning, Scaling, Architecture
Past Projects
Web-scale media crawling, fingerprinting, and matching.
Current Project
Search Performance & Stability at Scale
Available for Consulting
No
Your Website
http://www.brokerage.com
Search Expertise
Beginner
HBase Expertise
Novice
Machine Learning Expertise
Novice

Jason Venner's Blog

Jason Venner

I am giving a talk at the HUG on Wednesday: scaling search with Hadoop, Katta, and Solr.

Jason Rutherglen will be providing the in-depth Lucene/Solr pieces.

Hope to see you there.

Posted on November 17, 2009 at 12:57pm

Jason Venner

Thanks to Stephane for a fun Katta Meetup last night.

There were good discussions on Katta, Solr, machine learning, and general machine performance.

Posted on September 30, 2009 at 7:29am

Jason Venner

Cloudera folds HBase into their 0.20 Hadoop distribution

Per Michael Stack,

Our Andrew Purtell working with Chad Metcalf over at Cloudera have added HBase to the CDH2 Cloudera distribution. Andrew has a guest blog over on Cloudera here: http://su.pr/27zIMw
St.Ack

Enjoy!

Posted on September 29, 2009 at 8:22am

Jason Venner

Scripts that are missing from the source code bundle

I somehow missed including the Perl scripts for the aggregate streaming in chapter 8 and various shell scripts from earlier chapters.
I have attached them as scripts.zip.

Posted on July 24, 2009 at 6:48am

Jason Venner

Slow responding this week

I am overbooked and not getting back to people.

Posted on July 21, 2009 at 10:58pm

Comment Wall (20 comments)

At 10:16am on January 26, 2010, G Sondeep said…
Thank you so much, Jason.
I really appreciate your response.
At 10:09am on January 26, 2010, Jason Venner said…
Not looking, Sondeep.
At 10:08am on January 26, 2010, G Sondeep said…
Please accept my apologies in case you feel that you have received this message in error

Thank you
At 10:06am on January 26, 2010, G Sondeep said…
Hi Jason,

Good day!

I am Sandeep from a staffing company, and I would like to speak to you regarding a job opportunity with my direct client.

Sandeep
510-493-2104X625
At 12:29pm on December 13, 2009, sheeraz mughal said…
Hi,
Thank you very much for your reply; it was really informative. I have been given the responsibility of creating and heading a research group, funded by the university, at one of the leading universities in Pakistan. I am very interested in Hadoop and related products, so I was thinking of doing research work on Hadoop and related technologies like Hive, MapReduce, etc.

So in this regard, I request that you give me some solid ideas or directions toward any problem areas in Hadoop or related technologies, so that I can jump in and start the research. Thank you very much; it would be an honour for me if you could leave something for me as early as possible.
At 10:50pm on December 7, 2009, Wade Xiao said…
Thank you! I'm a student and am just doing some research on Hadoop. Currently I'm interested in the storage side of MapReduce applications, including HDFS and HBase.
At 10:03am on November 13, 2009, sheeraz mughal said…
Hi,
Can we encrypt the data file in HDFS with a Triple DES-compatible algorithm (or any other), and then decrypt the input in the map method before passing it on to any business logic and then to the reducer? I am a beginner in Hadoop, so if my question sounds stupid ;) please do comment and correct me, and guide me about any security model Hadoop has.
Thanks
sheeraz
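One way to do what is described above is to encrypt each record individually, so that the map method can decrypt each input line on its own. Below is a minimal sketch with the JDK's javax.crypto API, assuming each record is stored as one Base64 text line holding a random IV followed by the ciphertext. The class name and record layout are hypothetical, and getting the key to the tasks securely is not covered. Triple DES is a legacy cipher today, so AES would be the usual choice for new work.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

/** Hypothetical per-record Triple DES (DESede) helper. */
final class RecordCrypto {

    private static final String TRANSFORMATION = "DESede/CBC/PKCS5Padding";
    private static final int IV_LENGTH = 8; // DES block size in bytes
    private static final SecureRandom RANDOM = new SecureRandom();

    private RecordCrypto() {}

    static SecretKey newKey() {
        try {
            return KeyGenerator.getInstance("DESede").generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Encrypts one record; output is Base64(IV || ciphertext), one text line. */
    static String encrypt(String record, SecretKey key) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            RANDOM.nextBytes(iv); // fresh IV per record
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
            byte[] ciphertext = cipher.doFinal(record.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[IV_LENGTH + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, IV_LENGTH);
            System.arraycopy(ciphertext, 0, out, IV_LENGTH, ciphertext.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** What a map method would call on each input line before the business logic. */
    static String decrypt(String encodedRecord, SecretKey key) {
        try {
            byte[] in = Base64.getDecoder().decode(encodedRecord);
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(in, 0, IV_LENGTH));
            byte[] plain = cipher.doFinal(in, IV_LENGTH, in.length - IV_LENGTH);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because every line is self-contained, input splits can fall on any line boundary without breaking decryption, at the cost of 8 bytes of IV plus padding and Base64 overhead per record.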
At 9:54pm on November 8, 2009, sheeraz mughal said…
Hi,
To all Hadoop professionals and others: I am working in an organization that has the world's largest biometric database. Keeping this in mind, kindly let me know what could be built on top of it using Hadoop, where Hadoop's core strengths would shine compared to other technologies. I request you all to please suggest any ideas, as I have to submit my research proposal in a few days. Thanks a lot
sheeraz
At 11:30pm on October 5, 2009, Stefan Groschupf said…
Thanks! ... sure I will. :)
At 8:33pm on September 15, 2009, wang zhengkui said…
After reading your book, I have some questions. First, can a reducer fetch more than one partition file from the map output? For instance, reducer 1 currently fetches all of partition 1 from the mappers; how can I make reducer 1 fetch all of partitions 1 and 2? If this is possible, how could I implement it?
Second, in the map phase, one record can only be written into one partition file, according to the partitioner function. If I want to write one record to multiple partition files, how can I do that? For example, with M reducers there are M partition files in the map phase, and each record is output to only one of them. If I want to output one record to multiple partition files, is there any way to do this?
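For the second question, one common workaround (not taken from the book or from Hadoop's API) is to emit one copy of the record per target partition, tagging each copy's key with its destination, and to use a partitioner that routes on that tag. The plain-Java simulation below assumes a hypothetical targetsFor rule that says which partitions each record belongs in; all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Plain-Java simulation of fanning one record out to several partitions. */
final class MultiPartitionEmit {

    private MultiPartitionEmit() {}

    /** A map-output key that carries its destination partition. */
    static final class TaggedKey {
        final int partition;
        final String record;

        TaggedKey(int partition, String record) {
            this.partition = partition;
            this.record = record;
        }
    }

    /** Partitioner: route purely on the tag, so each copy lands where intended. */
    static int getPartition(TaggedKey key, int numPartitions) {
        return Math.floorMod(key.partition, numPartitions);
    }

    /**
     * Map side plus shuffle: emit one tagged copy of each record per target
     * partition and collect what each reducer's partition would receive.
     */
    static List<List<String>> shuffle(List<String> records,
                                      Function<String, int[]> targetsFor,
                                      int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String record : records) {
            for (int target : targetsFor.apply(record)) {
                TaggedKey copy = new TaggedKey(target, record); // one copy per destination
                partitions.get(getPartition(copy, numPartitions)).add(copy.record);
            }
        }
        return partitions;
    }
}
```

The trade-off is that every extra destination adds another copy of the record to the shuffle, and the reducer has to ignore or strip the tag before using the key.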

© 2010 Created by Jason Venner on Ning.
