Hadoop Professionals

A Community for Hadoop Users

This network is a place to discuss and learn Hadoop, Solr, Katta, Map Reduce, Machine Learning and Big Data

Members

  • Darsh
  • vijender
  • Jason Venner
  • Ranadip
  • vinay
  • arup sarkar
  • Prabha Satya
  • Zoltan Gabor
  • John Yard
  • Alex Gauthier
  • Abhi
  • Chaula Ganatra
  • Greg
  • Chris Tilton
  • biddyweb
  • sadhna

Latest Activity

Darsh and vijender joined Hadoop Professionals 7 hours ago

Setting up JAVA_HOME in hadoop-env.sh

Hi: I am installing Hadoop on a non-clustered, single-node Windows XP machine. SSH is installed and running. Java version: "1.5.0_15", installed in c:\Program Files\Java\jdk1.5.0_15.

In hadoop-env.sh I have: export JAVA_HOME=/cygdrive/c/Program\ Files/Java/jdk1.5.0_15

When I execute the command bin/hadoop namenode -format I get the following error. How do I escape "Program Files"?

Error: $ bin/hadoop namenode -format cygpath: can't convert empty…
Discussion posted by arup sarkar Tuesday
vinay and Prabha Satya joined Hadoop Professionals Tuesday

hardware failures in hadoop

Hi all, we are doing a project in Hadoop: "A Hadoop-compatible framework for detecting network topology and detecting and diagnosing hardware failures". We need to know which distribution of HDFS would be helpful to work with.
Discussion posted by Prabha Satya Tuesday
Zoltan Gabor and Alex Gauthier joined Hadoop Professionals Jan 20

Turning off Speculative Execution

I am running Hadoop 0.20 and Hive 0.7, and my conf/mapred-site.xml says speculative execution is false. Yet when I examine the job log XML of the running job, it says speculative execution is true, and I see occasions where it has started multiple executions of a reduce. How can I turn speculative execution off, given my mapred-site.xml? John Yard, technicolor
Discussion posted by John Yard Jan 19
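
One way to check what John describes is to compare the speculative-execution properties in the job's configuration XML with those in conf/mapred-site.xml. The sketch below is only an illustration, not part of the original thread: it assumes the Hadoop 0.20 property names `mapred.map.tasks.speculative.execution` and `mapred.reduce.tasks.speculative.execution`, and the helper name `speculative_settings` and the sample XML are made up. Hive may also apply its own reduce-side setting (assumed here to be `hive.mapred.reduce.tasks.speculative.execution`), which would be worth checking in hive-site.xml.

```python
import xml.etree.ElementTree as ET

# Property names used by Hadoop 0.20 (assumed to match the cluster in question).
SPECULATIVE_KEYS = (
    "mapred.map.tasks.speculative.execution",
    "mapred.reduce.tasks.speculative.execution",
)


def speculative_settings(config_xml):
    """Return {property name: value} for the speculative-execution keys
    found in a Hadoop-style <configuration> XML string."""
    root = ET.fromstring(config_xml)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        if name in SPECULATIVE_KEYS:
            found[name] = prop.findtext("value", default="").strip()
    return found


# Hypothetical sample standing in for a job configuration file.
SAMPLE_JOB_XML = """
<configuration>
  <property>
    <name>mapred.map.tasks.speculative.execution</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.reduce.tasks.speculative.execution</name>
    <value>true</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    for key, value in speculative_settings(SAMPLE_JOB_XML).items():
        print(key, "=", value)
```

Running the same function over both the site file and the job file would show which property differs, and so which layer is overriding the other.
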

Incompatible namespaceIDs after formatting namenode

Hi, we just started implementing Cloudera Hadoop on our system for the first time. After reformatting the namenode a few times, the DataNode does not come up and reports the error "Incompatible namespaceIDs". I found a note at http://pages.cs.brandeis.edu/~cs147a/lab/hadoop-troubleshooting/ but I'm really not sure about removing the data node directories. How is it possible that data will not be erased…
Discussion posted by Greg Jan 15
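
The error Greg quotes means the namespaceID recorded by the DataNode no longer matches the one the NameNode wrote when it was last formatted. A rough sketch of how the two could be compared before deciding what to do: it assumes each storage directory has a `current/VERSION` file in key=value form containing a `namespaceID=` line. The function names and sample contents are hypothetical, and nothing here modifies or deletes data.

```python
def read_namespace_id(version_text):
    """Extract namespaceID from the text of a VERSION file
    (simple key=value lines; lines starting with '#' are comments)."""
    for line in version_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "namespaceID":
            return value.strip()
    return None


def namespace_ids_match(namenode_version, datanode_version):
    """True only when both files carry the same, non-missing namespaceID."""
    nn_id = read_namespace_id(namenode_version)
    dn_id = read_namespace_id(datanode_version)
    return nn_id is not None and nn_id == dn_id


# Hypothetical file contents; the IDs are invented for illustration.
NAMENODE_VERSION = """#Sun Jan 15 10:00:00 2012
namespaceID=1111111111
cTime=0
storageType=NAME_NODE
"""
DATANODE_VERSION = """#Sat Jan 14 09:00:00 2012
namespaceID=2222222222
cTime=0
storageType=DATA_NODE
"""

if __name__ == "__main__":
    print("NameNode:", read_namespace_id(NAMENODE_VERSION))
    print("DataNode:", read_namespace_id(DATANODE_VERSION))
    print("match:", namespace_ids_match(NAMENODE_VERSION, DATANODE_VERSION))
```
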
Greg and Abhi joined Hadoop Professionals Jan 15
Chris Tilton and pengcheng liu joined Hadoop Professionals Jan 13
Blog posts by Chaula Ganatra Jan 13

Bulk Load to hbase using HFileOutputFormat

Hi guys, I am new to Hadoop and HBase. I need to improve my loading rates by using bulk load. Based on my searching, HBase now supports a way to configure an MR job to output HFileOutputFormat files and put those HFiles directly on HDFS. I understand the concept and also found several code samples in the following links:…
Discussion posted by pengcheng liu Jan 12
ZOUAOUI MOHAMED, Ranjana, sadhna and 3 more joined Hadoop Professionals Jan 12
Deepthi is now a member of Hadoop Professionals Jan 10
Monis, Surendra and Beetle joined Hadoop Professionals Jan 6
Bhavesh Shah commented on Bhavesh Shah's blog post 'Query related Hadoop's Map-reduce'
Hello Jason, I have tried in the same way but it is taking more time to process.....
Jan 6
Jason Venner commented on Bhavesh Shah's blog post 'Query related Hadoop's Map-reduce'
If I understand you, you have two data sets, A & B, and for each record of A, you have to operate on every record of B. The simplest way would be to use A as the input data set for your map reduce job, and to open and scan through B inside…
Jan 6
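
Jason's suggestion can be sketched outside Hadoop as a plain function: treat each record of A as one map input and scan all of B inside the map call. This is a minimal illustration, assuming B is small enough to re-read for every A record. The names `map_record`, `matches` and `run_job` and the sample data are hypothetical, and a real job would read B from HDFS rather than from a list.

```python
def matches(a_record, b_record):
    """Hypothetical comparison: the records share the same key field."""
    return a_record["key"] == b_record["key"]


def map_record(a_record, b_records):
    """Map step for one record of A: scan every record of B and
    yield an (A id, B id) pair for each match."""
    for b_record in b_records:
        if matches(a_record, b_record):
            yield (a_record["id"], b_record["id"])


def run_job(a_records, b_records):
    """Drive the map step over all of A, as the framework would."""
    output = []
    for a_record in a_records:
        output.extend(map_record(a_record, b_records))
    return output


# Invented sample data: A is the subset, B is the larger data set.
A = [{"id": "a1", "key": "x"}, {"id": "a2", "key": "y"}]
B = [
    {"id": "b1", "key": "x"},
    {"id": "b2", "key": "z"},
    {"id": "b3", "key": "x"},
]

if __name__ == "__main__":
    for pair in run_job(A, B):
        print(pair)
```

The work grows as the number of A records times the number of B records, which may be why this approach feels slow on large inputs.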

 

Help With Hadoop

A great place to learn Hadoop, and to tune your map reduce jobs.

Ask specific Hadoop questions here to get help from an expert :)

Forum

arup sarkar

Setting up JAVA_HOME in hadoop-env.sh

Started by arup sarkar on Tuesday.

Prabha Satya

hardware failures in hadoop

Started by Prabha Satya on Tuesday.

John Yard

Turning off Speculative Execution

Started by John Yard Jan 19.


Blog Posts

Chaula Ganatra

Hadoop MapReduce to compare Relational Data Stored in text File

Hi All,

We are reading two txt files which contain relational data, and then we generate Java objects from this data.

Then we compare the objects of the two files. We want to find which objects differ between the two files.

Note: We cannot compare Strings (line by line) of one file to another file, because one line contains a reference to another line of the same file, and in this way the file forms a tree of references.

Again each line may have different data structure (we…


Posted by Chaula Ganatra on January 13, 2012 at 1:37am

Bhavesh Shah

Query related Hadoop's Map-reduce

Scenario:

I have one subset of a database and one data warehouse. I have brought both onto HDFS. I want to analyse the result based on the subset and the data warehouse. (In short, for each record in the subset I have to scan every record in the data warehouse.)

Question:

I want to do this task using a Map-Reduce algorithm. I do not understand how to take both files as input to the mapper, or how to handle both files in the map phase of map-reduce. Please suggest some ideas so…


Posted by Bhavesh Shah on January 3, 2012 at 12:35am — 2 Comments

radhakrishnan_cse

Need help ?? Can u say this ??

I am currently doing a project on energy efficiency and reliability in Hadoop. Can anyone tell me which classes in Hadoop do block splitting, block allocation and block replication? Also, is it possible to dynamically add nodes to a cluster? If so, please tell me the steps. Can anyone answer? Please reply soon.

Posted by radhakrishnan_cse on December 28, 2011 at 8:30am — 1 Comment

Dmitriy Goldin

Hadoop architect/developer needed for a premier brokerage firm

A major brokerage firm is looking for a senior Hadoop architect/developer. You will work with a variety of financial data, trading and back-office-related. Security Master knowledge is helpful.

You will work on the infrastructure end, working on logical and physical architecture. Your main responsibility will be working with Terabytes of data and organizing it into data domains. You should have extensive experience with data implementations, data storage, data access and…


Posted by Dmitriy Goldin on June 20, 2011 at 6:55am

Muhammed Irshad

Loading tables Using Serde

Can anyone explain the SerDe option for loading data into tables?

Posted by Muhammed Irshad on May 31, 2011 at 10:12pm

Yahoo Hadoop Developer Blog

Hadoop Summit 2011 – A Different Approach

Hadoop Summit 2011 is over. If you saw this tweet “#hadoopsummit planned for 1,500. upped on demand to 1,600. finally accommodated 1,700. ran out of space, good problem to have. :-),” then you probably got an idea of how exciting and mobbed the conference was this year. With folks dropping by from coast-to-coast, and quite [...]

Fourth Annual Hadoop Summit: The Countdown Begins!

On June 29, Yahoo! will host the 4th annual Hadoop Summit at the Santa Clara Convention Center. Hadoop Summit 2011 brings together some of the most influential thought leaders in the space - from Yahoo, Facebook, IBM, NetApp, and others. Jay Rossiter, Senior Vice President of the Yahoo! Cloud Platform Group will open the show [...]

Slides from eric14 talks @ #IbmBigData

Hi Folks, Here are my slides from the IBM big data symposium. This was a good event. IBM announced a new release of their Apache Hadoop based Big Insights platform. It is great to hear their commitment to Apache. Yahoo was there talking about our experiences and uses of Hadoop. I got a lot of [...]

Hadoop Summit CFP closing tomorrow!

Stack and I are the track organizers for the community track at the Hadoop Summit this year. The community track is for presentations on roadmap, developments and features in Apache Hadoop. So if you've added a new feature to Hadoop and want to publicize it to the world's largest and most important Hadoop conference, please [...]

Call for participation in the Hadoop Summit Research Track

Hadoop Summit is a great annual gathering of developers to talk about all things Hadoop. The attendance is great, we are expecting 2000 this year; the presentations are excellent; and the hallway conversations are a great way to meet new people and come up with new ideas. This environment is especially great if you have [...]

Cloudera Hadoop Blog

January 2012 Bay Area HBase User Group meetup summary + HBaseCon announcement

More than 150 people attended the San Francisco Bay Area HBase User Group meetup last Thursday, January 19th, at eBay headquarters in San Jose, California.  Presenters from StumbleUpon, Facebook, eBay and MapR shared a wealth of information about Apache HBase operations and optimizations, gleaned from their experience running HBase in production environments. One special item of note: [...]

Seismic Data Science: Reflection Seismology and Hadoop

When most people first hear about data science, it’s usually in the context of how prominent web companies work with very large data sets in order to predict clickthrough rates, make personalized recommendations, or analyze UI experiments. The solutions to these problems require expertise with statistics and machine learning, and so there is a general [...]

Apache HBase 0.92.0 has been released

Today the Apache HBase community has proudly released Apache HBase 0.92.0, a major new version of the scalable distributed data store inspired by Google’s BigTable.  Over 670 issues were addressed, so in this post I’ll highlight some of the major features and enhancements and describe what they mean for HBase users, admins, and developers. User Features While the [...]

Hadoop World 2011 Videos and Slides Available

Last November in New York City, Hadoop World, the largest conference of Apache Hadoop practitioners, developers, business executives, industry luminaries and innovative companies took place. The enthusiasm for the possibilities in Big Data management and analytics with Hadoop was palpable across the conference. Cloudera CEO, Mike Olson, summarizes Hadoop World 2011 in these final remarks. [...]

Apache Sqoop: Highlights of Sqoop 2

This blog was originally posted on the Apache Blog: https://blogs.apache.org/sqoop/entry/apache_sqoop_highlights_of_sqoop Apache Sqoop (incubating) was created to efficiently transfer bulk data between Hadoop and external structured datastores, such as RDBMS and data warehouses, because databases are not easily accessible by Hadoop. Sqoop is currently undergoing incubation at The Apache Software Foundation. More information on this project [...]
 
 
 




© 2012   Created by Jason Venner.
