A user is running into an interesting problem.
hadoop jar ../contrib/streaming/hadoop-0.19.1-streaming.jar -mapper "/usr/bin/perl /home/hadoop/scripts/map_parse_log_r2.pl" -reducer "/usr/bin/perl /home/hadoop/scripts/reduce_parse_log.pl" -input /logs/*.log -output test9
The code I have works when given a small set of input files. However, I get the following error when attempting to run the code on a large set of input files:
hadoop-hadoop-jobtracker-testdw0b00.log.2009-06-09:2009-06-09 15:43:00,905 WARN org.apache.hadoop.mapred.JobInProgress: No room for reduce task. Node tracker_testdw0b00:localhost.localdomain/127.0.0.1:53245 has 2004049920 bytes free; but we expect reduce input to take 22138478392
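The two numbers in the warning show how large the gap is. The expected reduce input (22,138,478,392 bytes, about 22 GB) is roughly 11 times the free space on the node (2,004,049,920 bytes, about 2 GB). The Python sketch below parses the warning and computes the smallest reducer count whose even share would fit. It assumes the job ran with a single reducer, since the command sets no reducer count, and that map output divides evenly across reducers. The function names are hypothetical and are not part of Hadoop.

```python
import re

# Matches the free-space and expected-input figures in the
# "No room for reduce task" warning quoted above.
NO_ROOM_RE = re.compile(
    r"has (?P<free>\d+) bytes free; "
    r"but we expect reduce input to take (?P<needed>\d+)"
)


def parse_no_room_warning(line):
    """Return (free_bytes, expected_reduce_input_bytes), or None if no match."""
    m = NO_ROOM_RE.search(line)
    if m is None:
        return None
    return int(m.group("free")), int(m.group("needed"))


def min_reducers(free_bytes, expected_input_bytes, current_reducers=1):
    """Smallest reducer count whose share fits in free_bytes.

    Assumption (hypothetical model): the per-task estimate is the total map
    output divided evenly by current_reducers, and adding reducers divides
    that total evenly again.
    """
    total = expected_input_bytes * current_reducers
    return -(-total // free_bytes)  # ceiling division on integers


if __name__ == "__main__":
    warning = (
        "Node tracker_testdw0b00:localhost.localdomain/127.0.0.1:53245 "
        "has 2004049920 bytes free; "
        "but we expect reduce input to take 22138478392"
    )
    free, needed = parse_no_room_warning(warning)
    print(f"ratio: {needed / free:.2f}")  # prints: ratio: 11.05
    print(f"minimum reducers: {min_reducers(free, needed)}")  # prints: minimum reducers: 12
```

For this warning the result is 12. That is only a lower bound: each reducer also needs room for merge spills and its own output, and the figure says nothing about whether the node can hold several partitions at the same time.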
I have suggested that he either run more reducers and merge-sort the resulting 2 TB output, or try compressing the map outputs.
-D mapred.compress.map.output=true -D mapred.output.compression.type=BLOCK
I believe 0.19.1 does not uncompress the data, but that may only be possible if mapred.output.compression.type=RECORD.
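The choice between RECORD and BLOCK can be illustrated with Python's zlib. Compressing each record separately pays the per-stream overhead every time and cannot exploit repetition between records. Compressing a batch of records together can exploit that repetition. This is only a rough analogy and does not model Hadoop's SequenceFile or map-output format. For example, Hadoop's RECORD mode compresses only the values. The function names, the sample records and the 1,000,000-byte block size are illustrative assumptions.

```python
import zlib


def record_style_size(records):
    """Compressed size when every record is compressed on its own
    (a rough stand-in for RECORD compression)."""
    return sum(len(zlib.compress(r)) for r in records)


def block_style_size(records, block_bytes=1_000_000):
    """Compressed size when records are batched into blocks of roughly
    block_bytes and each block is compressed as one unit
    (a rough stand-in for BLOCK compression)."""
    total = 0
    batch, batch_len = [], 0
    for r in records:
        batch.append(r)
        batch_len += len(r)
        if batch_len >= block_bytes:
            total += len(zlib.compress(b"".join(batch)))
            batch, batch_len = [], 0
    if batch:  # flush the final, partially filled block
        total += len(zlib.compress(b"".join(batch)))
    return total


if __name__ == "__main__":
    # Hypothetical, highly repetitive log-like records.
    records = [
        f"2009-06-09 15:43:{i % 60:02d} GET /index.html 200\n".encode()
        for i in range(10_000)
    ]
    raw = sum(len(r) for r in records)
    print("raw bytes:   ", raw)
    print("record-style:", record_style_size(records))
    print("block-style: ", block_style_size(records))
```

On repetitive, log-like data the block-style figure is typically far smaller. This is the usual reason to prefer BLOCK when the format allows it.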