Hadoop Professionals

A Community for Hadoop Users

Nikolay

Hadoop... Text.toString() conversion problems

Hi everyone,

I hope this is the right place for my question. If not, please feel free to ignore it ;) and I'm sorry for any inconvenience :(



I'm writing a simple program that enumerates triangles in directed graphs for my project. First, for each input arc (e.g. a b, b c, c a; note that a tab character serves as the delimiter) I want my map function to output the following pairs ([a, to_b], [b, from_a], [a_b, -1]):


    public void map(LongWritable key, Text value,
            OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException {
        // Each input line is one arc: source and target separated by a tab.
        String line = value.toString();
        String[] tokens = line.split("\t");

        output.collect(new Text(tokens[0]), new Text("to_" + tokens[1]));
        output.collect(new Text(tokens[1]), new Text("from_" + tokens[0]));
        output.collect(new Text(tokens[0] + "_" + tokens[1]), new Text("-1"));
    }
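To sanity-check the map step outside Hadoop, a minimal plain-Java sketch could look like the following. The class name ArcMapSketch and the method mapArc are hypothetical, and the guard for lines with fewer than two fields is an assumption that the map function above does not have.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone helper; not part of the Hadoop job above.
class ArcMapSketch {

    // Returns the three (key, value) pairs for one tab-delimited arc line,
    // or an empty list if the line does not contain two fields.
    static List<String[]> mapArc(String line) {
        List<String[]> pairs = new ArrayList<>();
        String[] tokens = line.split("\t");
        if (tokens.length < 2) {
            return pairs; // assumption: malformed lines are skipped
        }
        String from = tokens[0];
        String to = tokens[1];
        pairs.add(new String[] {from, "to_" + to});
        pairs.add(new String[] {to, "from_" + from});
        pairs.add(new String[] {from + "_" + to, "-1"});
        return pairs;
    }

    public static void main(String[] args) {
        // Prints [a, to_b], [b, from_a] and [a_b, -1], one per line.
        for (String[] p : mapArc("a\tb")) {
            System.out.println("[" + p[0] + ", " + p[1] + "]");
        }
    }
}
```

Running it on a few sample lines shows whether the delimiter is really a tab: a line separated by a space comes back with no pairs at all.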

Now my reduce function is supposed to cross join all pairs that have both to_'s and from_'s, and to simply propagate any other pairs whose keys contain "_".


    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output,
            Reporter reporter) throws IOException {
        String key_s = key.toString();

        if (key_s.indexOf("_") > 0) {
            output.collect(key, new Text("completed"));
        } else {
            HashMap<String, ArrayList<String>> lists = new HashMap<String, ArrayList<String>>();

            while (values.hasNext()) {
                String line = values.next().toString();
                String[] tokens = line.split("_");
                if (!lists.containsKey(tokens[0])) {
                    lists.put(tokens[0], new ArrayList<String>());
                }
                lists.get(tokens[0]).add(tokens[1]);
            }

            for (String t : lists.get("to"))
                for (String f : lists.get("from"))
                    output.collect(new Text(t + "_" + f), key);
        }
    }
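The cross-join part of the reduce step can be tried in isolation with a plain-Java sketch like the one below. CrossJoinSketch and crossJoin are hypothetical names. Skipping values without "_" and treating a missing "to" or "from" group as empty are assumptions; the reduce function above would throw in both cases.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone helper; not part of the Hadoop job above.
class CrossJoinSketch {

    // Groups values such as "to_b" / "from_c" by their prefix and returns
    // the cross join of the "to" and "from" groups as "t_f" strings.
    static List<String> crossJoin(Iterable<String> values) {
        Map<String, List<String>> lists = new HashMap<>();
        for (String value : values) {
            String[] tokens = value.split("_", 2);
            if (tokens.length < 2) {
                continue; // assumption: values without "_" are skipped
            }
            lists.computeIfAbsent(tokens[0], k -> new ArrayList<>()).add(tokens[1]);
        }
        List<String> joined = new ArrayList<>();
        // A missing group is treated as empty, so no NullPointerException.
        for (String t : lists.getOrDefault("to", new ArrayList<>())) {
            for (String f : lists.getOrDefault("from", new ArrayList<>())) {
                joined.add(t + "_" + f);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Prints [b_c, d_c]; the value "x" has no "_" and is ignored.
        System.out.println(crossJoin(Arrays.asList("to_b", "from_c", "to_d", "x")));
    }
}
```

The guards only hide the symptom: a value of length 1 would be skipped silently here, so they do not explain where such a value comes from.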

And this is where the most exciting stuff happens: tokens[1] throws an ArrayIndexOutOfBoundsException. If you scroll up, you can see that by this point the iterator should give values like "to_a", "from_b", "to_b", etc. If I just output these values, everything looks OK and I get "to_a", "from_b". But split() doesn't work at all; moreover, line.length() is always 1 and indexOf("_") returns -1! The very same indexOf WORKS PERFECTLY for keys, where we have pairs whose keys contain "_" and look like "a_b", "b_c".
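One way to see what such a value really contains is to dump its length and character codes instead of printing the string itself. The sketch below is plain Java with hypothetical names (ValueInspector, describe). It does not diagnose the cause; it only makes the content of a value visible, for example by writing the result to System.err inside reduce.

```java
// Hypothetical stand-alone helper; not part of the Hadoop job above.
class ValueInspector {

    // Returns the length of s followed by the hex code of every char,
    // e.g. "len=4 [74 6f 5f 61]" for "to_a".
    static String describe(String s) {
        StringBuilder sb = new StringBuilder("len=" + s.length() + " [");
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(s.charAt(i)));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("to_a")); // len=4 [74 6f 5f 61]
        System.out.println(describe("a"));    // len=1 [61]
    }
}
```

Comparing the dump for a key such as "a_b" with the dump for a value that reports length 1 would show whether the value is a different string from the one expected or the same string in a different encoding.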


I'm really puzzled by all this. MapReduce is supposed to save lives by making everything simple. Instead I spent several hours just to spot this...


I'd really appreciate your help, guys!!! Thanks in advance!



© 2012   Created by Jason Venner.
