A Community for Hadoop Users

Time: March 11, 2010 from 6pm to 7pm
Location: Yahoo Campus Building C, Second Floor, Classroom 5
Street: 701 First Avenue
City/Town: Sunnyvale CA 94089
Website or Map: http://www.meetup.com/hadoop/…
Event Type: meetup
Organized By: Dekel Tankel
Latest Activity: Mar 11, 2010
Building C, Second Floor, Classroom 5
It's on the same campus: just cross the street and walk past Building D to Building C.
6:00 - 6:20 - Socializing and Beers
6:20 - 6:50 - Preview to the Hadoop Security Release
Owen O'Malley, Yahoo!
6:50 - 7:20 - MapReduce Online
Tyson Condie, University of California, Berkeley
7:20 - 7:50 - High level distributed programming with Clojure, Cascading, and Hadoop
Bradford Cross, Flightcaster
Q&A and Open Discussion
MapReduce Online
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, the output of each MapReduce task and job is materialized to disk before it is consumed. In this talk, I will describe a modified MapReduce architecture that allows data to be pipelined between operators. This extends the MapReduce programming model beyond batch processing, and can reduce completion times and improve system utilization for batch jobs as well. The Hadoop Online Prototype (HOP) is our modified version of the Hadoop MapReduce framework with pipelining support. It enables online aggregation, which allows users to see "early returns" from a job as it is being computed. HOP also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing. HOP retains the fault tolerance properties of Hadoop, and can run unmodified user-defined MapReduce programs in both pipelined and traditional blocking modes.
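As a rough illustration of the online aggregation idea in the abstract above, here is a toy single-process Python sketch. It is not HOP code and does not use any Hadoop API; the function names (map_words, pipelined_word_count) and the word-count workload are assumptions chosen only for illustration. The point it tries to show is that a reducer consuming map output as it arrives can publish "early returns" after each input split, instead of waiting for all map output to be materialized.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for each word in a line."""
    for word in line.lower().split():
        yield word, 1


def pipelined_word_count(
    splits: Iterable[List[str]],
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Yield (fraction_of_input_seen, running_counts) after each split.

    A blocking job would return only the final counts. Here the reduce
    side consumes map output as it is produced and publishes a snapshot
    after every split, which is the "early returns" idea in miniature.
    """
    splits = list(splits)
    totals: Counter = Counter()
    for done, split in enumerate(splits, start=1):
        for line in split:
            for word, count in map_words(line):
                totals[word] += count
        # Snapshot of the partial aggregate; the last one is the final answer.
        yield done / len(splits), dict(totals)


if __name__ == "__main__":
    # Hypothetical input: three tiny splits of text lines.
    example_splits = [["a b a"], ["b c"], ["a"]]
    for progress, snapshot in pipelined_word_count(example_splits):
        print(f"{progress:.0%} of input seen: {snapshot}")
```

The final snapshot equals what a traditional blocking run would produce; the earlier ones are partial aggregates over whatever input has been seen so far. Fault tolerance, network pipelining, and continuous queries are outside the scope of this sketch.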
Bio:
Tyson Condie is a Ph.D. student at the University of California, Berkeley, whose research focuses on data management and distributed systems. He has been advised by Prof. Joseph M. Hellerstein since entering the Berkeley Ph.D. program in 2004. His thesis at Berkeley focuses on designing and developing distributed system software in a high-level declarative language. Prior to Berkeley graduate school, he was at Stanford University, where he earned a Master's degree in Computer Science under Prof. Hector Garcia-Molina. His industry experience includes research internship positions at Intel and Yahoo! as well as full-time development positions at Sybase and Oracle.
High level distributed programming with Clojure, Cascading, and Hadoop
Presenter: Bradford Cross
Flightcaster built a scalable machine learning system in Clojure wrapping Cascading and Hadoop. The infrastructure that wraps Cascading/Hadoop and its configuration/deployment to EC2 clusters is all written in Clojure. Come and see how much simpler and more fun your life can be.
© 2012 Created by Jason Venner.