Speakers | |
---|---|
Claudio Martella | |
Schedule | |
Day | Sunday |
Room | AW1.125 |
Capacity | 76 |
Start time | 09:30 |
End time | 10:15 |
Duration | 00:45 |
Info | |
Track | Graph Processing Devroom |
Apache Giraph: distributed graph processing in the cloud
Web and online social graphs have grown rapidly in size and scale during the past decade. In 2008, Google estimated that the number of web pages had reached over a trillion. Online social networking and email sites, including Yahoo!, Google, Microsoft, Facebook, LinkedIn, and Twitter, have hundreds of millions of users and are expected to grow much more in the future. Processing these graphs plays a big role in delivering relevant and personalized information to users, such as results from a search engine or news in an online social networking site.
The Apache Giraph [1] project is a fault-tolerant in-memory distributed graph processing system which runs on top of a standard Hadoop [2] cluster and is capable of running any standard Bulk Synchronous Parallel (BSP) operation over any large generic data set which can be represented as a graph. Apache Giraph is a loose implementation of Google Pregel but can be added to any Hadoop job pipeline as a normal MapReduce job. Giraph entered the ASF Incubator in July 2011, where it has enlisted the aid of committers from Yahoo!, Facebook, LinkedIn, and Twitter.
The talk will describe why iterative graph processing is not well suited to typical MapReduce jobs, which is the reason Google designed Pregel in the first place. Next, the BSP model and how it is applied to graph processing will be explained. The last part of the talk will be dedicated to Apache Giraph, with a description of the programming model (i.e. the API and some typical examples such as PageRank and Single-Source Shortest Paths) along with a technical overview of Giraph's architecture and how it leverages the Hadoop infrastructure.
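To make the vertex-centric BSP model concrete, here is a minimal single-process sketch of Single-Source Shortest Paths written in the Pregel style. It is a toy simulation in plain Python, not Giraph's API: the function name `bsp_sssp`, the `edges` adjacency format, and the inbox/outbox dictionaries are all assumptions made for illustration, and edge weights are assumed to be non-negative. Each loop iteration plays the role of one superstep: vertices that received messages take the minimum, update their distance if it improved, and send new candidate distances to their neighbors; swapping the outbox for the inbox stands in for the synchronization barrier.

```python
import math
from collections import defaultdict


def bsp_sssp(edges, source):
    """Toy simulation of vertex-centric BSP shortest paths (not Giraph's API).

    edges: dict mapping each vertex to a list of (neighbor, weight) pairs.
    Assumes non-negative weights. Returns dict vertex -> distance from source.
    """
    # Collect every vertex, including ones that only appear as edge targets.
    vertices = set(edges)
    vertices.add(source)
    for outgoing in edges.values():
        vertices.update(target for target, _ in outgoing)

    distance = {v: math.inf for v in vertices}

    # Messages to be delivered at the start of the next superstep.
    inbox = {source: [0]}

    # The computation halts when no messages are in flight.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = min(messages)
            if best < distance[vertex]:
                # The vertex improved its value, so it notifies its neighbors.
                distance[vertex] = best
                for neighbor, weight in edges.get(vertex, []):
                    outbox[neighbor].append(best + weight)
        # Synchronization barrier: messages sent in this superstep
        # become visible only in the next one.
        inbox = outbox

    return distance


if __name__ == "__main__":
    graph = {
        "a": [("b", 1), ("c", 4)],
        "b": [("c", 2), ("d", 6)],
        "c": [("d", 1)],
        "d": [],
    }
    print(bsp_sssp(graph, "a"))
```

In a real distributed system the vertices would be partitioned across workers and the messages would travel over the network between supersteps; the sketch keeps everything in one process so that only the control flow of the model is visible.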
Next (up to 3) talks in the same room (AW1.125):
When | Event | Track |
---|---|---|
10:20-11:05 | Using Cascalog and Hadoop for rapid graph processing and exploration | Graph Processing |
11:10-11:55 | Birds of a feather - Graph processing, future trends! | Graph Processing |
12:00-12:35 | Works with persistent graphs using OrientDB | Graph Processing |
Events that start after this one (within 30 minutes):
When | Event | Track | Where |
---|---|---|---|
10:20-11:05 | Using Cascalog and Hadoop for rapid graph processing and exploration | Graph Processing | AW1.125 |
10:20-10:35 | Threat Modeling Revolutionized! | Lightning Talks | Ferrer |
10:30-10:55 | MariaDB 5.3's query optimizer: taking the dolphin to where he's never been before | MySQL and Friends | H.1309 |
10:30-11:00 | Tracking Firefox performance via Telemetry | Mozilla | UD2.218A |
10:30-11:00 | JRuby | Free Java | K.4.401 |
10:30-11:10 | Mobicents TelScale and RestComm | Telephony and Communications | H.2213 |
10:30-11:30 | OBS Cross Build | CrossDistribution | H.1301 |
10:30-12:15 | LPI Exam Session 3 | Certification | Guillissen |
10:40-10:55 | An introduction to EclipseRT | Lightning Talks | Ferrer |
10:45-11:15 | Boxes, use other systems with ease | CrossDesktop | H.1308 |