Speakers
	Frank Scholten
Schedule
Day	Saturday
Room	AW1.124
Capacity	59
Start time	13:45
End time	14:15
Duration	00:30
Info
Track	Data Analytics devroom

Introduction to Clustering with Mahout

Analyze and understand large corpora of text data using Apache Hadoop for distributed computation and Mahout as a distributed machine learning toolkit.

Clustering is a popular technique for analyzing and understanding large text corpora and is a key feature of, for instance, Google News. Google News automatically groups news articles into distinct clusters so visitors can quickly find what they're looking for. Clustering is also one of the key features of Apache Mahout, an open source framework for scalable data analysis intended to run on Hadoop. This talk introduces clustering, explains how it is implemented in Mahout, and shows step by step how to cluster text documents using Mahout's command line interface. It also explains how to tweak the clustering process and how this affects the generated set of clusters. This is a beginner talk that introduces clustering in general and Mahout in particular.
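
To give a feel for what clustering text documents involves, here is a minimal single-machine sketch of k-means over bag-of-words vectors. It is not Mahout's implementation and does not use Hadoop. The function names, the four sample documents, the choice of cosine distance, and the naive seeding with the first k documents are all assumptions made for illustration. Changing `k` is one example of a tweak that affects the generated set of clusters.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector for one document."""
    return Counter(text.lower().split())


def cosine_distance(a, b):
    """1 - cosine similarity; 1.0 if either vector is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of term vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return Counter({t: c / len(vectors) for t, c in total.items()})


def kmeans(vectors, k, iterations=10):
    """Return the cluster index assigned to each vector."""
    centroids = list(vectors[:k])  # assumption: seed with the first k documents
    assignment = []
    for _ in range(iterations):
        # Assign every document to its nearest centroid.
        assignment = [
            min(range(k), key=lambda i: cosine_distance(v, centroids[i]))
            for v in vectors
        ]
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = []
        for i in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == i]
            new_centroids.append(centroid(members) if members else centroids[i])
        centroids = new_centroids
    return assignment


# Hypothetical sample documents, ordered so that the two seeds differ in topic.
docs = [
    "hadoop cluster distributed computation",
    "football match goal score",
    "hadoop distributed storage computation",
    "football team goal league",
]
vectors = [term_vector(d) for d in docs]

print(kmeans(vectors, k=2))  # [0, 1, 0, 1]
print(kmeans(vectors, k=1))  # [0, 0, 0, 0]
```

With `k=2` the two Hadoop documents and the two football documents end up in separate clusters; with `k=1` everything collapses into one cluster.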

Concurrent events:

When	Event	Track	Where
13:00-14:00	ZFS in Open Source Operating Systems	BSD	AW1.126
13:00-14:00	Arduino/AVR: interactive development on the controller with amforth	Embedded	Lameere
13:10-13:50	Welcome F# to the MonoDevelop family	Mono	AW1.120
13:10-14:00	State of OpenJDK	Free Java	AW1.125
13:15-14:00	KDE Education	Crossdesktop	H.1309
13:30-13:55	HandlerSocket and similar technologies - NoSQL for MySQL	MySQL & friends	H.2213
13:30-14:00	Building a fantastic Rube Goldberg device with Jabber-RPC!	Jabber & XMPP	AW1.121
13:30-14:00	COMAR	CrossDistro	H.1302
13:30-14:00	How to be a good downstream	CrossDistro	H.1308
13:30-14:15	Étoilé: What has been done over the past year and what's next?	World of GNUstep	AW1.117
13:40-13:55	Libre Graphics Magazine: Bringing F/LOSS Designers Together, One Dead Tree at a Time	Lightning Talks	Ferrer
13:45-14:15	Firefox 4	Mozilla	H.1301
13:45-14:15	SSH libraries: SSH vs TLS; libssh	Security & hardware crypto	AW1.105
14:00-14:15	Coreboot: x86 system boot and initialization	Lightning Talks	Ferrer
14:00-14:25	Boosting Enterprise MySQL performance: implementing I/O prefetch for InnoDB	MySQL & friends	H.2213
14:00-14:25	Manos: web apps for the lazy hacker	Mono	AW1.120
14:00-14:30	XMPP and Security	Jabber & XMPP	AW1.121
14:00-14:30	IcedRobot: The GNUlization of Android	Free Java	AW1.125
14:00-14:40	Dynamic hacking with Guile	GNU	H.2214
14:00-14:45	Community Anti-patterns	Crossdesktop	H.1309
14:00-14:50	DevOps? - More than Marketing	System	Janson
14:00-14:50	The life of a Firefox feature	Web Browsing	Chavanne
14:00-15:00	mk-configure	BSD	AW1.126
14:00-15:00	Using NixOS for declarative deployment and testing	CrossDistro	H.1302
14:00-15:00	Swimming Upstream	CrossDistro	H.1308
14:00-15:00	Advanced Experiments with XMOS Multicore Embedded Hardware.	Embedded	Lameere
14:00-15:45	LPI Exam 1	Certification	Guillissen
14:00-16:00	TYPO3 Exam Session	Certification	UA2.114
14:00-16:00	BSD Associate Exam Session	Certification	UA2.114

Next (up to 3) talks in the same room (AW1.124):

When	Event	Track
14:15-14:45	Mapping Wikileaks' Cablegate using Python, mongoDB, Neo4J and Gephi	Data Analytics
14:45-15:00	Tools and Methods for Web Data Extraction	Data Analytics
15:00-15:15	Datalift, A catalyser for the Web of data	Data Analytics

Events that start after this one (within 30 minutes):

When	Event	Track	Where
14:15-14:45	Mapping Wikileaks' Cablegate using Python, mongoDB, Neo4J and Gephi	Data Analytics	AW1.124
14:15-14:45	LanguageKit - supporting Smalltalk and JavaScript dialects on the Objective-C runtime - what's hard, what's easy, and why developers and users should care.	World of GNUstep	AW1.117
14:15-14:45	libcurl: Supporting seven SSL libraries and one SSH library	Security & hardware crypto	AW1.105
14:20-14:35	flashrom: Run your BIOS/EFI/firmware updates under any free OS	Lightning Talks	Ferrer
14:30-14:55	The Web Objects Kitchen	Mono	AW1.120
14:30-14:55	Over 20,000QPS, XtraDB performance show	MySQL & friends	H.2213
14:30-15:00	In-tab UI	Mozilla	H.1301
14:30-15:00	Stump the XMPP Experts! Open Q&A	Jabber & XMPP	AW1.121
14:40-14:55	0MQ: Multithreading magic	Lightning Talks	Ferrer
14:45-15:00	Tools and Methods for Web Data Extraction	Data Analytics	AW1.124
14:45-15:00	CyaSSL	Security & hardware crypto	AW1.105
14:45-15:00	EtoileText	World of GNUstep	AW1.117
14:45-15:30	Gallium state trackers applied to 2D rendering libraries	Crossdesktop	H.1309

fosdem.org
