Brussels / 30 & 31 January 2016


Converting LiquidThreads to Flow

Or, how I learned to stop worrying and love the batch


Flow is a structured discussion system for MediaWiki wikis, including Wikipedia. Development is led by the Collaboration team at the Wikimedia Foundation, and we have gradually begun using it on Wikimedia projects.

LiquidThreads, an older structured discussion system for MediaWiki wikis, is still in use on some Wikimedia wikis.

Both projects aim to foster Wikimedia collaboration.

We have been converting discussions from LiquidThreads to Flow, to reduce technical debt and make Flow's new features and design available to more users. We have implemented resumable batch software to complete this work.

This talk will be a playful, guided walk through Wikimedia's conversion, exploring how batch software for a complex system has been developed, debugged, and (repeatedly) troubleshot.

Come hear about the batch architecture and the conversion process. Topics will include performance issues with long-running batch jobs and the challenges of mapping between systems with different data models. Plus, hear why a post about broken JavaScript broke our Node.js service, which characters are (really) allowed in HTML5 and XML, why "Ciudad de México" was the last page I converted in Mexico City, and more.

Flow is a structured discussion system for MediaWiki wikis, including Wikipedia. Development is led by the Collaboration team at the Wikimedia Foundation, and we have gradually begun using it on Wikimedia projects. Flow has its own data model and storage for discussions, posts, and revisions.

Unlike other discussion mechanisms on Wikipedia, Flow has a modern user experience. We also plan to extend Flow to support workflows. Workflows would, for example, allow people to use a structured process to discuss elevating an article to “featured” quality.

LiquidThreads, an older structured discussion system for MediaWiki wikis, is still used on some Wikimedia wikis. It uses the standard page- and revision-based storage system. LiquidThreads has a somewhat different feature set from Flow.

Both projects aim to foster Wikimedia collaboration, especially for new users. The Collaboration team has been converting discussions from LiquidThreads to Flow, to reduce technical debt and make Flow's new features and design available to projects that want it. We have implemented resumable batch software to complete this work.
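As an illustration of what "resumable" can mean for a batch job, here is a minimal sketch in Python. It is not the Flow converter's actual code: the function name `run_resumable`, the JSON checkpoint file, and the `convert` callback are all assumptions made for the example. The idea is only that each finished item is recorded, so a rerun after a crash skips work that is already done.

```python
import json
import os
from typing import Callable, Iterable


def run_resumable(
    items: Iterable[str],
    convert: Callable[[str], None],
    checkpoint_path: str,
) -> int:
    """Convert each item once, recording progress so a rerun can resume.

    Hypothetical helper: `convert` stands in for whatever does the real
    work on one page title. Returns the number of items converted in
    this run.
    """
    done = set()
    if os.path.exists(checkpoint_path):
        # Load the titles finished by a previous (possibly crashed) run.
        with open(checkpoint_path, encoding="utf-8") as f:
            done = set(json.load(f))

    converted = 0
    for item in items:
        if item in done:
            continue  # already handled in an earlier run
        convert(item)  # may raise; the checkpoint then still lists prior items
        done.add(item)
        converted += 1
        # Write to a temporary file and rename it, so a crash in the middle
        # of the write never leaves a truncated checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(sorted(done), f)
        os.replace(tmp_path, checkpoint_path)
    return converted
```

Rewriting the whole checkpoint after every item is simple but slow for large runs; a real job would more likely record progress in a database or append to a log.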

This talk will be a playful, guided walk through Wikimedia's conversion, exploring how batch software for a complex system has been designed, developed, debugged, and (repeatedly) troubleshot.

Come hear about the batch architecture and the conversion process. I'll briefly explain the pluggable conversion architecture we developed: it currently supports converting from LiquidThreads, but could be used to convert from other discussion systems in the future. I'll cover some of the challenges of mapping between systems with different data models. The talk will also explain how we tested against production data locally.

I'll also discuss how we staged such a large conversion, and the ordering of pages we chose. Converting certain pages first allowed us to reduce user impact while bugs were ironed out.

The talk will also consider performance issues with long-running batch jobs, particularly memory leaks.

Plus:

  • Why a post about broken JavaScript broke our JavaScript service
  • Why a post about broken XML broke our XML parser, and which characters are (really) allowed in HTML5 and XML
  • Why "Ciudad de México" was the last page I converted in Mexico City
  • Why bash is not the best Unicode toolkit available
  • And more
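The abstract does not say which characters caused trouble, so the following is background only. XML 1.0 restricts document characters to the `Char` production: tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD, and U+10000 to U+10FFFF. A small Python sketch of a filter based on that production is shown here; the function names are made up for the example and are not taken from the Flow codebase.

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if `ch` is permitted by the XML 1.0 `Char` production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF        # most of the Basic Multilingual Plane
        or 0xE000 <= cp <= 0xFFFD      # skips surrogates, U+FFFE and U+FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def strip_invalid_xml_chars(text: str) -> str:
    """Drop every character that XML 1.0 does not allow in a document."""
    return "".join(ch for ch in text if is_xml10_char(ch))
```

For instance, `strip_invalid_xml_chars("a\x00b\x0bc\td")` drops the NUL and the vertical tab but keeps the tab, while a title such as "Ciudad de México" passes through unchanged. Whether silently dropping such characters is the right policy for a conversion is a separate design question.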

Speakers

Matt Flaschen
