Brussels / 4 & 5 February 2017


Making Wiki Gardening Tasks Easier Using Big Data and NLP

I have been involved with the Fedora Community Operations Team, where I mostly contribute to community-metrics tasks. This talk is about an NLP-based tool I built for the Fedora wiki to make wiki gardening tasks easier for contributors, the methods I used to build it, and how it can be scaled to any other wiki.

The Fedora wiki is a community documentation space covering the projects, initiatives, and contributors in the Fedora Project. Because the project is so large, the wiki has kept growing and has become difficult to manage. As a result, wiki gardening tasks were born. These tasks involve not only updating current information on wiki pages but also identifying pages with redundant information so they can be merged, marking pages with old or outdated information accordingly, and so on. Most of these tasks are part of one-day FADs, hackathons, or sprints, and they suit new contributors because they have a low barrier to entry. While there has traditionally been a category for marking wiki pages that need work, the marking itself is mostly manual (a contributor has to do it before the hackathon or event) and does not cover the whole wiki, since its size makes a manual pass impractical.

My tool uses Natural Language Processing techniques to:

1. Identify pages with redundant information so that contributors can merge or delete them.
2. Identify pages in a specific category or topic that need to be worked on.
3. Identify old pages that might need to be updated or categorized as out of date.
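One common way to approach the first task, finding pages with redundant information, is to compare pages by TF-IDF cosine similarity. The following is a minimal sketch under that assumption and is not necessarily the method the tool uses. The function names, the `pages` mapping of title to text, and the 0.8 threshold are all hypothetical choices for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(pages):
    """Build a sparse TF-IDF vector (term -> weight) for each page."""
    token_counts = {title: Counter(tokenize(text)) for title, text in pages.items()}
    n_pages = len(pages)

    # Document frequency: on how many pages does each term appear?
    doc_freq = Counter()
    for counts in token_counts.values():
        doc_freq.update(counts.keys())

    vectors = {}
    for title, counts in token_counts.items():
        # Smoothed IDF, so terms present on every page keep a small positive weight.
        vectors[title] = {
            term: tf * (math.log((1 + n_pages) / (1 + doc_freq[term])) + 1.0)
            for term, tf in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def redundant_pairs(pages, threshold=0.8):
    """Return (title_a, title_b, score) for page pairs at or above the threshold.

    The threshold is an illustrative assumption; a real wiki would need tuning.
    """
    vectors = tfidf_vectors(pages)
    pairs = []
    for a, b in combinations(sorted(vectors), 2):
        score = cosine(vectors[a], vectors[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)
```

Calling `redundant_pairs` on a dictionary of page titles and page texts yields candidate pairs for a contributor to review. Comparing every pair is quadratic in the number of pages, so a large wiki would likely need an approximate method or a prefilter by category.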

To make the tool faster, I have used parallel processing techniques. In this talk, I would like to describe the tool and its functionality, the methods I used to build it, and how it can be scaled to any other wiki. I would also like to hear feedback on other wiki gardening tasks this tool could be used for, and on how to extend it to merge pages with redundant information automatically.
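As an illustration of how per-page work can be spread across workers, here is a minimal sketch using Python's standard `concurrent.futures` module. It assumes the per-page step is independent of other pages (here, counting terms); the function names and the `executor_cls` parameter are hypothetical and are not taken from the tool itself.

```python
import re
from collections import Counter
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def page_term_counts(item):
    """Count the word tokens of one (title, text) pair.

    Defined at module level so that worker processes can pickle it.
    """
    title, text = item
    return title, Counter(re.findall(r"[a-z0-9]+", text.lower()))


def count_terms_parallel(pages, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Count terms for every page, distributing pages across workers.

    executor_cls is an illustrative knob: ProcessPoolExecutor suits CPU-bound
    text processing, while ThreadPoolExecutor may suit I/O-bound steps such
    as fetching page text over the network.
    """
    with executor_cls(max_workers=max_workers) as executor:
        # Executor.map preserves input order and returns (title, Counter) pairs.
        return dict(executor.map(page_term_counts, pages.items()))
```

When `ProcessPoolExecutor` is used from a script, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn new interpreters.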


Bee Padalkar