Brussels / 3 & 4 February 2018


GrimoireLab: free software for software development analytics

The talk will explain how to analyze software development with GrimoireLab. It will show with simple code how easy it is to retrieve data from git, Bugzilla, GitHub, mailing lists, StackOverflow, Gerrit, IRC, Slack, and many other repositories. Then, with the same toolkit, the data will be organized in ElasticSearch indexes, visualized in actionable dashboards, and summarized in reports. Some advanced analysis will also be presented on how to exploit the data using Python/Pandas and IPython/Jupyter Notebooks. The talk will be complemented with interesting insights on real FOSS projects.

Many free / open source software (FOSS) projects feature an open development model, with public software development repositories which anyone can browse. These repositories are normally used to find specific information, such as a certain commit or a particular bug report. But they can also be mined to extract all relevant data, so that it can be analyzed to learn about any aspect of the project. This talk will explain the GrimoireLab method for doing that, which is based on organizing all that information in a database that can be analyzed later. This approach allows for minimal impact on the project infrastructure, since data is retrieved only once, even if it is later analyzed many times. It also allows for efficiency and comfort when mining data for an analysis, since the results are readily available, databases can be shared and replicated at will, and querying them with any kind of tool is easy.
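The "retrieve once, analyze many times" idea can be illustrated with a minimal sketch. It is not GrimoireLab code: it assumes a handful of made-up commit records and uses Python's built-in sqlite3 module as a stand-in database, and the function names are hypothetical.

```python
import sqlite3

# Hypothetical commit records, standing in for data already retrieved
# from a repository (hashes and authors are made up for illustration).
COMMITS = [
    {"hash": "a1", "author": "alice", "date": "2018-01-03"},
    {"hash": "b2", "author": "bob", "date": "2018-01-05"},
    {"hash": "c3", "author": "alice", "date": "2018-01-09"},
]


def store_commits(conn, commits):
    """Load retrieved items into the database; re-loading skips known hashes."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS commits "
        "(hash TEXT PRIMARY KEY, author TEXT, date TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO commits VALUES (:hash, :author, :date)",
        commits,
    )
    conn.commit()


def commits_per_author(conn):
    """One of many analyses that query the database, not the repository."""
    rows = conn.execute(
        "SELECT author, COUNT(*) FROM commits GROUP BY author ORDER BY author"
    )
    return dict(rows.fetchall())


conn = sqlite3.connect(":memory:")
store_commits(conn, COMMITS)
print(commits_per_author(conn))
```

Any further analysis would be another query against the same database, so the project infrastructure is contacted only during the initial retrieval.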

The tools that retrieve information from the repositories are grouped in the GrimoireLab toolset. It includes mature, widely tested programs capable of extracting information from most repositories used by FOSS projects of any scale. Many of them are agnostic with respect to the database used, although currently ElasticSearch is the best supported.

The produced databases can be exploited in several ways, of which two will be explained during the talk: using Python/Pandas to produce IPython/Jupyter Notebooks which analyze some aspect of the project; and using Python to feed an ElasticSearch cluster, with a Kibana front-end for visualizing the data in a flexible, powerful dashboard.
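As a rough illustration of what feeding an ElasticSearch cluster from Python can involve, the sketch below builds a request body in the newline-delimited JSON format used by the ElasticSearch bulk API (an action line followed by the document itself). The helper name, the index name git_commits and the sample items are assumptions made for this example. Sending the body to a cluster is not shown, and some ElasticSearch versions expect additional fields in the action line.

```python
import json


def to_bulk_payload(index, items, id_field):
    """Build a bulk-style body: one action line plus one document line per item."""
    if not items:
        return ""
    lines = []
    for item in items:
        # Action line: index this document under a stable identifier.
        action = {"index": {"_index": index, "_id": item[id_field]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(item))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


# Hypothetical items; "git_commits" is an illustrative index name.
items = [
    {"hash": "a1", "author": "alice"},
    {"hash": "b2", "author": "bob"},
]
payload = to_bulk_payload("git_commits", items, "hash")
print(payload, end="")
```

Using a stable identifier such as the commit hash means that uploading the same item twice overwrites the document instead of duplicating it.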

All these approaches can be used to understand general aspects of the project, such as how efficient the code review or bug fixing processes are, how diverse the contributions to the git repository are, or how conversations in mailing lists or StackOverflow are shaped. But they can be used as well to drill down, and analyze the contributions by a certain developer, or the longest code review processes, or the contents of the most lively email and Q&A threads.
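The difference between a general metric and a drill-down can be sketched with a few lines of Python. The talk uses Python/Pandas for this kind of analysis; the example below swaps in the standard library so that it stays self-contained, and the review records and function names are invented for illustration.

```python
from datetime import datetime
from statistics import median

# Hypothetical code review records (ids, authors and dates are made up).
REVIEWS = [
    {"id": 1, "author": "alice", "opened": "2018-01-01", "closed": "2018-01-03"},
    {"id": 2, "author": "bob", "opened": "2018-01-02", "closed": "2018-01-12"},
    {"id": 3, "author": "alice", "opened": "2018-01-05", "closed": "2018-01-06"},
]


def days_open(review):
    """Number of whole days between opening and closing a review."""
    opened = datetime.strptime(review["opened"], "%Y-%m-%d")
    closed = datetime.strptime(review["closed"], "%Y-%m-%d")
    return (closed - opened).days


def median_days_open(reviews):
    """General metric: how long a typical review stays open."""
    return median(days_open(r) for r in reviews)


def longest_reviews(reviews, n=1):
    """Drill-down: the n reviews that stayed open the longest."""
    return sorted(reviews, key=days_open, reverse=True)[:n]


def by_author(reviews, author):
    """Drill-down: only the reviews submitted by one developer."""
    return [r for r in reviews if r["author"] == author]


print(median_days_open(REVIEWS))
print([r["id"] for r in longest_reviews(REVIEWS)])
print(median_days_open(by_author(REVIEWS, "alice")))
```

The same functions serve both views: applied to all records they describe the process as a whole, and applied to a filtered subset they describe one developer or one outlier.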

The talk will explain the whole process from data retrieval to visualization, and will show some specific cases of real world use, such as the dashboards produced for Eclipse, OPNFV, MediaWiki and many others. Some of the contents of the talk are described in detail in the online book GrimoireLab Training.

GrimoireLab is one of the systems produced by the CHAOSS Collaborative Project.


Jesus M. Gonzalez-Barahona