Brussels / 2 & 3 February 2019


Mining Source Code^3

Mining Idioms, Usages and Edits

Code is an incredible source of information. Indeed, mining software repositories can tell us whether code is natural, how to use a new framework, or how to identify similar changes.

In this talk, I will present the state of the art and the limitations of using graph-based algorithms to mine software repositories.

In particular, I will cover how program analysis, pattern mining and pattern matching can help developers to identify:

  1. conventions and idioms, using syntactic information commonly represented as Abstract Syntax Trees;
  2. framework usages, relying on semantic information such as control and data dependencies represented as Program Dependence Graphs;
  3. similar modifications in multiple different locations, computed using change loggers or change distillers.
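
As a rough illustration of the first item, the sketch below counts how often each parent/child pair of AST node types occurs across a few snippets, using Python's standard ast module. It is not the technique presented in the talk: the function name count_ast_patterns and the sample snippets are hypothetical, and real idiom miners work on larger subtrees with proper frequent-pattern mining rather than single edges.

```python
import ast
from collections import Counter


def count_ast_patterns(sources):
    """Count (parent type, child type) pairs over all given source snippets.

    Hypothetical helper: a single AST edge is the smallest possible
    syntactic "pattern"; frequent edges hint at common constructs.
    """
    counts = Counter()
    for source in sources:
        tree = ast.parse(source)
        for parent in ast.walk(tree):
            for child in ast.iter_child_nodes(parent):
                counts[(type(parent).__name__, type(child).__name__)] += 1
    return counts


# Toy corpus (assumed for illustration): two snippets use the
# "with open(...)" idiom, one manages the file handle by hand.
snippets = [
    "with open(path) as f:\n    data = f.read()\n",
    "with open(name) as g:\n    text = g.read()\n",
    "f = open(path)\ndata = f.read()\nf.close()\n",
]

patterns = count_ast_patterns(snippets)

# Show the most frequent edges in the toy corpus.
for (parent, child), n in patterns.most_common(5):
    print(f"{parent} -> {child}: {n}")
```

Edges such as With -> withitem appear once per snippet that uses the idiom, so their frequency across a repository gives a crude signal of how conventional a construct is.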

Finally, I will present INTiMALS, an ongoing industry-university collaboration to develop a language-parametric framework for mining this information in legacy systems.


Dario Di Nucci