Online / 6 & 7 February 2021


DepClean: Automatically revealing bloated software dependencies in Maven projects


The talk introduces DepClean, an open-source tool that we developed to automatically detect bloated dependencies in Maven artifacts. DepClean performs a deep static analysis of the dependency network and suggests direct and transitive dependencies to be removed or excluded. Given an application and its build file, DepClean collects the complete dependency tree (the dependencies declared in the pom.xml, as well as the transitive dependencies) and analyzes the bytecode of the artifact and all its dependencies to determine which of them are bloated. DepClean also generates a clean variant of the build file in which bloated dependencies are removed.
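The analysis described above can be illustrated with a minimal sketch. It is not DepClean's actual implementation: the class BloatSketch, the method findBloated, and the three input maps are hypothetical, and the sketch assumes the class names packaged in each dependency and the class names referenced from bytecode have already been extracted. A dependency counts as used if the application, or another used dependency, references one of its classes; everything else is reported as bloated.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class BloatSketch {

    /**
     * Hypothetical helper, not part of DepClean.
     *
     * @param provides      dependency name -> classes packaged in that dependency (assumed input)
     * @param references    dependency name -> classes referenced by that dependency's bytecode (assumed input)
     * @param appReferences classes referenced by the application's own bytecode (assumed input)
     * @return the dependencies that are never reached from the application, in sorted order
     */
    public static Set<String> findBloated(Map<String, Set<String>> provides,
                                          Map<String, Set<String>> references,
                                          Set<String> appReferences) {
        Set<String> reachable = new HashSet<>(appReferences);
        Set<String> used = new HashSet<>();

        // Iterate to a fixed point: a newly used dependency may pull in others.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, Set<String>> entry : provides.entrySet()) {
                String dependency = entry.getKey();
                boolean providesReachableClass = !Collections.disjoint(entry.getValue(), reachable);
                if (!used.contains(dependency) && providesReachableClass) {
                    used.add(dependency);
                    reachable.addAll(references.getOrDefault(dependency, Collections.emptySet()));
                    changed = true;
                }
            }
        }

        // Whatever was never marked as used is a candidate for removal or exclusion.
        Set<String> bloated = new TreeSet<>(provides.keySet());
        bloated.removeAll(used);
        return bloated;
    }

    public static void main(String[] args) {
        // Made-up example data: the application uses lib-a, which in turn uses lib-b.
        Map<String, Set<String>> provides = Map.of(
                "lib-a", Set.of("org.a.Foo"),
                "lib-b", Set.of("org.b.Bar"),
                "lib-c", Set.of("org.c.Baz"));
        Map<String, Set<String>> references = Map.of(
                "lib-a", Set.of("org.b.Bar"));
        Set<String> appReferences = Set.of("org.a.Foo");

        System.out.println(findBloated(provides, references, appReferences));
    }
}
```

In this made-up example only lib-c is reported, because no class it provides is reachable from the application. A real analysis also has to account for reflection and other dynamic features that static bytecode references do not capture.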

This talk focuses on one specific type of software dependency: bloated dependencies. These are libraries that are packaged with the application's compiled code but are not actually necessary to build and execute the application. In other words, they are libraries declared as dependencies in a build file that can be removed from the file without breaking the build. As a consequence of bloated dependencies, the binary file includes more code than necessary. An artificially large binary is an issue when the application is sent over the network (e.g., web applications) or deployed on small devices (e.g., embedded systems). In addition, bloated dependencies may embed vulnerable code that can be exploited even though it is useless for the application. Overall, bloated dependencies needlessly increase the difficulty of managing and evolving software applications.

We present our analysis of 9,639 Java artifacts hosted on Maven Central, including 723,444 dependency relationships. Our key result is as follows: 2.7% of the directly declared dependencies are bloated, 15.4% of the inherited dependencies are bloated, and 57% of the transitive dependencies of the studied artifacts are bloated. Based on these results, we distill and discuss two possible causes: the cascade of unwanted transitive dependencies induced by direct dependencies, and the dependency inheritance mechanism of multi-module Maven projects.

Our qualitative assessment of DepClean involves 30 notable open-source projects. For each project, we used DepClean to generate a pom.xml file without bloated dependencies and submitted the changes as a pull request to the project. Notably, 21 of our pull requests were merged by open-source developers, removing 140 bloated dependencies. In summary, our results indicate that developers pay attention to their dependencies when they are notified of the problem, which stresses the need to engineer POM files, i.e., to analyze, maintain, and test them.

Speakers

César Soto Valero

Links