Tools for large-scale collection and analysis of source code repositories
Open source Git repository collection pipeline
- Track: HPC, Big Data, and Data Science devroom
- Room: H.1302 (Depage)
- Day: Sunday
- Start: 12:30
- End: 12:40
There are tens of millions of Git repositories publicly available on the Internet, but what kind of tools would one need to treat all this code as a Big Dataset? I will talk about new and existing OSS tools that were built and used to collect and analyze millions of Git repositories on commodity hardware clusters.
Speakers
- Alexander Bezzubov