Tools for large-scale collection and analysis of source code repositories
Open source Git repository collection pipeline
- Track: HPC, Big Data, and Data Science devroom
- Room: H.1302 (Depage)
- Day: Sunday
- Start: 12:30
- End: 12:40
There are tens of millions of Git repositories publicly available on the Internet, but what kind of tools would one need to treat all this code as a big dataset? I will talk about new and existing OSS tools that were built and used to collect and analyze millions of Git repositories on commodity hardware clusters.
Speakers
Alexander Bezzubov