Brussels / 31 January & 1 February 2015


Finding Bad Needles in Worldwide Haystacks

Experience of using Go for a large-scale web security scanner

Over the last year, we have been using Go to improve the accuracy and quality of our web security scanning system that has become part of our Continuous Development pipeline and now checks all Yahoo websites and changes to them. We would like to present several components of the scanner and share some of our experiences, results, and lessons from using Go for web scanning at a large scale.

Go caught our attention after the 1.0 release with its approachable learning curve, clean syntax, good HTTP library, and built-in concurrency and synchronization primitives - overall, a promise of a good match for our tasks. As a warm-up and initial study, we wrote a dead-link scanner to plow through thousands of sites and find a number of "bad needles" (404s) - all while running on a single server. Encouraged, we proceeded to convert several parts of the security scanning system to Go.

We started with "webseclab" - a set of tests for the scanner and a framework for experiments, proofs of concept, and demos. It is similar in spirit to the recently released "Firing Range", WebGoat, or DVWA ("Damn Vulnerable Web Application"), but thanks to the Go implementation it has some special or even unique features. It is extremely easy to fire up (no dependencies, no Tomcat setup or the like), which lets you turn any random VM or host into a functional web security "lab". Also, by switching between Go's text and HTML templates, we get "for free" - with just a few lines of extra code - a set of "fixed" tests where the injections are neutralized by proper output escaping, which is especially useful for training and communicating with developers. Webseclab is optimized for rapid addition of new tests and cases - we have been using it to "resurrect" already-fixed issues reported through the Bug Bounty program and to quickly convert HTTP dumps into realistic tests ready to be used for scanner improvement.
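
The text/html template switch comes down to running the same template source through both of Go's standard template packages: text/template emits the injection verbatim (the "vulnerable" test), while html/template applies contextual auto-escaping (the "fixed" test). A minimal sketch, with a helper name of our own choosing:

```go
package main

import (
	"bytes"
	"fmt"
	htmltmpl "html/template"
	texttmpl "text/template"
)

// render executes the same template source twice: through text/template,
// where the payload passes through untouched, and through html/template,
// where contextual auto-escaping neutralizes it.
func render(src, payload string) (raw, escaped string) {
	var b bytes.Buffer
	if err := texttmpl.Must(texttmpl.New("t").Parse(src)).Execute(&b, payload); err != nil {
		panic(err)
	}
	raw = b.String()

	b.Reset()
	if err := htmltmpl.Must(htmltmpl.New("h").Parse(src)).Execute(&b, payload); err != nil {
		panic(err)
	}
	escaped = b.String()
	return raw, escaped
}

func main() {
	raw, esc := render("<p>{{.}}</p>", "<script>alert(1)</script>")
	fmt.Println(raw) // <p><script>alert(1)</script></p>
	fmt.Println(esc) // <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
}
```

Since both packages share the same template syntax, one template file can serve as both the vulnerable and the fixed version of a test - the "few lines of extra code" mentioned above.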

The next stage of the still-ongoing conversion was a rewrite of the analysis engine. We named it "contextdetect", as determining the HTML context is crucial for that piece, and here Go helped us to a breakthrough. Gone are the painful and fragile layers of regexps and scripting logic that were used to simulate HTML parsing and context detection - but still led to bouts of either false positives or false negatives. We use the net/html HTML5 parser [1] to determine the HTML context in which the injection happens and evaluate it accordingly. We also use Robert Krimen's JavaScript parser "otto" [2] to check whether an injection breaks the JavaScript (likely bad) or not. Using real parsers allowed us to reduce the number of annoying false positives to close to zero and make the life of the tool's users more pleasant - while still finding the real issues (as continuously verified with webseclab). We also implemented in Go a web-services wrapper that distributes the work to a number of redundant scanning workers, and used "go test" for functional smoke tests of the scanner.

In the process of development in our CD environment, we faced several challenges with the build process, as the standard "go get" approach was off-limits for us: we have a strict requirement to mirror internally all the third-party code used for builds. Currently we use the Android REPO tool to maintain a description of the dependencies as well as to specify the GOPATH-compatible workspace - we will discuss the tradeoffs involved.

While not everyone may have to be as obsessed with defending against XSS and other badness as we are, we hope that our experience will be helpful and encouraging for others who need to do large-scale web or network verification, quality assurance, or investigations. Let's go make the web a better and safer place! :)

[1] golang.org/x/net/html
[2] github.com/robertkrimen/otto


Dmitry Savintsev