Interview: David Chisnall

David Chisnall will give a talk about "Implementing Domain-Specific Languages with LLVM" at FOSDEM 2012.

Could you briefly introduce yourself?

I'm a freelance developer, writer, problem solver, or whatever else looks fun. I'm now the author of four books, including one on the Xen internals and two on Objective-C programming. My latest one, due out just after FOSDEM, is about Go, a new language from Google. I sometimes visit my old university to do some lecturing.

In the hippyware world, I contribute to GNUstep (which implements the same Objective-C APIs as Cocoa on Mac OS X), LLVM/Clang, Étoilé and FreeBSD. A lot of these are related. Étoilé is built on top of GNUstep, so most of my contributions to GNUstep are motivated by needing something for Étoilé. I wrote the GNUstep Objective-C runtime and got involved with Clang because I wanted to be able to use modern Objective-C features for Étoilé development. With version 1.6 of the runtime and 3.0 of Clang, we now have a superset of the Objective-C features available on OS X and iOS, which is a very nice position to be in. A few years ago, Objective-C was in a pretty sorry state on non-Apple platforms. Features like declared properties (all of the 'Objective-C 2.0' stuff) were only in the Apple branch and so were not supported by the main release of GCC. I looked at the GCC code to see if I could add them, but it was a mess of spaghetti code. I looked at Clang, and it had full support for parsing but no support for code generation. The code was sufficiently clean that it took about two weeks to get it supporting more Objective-C than GCC. I then wrote a new Objective-C runtime to go with the improved compiler support and to add things that were useful for other languages.

I've been using FreeBSD as my main development platform for years - I ditched Linux around 2001 when having two applications play sound at the same time didn't work under Linux but did under FreeBSD (the FreeBSD sound system has had some major upgrades since then and in FreeBSD 8.x and later is really impressive). Things like ZFS and DTrace made me stay.

What will your talk be about, exactly?

I'm actually giving four talks this year. One will be in the Smalltalk devroom. The topic is yet to be finalised, but it will be something related to the Smalltalk implementation that I wrote for Étoilé. This uses LLVM for JIT and static compilation and emits classes and categories that are ABI-compatible with the GNUstep Objective-C runtime, so you can have a single object with some methods written in Smalltalk and some in Objective-C, with no bridging.

I shall be giving a talk in the GNUstep devroom about the new features in the Objective-C language that are now available to GNUstep developers. There are quite a few of these, the biggest of which is Automatic Reference Counting (ARC), which takes a huge amount of tedious work out of Objective-C development. There are also some performance improvements that I'm going to talk about.

One of my current projects is integrating a GNU-free C++ stack into FreeBSD. I've worked on various parts of this: I wrote libcxxrt, which implements the ABI features required for things like exception handling and run-time type information, contributed to the compiler (Clang), and did the FreeBSD port of libc++, the STL implementation. FreeBSD was the second platform, after Darwin, to get a libc++ port and, last time I checked, it was passing more of the test suite on FreeBSD than on Darwin. I'll be giving a talk in the BSD devroom about this new stack, aimed both at C++ developers and at people from other BSD systems who may want to adopt the same stack.

The talk that I hope most people will be interested in (since it's in the biggest room) is about implementing domain-specific languages with LLVM. I plan on putting together a couple of examples that people can download and play with for this.

What do you hope to accomplish by giving this talk? What do you expect?

Two things. First, I want to get more people playing with the LLVM libraries. Lots of people look at Clang and see a GCC replacement, but it's so much more than that. Clang has half a dozen of its own libraries and uses a load more from LLVM. The actual clang executable is only a few thousand lines of code. All the rest is reusable. Even things like optimisations are modular - I wrote some for Objective-C, but because they work on LLVM bitcode they also work on code generated from the Smalltalk compiler.

Lots of projects implement some kind of language. Sometimes it's obviously a programming language. Emacs Lisp or JavaScript in Firefox are both general-purpose languages that are used for embedded scripting. Other projects have domain-specific languages that are more specialised. These tend to be more common in proprietary software because things like AppleScript and VBScript give some of the benefits of open source (i.e. the ability for users to modify and extend the programs) but without needing to release the source code. They're still useful in open applications because a specialised language can allow modifications to be a lot simpler. For example, someone wanting to write a simple script for an office suite written in C should not need to know about C memory management, because that would mean that most office suite users could not extend their own tool.

Any complex application eventually needs some scripting capability, and these often have ad-hoc interpreters or (very rarely) JIT compilers. Interpreters are, by nature, slow and that limits the code that you can write in a language that is going to be interpreted. When web browsers started to ship with fast JIT compilers for JavaScript, we saw an explosion in the things people were doing with web apps (not always in a good way) because giving people a fast high-level language that runs without a slow compile / link / run cycle significantly lowers the barrier to entry. If the language is something more specialised than JavaScript - something targeted more towards how users of a particular type of program think - then you will see more people using it.

I would love to see more people using LLVM to implement fast domain-specific languages that make it easy for users to extend their programs. Free Software is not just about giving people access to the code, or even about giving them the legal right to modify it. These are both useless if the user doesn't have the ability to make the modifications. From the perspective of a user, a proprietary software package that has an integrated development environment with a simple scripting language is more free than something that has a baroque C++ extension API, because they actually have the ability to make modifications. If we want people to care about Free Software, then we need to make it easy and attractive for them to exercise the freedoms that come with the code.

You are also a founding member and core developer of the Étoilé desktop environment. In what ways are you using LLVM for this project?

I wrote LanguageKit, which is a framework for implementing dynamic object-oriented languages. It has its own interpreter and can also use LLVM for JIT and static compilation. Code compiled with LLVM, even without much optimisation effort, is easily 40 times faster than the interpreter, but we can also take the same code for static compilation, so people can use it to write stand-alone applications and frameworks that don't depend on LanguageKit at run time.

I wrote a Smalltalk front end for LanguageKit, which we use for high-level development in Étoilé. Someone else is currently working on a parser generator inspired by OMeta and PetitParser, written in Smalltalk and using LanguageKit to generate classes from grammars. I hope this combination will make it easy for people who want to experiment with languages to quickly throw something together that has an existing library of frameworks (from Étoilé and GNUstep) and achieves good performance.

One of the goals of Étoilé is to achieve a very high level of modularity and code density. We want individual applications to be under 10,000 lines of unique code (i.e. not counting reused libraries) and that's much easier to achieve when you have expressive languages that encourage loose coupling.

Can you give some other examples of notable projects that are using the LLVM libraries to improve the performance of embedded scripts?

There's a little project from a company in Mountain View called something like Robot which uses LLVM to JIT compile animation rendering sequences for mobile device UIs, using a domain-specific language called RenderScript. The same company also uses it to allow you to ship platform-independent code to their web browser, compile it to native code, and run it in a sandbox. A fruit company in Cupertino uses LLVM everywhere - they use clang as their default compiler, the LLVM libraries in their OpenCL and GLSL implementations, and so on.

Those are pretty well-known, but there are quite a few other big users. Adobe has a few projects involving LLVM, and old-time Mac users may be interested to know that REALbasic now uses LLVM. A lot of languages now have variants that use LLVM, including Mono, Python, Java, Ruby, Pure, and Lua.

The BSD-like license means that you'll find LLVM embedded in a lot of places and only find out about it when people from that company start contributing patches back for features that they need.

How easy is it for someone not experienced with compiler technology to embed LLVM in their application?

That depends on what they want to do. Most people using the LLVM libraries directly will be wanting to implement some kind of compiler. The nice thing about LLVM is that it exposes everything, but doesn't require you to use everything. You can start by just converting your abstract syntax tree from an existing interpreter to LLVM's intermediate representation (IR), running the default set of optimisations and JIT compiling it. This can be very simple.
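To make the first step concrete, here is a minimal sketch of lowering an abstract syntax tree to LLVM's textual IR. It is an illustration only, not code from the talk or from LanguageKit: the node classes (Num, Arg, BinOp), the Emitter class, and compile_function are hypothetical names invented for this example, and it writes the IR as plain text rather than going through the LLVM libraries. It assumes a toy language with one 64-bit integer parameter and three arithmetic operators.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    """Integer literal (hypothetical AST node)."""
    value: int


@dataclass
class Arg:
    """Reference to the function's single integer parameter."""


@dataclass
class BinOp:
    """Binary arithmetic node; op is one of '+', '-', '*'."""
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Arg, BinOp]

# Map the toy language's operators onto LLVM integer instructions.
_OPCODES = {"+": "add", "-": "sub", "*": "mul"}


class Emitter:
    """Walks the AST and collects one IR instruction per BinOp."""

    def __init__(self):
        self.lines = []
        self.counter = 0

    def fresh(self):
        # Each intermediate result gets a new name, as SSA form requires.
        self.counter += 1
        return f"%t{self.counter}"

    def emit(self, node):
        """Return the IR operand (constant or register) holding node's value."""
        if isinstance(node, Num):
            return str(node.value)
        if isinstance(node, Arg):
            return "%x"
        if isinstance(node, BinOp):
            lhs = self.emit(node.left)
            rhs = self.emit(node.right)
            dest = self.fresh()
            self.lines.append(f"  {dest} = {_OPCODES[node.op]} i64 {lhs}, {rhs}")
            return dest
        raise TypeError(f"unknown AST node: {node!r}")


def compile_function(name, body):
    """Wrap the lowered expression in an i64 -> i64 function definition."""
    emitter = Emitter()
    result = emitter.emit(body)
    lines = [f"define i64 @{name}(i64 %x) {{", *emitter.lines,
             f"  ret i64 {result}", "}"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # x * 2 + 1
    print(compile_function("f", BinOp("+", BinOp("*", Arg(), Num(2)), Num(1))))
```

A real front end would build the IR through LLVM's own APIs and then hand the module to the optimiser and JIT, but the shape of the tree walk is much the same. The text produced here could also be saved to a file and passed to LLVM's command-line tools, such as opt or llc.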

Once you have that working, you can get into the guts of LLVM - the optimisation engine - and start adding some optimisation passes designed for your language. For example, I've written some that do things like adding caching to Objective-C method lookups or even speculatively inline methods. Other people have written ones that significantly improve the performance of Objective-C reference counting by removing redundant reference count manipulations. Writing optimisations that are specific to a single language (or language family) is still quite easy, but requires a bit more understanding of how compilers work.
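As a rough illustration of what a language-specific optimisation can look like, the sketch below removes redundant reference count manipulations from a toy instruction list. It is not how LLVM's actual passes are written: the tuple-based instruction format and the function name remove_redundant_refcounts are assumptions made for this example, and the rule is deliberately conservative. It only cancels a retain that is immediately followed by a release of the same named value, with nothing in between.

```python
def remove_redundant_refcounts(instructions):
    """Cancel ("retain", v) directly followed by ("release", v).

    Instructions are tuples whose first element is the opcode. The pair
    has no net effect on the reference count and nothing can observe the
    count between the two operations, so both can be dropped.
    """
    out = []
    for instr in instructions:
        if out and instr[0] == "release" and out[-1] == ("retain", instr[1]):
            # The previous surviving instruction retained the same value:
            # drop it, and skip this release too.
            out.pop()
            continue
        out.append(instr)
    return out


if __name__ == "__main__":
    program = [
        ("retain", "a"),
        ("release", "a"),        # cancels with the retain above
        ("retain", "b"),
        ("call", "use", "b"),    # a use in between keeps this pair alive
        ("release", "b"),
    ]
    for instr in remove_redundant_refcounts(program):
        print(instr)
```

Because the output list is used as a stack, removing an inner pair can expose an outer pair that then cancels as well. A real pass has to reason about control flow and aliasing, which is where the extra compiler knowledge mentioned above comes in.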

Someone who passed an undergraduate compilers course and is familiar with the tasteful subset of C++ that LLVM uses should have no problems embedding LLVM.

Have you enjoyed previous FOSDEM editions?

Yes, certainly. I came to FOSDEM in 2010 and 2011. In 2009 I gave a talk via a video conferencing linkup. In 2010 I gave a few talks in the GNUstep devroom. 2011 was my first main track talk, when I managed to talk about GNUstep, Étoilé, and LLVM, all in the same talk. I've been to some informative talks both years and met some interesting people. FOSDEM is the only time when I get to meet a lot of the people that I work with all year round.

This interview is licensed under a Creative Commons Attribution 2.0 Belgium License.