Anil Madhavapeddy
Event | Track | Day | Room | Start time | Duration |
---|---|---|---|---|---|
The Wild West of UNIX I/O | Network and IO Track | Sunday | K.1.105 | 12:00 | 00:50 |
The Wild West of Modern I/O (and how we might tame it)
Inter-process communication and remote procedure call facilities have existed in operating systems for many decades. Ever since the first parallel applications ran on time-sharing machines, programmers have sought
ways to communicate between processes running on a single machine, and the first networked applications introduced the concept of sending a message to trigger a remote action.
And today, these primitives are more relevant than ever before: parallel programming on clusters of machines relies heavily on facilities to pass data between processes and hosts. On a higher level, data-flow frameworks for parallel processing of large data sets (such as Hadoop or CIEL) depend on passing data between different tasks, which may run anywhere, including local to a machine, on a networked cluster, or far away in a virtualised wide-area "cloud".
And yet, we are stuck with UNIX communication APIs closely coupled to the underlying mechanisms used to implement them: the programmer choice of sockets, pipes or shared memory constitutes an implicit choice of a whole set of assumptions about the relative locations of the communicating parties, as well as how the message is to be delivered. Worse even, the implicit trade-offs may not be the same in a different environment, and thus the programmer's choice of API depends on assumptions about the runtime environment (hardware, software and setup) in addition to the characteristics inherent to a mechanism implied.
This talk will firstly discuss the impossibility of using current APIs efficiently (via benchmarks on a diverse set of hardware (from many-core AMDs to the experimental Intel SCC). Finally, I will describe our work on introducing a hierarchical name system and extended socket API that adds support for automatic transport selection and reconfigurable sockets. This permits many NUMA-related optimisations on single hosts, for VMs to switch to shared memory communication if on the same physical host, and for seamless network-wide protocol upgrades to multi path TCP or TCPcrypt.