Virtualisation seems to be becoming a buzzword, so I thought I'd take a little time to look into some of the features in the upcoming Linux 2.6.25 release. In particular, it should soon be quite possible to build an open source version of Amazon's EC2 on vanilla Linux - giving the benefits of hardware/network proximity, and avoiding provider lock-in.
Amazon's cluster is built on Xen (AIUI), which is still not quite as good as VMS in some regards - you can't upgrade the kernel version of a running system, for instance. It also trades simplicity of implementation for various inefficiencies, by design. By comparison, Linux namespaces are designed to never leak any information about the host of the namespace into userland, and this is the key feature for allowing live upgrades. This capability in turn leads to the possibility of strong checkpointing, replicated (tandem) computing, etc. Namespaces can also be used in ways that only virtualise the parts of the system required for the task at hand - making overall system throughput higher by avoiding unnecessary duplicate structures.
Building on Linux containers, all going well, should mean VMS could actually be replaced in environments that are looking for application uptimes of decades. It might take a few years of uptime before people trust that, however.
There are many systems built atop the various virtualisation technologies for Linux, each with their own strengths and weaknesses. These systems will probably all need porting to use the new mainline-integrated interfaces, most of which are exposed as new flags to the existing clone(2) and unshare(2) system calls. Alternatively, perhaps someone will build something around a tool such as Puppet that takes care of system tuning and capacity tracking/management automatically, so that people can take all their systems, plug them into a network, and treat their computing resources as the "fluid capacity" that is the key selling point of Amazon's service (update: ok, so it's also a bit more - thanks Don, it's a useful low-down on what I missed out on!)
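To make the clone(2)/unshare(2) point concrete, here is a minimal sketch of how each namespace type maps onto one flag bit. It uses Python and ctypes purely for illustration. The helper names (`namespace_flags`, `unshare`) are my own and are not part of any existing library. The flag values come from `<linux/sched.h>`.

```python
import ctypes
import os

# Namespace flag bits from <linux/sched.h>. Each namespace type is one bit,
# accepted both by clone(2) (for a new child) and unshare(2) (for the caller).
NAMESPACE_FLAGS = {
    "mnt": 0x00020000,  # CLONE_NEWNS: filesystem (mount) namespace
    "uts": 0x04000000,  # CLONE_NEWUTS: hostname and NIS domain name
    "ipc": 0x08000000,  # CLONE_NEWIPC: System V IPC objects
    "pid": 0x20000000,  # CLONE_NEWPID: process IDs (applies to new children)
    "net": 0x40000000,  # CLONE_NEWNET: network devices, addresses, ports
}

# dlopen(NULL): resolves symbols already loaded into Python, including libc.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.unshare.argtypes = [ctypes.c_int]
_libc.unshare.restype = ctypes.c_int


def namespace_flags(kinds):
    """OR together the flag bits for the named namespace types."""
    flags = 0
    for kind in kinds:
        if kind not in NAMESPACE_FLAGS:
            raise ValueError(f"unknown namespace type: {kind!r}")
        flags |= NAMESPACE_FLAGS[kind]
    return flags


def unshare(kinds):
    """Detach the calling process into fresh namespaces of the given types.

    Usually requires CAP_SYS_ADMIN; raises OSError (e.g. EPERM) otherwise.
    """
    if _libc.unshare(namespace_flags(kinds)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

For example, `unshare(["uts", "ipc"])` passes `CLONE_NEWUTS | CLONE_NEWIPC` in a single call. A porting effort mostly amounts to learning which bits a given tool needs.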
In Amazon's favour, their system works already, and they were first to market with this very useful idea.
Anyway, enough of that, here's a brief list of the major feature groups, in order of merging into the mainline kernel:
Filesystem namespaces
This was the first namespace feature added to the kernel, way back in 2.4.19 (2002). It allows groups of processes on the same system to have differing mount tables, and is why /proc/mounts became a symlink to /proc/self/mounts.
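One way to see "differing mount tables" in practice is to compare /proc/<pid>/mounts for two processes. If they live in different mount namespaces, the tables can differ. The sketch below assumes the usual whitespace-separated format with octal escapes for awkward characters in paths. The function names are hypothetical helpers, not an existing API.

```python
import re

# The kernel escapes space, tab, newline and backslash in mount paths as
# three-digit octal sequences, e.g. "\040" for a space.
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field):
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text):
    """Turn /proc/<pid>/mounts text into (device, mount point, fs type) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((_unescape(fields[0]), _unescape(fields[1]), fields[2]))
    return entries


def read_mounts(pid="self"):
    """Mount table as seen by the given process (i.e. its mount namespace)."""
    with open(f"/proc/{pid}/mounts") as f:
        return parse_mounts(f.read())


def mount_diff(a, b):
    """Mount points present only in table a, and only in table b."""
    points_a = {mountpoint for _, mountpoint, _ in a}
    points_b = {mountpoint for _, mountpoint, _ in b}
    return sorted(points_a - points_b), sorted(points_b - points_a)
```

For example, `mount_diff(read_mounts(1), read_mounts())` would show how the current process's view differs from init's. Reading another process's table may need extra privileges.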
UTS, IPC namespaces
New to 2.6.19, these features allow groups of processes to have independently adjustable hostnames and NIS domain names. IPC namespace support isolates System V IPC objects (message queues, semaphore sets and shared memory segments) between groups of processes, so that things like global semaphores are tied to the group of processes that created them. Note that by "group of processes", we are normally talking about virtual servers.
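As a minimal sketch of a UTS namespace in action, the example below forks a child. The child unshares its UTS namespace and renames itself, and the parent's hostname is left untouched. This assumes Linux and root (CAP_SYS_ADMIN). Without root, the unshare is refused and the function returns `None`. The function name is hypothetical.

```python
import ctypes
import os
import socket

CLONE_NEWUTS = 0x04000000  # from <linux/sched.h>

_libc = ctypes.CDLL(None, use_errno=True)
_libc.unshare.argtypes = [ctypes.c_int]


def hostname_in_new_uts_namespace(name):
    """Return the hostname a child sees after renaming itself in a new UTS
    namespace, or None if the kernel refused (typically EPERM)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: never return into the caller's code
        os.close(r)
        try:
            if _libc.unshare(CLONE_NEWUTS) != 0:
                os._exit(1)
            socket.sethostname(name)  # affects only the child's namespace
            os.write(w, socket.gethostname().encode())
            os._exit(0)
        except BaseException:
            os._exit(1)
    os.close(w)
    with os.fdopen(r, "rb") as f:
        seen = f.read().decode()
    _, status = os.waitpid(pid, 0)
    ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    return seen if ok else None
```

The parent keeps its original hostname either way, which is the whole point of the namespace.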
PID namespaces
Merged in 2.6.24, this may be one of the more pivotal pieces of integrated code. It makes process IDs on the system non-global, which allows an even greater level of separation between virtual servers. It also means that processes are transportable between systems: you don't need to worry about PIDs changing, as PIDs inside a namespace cannot conflict with those already in use on the destination host. It also forced every place within the kernel that used raw process IDs to tidy up and refer either to the kernel object for the task or to its fully qualified (namespace-relative) ID. For massive systems, it should also mean that you don't have to use PIDs greater than 16 bits (pesky things that they are) when hosting many virtual servers.
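The idea of non-global PIDs with a "fully qualified name" can be shown with a toy model. This is a Python illustration of the concept, not kernel code. Each task gets one number in its own namespace and one in every ancestor namespace, loosely similar to how the kernel's `struct pid` records a number per namespace level. The class names are hypothetical.

```python
from itertools import count


class PidNamespace:
    """Toy PID namespace: hands out its own PIDs, starting from 1."""

    def __init__(self, parent=None):
        self.parent = parent
        self._next_pid = count(1)

    def lineage(self):
        """Yield this namespace, then each ancestor up to the root."""
        ns = self
        while ns is not None:
            yield ns
            ns = ns.parent


class Task:
    """Toy task: one PID in its own namespace and one in every ancestor."""

    def __init__(self, ns):
        self.ns = ns
        # Innermost namespace first; every ancestor namespace can see the task.
        self.pids = {level: next(level._next_pid) for level in ns.lineage()}

    def pid_in(self, ns):
        """PID as seen from ns, or None if the task is invisible there."""
        return self.pids.get(ns)

    def qualified_name(self):
        """(namespace, pid) pair: unambiguous across the whole host."""
        return (self.ns, self.pids[self.ns])
```

Two virtual servers can both contain a PID 1 while the host still sees distinct numbers for them. Moving a virtual server to another host only needs its inner numbers preserved.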
Network namespaces
Integration of this feature is very important, as it means that good support for virtualisation has arrived in areas such as the iptables stack. It also means that you can give each virtual server its own network interfaces, using any supported network protocol, without fear of the addresses used by different virtual servers on the same machine colliding.
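A quick way to see how empty a fresh network namespace is: list its interfaces from a child that has just unshared one. This is a sketch that assumes Linux and root (CAP_SYS_ADMIN). Unprivileged runs get `None`. On current kernels, the new namespace normally contains only the loopback device. The function name is hypothetical.

```python
import ctypes
import os
import socket

CLONE_NEWNET = 0x40000000  # from <linux/sched.h>

_libc = ctypes.CDLL(None, use_errno=True)
_libc.unshare.argtypes = [ctypes.c_int]


def interfaces_in_new_net_namespace():
    """Interface names visible from a fresh network namespace, or None if
    the kernel refused the unshare (typically EPERM)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: isolated network stack, then report and exit
        os.close(r)
        try:
            if _libc.unshare(CLONE_NEWNET) != 0:
                os._exit(1)
            names = [name for _, name in socket.if_nameindex()]
            os.write(w, "\n".join(names).encode())
            os._exit(0)
        except BaseException:
            os._exit(1)
    os.close(w)
    with os.fdopen(r, "rb") as f:
        data = f.read().decode()
    _, status = os.waitpid(pid, 0)
    ok = os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
    return data.split("\n") if ok else None
```

Real interfaces (or veth pairs) then have to be moved or created inside the namespace explicitly, which is what gives each virtual server its own addresses.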
Zones and Controllers
As I understand it, "Zones" was the term used for groups of processes as they apply to scheduling; this work was merged in 2.6.24 as control groups (cgroups). The "Controllers" are the schedulers that govern resource usage in the various subsystems. Many existing Linux subsystems already sport such schedulers, notably the network stack (traffic control) and, more recently (~2.6.14 IIRC), the IO schedulers. The Controllers framework wraps these concepts up so that they can be applied per group of processes.
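Control groups, the mainline name for this work, are visible from userland. Since 2.6.24, /proc/<pid>/cgroup lists the group a process belongs to in each hierarchy, one "hierarchy-ID:controller-list:path" line per hierarchy. The parser below is a small sketch under that assumption. It also accepts the empty controller list used by the later cgroup v2 unified hierarchy. The function names are hypothetical.

```python
def parse_proc_cgroup(text):
    """Parse /proc/<pid>/cgroup into a {controller: cgroup path} dict.

    Each line reads "hierarchy-ID:controller-list:path". On the cgroup v2
    unified hierarchy the controller list is empty; that entry is keyed "".
    """
    groups = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        _hierarchy, controllers, path = line.split(":", 2)
        for controller in (controllers.split(",") if controllers else [""]):
            groups[controller] = path
    return groups


def read_proc_cgroup(pid="self"):
    """Control group membership of the given process."""
    with open(f"/proc/{pid}/cgroup") as f:
        return parse_proc_cgroup(f.read())
```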
EC2 rip-off HOWTO
- Make sure you have a flexible library that supports the new mainline namespace features
- Make sure you know how to get good information from the Linux kernel about the system utilisation of a "zone". In anatomical terms: build a set of good receptors for the system
- Make sure you know how scheduling works in the various subsystems of the Linux kernel. In anatomical terms: build a set of good effectors
- Write a program to create them, govern them, etc. In anatomical terms: build a control centre
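The receptor / effector / control centre split above can be sketched as one step of a simple proportional controller. Everything here is hypothetical: `read_usage` stands in for a receptor, such as something built on per-group CPU accounting, and `set_weight` for an effector. The 1..10000 default bounds mirror the range of the cgroup v2 `cpu.weight` knob on current kernels, but any scheduler weight would do.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Zone:
    name: str
    target: float  # desired fraction of CPU, e.g. 0.5
    weight: int    # current scheduler weight for the zone

    def __post_init__(self):
        if self.target <= 0:
            raise ValueError("target must be positive")


def control_step(zones: Iterable[Zone],
                 read_usage: Callable[[str], float],
                 set_weight: Callable[[str, int], None],
                 gain: float = 0.5,
                 min_weight: int = 1,
                 max_weight: int = 10000) -> None:
    """One pass of the control centre over every zone."""
    for zone in zones:
        usage = read_usage(zone.name)                    # receptor
        relative_error = (zone.target - usage) / zone.target
        new_weight = round(zone.weight * (1 + gain * relative_error))
        new_weight = max(min_weight, min(max_weight, new_weight))
        if new_weight != zone.weight:
            set_weight(zone.name, new_weight)            # effector
            zone.weight = new_weight
```

For example, two zones each targeting half the CPU but measured at 25% and 75% have their weights nudged from 100 to 125 and 75 respectively. A real control centre would run this in a loop, and would also need damping and capacity planning on top.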
Easy, right?
Closing notes
There are many areas that I have yet to cover - sadly, close involvement in this area is something I've had to cut back on due to creeping projectitis. This treatment is also based primarily on discussions which are quite dated. For the best picture, get a fresh copy of Linus's kernel tree and use something like gitk v2.6.24..v2.6.25-rc1 to see what the newest features are. One of the people leading the effort whom I respect the most is Eric Biederman, so be sure to check the commits that he has written or signed off; they are probably related.