As mentioned in an earlier post, server virtualization was a hot topic at this year’s LinuxWorld. This post will discuss some of the advantages and disadvantages of virtualization, and the various types of virtualization solutions in use.
What is Virtualization?
On a non-virtualized system, only one operating system (OS) can be running at a time. Virtualization allows a system to run one or more “guest” OSs on top of the “host” OS. Virtualization software tricks the guest OSs into thinking they are running directly on hardware, when in fact they are running within a Virtual Machine (VM).
For example, a Windows XP system could have a copy of Red Hat in a VM, and Windows Server in another VM, assuming the hardware is powerful enough to support having three OSs running at the same time.
Initially, virtualization was used mostly by engineers for development and QA, because virtualization was a big time saver. For instance, since a guest OS’s entire disk image can be a regular file in the host OS, you can clone VMs easily. Thus, a QA engineer could be guaranteed an exactly identical system each time they ran a regression test. Also, testing a server with multiple OSs became much easier: instead of having 10 physical client systems (Win95, Win98, Win2k, WinXP, Mac OS X, etc.), a QA engineer could have 10 different VM images on one physical system.
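Because the guest's disk is just a file on the host, cloning a baseline for each test run can be as simple as a file copy. The following is a minimal sketch of that idea, not any particular product's cloning tool; the function name, the file names, and the `.img` extension are all assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def clone_disk_image(baseline: Path, clone_dir: Path, run_name: str) -> Path:
    """Copy a baseline disk image so a test run starts from a known state.

    The baseline file is never modified; each run gets its own copy.
    """
    clone_dir.mkdir(parents=True, exist_ok=True)
    clone = clone_dir / f"{run_name}{baseline.suffix}"
    shutil.copyfile(baseline, clone)  # byte-for-byte copy of the image file
    return clone


if __name__ == "__main__":
    # A tiny placeholder file stands in for a real guest disk image.
    with tempfile.TemporaryDirectory() as tmp:
        baseline = Path(tmp) / "baseline.img"
        baseline.write_bytes(b"placeholder for a guest disk image")
        clone = clone_disk_image(baseline, Path(tmp) / "runs", "regression-001")
        print(clone.name, clone.read_bytes() == baseline.read_bytes())
```

Real virtualization products usually offer faster options (such as snapshots or copy-on-write clones), but the plain copy shows why identical test systems became cheap.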
What is Server Virtualization?
Server virtualization means using VMs to host production services, such as external web sites, email servers, file sharing, etc. Server virtualization is different from development/QA system virtualization in several major ways:
- Performance and reliability are paramount.
- Server Management is more important, especially cross-server management. If you have 30 physical host systems, the management software must let you view all of their status info at the same time.
- Each physical host must be able to support a significant number of VMs at a time.
- The VM software must support high-availability features such as failover, moving VMs from system to system “live”, and load balancing of VMs.
Why Server Virtualization?
Application Isolation
Installing multiple server applications on a single server without virtualization leads to several issues. First, there is the possibility of application conflict. For instance, app A may require a particular patch that app B won’t work with. Second, system downtime has to be approved by all the app owners. If the owner of app A only wants downtime from 8pm to midnight, and the owner of app B only wants downtime between 2am and 4am, getting system downtime approval becomes a nightmare. Also, any problems will usually be blamed on the other app. App A is slow? Must be App B’s fault!
Installing each app onto its own system solves this issue, but is wasteful. What are the odds that App A needs even 10% of a modern system’s CPU?
Virtualization solves this issue by giving each app its own OS instance. Each OS instance can have different patches installed, can be brought down independently of the others, and provides isolation from the other applications. However, all the OS instances can share the same hardware, leading to efficient hardware usage. If the underlying Host OS needs to be brought down, the VMs can be migrated “live” to another Host system for the duration of the outage, with no downtime required.
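To get a rough feel for the hardware savings, you can estimate how many hosts a set of lightly loaded apps would need once each one lives in its own VM. The sketch below uses a simple first-fit-decreasing packing on CPU alone; the CPU percentages and the 80% per-host ceiling are made-up numbers, and a real capacity plan would also account for memory, I/O, and peak load.

```python
def pack_vms(cpu_percent_per_vm, host_limit_percent=80):
    """Place VMs on hosts using first-fit decreasing.

    cpu_percent_per_vm: CPU each VM needs, as a whole-number percent of one host.
    host_limit_percent: how full we allow a host to get (leaves headroom).
    Returns a list of hosts, each a list of the VM demands placed on it.
    """
    hosts = []
    for demand in sorted(cpu_percent_per_vm, reverse=True):
        if demand > host_limit_percent:
            raise ValueError(f"a VM needing {demand}% does not fit on any host")
        for host in hosts:
            if sum(host) + demand <= host_limit_percent:
                host.append(demand)  # first existing host with room
                break
        else:
            hosts.append([demand])  # no host had room, so add another one
    return hosts


if __name__ == "__main__":
    # Eight apps that each need about 10% of a host's CPU.
    print(len(pack_vms([10] * 8)))  # 1 host instead of 8 physical servers
    # A more mixed workload.
    print(pack_vms([50, 40, 30, 20, 10]))
```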
Scaling
Many applications can’t take advantage of more than one or two processor cores, or if they do, performance doesn’t scale very well. By running multiple single-CPU VMs on a multi-core server, with a copy of the application running in each VM, the application can take full advantage of a multi-core system. For instance, running Apache httpd within eight single-CPU VMs on an 8-core host system can provide better performance than running Apache httpd directly on top of an 8-core server.
Hardware Independence & Fault Tolerance
Any VM can run on top of any hardware, as long as the hardware is running the same virtualization software. Most virtualization solutions allow VMs to be moved from host system to host system “live”, with no interruption to the guest OS, as long as the VM’s disk is on shared storage (such as a NAS or SAN). This has several advantages:
- If the underlying hardware or host OS needs maintenance, VMs can be moved off of the system beforehand, eliminating any service interruption.
- When an application outgrows its current hardware, it can be migrated to more powerful hardware without any downtime, much less any reinstallation and data migration pains.
- If a host system fails unexpectedly, any other host system can run the VM, making failure recovery much quicker. In fact, some virtualization solutions allow two host systems to run the same VM in lockstep, so if one host system fails unexpectedly, the other can take over with no service interruption.
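The failure-recovery case above boils down to a placement decision: when a host dies, each of its VMs has to be restarted on a surviving host with room. Here is a minimal sketch of that decision, assuming all VM disks are on shared storage so any host can start any VM. The host names, VM names, and the per-host VM limit are hypothetical, and real products weigh CPU, memory, and affinity rules rather than a simple VM count.

```python
def fail_over(placement, failed_host, max_vms_per_host):
    """Return a new placement with the failed host's VMs restarted elsewhere.

    placement: dict mapping host name -> list of VM names running there.
    Each orphaned VM goes to the surviving host that currently runs the
    fewest VMs. Raises RuntimeError if no surviving host has room.
    """
    survivors = {
        host: list(vms) for host, vms in placement.items() if host != failed_host
    }
    for vm in placement.get(failed_host, []):
        if not survivors:
            raise RuntimeError("no surviving hosts")
        target = min(survivors, key=lambda host: len(survivors[host]))
        if len(survivors[target]) >= max_vms_per_host:
            raise RuntimeError(f"no capacity left to restart {vm}")
        survivors[target].append(vm)
    return survivors


if __name__ == "__main__":
    placement = {
        "host-a": ["web1", "web2"],
        "host-b": ["mail"],
        "host-c": ["files"],
    }
    print(fail_over(placement, "host-a", max_vms_per_host=3))
    # {'host-b': ['mail', 'web1'], 'host-c': ['files', 'web2']}
```

The same function covers planned maintenance: "failing" a host on purpose tells you where its VMs would land before you take it down.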
Security
By controlling a VM’s access to disk, network, and memory resources, virtualization software can help keep VMs secure. Any virus or rootkit that modifies the guest OS to hide itself would still be fully visible to the virtualization software. Also, the guest OS could request that certain memory regions or disk resources be made irrevocably read-only on boot, preventing malware from writing to those regions.
Server Virtualization Challenges
Complexity
The number one downside of virtualization is complexity. Complexity always makes things harder to manage, harder to understand, and harder to troubleshoot. A well-designed virtualization infrastructure manages the complexity by imposing standards and procedures, and documenting everything. A poorly-designed virtualization infrastructure quickly becomes very fragile and impossible to manage.
If a VM is running slow, is it the application? The guest OS? The virtualization software? The host OS? The host hardware? Shared disk storage? Did the VM move to a different host server? If your virtualization software can automatically move VMs among host servers, do you even know which host server was running the VM when the issue appeared?
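The last question is answerable if you keep a record of every VM migration. The sketch below assumes a hypothetical log of (timestamp, host) entries for one VM, sorted by time, and looks up which host was running the VM at a given moment. The log format, host names, and dates are invented for illustration; your virtualization software's event log will look different.

```python
import bisect
from datetime import datetime


def host_at(migrations, when):
    """Return the host that was running the VM at time `when`.

    migrations: list of (timestamp, host) tuples sorted by timestamp, where
    each entry means "the VM started running on this host at this time".
    Returns None if `when` is earlier than the first entry.
    """
    times = [timestamp for timestamp, _ in migrations]
    index = bisect.bisect_right(times, when) - 1  # last entry at or before `when`
    if index < 0:
        return None
    return migrations[index][1]


if __name__ == "__main__":
    # Hypothetical migration history for a single VM.
    history = [
        (datetime(2008, 8, 1, 9, 0), "host-a"),
        (datetime(2008, 8, 1, 13, 30), "host-b"),
        (datetime(2008, 8, 1, 18, 0), "host-c"),
    ]
    # The slowdown was reported at 14:15.
    print(host_at(history, datetime(2008, 8, 1, 14, 15)))  # host-b
```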
More Things Can Go Wrong
The virtualization software is one more thing that needs to be learned, installed, patched, managed, upgraded, and troubleshot. While it would be nice if the virtualization software never had bugs or glitches, that’s certainly not the case.
More OS Instances to Manage
Each OS instance in a VM is one more OS instance that needs management, such as security patches, anti-virus software, etc. If your current patch strategy is to run Windows Update by hand on each system, virtualization will kill you.
Performance Overhead
Virtualization software imposes a performance penalty, especially for disk and network I/O. Also, because each VM is running a copy of the OS, each running VM imposes memory overhead. Full virtualization (described below) has the highest overhead.
There are several ways to mitigate these issues. Using container-based virtualization (described below) or paravirtualization (also described below) reduces overhead. Also, manufacturers are beginning to release virtualization-aware network and disk controllers that speed up I/O from within VMs. Finally, Intel and AMD have added virtualization-specific CPU instructions in their newer CPUs that reduce virtualization’s performance overhead even with full virtualization.
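On a Linux host, you can check whether the CPU advertises these extensions by looking at the flags in /proc/cpuinfo: "vmx" indicates Intel VT-x and "svm" indicates AMD-V. The helper below is a small sketch that parses cpuinfo-style text; the function name is an assumption, and a missing flag can also mean the feature is disabled in the BIOS rather than absent from the CPU.

```python
from pathlib import Path


def hardware_virt_support(cpuinfo_text):
    """Return 'Intel VT-x', 'AMD-V', or None based on /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags") and ":" in line:
            flags = line.split(":", 1)[1].split()
            if "vmx" in flags:
                return "Intel VT-x"
            if "svm" in flags:
                return "AMD-V"
    return None


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():  # only present on Linux
        print(hardware_virt_support(cpuinfo.read_text()))
    else:
        print("no /proc/cpuinfo on this system")
```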
Cost
Purchasing commercial virtualization software is not cheap. If you go with a free solution, you may save on licensing costs, but will need to spend more time implementing the various management tools you would have gotten with the commercial software. Also, server virtualization requires better OS, application, and performance management tools, which you need to either purchase or implement.
Security
It’s possible that the virtualization software or your configuration has a bug that allows hostile software in a VM to “escape” into the host system. Once there, it has full control of all the VMs on that host system. Virtualization software vendors take security seriously, so this is relatively unlikely, but…
Virtualization Types
Full virtualization. In full virtualization, the guest OS is completely unaware that it’s running within a VM. This is the most flexible type of virtualization, as it can run any OS unmodified, but it also has the greatest performance hit because the VM has to fully emulate hardware.
Paravirtualization. In paravirtualization, the guest OS is aware that it’s running within a VM. Instead of talking directly to hardware or protected memory, it will talk to the virtualization software. This eliminates the need for full hardware emulation in the virtualization software, greatly improving performance. The downside is that the selection of guest OS is limited to those that support paravirtualization with your virtualization software.
Containers. A container is closer to a chroot’d tree on steroids than a full VM. The software running within the container can only see the files, memory, and processes within the container; however, the kernel is shared among all the containers. Therefore, all the containers are necessarily running the same OS. Since there is really only one OS running on the whole system, containers have the lowest overhead and best scalability, but they are much more limited in their flexibility.
Virtualization Software
VMware. VMware introduced the first real virtualization solution, and has maintained a significant lead over its competitors since. VMware has a great set of tools to manage VMs, including Lab Manager (managing groups of VMs together), VMotion (migrating VMs from host to host), and Infrastructure Client (a great view into all the VMs on a set of host servers).
VMware’s biggest downside is cost. Also, they have had several issues with their licensing tools, ranging from the inability to issue a license key for purchased software to updates that caused VMs to not start due to spurious license errors.
The general consensus I’ve heard is that if you can afford VMware, they are the best option for large-scale server virtualization.
Xen. Xen is an open source virtualization solution that most closely competes with VMware. On paper it looks very similar to VMware. In practice the toolset is much less mature, and the product has a lot of rough edges.
KVM. KVM is virtualization software implemented as a Linux kernel module. Because it is fully Linux, many Linux distributions have announced that KVM will be their preferred virtualization solution going forward. Today, it is still a work in progress, and not yet ready for datacenter deployment. KVM supports
OpenVZ. OpenVZ is a container-only solution. If your virtualization needs can be satisfied by containers, OpenVZ is worth considering. For most virtualization needs, though, OpenVZ is not enough.
Hyper-V. Hyper-V is Microsoft’s server virtualization solution. I don’t know much about it, and it was (unsurprisingly) not talked about much at LinuxWorld.