Case Study: VMware

The Early History of VMware

Based on work at Stanford in the late 1990s.
The founders realized that rather than solving existing problems in large, complex, operating systems, one could innovate in a layer below the OS.
1999:

VMware Workstation for Linux
Ran on top of Linux.
VMware Workstation for Windows
Ran on top of Windows.

2001: ESX Server
- Aimed at the server consolidation market.
- Prior practice: buy a server for each email, web, DB server application.
- Machines were often at 10% capacity!
2002: Virtual Center / vSphere:
Manage 1000s of virtual machines from one application.
VMotion:
Live migrate servers.

Live migrating a server from Raleigh to Amsterdam.

VMware Workstation

The first virtualization product for 32-bit x86.
The "WinTel" platform was very different from vertically integrated mainframes:
1. Intel and AMD build the chips.
2. Microsoft (Windows) and open source (Linux) provide the OS.
3. A third group of companies build peripherals.
4. A fourth group of integrators (Dell, HP) build systems for retail sale.
What's more, there was no hardware support for virtualization.
So VMware had to use existing techniques of virtualization, borrow techniques from other areas, and invent some.

Video: What Is VMware Workstation?

A video on VMware Workstation

More on VMware Workstation

Challenges in Bringing Virtualization to the x86

Hypervisors add a level of indirection to the domain of computer hardware. They provide the abstraction of a virtual machine: each one thinks it is "king of the hill," and has a whole machine to itself. Ideally, the VMs should be just like the emulated machine, as fast as the emulated machine, and completely isolated from each other.
VMware had these goals (general to most virtualization):

Compatibility:
Any x86 OS, and all of its applications, should be able to run without modification on the VM.
Performance:
The overhead of the hypervisor had to be low enough that users could use a VM as their primary machine.
Ideally, things run as fast as on a native OS, but at least as fast as the previous chip generation.
Isolation:
The hypervisor had to ensure complete isolation of each VM, i.e., be completely in charge of the real physical resources. A VM might be infected with malicious code: this must not affect any other VM.

There was tension between the requirements. E.g., total compatibility might need to be sacrificed for performance. But the designers held isolation as paramount.
The primary challenges were:

The x86 architecture did not support virtualization.
(See the Popek and Goldberg requirements for virtualization.)
Example: POPF (pop flags) would fail silently in user mode instead of trapping, so the hypervisor never got the chance to emulate it.
The x86 architecture was of daunting complexity.
Decades of "cruft" had built up due to the goal of backwards compatibility. Four modes: real, protected, v8086, and system management.
x86 operating modes.
x86 machines had diverse peripherals.
The need for a simple user experience.
The users would be doing the installs themselves, not (e.g.) an IBM technician.
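
The POPF problem listed above can be made concrete with a toy model. The following is a minimal Python sketch, not real x86 semantics: ToyCPU, popf, and popf_trapping are hypothetical names, and user mode is simplified to "any ring other than 0." It contrasts what the x86 did (silently keep the interrupt flag) with what trap-and-emulate would need (a trap the hypervisor can catch).

```python
IF_FLAG = 1 << 9  # interrupt-enable flag: bit 9 of the x86 flags register


class Trap(Exception):
    """Stands in for a hardware trap that a hypervisor could intercept."""


class ToyCPU:
    def __init__(self, ring, flags=IF_FLAG):
        self.ring = ring    # 0 = kernel mode; anything else is treated as user mode
        self.flags = flags

    def popf(self, value):
        """Toy POPF: in user mode the IF bit is silently left alone."""
        if self.ring == 0:
            self.flags = value
        else:
            # Keep the old IF bit, accept the rest. No trap is raised.
            self.flags = (value & ~IF_FLAG) | (self.flags & IF_FLAG)

    def popf_trapping(self, value):
        """Hypothetical virtualization-friendly POPF: trap instead of ignoring."""
        if self.ring != 0 and (value ^ self.flags) & IF_FLAG:
            raise Trap("user-mode POPF tried to change IF")
        self.popf(value)


# A guest kernel running deprivileged tries to disable interrupts.
guest = ToyCPU(ring=3)
guest.popf(0)
interrupts_still_on = bool(guest.flags & IF_FLAG)  # the request was dropped silently
```

Because the first variant neither performs the requested change nor traps, a guest kernel that has been moved out of ring 0 would believe interrupts are off when they are not. That is why pure trap-and-emulate was not an option here.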

VMware Workstation: Solution Overview

Virtualizing the x86 Architecture

VMM (Virtual Machine Monitor): runs the actual virtual machine.
VMX: interacts with host OS.

Possible approaches:

Trap-and-emulate

Rely on hardware support for virtualization to trap and emulate privileged instructions.
Not available on x86 until 2005.
Dynamic binary translation: the VMM emulates all instructions.
Problem: too slow for most uses (a 5x slowdown).

The solution:

Trap-and-emulate can be used when user programs are running.
In other cases, resort to binary translation.
Run an algorithm to decide which to do.
This doesn't need to examine code, just registers!

Binary translation must be used if:

The virtual machine is running in kernel mode (ring 0).
The virtual machine can disable interrupts and issue I/O instructions.
The virtual machine is running in real mode, a legacy 16-bit mode used by BIOS.
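
The three conditions above depend only on register state, which is why the decision is cheap. Here is a minimal Python sketch of such a check. It covers only the conditions listed in these notes; the type VCpuState and its field names are assumptions made for illustration, not VMware's actual data structures, and the real algorithm may check more.

```python
from dataclasses import dataclass


@dataclass
class VCpuState:
    """Hypothetical snapshot of the virtual CPU's relevant registers."""
    protected_mode: bool  # False means the legacy 16-bit real mode
    cpl: int              # current privilege level (ring) of the guest code, 0..3
    iopl: int             # I/O privilege level held in the flags register, 0..3


def must_use_binary_translation(state: VCpuState) -> bool:
    """Return True if the guest code cannot safely be run directly."""
    if not state.protected_mode:
        return True   # real mode, as used by the BIOS
    if state.cpl == 0:
        return True   # guest kernel code (ring 0)
    if state.iopl >= state.cpl:
        return True   # guest may disable interrupts and issue I/O instructions
    return False      # ordinary user code: run it directly


user_app = VCpuState(protected_mode=True, cpl=3, iopl=0)
guest_kernel = VCpuState(protected_mode=True, cpl=0, iopl=0)
decisions = (must_use_binary_translation(user_app),
             must_use_binary_translation(guest_kernel))
```

Note that the function never looks at the guest's instructions, only at a handful of register fields. The ring 0 test is technically implied by the I/O privilege test, but it is kept separate here to mirror the list above.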

VMware can speed up binary translation to near-native speeds because it sets the hardware to run the code instead of translating it in software.
Runs at 80% of native speed, instead of 20%.

High-level components of the VMware virtual machine monitor.

A Guest Operating System Centric Strategy

Ideally, we want the hypervisor to emulate the hardware so successfully that any OS that runs on that hardware will run on the hypervisor.
With the x86 family, this was not possible: no hardware support for virtualization, too complex.
So the VMware engineers focused on just a few, like Linux, Windows 3.1, 95/98 and NT. (But Minix ran as well, by accident.)
Only OS/2 ever used x86 rings 1 and 2, so VMware would just shut down the VM if it tried to enter those rings.

The Virtual Hardware Platform

Two layers:

Software model that "looks like" the device to the guest OS.
A back-end that communicates with the host OS.

Example: the "Lance" 10-Mbps Ethernet card. VMware "supported" this card long after the real thing was off the market, and the emulated card could eventually run 10x faster.
The actual hardware did not have to be what the guest OS thought was there! It just talked to the VMware drivers, and they could be coupled with different back-ends.
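
The two-layer split can be sketched in a few lines of Python. This is an illustrative sketch only: VirtualLance, BridgedBackend, and HostOnlyBackend are hypothetical names, and the back-ends just record frames rather than talking to a real host OS.

```python
class BridgedBackend:
    """Stand-in for a back-end that would pass frames to the host's physical NIC."""

    def __init__(self):
        self.delivered = []

    def send(self, frame: bytes) -> None:
        self.delivered.append(("bridged", frame))


class HostOnlyBackend:
    """Stand-in for a back-end that would keep traffic on a private host network."""

    def __init__(self):
        self.delivered = []

    def send(self, frame: bytes) -> None:
        self.delivered.append(("host-only", frame))


class VirtualLance:
    """Front-end: the software model that looks like a Lance card to the guest."""

    def __init__(self, backend):
        self.backend = backend  # any object with a send(frame) method

    def guest_transmit(self, frame: bytes) -> None:
        # The guest's unmodified Lance driver ends up here; it cannot tell
        # which back-end, or which physical hardware, carries the frame.
        self.backend.send(frame)


bridged = BridgedBackend()
nic = VirtualLance(bridged)
nic.guest_transmit(b"hello")

# Swap the back-end without touching the front-end the guest sees.
private = HostOnlyBackend()
nic.backend = private
nic.guest_transmit(b"world")
```

The guest-facing class never changes, which is the point of the design: the same virtual card can outlive the physical product and run on whatever the host provides.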

Virtual hardware configuration of VMware workstation in 2000.

The Role of the Host Operating System

By creating a type 2 hypervisor, VMware could be installed like a normal program.
It could use the host's drivers to handle the problem of multiple peripherals.

But VMware needed to do fancy things an ordinary application could not.
And many of those things an ordinary kernel-level device driver shouldn't do either.

So, create three components:

VMX: a user-space program the user interacts with: one per VM.
VMX driver: A small kernel-mode device driver that can suspend the host OS for the...
VMM: multiplexes the CPU and memory; contains trap-and-emulate, device drivers, shadow paging module, binary translator.
Runs in kernel mode, but not "in" the host OS.

The VMX runs as an ordinary host OS process, but the VMM is a peer of the host OS. The VMX driver suspends the host OS and gives the VMM full control of the machine. This is a world switch.
The VMM and the host OS have entirely different address spaces.
Although earlier described as very time consuming, here the book says the world switch only takes 45 instructions!
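
The contrast between a context switch and a world switch can be modeled as a difference in how much state is replaced. The following Python sketch is a toy model under stated assumptions: HardwareContext and its three fields are hypothetical simplifications, not the actual state VMware saves and restores.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class HardwareContext:
    """Hypothetical, heavily simplified view of what the CPU is running."""
    user_address_space: str
    kernel_address_space: str
    interrupt_handlers: str


def context_switch(ctx: HardwareContext, next_process: str) -> HardwareContext:
    """Ordinary context switch: only the user portion changes."""
    return replace(ctx, user_address_space=next_process)


def world_switch(running: HardwareContext, suspended: HardwareContext):
    """World switch: everything is swapped. Returns (now_running, now_suspended)."""
    return suspended, running


host = HardwareContext("VMX process", "host kernel", "host handlers")
vmm = HardwareContext("guest VM", "VMM", "VMM handlers")

after_ctx = context_switch(host, "text editor")   # the host kernel stays in place
running, suspended = world_switch(host, vmm)      # the host OS is fully suspended
back, _ = world_switch(running, suspended)        # and later resumed intact
```

In the model, a world switch is cheap because it is a wholesale swap of a small amount of saved state rather than a negotiation with the host kernel.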

The difference between a normal context switch and a world switch.

The Evolution of the VMware Workstation

The VMM / host OS architecture remains the same.
But today, VMware Workstation can rely on:

Trap-and-emulate all the time
Nested hardware page tables instead of the shadow page table
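
The difference between the two paging schemes can be shown with dictionaries standing in for page tables. This is a minimal Python sketch with made-up page numbers; build_shadow and nested_translate are hypothetical names. A shadow table is a software-maintained composition of two mappings, while nested paging has the hardware walk both mappings at lookup time.

```python
def build_shadow(guest_pt: dict, host_pt: dict) -> dict:
    """Software precomputes guest-virtual -> host-physical in one table."""
    return {gv: host_pt[gp] for gv, gp in guest_pt.items() if gp in host_pt}


def nested_translate(gv: int, guest_pt: dict, host_pt: dict) -> int:
    """Two-stage lookup, as hardware does with nested page tables."""
    return host_pt[guest_pt[gv]]


guest_pt = {0: 7, 1: 3}    # guest-virtual page -> guest-physical page
host_pt = {7: 42, 3: 19}   # guest-physical page -> host-physical page

shadow = build_shadow(guest_pt, host_pt)

# The guest OS remaps one of its pages.
guest_pt[1] = 7

stale = shadow[1]                                  # shadow table is now out of date
fresh = nested_translate(1, guest_pt, host_pt)     # nested lookup sees the change
```

The stale entry illustrates the cost of shadow paging: the hypervisor has to notice every guest page-table update and patch the shadow copy, work that nested hardware page tables make unnecessary.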

Using VMware Workstation

ESX Server: VMware's type 1 Hypervisor

Not having a host OS to rely upon means ESX has more work to do than VMware Workstation. But in a situation where IT organizations are trying to run 1000s of virtual machines, a type 1 hypervisor makes sense: it will run significantly faster.

The CPU scheduler ensures that each virtual machine gets a fair share of the CPU: no starvation.
Scalability: VMs run efficiently even when they need more memory than is actually available.
Ballooning and transparent page sharing introduced.
An optimized I/O subsystem: device drivers run directly within the ESX hypervisor, with no world switch required.
ESX uses a file system (VMFS) optimized to store virtual machine images. A single ESX Server can issue over 1 million disk operations per second.
The workstations were aimed at developers: one could experiment with new OS releases inside a VM.
ESX Server made it easy to implement new capabilities.
VMotion: live migrate a VM from one box running ESX Server to another. This required the coordination of the memory manager, the CPU scheduler, and the networking stack.
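
The two memory techniques named above, ballooning and transparent page sharing, can be sketched as follows. This is a simplified Python illustration, not ESX's implementation: share_pages and Guest are hypothetical names, the sketch matches pages by hash only (a real system would also compare contents and use copy-on-write), and the balloon model ignores the paging a real guest would do to free memory.

```python
import hashlib


def share_pages(vm_pages: dict):
    """Map pages with identical contents to a single physical frame.

    vm_pages maps (vm_name, page_number) -> page contents as bytes.
    Returns (mapping of each page to a frame id, number of frames used).
    """
    frame_of_digest = {}
    mapping = {}
    for key, content in vm_pages.items():
        digest = hashlib.sha256(content).digest()
        # Reuse the frame if this content was seen before, else take a new one.
        frame = frame_of_digest.setdefault(digest, len(frame_of_digest))
        mapping[key] = frame
    return mapping, len(frame_of_digest)


class Guest:
    """Toy guest whose balloon driver gives pages back to the hypervisor."""

    def __init__(self, free_pages: int):
        self.free_pages = free_pages
        self.balloon = 0

    def inflate_balloon(self, wanted: int) -> int:
        """Allocate pages inside the guest so the hypervisor can reclaim them."""
        taken = min(wanted, self.free_pages)
        self.free_pages -= taken
        self.balloon += taken
        return taken


pages = {
    ("vm1", 0): b"\x00" * 4096,   # a zero page
    ("vm2", 5): b"\x00" * 4096,   # an identical zero page in another VM
    ("vm1", 1): b"kernel code",
}
mapping, frames_used = share_pages(pages)

guest = Guest(free_pages=100)
reclaimed = guest.inflate_balloon(30)
```

Three guest pages end up in two frames, and the balloon frees 30 pages for the hypervisor to hand to other VMs. Together these are what let VMs ask for more memory than the machine physically has.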

Video on Building a VMware Home Lab

VMware home lab

Quiz

A key attribute of an ideal virtual machine would be

it runs as fast as the real machine
it runs just like the real machine
it is completely isolated from other VMs
all of the above.

Live migrating a VM between physical devices requires coordination of

the file system and the stack pointer
the CPU scheduler, the memory manager, and the network stack
the RAID array
the number of applications running on each VM.

A motivation for VMware was

the desire to copy IBM
the need for a research grant
the fact that no one had ever created a VM before
the difficulty in innovating in complex modern operating systems.

One factor making VMs easier to implement on mainframes than on PCs was

vertical integration in the mainframe world
Microsoft's opposition to virtualization
the complexity of mainframe design
the lack of UNIX versions on PCs

Another factor making virtualization difficult on the WinTel platform was

the overly simple chip architecture
the amazing diversity of peripherals
the competition from IBM
all of the above

A virtualization approach called "trap-and-emulate" involves

a switch to the hypervisor when certain instructions are executed by the VM
faking the guest into "thinking" it has really executed certain instructions
allowing most instructions to run directly on the hardware
all of the above

VMware must use binary translation to handle

graphics programs
playing video
executing privileged instructions
floating point mathematics

VMware manages to interact with the host OS by

creating a kernel-mode device driver
relying on a re-written version of the host OS
using binary translation
using direct execution

Ballooning consists in

pumping up each virtual machine to believe it is in charge of the hardware
creating a process inside a virtual machine that can reclaim memory for the hypervisor

On an x86 machine privileged instructions

make the user who runs them the superuser
are ignored by virtual machines
can only be used by the rich
can only be executed in kernel mode

Answers

1. d; 2. b; 3. d; 4. a; 5. b; 6. d; 7. c; 8. a; 9. b; 10. d;

Credits

"VMware in the cloud" graphic by Hany R. Michael.