The problem is that each guest OS thinks it is "president
of the university," and can map virtual pages to real pages
as it wishes to.
But it isn't president: the hypervisor is. And some other
guest OS may be using those physical pages already.
Solution: a shadow page table.
The hypervisor traps on the sensitive instruction that
loads a hardware register to point to the page tables.
(On a type 1 hypervisor with VT.) The shadow page table
maps the guest OS's virtual pages to their real hardware
addresses.
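As a minimal sketch of what a shadow page table holds, the
following composes two mappings into one. The page numbers and the
names guest_page_table, hypervisor_map, and build_shadow_page_table
are made up for illustration; a real hypervisor works on hardware
page-table structures, not dictionaries.

```python
# Illustrative page numbers only; none of these values come from real hardware.
guest_page_table = {0: 7, 1: 3, 2: 5}    # guest virtual page -> guest "physical" page
hypervisor_map = {3: 42, 5: 17, 7: 88}   # guest "physical" page -> real hardware page


def build_shadow_page_table(guest_pt, host_map):
    """Compose the two mappings: guest virtual page -> real hardware page."""
    shadow = {}
    for virtual_page, guest_physical_page in guest_pt.items():
        # Skip pages the hypervisor has not backed with real memory.
        if guest_physical_page in host_map:
            shadow[virtual_page] = host_map[guest_physical_page]
    return shadow


shadow_page_table = build_shadow_page_table(guest_page_table, hypervisor_map)
print(shadow_page_table)  # {0: 88, 1: 42, 2: 17}
```

The hardware would then be pointed at the shadow table rather than
at the guest's own table.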
But the hypervisor will need to update the shadow page
table every time the guest OS page tables update. How?
The guest OS just writes to memory to update these: no
sensitive instruction is used.
Page Faults
Two possible ways to handle this:
1) Make the guest OS page tables read-only,
so any attempt to write to them creates a page fault.
2) Let the guest OS modify its page tables at will:
then attempts to access the newly mapped pages will create
page faults.
Both methods will create lots of... page faults!
And page faults are expensive.
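The two methods above can be compared with a toy model. This is
only a sketch: ToyHypervisor and its methods are hypothetical names,
it handles new mappings only, and it counts faults rather than
modeling real trap handling.

```python
class ToyHypervisor:
    """Toy model of keeping a shadow page table in sync with a guest."""

    def __init__(self, host_map, write_protect):
        self.host_map = host_map            # guest physical page -> real page
        self.write_protect = write_protect  # True: method 1, False: method 2
        self.guest_page_table = {}          # guest virtual page -> guest physical page
        self.shadow_page_table = {}         # guest virtual page -> real page
        self.induced_faults = 0             # hypervisor-induced page faults

    def _sync(self, virtual_page):
        guest_physical_page = self.guest_page_table[virtual_page]
        self.shadow_page_table[virtual_page] = self.host_map[guest_physical_page]

    def guest_maps_page(self, virtual_page, guest_physical_page):
        if self.write_protect:
            # Method 1: the write to the read-only page table faults.
            self.induced_faults += 1
        self.guest_page_table[virtual_page] = guest_physical_page
        if self.write_protect:
            # The hypervisor updates the shadow entry right away.
            self._sync(virtual_page)

    def guest_accesses_page(self, virtual_page):
        if virtual_page not in self.shadow_page_table:
            # Method 2: the first access to a new mapping faults,
            # and the hypervisor fills in the shadow entry lazily.
            self.induced_faults += 1
            self._sync(virtual_page)
        return self.shadow_page_table[virtual_page]


for write_protect in (True, False):
    hv = ToyHypervisor({3: 42, 5: 17}, write_protect)
    hv.guest_maps_page(0, 3)
    hv.guest_maps_page(1, 5)
    real_page = hv.guest_accesses_page(0)
    print(write_protect, real_page, hv.induced_faults)
```

In this toy run, method 1 faults once per page-table write, while
method 2 faults only for mappings that are actually used. Either
way, every fault hands control to the hypervisor.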
We distinguish two types of page faults:
1) guest-induced page faults, which involve
pages actually swapped out of RAM; and
2) hypervisor-induced page faults, which occur
in order to keep the shadow page table up to date.
Page faults are extra expensive in virtualized
environments, because they lead to VM exits.
In a VM exit, the hypervisor regains control. This
involves saving and restoring lots of state, and may
take tens of thousands of cycles.
With paravirtualization, the situation is different.
The guest OS knows it is virtualized, and knows to
notify the hypervisor when its page tables have been
updated.
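A rough sketch of that notification, assuming a hypothetical
hypercall named page_tables_updated (real paravirtualized
interfaces differ in name and detail):

```python
class Hypervisor:
    def __init__(self, host_map):
        self.host_map = host_map        # guest physical page -> real page
        self.shadow_page_table = {}     # guest virtual page -> real page
        self.hypercalls = 0

    def page_tables_updated(self, changes):
        """Hypothetical hypercall: the guest reports its new mappings."""
        self.hypercalls += 1
        for virtual_page, guest_physical_page in changes.items():
            self.shadow_page_table[virtual_page] = self.host_map[guest_physical_page]


class ParavirtualizedGuest:
    def __init__(self, hypervisor):
        self.hypervisor = hypervisor
        self.page_table = {}            # guest virtual page -> guest physical page

    def map_pages(self, changes):
        self.page_table.update(changes)                # ordinary memory writes
        self.hypervisor.page_tables_updated(changes)   # explicit notification


hv = Hypervisor({3: 42, 5: 17})
guest = ParavirtualizedGuest(hv)
guest.map_pages({0: 3, 1: 5})
print(hv.shadow_page_table, hv.hypercalls)  # {0: 42, 1: 17} 1
```

Here several page-table updates are reported in one call, instead
of one page fault per update.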
Hardware Support for Nested Page Tables
There is now hardware support for nested page
tables (AMD) or Extended Page Tables
(Intel).
With virtual memory, the OS is already mapping
between virtual pages and physical pages. Nested page
tables simply extend that scheme so that we can have
several layers of mapping.
We now have guest virtual addresses, guest
physical addresses, and host physical
addresses.
No need to maintain shadow page tables.
No need for VM exits just to keep the page tables in sync.
Switching virtual machines changes the mapping the same
way an OS does when switching processes.
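The two layers of mapping can be sketched as a two-stage lookup.
This is a simplification: the 4 KiB page size, the table contents,
and the translate function are assumptions for illustration, and
real hardware also walks the nested table for each level of the
guest's own page table.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def translate(guest_virtual_address, guest_page_table, nested_page_table):
    """Guest virtual address -> host physical address, in two stages."""
    page, offset = divmod(guest_virtual_address, PAGE_SIZE)
    guest_physical_page = guest_page_table[page]                  # stage 1: guest OS
    host_physical_page = nested_page_table[guest_physical_page]   # stage 2: hypervisor
    return host_physical_page * PAGE_SIZE + offset


# One nested table per VM; switching VMs just selects a different one.
vm_a_nested = {0: 10, 1: 11}
vm_b_nested = {0: 20, 1: 21}
guest_page_table = {5: 1}           # same guest mapping in both VMs

address = 5 * PAGE_SIZE + 0x2A
print(hex(translate(address, guest_page_table, vm_a_nested)))  # 0xb02a
print(hex(translate(address, guest_page_table, vm_b_nested)))  # 0x1502a
```

The guest page table is never touched by the hypervisor here; only
the second-stage table differs between the two VMs.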
Reclaiming Memory
The hypervisor may need to reclaim memory at times.
Why?
Overcommitment is the allocation of more virtual
memory to VMs than there is actual physical memory,
e.g., on a 32GB machine, running three 16GB VMs.
Deduplication is when certain pages are shared
between VMs, e.g., the Linux kernel.
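Both ideas can be put in numbers with a short sketch. The
overcommit figures are the ones from the example above. Finding
shared pages by hashing their contents is an assumption made here
for illustration; the function names and page contents are made up,
and a real system would also compare the bytes and use copy-on-write.

```python
import hashlib

GIB = 1024 ** 3


def overcommit_ratio(vm_sizes, physical_memory):
    """Memory promised to VMs divided by memory that actually exists."""
    return sum(vm_sizes) / physical_memory


ratio = overcommit_ratio([16 * GIB] * 3, 32 * GIB)
print(ratio)  # 1.5


def deduplicate(pages):
    """Group pages with identical contents: {digest: [(vm, page number), ...]}."""
    by_content = {}
    for (vm_name, page_number), content in pages.items():
        digest = hashlib.sha256(content).hexdigest()
        by_content.setdefault(digest, []).append((vm_name, page_number))
    return by_content


pages = {
    ("vm1", 0): b"kernel code",
    ("vm2", 0): b"kernel code",   # identical to vm1's page 0
    ("vm1", 1): b"vm1 private data",
}
shared = deduplicate(pages)
print(len(pages), "guest pages fit in", len(shared), "real pages")
```

Deduplication is one reason overcommitment can work in practice:
identical pages only need to be stored once.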
The hypervisor can't really page out guest pages,
because it has no clue which ones should be kept in
memory.
Quiz
For each virtual machine a hypervisor needs to create
lottery scheduling
clock driven interrupts
hypervisor-induced page faults
a shadow page table
The problem with shadow page tables is
they create lots of page faults
the hypervisor itself doesn't know where pages really are in memory
they suck up too much RAM
they are illegal on Intel CPUs
A hypervisor-induced page fault occurs when
a page has been swapped to disk
the shadow page table must be updated
the hypervisor tries to access a page not in memory
all of the above
A VM exit is when
the virtual machine crashes
the hypervisor crashes
the virtual machine is done working
control returns to the hypervisor
Deduplication is when
some resources (like the Linux kernel) are shared between VMs