%%
%% Default settings
%%
%deffont "standard" xfont "lucida-medium-r"
%deffont "thick" xfont "lucida-bold-r"
%deffont "typewriter" xfont "lucidatypewriter"
%%
%default 1 area 90 90, fore "black", back "white", leftfill, vgap 40
%default 1 font "thick", size 5, vgap 10, prefix " "
%default 2 size 2, bar "gray70", vgap 10
%default 3 size 4, fore "black", vgap 40, prefix " ", font "standard"
%%
%% Default settings that are applied to TAB-indented lines.
%%
%tab 1 size 4, vgap 30, prefix " ", icon box "blue" 40
%tab 2 size 4, vgap 30, prefix " ", icon arc "purple" 40
%tab 3 size 4, vgap 30, prefix " ", icon delta3 "black" 40
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%area 90 90, fore "white", back "black", leftfill
%charset "iso8859-1"
%center, font "thick", size 5
%% This image is from: http://homepage.mac.com/sgestrella/europe99/14rodin.html
%center, image "FRrodinlepenseur.jpg"
Le Penseur
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%area 90 90, fore "black", back "white", leftfill
%charset "iso8859-1"
%center, font "thick", size 5
Virtual Memory Management,
A New Approach for the Hurd on the L4 Microkernel
%font "standard", size 5
Neal H. Walfield
neal@cs.uml.edu
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Summary

	Introduction
	Design Issues
	Traditional Virtual Memory Management Systems
	A GNU Approach
	Conclusion
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Introduction

`You are not attending!' said the Mouse to Alice severely. `What are you thinking of?'
`I beg your pardon,' said Alice very humbly: `you had got to the fifth bend, I think?'
%right
-- Alice's Adventures in Wonderland by Lewis Carroll
%left
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Introduction

	Memory Woes
		Never enough
		Too slow
		Too expensive
		The more there is, the more applications use
	Solution: Use Virtual Memory
		Transparent to underlying processes
		Physical memory becomes a cache
		"Real" memory lives on slower, less expensive storage
		Private to each process
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Overloading Physical Memory

	Two Basic Strategies
		Swapping
		Paging
	Swapping
		When there is not enough memory, suspend processes
		Simple, but inefficient
			Process only needs working set to make progress
			Can never use more virtual memory than there is physical memory
	Paging
		More fine grained
		Each process only needs its working set in core
		Extra memory can be used as a data cache
		Very complicated algorithms
			Must manage thousands or millions of frames
			Page eviction
			Speculative paging
			Must be very smart: disk accesses are thousands of times slower than memory accesses
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Virtual Memory

%center, image "virtual-memory-layout.png"
%left
	Virtual Address Spaces
		Uniform
		Transparent
		Protection domain
	Pages are either:
		Active and mapped onto frames (green)
		Inactive and paged out (yellow)
		Invalid, i.e. not mapped (red)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Types of Memory

	Anonymous Memory
		Non-persistent memory
		Allocated by malloc
		Used for private copies of file data, etc.
		Can be shared between processes
			Changes by one process will be seen by others
		Paged to swap
	Mapped Memory
		Memory associated with a region on a disk, in a file, etc.
		Read-only access can be shared
		Paged to the associated backing store
	Copy On Write
		Private memory which is logically shared (i.e. sharing read/write access)
		Reduces copying
			Maps (e.g. bss) can be shared between two processes; a copy is made on demand
			Fork makes a copy of the entire process; however, most memory is never actually touched
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Designing a Virtual Memory Manager

	Four Mitigating Problems
		Who gets pages and how many
		Page table layout and management
		Which pages to evict and when
		Where and how to store inactive pages
	Interesting Systems
		Unix
		VMS
		Mach
		L4
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Traditional Unix Virtual Memory Management

	Page Allocation
		Handed out by kernel
		Easy to enforce global resource limits
		Very simple
	Low Level Management
		Completely inside the kernel
		Used by nearly all subsystems
		Uses many subsystems
		Boundaries are often unclear; no formal protocols
	Eviction Policy
		Does not consult user-space applications
		Applies a global policy, typically an approximation of LRU (least recently used)
		Very inefficient for many classes of applications
			Databases
			Garbage collectors
	Paging
		Kernel sends pages to the appropriate backing store
		No trust issues: all functionality is in the kernel
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
VMS

	Page Allocation
		Each process is allocated a static number of pages at startup
		Kernel never allocates more pages than there are frames
		If there are not enough frames, a process is temporarily swapped out
	Low Level Management is done within the kernel; similar to Unix
	Eviction Policy
		All applications are completely self-paged
		If an application is at its frame quota and tries to allocate a page, the kernel first asks for a page back
			Pages are not necessarily flushed immediately
			If a page is required, the inactive list is checked
		Applications can "go over" their allocation limit if there is extra memory in the system
		Method is very efficient
			Applications always know what is best for themselves
			No global policy
	Paging is similar to Unix; there are no trust issues
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Mach

	Page Allocation
		Pages are given out freely
		No mechanism to impose resource limitations
	Management
		All management is done within the kernel
		Similar to Unix
	Eviction Policy is similar to Unix's single global policy
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Mach (continued)

	Paging
		External paging interface
		File systems are implemented as user space tasks
		Kernel must gracefully handle rogue servers
	Establishing Maps
%center, image "mach-map.png"
%left
	(1) Client sends an open RPC to a server
	(2) Server returns a handle (known to the kernel) for the file
	(3) Client sends a vm_map RPC to the kernel
	(4) Kernel sends a memory_object_init to the server
	(5) When the server is ready, it replies with memory_object_ready
	(6) Mach sets up the client's address space and returns
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Mach (continued)

	Handling Page Faults
%center, image "mach-fault.png"
%left
	(1) Client faults; control is transferred to the kernel
	(2) Kernel sends a memory_object_data_request to the server
	(3) Server looks up the page and returns it to Mach
	(4) Page is installed and the client thread resumed
	If the server does not respond, the thread blocks forever
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Mach (continued)

	Page Outs
		When there is memory pressure, Mach chooses pages to evict
		Sends a memory_object_data_return RPC to the server
		Changes the pages' manager to the default pager
		Manager must free the pages
		Mach may double page if the manager does not free the memory in a timely fashion
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
L4

	Page Allocation
		Hands all memory to a server known as sigma0
		Sigma0 gives out memory to anyone who asks and marks it as allocated
		Typically, the user-space VMM requests all of the memory at boot time
	Management
		Provides a map and unmap interface
		Real VMM is built on top
	Paging
		Each thread in L4 has a (user chosen) pager thread
		When a page fault occurs, the kernel sends an RPC to the thread's pager
		Pager must find the page and install it
	Eviction Policy: chosen and enforced by the user-space VMM
	L4 is very flexible; any of the aforementioned policies could be implemented on top of it
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Goals for a GNU VMM

	Less centrality
		Mach provides too much functionality in the kernel
		A single large VMM would not be better
		Should be stackable
	External pagers
		Mandatory: the Hurd relies heavily on this feature for translators
	Eviction Policy
		It must be more flexible
		Tasks should be completely self-paged
	Resource limits
		Must design an interface
		Cannot be done centrally
			Servers render services for clients
			Servers cannot be trusted by a central resource server
			Clients should supply required resources to servers
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Proposed Properties

	Page Allocation
		Tasks negotiate contracts for guaranteed physical pages
			Total number of guaranteed pages will never exceed the number of frames
		Tasks may ask for extra pages
		VMM provides simple interfaces
			Memory interface for allocating memory
			Container interface for moving memory between processes
			Can be implemented by other servers for proxy purposes
	Management
		L4 is a cache of active mappings
		The memory server provides handles to physical pages
		Tasks must maintain all of their mappings
			Normally part of a library
			Easily overridden
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Proposed Properties (continued)

	Paging
		Tasks write pages to their corresponding server (e.g. a file system or swap) when cleaning
		Must respect the server's policy
	Eviction Policy
		Memory server only asks tasks to return extra pages
		Task must select which pages it wants to evict
		If it does not do so in a timely fashion, it may be killed
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Guaranteed and Extra Pages

	Guaranteed Pages
		Negotiated at task creation time
		Medium term contract
		Renegotiated occasionally
		Total number of guaranteed pages will never exceed total frames
	Extra Pages
		Page allocation is done only when actually needed
		Tasks may be given extra pages if there is no memory pressure
		Extra pages must be freed on short notice
		If the task fails to do this, it may be terminated
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Resource Limits

	Multiserver Philosophy
		Inherently distributed
		Problems tracking resources on a global basis
	Traditional model does not work
		Server acts on behalf of its clients
		Easy to have priority inversion
%center, image "priority-inversion.png"
%left
	(1) A low-priority client makes a request
	(2) The high-priority server requests memory on its behalf
		A medium-priority task makes a request at the same time
		The high-priority task wins
	Must embrace a model where clients provide the required resources
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Memory Interface

	memory_allocate (number of pages, flags) => array of handle ids
	memory_deallocate (array of handle ids)
		Allocate pages with certain attributes as described by flags.
			Contiguous pages
			Certain address range
		May fail if the task does not have enough guaranteed pages
	memory_map (array of handle ids) => map item
		Requests a map item that can be installed into the task's virtual address space
		Unmapping is done using L4's interface
	memory_release_data (array of handle ids) => map item
		If the pages are marked copy on write, the reference is dropped
		Lets a task reuse pages, avoiding a superfluous copy (from logical sharing)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Moving Memory Around

	Containers
		Used to transfer memory
		Have versions
			Incremented after most operations
			Permit servers to determine if data was compromised
	container_create (num pages, time, task) => handle
	container_create_from (array of pages, time, task) => handle
	container_deallocate (handle) => version
		Places a number of physical pages into a container
		Specifies the maximum locking time
		Gives a task access to it
		Sets version id to 0
	container_grant_access (handle, task) => version
		Gives another task access to a container
		Returns the new version
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Locking Containers

	Containers need to be locked during certain operations, for example device I/O
	container_lock_map (handle, unmap) => map item
	container_lock_physical (handle, unmap) => array of physical addresses
	container_unlock (handle)
		Locks a container
		Protects the physical memory from being unmapped or deallocated
		If unmap is true, all maps are unmapped and no one else may map the item
		Returns a map item or the physical addresses
		Increments the version id
	container_unmap_all (handle) => version id
		Unmaps all pages in container from all other tasks
		Increments the version id and returns it
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Accessing the Data in Containers

	container_copy_in (array of ids, handle, private) => version
	container_copy_out (handle, array of ids, private) => version
	container_map (handle) => map item
		Moves data into or out of a container
		Uses a logical copy if possible
		Copy is private or shared as a function of the private argument
		Version id is incremented
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Reading from a File

%center, image "l4-read.png"
%left
	(1) container_create (x) => A
		The client creates a container with an appropriate number of pages
		The memory server returns a container handle
	(2) fs_read (file, offset, length, A)
		Client sends a read request to the server and passes the container
	(3) container_grant_access (A, Device driver) => version
		The server shares the container with the device driver
		Saves the version
	(4) Server calls device_read (A, Block list)
		Asks the device driver for the blocks
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Reading from a File (continued)

%center, image "l4-read.png"
%left
	(5) container_lock_physical (handle, unmap all) => physical addresses
		Device driver locks the container
		Does not allow others to map the data
	(6) Device driver does DMA
	(7) Device driver unlocks the container and returns the number of operations it performed on it
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Reading from a File (continued)

%center, image "l4-read.png"
%left
	(8) container_copy_out (handle, page cache, private) => version
		Makes a private logical copy of the data
		Saves it in the page cache
	Detecting corruption
		Server has new and old version ids
		Compares their difference to the expected number of operations
		If bad, the data is discarded and the operation fails
	(9) Server returns success to the client
	(10) The client copies the data and deallocates the container
	Writing to a file is quite similar.
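%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Sketch: Checking Container Versions

The version check in steps (3) to (8) can be modelled in a few lines. This is only a toy sketch in Python, not the proposed interface: the Container class, the choice of which calls bump the version, the assumption that unlocking does not bump it, and the interfere hook are all hypothetical and exist purely to show how a server might compare version ids against the expected number of operations.

```python
class Container:
    """Toy stand-in for a memory-server container (not the real interface)."""

    def __init__(self, num_pages):
        self.pages = [b""] * num_pages
        self.version = 0          # creation sets the version id to 0

    def _bump(self):
        self.version += 1
        return self.version

    def grant_access(self, task):
        return self._bump()       # returns the new version

    def lock_physical(self):
        self._bump()              # locking increments the version id
        return self.pages         # the list stands in for physical addresses

    def unlock(self):
        pass                      # assumption: unlocking does not bump the version

    def copy_in(self, index, data):
        self.pages[index] = data
        return self._bump()

    def copy_out(self):
        return list(self.pages), self._bump()


def driver_read(container, blocks):
    """Steps 5-7: lock, 'DMA' into the frames, unlock, report the operation count."""
    frames = container.lock_physical()
    for i, block in enumerate(blocks):
        frames[i] = block
    container.unlock()
    return 1                      # one version-bumping operation (the lock)


def server_read(container, blocks, interfere=None):
    """Steps 3-8 from the server's side; returns the data, or None on a mismatch."""
    saved = container.grant_access("driver")
    driver_ops = driver_read(container, blocks)
    if interfere is not None:
        interfere(container)      # hypothetical hook: a third party touches the container
    data, new_version = container.copy_out()
    expected = saved + driver_ops + 1   # +1 for the server's own copy_out
    return data if new_version == expected else None
```

Calling server_read on a fresh two-page container returns the two blocks. Passing an interfere callback that performs an extra copy_in makes the final version id one higher than expected, so the sketch discards the data and returns None.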
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Conclusion

	Policy can be implemented outside of the physical memory server
		Interfaces are very simple yet flexible
	Resource tracking is possible
	Much better page eviction schemes
		Faster
		Specialized policies can be written
		Can choose policies at execution time
	Should be much more scalable
%pause
	Code needs more testing
%pause
	Code needs implementing
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Thanks

%font "thick", center
Thanks for coming!
%font "standard", left
I am very interested in discussing any ideas that you might have. Come visit in TD9 or email me at neal@cs.uml.edu.