%%
%% Default settings
%%
%deffont "standard" xfont "lucida-medium-r"
%deffont "thick" xfont "lucida-bold-r"
%deffont "typewriter" xfont "lucidatypewriter"
%%
%default 1 area 90 90, fore "black", back "white", leftfill, vgap 40
%default 1 font "thick", size 5, vgap 10, prefix " "
%default 2 size 2, bar "gray70", vgap 10
%default 3 size 4, fore "black", vgap 40, prefix " ", font "standard"
%%
%% Default settings that are applied to TAB-indented lines.
%%
%tab 1 size 4, vgap 30, prefix " ", icon box "blue" 30
%tab 2 size 4, vgap 30, prefix " ", icon arc "purple" 30
%tab 3 size 4, vgap 30, prefix " ", icon delta3 "black" 30
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%area 90 90, fore "black", back "white", leftfill
%charset "iso8859-1"

%center, font "thick", size 5
Better Best Effort: From Plato to Pope

%font "standard", size 5
Neal H. Walfield
neal@cs.uml.edu
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Summary

	Introduction
	Design Issues
	Proposed Interfaces
	Implementation Issues
	Conclusion
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%area 90 90, leftfill, vgap 40
%charset "iso8859-1"

Argue for your limitations and sure enough, they're yours.
-- Bach
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Virtual Memory Management

	"Traditional" Goals
		Protection boundaries
		Private memory
		Shared memory
	Virtual Memory
		Uniform address space
		Paging and swapping
		Mappings
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Paging

	Performance Critical
		Applications always want more memory than is available
		Disk is thousands of times slower than memory
		Smart decisions yield dramatic results
	Monolithic Kernel
		All mechanism and policy are centralized in the kernel
		VMM has global knowledge
		Intimate with caches, file systems, etc.
		Resource limits
	Multiserver OS
		External paging
		File systems et al. in user space
		Servers provide services on behalf of clients
		Servers do not trust each other
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
VMM Decisions

	Problems
		Who gets pages and how many
		Page table layout and management
		Which pages to evict and when
		Where and how to store inactive pages
	Interesting Systems
		Unix
		VMS
		Nemesis
		Mach
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Unix Virtual Memory Management

	Memory Allocation
		Handed out by the kernel
		Easy to enforce global resource limits
	Low Level Management
		Completely inside the kernel
		Used by nearly all subsystems
		Uses many subsystems
		Boundaries are often unclear; no formal protocols
	Paging Policy
		Does not consult user-space applications
		Applies a global policy, typically an approximation of LRU (least recently used)
		Very inefficient for many classes of applications
			Databases
			Garbage collectors
			Media applications
	Paging
		Kernel sends pages to the appropriate backing store
		No trust issues: all functionality and decisions are in the kernel
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
L4

	Memory Allocation
		Hands all memory to a server known as sigma0
		Sigma0 gives out memory to anyone who asks and marks it as allocated
		Typically, the user-space VMM requests all of the memory at boot time
	Low Level Management
		Provides a map and unmap interface
		Real VMM is built on top
	Paging Policy
		Chosen and enforced by the user-space VMM
	Paging
		Each thread in L4 has a (user chosen) pager thread
		When a page fault occurs, the kernel sends an RPC to the thread's pager
		Pager must find the page and install it
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
L4 Maps

	Task B has an fpage backed by Task A
	Task B maps the fpage to Task C
%area 50 30 51 14
%center, image "map1.png"
%center, image "map2.png"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
L4 Grants

	Task B has an fpage backed by Task A
	Task B grants the fpage to Task C
%area 50 30 51 14
%center, image "map1.png"
%center, image "map3.png"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Goals for a GNU VMM

	Less centrality
		Mach provides too much functionality in the kernel
		A single large VMM would not be better
		Should be stackable
	External pagers
		Mandatory: the Hurd relies heavily on this feature for translators
	Eviction Policy
		It must be more flexible
		Tasks should be completely self-paged
	Resource limits
		Must design an interface
		Cannot be done centrally
			Servers render services for clients
			Servers are not necessarily trusted by the system
			Servers cannot be trusted by a central resource server
		Clients should supply required resources to servers
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Resource Limits in Depth

	Multiserver Philosophy
		Inherently distributed
		Problems tracking resources on a global basis
	Traditional model does not work
		Server acts on behalf of its clients
		Easy to have priority inversion
%center, image "priority-inversion.png"
%left
		(1) A low-priority client makes a request
		(2) The high-priority server requests memory on its behalf
		A medium-priority task makes a request at the same time
		The high-priority task wins
	Must embrace a model where clients provide the required resources
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Proposed Properties

	Page Allocation
		Tasks request memory from the VMM
		VMM consults the policy server
		Large TLB reach
			Subblocking
	VMM provides simple interfaces
		Memory interface for allocating memory
			Uses handle ids, not physical addresses
		Container interface for moving memory between processes
			Can be implemented by other servers for proxy purposes
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Proposed Properties (continued)

	Management
		L4 is a cache of active mappings
		The memory server provides handles to physical pages
		Tasks must maintain all of their mappings
			Normally part of a library
			Easily overridden
	Paging
		Tasks send pages to the corresponding server
			File system server
			VMM if it should be swapped
		Must respect the server's policy
	Eviction Policy
		Memory server only asks for extra pages
		Task must select which pages it wants to evict
		No fallback (e.g. a default pager)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Memory Policy

	Policy Server
		Has knowledge of the Hurd interfaces
		Tasks negotiate contracts for guaranteed physical pages
		Total number of guaranteed pages will never exceed the number of frames
		Tasks may ask for extra pages
	Guaranteed Pages
		Negotiated at task creation time
		May be a function of uids
		Medium-term contracts
		Renegotiated occasionally
		Total number of guaranteed pages will never exceed total frames
	Extra Pages
		Page allocation is done only when actually needed
		Tasks may be given extra pages if there is no memory pressure
		Extra pages must be freed on short notice
		If the task fails to do this, the VMM may revoke all memory
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Memory Interface

	memory_allocate (number of pages, size, flags) => array of handle ids
	memory_deallocate (array of handle ids)
		Allocate pages with certain attributes as described by flags
			Contiguous pages
			Certain physical address range
		May fail if the task does not have enough guaranteed pages
	memory_map (array of handle ids) => map item
		Requests a map item that can be installed into the task's virtual address space
		Unmapping is done using L4's interface
	memory_release_data (array of handle ids)
		If the pages are marked copy on write, the reference is dropped
		Used to reuse pages, avoiding a superfluous copy (from logical sharing)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Handle IDs

	Why not physical addresses?
		Want to move pages to different frames
			Construct super pages
			DMA
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Moving Memory Around

	Containers
		Used to transfer memory
		Versioned
			Version incremented after most operations
			Permits servers to determine if data was compromised
	container_create (num pages) => handle
	container_create_from (array of ) => handle
	container_deallocate (handle) => version
		Places a number of physical pages into a container
		Sets version id to 0
	container_grant_access (handle, time, task) => version
	container_revoke_access (handle, task) => version
		Gives another task access to a container
		Specifies the maximum locking time
		Returns the new version
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Locking Containers

	Containers need to be locked during certain operations, for example device I/O
	container_lock_map (handle, unmap) => map item
	container_lock_physical (handle, unmap) => array of physical addresses
	container_unlock (handle)
		Locks a container
		Protects the physical memory from being unmapped or deallocated
		If unmap is true, all maps are unmapped and no one else may map the item
		Returns a map item or the physical addresses
		Increments the version id
	container_unmap_all (handle) => version id
		Unmaps all pages in the container from all other tasks
		Increments the version id and returns it
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Accessing the Data in Containers

	container_copy_in (array of ids, handle, private) => version
	container_copy_out (handle, array of ids, private) => version
	container_map (handle) => map item
		Moves data into or out of a container
		Uses a logical copy if possible
		Copy is private or shared as a function of the private argument
		Version id is incremented
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Implementing mmap()

	Calling mmap
		Functionality in process-local VMM
		Verifies state of file descriptor
		Establishes memory range and driver
	On Page Fault
		Pager thread finds the range
		Contacts the correct server
		Installs memory
	Anonymous Memory
		Done locally
		Faults
			Reuse pages
			Zeroing is done locally
	sbrk() and Stacks
		Nothing special
		Up to local VMM range drivers
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
MAP_COPY

	Consider
		Client maps a huge region
		Cannot provide all potential resources up front
			Too much memory
			In the normal case, the memory is not used
	Upcalls
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Reading from a File

%center, image "l4-read.png"
%left
	(1) container_create (x) => A
		The client creates a container with an appropriate number of pages
		The memory server returns a container handle
	(2) fs_read (file, offset, length, A)
		The client sends a read request to the server and passes the container
	(3) container_grant_access (A, Device driver) => version
		The server shares the container with the device driver
		Saves the version
	(4) device_read (A, Block list)
		The server asks the device driver for the blocks
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Reading from a File (continued)

%center, image "l4-read.png"
%left
	(5) container_lock_physical (handle, unmap all) => physical addresses
		Device driver locks the container
		Does not allow others to map the data
	(6) Device driver does DMA
	(7) Unlocks the container and returns the number of operations on it
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Reading from a File (continued)

%center, image "l4-read.png"
%left
	(8) container_copy_out (handle, page cache, private) => version
		Makes a private logical copy of the data
		Saves it in the page cache
	Detecting corruption
		Server has new and old version ids
		Compares the difference to the expected number of operations
		If bad, the data is discarded and the operation fails
	(9) Returns success to the client
	(10) The client copies the data and deallocates the container
	Writing to a file is quite similar.
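%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Reading from a File: A Sketch

	The version check in steps (3) through (8) can be illustrated with a toy model
	This is only a sketch, not the proposed implementation
	Assumptions: the Container class, its method names, and the rule that lock, copy in, copy out and grant each bump the version by one while unlock does not are all hypothetical
	The tamper flag stands in for an untrusted task touching the container during the read

```python
class Container:
    """Toy container: a list of pages plus a version id (hypothetical model)."""

    def __init__(self, num_pages):
        self.pages = [b""] * num_pages
        self.version = 0        # a new container starts at version 0
        self.allowed = set()    # tasks that have been granted access
        self.locked_by = None

    def grant_access(self, task):
        self.allowed.add(task)
        self.version += 1
        return self.version

    def lock_physical(self, task):
        if task not in self.allowed:
            raise PermissionError(task)
        if self.locked_by is not None:
            raise RuntimeError("container already locked")
        self.locked_by = task
        self.version += 1
        return self.pages       # stands in for the physical addresses

    def unlock(self, task):
        if self.locked_by != task:
            raise RuntimeError("not the lock holder")
        self.locked_by = None   # assumed not to bump the version

    def copy_in(self, index, data):
        self.pages[index] = data
        self.version += 1
        return self.version

    def copy_out(self):
        self.version += 1
        return list(self.pages), self.version


def device_read(container, blocks):
    """Driver side: lock, fill the pages (stand-in for DMA), unlock."""
    pages = container.lock_physical("driver")
    for i, block in enumerate(blocks):
        pages[i] = block
    container.unlock("driver")
    return 1                    # version-bumping operations performed (the lock)


def fs_read(container, blocks, tamper=False):
    """Server side: returns the data, or None if the version check fails."""
    saved = container.grant_access("driver")      # step (3): save the version
    ops = device_read(container, blocks)          # steps (4) to (7)
    if tamper:
        container.copy_in(0, b"evil")             # someone else touched the data
    data, new_version = container.copy_out()      # step (8)
    expected = saved + ops + 1                    # +1 for copy_out itself
    if new_version != expected:
        return None                               # discard data, fail the read
    return data
```

	An untouched read returns the blocks; any extra operation on the container shifts the version and the server discards the data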
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Swap

	Client Interface in the VMM
		Assume clients send pages to swap directly
			Must manage limited swap space
			Possible DoS
			Has no knowledge of logical copies
			Difficult to do caching
		VMM is smarter
		Swap server can still be a separate server
		Policy remains external in policy server
	Swapping Pages
		VMM keeps the pages on an inactive list
		Permits coalescing of logical copies
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Getting More Information From User Space

	Glibc Extensions
		Interesting Functions
			malloc
			mmap
			fopen
		Add Attributes
		Not portable!
			Can be done via simple macros and autoconf
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Conclusion

	Policy can be implemented outside of the physical memory server
	Interfaces are very simple yet flexible
		L4
		VMM interface
	Resource tracking is possible
	Much better page eviction schemes
		Faster
		Specialized policies can be written
		Can choose policies at execution time
	May be more scalable
%pause
	Code needs more testing
%pause
	Code needs implementing
%pause
	Licensing issues
		Pistachio should be released January 2003
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Thanks

%font "thick", center
Thanks for the invitation
Thanks for having come!