09. Intro to Virtual Memory
Starting from this chapter, we will implement a significant part of our kernel - memory management. One of the major responsibilities of an OS kernel is to provide a memory abstraction for user programs. Such an abstraction should guarantee at least the following two properties:
- Access control: allows us to specify permissions on different regions of the memory space
- Isolation: memory operations issued by one process should not directly interfere with other processes (or the kernel, obviously)
Access control and isolation together ensure that, for example, an out-of-bounds array access in a buggy browser won't crash your PDF reader or even the OS itself. The key to achieving such isolation is virtualization (virtual memory). This is not trivial, so this chapter involves a relatively dense explanation of the basic theory behind it, instead of diving directly into the code.
This chapter is just a recap of some basic OS concepts.
The goal of access control + isolation implies a split between the logical memory address space and actual physical memory. We give each user program the illusion that it possesses an idealized, contiguous, large piece of memory belonging only to itself - called its virtual memory space. The OS kernel then implements a way to map the actually occupied memory regions of all programs onto physical memory (and to translate between the two).
Typically, virtual memory mapping is implemented through one of the two approaches:
- Segmentation
- Paging
As you may have noticed, segmentation is one possible way of doing this. Here is an example where two user programs run concurrently, each using 150 bytes of memory, on a machine with 550 bytes of physical memory (thanks Philipp for the nice figures!):
The two programs' logical 150 bytes get mapped to two different segments in physical memory. If a third program that wants 150 bytes of memory is launched (which does not exceed the total remaining available memory), there is no place to insert a third contiguous 150-byte segment. The gaps between segments are called (external) fragments.
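For intuition, segmentation-style translation boils down to a base + limit check. The sketch below uses a made-up `segment_t` struct and a hypothetical `segment_translate` helper, not the real x86 descriptor format:

```c
#include <stdint.h>

/* Hypothetical segment descriptor - just the two fields that matter
 * for the translation logic, not the actual x86 GDT entry layout. */
typedef struct {
    uint32_t base;   /* start of the segment in physical memory */
    uint32_t limit;  /* size of the segment in bytes */
} segment_t;

/* Translate a logical offset within a segment into a physical
 * address. Returns 0 on success, -1 on an out-of-bounds access
 * (which real hardware would report as a fault). */
static int segment_translate(const segment_t *seg, uint32_t offset,
                             uint32_t *phys_out) {
    if (offset >= seg->limit)
        return -1;
    *phys_out = seg->base + offset;
    return 0;
}
```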
Processes may come and go at high frequency, and their memory usage patterns evolve over their lifespans. With contiguous segmentation, we cannot avoid fragments. The kernel would have to implement compensating techniques such as kernel-level garbage collection (GC) to compact these memory segments and make room for other processes. GC normally requires pausing execution and must run periodically, lowering performance and responsiveness from time to time.
Paging is a commonly used approach to memory virtualization with finer granularity, which reduces external fragmentation. We divide memory spaces into pages, i.e., 4 KiB chunks. Both physical memory and user processes' virtual memory spaces are divided this way. Conventionally, we call a physical 4 KiB chunk of memory a frame, and a virtual 4 KiB chunk of memory a page ✭.
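To make the 4 KiB granularity concrete, here is a minimal sketch (with hypothetical helper names `page_number` and `page_offset`) of how an address splits into a page number and an in-page offset:

```c
#include <stdint.h>

#define PAGE_SIZE  4096u    /* 4 KiB pages */
#define PAGE_SHIFT 12       /* log2(4096) */

/* With 4 KiB pages, the low 12 bits of an address are the offset
 * within the page; the remaining high bits select the page. */
static inline uint32_t page_number(uint32_t vaddr) {
    return vaddr >> PAGE_SHIFT;
}

static inline uint32_t page_offset(uint32_t vaddr) {
    return vaddr & (PAGE_SIZE - 1);
}
```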
For each user process, we keep a page table somewhere in kernel memory to store the mapping from user logical addresses to physical memory addresses. (Thanks Philipp again for the nice figures!)
Though paging solves the problem of external fragmentation, it actually makes internal fragmentation worse: user processes that do not need a full 4 KiB slot of memory still have to occupy a whole 4 KiB slot.
Nevertheless, having internal fragments is still better than having external fragments, as they do not require a dedicated cleaner and are more predictable (half a page wasted per allocation on average).
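A quick back-of-the-envelope sketch of that waste, using a hypothetical `pages_needed` helper:

```c
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Number of whole pages needed to hold `size` bytes (round up). */
static inline uint32_t pages_needed(uint32_t size) {
    return (size + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Example: pages_needed(5000) == 2, i.e. 8192 bytes are reserved
 * for a 5000-byte allocation, wasting 3192 bytes internally. */
```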
On x86, the physical address of the currently active process's page table is stored in the control register `CR3`. Our OS is responsible for loading and updating this register correctly. The page table format is specified at the hardware architecture level, and translation is done entirely by the CPU in hardware.
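As a taste of what this looks like in code, here is a minimal sketch of loading `CR3`, assuming 32-bit x86 and GCC-style inline assembly; the helper name `load_cr3` is hypothetical, not part of our codebase yet.

```c
#include <stdint.h>

/* Make the page table at `page_dir_phys` the active one by loading
 * its physical address into CR3. Assumes 32-bit x86 with GCC-style
 * inline assembly; `load_cr3` is a hypothetical helper name. */
static inline void load_cr3(uint32_t page_dir_phys) {
    asm volatile("mov %0, %%cr3" : : "r"(page_dir_phys) : "memory");
}
```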
Memory operations happen very frequently, so the page table translation process needs to be blazingly fast. It would be rather inappropriate to implement a page table as a linked list or similar dynamic data structure - querying a dynamic data structure is too slow. We have to implement a page table as a "plain table": the processor can then directly look up the `i`-th entry in the table to get the mapping info of the `i`-th page.
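The following sketch shows such a flat, single-level table for a 32-bit address space. The entry format here is a simplification we made up (frame address plus a present bit in bit 0), not the exact x86 layout:

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define NUM_PAGES  (1u << 20)   /* 4 GiB space / 4 KiB pages */

/* Simplified single-level page table: entry i describes page i.
 * Each entry holds the frame address, with bit 0 as a "present"
 * flag (a simplification of the real x86 entry format). */
static uint32_t page_table[NUM_PAGES];

/* O(1) translation: index the table directly with the page number. */
static int translate(uint32_t vaddr, uint32_t *phys_out) {
    uint32_t entry = page_table[vaddr >> PAGE_SHIFT];
    if (!(entry & 1u))
        return -1;                            /* page not mapped */
    *phys_out = (entry & ~0xFFFu) | (vaddr & 0xFFFu);
    return 0;
}
```

Note that the table itself already costs 2^20 entries × 4 bytes = 4 MiB per process, which is exactly the problem addressed next.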
A user process can be granted a very large virtual memory address space that is probably very sparsely mapped. Using a single-level page table wastes a lot of space, since even unmapped pages occupy an entry.
To alleviate this problem, we introduce multi-level paging. The basic idea is to first divide the virtual memory space into larger memory regions, each consisting of many pages. The top-level page table maps memory regions to lower-level page tables, and the bottom-level page tables eventually map pages to physical frames. Translating a page now takes slightly longer (as we need to do multiple lookups down the hierarchy), but it greatly saves space.
By convention, we denote the lowest-level page tables as level 1 (L1). See this figure by Philipp for a two-level paging example:
In this example, the memory regions starting from 10 000 up to 1 000 000 are completely free (all pages within them are free). Therefore, no L1 page tables are created for them.
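Under 32-bit x86's two-level scheme, a virtual address splits into a 10-bit directory index, a 10-bit table index, and a 12-bit offset. Here is a sketch of the walk, again using our simplified made-up entry format and assuming the L1 tables can be dereferenced at their physical addresses (identity mapping):

```c
#include <stdint.h>

/* Two-level walk on 32-bit x86: 10-bit directory index, 10-bit
 * table index, 12-bit page offset. Entry format simplified to
 * "frame address | present bit". */
static int translate_2level(const uint32_t *page_dir, uint32_t vaddr,
                            uint32_t *phys_out) {
    uint32_t dir_idx   = (vaddr >> 22) & 0x3FFu;
    uint32_t table_idx = (vaddr >> 12) & 0x3FFu;

    uint32_t dir_entry = page_dir[dir_idx];
    if (!(dir_entry & 1u))
        return -1;      /* this 4 MiB region has no L1 table at all */

    /* Assumes the L1 table is accessible at its physical address
     * (identity mapping); a real kernel must ensure this. */
    const uint32_t *l1 = (const uint32_t *)(dir_entry & ~0xFFFu);
    uint32_t entry = l1[table_idx];
    if (!(entry & 1u))
        return -1;      /* page not mapped */

    *phys_out = (entry & ~0xFFFu) | (vaddr & 0xFFFu);
    return 0;
}
```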
We can extend this design to three-level or four-level page tables. With more levels, we save more space, but translation also takes longer. This is a typical tradeoff in OS design.
Multi-level paging makes address translation slower. To boost performance of the memory management unit, we can use a translation lookaside buffer (TLB). Think of it as a cache of recently looked-up pages.
On x86, the TLB is also hardware-specific, and the CPU interacts with it implicitly when paging is the active memory management scheme. Yet the TLB is not fully transparent - our OS must manually update it whenever it updates a page table. This can be done by either of the following (see the sketch after this list):
- Using the `invlpg` instruction to invalidate a single page
- Reloading the `CR3` register to simulate a complete address space switch
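Both options can be wrapped as short helpers. Below is a minimal sketch using GCC-style inline assembly on 32-bit x86; the helper names `tlb_flush_page` and `tlb_flush_all` are ours, not from any existing code in this repo.

```c
#include <stdint.h>

/* Invalidate the TLB entry caching the translation of one virtual
 * address. */
static inline void tlb_flush_page(uint32_t vaddr) {
    asm volatile("invlpg (%0)" : : "r"(vaddr) : "memory");
}

/* Flush the entire TLB by rewriting CR3 with its current value,
 * simulating an address space switch. */
static inline void tlb_flush_all(void) {
    uint32_t cr3;
    asm volatile("mov %%cr3, %0" : "=r"(cr3));
    asm volatile("mov %0, %%cr3" : : "r"(cr3) : "memory");
}
```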
This chapter is purely theoretical, so we are not doing any coding here. We will soon start making these ideas concrete in the following chapters!
Current repo structure:
├── Makefile
├── scripts
│ ├── gdb_init
│ ├── grub.cfg
│ └── kernel.ld
├── src
│ ├── boot
│ │ ├── boot.s
│ │ ├── elf.h
│ │ └── multiboot.h
│ ├── common
│ │ ├── debug.c
│ │ ├── debug.h
│ │ ├── port.c
│ │ ├── port.h
│ │ ├── printf.c
│ │ ├── printf.h
│ │ ├── string.c
│ │ ├── string.h
│ │ ├── types.c
│ │ └── types.h
│ ├── device
│ │ ├── keyboard.c
│ │ ├── keyboard.h
│ │ ├── timer.c
│ │ └── timer.h
│ ├── display
│ │ ├── terminal.c
│ │ ├── terminal.h
│ │ └── vga.h
│ ├── gdt
│ │ ├── gdt-load.s
│ │ ├── gdt.c
│ │ └── gdt.h
│ ├── interrupt
│ │ ├── idt-load.s
│ │ ├── idt.c
│ │ ├── idt.h
│ │ ├── isr-stub.s
│ │ ├── isr.c
│ │ └── isr.h
│ └── kernel.c
Guanzhou Jose Hu @ 2021