-
Notifications
You must be signed in to change notification settings - Fork 2
19. File System Interface
The block device driver layer grants us the ability to send fixed-sized I/O requests to the disk with physical disk-space addresses. With the experience from virtualizing the memory and virtualizing the CPU, you should have noticed that merely exposing block-level interfaces to user programs is not enough. We would like to provide a more intuitive interface of files for users to name theirs groups of data and hide physical addresses away for most of the time.
A file system (FS) is a part of an operating system that provides a set of file-related APIs and translates logical file operations into storage device requests. We will talk about the internals of a file system in the next chapter, which involves a lot of interesting trade-offs. In this chapter, let's first discuss what the basic files interface in Hux looks like.
Scan through them before going forth:
- Files & Directories chapter of the OSTEP book: logical abstraction of files
- Memory Hierarchy on wikipedia
In CPU time-sharing, we introduced the abstraction of processes. In the virtual memory system, we introduced the abstraction of address spaces. Here in the storage stack, we introduce the abstraction of files & directories. We will mostly follow the classic UNIX definition.
A file is a group of logically contiguous blocks of persistent data with a name. To be more specific, a file is often viewed as the combination of two parts ✭:
- Metadata: name of the file (typically a low-level identifier, e.g., UNIX inode number), its indexing structures mapping logical file offsets to actual block addresses, permission information, etc.;
- Data: actual blocks of data - the file content.
A directory is a special type of file, whose content being a list of directory entries. Typically, a directory entry is a pair of (human-readable string filename, the file's low-level identifier). Having directories allows us to have a flexible tree-like organization of files. (Figure from the OSTEP book.)
User process opens a file by giving its string name and walking through the directory tree for its low-level identifier. Given the identifier, the file system should be able to locate the file's metadata structure (e.g., UNIX inode). Then, the process reads/writes the open file at given offsets. It is the file system's job to serve these (possibly concurrent) requests in a safe and efficient manner.
Filenames in Hux are restricted to only displayable ASCII characters except some special symbols: /
, \
,
(space), and some more. Maximum length of a filename is 32 bytes.
By convention, we specify a certain file under a directory tree by giving a pathname - filenames concatenated with /
(slash) symbols.
- A single
/
represents the root directory of the file system; -
/A/B/C
is an example of an absolute path, which meansC
underB
underA
under root; -
A/B/C
is an example of a relative path, which meansC
underB
underA
, where the nameA
is findable at the current path; -
.
is a special shortcut symbol meaning the current path; -
..
is a special combination meaning the upper-level (parent) path of the current path.
Hux will provide the following list of the most basic file APIs through syscalls ✭:
- Open: takes in a string name and returns an open file handle (throug a file descriptor integer)
- Close: closes an open file handle
- Read: reads contiguous bytes of data from a file into a memory buffer, implicitly modifies the file offset
- Write: writes contiguous bytes of data from a memory buffer into a file, implicitly modifies the file offset
- Create: creates a file or directory under a directory
- Remove: removes a certain file or directory
- Chdir: changes the current working directory of caller process
- Getcwd: get the absolute path string of current working directory
- Stat: get information about a file, e.g., size, type, access time, etc.
There are other file operations typically supported in UNIX systems:
- Rename: rename a file from one string name to the other, possibly changing directory location
- Dup: duplicate a new file descriptor that refers to the same open file structure as an old one
- Seek: jumping to given offsets in a file, so that reads/writes could be non-sequential
-
Hard/Soft (symbolic) linking: creating directory entries that ultimately refer to the same inode of a file (hence the exact same file); under the possibility of having links, removing a file in UNIX is done through an
unlink()
syscall, which decrements the link count of the inode and, only when it goes to zero, actually deletes the file - Permission bits & access control
- Mounting a file system at a mount point somewhere in a directory tree, interpreting that path as the root path of the mounted file system
Please see this chapter of the OSTEP book and the Linux man pages for more information.
You may have heard that in UNIX-flavor systems, everything is file. UNIX systems have a much more general file interface than the one described above. Besides normal files & directories, almost any resource of the system are exposed as files of special types (kinda like "fake" files). You could access a raw block device through /dev/xxxx
. You could check for run-time system information with sysfs under /proc/yyyy
. You could create volatile log files at /var/zzzz
. You could create pipes for flexible inter-process communication, etc.
For normal files, there could also be multiple file system implementations mounted at different mount points under the directory tree. All of these files share the same virtual file system (VFS) layer interface, so that users access them through a unified set of syscalls.
Though Hux does not strictly follow this design principle, it is still worth mentioning that such elegant design brings great flexibility and compatibility.
One reason for making the file system a layered software is the hierarchical nature of storage hardware. There is a common trade-off in hardware manufacturing between performance and capacity. Devices with higher performance tends to have limited capacity (due to laws of physics). Hence, in a Von-Neuman computer architecture, there is typically a hierarchy of memory devices named the memory hierarchy (or storage hierarchy):
- CPU registers (fastest, smallest capacity)
- On-chip SRAM CPU cache
- On-chip DRAM main memory
- Remote socket memory if on NUMA machines
- Fast persistent storage, e.g., Optane NVM, flash SSDs
- Slow persistent storage, e.g., traditional HDDs
- Archival storage, e.g., CD-ROMs, tapes (slowest, largest capacity)
Under this situation, file systems typically need to define their interfaces carefully so that, along with caching and device-specific optimizations, the workloads they face will minimize expensive data transfer between layers.
This section is not specific to this chapter but is a note on the order of wiki pages on concurrency. By far, we have applied synchronization (protection of critical sections) only in the form of disabling/enabling interrupts. Unfortunately, this method is not enough as we enter the file system world.
Consider the case of reading a data structure from disk. We would like to lock the data structure up during the read so that other processes cannot make changes to that data on disk in the middle. However, disk I/Os are generally slow so we are doing interrupt-based I/O, which means the process goes to sleep until the request completion interrupt wakes it up. In this case, simply disbaling interrupts by cli_push()
before the disk I/O leaves interrupts off when other processes get scheduled in this gap, which is incorrect.
Therefore, it is time to introduce mutex locks into the system and upgrade our previous synchronization points into using locks. Please read the first concurrency chapter on "Adopting Mutex Locks" first, before continuing on file systems ✭.
This chapter is purely theoretical so we are not doing any coding here. We will soon start concretizing file system implementation in the next chapter!
Current repo structure:
hux-kernel
├── Makefile
├── scripts
│ ├── gdb_init
│ ├── grub.cfg
│ └── kernel.ld
├── src
│ ├── boot
│ │ ├── boot.s
│ │ ├── elf.h
│ │ └── multiboot.h
│ ├── common
│ │ ├── debug.c
│ │ ├── debug.h
│ │ ├── intstate.c
│ │ ├── intstate.h
│ │ ├── port.c
│ │ ├── port.h
│ │ ├── printf.c
│ │ ├── printf.h
│ │ ├── string.c
│ │ ├── string.h
│ │ ├── types.c
│ │ └── types.h
│ ├── device
│ │ ├── idedisk.c
│ │ ├── idedisk.h
│ │ ├── keyboard.c
│ │ ├── keyboard.h
│ │ ├── sysdev.c
│ │ ├── sysdev.h
│ │ ├── timer.c
│ │ └── timer.h
│ ├── display
│ │ ├── sysdisp.c
│ │ ├── sysdisp.h
│ │ ├── terminal.c
│ │ ├── terminal.h
│ │ └── vga.h
│ ├── filesys
│ │ └── block.h
│ ├── interrupt
│ │ ├── idt-load.s
│ │ ├── idt.c
│ │ ├── idt.h
│ │ ├── isr-stub.s
│ │ ├── isr.c
│ │ ├── isr.h
│ │ ├── syscall.c
│ │ └── syscall.h
│ ├── memory
│ │ ├── gdt-load.s
│ │ ├── gdt.c
│ │ ├── gdt.h
│ │ ├── kheap.c
│ │ ├── kheap.h
│ │ ├── paging.c
│ │ ├── paging.h
│ │ ├── slabs.c
│ │ ├── slabs.h
│ │ ├── sysmem.c
│ │ └── sysmem.h
│ ├── process
│ │ ├── layout.h
│ │ ├── process.c
│ │ ├── process.h
│ │ ├── scheduler.c
│ │ ├── scheduler.h
│ │ ├── switch.s
│ │ ├── sysproc.c
│ │ └── sysproc.h
│ └── kernel.c
├── user
│ ├── lib
│ │ ├── debug.h
│ │ ├── malloc.c
│ │ ├── malloc.h
│ │ ├── printf.c
│ │ ├── printf.h
│ │ ├── string.c
│ │ ├── string.h
│ │ ├── syscall.h
│ │ ├── syscall.s
│ │ ├── syslist.s
│ │ ├── types.c
│ │ └── types.h
│ └── init.c
Please read the concurrency chapter on "Adopting Mutex Locks" first, before continuing on file systems.
Guanzhou Jose Hu @ 2021