diff --git a/docs/src/README.md b/docs/src/README.md
index f36a038e..7d9a6e1c 100644
--- a/docs/src/README.md
+++ b/docs/src/README.md
@@ -1,3 +1,5 @@
+# Introduction
+
 This document describes KxOS, a secure, fast, and modern OS written in Rust.
 
 As the project is a work in progress, this document is by no means complete.
@@ -7,7 +9,7 @@ Despite the incompleteness, this evolving document serves several important purp
 2. To convey the vision of this project to partners and stakeholders.
 3. To serve as a blueprint for implementation.
 
-# Opportunities
+## Opportunities
 
 We believe now is the perfect time to start a new Rust OS project. We argue that if we are doing things right, the project can have a promising prospect to
@@ -61,13 +63,13 @@ Can such success stories be repeated in the field of OSes? I think so.
 There are some China's home-grown OSes like [openKylin](https://www.openkylin.top/index.php?lang=en), but all of them are based on Linux and lack a self-developed OS _kernel_. The long-term goal of KxOS is to fill this key missing core of the home-grown OSes.
 
-# Architecture Overview
+## Architecture Overview
 
 Here is an overview of the architecture of KxOS.
 
 ![architecture overview](images/arch_overview.png)
 
-# Features
+## Features
 
 **1. Security by design.** Security is our top priority in the design of KxOS. As such, we adopt the widely acknowledged security best practice of [least privilege principle](https://en.wikipedia.org/wiki/Principle_of_least_privilege) and enforce it in a fashion that leverages the full strengths of Rust. To do so, we partition KxOS into two halves: a _privileged_ OS core and _unprivileged_ OS components. As a result, we can write the OS components almost entirely in _safe_ Rust, while taking extra cautions with the _unsafe_ Rust code in the OS core.
 
 Furthermore, we propose the idea of _everything-is-a-capability_, which elevates the status of [capabilities](https://en.wikipedia.org/wiki/Capability-based_security) to the level of a ubiquitous security primitive used throughout the OS. We make novel use of Rust's advanced features (e.g., [type-level programming](https://willcrichton.net/notes/type-level-programming/)) to make capabilities more accessible and efficient. The net result is improved security and uncompromised performance.
diff --git a/docs/src/images/arch_comparison.png b/docs/src/images/arch_comparison.png
new file mode 100644
index 00000000..dd0066bc
Binary files /dev/null and b/docs/src/images/arch_comparison.png differ
diff --git a/docs/src/privilege_separation/README.md b/docs/src/privilege_separation/README.md
index e69de29b..09f5b02c 100644
--- a/docs/src/privilege_separation/README.md
+++ b/docs/src/privilege_separation/README.md
@@ -0,0 +1,29 @@
+# Privilege Separation
+
+One fundamental design goal of KxOS is to support _privilege separation_, i.e., the separation between the privileged OS core and the unprivileged OS components. The privileged portion is allowed to use the `unsafe` keyword to carry out dangerous tasks like accessing CPU registers, manipulating stack frames, and doing MMIO or PIO. In contrast, the unprivileged portion, which forms the majority of the OS, must be free from `unsafe` code. With privilege separation, the memory safety of KxOS boils down to the correctness of the privileged OS core, regardless of the correctness of the unprivileged OS components, thus reducing the size of the TCB significantly.
+
+To put privilege separation into perspective, let's compare the architectures
+of the monolithic kernels, microkernels, and KxOS.
+
+![Arch comparison](../images/arch_comparison.png)
+
+The diagram above highlights the characteristics of different OS architectures
+in terms of communication overheads and the TCB for memory safety.
+Thanks to privilege separation, KxOS promises the benefit of being _as safe as a microkernel and as fast as a monolithic kernel_.
+
+Privilege separation is an interesting research problem, prompting us to
+answer a series of technical questions.
+
+1. Is it possible to partition a Rust OS into the privileged and unprivileged halves? (If so, consider the following questions.)
+2. What are the safe APIs exposed by the privileged OS core?
+3. Can OS drivers be implemented as unprivileged code with help from the privileged OS core?
+4. How small can the privileged OS core be?
+
+To answer these questions, we present two case studies in the rest of this
+chapter: one is [the common syscall workflow](syscall_workflow.md) and the other
+is [the drivers for Virtio devices on PCI bus](pci_virtio_drivers.md). Based on the
+two case studies, we can confidently answer YES to Q1 and Q3. We also
+propose some key APIs of the privileged OS core, thus providing a partial answer
+to Q2. We cannot give a precise answer to Q4 until the privileged OS core is
+fully implemented, but the two case studies provide strong evidence that
+the final TCB will be much smaller than the entire OS.
\ No newline at end of file
diff --git a/docs/src/privilege_separation/pci_virtio_drivers.md b/docs/src/privilege_separation/pci_virtio_drivers.md
index e69de29b..aca7b7e4 100644
--- a/docs/src/privilege_separation/pci_virtio_drivers.md
+++ b/docs/src/privilege_separation/pci_virtio_drivers.md
@@ -0,0 +1,189 @@
+# Case Study 2: Virtio devices on PCI bus
+
+In our journey towards writing an OS without _unsafe_ Rust, a key obstacle is dealing with device drivers. Device drivers are the single largest contributor to OS complexity: in Linux, they constitute 70% of the code base. Because of their low-level nature, device driver code usually involves privileged tasks, like doing PIO or MMIO, accessing registers, registering interrupt handlers, etc.
+So the question is: can we figure out the right abstractions for the OS core to enable writing most driver code in unprivileged Rust?
+
+Luckily, the answer is YES, and this document explains why.
+
+We focus on Virtio devices on the PCI bus for two reasons. First, Virtio devices are the single most important class of devices for our target usage, VM-based TEEs. Second, PCI is the most important bus for the x86 architecture. Given the versatility of Virtio and the complexity of the PCI bus, a solution that works for Virtio devices on PCI is likely to work for other types of devices or buses as well.
+
+## The problem
+
+Here are some of the elements in PCI-based Virtio devices that may involve `unsafe` Rust.
+* Access PCI configuration space (doing PIO with `in`/`out` instructions)
+* Access PCI capabilities (specified by raw pointers calculated from BAR + offset)
+* Initialize Virtio devices (doing MMIO with raw pointers)
+* Allocate and initialize Virtio queues (managing physical pages)
+* Push/pop entries to/from Virtio queues (accessing physical memory with raw pointers)
+
+## The solution
+
+### PCI bus
+
+#### Privileged part
+
+```rust
+// file: kxos-core-libs/pci-io-port/lib.rs
+use x86::IoPort;
+
+/// The I/O port to write an address in the PCI
+/// configuration space.
+pub const PCI_ADDR_PORT: IoPort = {
+    // SAFETY: Writing to this I/O port won't affect
+    // any typed memory.
+    unsafe {
+        IoPort::new(0x0cf8, Rights![Wr])
+    }
+};
+
+/// The I/O port to read/write a value from the
+/// PCI configuration space.
+pub const PCI_DATA_PORT: IoPort = {
+    // SAFETY: Reading/writing this I/O port won't affect
+    // any typed memory.
+    unsafe {
+        IoPort::new(0x0cf8 + 0x04, Rights![Rd, Wr])
+    }
+};
+```
+
+#### Unprivileged part
+
+```rust
+// file: kxos-comps/pci/lib.rs
+use pci_io_port::{PCI_ADDR_PORT, PCI_DATA_PORT};
+
+/// The PCI configuration space, which enables the discovery,
+/// initialization, and configuration of PCI devices.
+pub struct PciConfSpace;
+
+impl PciConfSpace {
+    pub fn read_u32(bus: u8, slot: u8, offset: u32) -> u32 {
+        // Bit 31 enables the access; the register offset must be 4-byte aligned.
+        let addr = (1 << 31) |
+            ((bus as u32) << 16) |
+            ((slot as u32) << 11) |
+            (offset & 0xFC);
+        PCI_ADDR_PORT.write(addr);
+        PCI_DATA_PORT.read()
+    }
+
+    pub fn write_u32(bus: u8, slot: u8, offset: u32, val: u32) {
+        let addr = (1 << 31) |
+            ((bus as u32) << 16) |
+            ((slot as u32) << 11) |
+            (offset & 0xFC);
+        PCI_ADDR_PORT.write(addr);
+        PCI_DATA_PORT.write(val);
+    }
+
+    pub fn probe_device(bus: u8, slot: u8) -> Option<PciDeviceConfig> {
+        todo!("omitted...")
+    }
+}
+
+/// A scanner that probes all devices on the PCI bus.
+pub struct PciScanner {
+    bus_no: u8,
+    slot: u8,
+}
+
+impl Iterator for PciScanner {
+    type Item = PciDevice;
+
+    fn next(&mut self) -> Option<PciDevice> {
+        // Each bus has 32 device slots (0..=31).
+        while !(self.bus_no == 255 && self.slot == 32) {
+            if self.slot == 32 {
+                self.bus_no += 1;
+                self.slot = 0;
+            }
+
+            let config = PciConfSpace::probe_device(self.bus_no, self.slot);
+            let slot = self.slot;
+            self.slot += 1;
+
+            if let Some(config) = config {
+                todo!("convert the config to a device...")
+            }
+        }
+        None
+    }
+}
+
+/// A general PCI device.
+pub struct PciDevice {
+    // ...
+}
+
+/// The configuration of a general PCI device.
+pub struct PciDeviceConfig {
+    // ...
+}
+
+/// The capabilities of a PCI device.
+pub struct PciCapabilities {
+    // ...
+}
+```
+
+### Virtio
+
+Most Virtio driver code can be unprivileged, thanks to the `VmPager` and `VmCell` abstractions provided by the OS core.
+
+```rust
+// file: kxos-comp-libs/virtio/transport.rs
+
+/// The transport layer for configuring a Virtio device.
+pub struct VirtioTransport {
+    isr_cell: VmCell<u8>,
+    // ...
+}
+
+impl VirtioTransport {
+    /// Create a new instance.
+    ///
+    /// According to the Virtio spec, the transport layer for
+    /// configuring a Virtio device consists of four parts:
+    ///
+    /// * Common configuration structure
+    /// * Notification structure
+    /// * Interrupt Status Register (ISR)
+    /// * Device-specific configuration structure
+    ///
+    /// This constructor requires four pointers to these parts.
+    pub fn new(
+        common_cfg_ptr: PAddr,
+        isr_ptr: PAddr,
+        notifier: PAddr,
+        device_cfg: PAddr,
+    ) -> Result<Self> {
+        let isr_cell = Self::new_part(isr_ptr)?;
+        todo!("do more initialization...")
+    }
+
+    /// Write ISR.
+    pub fn write_isr(&self, new_val: u8) {
+        self.isr_cell.write(new_val).unwrap()
+    }
+
+    /// Read ISR.
+    pub fn read_isr(&self) -> u8 {
+        self.isr_cell.read().unwrap()
+    }
+
+    fn new_part<T>(part: PAddr) -> Result<VmCell<T>> {
+        let addr = part.as_ptr() as usize;
+        let page_addr = align_down(addr, PAGE_SIZE);
+        let page_offset = addr % PAGE_SIZE;
+
+        // Acquire access to the physical page
+        // that contains the part. If the physical page
+        // is not safe to access, e.g., when the page
+        // has been used by the kernel, then the acquisition
+        // will fail.
+        let vm_pager = VmPagerOption::new(PAGE_SIZE)
+            .paddr(page_addr)
+            .exclusive(false)
+            .build()?;
+        let vm_cell = vm_pager.new_cell(page_offset)?;
+        Ok(vm_cell)
+    }
+}
+```
diff --git a/docs/src/privilege_separation/syscall_workflow.md b/docs/src/privilege_separation/syscall_workflow.md
index e69de29b..e52d5a25 100644
--- a/docs/src/privilege_separation/syscall_workflow.md
+++ b/docs/src/privilege_separation/syscall_workflow.md
@@ -0,0 +1,419 @@
+# Case study 1: Common Syscall Workflow
+
+## Problem definition
+
+In a nutshell, the job of an OS is to handle system calls. While system calls may differ greatly in what they do, they share a common syscall handling workflow, which includes at least the following steps.
+
+* User-kernel switching (involving assembly code)
+* System call parameter parsing (which has to access CPU registers)
+* System call dispatching (which needs to _interpret_ integer values as the corresponding C types specified by the Linux ABI)
+* Per-syscall handling logic, which often involves accessing user-space memory (pointer dereference)
+
+It seems that each of these steps requires `unsafe` to some extent. So the question here is: **is it possible to design a syscall handling framework that has a clear cut between privileged and unprivileged code, allowing the user to handle system calls without `unsafe`?**
+
+The answer is YES. This document describes such a solution.
+
+## To `unsafe`, or not to `unsafe`, that is the question
+
+> The `unsafe` keyword has two uses: to declare the existence of contracts the compiler can't check, and to declare that a programmer has checked that these contracts have been upheld. --- The Rustonomicon
+
+> To isolate unsafe code as much as possible, it’s best to enclose unsafe code within a safe abstraction and provide a safe API. --- The Rust Book
+
+Many Rust programmers, sometimes even "professional" ones, do not fully understand when a function should or should not be marked `unsafe`. Check out [Kerla OS](https://github.com/nuta/kerla)'s `UserBufWriter` and `UserVAddr` APIs, which are classic examples of _seemingly safe_ APIs that are _unsafe_ in nature.
+
+```rust
+impl<'a> SyscallHandler<'a> {
+    pub fn sys_clock_gettime(&mut self, clock: c_clockid, buf: UserVAddr) -> Result<isize> {
+        let (tv_sec, tv_nsec) = match clock {
+            CLOCK_REALTIME => {
+                let now = read_wall_clock();
+                (now.secs_from_epoch(), now.nanosecs_from_epoch())
+            }
+            CLOCK_MONOTONIC => {
+                let now = read_monotonic_clock();
+                (now.secs(), now.nanosecs())
+            }
+            _ => {
+                debug_warn!("clock_gettime: unsupported clock id: {}", clock);
+                return Err(Errno::ENOSYS.into());
+            }
+        };
+
+        let mut writer = UserBufWriter::from_uaddr(buf, size_of::<c_time>() + size_of::<c_long>());
+        writer.write::<c_time>(tv_sec.try_into().unwrap())?;
+        writer.write::<c_long>(tv_nsec.try_into().unwrap())?;
+
+        Ok(0)
+    }
+}
+```
+
+```rust
+/// Represents a user virtual memory address.
+///
+/// It is guaranteed that `UserVaddr` contains a valid address, in other words,
+/// it does not point to a kernel address.
+///
+/// Futhermore, like `NonNull<T>`, it is always non-null. Use `Option<UserVaddr>`
+/// represent a nullable user pointer.
+#[derive(Debug, Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
+#[repr(transparent)]
+pub struct UserVAddr(usize);
+
+impl UserVAddr {
+    pub const fn new(addr: usize) -> Option<UserVAddr> {
+        if addr == 0 {
+            None
+        } else {
+            Some(UserVAddr(addr))
+        }
+    }
+
+    pub fn read<T>(self) -> Result<T> {
+        let mut buf: MaybeUninit<T> = MaybeUninit::uninit();
+        self.read_bytes(unsafe {
+            slice::from_raw_parts_mut(buf.as_mut_ptr() as *mut u8, size_of::<T>())
+        })?;
+        Ok(unsafe { buf.assume_init() })
+    }
+
+    pub fn write<T>(self, buf: &T) -> Result<usize> {
+        let len = size_of::<T>();
+        self.write_bytes(unsafe { slice::from_raw_parts(buf as *const T as *const u8, len) })?;
+        Ok(len)
+    }
+
+    pub fn write_bytes(self, buf: &[u8]) -> Result<usize> {
+        call_usercopy_hook();
+        self.access_ok(buf.len())?;
+        unsafe {
+            copy_to_user(self.value() as *mut u8, buf.as_ptr(), buf.len());
+        }
+        Ok(buf.len())
+    }
+}
+```
+
+Here, `UserVAddr::new` is a safe function that only rejects null, even though the doc comment promises that a `UserVAddr` never points to a kernel address. Moreover, `read::<T>` places no bound on `T`, so safe code can produce a value of any type (e.g., a `bool` or a reference) from arbitrary user-controlled bytes, which is undefined behavior.
+
+Interestingly, zCore makes almost exactly the same mistake.
+
+```rust
+impl Syscall<'_> {
+    /// finds the resolution (precision) of the specified clock clockid, and,
+    /// if buffer is non-NULL, stores it in the struct timespec pointed to by buffer
+    pub fn sys_clock_gettime(&self, clock: usize, mut buf: UserOutPtr<TimeSpec>) -> SysResult {
+        info!("clock_gettime: id={:?} buf={:?}", clock, buf);
+
+        let ts = TimeSpec::now();
+        buf.write(ts)?;
+
+        info!("TimeSpec: {:?}", ts);
+
+        Ok(0)
+    }
+}
+```
+
+```rust
+pub type UserOutPtr<T> = UserPtr<T, Out>;
+
+/// Raw pointer from user land.
+#[repr(transparent)]
+#[derive(Copy, Clone)]
+pub struct UserPtr<T, P: Policy>(*mut T, PhantomData<P>);
+
+impl<T, P: Policy> From<usize> for UserPtr<T, P> {
+    fn from(ptr: usize) -> Self {
+        UserPtr(ptr as _, PhantomData)
+    }
+}
+
+impl<T, P: Write> UserPtr<T, P> {
+    /// Overwrites a memory location with the given `value`
+    /// **without** reading or dropping the old value.
+    pub fn write(&mut self, value: T) -> Result<()> {
+        self.check()?; // check non-nullness and alignment
+        unsafe { self.0.write(value) };
+        Ok(())
+    }
+}
+```
+
+Here, a `UserPtr` can be created from any `usize` through the safe `From` impl, and, according to its comment, `check` only verifies non-nullness and alignment before `write` stores through the raw pointer. Safe code could therefore use it to overwrite kernel memory.
+
+The examples reveal two important considerations in designing KxOS:
+1. Exposing _truly_ safe APIs. The privileged OS core must expose _truly safe_ APIs: however buggy or careless the unprivileged OS components may be, they must _not_ be able to cause undefined behavior.
+2. Handling _arbitrary_ pointers safely. The safe API of the OS core must provide a safe way to deal with arbitrary pointers.
+
+With the two points in mind, let's get back to our main goal of privilege separation.
+
+## Code organization with privilege separation
+
+Our first step is to separate privileged and unprivileged code in the codebase of KxOS. To demonstrate a syscall handling framework, a minimal codebase may look like the following.
+
+```text
+.
+├── kxos
+│   ├── src
+│   │   └── main.rs
+│   └── Cargo.toml
+├── kxos-core
+│   ├── src
+│   │   ├── lib.rs
+│   │   ├── syscall_handler.rs
+│   │   └── vm
+│   │       ├── vmo.rs
+│   │       └── vmar.rs
+│   └── Cargo.toml
+├── kxos-core-libs
+│   ├── linux-abi-types
+│   │   ├── src
+│   │   │   └── lib.rs
+│   │   └── Cargo.toml
+│   └── pod
+│       ├── src
+│       │   └── lib.rs
+│       └── Cargo.toml
+├── kxos-comps
+│   └── linux-syscall
+│       ├── src
+│       │   └── lib.rs
+│       └── Cargo.toml
+└── kxos-comp-libs
+    └── linux-abi
+        ├── src
+        │   └── lib.rs
+        └── Cargo.toml
+```
+
+The ultimate build target of the codebase is the `kxos` crate, which is an OS kernel that consists of a privileged OS core (crate `kxos-core`) and multiple OS components (the crates under `kxos-comps/`).
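The split between the two halves can also be checked mechanically. Below is a minimal, single-file sketch, assuming Rust's built-in `unsafe_code` lint as the enforcement mechanism; in a real workspace, each crate under `kxos-comps/` and `kxos-comp-libs/` could instead carry `#![forbid(unsafe_code)]` at its crate root. The module names `privileged` and `component` and the helper `read_u32_at` are hypothetical illustrations, not KxOS APIs.

```rust
// Single-file sketch: `privileged` stands in for `kxos-core` and the crates
// under `kxos-core-libs/`; `component` stands in for a crate under
// `kxos-comps/`. Both module names and `read_u32_at` are hypothetical.

/// The privileged half: the only place where `unsafe` may appear.
mod privileged {
    /// Reads a little-endian `u32` at `offset` within `buf`.
    ///
    /// The bounds check makes this sound for every input, so callers
    /// never need `unsafe` themselves.
    pub fn read_u32_at(buf: &[u8], offset: usize) -> Option<u32> {
        let end = offset.checked_add(4)?;
        if end > buf.len() {
            return None;
        }
        // SAFETY: `offset..end` lies inside `buf` (checked above), and
        // `read_unaligned` has no alignment requirement.
        let raw = unsafe { core::ptr::read_unaligned(buf.as_ptr().add(offset) as *const u32) };
        Some(u32::from_le(raw))
    }
}

/// The unprivileged half: any `unsafe` block in here is a compile error,
/// and `forbid` cannot be overridden by a nested `#[allow]`.
#[forbid(unsafe_code)]
mod component {
    use super::privileged::read_u32_at;

    /// Sums all complete little-endian `u32` words in `buf`, wrapping on overflow.
    pub fn sum_words(buf: &[u8]) -> u32 {
        (0..buf.len() / 4)
            .filter_map(|i| read_u32_at(buf, i * 4))
            .fold(0u32, |acc, w| acc.wrapping_add(w))
    }
}

fn main() {
    // Two complete words (1 and 2) plus one trailing byte that is ignored.
    let bytes = [1u8, 0, 0, 0, 2, 0, 0, 0, 0xFF];
    println!("{}", component::sum_words(&bytes)); // prints 3
}
```

With one such attribute per unprivileged crate, a reviewer only needs to audit the crates that lack it for memory safety.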
+
+For the sake of privilege separation, only the crates `kxos` and `kxos-core`, along with the crates under `kxos-core-libs/`, are allowed to use the `unsafe` keyword. In contrast, the crates under `kxos-comps/`, along with their dependent crates under `kxos-comp-libs/`, are not allowed to use `unsafe` directly; they may only borrow the superpower of `unsafe` through the safe APIs exposed by `kxos-core` or the crates under `kxos-core-libs/`. In short, the memory safety of the OS relies only on a small, well-defined TCB that consists of the `kxos` and `kxos-core` crates plus the crates under `kxos-core-libs/`.
+
+In this setting, all system calls are implemented in the `linux-syscall` crate. We are about to show that the _safe_ API provided by `kxos-core` is powerful enough to enable a _safe_ implementation of `linux-syscall`.
+
+## Crate `kxos-core`
+
+For our purposes here, the two most relevant APIs provided by `kxos-core` are the abstractions for syscall handlers and virtual memory (VM).
+
+### Syscall handlers
+
+The `SyscallHandler` abstraction enables the OS core to hide the low-level, architecture-dependent aspects of the syscall handling workflow (e.g., user-kernel switching and CPU register manipulation), while allowing the unprivileged OS components to implement system calls.
+
+```rust
+// file: kxos-core/src/syscall_handler.rs
+
+pub trait SyscallHandler {
+    fn handle_syscall(&self, ctx: &mut SyscallContext);
+}
+
+pub struct SyscallContext { /* cpu states */ }
+
+pub fn set_syscall_handler(handler: &'static dyn SyscallHandler) {
+    todo!("set HANDLER")
+}
+
+pub(crate) fn syscall_handler() -> &'static dyn SyscallHandler {
+    // SAFETY: assumes `HANDLER` is only written once, during boot,
+    // before any system call can arrive.
+    unsafe { HANDLER }
+}
+
+static mut HANDLER: &'static dyn SyscallHandler = &DummyHandler;
+
+struct DummyHandler;
+
+impl SyscallHandler for DummyHandler {
+    fn handle_syscall(&self, ctx: &mut SyscallContext) {
+        ctx.set_retval(-Errno::ENOSYS);
+    }
+}
+```
+
+### VM capabilities
+
+The OS core provides two abstractions related to virtual memory management.
+* _Virtual Memory Address Region (VMAR)_. A VMAR represents a range of virtual address space. In essence, VMARs abstract away the architectural details regarding page tables.
+* _Virtual Memory Pager (VMP)_. A VMP represents a range of memory pages (yes, the memory itself, not the address space). VMPs encapsulate the management of physical memory pages and enable on-demand paging.
+
+Both VMARs and VMPs are _privileged_, as they need direct access to page tables and physical memory, which demands the use of `unsafe`.
+
+These two abstractions are adopted from similar concepts in Zircon ([Virtual Memory Address Regions (VMARs)](https://fuchsia.dev/fuchsia-src/reference/kernel_objects/vm_address_region) and [Virtual Memory Object (VMO)](https://fuchsia.dev/fuchsia-src/reference/kernel_objects/vm_object)), which zCore also implements.
+
+Interestingly, both VMARs and VMPs are [capabilities](../capabilities/README.md),
+an important concept that we will elaborate on later. They qualify as capabilities because they satisfy two properties: *non-forgeability* and *monotonicity*.
+Specifically, 1) a root VMAR or VMP can only be created via a few well-defined APIs exposed by the OS core, and 2) a child VMAR or VMP can only be derived from an existing VMAR or VMP, with more limited access to resources (e.g., a subset of the parent's address space, memory pages, or access permissions).
+
+## Crate `linux-syscall`
+
+Here we demonstrate how to leverage the APIs of `kxos-core` to implement system calls with safe Rust code in crate `linux-syscall`.
+
+```rust
+// file: kxos-comps/linux-syscall/src/lib.rs
+use kxos_core::{SyscallContext, SyscallHandler, Vmar};
+use linux_abi::{SyscallNum::*, UserPtr, RawFd, RawTimeVal, RawTimeZone};
+
+pub struct SampleHandler;
+
+impl SyscallHandler for SampleHandler {
+    fn handle_syscall(&self, ctx: &mut SyscallContext) {
+        let syscall_num = ctx.num();
+        let (a0, a1, a2, a3, a4, a5) = ctx.args();
+        match syscall_num {
+            SYS_GETTIMEOFDAY => {
+                let tv_ptr = UserPtr::new(a0 as usize);
+                let tz_ptr = UserPtr::new(a1 as usize);
+                let res = self.sys_gettimeofday(tv_ptr, tz_ptr);
+                todo!("set retval according to res");
+            }
+            SYS_SETRLIMIT => {
+                let resource = a0 as u32;
+                let rlimit_ptr = UserPtr::new(a1 as usize);
+                let res = self.sys_setrlimit(resource, rlimit_ptr);
+                todo!("set retval according to res");
+            }
+            _ => {
+                ctx.set_retval(-Errno::ENOSYS)
+            }
+        };
+    }
+}
+
+impl SampleHandler {
+    fn sys_gettimeofday(&self, tv_ptr: UserPtr<RawTimeVal>, _tz_ptr: UserPtr<RawTimeZone>) -> Result<()> {
+        if tv_ptr.is_null() {
+            return Err(Errno::EINVAL);
+        }
+
+        // Get the VMAR of this process
+        let vmar = self.thread().process().vmar();
+        let tv_val: RawTimeVal = todo!("get current time");
+        // Writing a value through an arbitrary pointer
+        // is safe because
+        // 1) the vmar refers to the memory in the user space;
+        // 2) the write_val method checks memory validity (no page faults).
+        //
+        // Note that the vmar of the OS kernel cannot be
+        // manipulated directly by any OS components outside
+        // the OS core.
+        vmar.write_val(tv_ptr, tv_val)?;
+        Ok(())
+    }
+
+    // `RawRLimit` is a placeholder name for Linux's `struct rlimit`.
+    fn sys_setrlimit(&self, resource: u32, rlimit_ptr: UserPtr<RawRLimit>) -> Result<()> {
+        if rlimit_ptr.is_null() {
+            return Err(Errno::EINVAL);
+        }
+
+        let vmar = self.thread().process().vmar();
+        // Reading a value through an arbitrary pointer is safe
+        // for reasons similar to the above code, plus one
+        // additional reason: the value is of a type `T: Pod`,
+        // i.e., Plain Old Data (POD).
+        let new_rlimit = vmar.read_val::<RawRLimit>(rlimit_ptr)?;
+        todo!("use the new rlimit value")
+    }
+}
+```
+
+## Crate `pod`
+
+This crate defines a marker trait `Pod`, which represents plain old data.
+
+```rust
+// file: kxos-core-libs/pod/src/lib.rs
+
+/// A marker trait for plain old data (POD).
+///
+/// A POD type `T: Pod` supports converting to and from arbitrary
+/// `mem::size_of::<T>()` bytes _safely_.
+/// For example, simple primitive types like `u8` and `i16`
+/// are POD types. But perhaps surprisingly, `bool` is not POD
+/// because the Rust compiler assumes that a byte of `bool`
+/// has a value of either `0` or `1`.
+/// Interpreting a byte of value `3` as a `bool` value is
+/// undefined behavior.
+///
+/// # Safety
+///
+/// Marking a non-POD type as POD may cause undefined behavior.
+/// A `Pod` type must also have no padding bytes, because
+/// `as_bytes` exposes every byte of the value.
+pub unsafe trait Pod: Copy + Sized {
+    fn new_from_bytes(bytes: &[u8]) -> Self {
+        *Self::from_bytes(bytes)
+    }
+
+    fn from_bytes(bytes: &[u8]) -> &Self {
+        // Ensure the size and alignment are ok
+        assert!(bytes.len() == core::mem::size_of::<Self>());
+        assert!((bytes.as_ptr() as usize) % core::mem::align_of::<Self>() == 0);
+
+        // SAFETY: size and alignment are checked above, and any
+        // bytes form a valid `Self` because `Self: Pod`.
+        unsafe {
+            &*(bytes.as_ptr() as *const Self)
+        }
+    }
+
+    fn from_bytes_mut(bytes: &mut [u8]) -> &mut Self {
+        // Ensure the size and alignment are ok
+        assert!(bytes.len() == core::mem::size_of::<Self>());
+        assert!((bytes.as_ptr() as usize) % core::mem::align_of::<Self>() == 0);
+
+        unsafe {
+            &mut *(bytes.as_mut_ptr() as *mut Self)
+        }
+    }
+
+    fn as_bytes(&self) -> &[u8] {
+        let ptr = self as *const Self as *const u8;
+        let len = core::mem::size_of::<Self>();
+        unsafe {
+            core::slice::from_raw_parts(ptr, len)
+        }
+    }
+
+    fn as_bytes_mut(&mut self) -> &mut [u8] {
+        let ptr = self as *mut Self as *mut u8;
+        let len = core::mem::size_of::<Self>();
+        unsafe {
+            core::slice::from_raw_parts_mut(ptr, len)
+        }
+    }
+}
+
+macro_rules! impl_pod_for {
+    ($($pod_ty:ty),* $(,)?) => {
+        $(unsafe impl Pod for $pod_ty {})*
+    };
+}
+
+impl_pod_for!(
+    u8, u16, u32, u64,
+    i8, i16, i32, i64,
+);
+
+unsafe impl<T: Pod, const N: usize> Pod for [T; N] {}
+```
+
+## Crate `linux-abi-types`
+
+```rust
+// file: kxos-core-libs/linux-abi-types
+use pod::Pod;
+
+pub type RawFd = i32;
+
+#[derive(Clone, Copy)]
+#[repr(C)]
+pub struct RawTimeVal {
+    pub sec: i64,
+    pub usec: i64,
+}
+
+unsafe impl Pod for RawTimeVal {}
+```
+
+## Crate `linux-abi`
+
+```rust
+// file: kxos-comp-libs/linux-abi
+pub use linux_abi_types::*;
+
+/// Linux x86-64 system call numbers.
+#[allow(non_camel_case_types)]
+pub enum SyscallNum {
+    SYS_READ = 0,
+    SYS_WRITE = 1,
+    SYS_GETTIMEOFDAY = 96,
+    SYS_SETRLIMIT = 160,
+    /* ... */
+}
+```
+
+## Wrap up
+
+I hope that this document has convinced you that with the right abstractions (e.g., `SyscallHandler`, `Vmar`, `Vmp`, and `Pod`), it is possible to write system calls (at least the main system call workflow) without _unsafe_ Rust.
\ No newline at end of file
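The claim in the `linux-syscall` section, that reading and writing through arbitrary user pointers is safe once every access is checked and the value type is `Pod`, can be illustrated with a self-contained sketch. It assumes a simplified model in which a process's user memory is a single byte buffer starting at address `base`. `UserSpace`, its `read_val`/`write_val` methods, and the one-variant `Errno` are hypothetical stand-ins rather than the actual `Vmar` API, and the `Pod` trait here is a stripped-down copy of the one above.

```rust
use core::mem::size_of;
use core::ops::Range;

/// Stripped-down copy of the `Pod` marker trait.
///
/// # Safety
/// Every bit pattern of `size_of::<Self>()` bytes must be a valid `Self`,
/// and `Self` must contain no padding bytes.
pub unsafe trait Pod: Copy + Sized {}

unsafe impl Pod for u8 {}
unsafe impl Pod for u32 {}
unsafe impl Pod for i64 {}
unsafe impl<T: Pod, const N: usize> Pod for [T; N] {}

#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(C)]
pub struct RawTimeVal {
    pub sec: i64,
    pub usec: i64,
}
// SAFETY: two `i64` fields under `repr(C)`: no padding, any bytes are valid.
unsafe impl Pod for RawTimeVal {}

/// Hypothetical error type with a single variant.
#[derive(Debug, PartialEq)]
pub enum Errno {
    EFAULT,
}

/// Hypothetical stand-in for a process's `Vmar`: user memory is modeled
/// as a byte buffer that starts at user address `base`.
pub struct UserSpace {
    base: usize,
    mem: Vec<u8>,
}

impl UserSpace {
    pub fn new(base: usize, len: usize) -> Self {
        Self { base, mem: vec![0; len] }
    }

    /// Maps user addresses `[addr, addr + len)` to indices into `mem`,
    /// rejecting any address outside the modeled range (including null
    /// when `base > 0`) and any arithmetic overflow.
    fn range(&self, addr: usize, len: usize) -> Result<Range<usize>, Errno> {
        let start = addr.checked_sub(self.base).ok_or(Errno::EFAULT)?;
        let end = start.checked_add(len).ok_or(Errno::EFAULT)?;
        if end > self.mem.len() {
            return Err(Errno::EFAULT);
        }
        Ok(start..end)
    }

    pub fn read_val<T: Pod>(&self, addr: usize) -> Result<T, Errno> {
        let r = self.range(addr, size_of::<T>())?;
        // SAFETY: `r` is in bounds, `read_unaligned` tolerates any
        // alignment, and `T: Pod` accepts any bit pattern.
        Ok(unsafe { core::ptr::read_unaligned(self.mem[r].as_ptr() as *const T) })
    }

    pub fn write_val<T: Pod>(&mut self, addr: usize, val: T) -> Result<(), Errno> {
        let r = self.range(addr, size_of::<T>())?;
        // SAFETY: `r` is in bounds and `write_unaligned` tolerates any
        // alignment; `T: Pod` has no padding, so `mem` stays initialized.
        unsafe { core::ptr::write_unaligned(self.mem[r].as_mut_ptr() as *mut T, val) };
        Ok(())
    }
}

fn main() {
    let mut user = UserSpace::new(0x1000, 64);
    let tv = RawTimeVal { sec: 1_700_000_000, usec: 42 };
    user.write_val(0x1008, tv).unwrap();
    assert_eq!(user.read_val::<RawTimeVal>(0x1008), Ok(tv));
    // Bogus user addresses produce an error instead of undefined behavior.
    assert_eq!(user.read_val::<u32>(0), Err(Errno::EFAULT));
    assert_eq!(user.read_val::<u32>(0x1000 + 62), Err(Errno::EFAULT));
}
```

`bool` deliberately gets no `Pod` impl here: `read_val::<bool>` over a byte of value `3` would be undefined behavior, and the `T: Pod` bound turns that mistake into a compile error.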