
Booting Linux on a PURE RV32I RISC-V core - no M, no A, and no CSRs at all.
ping, 2 clocks, and chess. Base RV32I.
They said it could not be done.
Emulating this needs 100 lines of C++ code for the core, and 10 more for a trap unit.
How can we get preemptive multitasking, interrupts, and M/U mode transitions on pure RV32I? And I am talking PURE: any opcode outside the 40 base ISA instructions is invalid. Full kernel.
What's connected to it:
Full network stack (libslirp), package manager, desktop, stereo sound, keyboard and mouse, all running exclusively over memory-mapped IO, with thin Linux driver wrappers to talk to software inside Linux.
If you want to check it out or read the patches, they are in this branch: https://github.com/Gigantua/RiscVEmulator/tree/RiscV-LinuxOnBaseRV32I
I only want to mention the genuinely interesting parts.
Every instruction outside the Base RV32I ISA (40 instructions) is invalid.
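To make "outside the base ISA is invalid" concrete, here is a minimal sketch of a decode-time check. It follows the encodings in the ratified RV32I spec (37 compute/memory/branch instructions plus FENCE, ECALL, EBREAK). The function name `is_base_rv32i` is made up for this post and is not taken from the repo, which may structure its decoder differently.

```cpp
#include <cstdint>

// Hypothetical helper: true only for the 40 base RV32I encodings.
constexpr bool is_base_rv32i(uint32_t insn) {
    const uint32_t opcode = insn & 0x7Fu;
    const uint32_t funct3 = (insn >> 12) & 0x7u;
    const uint32_t funct7 = insn >> 25;
    switch (opcode) {
        case 0x37:                                   // LUI
        case 0x17:                                   // AUIPC
        case 0x6F:                                   // JAL
            return true;
        case 0x67: return funct3 == 0;               // JALR
        case 0x63: return funct3 != 2 && funct3 != 3; // BEQ BNE BLT BGE BLTU BGEU
        case 0x03: return funct3 <= 2 || funct3 == 4 || funct3 == 5; // LB LH LW LBU LHU
        case 0x23: return funct3 <= 2;               // SB SH SW
        case 0x13:                                   // OP-IMM
            if (funct3 == 1) return funct7 == 0x00;  // SLLI
            if (funct3 == 5) return funct7 == 0x00 || funct7 == 0x20; // SRLI SRAI
            return true;                             // ADDI SLTI SLTIU XORI ORI ANDI
        case 0x33:                                   // OP
            if (funct7 == 0x00) return true;         // ADD SLL SLT SLTU XOR SRL OR AND
            if (funct7 == 0x20) return funct3 == 0 || funct3 == 5; // SUB SRA
            return false;                            // funct7 == 0x01 would be the M extension
        case 0x0F: return funct3 == 0;               // FENCE
        case 0x73:                                   // SYSTEM: only ECALL and EBREAK
            return insn == 0x00000073u || insn == 0x00100073u;
        default:   return false;                     // e.g. 0x2F, the A extension
    }
}

static_assert(is_base_rv32i(0x00500093u), "addi x1, x0, 5 is base RV32I");
static_assert(!is_base_rv32i(0x023100B3u), "mul x1, x2, x3 is rejected");
static_assert(!is_base_rv32i(0x34129073u), "csrrw x0, mepc, x5 is rejected");
```

Anything that returns false here would go straight to the trap unit as an illegal instruction, which is why the kernel has to be patched rather than emulated.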
Gap 1: No multiply/divide. That one is easy. The compiler falls back to a software function call (__mulsi3, __divsi3, …) instead of emitting an opcode.
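For anyone who has not looked inside such a fallback: a minimal sketch of a shift-and-add multiply, which is the general idea behind a routine like __mulsi3. The name `soft_mul32` is made up here, and this is not libgcc's actual source; it only shows that add, shift, and, and branch are enough.

```cpp
#include <cstdint>

// Hypothetical stand-in for a software multiply routine.
// Uses only operations that exist in base RV32I: add, shift, and, branch.
constexpr uint32_t soft_mul32(uint32_t a, uint32_t b) {
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u) result += a;  // add the shifted multiplicand for each set bit
        a <<= 1;                  // next bit is worth twice as much
        b >>= 1;
    }
    return result;                // wraps modulo 2^32, like the mul opcode
}

static_assert(soft_mul32(6u, 7u) == 42u, "6 * 7");
static_assert(soft_mul32(0xFFFFFFFFu, 2u) == 0xFFFFFFFEu, "wraps like hardware mul");
```

The loop runs at most 32 times, so a multiply costs a few dozen base instructions instead of one opcode.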
Gap 2: No atomics. Lots of patching. Normally they could be a NOP, but we are not allowed to decode them at all, so they have to be patched away. See the repo. This is fundamentally a single-core processor.
Gap 3: There are no CSRs. A fixed RAM page (0x0F000000) holds what mtvec/mepc/mcause/mscratch/mstatus/mie would hold. A patch rewrites every csrr/csrw in the kernel's trap path into an ordinary lw/sw against that page. There is no special logic about where we jump to: assembly code sits there to read and store the registers. That means RV32I is executing microcode that does with a lw what the instruction would have done.
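Here is a minimal sketch of the CSR-page idea from the emulator's point of view. The page address comes from above; the slot order, the `Ram` type, and the `read_csr`/`write_csr` names are assumptions for illustration and are not the repo's layout.

```cpp
#include <cstdint>
#include <unordered_map>

constexpr uint32_t kCsrPage = 0x0F000000u;  // fixed RAM page from the post

// Byte offsets inside the page. The order is an assumption for this sketch.
enum CsrSlot : uint32_t {
    kMtvec = 0, kMepc = 4, kMcause = 8, kMscratch = 12, kMstatus = 16, kMie = 20
};

// Hypothetical sparse RAM: plain word loads and stores, nothing CSR-specific.
struct Ram {
    std::unordered_map<uint32_t, uint32_t> words;
    uint32_t lw(uint32_t addr) const {
        auto it = words.find(addr);
        return it == words.end() ? 0u : it->second;  // untouched RAM reads as 0
    }
    void sw(uint32_t addr, uint32_t value) { words[addr] = value; }
};

// What a patched `csrw mepc, t0` amounts to: sw t0, kMepc(page).
inline void write_csr(Ram& ram, CsrSlot slot, uint32_t value) {
    ram.sw(kCsrPage + slot, value);
}

// What a patched `csrr t0, mepc` amounts to: lw t0, kMepc(page).
inline uint32_t read_csr(const Ram& ram, CsrSlot slot) {
    return ram.lw(kCsrPage + slot);
}
```

The point is that the core never learns what a CSR is. It only sees loads and stores, and the meaning lives entirely in the patched kernel code.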
Gap 4: Interrupts without any CPU instruction. Single interrupt pin, and only four paths:
Timer/external interrupt hits a user process -> do_trap
Timer interrupt hits the kernel -> do_trap
ecall from userspace -> do_trap
Return to a user task -> trap_return
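To show how small a trap unit for these paths can be, here is a rough sketch. Everything in it is an assumption for illustration: the `Machine` struct, the slot order, the fixed `trap_entry` field, and the cause codes (borrowed from the standard mcause encoding, not checked against the repo). Interrupt masking is left out on purpose.

```cpp
#include <array>
#include <cstdint>

// Word indices inside the state page. The order is an assumption.
enum Slot : unsigned { kMepc = 0, kMcause = 1, kSlotCount = 2 };

// Cause codes borrowed from the standard mcause encoding (assumed).
constexpr uint32_t kCauseEcallFromU = 8u;
constexpr uint32_t kCauseTimerIrq   = 0x80000007u;

struct Machine {
    uint32_t pc = 0;
    uint32_t trap_entry = 0;                  // hypothetical: where the trap assembly sits
    bool irq_pin = false;                     // the single interrupt pin
    std::array<uint32_t, kSlotCount> page{};  // stands in for the RAM page at 0x0F000000
};

// The whole trap unit: record where we were and why, then jump.
inline void take_trap(Machine& m, uint32_t cause) {
    m.page[kMepc]   = m.pc;
    m.page[kMcause] = cause;
    m.pc            = m.trap_entry;
}

// Checked by the emulator loop between instructions.
inline bool poll_interrupt(Machine& m) {
    if (!m.irq_pin) return false;
    m.irq_pin = false;                        // sketch: treat the pin as edge-triggered
    take_trap(m, kCauseTimerIrq);
    return true;
}
```

An ecall and an interrupt take the same route through `take_trap`; only the cause differs. Sorting out user versus kernel and saving registers is left to the assembly at the entry point.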
Gap 5: No packages for this arch. A package repo runs on the host, cross-compiles any package that has source code for RV32I, and serves it on port 8080. On the machine you install via rvcpkg add. Since we have a memory-mapped eth0 with a working driver, this just works. Works fine for zork, the sl locomotive, and others.
Limitations: There is fundamentally no memory protection, but we get multitasking and full Linux access; wget etc. just work. Without an MMU there is fundamentally also no dynamic library loading, so framebuffer browsers could work, but none exist that build without dynamic loading.
Why is this exciting:
With the core code being constexpr and really simple, there is fundamentally nothing stopping us from running this in CUDA (think of warp divergence, so group the cores by the next instruction at pc, e.g. with Thrust) or in other experimental setups.
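The grouping idea can be tried on the host first. A minimal sketch, assuming each emulated core exposes its pc as a plain integer: order the core indices so that cores about to execute the same instruction end up next to each other. The name `group_by_pc` is made up; on a GPU the same step would presumably be a key-value sort such as Thrust's `sort_by_key`, which I have not tried here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: returns core indices ordered so that cores
// sharing a pc are adjacent (and could share a warp without diverging).
inline std::vector<std::size_t> group_by_pc(const std::vector<uint32_t>& pcs) {
    std::vector<std::size_t> order(pcs.size());
    std::iota(order.begin(), order.end(), std::size_t{0});  // 0, 1, 2, ...
    // stable_sort keeps the original order among cores with equal pc
    std::stable_sort(order.begin(), order.end(),
                     [&pcs](std::size_t a, std::size_t b) { return pcs[a] < pcs[b]; });
    return order;
}
```

Whether the regrouping cost pays for itself compared to just eating the divergence is an open question.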
Busybox boots in about 1.5 seconds.
The system is elegant. Preemptive interrupts, ecalls, and M mode fall out of the trap system as a side effect, greatly reducing CPU complexity.
"Save state, enter M-mode, vector." Done in assembly at 0x0F000000. This is the "new" trick I wanted to share. Lots of kernel patches later, we can have BASE RV32I and do the private register storing in assembly without a separate instruction, and get full preemptive multitasking and a modern Linux kernel with a network stack via a single lw instruction.