u/dangi12012

Booting Linux on a RISC-V core PURE RV32I - no M, no A, and no CSRs at all.

ping, 2 clocks, and chess. Base RV32I.

They said it could not be done.

Emulating this needs 100 lines of C++ code for the core, and 10 more for a trap unit.

How can we get preemptive multitasking, interrupts, and M/U mode transitions on a pure RV32I core? I am talking PURE: any opcode outside the 40 base ISA instructions is invalid. Full kernel.

What's connected to it:
Full network stack (libslirp), package manager, desktop, stereo sound, keyboard and mouse, all running exclusively over memory-mapped IO, with thin Linux driver wrappers to talk to software within Linux.

If you want to check it out, or read the patches they are here in this branch: https://github.com/Gigantua/RiscVEmulator/tree/RiscV-LinuxOnBaseRV32I

I only want to mention the genuinely interesting parts.
Every instruction outside the Base RV32I ISA (40 instructions) is invalid.

Gap 1: No multiply/divide. That one is easy. Give the compiler a software fallback, so that it emits a
function call (__mulsi3, __divsi3, …) instead of an opcode.
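To make the fallback concrete, here is a minimal sketch of what such helpers boil down to: multiply and unsigned divide built only from shifts, adds, subtracts and branches, all of which exist in base RV32I. The names soft_mul and soft_udiv are hypothetical stand-ins for this example, not the actual libgcc routines or the code in the repo.

```cpp
#include <cstdint>

// Hypothetical stand-in for a helper like __mulsi3: shift-and-add multiply.
// The low 32 bits of the product are the same for signed and unsigned inputs.
constexpr std::uint32_t soft_mul(std::uint32_t a, std::uint32_t b) {
    std::uint32_t result = 0;
    while (b != 0) {
        if (b & 1u) result += a;  // add the shifted multiplicand for each set bit of b
        a <<= 1;
        b >>= 1;
    }
    return result;
}

// Hypothetical stand-in for an unsigned divide helper: shift-subtract division.
// Returning all ones for a zero divisor is a choice made here; it mirrors what
// the RISC-V DIVU opcode is specified to return.
constexpr std::uint32_t soft_udiv(std::uint32_t n, std::uint32_t d) {
    if (d == 0) return 0xFFFFFFFFu;
    std::uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; --i) {
        r = (r << 1) | ((n >> i) & 1u);          // bring down the next dividend bit
        if (r >= d) { r -= d; q |= (1u << i); }  // subtract when the divisor fits
    }
    return q;
}

static_assert(soft_mul(6, 7) == 42);
static_assert(soft_udiv(100, 7) == 14);
```

A signed divide would wrap the unsigned one with sign handling; that part is omitted here.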

Gap 2: No atomics. Lots of patching. Normally they could be treated as a NOP, but we are not allowed to decode them at all, so they need to be patched away. See the repo. This is fundamentally a single-core processor.
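One common way to replace an atomic read-modify-write on a single core is a plain load/modify/store with interrupts masked around it. The sketch below models that idea on the host; it is an assumption about the general technique, not a description of the actual patches in the repo. SingleCore and fetch_add_nonatomic are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical model of a single core: one flag standing in for "interrupts enabled".
struct SingleCore {
    bool irq_enabled = true;
};

// Does what amoadd.w would do on one core: returns the old value and stores old + delta.
// With interrupts masked, nothing can preempt between the load and the store.
constexpr std::uint32_t fetch_add_nonatomic(SingleCore& core, std::uint32_t& word,
                                            std::uint32_t delta) {
    const bool saved = core.irq_enabled;
    core.irq_enabled = false;        // mask interrupts
    const std::uint32_t old = word;  // plain lw
    word = old + delta;              // add, then plain sw
    core.irq_enabled = saved;        // restore the previous interrupt state
    return old;
}
```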

Gap 3: There are no CSRs. A fixed RAM page (0x0F000000) holds what mtvec/mepc/mcause/mscratch/mstatus/mie would normally hold. Every csrr/csrw in the kernel's trap path is rewritten by a patch into an ordinary lw/sw against that page. There is no special logic about where a trap jumps to: assembly code sits at the target and reads and stores the registers. In other words, RV32I is executing microcode that does what the CSR instruction would have done, using a plain lw.
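As an illustration of the csrr-to-lw idea, the sketch below encodes the two base instructions that could stand in for `csrr t0, mepc`. Only the page address 0x0F000000 comes from the post; the slot order, the offsets, and the helper names are assumptions made for this example, and the real patches may work at source level instead.

```cpp
#include <array>
#include <cstdint>

constexpr std::uint32_t kCsrPage = 0x0F000000u;  // page address from the post

// Hypothetical word slots inside the page (the real layout may differ).
enum CsrSlot : std::uint32_t { MSTATUS = 0, MIE = 1, MTVEC = 2, MSCRATCH = 3, MEPC = 4, MCAUSE = 5 };

constexpr std::uint32_t slot_address(CsrSlot s) { return kCsrPage + 4u * s; }

// lui rd, upper20 : U-type, opcode 0x37
constexpr std::uint32_t encode_lui(std::uint32_t rd, std::uint32_t upper20) {
    return (upper20 << 12) | (rd << 7) | 0x37u;
}

// lw rd, imm(rs1) : I-type, opcode 0x03, funct3 = 0b010
constexpr std::uint32_t encode_lw(std::uint32_t rd, std::uint32_t rs1, std::uint32_t imm12) {
    return ((imm12 & 0xFFFu) << 20) | (rs1 << 15) | (2u << 12) | (rd << 7) | 0x03u;
}

// "csrr rd, <slot>" becomes: lui rd, 0x0F000 ; lw rd, offset(rd)
// This works because the page base has its low 12 bits clear.
constexpr std::array<std::uint32_t, 2> rewrite_csrr(std::uint32_t rd, CsrSlot s) {
    return {encode_lui(rd, kCsrPage >> 12), encode_lw(rd, rd, 4u * s)};
}

static_assert(slot_address(MEPC) == 0x0F000010u);
static_assert(rewrite_csrr(5, MEPC)[0] == 0x0F0002B7u);  // lui t0, 0x0F000
static_assert(rewrite_csrr(5, MEPC)[1] == 0x0102A283u);  // lw  t0, 16(t0)
```

A csrw replacement needs a scratch register for the address, since the value register must stay intact until the sw.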

Gap 4: Interrupts without any dedicated CPU instruction. There is a single interrupt pin, and these are the paths through the trap code:
Timer/external interrupt hits a user process -> do_trap
Timer interrupt hits the kernel -> do_trap
ecall from userspace -> do_trap
Return to a user task -> trap_return
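The paths above can be sketched on the emulator side as one small trap unit. This is a guess at the shape, under stated assumptions: the slot positions in the page, the entry offset 0x100, and the names Core and trap_unit are all hypothetical. The mcause values and the ECALL encoding are the standard RISC-V ones.

```cpp
#include <array>
#include <cstdint>

constexpr std::uint32_t kPageBase = 0x0F000000u;  // page address from the post
// Hypothetical layout for this sketch: word slots in the page and the entry offset.
constexpr std::uint32_t kMieWord = 1, kMepcWord = 4, kMcauseWord = 5;
constexpr std::uint32_t kEntryOffset = 0x100u;

constexpr std::uint32_t kEcallInsn = 0x00000073u;      // ECALL is part of base RV32I
constexpr std::uint32_t kCauseEcallU = 8u;             // standard mcause: ecall from U-mode
constexpr std::uint32_t kCauseTimerIrq = 0x80000007u;  // standard mcause: machine timer interrupt

struct Core {
    std::uint32_t pc = 0;
    bool irq_pin = false;                  // the single interrupt pin
    std::array<std::uint32_t, 64> page{};  // models the first words of the RAM page
};

// Called before executing `insn`. Returns true if a trap was taken.
constexpr bool trap_unit(Core& core, std::uint32_t insn) {
    std::uint32_t cause = 0;
    if (core.irq_pin && core.page[kMieWord] != 0) cause = kCauseTimerIrq;
    else if (insn == kEcallInsn) cause = kCauseEcallU;
    else return false;
    core.page[kMepcWord] = core.pc;      // where to resume later
    core.page[kMcauseWord] = cause;      // why we trapped
    core.pc = kPageBase + kEntryOffset;  // continue in the assembly stub
    return true;
}

constexpr Core demo_ecall() {
    Core c{};
    c.pc = 0x1000u;
    trap_unit(c, kEcallInsn);
    return c;
}
static_assert(demo_ecall().pc == 0x0F000100u);
static_assert(demo_ecall().page[kMepcWord] == 0x1000u);
```

From there, everything else (saving registers, picking the handler, returning) is ordinary RV32I code reading and writing the page.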

Gap 5: No packages for this arch. A package repo runs on the host, cross-compiles any package that has source code available to RV32I, and serves the result on port 8080. On the machine, packages are installed via rvcpkg add. Since we have a memory-mapped eth0 with a working driver, this just works. It works fine for zork, the sl locomotive, and others.

Limitations: There is fundamentally no memory protection, but we get multitasking and full Linux access; wget etc. just work. Without an MMU there is also no dynamic library loading in this setup, so framebuffer browsers could work in principle, but none exist that build without dynamic loading.

Why is this exciting:

With the core code being constexpr and really simple, there is fundamentally nothing stopping us from running this in CUDA (think of warp divergence, so thrust_group by next instruction at pc) or in other experimental setups.
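To show what "core code as constexpr" can look like, here is a minimal sketch of a side-effect-free step function covering just ADDI and ADD. It is not the core from the repo; Cpu, step and run are names made up for this example, and it assumes C++17.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

struct Cpu {
    std::uint32_t pc = 0;
    std::array<std::uint32_t, 32> x{};  // integer registers, x0 stays zero
};

// One pure step: takes a CPU state and returns the next one, no side effects.
template <std::size_t N>
constexpr Cpu step(Cpu cpu, const std::array<std::uint32_t, N>& mem) {
    const std::uint32_t insn = mem[cpu.pc / 4];
    const std::uint32_t opcode = insn & 0x7Fu;
    const std::uint32_t rd = (insn >> 7) & 0x1Fu;
    const std::uint32_t funct3 = (insn >> 12) & 0x7u;
    const std::uint32_t rs1 = (insn >> 15) & 0x1Fu;
    const std::uint32_t rs2 = (insn >> 20) & 0x1Fu;
    const std::uint32_t funct7 = insn >> 25;
    // Sign-extended I-type immediate (relies on arithmetic right shift).
    const std::uint32_t imm_i =
        static_cast<std::uint32_t>(static_cast<std::int32_t>(insn) >> 20);

    std::uint32_t result = 0;
    if (opcode == 0x13u && funct3 == 0u) {
        result = cpu.x[rs1] + imm_i;       // ADDI
    } else if (opcode == 0x33u && funct3 == 0u && funct7 == 0u) {
        result = cpu.x[rs1] + cpu.x[rs2];  // ADD
    } else {
        return cpu;  // not handled in this sketch: state is left unchanged
    }
    if (rd != 0u) cpu.x[rd] = result;      // writes to x0 are discarded
    cpu.pc += 4u;
    return cpu;
}

template <std::size_t N>
constexpr Cpu run(Cpu cpu, const std::array<std::uint32_t, N>& mem, int steps) {
    for (int i = 0; i < steps; ++i) cpu = step(cpu, mem);
    return cpu;
}

// addi x1, x0, 5 ; addi x2, x1, 7 ; add x3, x1, x2
constexpr std::array<std::uint32_t, 3> kProgram = {0x00500093u, 0x00708113u, 0x002081B3u};

// The whole program is evaluated by the compiler.
static_assert(run(Cpu{}, kProgram, 3).x[3] == 17);
```

Because step() is a pure function of its inputs, the same body can in principle be compiled for a GPU kernel or evaluated at compile time, which is the property the paragraph above relies on.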

Busybox boots in about 1.5 seconds.

The system is elegant. Preemptive interrupts, ecalls, and M mode all fall out of the trap system as a side effect, which greatly reduces CPU complexity.

"Save state, enter M-mode, vector." Done in assembly at 0x0F000000. This is the "new" trick I wanted to share. Lots of kernel patches later, we have BASE RV32I doing the private register storing in assembly without a separate instruction, which gives full preemptive multitasking and a modern Linux kernel with a network stack, all through a single lw instruction.

u/dangi12012 — 3 days ago
▲ 11 r/RISCV

RISCV - NO MMU - Busybox - Linux Desktop - Internet - package install

How far do we get with RISCV RV32I + MAF + Zicsr? That is around 200 lines of C++ code for the CPU, nothing else.

We wanted to see how far we get with memory-mapped devices: keyboard, sound, MIDI, mouse, framebuffer, NETWORK, socket() support. As far as I can see, this is the first constexpr step() function (so no side effects that would hinder constant evaluation of many CPU steps).

Busybox with FULL socket support. That means pinging Google and wget both work. On top of that there is a package server that cross-compiles any package available as source code to rvpkg rv32i, so we can run rvpkg within the busybox guest, pick any of the 3000 packages, and either bake them into the busybox image or serve them at runtime!

This is the absolute bare-bones emulated CPU, with infrastructure showcasing how far we can get even with no MMU, far surpassing what busybox is supposed to be: applications install and run, and buttons for newly installed applications appear on the desktop.

Busybox Microwindows - crosscompiled rvpkg available. Linux is just one of the demos. Take a look:
https://github.com/Gigantua/RiscVEmulator

https://preview.redd.it/q7gg7uljna1h1.png?width=1781&format=png&auto=webp&s=ea7149dbc99852be769d6c251319225641aff47f

u/dangi12012 — 7 days ago

Made a simple online tool for minimizing Boolean logic: https://www.logic-solve.com/

Drop in a truth table or PLA and it gives you the optimized version along with Verilog/VHDL/C output and a K-map view. Runs locally in the browser for smaller designs.

Figured it might be useful for some of you working on digital stuff, especially folks building hardcore transistor or relay computers. With this you can minimize your designs.

u/dangi12012 — 19 days ago