u/Embarrassed-Mess412

▲ 15 r/csharp

C# Networking Deep Dive - io_uring from scratch part 4 - Zero Copy Receive

Rabbit holes and disappointment define part 4.

io_uring zcrx (zero copy receive) looks great on paper and in presentations, but implementing it is a pain in the ass. It is very far from being widely used, requires specialized hardware (a NIC that supports it), and leaks into the application design. I'd say that nowadays its only use case is very specialized solutions.

Part 4 TL;DR:

zcrx allows registering a memory area that the NIC DMAs received packets' payloads into. This eliminates the extra copy the kernel makes when the NIC DMAs into kernel memory instead. The whole shebang comes with a hefty price though: we no longer own the buffers, and it becomes harder to isolate memory per thread (ensuring threads don't access each other's memory). That degrades memory caching per CPU thread and makes it a lot more complex to ensure that each reactor/worker never thread hops.

On top of that, I can't test it because I don't have a NIC that supports it, so who knows if it is any good.

I'm not too happy with this part 4, but it can still be insightful to read about the zero copy Rx possibility, even though my guess is that it will likely never see the light of day.

mda2av.github.io
u/Embarrassed-Mess412 — 4 days ago
▲ 30 r/csharp+1 crossposts

C# Networking Deep Dive - io_uring from scratch part 3 - Touching the Bytes

Part 3 extends the async model to include the actual data bytes pushed by the kernel and explains the mechanism to avoid allocations on the read path. This is achieved through MemoryManager<byte>, which exposes views over the shared memory buffers/slabs.

Part 3 TL;DR:

The recv buffers live in a single _bufSlab allocated once during reactor init. The kernel picks slots from this provided buffer ring and writes the received bytes into them, so we never allocate per recv in user space. UnmanagedMemoryManager exposes each slot as Memory<byte> to provide compatibility with most BCL APIs.
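The slot-as-Memory idea can be sketched roughly as follows. This is a minimal illustration and not the code from the series: the constructor shape of UnmanagedMemoryManager, the SlotSize/SlotCount values, and the use of NativeMemory as a stand-in for the slab are all assumptions. It needs .NET 6+ and AllowUnsafeBlocks.

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;
using System.Text;

const int SlotSize = 4096, SlotCount = 4;   // hypothetical sizes, for illustration only

unsafe
{
    // One allocation for all slots, standing in for the reactor's _bufSlab.
    byte* slab = (byte*)NativeMemory.AllocZeroed((nuint)(SlotSize * SlotCount));
    try
    {
        // Pretend the kernel completed a recv of 5 bytes into slot 2.
        int bufferId = 2, received = 5;
        byte* slot = slab + bufferId * SlotSize;
        Encoding.ASCII.GetBytes("hello", new Span<byte>(slot, SlotSize));

        // Wrap the slot without copying; Memory<byte> is what most BCL APIs accept.
        var manager = new UnmanagedMemoryManager(slot, received);
        Memory<byte> view = manager.Memory;
        Console.WriteLine(Encoding.ASCII.GetString(view.Span));
    }
    finally
    {
        NativeMemory.Free(slab);
    }
}

// Assumed shape: a MemoryManager<byte> over a raw pointer it does not own.
unsafe sealed class UnmanagedMemoryManager : MemoryManager<byte>
{
    private readonly byte* _ptr;
    private readonly int _length;

    public UnmanagedMemoryManager(byte* ptr, int length)
    {
        _ptr = ptr;
        _length = length;
    }

    public override Span<byte> GetSpan() => new Span<byte>(_ptr, _length);

    // Unmanaged memory never moves, so pinning just hands out the address.
    public override MemoryHandle Pin(int elementIndex = 0)
    {
        if ((uint)elementIndex > (uint)_length)
            throw new ArgumentOutOfRangeException(nameof(elementIndex));
        return new MemoryHandle(_ptr + elementIndex);
    }

    public override void Unpin() { }

    // The slab is freed by whoever allocated it, not by the manager.
    protected override void Dispose(bool disposing) { }
}
```

Note that this demo allocates a manager object per wrap; to stay allocation-free on the read path, one manager per slot would presumably be created once up front and reused.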

The kernel itself still copies the bytes from the socket receive buffer into _bufSlab[bufferId]; avoiding that copy requires different mechanisms that are outside the scope of part 3. Zero copy Rx on the kernel side will be covered in part 4, see "io_uring zero copy Rx" in the Linux kernel documentation.

mda2av.github.io
u/Embarrassed-Mess412 — 7 days ago
▲ 37 r/csharp+1 crossposts

C# Networking Deep Dive - io_uring from scratch part 2 - Bridging the async model

Part 2 takes the barebones io_uring loop from part 1 and bridges it into C#'s async/await without allocating anything per request. The trick is that each Connection implements its own awaitable so that await ReadAsync() reuses the same underlying object for the lifetime of the TCP connection instead of creating a new Task every time.

A bounded ring buffer sits between the kernel dispatcher and the application handler and absorbs bursts, so that back-to-back arrivals don't get dropped.
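A bounded ring of this kind could look something like the sketch below. BoundedRing, its power-of-two capacity, and the choice to return false when full are all assumptions made for illustration; the post does not say how a full ring is handled. It is single-threaded on purpose, since the dispatcher and the handler run on the same thread.

```csharp
using System;

var ring = new BoundedRing<int>(4);
for (int i = 1; i <= 3; i++) ring.TryEnqueue(i);    // a burst of three arrivals
while (ring.TryDequeue(out int item)) Console.WriteLine(item);

// Hypothetical single-threaded bounded ring; capacity must be a power of two.
sealed class BoundedRing<T>
{
    private readonly T[] _items;
    private readonly int _mask;
    private int _head;   // index of the next item to read
    private int _tail;   // index of the next free slot to write

    public BoundedRing(int capacity)
    {
        if (capacity <= 0 || (capacity & (capacity - 1)) != 0)
            throw new ArgumentException("capacity must be a power of two", nameof(capacity));
        _items = new T[capacity];
        _mask = capacity - 1;
    }

    public int Count => _tail - _head;

    // Returns false when the ring is full (assumed policy; the caller decides what to do).
    public bool TryEnqueue(T item)
    {
        if (_tail - _head == _items.Length) return false;
        _items[_tail++ & _mask] = item;
        return true;
    }

    public bool TryDequeue(out T item)
    {
        if (_head == _tail)
        {
            item = default!;
            return false;
        }
        item = _items[_head++ & _mask];
        return true;
    }
}
```

Masking with capacity - 1 replaces a modulo on every access, which is why the capacity is restricted to powers of two here.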

The win is in where the handler resumes after each await. By default the C# runtime hands continuations off to a thread pool worker, which adds overhead at millions of requests per second. By implementing the awaitable ourselves we can keep a "synchronous" continuation: when data arrives, the reactor simply wakes the handler (which runs on the same OS thread as the reactor) and it resumes inline, directly on top of the reactor's call stack. There is no thread pool hop, no scheduler, and no need for locks on connection state, as everything runs on the same thread.

The result is code that looks fully asynchronous (awaits everywhere) but in practice every iteration of the request loop runs on the same OS thread.
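One way to get a reusable awaitable with inline continuations is sketched below. It leans on the BCL's ManualResetValueTaskSourceCore<T>, which is an assumption on my side; the series may hand-roll the awaiter instead. The Connection members shown here (ReadAsync, Complete) are hypothetical names, and the "reactor" is just the main thread calling Complete.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

var conn = new Connection();
int reactorThread = Environment.CurrentManagedThreadId;

Task handler = HandleAsync(conn);   // runs until the first await, then returns
conn.Complete(128);                 // "data arrived": the handler resumes inside this call
conn.Complete(64);
conn.Complete(0);                   // 0 bytes ends the loop in this sketch
Console.WriteLine($"handler finished: {handler.IsCompleted}");

async Task HandleAsync(Connection c)
{
    while (true)
    {
        int n = await c.ReadAsync();   // same Connection object backs every await
        if (n == 0) break;
        bool inline = Environment.CurrentManagedThreadId == reactorThread;
        Console.WriteLine($"read {n} bytes, resumed on reactor thread: {inline}");
    }
}

// Hypothetical connection that is its own awaitable source, reused for every read.
sealed class Connection : IValueTaskSource<int>
{
    // Mutable struct: must not be readonly.
    private ManualResetValueTaskSourceCore<int> _core;

    public Connection()
    {
        // false means continuations run inline in SetResult, not on the thread pool.
        _core.RunContinuationsAsynchronously = false;
    }

    // No Task allocation: the ValueTask just points back at this object.
    public ValueTask<int> ReadAsync() => new ValueTask<int>(this, _core.Version);

    // Called by the reactor when a recv completes.
    public void Complete(int bytes) => _core.SetResult(bytes);

    int IValueTaskSource<int>.GetResult(short token)
    {
        int result = _core.GetResult(token);
        _core.Reset();   // ready for the next ReadAsync
        return result;
    }

    ValueTaskSourceStatus IValueTaskSource<int>.GetStatus(short token) =>
        _core.GetStatus(token);

    void IValueTaskSource<int>.OnCompleted(Action<object?> continuation, object? state,
        short token, ValueTaskSourceOnCompletedFlags flags) =>
        _core.OnCompleted(continuation, state, token, flags);
}
```

In a plain console app with no SynchronizationContext, the handler should resume inside each Complete call on the calling thread. The async method itself still allocates its state machine once per handler, but not once per read.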

The current implementation already delivers quite decent server performance even though the request parsing and handling logic is still missing; that will be covered in future parts.

mda2av.github.io
u/Embarrassed-Mess412 — 12 days ago
▲ 20 r/csharp+1 crossposts

This post is the first part in a deep dive series on io_uring. It describes a basic example of how to bypass every abstraction and use the kernel interface directly, for the highest possible efficiency in TCP networking using C# on Linux with io_uring.

mda2av.github.io
u/Embarrassed-Mess412 — 23 days ago