u/Background_Panic9344

System Design Interview Guide for Software Engineers
r/Cluely

Putting this together because every system design thread on here seems to start from scratch. This is the framework I use, refined over a bunch of FAANG and Series C/D loops, and it works whether you're prepping for a junior loop or a staff round.

The same skeleton works for almost any prompt: design Twitter, design Dropbox, design Uber, design a URL shortener. What changes is the depth in each phase and which tradeoffs you spend the most time on.

  1. Clarify requirements (5-7 min)

Functional first. What is the system supposed to do, who are the users, what are the core flows? Don't list 20 features. Pick 3-4 that are clearly in scope and confirm them with the interviewer. If they say "design Twitter", in scope is probably posting tweets, following users, and the home feed. Out of scope: DMs, ads, search, video.

Non-functional second. Read volume vs write volume, latency tolerance, consistency requirements, availability target. This is where you anchor the whole design. A read-heavy feed system looks nothing like a write-heavy event log.

If the interviewer is vague, just propose numbers and confirm. "Let's assume 200M DAU, 100:1 read to write, p99 under 200ms." They'll correct you if they care.

  2. Back-of-envelope estimation (optional, 2-3 min)

A lot of people overdo this. The point is to figure out whether you need to shard, whether you need a cache, and whether a single database can handle the load. If your write QPS is 100, you don't need Kafka. If it's 1M, you probably do.

The only numbers worth memorizing: seconds per day (86,400, round to ~100k), the orders of magnitude for L1 cache / disk / network latency, a rough write ceiling for a single MySQL primary (~5k writes/s; read replicas add read capacity, not write capacity), and single Redis instance throughput (~100k ops/s). The rest you can derive.
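Here's what that arithmetic looks like with the example assumptions from the requirements phase (200M DAU, 100:1 read to write). The reads-per-user count and the peak multiplier are made-up placeholders, so treat this as a sketch of the method, not real Twitter numbers.

```python
# Back-of-envelope sketch. DAU and the read:write ratio come from the example
# assumptions above; reads per user and the peak factor are made-up placeholders.
SECONDS_PER_DAY = 86_400  # round to ~100k when doing it in your head


def estimate(dau: int, reads_per_user_per_day: int,
             read_write_ratio: int, peak_factor: float) -> dict:
    reads_per_day = dau * reads_per_user_per_day
    read_qps = reads_per_day / SECONDS_PER_DAY
    write_qps = read_qps / read_write_ratio
    return {
        "read_qps": read_qps,
        "write_qps": write_qps,
        "peak_write_qps": write_qps * peak_factor,
    }


est = estimate(dau=200_000_000, reads_per_user_per_day=100,
               read_write_ratio=100, peak_factor=3)
for name, value in est.items():
    print(f"{name}: {value:,.0f}")
```

With these inputs it comes out to roughly 231k reads/s and ~2.3k writes/s on average, or ~7k writes/s at a 3x peak. That's past the single-primary ballpark above, which is your cue to talk about sharding or buffering writes.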

  3. API design (3-5 min)

Define the endpoints or RPC signatures for your core flows: POST /tweet, GET /feed, POST /follow. Include pagination, auth, and idempotency keys where they matter. This makes the rest of the design concrete: every endpoint maps to a path through your architecture.
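A minimal in-memory sketch of two of those endpoints, assuming a hypothetical `TweetService` whose tweet IDs increase over time. It shows an idempotency key making POST /tweet safe to retry, and a cursor (the last tweet ID the client saw) driving GET /feed pagination. No real framework or storage here; it's just to make the contract concrete.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author_id: int
    text: str


class TweetService:
    """Hypothetical in-memory stand-in for POST /tweet and GET /feed."""

    def __init__(self):
        self._ids = count(1)        # monotonically increasing tweet IDs
        self._tweets = []           # append-only, so oldest first
        self._idempotency = {}      # (author_id, idempotency_key) -> Tweet

    def create_tweet(self, author_id: int, text: str, idempotency_key: str) -> Tweet:
        # POST /tweet: a retry with the same key returns the original tweet
        # instead of creating a duplicate. Keys are scoped per author.
        scoped_key = (author_id, idempotency_key)
        if scoped_key in self._idempotency:
            return self._idempotency[scoped_key]
        tweet = Tweet(next(self._ids), author_id, text)
        self._tweets.append(tweet)
        self._idempotency[scoped_key] = tweet
        return tweet

    def get_feed(self, author_ids: set, limit: int = 20,
                 cursor: Optional[int] = None) -> Tuple[List[Tweet], Optional[int]]:
        # GET /feed?limit=...&cursor=...: newest first. The next page starts
        # strictly below the cursor, so new tweets don't shift the pages.
        page = []
        for tweet in reversed(self._tweets):
            if cursor is not None and tweet.tweet_id >= cursor:
                continue
            if tweet.author_id in author_ids:
                page.append(tweet)
                if len(page) == limit:
                    break
        # A full page may have more behind it; a short page means we're done.
        next_cursor = page[-1].tweet_id if len(page) == limit else None
        return page, next_cursor


svc = TweetService()
first = svc.create_tweet(1, "hello", idempotency_key="req-1")
retry = svc.create_tweet(1, "hello", idempotency_key="req-1")  # client retried
for i in range(5):
    svc.create_tweet(2, f"post {i}", idempotency_key=f"req-{i}")
page1, cursor = svc.get_feed({1, 2}, limit=3)
page2, cursor2 = svc.get_feed({1, 2}, limit=3, cursor=cursor)
print([t.tweet_id for t in page1], [t.tweet_id for t in page2])
```

Cursor pagination over an ID that only grows is the usual choice for feeds because offset pagination skips or repeats items when new tweets arrive between page loads.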

Don't skip this. Interviewers use API design to check whether you actually understand what the system is doing or you're just drawing boxes.

  4. High-level architecture (10-15 min)

Boxes and arrows. Client, load balancer, app servers, cache, primary DB, read replicas, async workers, message queue, downstream services. Walk through each core flow and trace the request path on the diagram.

Don't pre-optimize. Start simple. The interviewer will push you on bottlenecks; that's the next phase, not this one.

  5. Deep dives on bottlenecks (15-20 min)

This is the part candidates underrate and where the offer is actually decided. The interviewer will pick a component and ask "how does this scale to 10x." Common deep dive targets:

  • Feed generation: fan-out on write vs fan-out on read vs hybrid (Twitter's hot-celebrity problem). Know when to push and when to pull.
  • Database sharding: by user ID, by tweet ID, by time. Consistent hashing if rebalancing matters.
  • Caching: read-through, write-through, write-behind. Cache invalidation strategy. TTLs.
  • Hot keys / hot partitions: how do you keep one viral user from collapsing a shard?
  • Async processing: which writes can be eventually consistent. Message queue choice (Kafka vs SQS vs RabbitMQ) and why.
  • Failure modes: what happens if the cache goes down, if the primary goes down, if a region goes down.
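For the feed generation bullet, here's a toy version of the hybrid approach: fan-out on write for normal accounts, fan-out on read for accounts above a follower threshold. The `HybridFeed` class and the threshold value are made up for illustration; a real system would tune the cutoff and keep timelines in a shared store, not in-process dicts.

```python
from collections import defaultdict


class HybridFeed:
    """Toy hybrid fan-out: push posts from normal accounts, pull from celebrities."""

    def __init__(self, celebrity_threshold: int = 10_000):  # hypothetical cutoff
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.timelines = defaultdict(list)  # user -> precomputed items (push side)
        self.posts = defaultdict(list)      # author -> their own items (pull side)
        self.clock = 0                      # stand-in for a timestamp / sortable ID

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author) -> bool:
        return len(self.followers[author]) >= self.celebrity_threshold

    def post(self, author, text):
        self.clock += 1
        item = (self.clock, author, text)
        self.posts[author].append(item)
        if not self.is_celebrity(author):
            # Fan-out on write: one timeline insert per follower. Cheap reads,
            # but write cost grows with follower count.
            for user in self.followers[author]:
                self.timelines[user].append(item)
        # Celebrities skip the push; followers merge their posts at read time.

    def read_feed(self, user, limit: int = 10):
        items = set(self.timelines[user][-limit:])
        for author in self.following[user]:
            if self.is_celebrity(author):
                # Fan-out on read: pull recent posts from each followed celebrity.
                items.update(self.posts[author][-limit:])
        # The set de-dupes items pushed before an author crossed the threshold.
        return sorted(items, reverse=True)[:limit]


feed = HybridFeed(celebrity_threshold=2)
feed.follow("u1", "alice")
feed.follow("u1", "celeb")
feed.follow("u2", "celeb")
feed.post("alice", "a1")
feed.post("celeb", "c1")
print([text for _, _, text in feed.read_feed("u1")])
```

The tradeoff to say out loud: pushing a celebrity's post means one timeline write per follower (millions for a big account), while pulling everything on read makes every feed load fan out to every account the user follows. The hybrid caps both.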

Pick the 2-3 most interesting deep dives for the system you're designing and go hard on those. Better to nail 2 in depth than skim 5.
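If sharding is one of your deep dives, this is the kind of thing worth being able to sketch: a consistent hash ring with virtual nodes, so adding a shard only moves the keys that fall on the new shard's arcs. The shard names, the vnode count, and MD5 as the spread function are arbitrary choices for the sketch.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used only to spread keys evenly around the ring, not for security.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    def __init__(self, shards, vnodes: int = 100):
        self.vnodes = vnodes        # virtual nodes per shard smooth out the load
        self._points = []           # sorted (hash, shard) pairs on the ring
        self._hashes = []           # just the hashes, for bisect
        for shard in shards:
            self.add_shard(shard)

    def _rebuild(self):
        self._points.sort()
        self._hashes = [h for h, _ in self._points]

    def add_shard(self, shard: str):
        self._points.extend((_hash(f"{shard}#vn{i}"), shard) for i in range(self.vnodes))
        self._rebuild()

    def remove_shard(self, shard: str):
        self._points = [(h, s) for h, s in self._points if s != shard]
        self._rebuild()

    def shard_for(self, key: str) -> str:
        if not self._points:
            raise ValueError("ring has no shards")
        # Walk clockwise to the first virtual node at or after the key's hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_left(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]


ring = ConsistentHashRing(["db-0", "db-1", "db-2", "db-3"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.shard_for(k) for k in keys}
ring.add_shard("db-4")
moved = [k for k in keys if ring.shard_for(k) != before[k]]
print(f"moved {len(moved) / len(keys):.1%} of keys")
```

Compare with naive `hash(key) % N`: going from 4 to 5 shards remaps about 80% of keys (a key only stays put when h % 4 == h % 5), while the ring should move roughly a fifth of them, all onto the new shard.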

  6. Wrap up (last 2-3 min)

If you have time, summarize what you'd do next: monitoring, alerting, multi-region, cost. Not because you'll cover it, but because it signals you know there's a long tail of real-world concerns past the whiteboard.

What actually moves the needle in the interview

  • Talk through tradeoffs out loud. The interviewer doesn't care which DB you pick; they care whether you can articulate why. "Postgres because it's read-heavy and needs strong consistency on the write path; DynamoDB if we needed predictable single-digit-ms latency at scale and could give up joins."
  • Ask before you assume. Cheap to clarify, expensive to redesign at minute 35.
  • Don't hero-pitch a tool. If you say Kafka, be ready to explain partitioning, consumer groups, exactly-once semantics. Naming a tool you can't defend is worse than not naming it.
  • Time-box yourself. If you're at minute 25 and still on requirements, you're going to bomb. Move.
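On the Kafka point, these are the two behaviors worth being able to explain on a whiteboard: keyed messages always land on the same partition (so ordering holds per key), and within one consumer group each partition is read by exactly one consumer. This is a simplified model, not the Kafka client API: `crc32` stands in for the murmur2 hash Kafka's default partitioner uses, and the round-robin `assign` is a hypothetical stand-in for the real assignors.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 6  # hypothetical topic configuration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Same key -> same partition, which is what gives per-key ordering.
    # (crc32 is a stand-in; Kafka's default partitioner hashes keys with murmur2.)
    return zlib.crc32(key.encode()) % num_partitions


def assign(num_partitions: int, consumers: list) -> dict:
    # Each partition goes to exactly one consumer in the group. With more
    # consumers than partitions, the extra consumers get nothing and sit idle.
    owners = {c: [] for c in consumers}
    for p in range(num_partitions):
        owners[consumers[p % len(consumers)]].append(p)
    return owners


# Producer side: append events to per-partition logs in send order.
log = defaultdict(list)
for key, action in [("user:42", "like"), ("user:7", "follow"), ("user:42", "unlike")]:
    log[partition_for(key)].append((key, action))

# Consumer side: a 4-consumer group splits the 6 partitions between them.
group = assign(NUM_PARTITIONS, ["c0", "c1", "c2", "c3"])
print(group)
print(log[partition_for("user:42")])
```

Two follow-ups interviewers like: ordering is only guaranteed within a partition, so key by user ID if you need per-user ordering; and adding partitions later changes which partition a key hashes to.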

Resources I actually used

  • Designing Data-Intensive Applications (Kleppmann). Read it twice. The replication and consistency chapters carry the entire field.
  • Hello Interview's System Design in a Hurry. Free, focused, the closest thing to a real framework rather than a list of patterns.
  • The System Design Primer GitHub repo for breadth.
  • Build something. Spin up a service with a real database, add a cache, blow it up under load, watch what breaks. The book theory only sticks once you've seen a hot partition fall over in front of you.

The framework above is not magic. It's just a checklist that keeps you from skipping a phase under pressure. The actual signal in the interview is whether you've operated at scale before, and the framework makes that signal legible. If you haven't, build the project. If you have, run the framework and don't freeze.

Good luck.
