u/Low_Car_7590

Is this decomposition-based area modeling approach reasonable for microarchitecture DSE?

I am exploring a lightweight area modeling flow for microarchitecture DSE (design-space exploration). The goal is not signoff-accurate area estimation, but fast and structurally meaningful area prediction across many gem5 / HDL parameter configurations.

The core idea is to avoid using a single black-box model. Instead, I decompose the design into several structure classes and model them separately:

  1. SRAM-like storage structures (e.g., caches, BTBs, large regular arrays)
  2. Register/state-array structures (e.g., register files, rename tables, scoreboards)
  3. Queue/buffer-like structures (e.g., ROB, LSQ, FIFO, write buffers)
  4. CAM / associative selection logic (e.g., wakeup-select, associative lookup, priority/age selection)
  5. Remaining control and arithmetic datapath (modeled as residual area after subtracting the first four categories)

For SRAM-like structures, I plan to use OpenRAM / SRAM compiler results as ground truth. For logic-like structures, I plan to synthesize representative RTL with Yosys and train separate ML models. The final chip area would be the sum of all category predictions.
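A minimal sketch of how the per-category predictions might be combined, and how the residual category could be calibrated against one fully synthesized reference design. The names `compose_area` and `residual_area`, the category keys, the parameter names, and all coefficients are hypothetical placeholders; none of them are outputs of OpenRAM or Yosys.

```python
from typing import Callable, Dict, Mapping

Config = Mapping[str, int]
Predictor = Callable[[Config], float]


def compose_area(config: Config, predictors: Dict[str, Predictor]) -> Dict[str, float]:
    """Evaluate every category predictor and add a 'total' entry."""
    breakdown = {name: predict(config) for name, predict in predictors.items()}
    breakdown["total"] = sum(breakdown.values())
    return breakdown


def residual_area(reference_total: float, structured: Mapping[str, float]) -> float:
    """Area left for control/datapath after subtracting the modeled structures."""
    residual = reference_total - sum(structured.values())
    if residual < 0:
        raise ValueError("structured categories exceed the reference total")
    return residual


# Hypothetical stand-ins for the trained per-category models (areas in um^2).
predictors: Dict[str, Predictor] = {
    "sram": lambda c: 0.5 * c["cache_bits"],
    "state_array": lambda c: 2.0 * c["regfile_bits"],
    "queue": lambda c: 3.0 * c["rob_entries"] * c["rob_width"],
    "cam": lambda c: 4.0 * c["iq_entries"] * c["tag_bits"],
}

# Made-up configuration, for illustration only.
config = {
    "cache_bits": 1000,
    "regfile_bits": 100,
    "rob_entries": 10,
    "rob_width": 8,
    "iq_entries": 4,
    "tag_bits": 5,
}

breakdown = compose_area(config, predictors)
structured = {k: v for k, v in breakdown.items() if k != "total"}
# 1500.0 is a made-up synthesized total for the reference design.
residual = residual_area(1500.0, structured)
```

Because the residual is obtained by subtraction, it can only be measured on designs that have a synthesized total; for unseen configurations it would need its own predictor, like the other four categories.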

The motivation is that different microarchitectural structures scale very differently with parameters like ports, entries, width, associativity, and issue width, so a single global predictor may not capture these scaling behaviors well.
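To illustrate the scaling argument, here is a toy first-order comparison. It assumes the common textbook approximation that a multi-ported register-file bit cell grows roughly with the square of its port count (each port adds a wordline and a bitline), while a single-ported SRAM array grows linearly in its bit count. The function names and the unit cell area are hypothetical, and real arrays add periphery overheads that this ignores.

```python
def sram_area(entries: int, width: int, cell_area: float = 1.0) -> float:
    """Single-ported array: area grows linearly with the number of bits."""
    return entries * width * cell_area


def regfile_area(entries: int, width: int, ports: int, cell_area: float = 1.0) -> float:
    """Multi-ported array: each bit cell is assumed to grow with ports squared."""
    return entries * width * cell_area * ports ** 2


base = regfile_area(entries=64, width=64, ports=4)
entries_ratio = regfile_area(entries=128, width=64, ports=4) / base
ports_ratio = regfile_area(entries=64, width=64, ports=8) / base

print(entries_ratio)  # 2.0: doubling entries doubles the area
print(ports_ratio)    # 4.0: doubling ports quadruples the area
```

Under these assumptions, a predictor fitted mostly on entry-count sweeps would underestimate the cost of adding ports, which is the kind of mismatch that separate per-class models are meant to avoid.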

My questions are:

  1. Does this decomposition make sense for early-stage microarchitecture DSE?
  2. Are these categories architecturally meaningful from an area-modeling perspective?
  3. Would you classify structures like ROB, issue queue, LSQ, rename table, and physical register file differently?
  4. Is combining SRAM compiler/OpenRAM results with synthesized logic models a reasonable flow?
  5. What are the biggest pitfalls of this approach?
  6. Are there prior works or open-source projects that use a similar methodology?

I am mainly trying to understand whether this “decompose-by-structure-type” modeling strategy is fundamentally sound, even if absolute area accuracy is limited.

reddit.com
u/Low_Car_7590 — 6 days ago

Hi all,

I’m using Chipyard with Spike to generate checkpoints, and then running BOOM simulations with Verilator.

Here’s my current setup:

  • Warm-up instructions: 20M
  • Measured (simulated) instructions: 1M

(I know this is a bit of a weird configuration…)

What I observed is that the CPI results across different checkpoints, even ones from different workloads, are almost identical (~2.57), differing only around the third decimal place.

From what I understand (and from some prior discussion), a 1M instruction window might be too short, and the CPI I’m getting could essentially be dominated by noise. I’ve also seen suggestions that simulation windows should be at least 10M–100M instructions to capture meaningful performance behavior.

However, I’m still not fully convinced:

  • Is this behavior expected given such a short simulation window?
  • Or could this indicate a potential issue in my statistics collection or simulation setup?
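One way to tell these two cases apart is to recompute CPI from raw counter deltas over the measured window only, then look at the relative spread across checkpoints. A minimal sketch, assuming each checkpoint yields cycle and retired-instruction counts sampled at the start and end of the measured window; the `Sample` type, its field names, and the numbers are hypothetical and not tied to any Chipyard, BOOM, or Verilator output format.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List


@dataclass
class Sample:
    name: str
    cycles_start: int
    cycles_end: int
    insts_start: int
    insts_end: int


def window_cpi(s: Sample) -> float:
    """CPI over the measured window only, using counter deltas."""
    insts = s.insts_end - s.insts_start
    if insts <= 0:
        raise ValueError(f"{s.name}: no instructions retired in the window")
    return (s.cycles_end - s.cycles_start) / insts


def relative_spread(cpis: List[float]) -> float:
    """Population standard deviation divided by the mean."""
    return pstdev(cpis) / mean(cpis)


# Made-up counter values for two checkpoints, for illustration only.
samples = [
    Sample("ckpt_a", 1_000, 2_571_000, 500, 1_000_500),
    Sample("ckpt_b", 2_000, 2_577_000, 700, 1_000_700),
]
cpis = [window_cpi(s) for s in samples]
spread = relative_spread(cpis)
```

If the spread stays tiny even across unrelated workloads, it may be worth confirming that the counters are sampled at the window boundaries rather than accumulated from reset, and that each run really starts from a different checkpoint.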

Any insights or similar experiences would be greatly appreciated!

u/Low_Car_7590 — 17 days ago

I’m familiar with Verilog and SystemVerilog, and I’ve been using testbenches to verify simple systems. However, when I tried using UVM for verification, I found that I constantly need to write many additional components such as drivers, monitors, and reference models. The effort involved in setting up UVM seems to exceed the effort of writing the design itself.

At this point, I still don’t fully understand the main benefit of UVM. For those of you who have experience with it, is UVM really worth the effort? If so, could you explain why?

u/Low_Car_7590 — 1 month ago