
r/ScientificComputing

Audited 512³ split-step quantum-state simulation on an i7 laptop — evidence packet included
I’m an independent researcher in Cairo working on CPU-first numerical simulation and reproducible solver evidence.
I recently released a bounded solver-evidence paper and SHA-256 locked artifact packet:
Audited Laptop-Scale 512³ Quantum-State Simulation: A REPA-Governed Solver Stack Beyond the Cluster-Only Assumption
DOI: https://zenodo.org/records/20247942
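For anyone who wants to check a SHA-256 locked packet after downloading it, here is a minimal sketch. It assumes the packet ships a `sha256sum`-style manifest (one `<hex digest>  <relative path>` pair per line); the manifest name `SHA256SUMS` in the usage note is hypothetical and not taken from the packet itself.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path):
    """Return the relative paths whose digest does not match the manifest.

    Assumed manifest format (sha256sum style): '<hex digest>  <relative path>'.
    """
    manifest_path = Path(manifest_path)
    mismatches = []
    for line in manifest_path.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        actual = sha256_of_file(manifest_path.parent / name)
        if actual != expected.lower():
            mismatches.append(name)
    return mismatches
```

Calling `verify_manifest("SHA256SUMS")` from the unpacked directory would return an empty list when every listed file matches its recorded digest.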
The claim is narrow:
- 512³ internal-state complex split-step simulation using a oneAPI CPU backend on an Intel i7 laptop-class machine
- persisted outputs are 2D amplitude/phase slice planes, not full 512³ volume dumps
- separate Crank–Nicolson Hermitian conservation validation
- separate GMRES/multigrid comparison against a PARDISO direct-solve oracle at calibration scale
- dimension-tagged evidence matrix to prevent merging solver lanes
What I am not claiming:
- not 512³ Crank–Nicolson execution
- not 512³ GMRES/PARDISO parity
- not cluster obsolescence in general
- not proof of any AI/identity theory attached to the broader research program
I’m looking for hostile technical review: numerical issues, memory-accounting mistakes, evidence-boundary problems, reproduction suggestions, or places where the public claim should be narrowed.
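As a starting point for the memory-accounting check, here is a back-of-envelope sketch of what a dense 512³ complex state costs. The precision and the number of full-size buffers held at once are assumptions (the post states neither), so `n_arrays` is a hypothetical parameter to adjust against the paper.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def grid_bytes(n, bytes_per_element, dims=3):
    """Bytes needed for one dense array with n points along each of dims axes."""
    return n ** dims * bytes_per_element


def report(n=512, n_arrays=3):
    """Print per-array and total footprints for two common complex precisions.

    n_arrays is a hypothetical count of full-size buffers alive at the same time
    (for example state, scratch, and a precomputed phase factor).
    """
    for label, size in (("complex64", 8), ("complex128", 16)):
        one = grid_bytes(n, size)
        print(f"{label}: {one / GIB:.1f} GiB per array, "
              f"{n_arrays * one / GIB:.1f} GiB for {n_arrays} arrays")
    plane = grid_bytes(n, 8, dims=2)
    print(f"one float64 2D slice plane: {plane / MIB:.1f} MiB")


report()
```

One complex128 array of 512³ points is exactly 2 GiB, so the count of simultaneously live full-size buffers is the number that decides whether the run fits in laptop RAM.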
Paper/evidence packet:
https://zenodo.org/records/20247942
GitHub:
https://github.com/ChasingBlu/RECP_evidence
MCP server for the TLA+ model checker tla-rs
Hi all,
Just shipped an MCP server some of you might find useful: **tla-mcp**.
TLA+ is a formal-spec language for designing concurrent and distributed
systems. You describe what your protocol should do and a model checker
tries every reachable state to catch invariant violations, deadlocks,
and race conditions you didn't see coming. With tla-mcp registered, Claude
Code can call the checker as a first-class tool: validate a spec, run a
bounded check with a counterexample trace, replay specific scenarios, all from inside the chat.
Tool descriptions are deliberately opinionated about how the model
should use the checker (budget all limits upfront, treat `limit_reached`
as inconclusive, look at the last transition of a trace first) so the
guidance survives context truncation.
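To illustrate why `limit_reached` has to be read as inconclusive, here is a toy bounded checker. It is a sketch only and not how tla-rs or tla-mcp is implemented; the function name, the status strings other than `limit_reached`, and the counter example are all hypothetical.

```python
from collections import deque


def bounded_check(init_states, next_states, invariant, max_states):
    """Breadth-first search over reachable states with a hard state budget.

    Returns (status, trace). status is 'ok' (all reachable states explored),
    'violation' (trace runs from an initial state to the bad state), or
    'limit_reached' (budget exhausted, so nothing has been proven).
    States must be hashable and must not be None.
    """
    parent = {}
    queue = deque()
    for state in init_states:
        if state not in parent:
            parent[state] = None
            queue.append(state)
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while state is not None:
                trace.append(state)
                state = parent[state]
            return "violation", trace[::-1]
        for nxt in next_states(state):
            if nxt not in parent:
                if len(parent) >= max_states:
                    return "limit_reached", None
                parent[nxt] = state
                queue.append(nxt)
    return "ok", None


# Toy spec: a counter that increments up to 10; the invariant forbids 7.
def counter_next(x):
    return [x + 1] if x < 10 else []


status, trace = bounded_check([0], counter_next, lambda x: x != 7, max_states=100)
print(status, trace[-2:])  # the last transition is the step into the bad state

status, trace = bounded_check([0], counter_next, lambda x: x != 7, max_states=5)
print(status)  # the budget ran out before state 7 was ever generated
```

The second call stops before the violating state is generated, which is why a budget-limited result says nothing about whether the invariant holds.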
Install + client config snippet + tour of the four tools is on the
landing page: **https://fabracht.github.io/tla-rs/**
It's an experiment. Feedback and bug reports welcome.
Two identical MPI jobs slow down drastically on Intel Alder Lake but not on Threadripper. Is this normal?
Hi everyone,
I regularly run multiple parallel MPI jobs simultaneously on my workstations. I have two systems:
- Intel i7-12700 (12 cores: 8 P-cores + 4 E-cores), OS: Ubuntu 20.04
- AMD Threadripper 3960X (24 cores, 48 threads), OS: Ubuntu 18.04
I wrote a simple C++ MPI test program that runs with mpirun -np 2. On both machines, a single instance finishes in about 12 seconds.
The problem appears when I run two instances at the same time (both mpirun -np 2):
- Threadripper: Both finish in ~12 seconds (no slowdown)
- Intel: Both take ~30 seconds (significant slowdown)
I tried pinning processes to specific cores using taskset and --cpu-set in mpirun. The processes do land on the correct cores (I verified with ps), but the slowdown persists.
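One thing worth ruling out when pinning is whether the chosen logical CPUs share a physical core. Here is a minimal diagnostic sketch (Linux only) that reads the kernel's `thread_siblings_list` files from sysfs and builds disjoint core sets for each job. The helper names are hypothetical, and the sketch makes no claim about what causes the slowdown.

```python
import glob


def parse_cpu_list(text):
    """Expand a Linux CPU list such as '0-3,8,10-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Keep the lowest-numbered logical CPU of each physical core."""
    return sorted({parse_cpu_list(text)[0] for text in sibling_lists})


def disjoint_sets(cpus, jobs, ranks_per_job):
    """Split cpus into one non-overlapping set of ranks_per_job CPUs per job."""
    if jobs * ranks_per_job > len(cpus):
        raise ValueError("not enough physical cores for the requested jobs")
    return [cpus[j * ranks_per_job:(j + 1) * ranks_per_job] for j in range(jobs)]


def read_sibling_lists():
    """Read every CPU's thread_siblings_list from sysfs (Linux only)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    texts = []
    for path in glob.glob(pattern):
        with open(path) as handle:
            texts.append(handle.read())
    return texts


if __name__ == "__main__":
    cores = one_cpu_per_core(read_sibling_lists())
    for job, cpus in enumerate(disjoint_sets(cores, jobs=2, ranks_per_job=2)):
        print(f"job {job}: pin to CPUs {','.join(map(str, cpus))}")
```

If logical CPUs 0 and 1 turn out to be hyperthread siblings of one P-core (an assumption to confirm with `lscpu -e`, not something the post states), pinning two ranks to `0,1` would place both on a single physical core, while the sets printed here avoid that.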
Is this expected behavior for Alder Lake? Could the hybrid P-core/E-core architecture be causing memory bandwidth contention? Or am I missing something else?
I'm trying to figure out if my Intel system is performing normally or if I should be hunting for a configuration issue.
Additional notes:
- My code shows reasonable, normal speed-up as the core count increases on both systems
- The Intel PC has only one memory stick
- The AMD PC has multiple memory sticks
- My test code is not memory intensive (mostly CPU math)
I can provide more details if needed. I'm not super knowledgeable about CPU architectures, so apologies in advance.
Thanks for any insights!
PhysCC: A DSL Compiler for Physics Simulations (SYCL, MPI, AVX2)
I’ve been working on PhysCC, an open-source tool designed to bridge the gap between high-level physics equations and low-level hardware optimization.
The problem: Writing boilerplate for SYCL, MPI, or AVX2 stencils is tedious.
The solution: You write a simple equation like u = u + dt * lap(u) and PhysCC generates the optimized backend code.
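For readers unfamiliar with the notation, here is a plain reference sketch of what u = u + dt * lap(u) computes: one explicit Euler step of the heat equation. It is not PhysCC output. The 5-point stencil, the periodic boundaries, and the 2D grid are assumptions chosen to keep the example short.

```python
def lap(u, dx=1.0):
    """5-point Laplacian of a 2D grid (list of lists) with periodic boundaries."""
    ny, nx = len(u), len(u[0])
    out = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            out[j][i] = (u[j][(i + 1) % nx] + u[j][(i - 1) % nx]
                         + u[(j + 1) % ny][i] + u[(j - 1) % ny][i]
                         - 4.0 * u[j][i]) / (dx * dx)
    return out


def step(u, dt, dx=1.0):
    """One explicit Euler update: u_new = u + dt * lap(u)."""
    l = lap(u, dx)
    return [[u[j][i] + dt * l[j][i] for i in range(len(u[0]))]
            for j in range(len(u))]


# A single hot cell on an 8x8 grid; dt respects the 2D limit dt <= dx**2 / 4.
grid = [[0.0] * 8 for _ in range(8)]
grid[4][4] = 1.0
grid = step(grid, dt=0.2)
print(grid[4][4], grid[4][5], sum(map(sum, grid)))
```

A naive version like this is useful mainly as a correctness oracle for generated backends, since the periodic stencil conserves the grid total up to rounding.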
Key Features:
- Multi-backend support (Single-core, OpenMP, MPI, SYCL, CUDA).
- AI-informed pass: It analyzes the PDE type (Hyperbolic, Parabolic, Elliptic) and suggests optimal work-group sizes for Intel Iris Xe.
- Built-in visualization script for heatmaps.
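For context on the PDE-type labels, here is the textbook discriminant test for a second-order linear PDE in two variables, A u_xx + B u_xy + C u_yy + (lower-order terms) = 0. This is a sketch of the standard classification only; how PhysCC's feature extraction actually decides the type is not described in the post, and the function name is hypothetical.

```python
def classify_pde(a, b, c, tol=1e-12):
    """Classify A*u_xx + B*u_xy + C*u_yy + ... = 0 by the sign of B^2 - 4AC."""
    disc = b * b - 4.0 * a * c
    if disc > tol:
        return "hyperbolic"
    if disc < -tol:
        return "elliptic"
    return "parabolic"


# Wave equation u_tt - c^2 u_xx = 0 with c = 2: A = 1, B = 0, C = -4
print(classify_pde(1.0, 0.0, -4.0))
# Heat equation u_t = u_xx: the only second-order term is u_xx, so A = 1, B = C = 0
print(classify_pde(1.0, 0.0, 0.0))
# Laplace equation u_xx + u_yy = 0: A = 1, B = 0, C = 1
print(classify_pde(1.0, 0.0, 1.0))
```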
It’s still a work in progress, but I’d love to hear your thoughts on the codegen or the feature extraction logic!
https://github.com/NikosPappas/PhysCC