u/Nvestiq

Backtest looked clean. Paper trading looked clean. Live account bled. Here's what I missed.

Backtest: ~38% CAGR, Sharpe 1.9, max DD 11%. Paper-traded it for two months and tracking held within a few percent. At that point, I was fairly confident it would work.

Went live, down 6.2% in five weeks. Same code, same broker, same hours.

It took me longer than I want to admit to figure out what was actually different.

What the issues were:

- Slippage on entries: Backtest assumed mid. Real fills landed 20-40% closer to the ask on average. On a strategy targeting ~8bps per trade, that gap was the entire edge.

- Spread widening around the open: I'd modeled a static 1bp spread. In reality, the first 15 minutes ran 3–6bps regularly, and most of my entries clustered right there.

- Partial fills on the exit: Backtest filled the full size at the limit. Live, I'd get 40–60% filled and have to chase, or hold past my exit logic and watch the move fade.

- Queue position: Limit orders that "filled" in the backtest were sitting behind 50k shares of resting liquidity. Half the time, they never filled at all.
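Here is a minimal sketch of the first two fixes, assuming a mid-based fill model. `slip_frac` is one reading of "fills landed 20–40% closer to the ask", meaning 0.2 to 0.4 of the half-spread. The two-regime `SpreadModel`, with its assumed 4.5 bps open spread, is a hypothetical stand-in for the 3–6bps seen near the open. All names and defaults are illustrative, not the original setup.

```python
from dataclasses import dataclass


@dataclass
class SpreadModel:
    """Hypothetical time-of-day spread model: one wide regime after the open, one normal regime."""

    open_window_min: float = 15.0   # minutes after the open treated as the wide-spread regime
    open_spread_bps: float = 4.5    # assumed midpoint of the 3-6 bps seen near the open
    normal_spread_bps: float = 1.0  # the static 1 bp the original backtest used everywhere

    def spread_bps(self, minutes_since_open: float) -> float:
        if minutes_since_open < self.open_window_min:
            return self.open_spread_bps
        return self.normal_spread_bps


def fill_price(mid: float, spread_bps: float, side: str, slip_frac: float) -> float:
    """Price a marketable fill that lands slip_frac of the way from mid toward the far touch.

    slip_frac=0.0 reproduces the 'fill at mid' assumption; 1.0 pays the full half-spread.
    """
    half_spread = mid * spread_bps / 2 / 10_000
    if side == "buy":
        return mid + slip_frac * half_spread
    if side == "sell":
        return mid - slip_frac * half_spread
    raise ValueError(f"unknown side: {side!r}")


def cost_bps(mid: float, fill: float, side: str) -> float:
    """Execution cost versus mid in basis points (positive means worse than mid)."""
    signed = fill - mid if side == "buy" else mid - fill
    return signed / mid * 10_000
```

A round trip pays `cost_bps` on both entry and exit. The sum of the two is what should be compared against the per-trade edge, using the spread for the minute each leg actually executes in.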

The strategy wasn't wrong. The simulation was lying about execution.

The harder part is that none of this shows up in paper trading either, because most paper engines model fills the same lazy way backtests do. You don't find out until real money hits the book.
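This sketch contrasts the lazy fill rule with a queue-aware one, covering the partial-fill and queue-position issues. It rests on four assumptions:

- strict FIFO priority;
- a known snapshot of the shares resting ahead of you when the order joins (`queue_ahead`, hypothetical);
- per-bar volume printed at your price;
- cancellations ahead of you are ignored, which makes the sketch pessimistic.

```python
from dataclasses import dataclass


@dataclass
class LimitOrder:
    qty: int          # shares we want filled
    queue_ahead: int  # hypothetical snapshot: shares resting at our price ahead of us
    filled: int = 0


def naive_fill(order: LimitOrder, touched: bool) -> int:
    """The lazy rule: if price touched the limit, assume the whole order filled."""
    return order.qty if touched else 0


def queue_aware_fill(order: LimitOrder, volume_at_price: int, traded_through: bool) -> int:
    """Advance a FIFO queue by one bar and return the shares filled in that bar.

    Volume printed at our price first consumes the queue ahead of us; only the
    leftover can reach our order. If price traded through our limit, whatever is
    still open is assumed filled. Cancellations ahead of us are ignored.
    """
    open_qty = order.qty - order.filled
    if traded_through:
        fill = open_qty
    else:
        consumed_ahead = min(order.queue_ahead, volume_at_price)
        order.queue_ahead -= consumed_ahead
        fill = min(open_qty, volume_at_price - consumed_ahead)
    order.filled += fill
    return fill
```

Suppose 50k shares are ahead of you. A bar that prints 30k at your price fills nothing under the queue-aware rule, while the naive rule reports a full fill. A later bar that only partly clears the remaining queue produces exactly the partial exit fills described above.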

Full disclosure: I've been working on validation infrastructure for this exact problem, so I'm biased on the framing. But for those of you who've made the sim->live jump and whose numbers held up, what execution assumptions did you have to tighten before the numbers stopped lying?

Specifically interested in how you model spread, partial fills, and queue position on equities or futures.

reddit.com
u/Nvestiq — 2 days ago

The same bug shows up in almost every LLM-written backtest we've looked at

We've spent the last few months looking at trading code people generated with ChatGPT, Claude, and Cursor: strategies in Pine Script, Python, MQL5, you name it. One bug shows up over and over, and it never makes the equity curve look broken.

The model writes signal logic that references the current bar's close to decide whether to enter on that same bar. In live trading, you can't act on a close before it's happened. In a backtest with sloppy indexing, you can, and the strategy looks brilliant. Look-ahead bias, baked in.
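Below is a minimal pure-Python sketch of the bug and the usual fix, which is to shift the signal by one bar. A position decided on bar t's close then only earns returns from bar t+1 onward. This still assumes you can fill at that close; filling at the next bar's open is stricter. `sma`, `signals`, and `strategy_returns` are hypothetical helpers, and returns are close-to-close with no costs.

```python
def sma(values, n):
    """Simple moving average; None until n values are available."""
    out = [None] * len(values)
    for i in range(n - 1, len(values)):
        out[i] = sum(values[i - n + 1 : i + 1]) / n
    return out


def signals(closes, n):
    """1 when the close is above its SMA(n), else 0. Only known once bar t has closed."""
    avg = sma(closes, n)
    return [1 if a is not None and c > a else 0 for c, a in zip(closes, avg)]


def strategy_returns(closes, n, lookahead):
    """Per-bar close-to-close strategy returns, long or flat, no costs."""
    sig = signals(closes, n)
    pnl = []
    for t in range(1, len(closes)):
        bar_ret = closes[t] / closes[t - 1] - 1
        # Look-ahead: bar t's own close decides whether we hold during bar t.
        # Causal: the decision made at bar t-1's close is what we hold during bar t.
        pos = sig[t] if lookahead else sig[t - 1]
        pnl.append(pos * bar_ret)
    return pnl


closes = [100, 101, 99, 102, 98, 103]
print(sum(strategy_returns(closes, 2, lookahead=True)))   # positive
print(sum(strategy_returns(closes, 2, lookahead=False)))  # negative
```

With `n=2`, "close above SMA" is the same as "close went up this bar". The look-ahead version therefore only ever holds on up bars and cannot lose, which is exactly the too-good curve described here.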

LLMs do this constantly because the training data is full of educational code where indicator-on-close gets computed across the full series without thinking about real-time causality. The model isn't reasoning about when information becomes available; it's pattern-matching to "compute SMA, compare to close, generate signal."

You can't really catch it by reading the output either. The Python looks clean. The Pine Script compiles. The Sharpe is 2.4 and the curve goes up and to the right. Usually the first time you find it is when week-one live PnL doesn't match the backtest.

Full disclosure: we work on backtesting infrastructure, so this is the problem I think about constantly. We'd like to hear what others have found when using an LLM to write strategy code. Has look-ahead been the bug that bit you, or have you seen worse?

u/Nvestiq — 3 days ago

How are you handling contract rollover in your futures backtests?

Full disclosure: we work on backtesting infrastructure, which is why this problem is stuck in our heads. We just want to compare notes on something I don't think has a clean answer.

The basic issue: a continuous futures contract series has a discontinuity every time you roll. If you stitch front-month closes together without adjusting, the roll-day gap shows up in your PnL as a phantom win or loss the strategy never actually took.
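Here is a toy illustration of the phantom PnL, using made-up prices. One contract is held from 100 to 102, rolled into the next contract trading at 104, then held to 106. Both function names are hypothetical, and the roll trade's own cost is ignored.

```python
def naive_stitched_pnl(stitched_closes):
    """One long contract, PnL in points, computed bar-to-bar on the stitched series."""
    return sum(b - a for a, b in zip(stitched_closes, stitched_closes[1:]))


def held_contract_pnl(legs):
    """PnL of actually holding and rolling: each leg is the closes of the contract
    held during that leg, starting from the price it was entered at."""
    return sum(leg[-1] - leg[0] for leg in legs)


stitched = [100, 101, 102, 105, 106]        # old contract for 3 bars, then the new one
legs = [[100, 101, 102], [104, 105, 106]]   # what the account actually held
phantom = naive_stitched_pnl(stitched) - held_contract_pnl(legs)  # 6 - 4 = 2 points
```

The 2-point difference is exactly the roll-day gap between the two contracts (104 vs 102). The strategy never earned it.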

The three methods I've used or seen:

  1. Back-adjusted (price-shift, often called the Panama method). Subtract the roll-day gap from all prior prices so the curve is continuous. Clean to backtest on, but absolute price levels become fictional (old support from 2019 might now sit at a negative number on ES).

  2. Ratio-adjusted (proportional). Multiply historical prices by the rollover ratio. Preserves percentage moves better, still wrecks absolute levels.

  3. No adjustment, handle the roll inside the strategy logic. Most realistic, biggest pain to implement.
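Below are sketches of methods 1 and 2 on a stitched close series. The `(first_new_bar, old_close, new_close)` roll-event format is an assumption for illustration. Data vendors and platforms use their own conventions and roll rules.

```python
from typing import List, Sequence, Tuple

# (first bar on the new contract, old contract close, new contract close),
# both closes taken on the last bar before the switch. Hypothetical format.
Roll = Tuple[int, float, float]


def back_adjust(prices: Sequence[float], rolls: List[Roll]) -> List[float]:
    """Method 1: subtract each roll-day gap (old - new) from every earlier bar."""
    adjusted = [float(p) for p in prices]
    for first_new_bar, old_px, new_px in rolls:
        gap = old_px - new_px
        for i in range(first_new_bar):
            adjusted[i] -= gap
    return adjusted


def ratio_adjust(prices: Sequence[float], rolls: List[Roll]) -> List[float]:
    """Method 2: multiply every earlier bar by the roll-day ratio (new / old)."""
    adjusted = [float(p) for p in prices]
    for first_new_bar, old_px, new_px in rolls:
        ratio = new_px / old_px
        for i in range(first_new_bar):
            adjusted[i] *= ratio
    return adjusted


stitched = [100, 101, 102, 105, 106]
rolls = [(3, 102, 104)]
print(back_adjust(stitched, rolls))   # [102.0, 103.0, 104.0, 105.0, 106.0]
```

Comparing `back_adjust(...)` against the raw stitched list shows the drift directly. On any bar, the back-adjusted offset is the sum of the gaps of every later roll, so the oldest bars drift the most. The ratio version instead keeps each bar-to-bar percentage move intact.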

A few related gotchas I kept running into:

- Rolling on expiration day instead of open interest/volume crossover - you end up trading the dead contract through the final week

- Not modeling the cost of the roll trade itself (a tick or two on ES, materially more on something like back-month NG)

- Strategies that fire in the front month's last few sessions look great in backtest and fall apart live because the liquidity has already migrated
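This sketch covers the first two gotchas: picking the roll bar from an open-interest crossover instead of expiry, and charging a fixed tick cost per roll. The single-bar crossover rule and the flat `ticks_per_contract` cost are simplifying assumptions. `ES_TICK_VALUE` uses ES's 0.25-point tick and $50 multiplier. A volume series can be passed in place of open interest.

```python
from typing import Optional, Sequence

ES_TICK_VALUE = 12.50  # ES: 0.25-point tick x $50 multiplier


def oi_crossover_bar(front_oi: Sequence[int], next_oi: Sequence[int]) -> Optional[int]:
    """First bar where the next contract's open interest exceeds the front month's, or None."""
    for i, (front, nxt) in enumerate(zip(front_oi, next_oi)):
        if nxt > front:
            return i
    return None


def roll_cost(ticks_per_contract: float, tick_value: float, contracts: int) -> float:
    """Dollar cost of one roll, assuming a flat number of ticks paid per contract rolled."""
    return ticks_per_contract * tick_value * contracts
```

Rolling at the crossover bar rather than at expiry keeps the strategy out of the contract whose liquidity has already migrated. `roll_cost` should be charged to the equity curve at every roll, with a larger tick assumption for thinner markets like back-month NG.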

For anyone running multi-year systematic strategies on ES, CL, or NG, our question is:

Which adjustment method are you on, and have you ever sanity-checked the back-adjusted series against an unadjusted one to see how big the divergence actually is?

Also open to hearing how people handle the micro-to-mini transition (MES to ES) as size scales up, without distorting the historical equity curve.

u/Nvestiq — 3 days ago

Not enough emphasis is put on the infrastructure behind the scenes of AI trading. How do you know whether what you're backtesting is actually accurate, and not just producing a pretty equity curve? To trade confidently, you need a full, detailed breakdown of why your code behaves the way it does. Not just to catch issues like look-ahead bias, missing walk-forward validation, compounding errors, and slippage, but to make sure your code matches your intent. Intent -> Validation -> Export

This can only be done at the architecture level, in a system designed specifically for trading and backtesting. That is why we built Nvestiq, and we're getting ready for our full launch in the next few weeks.

Priority beta access will roll out before then.

https://reddit.com/link/1sz9irz/video/iw3e8ki8o6yg1/player

u/Nvestiq — 22 days ago