▲ 1 r/FuturesTradingNQ+2 crossposts

I audited my own "validated" backtest and found the Sharpe I'd been quoting was wrong by 7x. Here's the full teardown.

Six years of QQQ opening-range-breakout data, 112 raw trades, a filter waterfall, a loss autopsy, and a stress test aimed at the exact failure mode that gets backtests torn apart here. Posting the whole thing because I'd rather get this checked before real money touches it than after.

Setup: Solo build, systematic ORB on QQQ/NQ, no ML, deterministic rules only (regime gate, day-of-week filter, signal grade, opening range breakout). Going live on a funded futures account shortly, which is why I spent this weekend trying to break my own numbers before someone else did it for me.

The Sharpe was wrong

Original claim: 3.50 Sharpe. Sounded great. Turned out the annualization method was undocumented and effectively assumed daily trading frequency on a system that fires roughly 10 times a year. Recomputed properly:

  • Per-trade Sharpe (mean_R / std_R): 0.49
  • Correctly annualized for actual trade frequency: 1.54

3.50 was fiction. 1.54 is defensible. Retired the old number everywhere, including my own notes, and documented the methodology so it's reproducible.
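For anyone reproducing the 1.54: a minimal sketch of the scaling, assuming trades are roughly independent so a per-trade Sharpe scales with the square root of trades per year. The function name is hypothetical, and the 59-trades-over-six-years input is read off the numbers in this post, not taken from the original script.

```python
import math

def annualized_sharpe(per_trade_sharpe: float, n_trades: int, years: float) -> float:
    """Scale a per-trade Sharpe by sqrt(trades per year).

    Assumes trade returns are roughly independent and identically
    distributed, which is a simplification.
    """
    trades_per_year = n_trades / years
    return per_trade_sharpe * math.sqrt(trades_per_year)

# Inputs from the post: 0.49 per-trade Sharpe, 59 filtered trades, six years.
actual = annualized_sharpe(0.49, 59, 6)
print(f"annualized at actual trade frequency: {actual:.2f}")
```

With roughly 9.8 trades a year the multiplier is about 3.1, which is where 0.49 becomes about 1.54.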

The filter waterfall (112 raw trades → 59 filtered)

| Stage | Trades | Win Rate | EV/trade | Sharpe (per-trade) | Max DD |
|---|---|---|---|---|---|
| Raw | 112 | 48.2% | +0.888R | 0.27 | 6.8R |
| + Calendar guard (FOMC/NFP/CPI) | 109 | 48.6% | +0.912R | 0.27 | 6.8R |
| + Friday blocked | 80 | 53.8% | +1.246R | 0.33 | 4.0R |
| + Wed BULL blocked | 70 | 58.6% | +1.479R | 0.37 | 4.0R |
| + Wed BEAR retained only | 61 | 62.3% | +1.539R | 0.38 | 3.0R |
| + Signal grade filter (4-confirmation alignment) | 59 | 57.6% | +0.987R | 0.49 | 3.0R |

Biggest single lever: the Friday filter alone accounts for ~38% of the total edge improvement from raw to final. Friday trades averaged -0.042R across 30 occurrences, essentially free money to remove. Everything else (day-of-week regime interaction, signal grading) matters, but nowhere near as much as just not trading on Fridays.
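A rough sketch of how a cumulative waterfall like this can be computed, so each row is the sample that survives every filter above it. The `Trade` fields, the regime labels, and the demo trades are all hypothetical. Only the first four stages are shown because the exact rules for the last two aren't spelled out here.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trade:
    weekday: str      # "Mon" .. "Fri"
    regime: str       # "BULL" or "BEAR" (hypothetical labels)
    event_day: bool   # True on FOMC/NFP/CPI days
    r: float          # realized R-multiple

# Filters are applied cumulatively, in the same order as the table rows.
STAGES = [
    ("Raw", lambda t: True),
    ("+ Calendar guard", lambda t: not t.event_day),
    ("+ Friday blocked", lambda t: t.weekday != "Fri"),
    ("+ Wed BULL blocked", lambda t: not (t.weekday == "Wed" and t.regime == "BULL")),
]

def waterfall(trades):
    """Return (stage, n, win_rate, ev) after each cumulative filter.

    Assumes at least one trade survives every stage.
    """
    rows, kept = [], list(trades)
    for name, keep in STAGES:
        kept = [t for t in kept if keep(t)]
        rs = [t.r for t in kept]
        rows.append((name, len(rs), sum(r > 0 for r in rs) / len(rs), mean(rs)))
    return rows

# Made-up trades, only to show the mechanics.
demo = [
    Trade("Mon", "BULL", False, 2.0),
    Trade("Tue", "BEAR", True, -1.0),
    Trade("Wed", "BULL", False, -1.0),
    Trade("Wed", "BEAR", False, 1.5),
    Trade("Fri", "BULL", False, -1.0),
]
for row in waterfall(demo):
    print(row)
```

Because the filters stack, the order of the stages changes how much improvement each row appears to contribute, which is worth keeping in mind when attributing edge to a single filter.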

Loss autopsy—where does the edge actually die

Ran a structural post-mortem on all 59 filtered trades, winners and losers, looking for a taxonomy rather than a magic filter. (I know curve-fitting a "what-would-have-avoided-this-loss" rule off 25 losses is how people fool themselves, so I explicitly didn't do that; see below.)

25 losses broke into three types:

  • Target-miss reversals (13, 52%): reached ≥1R in favor, then reversed to a full stop
  • Slow bleed (11, 44%): sideways chop, stopped late, no real signal
  • Immediate reversal (1, 4%): stopped within 3 bars, the classic fakeout, essentially absent

The 52% figure was the interesting one. Half the losses weren't bad entries, they were good entries the market later took back.
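The three buckets can be written down as a simple classifier. This is a sketch, assuming each loss has a recorded maximum favorable excursion (in R) and a bar count to the stop. The field names, and the choice to check the 1R test before the 3-bar test, are assumptions rather than the actual autopsy code.

```python
from collections import Counter

def classify_loss(mfe_r: float, bars_to_stop: int, fast_bars: int = 3) -> str:
    """Label a losing trade with one of the three autopsy buckets.

    mfe_r: maximum favorable excursion in R before the stop was hit.
    bars_to_stop: number of bars between entry and stop-out.
    """
    if mfe_r >= 1.0:
        return "target_miss_reversal"   # reached >= 1R, then reversed to a full stop
    if bars_to_stop <= fast_bars:
        return "immediate_reversal"     # stopped within 3 bars
    return "slow_bleed"                 # everything else: chop, stopped late

# Made-up (mfe_r, bars_to_stop) pairs for illustration.
losses = [(1.4, 22), (0.3, 40), (0.1, 2)]
print(Counter(classify_loss(m, b) for m, b in losses))
```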

The counterfactual that actually mattered

I'd already built a two-tier exit (bank 50% at +1R, trail the remainder) but had never backtested it, because it was execution-layer code, not signal logic. Ran it against the loss autopsy as a historical counterfactual:

| Group | Backtest (no engine) | With engine |
|---|---|---|
| 13 target-miss losses | -13.0R | +9.75R |
| 11 slow-bleed losses | -10.8R | -10.8R (unaffected, as expected) |
| 34 winners | +82.0R | +75.8R (gives back ~0.19R/trade insurance cost) |
| Total EV/trade | +0.987R | +1.266R (+28.3%) |
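The totals can be sanity-checked from the three rows. A quick sketch, assuming the per-trade EV is the sum of the listed bucket totals divided by the 59 filtered trades. Note the three buckets cover 58 trades; the single immediate-reversal loss isn't itemized in the table, so it isn't included here either.

```python
# (count, total R without engine, total R with engine), copied from the table.
buckets = {
    "target_miss": (13, -13.0, 9.75),
    "slow_bleed": (11, -10.8, -10.8),
    "winners": (34, 82.0, 75.8),
}
N_TRADES = 59  # filtered sample size, used as the denominator (assumption)

ev_base = sum(base for _, base, _ in buckets.values()) / N_TRADES
ev_engine = sum(eng for _, _, eng in buckets.values()) / N_TRADES
uplift = ev_engine / ev_base - 1
cost_per_winner = (82.0 - 75.8) / 34  # what the engine gives back on winners

print(f"EV without engine: {ev_base:+.3f}R")
print(f"EV with engine:    {ev_engine:+.3f}R")
print(f"uplift:            {uplift:.1%}")
print(f"given back per winner: {cost_per_winner:.2f}R")
```

The results land within rounding of the posted +0.987R, +1.266R and +28.3%.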

The mechanism is boring and mechanical, which is exactly why I trust it: locking half a position at +1R structurally can't be curve-fit to 13 specific historical trades, because it's a rule about R-multiples reached, not about any feature of those particular trades. It generalizes by construction.
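To make the mechanism concrete, here is a minimal sketch of a two-tier exit working on a trade's path expressed in R-multiples. It is not the live execution code. The 0.5R trail distance, the bar-close sampling, and the assumption of exact fills at the bank and stop levels are all illustrative choices that the post doesn't specify.

```python
def two_tier_exit(path_r, bank_at=1.0, bank_frac=0.5, trail_r=0.5, stop_r=-1.0):
    """Realized R for one trade under a bank-then-trail exit.

    path_r: open profit in R at each bar close after entry.
    bank_at / bank_frac: bank this fraction of the position at this R level.
    trail_r: hypothetical trailing distance (in R) for the remainder.
    stop_r: initial stop, in R.
    """
    banked = 0.0       # R already locked in
    open_frac = 1.0    # fraction of the position still open
    stop = stop_r      # current stop for the open part
    peak = None        # best R seen since banking; None until the bank fires

    for r in path_r:
        if r <= stop:
            return banked + open_frac * stop
        if peak is None and r >= bank_at:
            banked = bank_frac * bank_at
            open_frac = 1.0 - bank_frac
            peak = r
        if peak is not None:
            peak = max(peak, r)
            stop = max(stop, peak - trail_r)
    # Path ended without a stop: mark the remainder at the last value.
    return banked + open_frac * path_r[-1]

# A target-miss reversal: runs past +1R, then comes all the way back.
reversal = [0.4, 1.2, 0.6, -1.0]
print(two_tier_exit(reversal))
```

With a single fixed stop the same path is a full -1R loss. The rule only looks at how far the trade has run in R, which is the sense in which it isn't tied to any particular historical trade.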

Stress-testing against the thing that usually kills these posts

Saw enough "smooth equity curve = look-ahead bias" callouts on posts here to specifically check my own backtester for it. The risk: when a single bar's high-low range contains both the stop and the target, does the backtest assume favorable sequencing (target hit first) when live execution could just as easily have hit the stop first?

Audited all 93 grade-A trades (pre-final-filter set) for this exact condition:

  • 79 trades (84.9%): unambiguous — stop and target far enough apart that same-bar sequencing isn't a question
  • 14 trades (15.1%): ambiguous — same-day exit with price between stop and target
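One common way to flag this condition from OHLC bars is sketched below. It assumes the check is "the bar's range touches both levels", which may not be exactly the test used for the 14 trades above. The function names are hypothetical.

```python
def is_ambiguous_bar(high, low, stop, target, side="long"):
    """True when one bar's range touches both the stop and the target."""
    if side == "long":
        return low <= stop and high >= target
    return high >= stop and low <= target

def resolve_exit(high, low, stop, target, side="long", pessimistic=True):
    """Return 'stop', 'target', or None for a single bar.

    When both levels are touched the true order is unknown from OHLC data,
    so pessimistic=True resolves the bar as stop-first.
    """
    hit_stop = low <= stop if side == "long" else high >= stop
    hit_target = high >= target if side == "long" else low <= target
    if hit_stop and hit_target:
        return "stop" if pessimistic else "target"
    if hit_stop:
        return "stop"
    if hit_target:
        return "target"
    return None

# Long trade, stop at 99, target at 102, one wide bar covering both.
print(is_ambiguous_bar(high=102.5, low=98.8, stop=99.0, target=102.0))
print(resolve_exit(high=102.5, low=98.8, stop=99.0, target=102.0))
```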

Worst-case stress test — force stop-first resolution on all 14 ambiguous trades:

  • Original EV: +0.633R (this subset)
  • Worst-case EV: +0.449R (-29%)
  • After typical live degradation: +0.269R—still positive
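The stress test itself is a re-scoring exercise: every ambiguous trade is booked as a full stop and the mean is recomputed. A sketch with made-up trades follows, then the ratios implied by the three quoted figures. The size of the "typical live degradation" haircut isn't stated above; the roughly 40% below is only what the quoted numbers imply.

```python
from statistics import mean

def worst_case_ev(trades, stop_r=-1.0):
    """Mean R with every ambiguous trade re-scored as a full stop.

    trades: list of (realized_r, ambiguous) pairs.
    """
    return mean(stop_r if ambiguous else r for r, ambiguous in trades)

# Made-up sample: two of the four trades had same-bar ambiguity.
sample = [(2.0, False), (1.5, True), (-1.0, False), (3.0, True)]
print(mean(r for r, _ in sample), worst_case_ev(sample))

# Ratios implied by the figures quoted above.
original_ev, stressed_ev, degraded_ev = 0.633, 0.449, 0.269
drop = stressed_ev / original_ev - 1
implied_haircut = 1 - degraded_ev / stressed_ev
print(f"worst-case drop: {drop:.1%}, implied live haircut: {implied_haircut:.1%}")
```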

It's not zero-impact, and I'm not pretending it is. But the edge survives an assumption that's actively hostile to it, which is a meaningfully different claim than "the backtest looks clean." I've now wired live trade tracking to flag these same-bar-ambiguous trades going forward and compare real fills against this worst-case floor. If live underperforms +0.449R on this specific cohort, that's the signal that something in the backtester's sequencing assumption was actually wrong, not just theoretically risky.

What I did NOT do (the trap I was trying to avoid)

Did not go hunting for a rule that would have "saved" the 25 losses. That's the classic move that always works and always means nothing: with enough features you can always draw a line around your own losses in hindsight. The asymmetry engine (the two-tier exit described above) passed a higher bar: it existed before the autopsy, has a mechanical justification independent of these specific trades, and its cost side (what it gives up on winners) was measured with equal rigor. Anything that only showed up as "add this filter, get 15 more percentage points" got treated as a red flag, not a discovery.

Where it stands

  • 59-trade filtered configuration, 57.6% win rate, +1.266R EV with the exit engine active
  • Per-trade Sharpe 0.49, correctly annualized ~1.54
  • Max drawdown 3.0R across the full filtered sample
  • Live drift monitor now tracks rolling EV against this backtest floor, with explicit drift alerts at 10 and 20 trades, and separately tracks the 14 ambiguous-sequence trades against their own worst-case floor
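A minimal sketch of the checkpoint idea behind the drift monitor. It assumes the alert rule is simply "running live EV is below the backtest floor at trade 10 or trade 20". The actual thresholds aren't given above, and the function name is hypothetical.

```python
from statistics import mean

def drift_alerts(live_r, floor_r, checkpoints=(10, 20)):
    """Return (n_trades, running_ev) for each checkpoint where the
    running live EV sits below the backtest floor.

    live_r: realized R-multiples of live trades, in order.
    floor_r: the backtest EV floor to compare against, in R.
    """
    alerts = []
    for n in checkpoints:
        if len(live_r) >= n:
            ev = mean(live_r[:n])
            if ev < floor_r:
                alerts.append((n, ev))
    return alerts

# Made-up live results: flat after 10 trades, recovered by 20.
live = [1.0] * 5 + [-1.0] * 5 + [2.0] * 10
print(drift_alerts(live, floor_r=0.449))
```

The same function can be pointed at the ambiguous-sequence cohort alone by passing only those trades and the +0.449R worst-case floor.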

Going live on a funded account shortly. Wanted this checked here first rather than finding out about a hole from a blown drawdown limit.

Genuinely interested in where this is still wrong. What would you attack first: the calendar guard's negligible impact (it only removed 3 trades, 112 to 109; is that suspicious in itself?), the grade-filter methodology, or something in the intrabar sequencing check I haven't thought of?

reddit.com
u/Heavy-Star3388 — 2 days ago

▲ 0 r/civilengineering+1 crossposts

Senior Structural Engineer / PM (OSHPD) — California SE License Required — Strong Comp + Profit Sharing

Curious what SE-licensed structural engineers with OSHPD/HCAI healthcare experience are seeing in the California market right now.

Are firms actively hiring for senior-level roles? What comp ranges are you seeing for 7+ years of experience with healthcare project delivery?

Asking because I work in technical recruiting for engineering and construction firms and want to understand what candidates in this niche are actually experiencing.

Feel free to DM me.

reddit.com
u/Heavy-Star3388 — 6 days ago