BTCUSDT 1h XGBoost - 80-fold walk-forward and 180-day untouched holdout, looking for sanity checks before live
BTCUSDT 1h Binance cross-margin. Two XGBoost classifiers (LONG/SHORT) plus a regression guard, price-action allow gates, and a loss-avoid veto. One position per side, 1h max hold, 1000 USDT pool, 30/50 percent normal/max sizing, 15 bps fee plus 5 bps slip modeled per side.
Data: BTCUSDT 1h from 2019 through mid-May 2026 (7.4 years). The last 180 days were carved out before any training, tuning, or walk-forward. WF uses 80 monthly (30-day) segments with a monthly stride on the pre-holdout frame. The final bundle was then evaluated once on the untouched 180 days.
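A minimal sketch of the split described above, assuming fixed 30-day test windows that end exactly where the holdout begins. The function name and the end date are placeholders for illustration, not the actual pipeline.

```python
from datetime import datetime, timedelta

def walk_forward_windows(data_end, holdout_days=180, n_folds=80, fold_days=30):
    """Return (holdout_start, folds); each fold is (test_start, test_end).

    A fold's model may only see bars with timestamp < test_start.
    Test windows are consecutive, non-overlapping, and stop at holdout_start.
    """
    holdout_start = data_end - timedelta(days=holdout_days)
    first_test_start = holdout_start - timedelta(days=n_folds * fold_days)
    folds = []
    for i in range(n_folds):
        test_start = first_test_start + timedelta(days=i * fold_days)
        test_end = test_start + timedelta(days=fold_days)
        folds.append((test_start, test_end))
    return holdout_start, folds

# Placeholder end date (assumption), not the real timestamp of the last bar.
holdout_start, folds = walk_forward_windows(datetime(2026, 5, 15))
```

The property worth asserting in the real pipeline is that no training or tuning step ever reads a bar at or after holdout_start.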
WALK-FORWARD AGGREGATE | folds=80 | window_days=30 | interval=1h
--------------------------------------------------------------------------------------
- Compounding Annual Return 30.8460%
- Total Net Profit 486.9209% / 4869.2087 USDT
- Capital Pool 1000.00 USDT
- Normal / Max Deployment 30.0000% / 50.0000%
- Avg / Max Trade Notional 131.41 / 500.00 USDT
- Min Order Notional 10.00 USDT
- Annualized Sharpe 18.2040
- Annualized Sortino 15.9653
- Calmar Ratio 25.3234
- Maximum Drawdown 1.2181% / 12.1808 USDT
- Max DD Duration 11.62 days
- Win Rate 75.9855%
- Profit Factor 6.6119
- Avg Win / Avg Loss 2.0896
- Expectancy per Trade 0.5174 USDT
- Round-trip Trades 9411 / 119.14 per month
- Avg Holding Duration 1.00 hours
- Fees and Costs 254.1122% / 2541.12 USDT
- Beta vs BTC Buy-Hold -0.0014
- Value at Risk 95 0.0000%
- Value at Risk 99 0.0742%
- Probabilistic Sharpe 100.0000%
- Alpha vs BTC Buy-Hold 71.8106%
--------------------------------------------------------------------------------------
Final untouched holdout result:
HOLDOUT - FINAL UNTOUCHED HOLDOUT | last 180d | 1h | bars=4320
--------------------------------------------------------------------------------------
- Compounding Annual Return 72.8191%
- Total Net Profit 30.9608% / 309.6082 USDT
- Capital Pool 1000.00 USDT
- Normal / Max Deployment 30.0000% / 50.0000%
- Avg / Max Trade Notional 168.93 / 500.00 USDT
- Min Order Notional 10.00 USDT
- Annualized Sharpe 18.1728
- Annualized Sortino 15.1701
- Calmar Ratio 277.2289
- Maximum Drawdown 0.2627% / 2.6267 USDT
- Max DD Duration 1.46 days
- Win Rate 81.2646%
- Profit Factor 9.8810
- Avg Win / Avg Loss 2.2780
- Expectancy per Trade 0.7251 USDT
- Round-trip Trades 427 / 72.17 per month
- Avg Holding Duration 1.00 hours
- Fees and Costs 14.4267% / 144.27 USDT
- Beta vs BTC Buy-Hold -0.0045
- Value at Risk 95 0.0000%
- Value at Risk 99 0.1157%
- Probabilistic Sharpe 100.0000%
- Alpha vs BTC Buy-Hold 54.7127%
--------------------------------------------------------------------------------------
What I am specifically asking for: the Sharpe near 18 and the Calmar of 277 on the holdout are not numbers I take at face value. They look unusually high even accounting for the 1-hour holding period and the small, low-variance per-trade PnL. Specific questions where I would value experienced input:
Is there a known failure mode in walk-forward design where stitching 80 monthly segments into a single concatenated equity curve artificially compresses cross-fold variance and inflates the aggregate Sharpe? My segments are independent (each fold trains from scratch on data before it) and the aggregate is the chronological concatenation. Is concatenation the right way to compute the aggregate Sharpe, or should I report per-fold Sharpe distribution stats instead?
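To make the two reporting options concrete, here is a minimal sketch that computes both the concatenated Sharpe and the per-fold Sharpe distribution from the same data. It assumes hourly returns, a zero risk-free rate, and 8,760 bars per year. The input is synthetic placeholder data, not the strategy's returns.

```python
import math
import random
import statistics

HOURS_PER_YEAR = 24 * 365  # assumption: 24/7 market, hourly bars

def annualized_sharpe(hourly_returns):
    """Annualized Sharpe from per-bar returns, zero risk-free rate assumed."""
    sd = statistics.stdev(hourly_returns)
    if sd == 0:
        return float("nan")
    return statistics.mean(hourly_returns) / sd * math.sqrt(HOURS_PER_YEAR)

def sharpe_report(fold_returns):
    """fold_returns: chronological list of per-fold lists of hourly returns."""
    concatenated = [r for fold in fold_returns for r in fold]
    per_fold = [annualized_sharpe(f) for f in fold_returns]
    per_fold = sorted(s for s in per_fold if not math.isnan(s))
    return {
        "concatenated": annualized_sharpe(concatenated),
        "fold_median": statistics.median(per_fold),
        "fold_min": per_fold[0],
        "fold_max": per_fold[-1],
        "folds_negative": sum(s < 0 for s in per_fold),
    }

# Synthetic placeholder: 80 folds x 720 hourly bars of Gaussian noise.
rng = random.Random(0)
fake_folds = [[rng.gauss(0.0001, 0.002) for _ in range(720)] for _ in range(80)]
report = sharpe_report(fake_folds)
```

Reporting the fold minimum and the count of negative folds next to the concatenated figure shows how much of the headline number depends on pooling.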
Cumulative Fees and Costs on the WF is 254 percent of the 1,000 USDT capital pool, meaning the strategy paid 2,541 USDT in simulated fees to produce 4,869 USDT of net profit. That is about 34 percent of gross profit (2,541 of 7,410 USDT) consumed by fees. The per-trade math checks out at 20 basis points one-way on a 137 USDT average ticket, but the cumulative ratio looks high. Does this fee burden look reasonable for a high-frequency 1-hour spot strategy, or is the simulator missing a cost category I should add?
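The fee arithmetic can be cross-checked directly from the report lines. This sketch backs out the implied cost per round trip from total costs, trade count, and average trade notional. It assumes exit notional is roughly equal to entry notional, and both helper names are made up for illustration.

```python
def implied_round_trip_cost_bps(total_costs_usdt, n_round_trips, avg_notional_usdt):
    """Total cost per round trip, in basis points of average trade notional."""
    return total_costs_usdt / (n_round_trips * avg_notional_usdt) * 1e4

def expected_costs_usdt(n_round_trips, avg_notional_usdt, per_side_bps):
    """Total cost if per_side_bps is charged on both the entry and the exit leg."""
    return n_round_trips * avg_notional_usdt * 2 * per_side_bps / 1e4

# Figures copied from the walk-forward report above.
wf_implied = implied_round_trip_cost_bps(2541.12, 9411, 131.41)
wf_expected = expected_costs_usdt(9411, 131.41, per_side_bps=20)  # 15 fee + 5 slip

# Figures copied from the holdout report above.
ho_implied = implied_round_trip_cost_bps(144.27, 427, 168.93)

print(f"WF implied cost per round trip: {wf_implied:.2f} bps")
print(f"WF expected total at 20 bps per side: {wf_expected:.2f} USDT")
print(f"Holdout implied cost per round trip: {ho_implied:.2f} bps")
```

An implied figure near 40 bps would match 20 bps charged on each leg. A figure near 20 bps would suggest the simulator is charging the combined fee and slippage on only one leg, or 10 bps per side.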
The holdout outperforms the WF aggregate (72.8 percent CAGR vs 30.8 percent). My reading is that this is regime-specific (the last 180 days happen to be favourable for the momentum patterns the model learned) and the WF aggregate is the more honest long-run expectation. Does that interpretation hold, or is there another explanation I should consider?
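One check on this comparison: CAGR annualizes a total return by compounding, so a 180-day window and a 2,400-day window are not directly comparable if the simulated pool is not actually compounded. The sketch below puts both periods on the same footing as simple net profit per 30 days on the fixed pool. It assumes a fixed 1,000 USDT pool, inferred only from the max trade notional staying at 500 USDT in both reports.

```python
def simple_return_per_30d(net_profit_usdt, pool_usdt, days):
    """Average net profit per 30 days, as a percent of a fixed (non-compounded) pool."""
    return net_profit_usdt / pool_usdt / (days / 30) * 100

def cagr_pct(net_profit_usdt, pool_usdt, days):
    """Annualized compound rate implied by the total return over the period."""
    return ((1 + net_profit_usdt / pool_usdt) ** (365 / days) - 1) * 100

WF_DAYS = 80 * 30  # folds x window_days from the WF header
HO_DAYS = 180

wf_monthly = simple_return_per_30d(4869.2087, 1000, WF_DAYS)
ho_monthly = simple_return_per_30d(309.6082, 1000, HO_DAYS)

print(f"WF:      {wf_monthly:.2f}% per 30d, CAGR {cagr_pct(4869.2087, 1000, WF_DAYS):.1f}%")
print(f"Holdout: {ho_monthly:.2f}% per 30d, CAGR {cagr_pct(309.6082, 1000, HO_DAYS):.1f}%")
```

If the per-30-day figures turn out close to each other, some of the CAGR gap would be an annualization effect rather than a regime effect.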
The chosen thresholds (0.30 LONG, 0.37 SHORT) sit close to the median model output (p50 around 0.275 to 0.277). My pre-fire rate is high but actual trade rate is low because of downstream gates (regression guard, price-action allow flags, loss-avoid veto, single-position-per-side management). Is this stacked-gate design typical for an XGBoost momentum strategy, or do experienced people see this as fragile?
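One way to quantify how much each gate contributes is an attrition funnel: apply the gates in order and count survivors at each step. This is a sketch only. The field names, the gate predicates (for example, treating the regression guard as "predicted return above zero"), and the toy rows are all hypothetical; only the 0.30 LONG threshold comes from the post.

```python
def gate_funnel(rows, gates):
    """Count how many candidate bars survive each gate, applied in order.

    rows:  list of dicts, one per bar (hypothetical field names).
    gates: ordered list of (name, predicate) pairs.
    """
    survivors = list(rows)
    funnel = [("all bars", len(survivors))]
    for name, passes in gates:
        survivors = [r for r in survivors if passes(r)]
        funnel.append((name, len(survivors)))
    return funnel

LONG_THRESHOLD = 0.30  # from the post

# Gate definitions are assumptions about what each gate might check.
long_gates = [
    ("prob >= threshold", lambda r: r["p_long"] >= LONG_THRESHOLD),
    ("regression guard", lambda r: r["reg_pred"] > 0),
    ("price-action allow", lambda r: r["pa_allow"]),
    ("loss-avoid veto", lambda r: not r["loss_veto"]),
    ("no open long", lambda r: not r["long_open"]),
]

toy_rows = [
    {"p_long": 0.35, "reg_pred": 0.002, "pa_allow": True, "loss_veto": False, "long_open": False},
    {"p_long": 0.31, "reg_pred": -0.001, "pa_allow": True, "loss_veto": False, "long_open": False},
    {"p_long": 0.28, "reg_pred": 0.003, "pa_allow": True, "loss_veto": False, "long_open": False},
    {"p_long": 0.40, "reg_pred": 0.001, "pa_allow": True, "loss_veto": True, "long_open": False},
]

for name, count in gate_funnel(toy_rows, long_gates):
    print(f"{name:>20}: {count}")
```

Running the same funnel per walk-forward fold would show whether any single gate carries most of the filtering, which is where fragility would concentrate.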
The slippage assumption of 5 basis points is from the standard cross-margin order book at this ticket size. If I scaled the strategy to 10x or 100x the capital pool, what slippage assumption would experienced live traders consider realistic on BTCUSDT 1h spot?
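For the scaling question, a rough first-order estimate is to walk a snapshot of the order book at the larger ticket size. The sketch below does this for a market buy against a made-up three-level ask book. It measures only the book-walk cost relative to the best ask and ignores half-spread, latency, and any impact beyond the snapshot, so it is a lower bound at best.

```python
def market_buy_slippage_bps(asks, notional_usdt):
    """Walk ask levels [(price, qty_btc), ...] sorted best-first and return the
    slippage of the volume-weighted fill price vs the best ask, in bps."""
    best_ask = asks[0][0]
    remaining = notional_usdt
    filled_btc = 0.0
    for price, qty in asks:
        level_usdt = price * qty
        take = min(remaining, level_usdt)
        filled_btc += take / price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 1e-9:
        raise ValueError("order book snapshot too shallow for this notional")
    vwap = notional_usdt / filled_btc
    return (vwap / best_ask - 1) * 1e4

# Hypothetical snapshot: prices and sizes are invented, not real BTCUSDT depth.
toy_asks = [(100000.0, 0.01), (100010.0, 0.05), (100050.0, 1.0)]

for notional in (500, 5000, 50000):  # 1x, 10x, 100x the 500 USDT max ticket
    print(notional, round(market_buy_slippage_bps(toy_asks, notional), 3))
```

Sampling real depth snapshots at the hours the strategy actually trades, rather than one snapshot, would give a distribution instead of a point estimate.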
-------------------------------------------------------------------------------------------------------
P.S. Already ran this past a few LLMs. They explained the high Sharpe as consistent with 1h holds and small per-trade variance, the holdout vs WF gap as a favourable recent regime, and predicted live Sharpe to compress into the 5 to 8 range after slippage. Looking for what an LLM would not catch: feature or label leakage I missed, hidden flaws in the walk-forward stitching, and BTCUSDT 1h live execution issues only operators see in practice.
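On the feature-leakage point, one mechanical check is a truncation test: recompute each feature with all bars after t hidden and confirm the value at t does not change. The sketch below is generic. The two toy features are illustrative stand-ins, not features from the actual model.

```python
def find_lookahead(feature_fn, series, check_points):
    """Return the indices t where the feature value at t changes once bars after t are hidden.

    feature_fn(series) must return one value per bar. A causal feature gives
    the same value at t whether or not later bars exist.
    """
    full = feature_fn(series)
    leaks = []
    for t in check_points:
        truncated = feature_fn(series[: t + 1])
        if truncated[t] != full[t]:
            leaks.append(t)
    return leaks

def trailing_mean(xs, n=3):
    """Causal toy feature: mean of up to n bars ending at t."""
    out = []
    for t in range(len(xs)):
        window = xs[max(0, t - n + 1): t + 1]
        out.append(sum(window) / len(window))
    return out

def centered_mean(xs, n=1):
    """Leaky toy feature: window includes n bars after t."""
    out = []
    for t in range(len(xs)):
        window = xs[max(0, t - n): t + n + 1]
        out.append(sum(window) / len(window))
    return out

closes = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]  # toy series
print(find_lookahead(trailing_mean, closes, [1, 2, 3]))  # []
print(find_lookahead(centered_mean, closes, [1, 2, 3]))  # [1, 2, 3]
```

The same idea applies to labels and to any scaler or threshold fitted on a full frame: anything whose value at t moves when the future is removed is a leak candidate.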