How I model fair value for Polymarket BTC binary options — Black-Scholes on a 15-min horizon, conviction scoring, and what the backtest actually taught me
Following up on my auto-tuner post. Several people asked about the core signal logic, so here's a deeper dive into how the bot actually decides to enter a trade.
The market
Polymarket runs BTC (and ETH) Up/Down markets on 15-minute windows. You buy YES or NO shares at a price between 0 and 1. If BTC closes above the opening price at slot expiry, YES resolves to 1.00. The taker fee is ~1.8% at p=0.5 and falls to zero at the extremes.
The signal model
I use a Black-Scholes digital option formula to compute a fair value probability:
p_up = N( drift / (sigma * sqrt(T)) )
Where:
- drift = (spot_now − slot_open) / slot_open
- sigma = rolling 15m realized volatility (per-minute)
- T = seconds remaining in slot / 900
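A minimal Python sketch of the formula above, for anyone who wants to play with it. The function names are mine, and I'm assuming sigma has already been scaled to the same unit as T (a fractional return over the full 900s slot), so that sigma * sqrt(T) is the volatility over the time remaining. The fallback for zero time or zero vol is also an assumption, not something the formula itself defines.

```python
import math

SLOT_SECONDS = 900  # one 15-minute slot


def norm_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def fair_value_up(spot_now: float, slot_open: float,
                  sigma: float, seconds_left: float) -> float:
    """p_up = N(drift / (sigma * sqrt(T))).

    Assumption: sigma is a fractional volatility scaled to the full slot,
    so it is in the same units as drift and T.
    """
    drift = (spot_now - slot_open) / slot_open
    T = seconds_left / SLOT_SECONDS
    denom = sigma * math.sqrt(T)
    if denom <= 0:
        # Assumed fallback: with no time or no vol left, the sign of the
        # drift decides the outcome.
        if drift > 0:
            return 1.0
        if drift < 0:
            return 0.0
        return 0.5
    return norm_cdf(drift / denom)
```

With BTC exactly at the slot open the function returns 0.5 regardless of sigma, which is the behaviour you'd expect from a driftless model.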
Edge = |fair_value − market_ask|. Only enter if edge ≥ 0.26. The taker fee at p=0.5 is 1.8%, so the edge has to be large enough to survive fees and still leave a profit.
What I learned the hard way
Edge bucket 0.22–0.25 was consistently negative in live data. The fee eats it. I was entering trades that looked like edge but weren't, once fees were accounted for. Raising the minimum edge from 0.22 to 0.26 cut roughly 40% of entries but turned the PnL positive.
Re-entries after SL: disabled. 37 re-entries in the first day generated −$16.71. The model was still "convinced" but the market had already told me I was wrong.
Conviction scoring
Not all edge-positive entries are equal. I score each potential entry 0–1:
score = 0.30×edge_norm + 0.25×upside_norm + 0.20×drift_norm + 0.15×time_norm + 0.10
- edge_norm: edge / min_edge (capped at 1)
- upside_norm: (1 − ask_px) / 0.40, i.e. how much room there is to TP
- drift_norm: confirmed momentum from slot open
- time_norm: seconds remaining (longer window = more time for price to move)
Below 0.62 conviction: skip. Position sizing is tiered: $25 / $40 / $60 by tier (0.62–0.70 / 0.70–0.85 / ≥0.85).
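Here is a rough sketch of the score and the sizing tiers in Python. The exact normalizations for drift_norm and time_norm aren't spelled out above, so this version takes them as inputs already scaled to 0–1. Clamping upside_norm to 1 and putting the tier boundaries (0.70, 0.85) in the upper tier are my assumptions for the sketch.

```python
MIN_EDGE = 0.26
MIN_CONVICTION = 0.62


def conviction_score(edge: float, ask_px: float,
                     drift_norm: float, time_norm: float) -> float:
    """Weighted conviction score.

    drift_norm and time_norm are assumed to be pre-normalized to [0, 1].
    """
    edge_norm = min(edge / MIN_EDGE, 1.0)
    # Clamp is an assumption: without it, cheap asks push the term above 1.
    upside_norm = min(max((1.0 - ask_px) / 0.40, 0.0), 1.0)
    return (0.30 * edge_norm + 0.25 * upside_norm
            + 0.20 * drift_norm + 0.15 * time_norm + 0.10)


def position_size(score: float) -> int:
    """Dollar size by tier; 0 means skip the trade."""
    if score < MIN_CONVICTION:
        return 0
    if score < 0.70:
        return 25
    if score < 0.85:
        return 40
    return 60
```

Note that with edge_norm capped at 1 and entries gated on edge ≥ min_edge, that term is at its maximum for every trade that passes the gate, so the other three components do the ranking.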
Entry filters beyond edge
- Min drift: 0.12% from slot open. Don't enter a market that hasn't moved — the model overestimates probability when BTC is flat.
- Min price: 0.35 — very cheap shares have high variance and the SL fires frequently at noise levels.
- Min seconds left: 60 — at <60s the TP at 0.97 is unreachable for most entries.
- Max seconds left: 270, i.e. no entries until 630s (about 10.5 minutes) into the slot (slot_too_fresh).
- Late-entry penalty: for entries with <400s left, required edge scales up proportionally.
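The first four filters above can be sketched as a single gate that returns a rejection reason. drift_too_low and slot_too_fresh are the reason names used in this post; price_too_low and slot_too_late are placeholder names for this sketch. The late-entry penalty is left out because its scaling isn't specified here.

```python
from typing import Optional

MIN_DRIFT = 0.0012        # 0.12% from slot open, as a fraction
MIN_PRICE = 0.35
MIN_SECONDS_LEFT = 60
MAX_SECONDS_LEFT = 270


def rejection_reason(drift: float, ask_px: float,
                     seconds_left: float) -> Optional[str]:
    """Return the first filter that rejects the entry, or None if it passes."""
    if abs(drift) < MIN_DRIFT:
        return "drift_too_low"
    if ask_px < MIN_PRICE:
        return "price_too_low"      # hypothetical reason name
    if seconds_left < MIN_SECONDS_LEFT:
        return "slot_too_late"      # hypothetical reason name
    if seconds_left > MAX_SECONDS_LEFT:
        return "slot_too_fresh"
    return None
```

Returning the reason rather than a bare boolean makes it easy to count rejections per filter in a backtest.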
Position management — the stack
Evaluated in this order every 3 seconds:
- TP at 0.97 — exit immediately
- Time-stop — if age > 240s and price hasn't moved ≥3% from entry, close. Dead positions waste slot time.
- Break-even — if HWM ≥ entry × 1.05, move SL to entry. A trade that reached +5% should never close negative.
- Lock-profit — if HWM ≥ entry × 1.10, floor SL at entry × 1.03. Minimum 3% locked.
- SL — dynamic by price: 15% for mid-range entries, 10% for expensive (0.60–0.85), 8% for high-prob (≥0.85).
- Trailing — 8% from HWM, activates after ≥8% gain. Protects the peak without cutting winners early.
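To make the evaluation order concrete, here is a simplified sketch of the stack in Python. Class and function names are illustrative. I'm assuming all percentages are relative to the entry price (or to the HWM for the trail), that "mid-range" means entries below 0.60, and that the time-stop looks at the absolute move from entry.

```python
from dataclasses import dataclass, field
from typing import Optional

TP_PRICE = 0.97


def initial_sl_pct(entry: float) -> float:
    """Dynamic stop distance by entry price."""
    if entry >= 0.85:
        return 0.08
    if entry >= 0.60:
        return 0.10
    return 0.15  # assumed: everything below 0.60 counts as mid-range


@dataclass
class Position:
    entry: float
    hwm: float = field(init=False)  # high-water mark
    sl: float = field(init=False)   # current stop level

    def __post_init__(self) -> None:
        self.hwm = self.entry
        self.sl = self.entry * (1.0 - initial_sl_pct(self.entry))


def evaluate(pos: Position, price: float, age_s: float) -> Optional[str]:
    """Run the stack once, in order; return an exit reason or None."""
    pos.hwm = max(pos.hwm, price)
    if price >= TP_PRICE:
        return "tp"
    if age_s > 240 and abs(price - pos.entry) / pos.entry < 0.03:
        return "time_stop"
    if pos.hwm >= pos.entry * 1.05:          # break-even
        pos.sl = max(pos.sl, pos.entry)
    if pos.hwm >= pos.entry * 1.10:          # lock-profit
        pos.sl = max(pos.sl, pos.entry * 1.03)
    if price <= pos.sl:
        return "sl"
    if pos.hwm >= pos.entry * 1.08 and price <= pos.hwm * 0.92:
        return "trailing"
    return None
```

Calling evaluate every 3 seconds with the latest price reproduces the ordering: the stop can only ratchet up, and the trail only fires once the position has been at least 8% in profit.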
Directional block and circuit breaker
After any SL in a (slot, direction) pair: block re-entry in that direction for the rest of the slot. This is cross-market too — BTC and ETH on the same 15m timestamp are highly correlated. A Down SL on BTC blocks Down entries on ETH for that slot.
Circuit breaker: ≥2 SLs in the same direction within 45 minutes → block that direction for 30 minutes.
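A small sketch of how the two blocks could be tracked together. The class name and method names are made up for illustration. The asset is deliberately not part of the key, which is what makes a BTC stop-out block the same direction on ETH for that slot. Timestamps are plain Unix seconds.

```python
from collections import defaultdict, deque


class DirectionGate:
    WINDOW_S = 45 * 60     # look-back for counting SLs
    COOLDOWN_S = 30 * 60   # circuit-breaker block length

    def __init__(self) -> None:
        self.slot_blocks = set()               # {(slot_ts, direction)}
        self.recent_sl = defaultdict(deque)    # direction -> SL timestamps
        self.blocked_until = {}                # direction -> unblock time

    def record_stop_loss(self, slot_ts: int, direction: str, now: float) -> None:
        """Register an SL on any asset for this slot and direction."""
        self.slot_blocks.add((slot_ts, direction))
        q = self.recent_sl[direction]
        q.append(now)
        while q and now - q[0] > self.WINDOW_S:
            q.popleft()                        # drop SLs outside the window
        if len(q) >= 2:
            self.blocked_until[direction] = now + self.COOLDOWN_S

    def allows(self, slot_ts: int, direction: str, now: float) -> bool:
        """True if a new entry in this direction is currently permitted."""
        if (slot_ts, direction) in self.slot_blocks:
            return False
        return now >= self.blocked_until.get(direction, 0)
```

The entry logic would call allows(...) before any other filter and record_stop_loss(...) whenever a position exits on SL.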
Backtest reality
The Polymarket API returns limited historical data (~18–22 closed slots). With current parameters (MIN_EDGE=0.26, MIN_CONVICTION=0.62, MIN_DRIFT=0.12%) the main rejection reason is drift_too_low — BTC/ETH are sideways most of the time. The bot is very selective.
3 trades from 18 slots in the last backtest: 100% win rate, +$78 total. Small sample — meaningless for win rate estimation, but useful to confirm the plumbing works and sizing makes sense.
What I'd want feedback on
The conviction formula is hand-tuned. I used bucket analysis on ~200 live trades to weight the components, but there's no guarantee the weights generalize. Has anyone used Bayesian optimization or simple grid search to calibrate something like this without overfitting?
Also curious if anyone else is running models on these markets — the Black-Scholes assumption of constant intra-slot volatility is obviously wrong (news events, liquidations), but it's a useful baseline.