Hey everyone. I'm currently modeling inefficiencies in prediction markets (specifically Polymarket) and could use some insight from those experienced in event-driven arbitrage.
The core thesis is simple: trade the divergence between the "true" probability of an event, $P_{true}$, estimated from real-time news sentiment, and the implied probability currently priced by the market, $P_{market}$.
Currently, I'm using an LLM-based sentiment analyzer to parse breaking news and assign a continuous sentiment score, which is then mapped to a probability shift $\Delta P$.
An entry triggers when the expected value, net of fees and slippage, clears a threshold:
$$EV = (P_{true} \cdot \text{Payout}) - \text{Cost} > \text{Threshold}$$
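To make the entry condition concrete, here's a minimal Python sketch of it for a single binary share. The function names, the flat `fee_rate`, and the way fees and slippage are folded into $\text{Cost}$ are illustrative assumptions, not Polymarket's actual fee schedule.

```python
def expected_value(p_true, price, payout=1.0, fee_rate=0.0, slippage=0.0):
    """EV per share of buying a binary outcome share.

    Assumption: Cost = entry price + expected slippage + a proportional fee
    charged on the entry price. Real fee structures may differ.
    """
    if not 0.0 <= p_true <= 1.0:
        raise ValueError("p_true must be a probability in [0, 1]")
    cost = price + slippage + fee_rate * price
    return p_true * payout - cost


def should_enter(p_true, price, threshold, **costs):
    """Entry trigger: EV must strictly exceed the threshold."""
    return expected_value(p_true, price, **costs) > threshold


# Example: model says 60%, market asks 0.50, so EV is about 0.10 per share.
ev = expected_value(0.60, 0.50)
```

With `slippage=0.04` and `fee_rate=0.02` the same trade drops to roughly 0.05 per share, which is why the threshold has to be set against realistic execution costs rather than the quoted price.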
However, I'm running into a bottleneck: sentiment latency vs. order book liquidity. By the time the LLM has parsed the text and computed $\Delta P$, the HFT market makers have often already adjusted the bid/ask spread, leaving the order book too thin to capture the calculated $EV$ without massive slippage.
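To put a number on that slippage, here's a rough sketch that walks the ask side of a book snapshot and returns the average fill price for a given order size. The `(price, quantity)` tuple layout and the function name are assumptions for illustration; a real feed will have its own schema.

```python
def average_fill_price(asks, size):
    """Average price paid to buy `size` shares against resting asks.

    asks: list of (price, quantity) tuples, sorted by ascending price
          (hypothetical snapshot format).
    Returns None if the book is too thin to fill the whole order.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    remaining = size
    total_cost = 0.0
    for price, qty in asks:
        take = min(remaining, qty)  # consume as much of this level as needed
        total_cost += take * price
        remaining -= take
        if remaining <= 0:
            return total_cost / size
    return None  # not enough depth


book = [(0.50, 100), (0.52, 100), (0.55, 200)]
fill = average_fill_price(book, 150)  # 100 @ 0.50 plus 50 @ 0.52
slippage = fill - book[0][0]          # cost above the best ask
```

Feeding the resulting average price into the $EV$ check instead of the best ask shows directly how much of the edge survives at a given size.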
For those of you modeling sentiment-driven alpha:
- How do you mathematically decay the value of news sentiment over the first few milliseconds/seconds?
- Are you relying entirely on smaller, fine-tuned NLP models run locally to beat the latency, or is there a specific statistical filter you use to predict the spread widening before the NLP finishes processing?
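For concreteness on the decay question, one simple form it could take is exponential decay with a half-life, sketched below. The exponential shape, the `half_life_s` parameter, and the function name are purely illustrative assumptions, not a validated model of how news edge actually erodes.

```python
def decayed_shift(delta_p, elapsed_s, half_life_s):
    """Remaining probability shift after `elapsed_s` seconds.

    Assumption: the tradable part of delta_p halves every `half_life_s`
    seconds as the market absorbs the news.
    """
    if half_life_s <= 0:
        raise ValueError("half_life_s must be positive")
    if elapsed_s < 0:
        raise ValueError("elapsed_s must be non-negative")
    return delta_p * 0.5 ** (elapsed_s / half_life_s)


# Example: an 8-point shift with a 2-second half-life,
# evaluated after 2 seconds of processing latency.
remaining = decayed_shift(0.08, elapsed_s=2.0, half_life_s=2.0)
```

Under this assumed form, the half-life would have to be fit per market from how quickly quotes moved after past headlines.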
Appreciate any insights on the modeling or execution side!