u/BoyCuan — reddlx

Hey everyone. I'm currently modeling inefficiencies in prediction markets (specifically Polymarket) and could use some insight from those experienced in event-driven arbitrage.

The core thesis is simple: estimate the divergence between the "true" probability of an event, $P_{true}$, inferred from real-time news sentiment, and the implied probability currently priced by the market, $P_{market}$.

Currently, I'm using an LLM-based sentiment analyzer to parse breaking news and assign a continuous sentiment score, which is then mapped to a probability shift $\Delta P$.
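For illustration only, here is one way a sentiment-to-$\Delta P$ mapping could look. It is a sketch under assumptions, not a description of the actual pipeline: it assumes the sentiment score lies in [-1, 1], applies the shift in log-odds space so the result stays inside (0, 1), and uses a hypothetical `sensitivity` parameter that would have to be calibrated on historical data.

```python
import math


def logit(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def shifted_probability(p_market: float, sentiment: float,
                        sensitivity: float = 0.5) -> float:
    """Estimate P_true by shifting P_market in log-odds space.

    Assumptions (hypothetical, not from the post):
    - sentiment is a continuous score in [-1, 1]
    - sensitivity is the log-odds move for a sentiment of +1
    """
    if not 0.0 < p_market < 1.0:
        raise ValueError("p_market must be strictly between 0 and 1")
    if not -1.0 <= sentiment <= 1.0:
        raise ValueError("sentiment must be in [-1, 1]")
    return sigmoid(logit(p_market) + sensitivity * sentiment)


def delta_p(p_market: float, sentiment: float,
            sensitivity: float = 0.5) -> float:
    """Probability shift implied by the sentiment score."""
    return shifted_probability(p_market, sentiment, sensitivity) - p_market
```

Working in log-odds means the same sentiment score moves a 50% market more than a 95% market, which avoids producing probabilities outside the valid range.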

I enter a position when the expected value, net of fees and slippage, is significantly positive:

$$EV = (P_{true} \cdot \text{Payout}) - \text{Cost} > \text{Threshold}$$
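The trigger condition can be sketched in a few lines. This is a minimal example under stated assumptions: a binary share that pays 1.0 if the event resolves in your favor, with Cost taken as entry price plus per-share fee and slippage estimates. The class name, field names, and the 0.02 default threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeEstimate:
    p_true: float                     # estimated true probability of the event
    entry_price: float                # price paid per share
    payout: float = 1.0               # assumed payout per share on a win
    fee_per_share: float = 0.0        # assumed fee estimate
    slippage_per_share: float = 0.0   # assumed slippage estimate


def expected_value(t: TradeEstimate) -> float:
    """EV = P_true * Payout - Cost, where Cost includes fees and slippage."""
    cost = t.entry_price + t.fee_per_share + t.slippage_per_share
    return t.p_true * t.payout - cost


def should_enter(t: TradeEstimate, threshold: float = 0.02) -> bool:
    """Enter only when EV per share clears the threshold."""
    return expected_value(t) > threshold
```

For example, with $P_{true} = 0.60$ and an entry price of 0.50, the EV is about 0.10 per share and the trade clears a 0.02 threshold. Add 0.09 of slippage and the EV drops to about 0.01, so the same signal no longer qualifies.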

However, I'm running into a bottleneck with sentiment latency vs. order book liquidity. By the time the LLM has parsed the text and computed $\Delta P$, HFT market makers have often already adjusted the bid/ask spread, leaving the order book too thin to capture the calculated $EV$ without massive slippage.
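To make the slippage problem concrete, here is a rough sketch of walking the ask side of a book to get the average fill price for a given order size. It assumes the book is available as a list of `(price, quantity)` levels sorted from best ask upward. The function names and the sample book are hypothetical and not tied to any real exchange API.

```python
from typing import List, Optional, Tuple

Level = Tuple[float, float]  # (price, quantity available at that price)


def average_fill_price(asks: List[Level], size: float) -> Optional[float]:
    """Volume-weighted price for buying `size` shares against `asks`.

    Returns None when the visible book cannot fill the whole order.
    Assumes `asks` is sorted ascending by price.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    remaining = size
    cost = 0.0
    for price, qty in asks:
        take = min(remaining, qty)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            return cost / size
    return None


def slippage_per_share(asks: List[Level], size: float) -> Optional[float]:
    """Average fill price minus the best ask, or None if depth runs out."""
    avg = average_fill_price(asks, size)
    if avg is None:
        return None
    return avg - asks[0][0]


# Hypothetical thin book after market makers have pulled quotes.
thin_book = [(0.50, 100.0), (0.55, 100.0), (0.70, 100.0)]
```

On this sample book, buying 100 shares fills at the best ask, while buying 200 shares averages about 0.525, so roughly 0.025 per share of the edge is lost to slippage before fees.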

For those of you modeling sentiment-driven alpha:

How do you mathematically decay the value of news sentiment over the first few milliseconds/seconds?
Are you relying entirely on smaller, fine-tuned NLP models running locally to cut latency, or is there a specific statistical filter you use to predict the spread widening before the NLP finishes processing?

Appreciate any insights on the modeling or execution side!