Most resolutions are unsurprising: the model said 70%, the trade went the right way, you train on it and move on. But occasionally the model says 85% confidently, and the trade goes against it.
Those are the examples that contain the most learning signal, because the model holds a confident wrong belief that needs to be corrected.
The algorithm:
- On every resolution, check if |predicted - 0.5| > 0.20 AND the prediction was wrong.
- If yes, add the (features, predProb, actualLabel) tuple to the hindsight pool.
- After each main training round, replay the last 3 hindsight examples through model.train() with 3× the base learning rate.
- The model's gradient on those examples is amplified, correcting the confident-wrong belief faster.
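The steps above can be sketched roughly as follows. The `HindsightExample` shape, the pool helpers, and the `trainFn` callback are illustrative assumptions, not the actual implementation:

```typescript
// Hypothetical sketch of the hindsight-replay loop described above.
interface HindsightExample {
  features: number[];
  predProb: number;   // model's predicted probability at trade time
  actualLabel: number; // resolved outcome, 0 or 1
}

const CONFIDENCE_MARGIN = 0.20; // |predicted - 0.5| threshold
const REPLAY_COUNT = 3;         // replay the last 3 pooled examples
const LR_MULTIPLIER = 3;        // amplify the gradient 3x
const POOL_CAP = 100;           // FIFO cap to avoid memory bloat

const hindsightPool: HindsightExample[] = [];

// Called on every resolution: pool only confident-wrong examples.
function onResolution(ex: HindsightExample): void {
  const confident = Math.abs(ex.predProb - 0.5) > CONFIDENCE_MARGIN;
  const wrong = (ex.predProb >= 0.5) !== (ex.actualLabel === 1);
  if (confident && wrong) {
    hindsightPool.push(ex);
    if (hindsightPool.length > POOL_CAP) hindsightPool.shift(); // drop oldest
  }
}

// Called after each main training round: replay with amplified LR.
function replayHindsight(
  trainFn: (ex: HindsightExample, lr: number) => void,
  baseLr: number,
): void {
  for (const ex of hindsightPool.slice(-REPLAY_COUNT)) {
    trainFn(ex, baseLr * LR_MULTIPLIER);
  }
}
```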
Why this works: standard SGD weights each example equally, but the examples that move the loss the most are the ones the model was most wrong about. Hindsight replay is a form of hard negative mining: it biases the gradient toward the examples where the model needs the most correction.
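The "moves the loss the most" claim can be made concrete. For a sigmoid output trained with binary cross-entropy, the gradient of the loss with respect to the logit reduces to `p - y`, so a confident wrong prediction already carries a large gradient before any amplification (the function name here is my own):

```typescript
// For sigmoid output p and binary cross-entropy loss, dL/dlogit = p - y.
// A confident-wrong example (p = 0.85, y = 0) yields a gradient of 0.85,
// versus 0.55 for a barely-wrong one (p = 0.55, y = 0).
function logitGradient(p: number, y: number): number {
  return p - y;
}
```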
The pool is FIFO-capped at 100 examples to avoid memory bloat. If label smoothing is enabled, it applies during replay as well.
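Replay-time label smoothing might look like the following. The smoothing factor and function name are assumptions for illustration; the idea is just to pull the hard 0/1 label toward 0.5 so the amplified 3× gradient does not overshoot:

```typescript
// Hypothetical label smoothing applied to a pooled example before replay:
// label 1 with epsilon 0.1 becomes 0.95, label 0 becomes 0.05.
function smoothLabel(label: number, epsilon: number): number {
  return label * (1 - epsilon) + 0.5 * epsilon;
}
```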