Most resolutions are unsurprising: the model said 70%, the trade went the right way, you train on it and move on. But occasionally the model says 85% confidently, and the trade goes against it.
Those are the examples that contain the most learning signal, because the model holds a confident wrong belief that needs to be corrected.
The algorithm:
- On every resolution, check if |predicted - 0.5| > 0.20 AND the prediction was wrong.
- If yes, add the (features, predProb, actualLabel) tuple to the hindsight pool.
- After each main training round, replay the last 3 hindsight examples through model.train() with 3× the base learning rate.
- The model's gradient on those examples is amplified, correcting the confident-wrong belief faster.
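The steps above can be sketched roughly as follows. The `HindsightExample` shape, the pool helpers, and the `trainFn` callback are illustrative assumptions, not the actual implementation:

```typescript
// Hypothetical sketch of the hindsight-replay loop described above.
interface HindsightExample {
  features: number[];
  predProb: number;   // model's predicted probability at trade time
  actualLabel: number; // resolved outcome, 0 or 1
}

const CONFIDENCE_MARGIN = 0.20; // |predicted - 0.5| threshold
const REPLAY_COUNT = 3;         // replay the last 3 pooled examples
const LR_MULTIPLIER = 3;        // amplify the gradient 3x
const POOL_CAP = 100;           // FIFO cap to avoid memory bloat

const hindsightPool: HindsightExample[] = [];

// Called on every resolution: pool only confident-wrong examples.
function onResolution(ex: HindsightExample): void {
  const confident = Math.abs(ex.predProb - 0.5) > CONFIDENCE_MARGIN;
  const wrong = (ex.predProb >= 0.5) !== (ex.actualLabel === 1);
  if (confident && wrong) {
    hindsightPool.push(ex);
    if (hindsightPool.length > POOL_CAP) hindsightPool.shift(); // drop oldest
  }
}

// Called after each main training round: replay with amplified LR.
function replayHindsight(
  trainFn: (ex: HindsightExample, lr: number) => void,
  baseLr: number,
): void {
  for (const ex of hindsightPool.slice(-REPLAY_COUNT)) {
    trainFn(ex, baseLr * LR_MULTIPLIER);
  }
}
```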
Why this works: standard SGD weights each example equally, but the examples that move the loss the most are the ones the model was most wrong about. Hindsight replay is a form of hard negative mining: it biases the gradient toward the examples where the model needs the most correction.
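The "moves the loss the most" claim can be made concrete. For a sigmoid output trained with binary cross-entropy, the gradient of the loss with respect to the logit reduces to `p - y`, so a confident wrong prediction already carries a large gradient before any amplification (the function name here is my own):

```typescript
// For sigmoid output p and binary cross-entropy loss, dL/dlogit = p - y.
// A confident-wrong example (p = 0.85, y = 0) yields a gradient of 0.85,
// versus 0.55 for a barely-wrong one (p = 0.55, y = 0).
function logitGradient(p: number, y: number): number {
  return p - y;
}
```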
The pool is FIFO-capped at 100 examples to avoid memory bloat. If label smoothing is enabled, it applies during replay as well.
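Replay-time label smoothing might look like the following. The smoothing factor and function name are assumptions for illustration; the idea is just to pull the hard 0/1 label toward 0.5 so the amplified 3× gradient does not overshoot:

```typescript
// Hypothetical label smoothing applied to a pooled example before replay:
// label 1 with epsilon 0.1 becomes 0.95, label 0 becomes 0.05.
function smoothLabel(label: number, epsilon: number): number {
  return label * (1 - epsilon) + 0.5 * epsilon;
}
```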