What this does: Pulls the last N trading days of real Stooq bars for every symbol in the universe. For each historical day, it asks the current trained brain "would you trade this?" by computing features and getting a calibrated probability. If that probability is above 0.55 (long) or below 0.45 (short), it simulates a trade risking 1R and scores the realized outcome using the next-day return.
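The trade rule described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `signal` and `realized_r` are hypothetical, and the definition of 1R as a 1% adverse price move (`risk_per_trade`) is an assumption, since the text does not say how R is sized.

```python
from typing import Optional

LONG_THRESHOLD = 0.55   # from the description: go long above this probability
SHORT_THRESHOLD = 0.45  # from the description: go short below this probability


def signal(prob: float) -> int:
    """Map a calibrated probability to +1 (long), -1 (short), or 0 (no trade)."""
    if prob > LONG_THRESHOLD:
        return 1
    if prob < SHORT_THRESHOLD:
        return -1
    return 0


def realized_r(prob: float, next_day_return: float,
               risk_per_trade: float = 0.01) -> Optional[float]:
    """Return the R-multiple of a simulated trade, or None if no trade is taken.

    Assumption: 1R corresponds to a `risk_per_trade` adverse move (1% here),
    so a +2% next-day return on a long position is +2R.
    """
    side = signal(prob)
    if side == 0:
        return None
    return side * next_day_return / risk_per_trade
```

For example, `realized_r(0.60, 0.02)` scores a long trade on a day followed by a +2% move, while `realized_r(0.50, 0.02)` returns `None` because the probability falls inside the no-trade band.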
Why this is the real test: Training accuracy doesn't matter if you don't make money. This backtest produces:
- Total R: sum of realized R-multiples (positive = profitable system)
- Sharpe: risk-adjusted return (>1 is decent, >2 is excellent)
- Max drawdown: worst peak-to-trough decline (should stay under 30%)
- Win rate × avg R: edge per trade (must be positive after fees)
Results are compared against a random-trade baseline (50/50 coin flip) and buy-and-hold SPY.
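The metrics and the coin-flip baseline can be computed from a list of realized R-multiples roughly as shown below. This is a sketch under stated assumptions rather than the backtest's real implementation: `summarize` and `random_baseline` are hypothetical names, each trade is assumed to risk 1% of equity (`RISK_FRACTION`), and the Sharpe ratio is annualized on the assumption of one trade per trading day. Fees are not modeled here.

```python
import math
import random
from statistics import mean, stdev

RISK_FRACTION = 0.01  # assumption: each trade risks 1% of equity (1R = 1%)


def summarize(r_multiples, periods_per_year=252):
    """Summary statistics for a non-empty list of realized R-multiples."""
    if not r_multiples:
        raise ValueError("no trades to summarize")
    total_r = sum(r_multiples)
    win_rate = sum(1 for r in r_multiples if r > 0) / len(r_multiples)
    avg_r = mean(r_multiples)  # mean R per trade
    sd = stdev(r_multiples) if len(r_multiples) > 1 else 0.0
    # Assumption: one trade per trading day, so annualize with sqrt(252).
    sharpe = avg_r / sd * math.sqrt(periods_per_year) if sd > 0 else float("nan")

    # Compound an equity curve and track the worst peak-to-trough decline.
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in r_multiples:
        equity *= 1.0 + RISK_FRACTION * r
        peak = max(peak, equity)
        max_dd = max(max_dd, 1.0 - equity / peak)

    return {
        "total_r": total_r,
        "win_rate": win_rate,
        "avg_r": avg_r,
        "sharpe": sharpe,
        "max_drawdown": max_dd,  # fraction of equity, e.g. 0.30 means 30%
    }


def random_baseline(next_day_returns, seed=0):
    """R-multiples from taking a random side (50/50 coin flip) every day."""
    rng = random.Random(seed)
    return [rng.choice((1, -1)) * ret / RISK_FRACTION for ret in next_day_returns]
```

Running `summarize` on both the model's R-multiples and the output of `random_baseline` over the same days gives a like-for-like comparison. A buy-and-hold SPY comparison would need SPY's own return series, which is not shown here.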