Weights = each horizon's edge above coin-flip in the current regime, normalized. A horizon with no signal (≤50% accuracy) gets weight 0.
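A minimal sketch of this weighting rule (the function and argument names are hypothetical, not from the actual system):

```python
def horizon_weights(accuracy_by_horizon):
    """Edge above coin-flip per horizon, floored at 0, normalized to sum to 1.

    accuracy_by_horizon: dict mapping horizon name -> recent accuracy in [0, 1]
    for the current regime.
    """
    edges = {h: max(acc - 0.5, 0.0) for h, acc in accuracy_by_horizon.items()}
    total = sum(edges.values())
    if total == 0.0:
        # No horizon beats coin-flip: fall back to equal weights.
        n = len(edges)
        return {h: 1.0 / n for h in edges}
    return {h: e / total for h, e in edges.items()}
```

A horizon at exactly 50% (or below) contributes zero edge, so it drops out of the blend entirely until its regime-specific accuracy recovers.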
Symbols where the model is least confident (prediction in [40%, 60%]). These get re-captured every 5 minutes instead of 30, so the brain explores its weak spots faster.
Multi-horizon (idea #1): three models in parallel. When you ask for a prediction, you get a weighted blend where the weights reflect each horizon's recent performance in THIS regime. If the 5-day model is crushing it in choppy markets while the 1-day model fails, the ensemble auto-routes weight to the 5-day.
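The blend itself is a one-liner once the per-horizon weights exist. A hedged sketch (names are illustrative; weights are assumed already normalized to sum to 1):

```python
def blend_predictions(preds, weights):
    """Weighted average of per-horizon probabilities.

    preds:   dict mapping horizon name -> model probability in [0, 1]
    weights: dict mapping horizon name -> normalized regime weight
    """
    return sum(weights[h] * p for h, p in preds.items())
```

Because the weights are regime-conditioned, "routing" to the 5-day model is just its weight approaching 1 while the failing horizon's weight approaches 0.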
Reward shaping (idea #2): training samples are weighted by |R-multiple|. A correct prediction on a +3R move trains the model 3× harder than a correct prediction on a 0.5R move. The brain learns to care about BIG, ACTIONABLE moves more than tiny drifts.
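One way to sketch this (the floor parameter is my assumption, added so tiny-move samples don't get weight ~0 and vanish from training entirely):

```python
def sample_weight(r_multiple, floor=0.25):
    """Weight a training sample by the magnitude of its realized R-multiple.

    A +3R outcome yields weight 3.0; a -0.1R drift is clipped up to `floor`
    (hypothetical choice) so the sample still contributes a little.
    """
    return max(abs(r_multiple), floor)
```

Most gradient-based and tree libraries accept per-sample weights (e.g. a `sample_weight` argument in scikit-learn-style `fit` calls), so this plugs in without changing the loss function itself.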
Active learning (idea #3): when the model says "I'm 50/50 confident" on a symbol, that prediction has the highest information value when resolved. So those symbols get re-captured 6× more often (5min vs 30min cooldown), accelerating learning where it matters most.
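The cooldown rule reduces to a single threshold check on the predicted probability. A sketch (constant names are mine; the 5/30-minute values and the [40%, 60%] band are from the text above):

```python
FAST_COOLDOWN_S = 5 * 60    # uncertain symbols: re-capture every 5 minutes
SLOW_COOLDOWN_S = 30 * 60   # confident symbols: re-capture every 30 minutes

def recapture_cooldown(prob):
    """Return the re-capture cooldown in seconds for a prediction.

    Predictions inside the uncertainty band [0.40, 0.60] recycle 6x faster,
    since their resolution carries the most information for the model.
    """
    if 0.40 <= prob <= 0.60:
        return FAST_COOLDOWN_S
    return SLOW_COOLDOWN_S
```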
Feature alpha (idea #4): every resolved prediction is back-attributed to which features contributed most to the logit. Over thousands of predictions, this surfaces which features are REAL predictors vs noise. The brain can then prune low-alpha features or up-weight high-alpha ones.
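A minimal accumulator for this kind of back-attribution (entirely a sketch: it assumes you already have per-feature logit contributions, e.g. weight_i * x_i for a linear head, or SHAP-style values for anything nonlinear):

```python
from collections import defaultdict

class FeatureAlpha:
    """Accumulate per-feature |contribution| split by prediction outcome."""

    def __init__(self):
        self.correct = defaultdict(float)  # feature -> total |contribution| when right
        self.wrong = defaultdict(float)    # feature -> total |contribution| when wrong

    def attribute(self, contributions, was_correct):
        """Record one resolved prediction's per-feature contributions."""
        bucket = self.correct if was_correct else self.wrong
        for name, c in contributions.items():
            bucket[name] += abs(c)

    def alpha(self, name):
        """Signed score in [-1, 1]: +1 = feature only fires on correct calls,
        -1 = only on wrong calls, 0 = no signal (or never seen)."""
        c, w = self.correct[name], self.wrong[name]
        total = c + w
        return (c - w) / total if total else 0.0
```

Over thousands of resolutions, features with alpha near 0 are pruning candidates, and high-alpha features are candidates for up-weighting.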