The principle: the 22-feature logistic regression model is parametric; it compresses everything into 22 weights. Useful for generalization, but it throws away the memory of specific past situations.
k-NN recall fixes this: for every new prediction, find the K=10 most-similar past RESOLVED predictions (Euclidean distance, feature-importance weighted) and look at their realized outcomes. If 8 of 10 similar past cases won, k-NN votes 80%.
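A minimal sketch of that vote, assuming resolved predictions are stored as a feature matrix plus a 0/1 outcome array (names like `knn_vote` are illustrative, not from the source):

```python
import numpy as np

def knn_vote(query_vec, past_vecs, past_outcomes, k=10):
    """Plain majority vote among the k nearest resolved predictions.

    past_vecs: (n, d) array of feature vectors for resolved predictions.
    past_outcomes: (n,) array of 1 (win) / 0 (loss).
    """
    dists = np.linalg.norm(past_vecs - query_vec, axis=1)  # Euclidean
    nearest = np.argsort(dists)[:k]                        # indices of k closest
    return float(past_outcomes[nearest].mean())            # 8 wins of 10 -> 0.8
```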
Blended output:
final_prob = 0.7 × model_prob + 0.3 × knn_prob
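The blend, as a one-liner with a guard for the no-history case described later (the function name is hypothetical):

```python
def blend(model_prob, knn_prob, alpha=0.7):
    """Weighted average of model and k-NN probabilities."""
    if knn_prob is None:
        return model_prob  # k-NN abstains: too little history, model alone decides
    return alpha * model_prob + (1 - alpha) * knn_prob
```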
Why this catches what the model misses: the parametric model averages over all training data. If your most-recent 5 NVDA setups looked exactly like THIS setup and 4 of 5 won, k-NN sees that. The averaged model might dilute it with 100 other unrelated NVDA setups.
For each live symbol, the system retrieves the K=10 most-similar past resolved predictions and their outcomes. The k-NN probability is the inverse-distance-weighted win rate among those neighbors.
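The inverse-distance weighting can be sketched like this, assuming the neighbor distances and outcomes are already computed (the epsilon term, an assumption here, just avoids division by zero for an exact match):

```python
import numpy as np

def knn_weighted_prob(neighbor_dists, neighbor_outcomes, eps=1e-6):
    """Inverse-distance-weighted win rate: closer neighbors count more."""
    w = 1.0 / (neighbor_dists + eps)
    return float(np.dot(w, neighbor_outcomes) / w.sum())
```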
ฮฑ = 0.7 (default): model carries 70% of the weight, k-NN gets 30%. The model dominates for general patterns; k-NN provides a sanity check from memory.
When k-NN disagrees strongly with the model, the ensemble agreement scorer picks up the divergence and reduces overall confidence, so the two systems naturally check each other.
Distance metric: weighted Euclidean. Each feature's weight comes from FeatureImportance.lrMultiplier, so the high-alpha features dominate the similarity calculation and noise features barely matter.
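A sketch of the weighted distance, with a generic `weights` vector standing in for the per-feature FeatureImportance.lrMultiplier values:

```python
import numpy as np

def weighted_euclidean(a, b, weights):
    """Euclidean distance with per-feature weights.

    A large weight makes that feature dominate similarity;
    a near-zero weight makes the feature effectively invisible.
    """
    diff = a - b
    return float(np.sqrt(np.sum(weights * diff * diff)))
```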
Why not always trust k-NN? It needs enough history. Until 5+ resolved predictions exist, k-NN returns null and the system uses the model alone. After 50+, k-NN becomes meaningful and starts catching pattern-specific signal that the parametric model dilutes.
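Putting the gating and the weighted vote together in one self-contained sketch (the 5-neighbor threshold is from the text above; the function name is illustrative):

```python
import numpy as np

MIN_RESOLVED = 5  # below this, k-NN abstains and the model alone decides

def knn_prob_or_none(query_vec, past_vecs, past_outcomes, k=10):
    """Inverse-distance-weighted k-NN probability, or None if history is too thin."""
    if len(past_outcomes) < MIN_RESOLVED:
        return None
    dists = np.linalg.norm(past_vecs - query_vec, axis=1)
    nearest = np.argsort(dists)[:k]
    w = 1.0 / (dists[nearest] + 1e-6)  # closer neighbors count more
    return float(np.dot(w, past_outcomes[nearest]) / w.sum())
```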