Brain Proof — bpleone / trade

🎯 Worker — held-out test results (real signal vs noise)

The worker bootstrap trains on 80% of the historical data and tests on the remaining 20%, which the model NEVER saw. It reports two splits: a random shuffle (the stationary upper bound) and walk-forward (the honest "trained on the past, predicts the future" test).
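A minimal sketch of the two splits. It assumes each historical row carries a sortable timestamp under a hypothetical `t` key, because the page does not show the worker's row format. The random split shuffles before cutting, so test rows sit between training rows in time. The walk-forward split sorts by time first, so every test row comes after every training row.

```python
import random


def random_split(rows, train_frac=0.8, seed=0):
    """Shuffle, then cut 80/20: train and test rows interleave in time."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def walk_forward_split(rows, train_frac=0.8):
    """Sort by time, then cut 80/20: every test row is later than every training row."""
    ordered = sorted(rows, key=lambda r: r["t"])  # "t" is a hypothetical timestamp key
    cut = int(len(ordered) * train_frac)
    return ordered[:cut], ordered[cut:]


rows = [{"t": day, "label": day % 2} for day in range(100)]
train, test = walk_forward_split(rows)
print(len(train), len(test))                                   # 80 20
print(max(r["t"] for r in train) < min(r["t"] for r in test))  # True
```

With a multi-day label horizon, the outcome windows of the last training rows can overlap the first test rows. Dropping a short gap at the cut is a common refinement, but this page does not say the worker does it.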

Random 80/20 split — stationary estimate

Test set size	Accuracy	Brier Skill Score (vs constant 0.5 baseline)	p-value vs random

Walk-forward split — honest trading test

Test set size	Accuracy	Brier Skill Score (vs constant 0.5 baseline)	p-value vs random

BSS reading guide: > +0.05 = strong skill vs the base-rate baseline · 0 to +0.05 = weak · ≤ 0 = no better than baseline. BSS measures probabilistic skill (calibration and resolution) relative to the baseline forecast. That is not the same as beating market drift; see the drift-vs-skill verdict above.
p < 0.05 = statistically significant. Walk-forward is the truth test: if the random split says REAL SIGNAL but walk-forward says BELOW BASELINE, the model is most likely overfitting or leaking future information into training.
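For reference, here is a sketch of the three quantities above. The Brier score and BSS follow their standard definitions. The page describes the reference both as a constant 0.5 forecast and as the base rate, so the sketch takes it as a `ref_prob` parameter. Treating "p-value vs random" as a one-sided exact binomial test against 50% is an assumption, and the worker may use a different test.

```python
from math import comb


def brier(probs, outcomes):
    """Mean squared error between predicted P(up) and the 0/1 outcome; lower is better."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def brier_skill(probs, outcomes, ref_prob=0.5):
    """BSS = 1 - BS_model / BS_ref, where the reference forecasts ref_prob for every call."""
    bs_ref = brier([ref_prob] * len(outcomes), outcomes)
    return 1.0 - brier(probs, outcomes) / bs_ref


def p_value_vs_random(correct, n, p=0.5):
    """One-sided exact binomial test: P(X >= correct) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(correct, n + 1))


probs = [0.7, 0.6, 0.4, 0.8]
outcomes = [1, 1, 0, 0]
print(round(brier(probs, outcomes), 4))        # 0.2625
print(round(brier_skill(probs, outcomes), 4))  # -0.05 (below the 0.5 baseline)
print(p_value_vs_random(60, 100))              # about 0.028
```

If you pass the observed up-rate as `ref_prob`, you get the base-rate version of BSS. That version is never easier to beat than the 0.5 reference, because a constant base-rate forecast p̄ scores p̄(1 − p̄) ≤ 0.25.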

🏆 Champion / Challenger — the Brain Trying New Things

On each bootstrap the brain trains 4 model variants (different L2 regularization strengths and training depths, in epochs), races them on validation data none of them trained on, and promotes the winner to live model. The champion's config then drives all live training until a later bootstrap unseats it. This is genuine autonomous experimentation: the brain keeps what works and discards what doesn't.

config	L2	epochs	val acc	val BSS	champion
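One way the promotion step could work, as a hedged sketch. The page does not say exactly how the winner is ranked, so scoring by validation BSS with validation accuracy as a tie-breaker is an assumption. The variant names and values below are illustrative, not real results.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    l2: float      # L2 regularization strength
    epochs: int    # training depth
    val_acc: float
    val_bss: float


def pick_champion(variants):
    """Promote the variant with the best validation BSS; ties go to higher validation accuracy."""
    return max(variants, key=lambda v: (v.val_bss, v.val_acc))


# Illustrative numbers only.
variants = [
    Variant("light-short", l2=1e-4, epochs=10, val_acc=0.531, val_bss=0.004),
    Variant("light-long", l2=1e-4, epochs=40, val_acc=0.538, val_bss=-0.006),
    Variant("heavy-short", l2=1e-2, epochs=10, val_acc=0.527, val_bss=0.009),
    Variant("heavy-long", l2=1e-2, epochs=40, val_acc=0.529, val_bss=0.007),
]
champion = pick_champion(variants)
print(champion.name)  # heavy-short
```

Ranking on BSS instead of raw accuracy is a design choice. BSS rewards well-calibrated probabilities, while accuracy alone can favor a variant that simply leans long in a drifting market.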

📊 Live Resolutions — Post-Deployment Performance

Captures that resolved at the brain's actual prediction horizon (5d). This is the truth test after deployment. The backtest found mostly drift rather than a statistically proven timing edge, so this card is the live forward test on real future data the model never saw. A positive BSS means better-than-baseline probability forecasts. The accuracy figure still includes market drift, so treat it as provisional, not a proven edge.
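A small sketch of why accuracy "still includes market drift". It compares the model's accuracy with what a constant majority call (always long, in a rising market) would score on the same resolutions. The function name and the boolean inputs are illustrative.

```python
def drift_vs_skill(predicted_up, went_up):
    """Return (model accuracy, constant-call accuracy, excess over the constant call)."""
    n = len(went_up)
    accuracy = sum(p == y for p, y in zip(predicted_up, went_up)) / n
    up_rate = sum(went_up) / n
    # Always calling the majority direction scores max(up_rate, 1 - up_rate) with no timing skill.
    drift_acc = max(up_rate, 1 - up_rate)
    return accuracy, drift_acc, accuracy - drift_acc


predicted_up = [True, True, True, False, True]
went_up = [True, False, True, False, True]
acc, drift, excess = drift_vs_skill(predicted_up, went_up)
print(acc, drift, round(excess, 2))  # 0.8 0.6 0.2
```

By this measure, a model at 52% accuracy in a period where 52% of resolutions went up has no timing edge at all.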

Live resolved	Accuracy	Brier	BSS

Captures pending resolution are shown as a separate count. Each capture resolves 5 trading days after it is made.
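A sketch of how a capture's resolution date could be computed. It assumes "5 trading days" means five weekdays after the capture date. Exchange holidays are deliberately not handled here, and a real implementation would need a market calendar.

```python
from datetime import date, timedelta


def resolution_date(captured: date, horizon: int = 5) -> date:
    """Walk forward one calendar day at a time, counting only Mon-Fri as trading days."""
    d = captured
    remaining = horizon
    while remaining > 0:
        d += timedelta(days=1)
        if d.weekday() < 5:  # Monday=0 ... Friday=4; holidays are NOT skipped here
            remaining -= 1
    return d


def is_pending(captured: date, today: date, horizon: int = 5) -> bool:
    """A capture stays pending until its resolution date arrives."""
    return today < resolution_date(captured, horizon)


print(resolution_date(date(2024, 1, 5)))                # 2024-01-12 (Friday to Friday)
print(is_pending(date(2024, 1, 5), date(2024, 1, 11)))  # True
```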

🧠 Live Brain Picks — Sorted by Conviction

Predictions captured by the Cloudflare worker in the last 24h. Each symbol is shown ONCE (its most recent prediction). Conviction = |predProb − 0.5|, the distance from neutral, and higher means more confident. Picks are sorted by conviction, highest first, and filtered to conviction ≥ 0.05 (the model has at least a small opinion).
The underlying model's walk-forward accuracy is mostly market drift (~52% base rate); the timing skill on top is small and not statistically significant. These are calibrated probabilities (Platt scaling is applied when the fit is healthy), so read them as a conviction score, not a promise that the call wins.
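A sketch of the dedupe, score, filter, and sort steps described above. `predProb` comes from the page. The `symbol` and `ts` field names and the list-of-dicts input are assumptions about the capture format.

```python
def live_picks(predictions, min_conviction=0.05):
    """Most recent prediction per symbol, scored by conviction = |predProb - 0.5|, strongest first."""
    latest = {}
    for pred in predictions:
        prev = latest.get(pred["symbol"])
        if prev is None or pred["ts"] > prev["ts"]:
            latest[pred["symbol"]] = pred

    picks = []
    for pred in latest.values():
        # Round so binary float noise can't push e.g. 0.45 just under the 0.05 cutoff.
        conviction = round(abs(pred["predProb"] - 0.5), 9)
        if conviction >= min_conviction:
            picks.append({**pred, "conviction": conviction})

    picks.sort(key=lambda p: p["conviction"], reverse=True)
    return picks


captures = [
    {"symbol": "AAPL", "ts": 1, "predProb": 0.52},
    {"symbol": "AAPL", "ts": 2, "predProb": 0.61},  # newer AAPL call replaces the older one
    {"symbol": "MSFT", "ts": 1, "predProb": 0.45},
    {"symbol": "NVDA", "ts": 3, "predProb": 0.30},
    {"symbol": "TSLA", "ts": 2, "predProb": 0.53},  # conviction 0.03: filtered out
]
print([(p["symbol"], p["conviction"]) for p in live_picks(captures)])
# [('NVDA', 0.2), ('AAPL', 0.11), ('MSFT', 0.05)]
```

Conviction is symmetric, so a strong SHORT-leaning probability such as 0.30 ranks above a mild LONG-leaning 0.61.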

🎯 Per-Symbol Breakdown — heldout accuracy by name

Held-out test pairs broken down by symbol. Per-symbol samples are small (tens of calls), so these numbers are noisy and the extremes are partly luck: out of 72 names, the best few will look strong by chance alone. The brain's only statistically meaningful result is the overall walk-forward number (mostly drift, no significant timing edge), not any single name. Treat this as exploratory, not a trade list. Data comes from /brain/symbols on the worker.
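A quick calculation of how strong the best names can look by luck alone. It assumes every symbol is a fair coin (no skill, no drift), that symbols are independent, and that each has 30 held-out calls. The 30 calls and the 20-correct threshold are illustrative stand-ins for "tens of calls".

```python
from math import comb


def p_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def chance_some_name_looks_strong(n_symbols=72, calls=30, min_correct=20):
    """Probability that at least one of n_symbols coin-flip names gets >= min_correct calls right."""
    p_one = p_at_least(min_correct, calls)
    return p_one, 1 - (1 - p_one) ** n_symbols


p_one, p_any = chance_some_name_looks_strong()
print(f"one name >= 20/30: {p_one:.3f}")  # 0.049
print(f"any of 72 names:   {p_any:.3f}")
```

A name at 67% over 30 calls is individually "significant" at p < 0.05, yet with 72 names, some name is almost certain to reach that level by chance. A Bonferroni-style correction, for example, would require p < 0.05 / 72 ≈ 0.0007 per name before reading anything into it.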

Top 5 by heldout accuracy (small-sample — not a proven per-name edge)

Bottom 5 by heldout accuracy (small-sample)


sym	heldout n	acc	BSS	live n	live acc

📜 Weight Ledger — proves weights are changing over time

Each row is a SHA-style hash of the model's weight vector at that moment. If two hashes differ, the weights changed in between, meaning the brain trained. If they are all identical, the brain isn't learning: flag a bug.

Shows the last 20 page loads; a new row is added every time you visit this page.
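A sketch of one way to produce such a fingerprint, assuming the weights are a flat list of floats. The page only says "SHA-style", so SHA-256 over little-endian float64 bytes, truncated to 16 hex characters for display, is an assumption.

```python
import hashlib
import struct


def weight_hash(weights):
    """SHA-256 over the weights packed as little-endian float64, shortened for display."""
    packed = struct.pack(f"<{len(weights)}d", *weights)
    return hashlib.sha256(packed).hexdigest()[:16]


def ledger_is_stuck(hashes):
    """True when there are several rows and every hash is identical: the weights never changed."""
    return len(hashes) > 1 and len(set(hashes)) == 1


before = weight_hash([0.12, -0.40, 0.05])
after = weight_hash([0.12, -0.40, 0.0500001])    # one tiny update
print(before != after)                            # True
print(ledger_is_stuck([before, before, before]))  # True
```

Any changed bit changes the digest, so even a tiny gradient step shows up as a new hash. The flip side is that a hash cannot say how much the weights moved.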

⚗ Live Verification — inject a test prediction

Click below to inject a known prediction (AAPL LONG, prob=0.85) into the journal, then watch /brain-debug tick: the new entry shows up within 30s. Once the short horizon (24h) elapses, it resolves automatically. Alternatively, click "Force resolve" to simulate the resolution immediately against the current AAPL price.
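A sketch of what "Force resolve" could compute for the injected entry. The `JournalEntry` fields, the entry price, and the rule that a flat move counts as a miss are all hypothetical. The page itself specifies only AAPL, LONG, and prob=0.85.

```python
from dataclasses import dataclass


@dataclass
class JournalEntry:
    symbol: str
    side: str            # "LONG" or "SHORT"
    prob: float          # predicted probability that the call wins
    entry_price: float   # hypothetical: price recorded at capture time


def force_resolve(entry: JournalEntry, current_price: float) -> dict:
    """Resolve immediately against current_price instead of waiting out the horizon."""
    if entry.side == "LONG":
        correct = current_price > entry.entry_price
    elif entry.side == "SHORT":
        correct = current_price < entry.entry_price
    else:
        raise ValueError(f"unknown side: {entry.side}")
    outcome = 1.0 if correct else 0.0  # a flat price counts as a miss for either side
    return {"correct": correct, "brier": (entry.prob - outcome) ** 2}


test_entry = JournalEntry("AAPL", "LONG", 0.85, entry_price=190.0)  # price is illustrative
print(force_resolve(test_entry, 195.0)["correct"])  # True
print(force_resolve(test_entry, 185.0)["correct"])  # False
```

A confident correct call adds only (0.85 − 1)² = 0.0225 to the Brier sum, while the same call wrong adds 0.7225. This asymmetry is why overconfidence is punished.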

🧮 Live Accuracy — recomputed from raw journal

Recomputes accuracy directly from bpleone_pred_journal_v1. If this disagrees with the number shown on brain-truth.html, a cache somewhere is stale: flag it.

Total resolved	Correct (recomputed)	Accuracy (recomputed)	vs baseline 50%
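A sketch of the recomputation. It assumes the journal is stored as a JSON array of entries with hypothetical boolean `resolved` and `correct` fields, because this page does not show the real schema of bpleone_pred_journal_v1.

```python
import json


def recompute_accuracy(raw_journal: str) -> dict:
    """Count resolved entries and correct calls straight from the raw journal JSON."""
    journal = json.loads(raw_journal)
    resolved = [e for e in journal if e.get("resolved")]
    total = len(resolved)
    correct = sum(1 for e in resolved if e.get("correct"))
    accuracy = correct / total if total else None
    return {
        "total": total,
        "correct": correct,
        "accuracy": accuracy,
        # Difference from the 50% coin-flip baseline, as a fraction (0.1 = 10 points).
        "vs_baseline": None if accuracy is None else accuracy - 0.5,
    }


raw = json.dumps([
    {"symbol": "AAPL", "resolved": True, "correct": True},
    {"symbol": "MSFT", "resolved": True, "correct": False},
    {"symbol": "NVDA", "resolved": True, "correct": True},
    {"symbol": "TSLA", "resolved": False},
])
print(recompute_accuracy(raw))
```

Unresolved entries are excluded from both counts, so the denominator matches "Total resolved" instead of the total number of captures.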

📡 Recent Activity Stream — last 20 events

Live feed showing the brain capturing predictions and resolving them. If this freezes, the brain isn't ticking.

⚠ What this page does NOT prove

This page proves the mechanics work. It does NOT prove the brain is making GOOD predictions — only that it's capturing, resolving, and training. Whether the predictions actually beat random is a separate question (see /brier-skill and /sharpe-ratio).

It also does not prove the input features are correct, only that whatever was captured trains the model. If you suspect bad features, run /self-test, which checks the FeatureExtractor's output shape and bounds.

For a paper-trading audit (did open/stop/target prices match real moves?), run /daily-replay for any past day.