We don't trust our own backtest — or our own bot
Two kinds of software lie to a trader: the backtest that grades a strategy before it is real, and the live bot that tells you what it just did. We have caught both lying — about our own strategies, on our own accounts. Here are three tests we run so the code can't fool us.
1. The backtest that read one bar into the future
Our trailing-stop exit ratcheted the stop using the current one-minute bar's high, then checked that same bar's low against it. But inside a single OHLC bar you don't know whether the high or the low printed first. Assuming high-before-low pins every trailing exit near the top of the bar — free money that does not exist.
We didn't debate it; we wrote a forensic A/B (atr_lookahead_test.py). The same strategy runs two ways: BUGGY (current-bar extremes) and FIXED (prior-bar extremes only). The honest result is bracketed between them. If FIXED still clears the baseline, the edge is real, just smaller. If FIXED collapses to the baseline, the "edge" was the artifact. Most of our flashier intraday results lived in that gap. It is the same lesson a locked fill-realism gate taught us elsewhere: notebook profit factors of 5–18 became 0.11 once fills were real, and a per-trade Sharpe of 3.3 became 0.5 on a live account.
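The difference between the two runs comes down to when the stop is allowed to ratchet. The sketch below is a minimal illustration of that idea, not the contents of atr_lookahead_test.py: the Bar type, the trailing_exit function, the fixed trail distance, the gap-at-open rule, and the three sample bars are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


def trailing_exit(
    bars: List[Bar], entry: float, trail: float, lookahead: bool
) -> Optional[Tuple[int, float]]:
    """Return (bar index, fill price) of a long trailing-stop exit, or None.

    lookahead=True  -> BUGGY: ratchet on the current bar's high, then test
                       the same bar's low (assumes the high printed first).
    lookahead=False -> FIXED: test the bar against a stop built from prior
                       bars only, and ratchet after the bar is complete.
    """
    stop = entry - trail
    for i, bar in enumerate(bars):
        # Assumption: if the bar opens at or below the stop, fill at the open.
        if bar.open <= stop:
            return i, bar.open
        if lookahead:
            stop = max(stop, bar.high - trail)
        if bar.low <= stop:
            return i, stop
        if not lookahead:
            stop = max(stop, bar.high - trail)
    return None


# Hypothetical bars: a spike to 106 that fully reverses inside one bar.
bars = [
    Bar(open=100.0, high=101.0, low=99.5, close=100.5),
    Bar(open=100.5, high=106.0, low=100.0, close=100.5),
    Bar(open=100.5, high=101.0, low=98.0, close=99.0),
]

buggy = trailing_exit(bars, entry=100.0, trail=2.0, lookahead=True)
fixed = trailing_exit(bars, entry=100.0, trail=2.0, lookahead=False)
print(buggy)  # (1, 104.0): exits 4 points up, near the top of the spike bar
print(fixed)  # (2, 100.5): exits 0.5 points up, at the next bar's open
```

On these made-up bars the BUGGY run books eight times the profit of the FIXED run on an identical price path, which is the kind of gap the A/B is meant to expose.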
2. The bot that booked a win on a losing trade
A live micro-Nasdaq trade logged +$229. The strategy had recorded its fill at the signal price. The broker had actually filled the entry about 90 points worse, and the exit went against us — the real result was −$128. Same trade, opposite sign.
The fix is a rule, not a patch: broker fills are authoritative; the bot's own accounting is a placeholder until the fill event arrives. A regression test (test_fill_reconciliation.py) replays that exact trade and asserts that once broker truth lands, the trade's P&L, the day's realized total, and the risk manager all read −$128, not +$229.
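One way to encode that rule is to make the signal-price P&L explicitly provisional and let broker fills overwrite it. The sketch below is an assumption-laden illustration, not the code under test in test_fill_reconciliation.py: the Trade and Ledger classes are hypothetical, the trade is assumed to be long one contract, and the prices are invented solely so the arithmetic reproduces the +$229 and −$128 figures at $2 per index point.

```python
from dataclasses import dataclass, field
from typing import List, Optional

POINT_VALUE = 2.0  # dollars per index point for one micro Nasdaq contract


@dataclass
class Trade:
    qty: int
    signal_entry: float  # the strategy's own placeholder prices
    signal_exit: float
    broker_entry: Optional[float] = None  # filled in by broker fill events
    broker_exit: Optional[float] = None

    @property
    def reconciled(self) -> bool:
        return self.broker_entry is not None and self.broker_exit is not None

    @property
    def pnl(self) -> float:
        # Broker fills win as soon as both are known; until then the
        # signal-price number is only a provisional estimate.
        if self.reconciled:
            entry, exit_ = self.broker_entry, self.broker_exit
        else:
            entry, exit_ = self.signal_entry, self.signal_exit
        return (exit_ - entry) * self.qty * POINT_VALUE


@dataclass
class Ledger:
    trades: List[Trade] = field(default_factory=list)

    def realized(self) -> float:
        return sum(t.pnl for t in self.trades)


# Hypothetical prices chosen only to reproduce the two dollar figures.
trade = Trade(qty=1, signal_entry=18000.0, signal_exit=18114.5)
ledger = Ledger([trade])
print(trade.pnl)  # 229.0 (provisional)

trade.broker_entry = 18090.0  # 90 points worse than the signal price
trade.broker_exit = 18026.0
print(trade.pnl, ledger.realized())  # -128.0 -128.0
```

Because the day's total is derived from the trades rather than stored separately, it cannot keep showing the old number after the fills arrive.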
3. The risk limit that almost didn't trip
Mis-reporting one trade is bad. Mis-reporting the day is dangerous. In one session the bot's internal math said +$600 on the day; the broker said −$1,500, past a $1,000 daily-loss limit. Trusting its own number, the bot would have kept trading straight through a blown limit. A second regression test (test_broker_truth_risk.py) replays the scenario and asserts the bot halts on broker truth, stays halted, and reports how stale its broker data is, in seconds, so a frozen feed can't pass for "all clear."
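The three behaviours that test asserts (halt on the broker's number, stay halted, expose staleness) can be sketched as a small risk gate. This is a hypothetical illustration rather than the bot's actual risk manager: the RiskManager class, its method names, the max_staleness_s threshold, and the choice to block trading on stale data are all assumptions.

```python
import time
from typing import Callable, Optional


class RiskManager:
    """Daily-loss gate driven only by broker-reported P&L (illustrative)."""

    def __init__(
        self,
        daily_loss_limit: float,
        max_staleness_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.daily_loss_limit = daily_loss_limit
        self.max_staleness_s = max_staleness_s
        self.halted = False
        self._clock = clock
        self._last_update: Optional[float] = None

    def on_broker_pnl(self, day_pnl: float) -> None:
        """Record a broker P&L update; latch the halt if the limit is hit."""
        self._last_update = self._clock()
        if day_pnl <= -self.daily_loss_limit:
            self.halted = True  # never cleared by a later, better number

    def staleness_s(self) -> Optional[float]:
        """Seconds since the last broker update, or None if none yet."""
        if self._last_update is None:
            return None
        return self._clock() - self._last_update

    def may_trade(self) -> bool:
        age = self.staleness_s()
        # Assumption: missing or stale broker data is never "all clear".
        if self.halted or age is None or age > self.max_staleness_s:
            return False
        return True
```

Note that the bot's internal P&L is not an input at all, so a +$600 internal figure has no way to outvote a −$1,500 broker figure. Injecting the clock lets a test advance time without sleeping.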
In an execution system your internal state is a hypothesis; the broker is the ground truth. The job of a test suite is to prove the code believes the broker — especially when the broker brings bad news.
Why we test the boring paths
The cost of not testing execution shows up the way it always does: at 11 p.m., live. A retry loop with no backoff re-sent the order on every margin rejection, doubling the position size each time: 1, 2, 4, 8, 16, 32 contracts. Before it was caught, 40% of its orders (95 of 236) had been rejected within three minutes. No happy-path backtest will ever show you that. A test that injects a rejection will.
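A rejection-injecting test can be very small. The sketch below pairs a fake broker that rejects everything with a bounded retry helper; both are hypothetical stand-ins (RejectingBroker, submit_with_retry, and the attempt and delay defaults are not from our codebase). It shows the two properties worth asserting: the size never grows and the attempts are capped.

```python
import time
from typing import Callable, List


class RejectingBroker:
    """Test double: records every requested size and rejects each order."""

    def __init__(self) -> None:
        self.sizes: List[int] = []

    def submit(self, qty: int) -> bool:
        self.sizes.append(qty)
        return False  # simulate a margin rejection


def submit_with_retry(
    broker: RejectingBroker,
    qty: int,
    max_attempts: int = 3,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry the same size a bounded number of times with exponential backoff."""
    for attempt in range(max_attempts):
        if broker.submit(qty):  # qty is never modified between attempts
            return True
        if attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)
    return False


# Capture the delays instead of sleeping so the check runs instantly.
delays: List[float] = []
broker = RejectingBroker()
accepted = submit_with_retry(broker, qty=1, sleep=delays.append)
print(accepted, broker.sizes, delays)  # False [1, 1, 1] [0.5, 1.0]
```

A loop that doubled its size on rejection would fail the size assertion on the second attempt, long before it reached a live account.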
This is the unglamorous half of "algo proven." The strategy research gets the headlines; the test suite is what lets us put real money behind it — and it's why our cockpit treats the broker, not the bot, as the source of truth.