We don't trust our own backtest — or our own bot

AlgoProven Research · June 2026 · #8 in a series on why backtests lie

Two kinds of software lie to a trader: the backtest that grades a strategy before it is real, and the live bot that tells you what it just did. We have caught both lying — about our own strategies, on our own accounts. Here are three tests we run so the code can't fool us.

1. The backtest that read one bar into the future

Our trailing-stop exit ratcheted the stop using the current one-minute bar's high, then checked that same bar's low against it. But inside a single OHLC bar you don't know whether the high or the low printed first. Assuming the high came first pins every long trailing exit near the top of the bar: free money that does not exist.

We didn't debate it; we wrote a forensic A/B (atr_lookahead_test.py). The same strategy runs two ways: BUGGY (current-bar extremes) and FIXED (prior-bar extremes only). The honest result is bracketed between them. If FIXED still clears the baseline, the edge is real — just smaller. If FIXED collapses to the baseline, the "edge" was the artifact. Most of our flashier intraday results lived in that gap. It is the same lesson a locked fill-realism gate taught us elsewhere: notebook profit factors of 5–18 became 0.11 once fills were real, and a per-trade Sharpe of 3.3 became 0.5 on a live account.
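A minimal sketch of what such an A/B might look like, for a long trailing stop on a toy two-bar series. This is not the contents of atr_lookahead_test.py: Bar, trailing_exit, the fixed trail distance, and the prices are all hypothetical, and gaps through the stop are ignored for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


def trailing_exit(
    bars: List[Bar], trail: float, use_current_bar: bool
) -> Optional[Tuple[int, float]]:
    """Long trailing stop; entry is assumed on bar 0 (illustrative convention).

    use_current_bar=True  -> BUGGY: ratchet with this bar's high, then test
                             this same bar's low against the new stop.
    use_current_bar=False -> FIXED: the stop is built only from bars that
                             had already closed when this bar started.
    Returns (bar_index, exit_price), or None if the stop is never hit.
    """
    highest = bars[0].high
    for i in range(1, len(bars)):
        bar = bars[i]
        if use_current_bar:
            # Lookahead: assumes the high printed before the low.
            highest = max(highest, bar.high)
        stop = highest - trail
        if bar.low <= stop:
            return i, stop
        # Only now may this bar's high raise the stop for later bars.
        highest = max(highest, bar.high)
    return None


# Toy data: enter long at 100 on bar 0, then one wide bar.
bars = [
    Bar(open=100.0, high=100.0, low=99.0, close=100.0),
    Bar(open=100.0, high=110.0, low=94.0, close=104.0),
]
entry_price = 100.0

buggy = trailing_exit(bars, trail=5.0, use_current_bar=True)
fixed = trailing_exit(bars, trail=5.0, use_current_bar=False)

buggy_pnl = buggy[1] - entry_price  # exits at 105: a 5-point "win"
fixed_pnl = fixed[1] - entry_price  # exits at 95: a 5-point loss
```

On this toy bar the two runs exit on the same bar but 10 points apart, which is the whole gap between a winner and a loser.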

2. The bot that booked a win on a losing trade

A live micro-Nasdaq trade logged +$229. The strategy had recorded its fill at the signal price. The broker had actually filled the entry about 90 points worse, and the exit went against us — the real result was −$128. Same trade, opposite sign.

The fix is a rule, not a patch: broker fills are authoritative; the bot's own accounting is a placeholder until the fill event arrives. A regression test (test_fill_reconciliation.py) replays that exact trade and asserts that once broker truth lands, the trade's P&L, the day's realized total, and the risk manager all read −$128 — not +$229.
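The rule can be sketched as a trade record whose P&L uses the bot's estimate only until broker fills arrive. This is an illustration, not the contents of test_fill_reconciliation.py: Trade and DayLedger are hypothetical names, a single contract is assumed, and the prices are invented so that the two totals come out to the +$229 and −$128 above. The $2-per-point multiplier is the standard one for the micro Nasdaq contract.

```python
from dataclasses import dataclass, field
from typing import List, Optional

POINT_VALUE = 2.0  # USD per index point for one micro Nasdaq contract


@dataclass
class Trade:
    qty: int
    est_entry: float                      # bot's placeholder: the signal price
    est_exit: float
    broker_entry: Optional[float] = None  # authoritative once the fill arrives
    broker_exit: Optional[float] = None

    def apply_broker_fills(self, entry: float, exit_: float) -> None:
        self.broker_entry = entry
        self.broker_exit = exit_

    @property
    def pnl(self) -> float:
        # Broker prices win whenever they exist; estimates are a fallback only.
        entry = self.broker_entry if self.broker_entry is not None else self.est_entry
        exit_ = self.broker_exit if self.broker_exit is not None else self.est_exit
        return (exit_ - entry) * self.qty * POINT_VALUE


@dataclass
class DayLedger:
    trades: List[Trade] = field(default_factory=list)

    def realized(self) -> float:
        # Derived from the trades on every call, so it cannot drift from them.
        return sum(t.pnl for t in self.trades)


# Illustrative prices (assumed, long 1 contract).
trade = Trade(qty=1, est_entry=20000.0, est_exit=20114.5)
ledger = DayLedger(trades=[trade])
before = ledger.realized()   # bot's own accounting: +229.0

# Broker truth lands: entry 90 points worse, exit well below the estimate.
trade.apply_broker_fills(entry=20090.0, exit_=20026.0)
after = ledger.realized()    # reconciled: -128.0
```

Deriving the daily total from the trades, rather than keeping a separately incremented counter, is one way to make sure a late fill correction reaches every number that depends on it.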

3. The risk limit that almost didn't trip

Mis-reporting one trade is bad. Mis-reporting the day is dangerous. In one session the bot's internal math said +$600 on the day; the broker said −$1,500 — past a $1,000 daily-loss limit. Trusting its own number, the bot would have kept trading straight through a blown limit. A second regression test (test_broker_truth_risk.py) replays the scenario and asserts the bot halts on broker truth, stays halted, and reports how stale its broker data is, in seconds — so a frozen feed can't pass for "all clear."
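A sketch of the three behaviors that test asserts (halt on broker truth, stay halted, report staleness in seconds), under assumptions: BrokerTruthRisk, its method names, and the five-second staleness threshold are hypothetical, and timestamps are passed in explicitly so the scenario can be replayed deterministically.

```python
from typing import Optional


class BrokerTruthRisk:
    """Daily-loss gate that only ever looks at broker-reported P&L."""

    def __init__(self, daily_loss_limit: float, max_staleness_s: float) -> None:
        self.daily_loss_limit = daily_loss_limit
        self.max_staleness_s = max_staleness_s
        self.halted = False
        self._broker_ts: Optional[float] = None

    def on_broker_pnl(self, pnl: float, ts: float) -> None:
        self._broker_ts = ts
        if pnl <= -self.daily_loss_limit:
            # Latched: a later, better number does not un-halt the day.
            self.halted = True

    def staleness_s(self, now: float) -> Optional[float]:
        """Seconds since the last broker update, or None if none has arrived."""
        if self._broker_ts is None:
            return None
        return now - self._broker_ts

    def can_trade(self, now: float) -> bool:
        if self.halted:
            return False
        age = self.staleness_s(now)
        # No broker data, or data that is too old, is never "all clear".
        return age is not None and age <= self.max_staleness_s


# Replay: the bot's internal math says +600, but it is never consulted.
risk = BrokerTruthRisk(daily_loss_limit=1000.0, max_staleness_s=5.0)
risk.on_broker_pnl(-1500.0, ts=100.0)
halted_on_truth = not risk.can_trade(now=101.0)

# A later, improved broker number must not lift the halt.
risk.on_broker_pnl(200.0, ts=102.0)
still_halted = not risk.can_trade(now=103.0)
age = risk.staleness_s(now=107.0)  # 5.0 seconds since the last update
```

The internal P&L has no way into this class at all, which is the point: the only path to a trading decision runs through a broker update and its timestamp.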

In an execution system your internal state is a hypothesis; the broker is the ground truth. The job of a test suite is to prove the code believes the broker — especially when the broker brings bad news.

Why we test the boring paths

The cost of not testing execution shows up the way it always does: at 11 p.m., live. A retry loop with no backoff re-sent the order on every margin rejection and doubled the position size each time: 1, 2, 4, 8, 16, 32 contracts. In the three minutes before it was caught, 95 of its 236 orders (40%) were rejected. No happy-path backtest will ever show you that. A test that injects a rejection will.
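One way to write a test that injects a rejection is a scripted broker stub that records every order it receives. The sketch below is an assumption-heavy illustration, not our production retry logic: ScriptedBroker, submit_with_retry, and the status strings are hypothetical, and treating a margin rejection as non-retryable is a design choice made for this example.

```python
import time
from typing import Callable, List


class ScriptedBroker:
    """Test double: returns pre-scripted statuses and records every order."""

    def __init__(self, statuses: List[str]) -> None:
        self._statuses = list(statuses)
        self.sent: List[int] = []

    def submit(self, qty: int) -> str:
        self.sent.append(qty)
        return self._statuses.pop(0)


def submit_with_retry(
    broker: ScriptedBroker,
    qty: int,
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Bounded retry with exponential backoff; the quantity never changes."""
    for attempt in range(max_attempts):
        status = broker.submit(qty)
        if status == "FILLED":
            return True
        if status == "REJECTED_MARGIN":
            # Re-sending cannot fix a margin problem, so stop immediately.
            return False
        if attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)
    return False


# Inject a margin rejection: exactly one order, size unchanged, no retry.
margin_broker = ScriptedBroker(["REJECTED_MARGIN"])
margin_delays: List[float] = []
margin_ok = submit_with_retry(margin_broker, qty=1, sleep=margin_delays.append)

# Inject transient failures: same size each time, growing delays.
flaky_broker = ScriptedBroker(["TIMEOUT", "TIMEOUT", "FILLED"])
flaky_delays: List[float] = []
flaky_ok = submit_with_retry(flaky_broker, qty=1, sleep=flaky_delays.append)
```

Passing sleep in as a parameter lets the test capture the backoff schedule without actually waiting, and the recorded sent list makes a 1, 2, 4, 8 escalation an immediate assertion failure.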

This is the unglamorous half of "algo proven." The strategy research gets the headlines; the test suite is what lets us put real money behind it — and it's why our Control Center treats the broker, not the bot, as the source of truth.
