The size that passes your backtest can bust your account

AlgoProven Research · June 2026 · why backtests lie series

Your backtest scores PnL. It does not score survival — and on a trailing-drawdown account those are two different games.

Here is the most expensive misread in prop trading. You run your strategy at 2 micros and again at 4 micros, and the 4-micro curve is exactly twice as tall. Same shape, same win rate, same profit factor — just bigger. The backtest says: size is free money. Double the size, double the result.

So you size up. And a few weeks into the evaluation, the account is gone — not because the edge stopped working, but because one ordinary losing night reached the trailing drawdown line. The backtest never warned you, because the backtest was measuring the wrong thing.

The dev's view vs. the trader's view

A backtest models profit: PnL = size × edge-per-contract. That relationship is linear, so on the PnL axis bigger size is strictly, boringly better. Nothing in that equation can ever tell you to size down.

But a funded or evaluation account does not pay out on PnL alone: it has a trailing maximum-loss line that follows your equity up and never moves down. Survival is a second, separate quantity. It depends on the probability that your open-trade drawdown touches that line on a given night, and that probability is not linear in size. Over the sizes that matter it is convex: going from 2 to 4 contracts does not double your bust risk; past a point it multiplies it many times over.

Size scales your profit linearly and your probability of ruin nonlinearly. A backtest plots the first curve and hides the second.
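
A toy model makes the asymmetry concrete. The sketch below assumes, purely for illustration, that the worst open-trade trough per contract on a given night is exponentially distributed. The buffer, mean trough, and edge figures are hypothetical and are not taken from our simulations; a different distribution would give different numbers, but the shape is the point.

```python
import math

# All three constants are hypothetical, chosen only to illustrate the shape.
BUFFER = 2000.0            # distance to the trailing line, in dollars
MEAN_TROUGH = 150.0        # mean nightly open-trade trough per contract, in dollars
EDGE_PER_CONTRACT = 40.0   # expected PnL per contract per night, in dollars

def expected_pnl(size: int) -> float:
    """Linear in size: PnL = size x edge-per-contract."""
    return size * EDGE_PER_CONTRACT

def touch_probability(size: int) -> float:
    """P(size * trough >= BUFFER), assuming trough ~ Exponential(mean=MEAN_TROUGH)."""
    return math.exp(-BUFFER / (size * MEAN_TROUGH))

for size in (1, 2, 3, 4):
    print(f"size={size}  pnl={expected_pnl(size):6.1f}  "
          f"p_touch={touch_probability(size):.5f}")
```

Under these assumptions, doubling from 2 to 4 contracts exactly doubles expected PnL while raising the nightly touch probability by more than an order of magnitude.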

The ratchet — why "I'll just size down when I'm losing" doesn't save you

The intuitive defense is dynamic: "I'm only at full size when I have a big cushion; as the account loses, the distance to the line shrinks and I'll automatically be smaller." Half true — and the missing half is what gets people.

The trailing line ratchets up every time you print a new equity high. A working edge makes money, so you print new highs often. Each new high pulls the line up behind you and resets your distance-to-line to its starting buffer, never wider, but now you are trading the larger, profit-grown size. So you do not visit the dangerous state (maximum size against a buffer that never grows) once at the start. You return to it again and again, every time you bank profit. The cushion you earn is spent by the ratchet, which parks you right back at the worst-case exposure.
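
The ratchet is easy to see in a few lines of code. This is a minimal sketch that assumes the line trails the highest equity seen so far by a fixed buffer and never locks; real account rules vary by firm, and the equity path and dollar figures are made up for illustration.

```python
def trailing_line(equity_path, start_equity, buffer):
    """Yield (equity, line, distance) for each step of an equity path.

    The line sits `buffer` below the highest equity seen so far
    and never moves down.
    """
    high = start_equity
    for equity in equity_path:
        high = max(high, equity)
        line = high - buffer
        yield equity, line, equity - line

# Hypothetical equity path: three new highs with pullbacks in between.
path = [50_000, 50_600, 50_200, 51_100, 50_700, 51_900]
rows = list(trailing_line(path, start_equity=50_000, buffer=2_000))
for equity, line, distance in rows:
    print(f"equity={equity}  line={line}  distance={distance}")
```

However much profit is banked, the distance to the line never exceeds the starting buffer, and every new high puts it back at exactly that buffer. If size has grown with the account, each of those resets pairs the same buffer with a larger position.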

The numbers (from our own test accounts)

In our simulations across nine years of index-futures data, the per-night probability of an open-trade trough touching the line — measured at the widest, full-buffer state, which the ratchet keeps revisiting — ran roughly 3–5% per night at an aggressive-but-"reasonable" size. That looks survivable. It is not, because an evaluation is not one night:

4.75% per night compounds to a ~62% chance of busting over 20 trading nights, ~86% over 40.
3.05% per night is still ~46% over 20 nights.
Trimming roughly one contract per leg pulled those same configurations down toward a ~0.5–0.8%/night band, or roughly 10–15% over 20 nights: a 4–6× cut in cumulative eval-death for a one-time, linear haircut on size.
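
The compounding behind those figures can be checked directly. The sketch below assumes each night is an independent draw with the same touch probability, which is a simplification; the function name is ours, for illustration only.

```python
def cumulative_bust(p_night: float, nights: int) -> float:
    """Probability of at least one line touch over `nights` independent nights."""
    return 1.0 - (1.0 - p_night) ** nights

for p in (0.0475, 0.0305, 0.008, 0.005):
    print(f"{p:.2%}/night -> "
          f"20 nights: {cumulative_bust(p, 20):.1%}, "
          f"40 nights: {cumulative_bust(p, 40):.1%}")
```

If bad nights cluster in practice, the independence assumption breaks and the cumulative figures will shift, so treat this as a first-order check rather than a forecast.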

Read that trade again: you give up ~25–50% of position size, which on a daily-compounding edge you barely notice against the monthly target, and in exchange you cut the probability of losing the whole account several-fold. On an account where the defined risk is "time out / lose the fee," that is the most asymmetric trade on the board. And your backtest, scoring only PnL, would have told you to do the exact opposite.

What actually fixes it (and what doesn't)

The reflex is to reach for a tighter stop or a take-profit. Neither addresses this. A take-profit caps your upside on the days the edge works — and our own study found the profit peak arrives late in the hold, so capping early just donates expectancy. A tighter stop changes your per-trade loss, not the joint, overnight, correlated-drawdown exposure that touches a trailing line. The lever for survival is size — specifically, a survival-calibrated size computed from the distance to the line and the worst-case trough per contract, not from how confident you feel.

That is the whole reason a serious automation stack carries a size-down step: before every entry, ask "what is the largest size that still survives the worst night from here?" and never exceed it. The edge per contract is untouched — same entries, same holds, same profit factor — you are only refusing the contracts that convert a bad night into a dead account.
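
A size-down step of this kind can be sketched in a few lines. The function and parameter names are hypothetical, and the worst-case trough per contract is an input you would have to estimate from your own data; this is an illustration of the idea, not our production logic.

```python
import math

def survival_size(distance_to_line: float,
                  worst_trough_per_contract: float,
                  desired_size: int) -> int:
    """Largest whole-contract size whose worst-case trough stays inside the line.

    Touching the line counts as a bust, so the inequality is strict:
    size * worst_trough_per_contract < distance_to_line.
    """
    if worst_trough_per_contract <= 0:
        raise ValueError("worst_trough_per_contract must be positive")
    if distance_to_line <= 0:
        return 0
    # ceil(x) - 1 is the largest integer strictly below x.
    cap = math.ceil(distance_to_line / worst_trough_per_contract) - 1
    return max(0, min(desired_size, cap))

# Hypothetical example: $2,000 to the line, $600 worst trough per contract.
print(survival_size(2000.0, 600.0, desired_size=4))
```

The cap only ever removes contracts. Entries, holds, and exits are left alone, which is why the per-contract edge is unchanged.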

How to make your backtest stop lying about this

Model the line, not just the PnL. Carry the trailing-drawdown rule in the sim and record, per night, how close the open-trade trough came to it — not just the closing equity.
Report a bust rate, not an average. "Average return at size N" is the number that lies. "Probability the account is dead by night 20 at size N" is the number that decides whether you get paid.
Size the whole book jointly. Correlated longs trough on the same night; their risks add. Sizing each symbol as if it were alone is how a "safe" per-symbol book busts the joint line.
Stress the convex region. Sweep size upward and watch where the bust rate turns the corner. That corner — not the top of the PnL curve — is your real position limit.
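
The first, second, and fourth checks above can be combined in one small Monte Carlo sketch. Everything here is an assumption made for illustration: the per-contract nightly PnL and trough distributions, the $2,000 buffer, a fixed size, a single instrument, and a line that trails closing equity only. It does not model a joint book, and none of the numbers come from our simulations.

```python
import random

def make_paths(trials, nights, seed):
    """Hypothetical per-contract nights: (open-trade trough, closing PnL)."""
    rng = random.Random(seed)
    paths = []
    for _ in range(trials):
        path = []
        for _ in range(nights):
            close_pc = rng.gauss(40.0, 250.0)
            # The trough sits at or below both the entry (0) and the close.
            trough_pc = min(0.0, close_pc) - rng.expovariate(1 / 100.0)
            path.append((trough_pc, close_pc))
        paths.append(path)
    return paths

def busted(path, size, buffer):
    """True if any night's open-trade trough touches the trailing line."""
    equity = high = 0.0
    for trough_pc, close_pc in path:
        if equity + size * trough_pc <= high - buffer:
            return True
        equity += size * close_pc
        high = max(high, equity)   # the line ratchets up with each new high
    return False

def bust_rate(paths, size, buffer=2000.0):
    """Fraction of simulated evaluations that are dead by the last night."""
    return sum(busted(p, size, buffer) for p in paths) / len(paths)

# Reuse the same simulated nights for every size so the sweep is comparable.
paths = make_paths(trials=2000, nights=20, seed=7)
rates = {size: bust_rate(paths, size) for size in range(1, 9)}
for size, rate in rates.items():
    print(f"size={size}  bust_rate_by_night_20={rate:.1%}")
```

Plotting `rates` against size next to the linear PnL line shows where the bust rate turns the corner. With real data, the simulated nights would be replaced by recorded per-night troughs and closes from the backtest.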

Bottom line: the backtest answers "how much could I have made?" The account answers "did I survive long enough to get paid?" Those are different questions, and size is the variable that quietly separates them. Pick the size that survives the convex tail — the PnL takes care of itself.


Educational content only; not financial advice and not a solicitation. Figures are drawn from hypothetical simulations and our own internal test/evaluation accounts, are not audited, and do not represent the results of any customer or live trading program. Hypothetical and simulated performance has inherent limitations (including hindsight and the absence of real execution risk) and is not indicative of future results. Trading futures carries substantial risk of loss. Per CFTC Rule 4.41, hypothetical results are not actual performance.