We tested 30 strategies. Two survived.

AlgoProven Research · June 2026 · #5 in a series on why backtests lie · read #4

We spent a month running futures strategies through one unforgiving filter: a locked train/test split (2008–2017 to build, 2018–2026 to judge), real fills, and a hard rule — an edge has to be profitable in both windows. Most of what the internet, the academy, and the prop community sell as "edges" died on contact. Here is the graveyard, the two survivors, and the boringly simple book we built from them.

The graveyard

Nearly everything failed, and it failed the same way: strong in one window, dead or reversed in the other.

| Strategy | Verdict |
| --- | --- |
| Z-score pairs / statistical arbitrage | failed (train PF 0.72) |
| Commodity carry, cross-sectional momentum | weak / failed |
| Calendar effects (turn-of-month, OPEX, Santa) | a measurement artifact |
| Gap fade/follow, opening-range breakout | regime-flip 2018 |
| Overnight-intraday "reversal everywhere" | train Sharpe 2.25 → test −0.38 |
| Intraday momentum (last half-hour) | decayed post-publication |
| Relative-value spreads, futures value | flat / negative out of sample |
| Liquidity sweeps (without order-flow data) | noise on bars |

This is not a failure of effort — it is the post-publication decay the academic literature itself documents. Published edges get arbitraged. A strategy that shows a profit factor of 3 in one decade and 0.7 in the next was never an edge; it was a regime.
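
The both-windows rule can be written down in a few lines. What follows is a minimal sketch, not our production gate: it assumes trades arrive as (exit date, P&L) pairs, uses profit factor above 1 as the meaning of "profitable", and puts the split at 1 January 2018 to match the windows described at the top. The names `profit_factor` and `passes_gate` are illustrative.

```python
from datetime import date


def profit_factor(pnls):
    """Gross profit divided by gross loss for a list of per-trade P&Ls."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    if gross_loss == 0:
        # No losing trades: infinite PF if anything was earned, else zero.
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


def passes_gate(trades, split=date(2018, 1, 1)):
    """trades: list of (exit_date, pnl). True only if PF > 1 in BOTH windows."""
    train = [pnl for d, pnl in trades if d < split]
    test = [pnl for d, pnl in trades if d >= split]
    if not train or not test:
        return False  # assumption: an empty window cannot pass
    return profit_factor(train) > 1.0 and profit_factor(test) > 1.0


# Made-up trades: profit factor 3.0 before the split, 0.7 after it.
regime = [
    (date(2012, 3, 1), 300.0), (date(2013, 5, 1), -100.0),
    (date(2019, 2, 1), 70.0), (date(2020, 6, 1), -100.0),
]
print(passes_gate(regime))
```

The point of the design is that the split date is fixed before any strategy is looked at, so a strong training window can never buy forgiveness for a weak test window.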

Two things lived

| Survivor | Mechanism | Sharpe (test) |
| --- | --- | --- |
| IBS mean-reversion (equity index) | buy the close near the day's low | ~1.5 |
| Vol-managed trend (broad basket) | follow persistent moves, sized by volatility | ~0.65 |

Neither is exotic. Both have an economic story — one bets that weakness reverts, the other that strength persists. And critically, they earn money in opposite conditions.
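
For readers who have not met the two mechanisms, here is a generic sketch of each. IBS is internal bar strength, (close − low) / (high − low), so a value near 0 means the bar closed near its low. Everything else below is an assumption made for illustration and not our actual rule set: the 0.2 entry threshold, the 60-day lookback, the 10% volatility target, the 2x leverage cap, and the function names.

```python
import math
import statistics


def ibs(high, low, close):
    """Internal bar strength: 0.0 = closed at the low, 1.0 = closed at the high."""
    if high == low:
        return 0.5  # assumption: treat a zero-range bar as neutral
    return (close - low) / (high - low)


def ibs_long_signal(high, low, close, threshold=0.2):
    """True when the bar closed near its low (threshold is hypothetical)."""
    return ibs(high, low, close) < threshold


def vol_managed_trend_position(closes, lookback=60, target_vol=0.10,
                               max_leverage=2.0):
    """Signed leverage: direction from the trailing return, size from volatility."""
    window = closes[-(lookback + 1):]
    rets = [b / a - 1.0 for a, b in zip(window, window[1:])]
    if len(rets) < 2:
        return 0.0  # not enough data to estimate volatility
    realized_vol = statistics.stdev(rets) * math.sqrt(252)  # annualized
    if realized_vol == 0:
        return 0.0
    trend = window[-1] / window[0] - 1.0
    direction = 1.0 if trend > 0 else -1.0 if trend < 0 else 0.0
    # Smaller positions when the market is more volatile, capped at max_leverage.
    size = min(target_vol / realized_vol, max_leverage)
    return direction * size


print(ibs_long_signal(high=110.0, low=100.0, close=101.0))
```

One sleeve buys weakness on a single bar, the other follows a move over months and shrinks when volatility rises, which is why the two tend to earn in different conditions.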

The only thing that actually compounded

Here is the lesson that took 30 tests to earn. Adding more correlated edges to a book does nothing: we tried stacking several mean-reversion variants and the combined Sharpe just averaged down. IBS mean-reversion and trend, by contrast, are uncorrelated, with a correlation between their return streams of essentially zero. Combine them and the book Sharpe rises above either one alone, in both windows.

Diversification only works when the mechanisms are genuinely different. Ten flavors of the same long-bias bet are not a portfolio. A mean-reversion sleeve and a trend sleeve, each thin on its own, combine into something steadier than either, because when one is suffering, the other is usually working.
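
The arithmetic behind this is short. If two sleeves are each scaled to the same volatility and held at equal weight, the blend's Sharpe is (S1 + S2) / sqrt(2 · (1 + ρ)), where S1 and S2 are the sleeve Sharpes and ρ is the correlation of their returns. The sketch below encodes that textbook identity; the function name and the example inputs are illustrative, not figures from our book.

```python
import math


def combined_sharpe(s1, s2, rho):
    """Sharpe of an equal-risk, equal-weight blend of two sleeves.

    s1, s2: standalone Sharpe ratios; rho: correlation of their returns.
    """
    if not -1.0 < rho <= 1.0:
        raise ValueError("rho must be in (-1, 1]")
    return (s1 + s2) / math.sqrt(2.0 * (1.0 + rho))


# Two identical sleeves (Sharpe 1.0 each) at different correlations.
for rho in (1.0, 0.5, 0.0):
    print(rho, round(combined_sharpe(1.0, 1.0, rho), 3))
```

At ρ = 1 the blend is just the average of the two Sharpes, which is the "averaged down" result from stacking variants of the same bet. At ρ = 0 the denominator drops to sqrt(2), and the blend beats the stronger sleeve whenever the weaker sleeve's Sharpe is more than about 0.41 times the stronger one's.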

Then simple beat clever — twice

With uncorrelated sleeves in hand, we tried to be sophisticated about combining them. Two results, both humbling:

Fancy weighting didn't help. Sharpe-weighted risk parity, inverse-vol, shrunk Markowitz — all landed within a rounding error of plain equal-risk weighting. When sleeves are uncorrelated and volatility-scaled, equal-risk is already near-optimal, and the clever methods just add estimation error.

Regime-switching actively hurt. We built a "measure the regime, don't predict it" trend-strength signal to tilt between trend and mean-reversion. Static 50/50 beat it out of sample (0.75 vs 0.64). The literature warns that most regime-switching is overfit; our gate confirmed it on the spot.

What we actually built

A diversified futures book that is almost aggressively boring: a handful of uncorrelated, economically grounded sleeves, each volatility-scaled, combined at equal risk, with a single portfolio volatility target. No regime magic, no optimizer, no holy grail. Modest return, controlled drawdown, and, the part that matters, we know exactly why each piece is there and what would have to break for it to stop working.
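
The construction reduces to three steps: scale each sleeve to the same volatility, average them, then rescale the blend to one portfolio target. The sketch below is a simplified illustration under stated assumptions, not our production code. It takes daily sleeve returns as plain lists, uses a hypothetical 10% annual volatility target, and estimates volatility over the full sample for brevity. Full-sample volatility is look-ahead in a real backtest; a live version would use trailing estimates.

```python
import math
import statistics

ANNUALIZATION = math.sqrt(252)


def annualized_vol(returns):
    """Sample standard deviation of daily returns, annualized."""
    return statistics.stdev(returns) * ANNUALIZATION


def equal_risk_book(sleeve_returns, portfolio_vol_target=0.10):
    """sleeve_returns: dict of name -> daily returns (equal lengths).

    Returns (weights, book_returns). CAUTION: uses full-sample volatility,
    which is look-ahead; shown only to illustrate the arithmetic.
    """
    names = list(sleeve_returns)
    if not names:
        raise ValueError("need at least one sleeve")
    n_days = len(sleeve_returns[names[0]])
    vols = {}
    for name in names:
        if len(sleeve_returns[name]) != n_days:
            raise ValueError("sleeves must have equal length")
        vols[name] = annualized_vol(sleeve_returns[name])
        if vols[name] == 0:
            raise ValueError("sleeve has zero volatility: " + name)
    # Steps 1 and 2: scale every sleeve to unit volatility, then average.
    blend = [
        sum(sleeve_returns[n][t] / vols[n] for n in names) / len(names)
        for t in range(n_days)
    ]
    # Step 3: one leverage number brings the blend to the portfolio target.
    leverage = portfolio_vol_target / annualized_vol(blend)
    book = [leverage * r for r in blend]
    weights = {n: leverage / (len(names) * vols[n]) for n in names}
    return weights, book


# Made-up daily returns for two sleeves with very different volatility.
sleeves = {
    "mean_reversion": [0.01, -0.02, 0.015, -0.005, 0.01, 0.0],
    "trend": [0.001, 0.002, -0.003, 0.001, -0.001, 0.002],
}
weights, book = equal_risk_book(sleeves)
print(round(annualized_vol(book), 4))
```

There is nothing here to estimate except volatilities, which is the appeal: no expected returns, no covariance matrix to invert, and so very little room for estimation error.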

Along the way the gate caught our own mistakes too: a calendar "edge" that turned out to be a day-shift bug, and a mean-reversion variant whose per-trade Sharpe of 3.3 collapsed to a real account Sharpe of 0.5. Even an ex-AHL author shipped a book with a broken mean-reversion backtest. Everyone's backtest lies a little. The discipline is the product.
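
One common way a per-trade Sharpe overstates an account Sharpe is to annualize trade returns as if a trade happened every day, while the account actually sits flat most of the time. The sketch below illustrates that mechanism with made-up numbers. It is an assumption about how such a gap can arise in general, not a reconstruction of our bug, and the function names are hypothetical.

```python
import math
import statistics


def per_trade_sharpe(trade_returns, periods_per_year=252):
    """Naive: annualizes as if every trade were one daily observation."""
    mean = statistics.mean(trade_returns)
    return mean / statistics.stdev(trade_returns) * math.sqrt(periods_per_year)


def account_sharpe(daily_returns, periods_per_year=252):
    """Sharpe of the full daily account series, flat days included."""
    mean = statistics.mean(daily_returns)
    return mean / statistics.stdev(daily_returns) * math.sqrt(periods_per_year)


def spread_trades_over_days(trade_returns, n_days):
    """Place one-day trades on evenly spaced days; all other days earn 0."""
    if n_days < len(trade_returns):
        raise ValueError("more trades than days")
    daily = [0.0] * n_days
    step = n_days // len(trade_returns)
    for i, r in enumerate(trade_returns):
        daily[i * step] = r
    return daily


# 24 made-up one-day trades in a 252-day year.
trades = [0.01, -0.005, 0.012, 0.004, -0.006, 0.009] * 4
daily = spread_trades_over_days(trades, 252)
print(per_trade_sharpe(trades) > account_sharpe(daily))
```

The trades and the total P&L are identical in both series. Only the denominator changes, which is why the gate judges the account-level series and not the trade list.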