We tested 30 strategies. Two survived.
We spent a month running futures strategies through one unforgiving filter: a locked train/test split (2008–2017 to build, 2018–2026 to judge), real fills, and a hard rule — an edge has to be profitable in both windows. Most of what the internet, the academy, and the prop community sell as "edges" died on contact. Here is the graveyard, the two survivors, and the boringly simple book we built from them.
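A minimal sketch of what such a gate can look like, assuming "profitable" means a profit factor above 1 in each window. The function names, the `(exit_date, pnl)` trade format, and the `min_pf` parameter are hypothetical, chosen for illustration rather than taken from our actual harness.

```python
from datetime import date

# Split from the text: 2008-2017 builds, 2018 onward judges.
TRAIN_END = date(2017, 12, 31)


def profit_factor(pnls):
    """Gross profit divided by gross loss for a list of trade PnLs."""
    gains = sum(p for p in pnls if p > 0)
    losses = -sum(p for p in pnls if p < 0)
    if losses == 0:
        # No losing trades: infinite PF if anything was won, else 0.
        return float("inf") if gains > 0 else 0.0
    return gains / losses


def passes_gate(trades, min_pf=1.0):
    """trades: iterable of (exit_date, pnl) pairs.

    Returns True only if the profit factor exceeds min_pf in BOTH windows.
    An empty window scores 0.0 and therefore fails.
    """
    trades = list(trades)
    train = [pnl for d, pnl in trades if d <= TRAIN_END]
    test = [pnl for d, pnl in trades if d > TRAIN_END]
    return profit_factor(train) > min_pf and profit_factor(test) > min_pf
```

The point of the design is that the split date is a constant, not a parameter: once it can be tuned, the test window stops being out of sample.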
The graveyard
Nearly everything failed, and it failed the same way: strong in one window, dead or reversed in the other.
| Strategy | Verdict |
|---|---|
| Z-score pairs / statistical arbitrage | failed (train PF 0.72) |
| Commodity carry, cross-sectional momentum | weak / failed |
| Calendar effects (turn-of-month, OPEX, Santa) | a measurement artifact |
| Gap fade/follow, opening-range breakout | regime flip in 2018 |
| Overnight-intraday "reversal everywhere" | train Sharpe 2.25 → test −0.38 |
| Intraday momentum (last half-hour) | decayed post-publication |
| Relative-value spreads, futures value | flat / negative out of sample |
| Liquidity sweeps (without order-flow data) | noise on bars |
This is not a failure of effort; it is the post-publication decay the academic literature itself documents. Published edges get arbitraged away. A strategy that shows a profit factor of 3 in one decade and 0.7 in the next was never an edge; it was a regime.
Two things lived
| Survivor | Mechanism | Sharpe (test) |
|---|---|---|
| IBS (internal bar strength) mean-reversion (equity index) | buy the close near the day's low | ~1.5 |
| Vol-managed trend (broad basket) | follow persistent moves, sized by volatility | ~0.65 |
Neither is exotic. Both have an economic story — one bets that weakness reverts, the other that strength persists. And critically, they earn money in opposite conditions.
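For readers who have not met it, IBS (internal bar strength) measures where the close sits inside the day's range: 0 at the low, 1 at the high. The sketch below computes it and flags a "close near the low" bar. The 0.2 threshold and the handling of zero-range bars are assumptions for illustration, not the parameters of the sleeve described here.

```python
def ibs(high, low, close):
    """Internal bar strength: (close - low) / (high - low), in [0, 1]."""
    if high == low:
        # Assumption: treat a zero-range bar as neutral rather than dividing by zero.
        return 0.5
    return (close - low) / (high - low)


def ibs_long_signal(high, low, close, threshold=0.2):
    """True when the close is in the bottom part of the day's range.

    threshold is a hypothetical cutoff; lower values trade less often.
    """
    return ibs(high, low, close) < threshold
```

A bar with high 110, low 100, and close 101 has an IBS of 0.1 and would trigger the signal; the same bar closing at 109 has an IBS of 0.9 and would not.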
The only thing that actually compounded
Here is the lesson that took 30 tests to earn. Adding more correlated edges to a book does nothing: we tried stacking several mean-reversion variants and the combined Sharpe just averaged down. But the IBS mean-reversion and trend return streams have a correlation of essentially zero. Combine them and the book Sharpe rises above either one alone, in both windows.
Diversification only works when the mechanisms are genuinely different. Ten flavors of the same long-bias bet are not a portfolio. A mean-reversion sleeve and a trend sleeve, each thin on its own, combine into something steadier than either, because when one is suffering, the other is usually working.
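The arithmetic behind this is short. For two sleeves scaled to the same volatility and held at equal weight, the combined Sharpe is (S1 + S2) / sqrt(2 · (1 + ρ)), where ρ is the correlation between their returns. The sketch below evaluates it; the input Sharpes of 0.8 and 0.6 are hypothetical round numbers, not results from our tests.

```python
import math


def combined_sharpe(s1, s2, rho):
    """Sharpe of an equal-weight mix of two equal-volatility sleeves.

    s1, s2: standalone Sharpe ratios; rho: correlation of their returns.
    """
    if rho <= -1.0:
        raise ValueError("rho must be greater than -1")
    return (s1 + s2) / math.sqrt(2.0 * (1.0 + rho))


# Hypothetical sleeves with Sharpe 0.8 and 0.6.
uncorrelated = combined_sharpe(0.8, 0.6, rho=0.0)  # about 0.99, above both
correlated = combined_sharpe(0.8, 0.6, rho=0.9)    # about 0.72, below the better sleeve
```

At zero correlation the mix beats the stronger sleeve; at a correlation of 0.9 it lands between the two, which is the "averaged down" result from stacking variants of the same bet.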
Then simple beat clever — twice
With uncorrelated sleeves in hand, we tried to be sophisticated about combining them. Two results, both humbling:
Fancy weighting didn't help. Sharpe-weighted risk parity, inverse-vol, shrunk Markowitz — all landed within a rounding error of plain equal-risk weighting. When sleeves are uncorrelated and volatility-scaled, equal-risk is already near-optimal, and the clever methods just add estimation error.
Regime-switching actively hurt. We built a "measure the regime, don't predict it" trend-strength signal to tilt between trend and mean-reversion. Static 50/50 beat it out of sample (0.75 vs 0.64). The literature warns that most regime-switching is overfit; our gate confirmed it on the spot.
What we actually built
A diversified futures book that is almost aggressively boring: a handful of uncorrelated, economically-grounded sleeves, each volatility-scaled, combined at equal risk, with a single portfolio volatility target. No regime magic, no optimizer, no holy grail. Modest return, controlled drawdown, and — the part that matters — we know exactly why each piece is there and what would have to break for it to stop working.
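The construction is simple enough to sketch in a few lines. The version below scales each sleeve to a common volatility, averages them, and rescales the result to a portfolio target. The function names and the 10% targets are hypothetical, and it uses full-sample volatility for brevity, which looks ahead; a live version would presumably use a trailing estimate instead.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: daily returns, annualized with 252 days


def annualized_vol(returns):
    """Sample standard deviation of daily returns, annualized."""
    return statistics.stdev(returns) * math.sqrt(TRADING_DAYS)


def scale_to_vol(returns, target_vol):
    """Rescale a return stream so its realized annualized vol equals target_vol.

    Uses the full sample (look-ahead); shown for illustration only.
    """
    k = target_vol / annualized_vol(returns)
    return [k * r for r in returns]


def equal_risk_book(sleeves, sleeve_vol=0.10, book_vol=0.10):
    """sleeves: list of equal-length daily return lists, one per sleeve."""
    scaled = [scale_to_vol(s, sleeve_vol) for s in sleeves]
    n = len(scaled)
    # Equal weight on equal-volatility sleeves gives each the same standalone risk.
    book = [sum(day) / n for day in zip(*scaled)]
    # Diversification pulls book vol below sleeve_vol, so rescale once at the top.
    return scale_to_vol(book, book_vol)
```

There is nothing to estimate here beyond volatility, which is the appeal: no expected returns, no covariance matrix, no optimizer.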
Along the way the gate caught our own mistakes too: a calendar "edge" that was a day-shift bug, a mean-reversion variant whose per-trade Sharpe of 3.3 collapsed to a real account Sharpe of 0.5. Even an ex-AHL author shipped a book with a broken mean-reversion backtest. Everyone's backtest lies a little. The discipline is the product.
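One plausible way a per-trade Sharpe can overstate an account Sharpe, offered as an illustration rather than a diagnosis of the specific case above: annualizing trade returns as if there were a trade every day, while the account actually sits flat most days. The sketch below compares the two on hypothetical data; the trade list and helper names are made up.

```python
import math
import statistics


def sharpe(returns, periods_per_year=252):
    """Mean over sample standard deviation, annualized by sqrt(periods_per_year)."""
    return statistics.mean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


def account_daily_returns(trades, n_days):
    """trades: list of (day_index, return). Days with no trade earn 0."""
    daily = [0.0] * n_days
    for day, r in trades:
        daily[day] += r
    return daily


# Hypothetical: five one-day trades spread over fifty trading days.
trades = [(0, 0.010), (10, 0.012), (20, -0.004), (30, 0.011), (40, 0.009)]

per_trade = sharpe([r for _, r in trades])            # treats every period as a trade
account = sharpe(account_daily_returns(trades, 50))   # includes the flat days
```

On this data the per-trade figure comes out several times larger than the account figure, even though both describe the same five trades.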