ML Strategy: how it works
An XGBoost ensemble trained on 78 features (options flow, technicals, volatility, regime state, sentiment) that learns to rank stocks by expected forward 10-day return. It trades in parallel with the Composite Strategy so we can measure whether ML actually outperforms the rule-based signal.
The core idea
Each day the model looks at every ticker in the universe and predicts its 10-day forward return minus that day's cross-sectional median. That's the "is this stock going to beat its peers?" target — exactly the signal a long-short portfolio consumes.
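As a minimal sketch of that target, assuming daily closes are available as one aligned list per ticker (the input layout and the function name are illustrative, not taken from the project code):

```python
from statistics import median


def relative_forward_returns(closes, horizon=10):
    """Label each (day, ticker) with forward return minus that day's median.

    closes: dict mapping ticker -> list of daily closes, all aligned to the
            same dates and of equal length (an assumed input layout).
    Returns a dict mapping day index -> {ticker: relative forward return}.
    """
    n_days = len(next(iter(closes.values())))
    targets = {}
    # The last `horizon` days have no forward return yet, so they get no label.
    for t in range(n_days - horizon):
        forward = {tk: px[t + horizon] / px[t] - 1.0 for tk, px in closes.items()}
        cross_sectional_median = median(forward.values())
        targets[t] = {tk: r - cross_sectional_median for tk, r in forward.items()}
    return targets
```

A positive label means the ticker beat the median stock over the next `horizon` days, regardless of whether the market as a whole went up or down.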
We train the model in a strict walk-forward fashion: on any given day, the model has only seen data from earlier dates. This prevents look-ahead bias. The model retrains every 3 months on a rolling 24-month window.
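The retraining schedule can be sketched as a list of date windows. This is an illustration only: the helper names and the half-open interval convention (start inclusive, end exclusive) are assumptions, and the window lengths are the 24-month and 3-month figures quoted above.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the month's length."""
    idx = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(idx, 12)
    month = month0 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def walk_forward_windows(first_test_start, n_windows, train_months=24, test_months=3):
    """Build rolling train/test windows; every train window ends where its test begins."""
    windows = []
    for i in range(n_windows):
        test_start = add_months(first_test_start, i * test_months)
        test_end = add_months(test_start, test_months)       # exclusive
        train_start = add_months(test_start, -train_months)  # rolling, not expanding
        windows.append({
            "train": (train_start, test_start),  # [train_start, test_start)
            "test": (test_start, test_end),      # [test_start, test_end)
        })
    return windows


windows = walk_forward_windows(date(2024, 4, 21), n_windows=8)
```

Because each training window stops at the test start, no test-period rows reach the model. Whether the original pipeline also trims training rows whose 10-day label window reaches into the test period is not stated here; this sketch does not.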
After ranking every ticker, the top 10 become long positions and the bottom 10 short positions — same portfolio construction as Composite, different signal source.
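The selection step might look like the sketch below. The function name is hypothetical, and position sizing is left out because this section does not describe how the positions are weighted.

```python
def build_long_short(scores, n_per_side=10):
    """Pick the highest-scored tickers as longs and the lowest as shorts.

    scores: dict mapping ticker -> model score (higher = expected to outperform).
    """
    if len(scores) < 2 * n_per_side:
        raise ValueError("universe too small for non-overlapping long and short books")
    ranked = sorted(scores, key=scores.get, reverse=True)  # best first
    longs = ranked[:n_per_side]
    shorts = ranked[-n_per_side:]
    return longs, shorts
```

The size check guards against a ticker landing in both books when the universe is small.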
What makes this different from Composite
- 78 features vs 5. ML sees every dimension of options flow plus technicals (RSI, Bollinger, ATR), volatility context, regime state, and analyst/congress sentiment.
- 5-day hold vs 10-day. Tested hold periods of 3-20 days; 5 was the empirical sweet spot for the ML signal.
- Continuous score vs rule-based rank. Composite averages 5 percentile ranks; ML outputs a single learned score per ticker.
- 5-seed ensemble. Five XGBoost models with different random seeds get averaged. This reduces seed-specific noise and in tests cut max drawdown from -15% (single seed) to -6% (ensemble).
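The seed-averaging idea in the last bullet can be sketched without any ML library. Here `train_fn` is a hypothetical stand-in for "fit one XGBoost model with this random seed and return its scoring function"; the toy version below only mimics seed-specific noise and is not the project's training code.

```python
import random
from statistics import mean


def ensemble_scores(train_fn, features, seeds=(0, 1, 2, 3, 4)):
    """Average per-ticker scores over models that differ only in their seed.

    train_fn: callable seed -> scoring function (assumed interface).
    features: dict mapping ticker -> feature vector for today.
    """
    models = [train_fn(seed) for seed in seeds]
    return {tk: mean(model(x) for model in models) for tk, x in features.items()}


def toy_train_fn(seed):
    """Toy stand-in: a fixed signal plus a seed-dependent offset."""
    offset = random.Random(seed).gauss(0.0, 0.1)
    return lambda x: sum(x) + offset


scores = ensemble_scores(toy_train_fn, {"AAA": [0.2, 0.1], "BBB": [-0.3, 0.0]})
```

Averaging leaves the shared signal intact while the seed-specific offsets partly cancel, which is the mechanism behind the lower seed-to-seed noise described above.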
Feature set (78 total)
The model is fed features across 6 families:
- Options flow (27 features) — deep-OTM call volume z-scores, PCR ranks, moneyness skew, block trade share
- Combo features (14) — engineered products like "unusual calls × breakout" or "deep OTM × coiled setup"
- Technicals (18) — RSI, Bollinger position, ATR regime, SMA distance, breakout flags, ADX
- Regime / macro (7) — VIX level, beta to SPY, bull/bear flags, BTC momentum, copper returns
- Sentiment / fundamentals (8) — analyst buy ratios, congress trading, buyback activity, earnings beat streaks
- Volatility (4) — 0DTE, 1-week, 1-month implied-vol
Ablation tests showed options flow is the biggest driver — dropping it costs more Sharpe than any other family.
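A drop-one-family ablation can be sketched as follows. The family sizes come from the list above; the dictionary keys and the `evaluate` callable (which would retrain and return an out-of-sample Sharpe for a given set of families) are hypothetical placeholders.

```python
# Family sizes as listed in this section; the key names are illustrative.
FAMILY_SIZES = {
    "options_flow": 27,
    "combo": 14,
    "technicals": 18,
    "regime_macro": 7,
    "sentiment_fundamentals": 8,
    "volatility": 4,
}
assert sum(FAMILY_SIZES.values()) == 78  # matches the stated feature total


def ablate_families(evaluate, families):
    """Return (family, Sharpe lost when dropped), largest loss first.

    evaluate: callable taking a set of family names and returning a Sharpe
              ratio (assumed interface; in practice this means a full retrain).
    """
    baseline = evaluate(set(families))
    drops = {f: baseline - evaluate(set(families) - {f}) for f in families}
    return sorted(drops.items(), key=lambda kv: kv[1], reverse=True)
```

The family at the top of the returned list is the one the model depends on most under this measure.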
Validation
Before declaring the model "worth running", it had to pass:
- Walk-forward validation — 8 rolling 3-month OOS windows from 2024-04-21 to 2026-04-20, never training on data after the test date.
- Seed robustness — 5 random-seed models, mean Sharpe +2.78 with std 0.25 (tight distribution).
- Transaction cost stress — Sharpe +2.81 at 25bps slippage, +1.78 at 50bps. Edge survives realistic execution costs.
- Factor decomposition — residual alpha of +91.7% annualized (t-stat +4.40) after controlling for market, size, and sector factors. R² of 0.013 means 98.7% of variance is idiosyncratic — not factor beta in disguise.
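To make the factor-decomposition check concrete, here is a dependency-free sketch of the regression behind it: strategy returns are regressed on factor returns, the intercept is the residual alpha, and 1 - R² is the share of variance the factors do not explain. How the market, size, and sector factor series are built is not described in this section, so they are treated as given inputs, and the function name is illustrative.

```python
def ols_with_intercept(y, factor_rows):
    """Ordinary least squares of y on factors plus a constant.

    y: list of strategy returns.
    factor_rows: list of rows, one per period, each a list of factor returns.
    Returns (coefficients, r_squared); coefficients[0] is the intercept (alpha).
    """
    rows = [[1.0] + list(r) for r in factor_rows]
    k = len(rows[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot
```

With an R² of 0.013, `1 - r_squared` gives the 98.7% idiosyncratic share quoted in the bullet above.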
Backtest headlines (OOS 2024-04-21 → 2026-04-20)
- Sharpe ratio: +3.48
- Annualized return: +97.8%
- Max drawdown: -6.3%
- Win rate: 52.3% of closed trades
- Top-decile vs bottom-decile spread: ~62% annualized long-short alpha
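For reference, the headline metrics can be computed as in the sketch below. It assumes daily returns, 252 trading days per year, a zero risk-free rate, and a win defined as a closed trade with positive P&L; the backtest code may use different conventions.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate (assumption)."""
    return mean(daily_returns) / stdev(daily_returns) * sqrt(periods_per_year)


def max_drawdown(daily_returns):
    """Worst peak-to-trough decline of the compounded equity curve (<= 0)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = min(worst, equity / peak - 1.0)
    return worst


def win_rate(trade_pnls):
    """Fraction of closed trades with strictly positive P&L."""
    return sum(p > 0 for p in trade_pnls) / len(trade_pnls)
```

A win rate near 50% alongside a high Sharpe ratio is not contradictory: it implies the average winning trade is larger than the average losing one.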
Caveat: the OOS result is dominated by 2024. The regime filter was active on 90% of days in 2024 vs 2% in 2025, so most of the trading happened in a single calm year. Real-world performance depends on how often VIX stays below 20.
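The regime filter named in the disclaimers (VIX<20, SPY>200-day-SMA) can be sketched as a single check per day. The function and parameter names are illustrative, and treating insufficient price history as "inactive" is an assumption.

```python
def regime_active(vix_close, spy_closes, vix_max=20.0, sma_window=200):
    """Return True when VIX is below vix_max and SPY is above its simple moving average.

    spy_closes: daily SPY closes, oldest first, ending with today's close.
    """
    if len(spy_closes) < sma_window:
        return False  # not enough history to form the SMA (assumed behaviour)
    sma = sum(spy_closes[-sma_window:]) / sma_window
    return vix_close < vix_max and spy_closes[-1] > sma
```

Counting the days on which this returns True within each calendar year gives the kind of activity share quoted in the caveat.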
What this is NOT
- Not trading real money. The ML book is a pure $100k simulation — no Tradier orders are submitted. We're collecting shadow-mode data to see if the backtest result holds in live market conditions.
- Not a black box. The full feature list, training methodology, and backtest code are in the project repository. We can explain every pick.
- Not guaranteed to generalize. The OOS window leans on 2024's calm market. The strategy still needs to survive regime changes it has not seen in training.
The research journey
The current ML config came out of 11 rounds of iterative experiments:
- v1 with 5 features → failed (Sharpe 0.19 OOS, pure noise)
- v2 with 40 features + walk-forward → Sharpe 2.40, but -15% drawdown
- v3 testing 78 features + relative-return target → Sharpe 2.47, -8.8% drawdown
- v4 testing hold periods → found 5-day beats 10-day by +0.82 Sharpe
- v5 matrix of all combinations → Sharpe 3.48 with -6.3% drawdown (champion)
- v6 verification (seed variance, cost, sub-periods) → robust
- v7 stop-loss / multi-hold / confidence sizing → no improvement
- v8 LightGBM ensemble → LGBM actually hurt
- v9 both-agree, earnings filter, stacking → all failed
- v10 disagreement, vol-timing, ablation, ticker-split → all failed
- v11 drop-combos + ticker-ensemble → combos matter, ticker-split had leakage
We tested ~60 variants. The v5 champion survived them all. Further improvements likely require new data sources (larger universe, intraday features, short interest) rather than more algorithmic tweaks.
Important disclaimers
- Not financial advice. This is a free research dashboard.
- Shadow mode only — no trades executed. The equity curve is a simulation starting at $100k.
- The regime filter (VIX below 20 and SPY above its 200-day SMA) keeps the strategy dormant through most of any volatile regime. Expect long flat stretches.
- Past performance does not guarantee future results.