Building a Bayesian Blender: Early Results, Upgrades, and Where This Could Go
Part 12 covers my early attempts at building a Bayesian blender.
This is part 12 of my series — Building & Scaling an Algorithmic Trading Platform.
The Case for Bayes
I’ve been slowly building out the allocator (the long–short engine), the volatility sleeve (my convexity sidecar), and a basic volatility-regime framework around it. But there’s still a missing piece: How do I intelligently blend multiple sleeves without hardcoding rules?
Right now I use a pretty simple blend:
allocator as the core
vol sleeve small and stable
maybe a hedge someday
But what I really want — eventually — is something that can answer:
“Which sleeve deserves more weight today?”
“Is the allocator behaving like it normally does?”
“Is the vol sleeve about to shine or about to bleed?”
“Should I hold more cash?”
So I started hacking together a Bayesian blender.
This post is about the first results, the upgrades, and where I think this thing can actually go.
1. The First Version: Simple Bayesian Weights
The idea was straightforward:
Each day, estimate the probability that the dual sleeve (the allocator, labeled “Dual” in the results below) or the volatility sleeve will outperform (or at least produce positive returns), and then weight them accordingly.
Nothing fancy — simple Gaussian Naive Bayes with a few features, a small warmup window and a daily walk-forward.
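To make the mechanics concrete, here is a minimal sketch of a daily walk-forward Gaussian Naive Bayes blender. It is an illustration under assumptions, not the actual implementation: the features, the "positive return" labels, the 60-day warmup, and the rule of normalising the two probabilities into weights are all placeholders, and the function names are hypothetical.

```python
import math
import random


def gaussian_log_pdf(x, mean, var):
    """Log density of a normal distribution at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def nb_prob_positive(train_X, train_y, x):
    """P(y = 1 | x) from a two-class Gaussian Naive Bayes fit on the training rows."""
    log_post = {}
    for label in (0, 1):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        if not rows:
            return 0.5  # one class has no examples yet: stay neutral
        log_p = math.log(len(rows) / len(train_X))  # class prior
        for j in range(len(x)):
            col = [r[j] for r in rows]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            log_p += gaussian_log_pdf(x[j], mean, var)
        log_post[label] = log_p
    # Normalise in log space for numerical stability.
    m = max(log_post.values())
    e0 = math.exp(log_post[0] - m)
    e1 = math.exp(log_post[1] - m)
    return e1 / (e0 + e1)


def walk_forward_weights(features, dual_up, vol_up, warmup=60):
    """Daily (w_dual, w_vol). features[t] must be known before day t trades;
    dual_up[t] / vol_up[t] are 1 if that sleeve's return on day t was positive."""
    weights = []
    for t in range(warmup, len(features)):
        # Train only on days strictly before t, so there is no look-ahead.
        p_dual = nb_prob_positive(features[:t], dual_up[:t], features[t])
        p_vol = nb_prob_positive(features[:t], vol_up[:t], features[t])
        total = p_dual + p_vol
        w_dual = p_dual / total if total > 0 else 0.5
        weights.append((w_dual, 1.0 - w_dual))
    return weights


# Synthetic demo data (assumption: two generic features, noisy labels).
random.seed(0)
n = 200
features = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
dual_up = [1 if f[0] + random.gauss(0, 1) > 0 else 0 for f in features]
vol_up = [1 if f[1] + random.gauss(0, 1) > 0 else 0 for f in features]
weights = walk_forward_weights(features, dual_up, vol_up)
print(len(weights), weights[0])
```

The key design point is the slicing: every prediction for day t is fit on rows before t, which is what makes the run a walk-forward test rather than an in-sample fit.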
And surprisingly, it worked better than expected.
Initial 1,028-day run:
Dual: ROI 241.08%, CAGR 35.09%, Sharpe 1.41, MaxDD –19.47%
Vol: ROI 839.43%, CAGR 73.17%, Sharpe 0.75, MaxDD –28.04%
Bayes Combo: ROI 669.23%, CAGR 64.89%, Sharpe 0.93, MaxDD –21.41%
Avg Weights: dual 59.21%, vol 40.79%
It didn’t beat the vol sleeve on raw return (669.23% vs. 839.43% ROI), but it produced a smoother curve: Sharpe 0.93 vs. 0.75 and MaxDD –21.41% vs. –28.04%. And the weights kind of made sense: generally preferring the allocator, with bursts of vol sleeve exposure.
That alone was a good sign.
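For anyone checking the numbers, the summary statistics quoted in these runs (ROI, CAGR, Sharpe, MaxDD) can be computed from a daily return series roughly as follows. This is a minimal sketch, not the platform's reporting code: it assumes 252 trading days per year, a zero risk-free rate for Sharpe, and simple daily compounding. `performance_stats` and `cagr_from_roi` are names made up for illustration.

```python
import math


def performance_stats(daily_returns, periods_per_year=252):
    """ROI, CAGR, annualised Sharpe (zero risk-free rate) and max drawdown.

    Expects at least two daily returns expressed as fractions (0.01 = 1%).
    """
    n = len(daily_returns)
    equity, peak, max_dd = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        max_dd = min(max_dd, equity / peak - 1.0)  # most negative peak-to-trough move
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    if var > 0:
        sharpe = mean / math.sqrt(var) * math.sqrt(periods_per_year)
    else:
        sharpe = float("nan")
    return {
        "roi": equity - 1.0,
        "cagr": equity ** (periods_per_year / n) - 1.0,
        "sharpe": sharpe,
        "max_dd": max_dd,
    }


def cagr_from_roi(roi, n_days, periods_per_year=252):
    """Annualise a total return earned over n_days trading days."""
    return (1.0 + roi) ** (periods_per_year / n_days) - 1.0


print(performance_stats([0.01, -0.02, 0.015, 0.003]))
print(round(cagr_from_roi(2.4108, 1028), 4))
```

Under the 252-day assumption, `cagr_from_roi(2.4108, 1028)` comes out at about 0.351, in line with the 35.09% CAGR reported for the dual sleeve, which suggests the 1,028 days are trading days.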
2. Then I Ran It Over a Full 10-Year Window
Same idea, but with more data and a slightly longer horizon.
10-year run (1,228 days of evaluation):
Dual: ROI 292.70%, CAGR 32.41%, Sharpe 1.35
Vol: ROI 915.72%, CAGR 60.92%, Sharpe 0.70
Bayes Combo: ROI 932.95%, CAGR 61.47%, Sharpe 0.95
Avg Weights: dual 60.14%, vol 39.86%
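Again, the combo was better-behaved than vol-only (Sharpe 0.95 vs. 0.70) and still more aggressive than the allocator alone (CAGR 61.47% vs. 32.41%). This time it also edged out the vol sleeve on ROI (932.95% vs. 915.72%).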
The model had real signal — so I kept pushing.
3. Upgrading the Blender (Logistic Regression + Cash Gate)
Next experiment:
Swap out Naive Bayes for logistic regression, give it better features, and introduce a cash gate so the model can say “you know what, I don’t like either sleeve today.”
This version was the first one that felt like a real step toward a flexible sleeve allocator — not just a toy.
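As a rough illustration of the model swap, the sketch below fits a small L2-regularised logistic regression by gradient descent and refits it periodically in a walk-forward loop. Everything here is an assumption for illustration: the learning rate, penalty, refit schedule, and synthetic data are placeholders, and in practice a library implementation would likely replace the hand-rolled fit. The cash gate is a separate decision step layered on top of these probabilities and is not shown here.

```python
import math
import random


def sigmoid(z):
    # Clamp to avoid overflow in exp for extreme inputs.
    z = max(-35.0, min(35.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, l2=1.0, lr=0.1, epochs=200):
    """Batch gradient descent on L2-penalised log-loss. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - target
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        for j in range(d):
            w[j] -= lr * (grad_w[j] + l2 * w[j]) / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """P(y = 1 | x) under the fitted model."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def walk_forward_probs(X, y, warmup=250, refit_every=20):
    """Out-of-sample probabilities; the model only ever sees rows before day t."""
    probs, model = [], None
    for t in range(warmup, len(X)):
        if (t - warmup) % refit_every == 0:
            model = fit_logistic(X[:t], y[:t])
        probs.append(predict_proba(model, X[t]))
    return probs


# Synthetic demo: the first feature carries signal, the second is noise.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(320)]
y = [1 if row[0] + random.gauss(0, 0.5) > 0 else 0 for row in X]
probs = walk_forward_probs(X, y)
print(len(probs), round(probs[0], 3))
```

One such model per sleeve (one predicting the dual sleeve, one predicting the vol sleeve) would give the two probabilities that the weighting step consumes.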
Full 10-year run (logistic model with cash option):
Dual: ROI 292.70%, CAGR 32.41%
Vol: ROI 915.72%, CAGR 60.92%
Combo (logreg): ROI 1,731.41%, CAGR 81.61%, Sharpe 1.13, MaxDD –20.77%
Avg Weights: dual 46.05%, vol 35.14%, cash ~18.8%
This is a huge jump in ROI (1,731.41% vs. 932.95% for the Naive Bayes combo over the same window), and the weights suggest it isn’t just an illusion. The model really did push into the vol sleeve at the right times, lean into the allocator when conditions stabilized, and go into cash during the messy middle.
Is it too good to be true? Maybe. It definitely needs more stress tests and more robust testing criteria.
But the behavior makes sense based on the weights and the market periods it responded to.
4. A Shorter Sample Test (1,028 days) With the Same Setup
To check that I wasn’t just overfitting the 10-year window, I ran a shorter one:
1,028-day logreg run (no cash gate, warmup 250 days):
Combo ROI: 1,160.43%
CAGR: 86.11%
Sharpe: 1.11
MaxDD: –20.77%
This is still strong, still in the realm of “believable / feasible” and definitely still worth pursuing.
5. What I Added (and What’s Next)
I implemented three things from the upgrade list and they made a noticeable difference:
1. Logistic model with better feature engineering
realized vol changes
rolling drawdowns
cross-autocorrelation
lagged dual vs lagged vol behavior
vol z-score × trend interactions
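A hedged sketch of how features like these could be built from the two sleeves' daily return series. The exact definitions are assumptions (the 20-day window, how the z-score and trend are measured, the feature names), and `build_features` is a hypothetical helper. The one property that matters is that features for day t use only returns from before day t.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _std(xs):
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def _corr(a, b):
    ma, mb = _mean(a), _mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def _drawdown(returns):
    """Current drawdown (<= 0) of the equity curve built from `returns`."""
    equity, peak = 1.0, 1.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
    return equity / peak - 1.0


def build_features(dual, vol, t, window=20):
    """Features for day t built only from returns on days before t."""
    if t < 2 * window:
        raise ValueError("need at least 2 * window days of history")
    d_now, v_now = dual[t - window:t], vol[t - window:t]
    d_prev = dual[t - 2 * window:t - window]
    rv_now, rv_prev = _std(d_now), _std(d_prev)

    # Rolling realised vol ending on each of the last `window` days, for a z-score.
    rv_hist = [_std(dual[s - window:s]) for s in range(t - window + 1, t + 1)]
    rv_sd = _std(rv_hist)
    rv_z = (rv_now - _mean(rv_hist)) / rv_sd if rv_sd > 0 else 0.0

    return {
        "rvol_change": rv_now / rv_prev - 1.0 if rv_prev > 0 else 0.0,
        "dual_dd": _drawdown(d_now),
        "vol_dd": _drawdown(v_now),
        # Each day's vol-sleeve return paired with the dual return one day earlier.
        "cross_autocorr": _corr(dual[t - window - 1:t - 1], vol[t - window:t]),
        "dual_lag1": dual[t - 1],
        "vol_lag1": vol[t - 1],
        "volz_x_trend": rv_z * _mean(d_now),  # vol z-score times recent trend
    }
```

Calling `build_features(dual, vol, t)` for each day in the walk-forward loop yields one feature row per day; changing any return on or after day t leaves that row untouched, which is a cheap look-ahead check worth keeping as a test.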
2. Cash gate
If both sleeves look bad → probabilities low → weight on cash.
This alone fixed a ton of whipsaw.
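One way a gate like this could work, sketched under assumptions: each sleeve only earns weight for the part of its probability above a threshold, and whatever conviction is missing goes to cash. The 0.5 threshold and the scaling rule are illustrative choices, not the tuned values, and `gated_weights` is a hypothetical name.

```python
def gated_weights(p_dual, p_vol, gate=0.5):
    """Map two sleeve probabilities to (w_dual, w_vol, w_cash). Requires gate < 1."""
    # Conviction above the gate; a sleeve at or below the gate gets nothing.
    edge_dual = max(0.0, p_dual - gate)
    edge_vol = max(0.0, p_vol - gate)
    total_edge = edge_dual + edge_vol
    if total_edge == 0.0:
        return 0.0, 0.0, 1.0  # both sleeves look bad: sit in cash
    # Invested fraction scales with the stronger sleeve's conviction.
    invested = min(1.0, max(edge_dual, edge_vol) / (1.0 - gate))
    w_dual = invested * edge_dual / total_edge
    w_vol = invested * edge_vol / total_edge
    return w_dual, w_vol, 1.0 - invested


print(gated_weights(0.40, 0.30))  # both below the gate
print(gated_weights(0.75, 0.50))  # only the dual sleeve clears the gate
print(gated_weights(0.90, 0.70))  # both clear it, dual more strongly
```

Because weak signals map to cash instead of a forced 50/50 split, small day-to-day wobbles in the probabilities near the threshold move less capital, which is one plausible reason a gate would cut whipsaw.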
3. Weight smoothing + caps
no insane swings
slower transitions
cleaner equity curve
This is why the model feels less like a manic DJ and more like a real allocator.
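A minimal sketch of what smoothing plus caps could look like, assuming an exponential-moving-average step toward the target weights, a limit on how far any weight may move per day, and a per-sleeve ceiling. The parameter values (`alpha`, `max_weight`, `max_step`) and the function name are placeholders, not the settings used in the runs above.

```python
def smooth_and_cap(target, prev, alpha=0.2, max_weight=0.7, max_step=0.1):
    """Move from `prev` toward `target` weights, slowly and within limits.

    `target` and `prev` are dicts like {"dual": 0.6, "vol": 0.4}; any weight
    not allocated to a sleeve is implicitly cash.
    """
    new = {}
    for name, prev_w in prev.items():
        proposed = prev_w + alpha * (target[name] - prev_w)      # EMA step
        step = max(-max_step, min(max_step, proposed - prev_w))  # limit daily change
        new[name] = max(0.0, min(max_weight, prev_w + step))     # per-sleeve cap
    total = sum(new.values())
    if total > 1.0:  # never exceed full investment
        new = {k: v / total for k, v in new.items()}
    return new


# The raw model flips hard toward the dual sleeve; the smoothed weights follow gradually.
weights = {"dual": 0.5, "vol": 0.5}
for day in range(5):
    weights = smooth_and_cap({"dual": 1.0, "vol": 0.0}, weights)
    print(day, {k: round(v, 3) for k, v in weights.items()})
```

In the loop, the dual weight climbs a little each day and stops at the 0.7 ceiling no matter how confident the raw signal is, which is the "no insane swings" behaviour in miniature.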
6. What’s Still on the List (The “Next Version” Ideas)
Here’s what I want to try next, once I’ve validated the logistic model:
Model upgrades
Bayesian/logistic regression with priors
probability calibration (Platt scaling / isotonic regression)
forgetting factors (recent data matters more)
hierarchical model that includes a crude regime indicator
Feature expansion
breadth / stress proxies
skew metrics
exponential drawdown windows
more lag interactions
Decision logic improvements
expected return × probability (not just probability)
smoother weight changes
better cash gating
tail-aware weighting (more vol sleeve in stress)
Validation
multi-horizon walk-forward
cost/slippage modeling
compare against static blends
Implementation tweaks
horizon flags
weight caps
smoothing windows
posterior dump (to analyze what the model “thinks”)
NB baseline for sanity checks
All of these should make the blender more stable without making it a black box, which I really want to avoid.
(As an aside, the last thing I want is a model that “works” when I can’t tease apart its weights or explain why it works.)
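To give one of these ideas some shape, here is a small sketch of forgetting factors: exponentially decaying sample weights so that recent days count more when the model is refit. The 60-day half-life is an arbitrary placeholder and both function names are hypothetical. The weighted mean and variance could stand in for the equal-weighted class statistics of a Naive Bayes fit, or the weights could scale each day's term in a logistic loss.

```python
def forgetting_weights(n, half_life=60.0):
    """Normalised sample weights for n observations ordered oldest first.

    An observation `half_life` days older than another gets half its weight.
    """
    decay = 0.5 ** (1.0 / half_life)
    raw = [decay ** (n - 1 - i) for i in range(n)]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_mean_var(xs, weights):
    """Mean and variance of xs under normalised weights."""
    mean = sum(w * x for w, x in zip(weights, xs))
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs))
    return mean, var


# A series whose level shifts halfway through: the weighted mean leans toward
# the recent regime, the plain mean sits in the middle.
xs = [0.0] * 100 + [1.0] * 100
w = forgetting_weights(len(xs), half_life=30.0)
print(round(sum(xs) / len(xs), 3), round(weighted_mean_var(xs, w)[0], 3))
```

This keeps the model inspectable: the only new knob is the half-life, and its effect on any fitted statistic can be read off directly.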
7. Why I’m Building This at All
I want to be clear about my objectives here. I don’t want a “super learner.” And I don’t want a meta-model that tries to be smarter than the strategies it blends.
What I want is a simple mechanism that:
Understands when each sleeve tends to shine
Understands when both should chill
Smooths out volatility
Reduces drawdowns without killing upside
Doesn’t hallucinate false precision
Doesn’t require a 20-server research cluster
Plays nicely with volatility-regime tags
Leaves room for human oversight
At the end of the day, I just want a GPS that nudges the system to drive in the right direction, not one that tries to drive and gets lost in the rotaries of New England.
And so far — honestly — this Bayesian blender feels like the first step in that direction.
Closing Thoughts
This isn’t “production-ready” yet. It needs more testing, cost modeling, sensitivity analysis, and probably a few months of shadow live trading before I trust it.
But it’s the most promising thing I’ve built outside the allocator itself.
The big next steps:
tune the cash gate
refine the feature space
test against multiple horizons
add simple macro/vol regime features
integrate with the allocator’s existing sized positions
Slowly, this system is becoming more adaptive — not reactive, not overfitted, but context-aware.
And if I do this right, the final version won’t feel like a fancy ML model. It’ll feel like a smoother, smarter, more patient version of the allocator — one that knows when to step on the gas, when to lean into volatility, and when to just sit in cash and wait.
The information presented in Math & Markets is not financial advice and should not be construed as such.


