February 2026 · 12 min read

The Kelly Criterion: How Information Theory Tells You How Much to Bet

The most elegant formula in finance connects three fields that seem to have nothing to do with each other — and most people who use it get it wrong.

In 1956, a physicist at Bell Labs named John L. Kelly Jr. published a paper with a dry title: "A New Interpretation of Information Rate." It appeared in the Bell System Technical Journal, wedged between papers on signal processing and telephone switching circuits.

The paper nearly didn't get published. AT&T management worried it was about gambling. They weren't wrong — but they were missing the point. Kelly had discovered something far more fundamental: a mathematical law connecting the quality of your information to the rate at which you can grow wealth.

That law, now known as the Kelly criterion, is arguably the most important formula in the history of quantitative finance. And yet, most people who've heard of it understand it only superficially — as a betting formula. The real story is about information, entropy, and the deep structure of compound growth.


The Setup: A Gambler with a Private Wire

Kelly's paper posed a specific problem. Imagine a gambler who receives information about horse races through a noisy private channel — think of an insider tip that's correct, say, 70% of the time. The gambler knows the channel is noisy. The question: how much of their bankroll should they bet on each race to maximize long-term wealth?

This seems like a straightforward probability question. But the naive answers are all wrong: betting everything maximizes the expected value of a single race yet guarantees eventual ruin, while betting a small fixed amount fails to compound the edge at all.

Kelly solved it exactly. And his answer turned out to be connected to Claude Shannon's information theory in a way that nobody expected.

The Formula

For a simple bet where you either win or lose, the Kelly criterion says:

The Kelly Fraction
f* = (bp − q) / b
f* = fraction of your bankroll to bet
b = net odds (e.g., b = 1 for even money, b = 2 for 2-to-1)
p = probability of winning
q = 1 − p = probability of losing

For even-money bets (b = 1), this simplifies to something remarkably clean:

Even-money Kelly
f* = 2p − 1
If you have a 60% chance of winning an even-money bet, Kelly says bet 20% of your bankroll.
If your win probability is 55%, bet 10%. If it's 51%, bet 2%.
If you have no edge (p = 0.5), bet zero.

This can be stated even more intuitively: f* = edge / odds. Your bet size should be proportional to your edge and inversely proportional to the odds: the bigger your informational advantage, the more you bet; the longer the odds, the smaller the stake you need to exploit the same edge.
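The formula is one line of code. A minimal sketch (the function name is mine, not standard):

```python
def kelly_fraction(p: float, b: float) -> float:
    """Kelly fraction f* = (b*p - q) / b for win probability p at net odds b."""
    q = 1.0 - p
    return (b * p - q) / b

print(kelly_fraction(0.60, 1.0))  # even money, 60% winner: bet 20% of bankroll
print(kelly_fraction(0.55, 1.0))  # 55% winner: bet 10%
print(kelly_fraction(0.50, 1.0))  # no edge: bet 0
```

Note that f* goes negative when the edge disappears (bp < q); in practice that simply means bet nothing.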

Why Logarithms? The Math of Compounding

The derivation reveals why Kelly works. The key insight is that wealth grows multiplicatively, not additively.

If you start with bankroll W₀ and make n bets, each wagering fraction f, your wealth after n bets is:

Multiplicative wealth growth
Wₙ = W₀ × (1 + bf)^wins × (1 − f)^losses

Taking the logarithm converts multiplication to addition:

Log growth rate
G(f) = p · log(1 + bf) + q · log(1 − f)
By the law of large numbers, this converges to your long-run growth rate.
Kelly maximizes G(f) by taking the derivative, setting it to zero, and solving for f.

Setting G'(f) = 0:

Derivation
pb / (1 + bf) − q / (1 − f) = 0
Rearranging: pb(1 − f) = q(1 + bf)
→ pb − pbf = q + qbf → pb − q = fb(p + q) = fb
f* = (bp − q) / b

This is elegant, but the real insight isn't the formula itself — it's that the growth rate function G(f) is concave. There's a single peak. Bet less than f*, you grow slower than optimal. Bet more than f*, you also grow slower than optimal. Bet 2× Kelly, and your growth rate drops to roughly zero (exactly zero in the continuous approximation). Bet more than that, and your long-run growth is negative: you are expected to go broke.
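The concavity claim is easy to verify numerically. A small sketch for the even-money, p = 0.6 case (helper name is mine):

```python
import math

def growth_rate(f: float, p: float = 0.6, b: float = 1.0) -> float:
    """Expected log growth per bet: G(f) = p*log(1 + b*f) + q*log(1 - f)."""
    return p * math.log(1 + b * f) + (1 - p) * math.log(1 - f)

# G(f) rises to a single peak at f* = 0.2, is roughly 0 at 2x Kelly (f = 0.4),
# and is negative beyond that
for f in (0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"f = {f:.1f}   G = {growth_rate(f):+.4f}")
```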

The most counterintuitive result in all of finance: betting more can make you grow slower. Above the Kelly fraction, increasing your bet size actively destroys wealth.

The Information Theory Connection

Here's where it gets beautiful. Kelly didn't just solve a gambling problem — he proved a deep theorem about the relationship between information and money.

Kelly showed that the optimal growth rate of your wealth is exactly equal to the information rate of your private channel — in Shannon's precise, mathematical sense.

Kelly's Theorem

If you receive information through a channel with capacity C bits per transmission, then the maximum achievable growth rate is:

G* = C (in bits, or equivalently Wₙ ≈ W₀ · 2^(nC))

Your rate of wealth accumulation is literally bounded by the quality of your information. Not your courage, not your conviction — your edge, measured in bits.

This connects Shannon entropy, mutual information, channel capacity, and financial returns into a single framework.

This is profound. It means that how fast you can compound wealth is a question of information theory, not bravery or conviction or gut feeling. It's physics.
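For the even-money case, the identity can be checked in a few lines: a tip that is correct with probability p is a binary symmetric channel, whose capacity is 1 − H(p) bits, and the optimal growth rate in bits per bet comes out identical. A sketch:

```python
import math

p = 0.6                    # the tip is correct 60% of the time
f_star = 2 * p - 1         # even-money Kelly fraction

# Optimal growth rate, measured in bits per bet
G_bits = p * math.log2(1 + f_star) + (1 - p) * math.log2(1 - f_star)

# Capacity of a binary symmetric channel with error rate 1 - p
H = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
capacity = 1 - H

print(G_bits, capacity)    # both ≈ 0.029 bits: growth rate equals channel capacity
```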

A Concrete Example

The biased coin

You're offered a game. A coin comes up heads 60% of the time. It pays even money. You start with $1,000. How should you bet?

Kelly says: f* = 2(0.6) − 1 = 0.20. Bet 20% of your current bankroll each round.

If you bet 20% (Kelly): Your expected log growth rate is G = 0.6·log(1.2) + 0.4·log(0.8) ≈ 0.020 per bet. Your bankroll roughly doubles every 35 bets. After 250 bets, median wealth: ~$150,000.

If you bet 40% (2× Kelly): G = 0.6·log(1.4) + 0.4·log(0.6) ≈ −0.002, essentially zero. You go nowhere on average despite having a 60% edge; your bankroll just random-walks.

If you bet 10% (half Kelly): G ≈ 0.75 × G(f*). You grow at about 75% of the full-Kelly rate, but with much smaller drawdowns. After 250 bets, median wealth: ~$43,000. Slower, but far smoother.

If you bet 100% (all-in): The first tails leaves you with $0. Game over. Probability of surviving 100 rounds: 0.6^100 ≈ 6.5 × 10⁻²³. Effectively zero.

The 2× Kelly result is the one that shocks people. You have a genuine, known 60% edge — and betting 40% of your bankroll gives you essentially zero long-run growth. The arithmetic mean of your returns is positive, but the geometric mean — the one that governs compounding — is not.

This is the difference between additive thinking and multiplicative thinking. In a single bet, 40% is fine. Over hundreds of compounding bets, it's catastrophic.
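These numbers are straightforward to confirm by simulation. A Monte Carlo sketch of the biased-coin game (parameters as in the example; the trial count and seed are arbitrary):

```python
import random
import statistics

def median_final_bankroll(frac, n_bets=250, n_trials=2000, p=0.6,
                          start=1000.0, seed=42):
    """Median ending bankroll over many trials, betting `frac` of it per round."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_trials):
        w = start
        for _ in range(n_bets):
            w *= (1 + frac) if rng.random() < p else (1 - frac)
        finals.append(w)
    return statistics.median(finals)

for frac in (0.10, 0.20, 0.40):
    print(f"bet {frac:.0%} per round: median ≈ ${median_final_bankroll(frac):,.0f}")
```

Half Kelly ends well below full Kelly in the median, and 2× Kelly typically ends near or below the starting stake despite the 60% edge.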

Interactive: Kelly bet sizer — full Kelly 20.0% (growth rate 2.02% per bet) vs half Kelly 10.0% (growth rate 1.51% per bet).

Edward Thorp: From Blackjack to Wall Street

The Kelly criterion might have stayed buried in information theory journals if not for Edward O. Thorp, an MIT mathematics professor who read Kelly's paper and immediately saw its practical power.

In 1962, Thorp published Beat the Dealer, the first mathematically rigorous system for beating blackjack through card counting. Card counting gave Thorp an estimate of his edge on each hand. Kelly told him how much to bet. The combination was devastating — casinos eventually banned him and changed their rules.

But Thorp's real impact was in finance. In 1969, he founded Princeton Newport Partners, one of the first quantitative hedge funds. The fund used options pricing models (Thorp independently derived what we now call the Black-Scholes formula, years before Black and Scholes published), statistical arbitrage, and — crucially — Kelly-based position sizing.

Princeton Newport Partners generated 15.1% annualized returns with minimal drawdowns for nearly 20 years before closing in 1988 (due to a legal issue involving a partner, not investment losses). Thorp's track record remains one of the most consistent in hedge fund history.

Thorp advocated fractional Kelly — typically half-Kelly — in practice. His reasoning was characteristically precise: in the real world, you never know your exact edge. If you overestimate your edge and bet full Kelly based on the wrong number, you're actually betting above the true Kelly — which, as we saw, can be catastrophic. Half-Kelly provides a buffer against estimation error.

Shannon's Secret Portfolio

Kelly's colleague at Bell Labs was none other than Claude Shannon — the father of information theory, arguably the most brilliant American scientist of the 20th century.

Shannon was deeply interested in the Kelly criterion. He and Thorp became friends, and in 1961, they built one of the first wearable computers — a shoe-mounted device designed to predict roulette outcomes by measuring the ball's speed and deceleration. It worked, though it was never deployed at scale.

More significantly, Shannon applied information-theoretic thinking to his own investments. Though he was private about his portfolio, records show he achieved approximately 28% annualized returns over 30 years — a track record that rivals the best professional fund managers.

The connection is almost poetic: the man who invented the mathematical theory of information used that theory to inform his investment sizing, and it worked spectacularly.

Kelly in the Continuous World: The Portfolio Connection

For a portfolio of assets with continuous returns, the Kelly criterion takes a matrix form that's strikingly familiar to anyone who's studied portfolio theory:

Continuous Kelly (multi-asset)
f* = Σ⁻¹(μ − r·𝟏)
Σ⁻¹ = inverse covariance matrix of asset returns
μ = vector of expected returns
r = risk-free rate
This is exactly the Markowitz tangency portfolio with a specific risk-aversion parameter (λ = 1).

This is a remarkable convergence. Three seemingly different problems —

  1. Kelly: "Maximize my long-run geometric growth rate"
  2. Markowitz: "Maximize expected return for a given level of risk"
  3. Log utility: "Maximize my expected logarithmic utility of wealth"

— all produce the same portfolio (at the right parameterization). They're three faces of the same underlying mathematical structure. The Kelly portfolio is the mean-variance optimal portfolio is the log-utility-maximizing portfolio.
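In code, the multi-asset formula is a single linear solve. A sketch with made-up numbers (the two assets, their moments, and the risk-free rate are all hypothetical):

```python
import numpy as np

mu = np.array([0.08, 0.12])        # hypothetical expected annual returns
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])   # hypothetical return covariance matrix
r = 0.03                           # hypothetical risk-free rate

# Continuous Kelly weights: f* = Σ⁻¹(μ − r·𝟏), computed as a linear solve
f_star = np.linalg.solve(Sigma, mu - r)
print(f_star)   # weights can sum past 1: full Kelly often implies leverage
```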

The Fractional Kelly Tradeoff

Full Kelly is theoretically optimal but practically terrifying. In the continuous approximation, the variance of the per-period log growth is twice its mean, so enormous drawdowns are not just possible — they're expected. A full-Kelly bettor has roughly a 1-in-2 chance of seeing the bankroll cut in half at some point, and should expect drawdowns of 50% or more regularly.

This is why serious practitioners use fractional Kelly — typically betting some fraction α of the full Kelly amount. The tradeoff is elegant:

The fractional Kelly tradeoff

If you bet fraction α of the Kelly amount, your growth rate is:

G(αf*) ≈ α(2 − α) · G(f*)   (exact in the continuous-time limit)

Half Kelly (α = 0.5): You get 75% of the growth rate with roughly half the variance. This is the sweet spot most practitioners target.

Quarter Kelly (α = 0.25): You get 43.75% of the growth rate with much lower drawdowns. Appropriate when estimation uncertainty is high.
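The α(2 − α) factor comes from the continuous-time approximation; for the discrete biased-coin bet it holds to within about a percentage point. A quick check (helper name is mine):

```python
import math

def growth(f, p=0.6):
    """Expected log growth per bet for an even-money bet."""
    return p * math.log(1 + f) + (1 - p) * math.log(1 - f)

f_star = 0.2  # full Kelly for p = 0.6
for alpha in (0.25, 0.5, 1.0):
    actual = growth(alpha * f_star) / growth(f_star)
    predicted = alpha * (2 - alpha)
    print(f"α = {alpha:.2f}   actual {actual:.3f}   predicted α(2−α) = {predicted:.3f}")
```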

The practical case for fractional Kelly is compelling: real edges are estimated, not known; returns are rarely as independent and well-behaved as the model assumes; and deep drawdowns carry psychological and business costs that the log-growth objective ignores.

The Samuelson Critique

Not everyone was convinced. Paul Samuelson, the Nobel laureate economist, was the Kelly criterion's most famous critic. In 1979, he published a paper with a deliberately provocative title: "Why We Should Not Make Mean Log of Wealth Big Though Years to Act Are Long" — written entirely in one-syllable words to prove his point was simple enough to need no jargon.

Samuelson's argument: maximizing log utility is just one preference among many. A risk-averse investor might prefer a different utility function. A terminally ill patient with one year to live has no use for asymptotically optimal growth. Different people have different risk tolerances, and Kelly imposes a specific one (log utility, which implies a risk-aversion coefficient of exactly 1).

This critique is mathematically valid but somewhat misses the point. Kelly's result isn't really a claim about utility — it's a structural theorem about multiplicative processes. If you're in a repeated game with compounding, and you bet above the Kelly fraction, you will almost surely end up with less wealth than a Kelly bettor, regardless of your utility function. Kelly isn't saying you should maximize log wealth — it's saying that if you bet more than Kelly, you're provably leaving money on the table.

The practical resolution: Kelly defines the upper bound on sensible bet sizing. You can bet less if you prefer lower variance. You should never bet more.

Real-World Kelly

Bill Benter — $1 Billion from Horse Racing

Bill Benter is perhaps the most successful gambler in history. Starting in the 1980s, he developed a statistical model for Hong Kong horse racing that estimated the true probability of each horse winning. He then used Kelly-based position sizing to determine optimal bet amounts. Over several decades, his operation reportedly earned over $1 billion. Benter's system was a textbook application: statistical model for edge estimation + Kelly for sizing + disciplined execution.

Warren Buffett and Charlie Munger

Buffett has never explicitly said "I use the Kelly criterion," but his investment philosophy is essentially Kelly thinking. He concentrates heavily in his best ideas rather than diversifying into mediocre ones. Charlie Munger has been more explicit — he has directly cited Kelly and the importance of sizing bets in proportion to your edge. Munger: "The wise ones bet heavily when the world offers them that opportunity. They bet big when they have the odds. And the rest of the time, they don't."

Renaissance Technologies

Jim Simons's Medallion Fund — the most successful hedge fund in history, with ~66% annual returns before fees — is known to use information-theoretic approaches. While the specific methods are secret, the general framework of estimating small edges across thousands of trades and sizing positions optimally is fundamentally Kelly-inspired.

The Deeper Lesson

The Kelly criterion is often presented as a formula. It's better understood as a way of thinking.

What Kelly really tells you is:

  1. Your growth rate is bounded by your information. No amount of leverage, conviction, or cleverness can overcome a lack of genuine edge. If you don't have better information than the market, Kelly says bet zero.
  2. Sizing matters as much as selection. Finding a good bet is only half the problem. Sizing it correctly is the other half, and most people get it catastrophically wrong — usually by betting too much.
  3. Overbetting is strictly worse than underbetting. Below Kelly, you're leaving growth on the table. Above Kelly, you're destroying growth. This asymmetry is the most important practical takeaway.
  4. The geometric mean governs everything in a compounding world. Arithmetic averages are misleading. A strategy that returns +50%, −40%, +50%, −40%... has a positive arithmetic mean (+5% per period) but a negative geometric mean. You're going broke while your spreadsheet says you're winning.
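Point 4 is worth seeing in numbers. The +50%/−40% sequence above, carried out for 100 periods:

```python
import math

returns = [0.50, -0.40] * 50   # 100 alternating periods

arithmetic_mean = sum(returns) / len(returns)
growth_factor = math.prod(1 + r for r in returns)
geometric_mean = growth_factor ** (1 / len(returns)) - 1

print(f"arithmetic mean: {arithmetic_mean:+.1%} per period")   # +5.0%
print(f"geometric mean:  {geometric_mean:+.2%} per period")    # about -5.13%
print(f"$1,000 becomes:  ${1000 * growth_factor:,.2f}")
```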

Kelly connects information theory, probability, finance, and decision-making into a single, elegant framework. It says that the speed at which you can grow wealth is not a function of your courage or your conviction — it's a function of how much you genuinely know, measured in bits, and how precisely you act on that knowledge.

In a world full of overconfident bettors and overleveraged portfolios, that might be the most important lesson in quantitative finance.


Further Reading