Preprint Proposes Theory for Profit-Optimal LLM Training

Original post

William Merrill @lambdaviking

(re: OpenAI and Anthropic IPO news, a preprint)

Scaling up training reliably improves LLMs, but it also increases training and inference costs, leading to massive capital expenditure by AI firms. How can we understand what level of LLM scaling is justified economically? 🧵⬇️

9:27 AM · Jun 10, 2026

William Merrill @lambdaviking

We study this by proposing a microeconomic model grounded in scaling laws. An AI firm chooses how to scale its LLM size and data budget, balancing increased quality and consumer demand against training and inference costs. We study the firm’s profit maximization problem.
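To make the trade-off concrete, here is a toy sketch of a profit maximization of this general shape. It is not the preprint's model: the quality function, the demand curve `market * quality ** gamma`, the price, the market size, and the grid ranges are all hypothetical choices made for illustration. The loss constants are approximately the published Chinchilla fit, and the cost terms use the common approximations of 6·N·D FLOPs for training and 2·N FLOPs per generated token.

```python
import math

# Chinchilla-style loss L(N, D) = E0 + A / N^alpha + B / D^beta.
# Constants are approximately the Hoffmann et al. (2022) fit, for illustration only.
E0, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def loss(n, d):
    """Pretraining loss for n parameters trained on d tokens."""
    return E0 + A / n**ALPHA + B / d**BETA


def quality(n, d):
    """Hypothetical quality measure: inverse of the reducible loss."""
    return 1.0 / (loss(n, d) - E0)


def profit(n, d, flops_per_dollar, gamma=1.0, price=1e-6, market=1e15):
    """Toy profit: token revenue minus inference and training cost.

    gamma <= 1 controls how quickly demand diminishes with quality
    (assumption); price is dollars per token; market scales tokens sold.
    """
    tokens_sold = market * quality(n, d) ** gamma
    inference_cost_per_token = 2.0 * n / flops_per_dollar  # ~2N FLOPs per token
    training_cost = 6.0 * n * d / flops_per_dollar         # ~6ND FLOPs in total
    return tokens_sold * (price - inference_cost_per_token) - training_cost


def logspace(lo, hi, steps):
    """Points evenly spaced in log10 between 10**lo and 10**hi."""
    return [10 ** (lo + (hi - lo) * i / (steps - 1)) for i in range(steps)]


def best_scale(flops_per_dollar, gamma=1.0):
    """Grid-search the (profit, n, d) triple with the highest profit."""
    grid_n = logspace(8, 13, 51)    # 1e8 to 1e13 parameters
    grid_d = logspace(10, 15, 51)   # 1e10 to 1e15 tokens
    return max(
        (profit(n, d, flops_per_dollar, gamma), n, d)
        for n in grid_n
        for d in grid_d
    )


if __name__ == "__main__":
    for efficiency in (1e16, 1e17, 1e18):  # hypothetical FLOPs per dollar
        best_profit, n_star, d_star = best_scale(efficiency)
        print(f"E={efficiency:.0e}  n*={n_star:.2e}  d*={d_star:.2e}  profit={best_profit:.3e}")
```

Rerunning `best_scale` with different `flops_per_dollar` or `gamma` values gives a feel for how the optimum moves in this toy setup. The numbers carry no empirical weight, and the grid search can land on the grid boundary.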

William Merrill @lambdaviking

Paper link: https://arxiv.org/abs/2605.16430

William Merrill @lambdaviking

Our model enables understanding the interaction of research and market forces in LLM scaling, going beyond Chinchilla. We hope this provides a foundation for engaging critically with industry statements and supporting long-term economic decision-making.

William Merrill @lambdaviking

Interestingly, optimal model size, data budget, and train spend *decrease* as training gets more parameter-efficient. Thus, in the compute-bound setting, pretraining advances in parameter efficiency incentivize smaller LLMs (rather than further scaling) under our model.

William Merrill @lambdaviking

The scaling exponent depends on how consumer demand diminishes with quality: it is slightly superlinear when demand ~ quality. If demand diminishes (e.g., demand ~ log quality), optimal model size, data budget, and train spend scale no more than linearly in hardware efficiency.
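A small numeric illustration of what "diminishing" means for the two demand curves named above. The function names are hypothetical and demand is in arbitrary units. Under linear demand, doubling quality adds demand proportional to the current quality, while under log demand each doubling adds the same fixed amount no matter how good the model already is.

```python
import math


def linear_demand(q):
    """Demand proportional to quality (non-diminishing)."""
    return q


def log_demand(q):
    """Demand proportional to log quality (diminishing)."""
    return math.log(q)


def gain_from_doubling(demand, q):
    """Extra demand obtained by doubling quality from q to 2q."""
    return demand(2 * q) - demand(q)


if __name__ == "__main__":
    for q in (1.0, 10.0, 100.0):
        print(
            f"q={q:>6}: linear gain={gain_from_doubling(linear_demand, q):.3f}, "
            f"log gain={gain_from_doubling(log_demand, q):.3f}"
        )
```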

William Merrill @lambdaviking

Our theoretical analysis is based on these assumptions (see §7.2):
1. The LLM firm has a monopoly on selling tokens
2. LLM quality increases only when N, D are scaled jointly (~Chinchilla)
3. Consumer demand increases diminishingly (or at most linearly) with quality

William Merrill @lambdaviking

Next, we consider training bound by data. In this regime, data efficiency improvements incentivize larger models and more train spend. In contrast, hardware and parameter efficiency advances incentivize *smaller* train spend when data-bound.

William Merrill @lambdaviking

When compute-bound, we show optimal model size n*, data budget d*, and train spend C*_train scale at most polynomially with hardware efficiency E. The scaling exponent is at most slightly superlinear.

William Merrill @lambdaviking

Thus, our results suggest that current investments in scaling are (perhaps implicitly) based on expectations of continued advances in hardware and algorithmic efficiency and expectations that consumer demand for LLMs will grow almost linearly with model quality.

William Merrill @lambdaviking

Finally, we compare our theory to empirical trends in AI reported by http://Epoch.ai. Our model applied to this data predicts current expenditure is higher than profit-optimal unless consumer demand is almost linear in model quality (i.e., almost non-diminishing).

William Merrill @lambdaviking

In the compute-bound regime, data efficiency improvements always incentivize larger models, but data budgets and compute spend can either increase or decrease depending on the relationship between demand and quality.
