Preprint Proposes Theory for Profit-Optimal LLM Training

William Merrill@lambdaviking

(re: OpenAI and Anthropic IPO news, a preprint)

Scaling up training reliably improves LLMs, but it also increases training and inference costs, leading to massive capital expenditure by AI firms. How can we understand what level of LLM scaling is justified economically? 🧵⬇️

9:27 AM · Jun 10, 2026 · 654 Views
William Merrill@lambdaviking

We study this by proposing a microeconomic model grounded in scaling laws. An AI firm chooses how to scale their LLM size and data budget, balancing increased quality and consumer demand against training and inference costs. We study the firm’s profit maximization problem.
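To make the shape of such a profit maximization problem concrete, here is a minimal toy sketch. It is not the preprint's model: the quality function, the linear demand curve, the fixed token price, the cost terms, and every constant (`A_EXP`, `B_EXP`, `K`, `PRICE`, `E`) are assumptions chosen only so that the trade-off is visible. The firm picks a model size `n` and data budget `d`; quality raises demand, while larger models cost more to serve and more to train.

```python
# All constants below are hypothetical; none are taken from the preprint.
A_EXP, B_EXP = 0.3, 0.3   # assumed scaling-law exponents for size and data
K = 1e4                   # assumed demand scale (tokens sold per unit of quality)
PRICE = 1.0               # assumed fixed price per token
E = 1e3                   # assumed hardware efficiency (work per unit of cost)


def quality(n: float, d: float) -> float:
    """Quality as the inverse of a Chinchilla-style reducible loss (assumed form)."""
    return 1.0 / (n ** -A_EXP + d ** -B_EXP)


def profit(n: float, d: float) -> float:
    """Revenue minus inference cost minus training cost in the toy model."""
    tokens_sold = K * quality(n, d)        # demand assumed linear in quality
    inference_cost_per_token = n / E       # larger models cost more to serve
    training_cost = n * d / E              # training work assumed proportional to n * d
    return tokens_sold * (PRICE - inference_cost_per_token) - training_cost


def argmax_profit(steps: int = 121, max_exp: float = 6.0):
    """Brute-force search over a log-spaced grid of (n, d) from 1 to 10**max_exp."""
    grid = [10 ** (max_exp * i / (steps - 1)) for i in range(steps)]
    return max((profit(n, d), n, d) for n in grid for d in grid)


best_profit, n_star, d_star = argmax_profit()
print(f"n* ~ {n_star:.3g}, d* ~ {d_star:.3g}, profit ~ {best_profit:.3g}")
```

In this toy the optimum is interior: making the model larger eventually erodes the per-token margin, and adding data eventually costs more in training than it returns in demand. The specific numbers carry no meaning beyond the illustration.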

William Merrill@lambdaviking

Paper link: https://arxiv.org/abs/2605.16430

William Merrill@lambdaviking

Our model enables understanding the interaction of research and market forces in LLM scaling, going beyond Chinchilla. We hope this provides a foundation for engaging critically with industry statements and supporting long-term economic decision making.

William Merrill@lambdaviking

Interestingly, optimal model size, data budget, and train expenditure *decrease* as training gets more parameter-efficient. Thus, in the compute-bound setting, pretraining advances in parameter efficiency incentivize small LLMs (rather than further scaling) under our model.

William Merrill@lambdaviking

The scaling exponent depends on how consumer demand diminishes with quality: it is slightly superlinear when demand ~ quality. If demand diminishes (e.g., demand ~ log quality), optimal model size, data budget, and train spend scale no more than linearly in hardware efficiency.
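The role of the demand curve can be illustrated with a small toy comparison. This is a sketch under stated assumptions, not the preprint's model, and it does not reproduce the scaling exponents quoted above: the quality function, cost terms, and constants (`A_EXP`, `B_EXP`, `K`, `PRICE`, `E`) are all hypothetical. It only shows the direction of the effect at one fixed hardware efficiency: when demand flattens with quality, the profit-optimal training spend in the toy is lower.

```python
import math
from typing import Callable, Tuple

# Hypothetical constants for illustration; none come from the preprint.
A_EXP, B_EXP = 0.3, 0.3       # assumed scaling-law exponents
K, PRICE, E = 1e4, 1.0, 1e3   # assumed demand scale, token price, hardware efficiency


def quality(n: float, d: float) -> float:
    """Quality as the inverse of a Chinchilla-style reducible loss (assumed form)."""
    return 1.0 / (n ** -A_EXP + d ** -B_EXP)


def optimum(demand: Callable[[float], float],
            steps: int = 121, max_exp: float = 6.0) -> Tuple[float, float, float]:
    """Grid-search the (profit, n, d) triple that maximizes toy profit."""
    grid = [10 ** (max_exp * i / (steps - 1)) for i in range(steps)]
    best = (-math.inf, 0.0, 0.0)
    for n in grid:
        for d in grid:
            tokens_sold = K * demand(quality(n, d))
            # Revenue net of inference cost, minus training cost.
            pi = tokens_sold * (PRICE - n / E) - n * d / E
            if pi > best[0]:
                best = (pi, n, d)
    return best


# Demand proportional to quality versus demand that diminishes like log(1 + quality).
profit_lin, n_lin, d_lin = optimum(lambda q: q)
profit_log, n_log, d_log = optimum(lambda q: math.log(1.0 + q))

spend_lin = n_lin * d_lin / E
spend_log = n_log * d_log / E
print(f"linear demand: n*={n_lin:.3g}, d*={d_lin:.3g}, train spend={spend_lin:.3g}")
print(f"log demand:    n*={n_log:.3g}, d*={d_log:.3g}, train spend={spend_log:.3g}")
```

Passing the demand curve as a function keeps the rest of the toy fixed, so any difference between the two runs comes from the demand shape alone.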

William Merrill@lambdaviking

Our theoretical analysis is based on these assumptions (see §7.2):
1. The LLM firm has a monopoly on selling tokens
2. LLM quality increases only when N, D are scaled jointly (~Chinchilla)
3. Consumer demand increases diminishingly (or at most linearly) with quality
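One way to read assumption 2 is through a Chinchilla-style parametric loss, where growing the model alone runs into a floor set by the data budget. The sketch below uses that additive form with hypothetical constants (`l_inf`, `a`, `b`, `alpha`, `beta`) and an assumed fixed ratio of 20 tokens per parameter for the joint-scaling case. The thread does not give the preprint's exact functional form, so this is an illustration of the idea rather than the paper's definition.

```python
import math


def loss(n: float, d: float, l_inf: float = 1.7, a: float = 400.0, b: float = 400.0,
         alpha: float = 0.34, beta: float = 0.28) -> float:
    """Chinchilla-style loss: irreducible term plus size and data terms (constants assumed)."""
    return l_inf + a / n ** alpha + b / d ** beta


sizes = (1e8, 1e9, 1e10, 1e11)
d_fixed = 1e9  # data budget held constant while only the model grows

# Scaling parameters alone: the data term never shrinks, so loss approaches a floor.
size_only = [loss(n, d_fixed) for n in sizes]
floor = loss(math.inf, d_fixed)  # limit of an infinitely large model on fixed data

# Scaling parameters and data together (assumed 20 tokens per parameter).
joint = [loss(n, 20 * n) for n in sizes]

for n, s, j in zip(sizes, size_only, joint):
    print(f"n={n:.0e}  size-only loss={s:.3f}  joint loss={j:.3f}")
print(f"floor for size-only scaling at d={d_fixed:.0e}: {floor:.3f}")
```

With data fixed, every size-only loss stays above the floor, while joint scaling eventually drops below it.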

William Merrill@lambdaviking

Next, we consider training bound by data. In this regime, data efficiency improvements incentivize larger models and more train spend. In contrast, hardware and parameter efficiency advances incentivize *smaller* train spend when data bound.

William Merrill@lambdaviking

When compute-bound, we show optimal model size n*, data budget d*, and train spend C*_train scale at most polynomially with hardware efficiency E. The scaling exponent is at most slightly superlinear.

William Merrill@lambdaviking

Thus, our results suggest that current investments in scaling are (perhaps implicitly) based on expectations of continued advances in hardware and algorithmic efficiency and expectations that consumer demand for LLMs will grow almost linearly with model quality.

William Merrill@lambdaviking

Finally, we compare our theory to empirical trends in AI reported by http://Epoch.ai. Our model applied to this data predicts current expenditure is higher than profit-optimal unless consumer demand is almost linear in model quality (i.e., almost non-diminishing).

William Merrill@lambdaviking

In the compute-bound regime, data efficiency improvements always incentivize larger models, but data budgets and compute spend can either increase or decrease depending on the relationship between demand and quality.
