Former OpenAI and DeepMind builder Phil Chen says future Claude models will let organizations pretrain their own clones
Anthropic will likely deploy automated filters to block model cloning
@philhchen @karpathy Note that it's already against Anthropic ToS to use Claude to develop competing products, which definitely covers this.
Anthropic hasn't yet put filters in place to reliably prevent this, but it seems reasonably likely they'll do so soon.
Thought experiment: if @karpathy's efforts at Anthropic yield a Claude model that is capable of pretraining the next generation of Claude, then any company with sufficient GPU infrastructure could use Claude to pretrain their own Claude-clone. Of course, Anthropic would then ban that company from using Claude. But then wouldn't any company with enough Claude spend be incentivized to use Claude to train their own Claude-clone eventually? What happens in 1-2 years when even open-weights models become good enough to run their own training?
@philhchen @karpathy Idk how they will/should draw the line. You could operationalize as "Claude won't help you train models that will be within a factor of 10x of cost competitiveness of any currently deployed Anthropic model"?
@bshlgrs @karpathy where do you draw the line between nanoGPT runs on 8xH100s (obviously allowed) and big pretrain on 100k B200s?
@philhchen @karpathy Yeah that would definitely be disallowed under the rule I proposed.
There's definitely a tricky question for Anthropic about how to manage existing relationships with counterparties who use Claude for their AI work.
@bshlgrs @karpathy actually a clear counterexample to this would be Google DeepMind using Opus for Gemini pretraining code