Open-source AI builder @xlr8harder argues model distillation is more effective than crowdsourcing for resolving underlying technical problems.
Research engineer Florian Brand confirmed the assessment.
@xlr8harder @xeophon you don’t have to
closed model needs to be good at every use case
open model needs to be good at your use case
just do last-mile training on your own tasks and data
@xeophon I'm not sure I actually do (unless the answer is that it doesn't.) Crowd sourcing could do some of this but all of the efforts so far have been quite lackluster. Oh, just distillation, I guess?
@xlr8harder @xeophon open models don’t need to beat closed models outright, they just need to be close enough that you can bridge the gap and then some, relatively quickly and cheaply
for anything served at scale, you can amortize out the training cost pretty quickly, and retrain regularly
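The amortization point above can be made concrete with a small break-even calculation. This is only a sketch: the function name `breakeven_requests` and every dollar figure and traffic volume below are hypothetical placeholders, not numbers from the thread. It assumes a one-time training cost and a fixed per-request saving from serving the fine-tuned open model instead of a closed API.

```python
import math


def breakeven_requests(training_cost_usd, closed_usd_per_1k, open_usd_per_1k):
    """Number of requests after which serving savings cover a one-time training cost.

    All inputs are illustrative assumptions: costs are in USD, and serving
    costs are expressed per 1,000 requests.
    """
    savings_per_1k = closed_usd_per_1k - open_usd_per_1k
    if savings_per_1k <= 0:
        raise ValueError("open model must be cheaper per request to ever break even")
    return math.ceil(training_cost_usd * 1000 / savings_per_1k)


# Hypothetical numbers: $5,000 training run, $10 vs $2 per 1,000 requests.
requests_needed = breakeven_requests(5000, 10, 2)

# Hypothetical traffic: 50,000 requests per day.
days_needed = math.ceil(requests_needed / 50_000)

print(requests_needed, days_needed)  # 625000 13
```

Under these made-up figures the training run pays for itself in about two weeks of traffic. Regular retraining restarts the clock each time, so what matters is whether the break-even period is shorter than the interval between retrains.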
@xlr8harder @xeophon crowdsourcing will come in due time; have learned a lot of lessons here, you need activation energy to be super low and QC to be really high, but these are solvable with the right tooling
@xlr8harder ding ding ding
@willccbb @xeophon This is related to a question I've been exploring. Say you need to do meaningful domain adaptation, the kind of thing you'd probably need CPT for. Is there any good way to do this without wrecking post-trained behavior?
This seems like it would be very valuable.