i dont see the language performance a bottleneck on local hardware but rather than inference from models that can run it - unless u are referring to this?
the new meta is going to be with local smaller models then delegating into bigger models in sandboxes on the cloud, then communicating both through agents
im pretty sure this is what google and apple have decided together as well. so u have local inference then a small eval to determine the route/intent but working with local models and agents just puts you into the fun space of supporting edge cases for different machines running different software
as for the language question, yea everything can be optimized but as with all engineering is just evaluating if it’s worth spending the time on it, for us, as a company just starting we don’t have that privilege yet