In the last 24 hours, I have had 5 founders message me of varying-sized companies; some 10-person startups and one $200BN public company.
All of them stated they have been able to cut inference spend by 75% or more with little effort, no performance change and better latency.
The times they are a changing.















