What are the best examples of an LLM API's behavior changing even though the model name/version stayed exactly the same?
I'd like to collect as many examples of these as possible.
Users expressed frustration with model drift in fixed LLM API versions, describing it as a headache for developers.
I already know of this Anthropic blog post, where serving issues caused accuracy to drop: https://www.anthropic.com/engineering/a-postmortem-of-three-recent-issues

@gneubig man, models drifting over time is such a headache for devs.

@gneubig the OpenAI GPT-3.5 Turbo situation, where JSON reliability silently tanked for weeks, is the classic one
system prompt following also drifts without any version bump, and almost nobody catches it until evals break
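
One way to catch this kind of silent regression before evals break is a small canary check run on a schedule against the pinned model name. A minimal sketch, assuming you already collect the raw text responses to a fixed prompt set; `json_valid_rate`, `has_drifted`, and the 0.05 tolerance are hypothetical names and values chosen for illustration, not part of any vendor API:

```python
import json


def json_valid_rate(outputs):
    """Return the fraction of model outputs (strings) that parse as JSON."""
    if not outputs:
        raise ValueError("need at least one output")
    ok = 0
    for text in outputs:
        try:
            json.loads(text)
            ok += 1
        except json.JSONDecodeError:
            # Malformed or truncated JSON counts as a failure.
            pass
    return ok / len(outputs)


def has_drifted(baseline_rate, current_rate, tolerance=0.05):
    """Flag drift when the current rate falls more than `tolerance`
    below the baseline recorded for the same model name/version."""
    return (baseline_rate - current_rate) > tolerance


# Hypothetical responses collected today from the same fixed prompts.
todays_outputs = ['{"a": 1}', "not json", "[1, 2]", '{"b":']
rate = json_valid_rate(todays_outputs)
print(rate, has_drifted(baseline_rate=0.98, current_rate=rate))
```

The same shape works for system-prompt following: swap the JSON parse for whatever pass/fail check the prompt demands and compare against a stored baseline.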

@gneubig Does this count?