
PolyAI's Raven 3.5 Beats GPT-5 And Claude On Customer Service Benchmarks


Can a smaller model purpose-built for one domain beat a frontier general model that's 100× its size? A recent paper says yes, and not by a small margin.

Raven 3.5 from PolyAI beats GPT-5 and Claude Sonnet 4.6 on all 4 customer service benchmarks while staying under 300ms latency.

Whether a small specialist can beat a large generalist is one of the live debates in ML, and this paper offers an empirical answer. PolyAI's research team published it as “Raven 3.5: The post-training recipe that beats GPT-5 for customer service”.

Voice agents are moving from call-center software into everyday product infrastructure. PolyAI’s launch targets the gap between website traffic and real customer conversations by making every website capable of answering out loud.

PolyAI helps enterprises fix slow phone support, long wait times, costly contact centers, robotic IVRs, and missed revenue from abandoned calls. Its voice agents handle customer conversations 24/7 across voice, chat, SMS, and social channels in 45+ languages. The result is faster support, lower operating cost, more consistent answers, and better customer experience at enterprise scale.

📞 PolyAI is launching 2 new voice AI products: ADK, a code-first Agent Development Kit for building production voice agents from your own IDE, and PolyPhone, which turns any website into a live voice AI agent in about 10 minutes.

ADK connects directly into Agent Studio, so developers can build, manage, and deploy agents from the terminal. PolyPhone reads a website, understands things like FAQs and product details, then creates a voice agent that can be embedded on any webpage without needing telephony setup.

The bigger point: enterprise voice AI is moving from “contact center project” to “something teams can build and ship much faster.”

🧵 1

2:16 PM · May 18, 2026

🧵 2. The problem is that general LLMs can sound smart, but live phone support needs fast replies, careful instruction following, natural speech, and reliable handling of messy caller requests.

PolyAI’s answer is not a new giant model, but a stronger post-training recipe, which means extra training after the base model is built so it learns one job very well.

The model trains on millions of anonymized real customer conversations, then uses reward-based training and preference examples to learn what a good support reply should look like.

The authors tested Raven 3.5 on customer service benchmarks covering English quality, multilingual quality across 23 languages, response style, and date-time logic.
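To make "preference examples" concrete, here is a minimal sketch of a generic pairwise preference loss (Bradley-Terry style). The thread does not say which objective PolyAI actually uses, so the function name, the scores, and the choice of loss are all assumptions for illustration, not the Raven 3.5 recipe.

```python
import math


def preference_loss(score_chosen: float, score_rejected: float) -> float:
    """Generic pairwise preference loss: -log(sigmoid(chosen - rejected)).

    Assumption: each reply already has a scalar reward score. The loss is
    small when the preferred reply scores higher than the rejected one.
    """
    margin = score_chosen - score_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for two candidate replies to the same caller turn.
agrees_with_label = preference_loss(score_chosen=2.0, score_rejected=-1.0)
disagrees_with_label = preference_loss(score_chosen=-1.0, score_rejected=2.0)
print(agrees_with_label < disagrees_with_label)
```

Training pushes this loss down, which nudges the model toward the kind of reply human raters preferred in each pair.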

Rohan Paul @rohanpaul_ai



🧵 3. Most customer support automation still breaks at the most important place: the actual phone call.

Old Interactive Voice Response systems usually force people through menus, chatbots often fail when the question gets messy, and human support teams get crushed during peak demand.

That creates long wait times, abandoned calls, higher staffing costs, and uneven customer experience across voice, chat, SMS, and social channels.


🧵 6. PolyAI’s new launch is really aimed at 2 very different builders.

ADK is for developers who want to build serious voice AI agents from their normal coding setup, with more control over how the agent is built, managed, tested, and shipped.

PolyPhone is for teams that want a much faster path: give it a website, let it understand the FAQs, product details, and key pages, then turn that site into a voice agent people can actually talk to.

So instead of a website just sitting there with text, it can now answer questions, explain products, qualify leads, route users, and help customers without making them click around.
