Most companies talk about vector search.
Few share what it actually takes to scale to 100M+ embeddings in production.
Başak Eskili from @bookingcom joined the Weaviate Podcast to break down their AI journey, and it's packed with insights into what building production systems at massive scale looks like.
𝗧𝗵𝗲 𝗘𝘃𝗼𝗹𝘂𝘁𝗶𝗼𝗻:
• Started with keyword matching → semantic retrieval with 𝗢𝗽𝗲𝗻𝗦𝗲𝗮𝗿𝗰𝗵 on AWS
• Scaled to hundreds of millions of embeddings with strict latency requirements
• Migrated to 𝗪𝗲𝗮𝘃𝗶𝗮𝘁𝗲 to handle complex filtering, rising concurrency, and production-scale demands
𝗣𝗿𝗼𝗱𝘂𝗰𝘁𝗶𝗼𝗻 𝗚𝗲𝗻𝗔𝗜 𝗶𝗻 𝗔𝗰𝘁𝗶𝗼𝗻:
Their partner-to-guest messaging agent is a real-world example of 𝗮𝗴𝗲𝗻𝘁𝗶𝗰 𝗔𝗜:
• 𝗪𝗲𝗮𝘃𝗶𝗮𝘁𝗲 retrieves relevant response templates
• 𝗔𝗣𝗜𝘀 fetch property and booking context
• The agent suggests templates, crafts grounded replies, or defers to humans (human-in-the-loop design!)
• Evaluation spans offline datasets, LLM-as-a-judge, A/B testing, and live partner feedback
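To make the three outcomes concrete, here is a minimal Python sketch of how such a routing step could look. It is an illustration only, not Booking.com's implementation: the `Template` type, the `decide` function, and both score thresholds are hypothetical, and plain string formatting stands in for the LLM that would craft a grounded reply.

```python
from dataclasses import dataclass


@dataclass
class Template:
    text: str     # response template, may contain {placeholders}
    score: float  # hypothetical retrieval similarity in [0, 1]


def decide(templates, context, suggest_at=0.85, draft_at=0.60):
    """Pick one of three actions for an incoming guest message.

    templates: candidates returned by the retrieval step
    context:   property/booking fields fetched from APIs (a plain dict here)
    The thresholds are illustrative assumptions, not tuned values.
    """
    best = max(templates, key=lambda t: t.score, default=None)

    # No candidate, or a weak match: hand over to a human.
    if best is None or best.score < draft_at:
        return ("defer_to_human", None)

    # Strong match: suggest the template as-is.
    if best.score >= suggest_at:
        return ("suggest_template", best.text)

    # Middling match: ground the reply in fetched context.
    # str.format is a stand-in for an LLM call in this sketch.
    try:
        reply = best.text.format(**context)
    except (KeyError, IndexError):
        # Missing context means the reply cannot be grounded.
        return ("defer_to_human", None)
    return ("craft_reply", reply)
```

For example, `decide([Template("Check-in starts at {check_in}.", 0.7)], {"check_in": "15:00"})` yields a crafted reply, while an empty candidate list or missing booking context falls back to a human.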
@CShorten30 and Başak discuss how 𝗕𝗼𝗼𝗸𝗶𝗻𝗴.𝗰𝗼𝗺 evaluated Weaviate: testing with 100 million embeddings, filtered vector search, multi-threaded concurrency, reads during writes, and cost-efficient infrastructure provisioning. They also look ahead to personalized travel agents with memory systems that capture user preferences and session context to support long-term personalization!
Watch the full podcast here: https://www.youtube.com/watch?v=O9edM9ZS_FQ
