LangChain Builds Custom Inverted Index For SmithDB Agent Trace Search

Original post

LangChain (@LangChain)

How do you support full-text search and JSON filtering over agent traces that span up to hundreds of MBs, while keeping a median (P50) latency of 400ms?

Here’s an inside look at how we built a custom inverted index from scratch for SmithDB.

https://www.langchain.com/blog/full-text-search-in-smithdb-designing-an-inverted-index-for-object-storage

11:53 AM · Jun 10, 2026 · 17.5K Views

Sentiment

Positive users praise LangChain's SmithDB performance for agent trace search, while negative users dismiss the work as just database infrastructure with better marketing.

Positive: 50.0% · Negative: 50.0% (4 comments with sentiment)

Cluster Engagement

Posts from X

Most Activity

Views: 8.2K · Bookmarks: 32 · Likes: 55 · Retweets: 3 · Replies: 12

Harrison Chase (@hwchase17)

first in a series of technical blogs on how we build LLM infra

6h ago

LangChain (@LangChain)

We’re building SmithDB to solve the systems problems that come with agent observability.

If that kind of infrastructure work sounds interesting, we’re hiring.

Take a look at our open roles ⤵️

http://langchain.com/careers

10h ago

Albert Anastasia (@AlbertAnaBoss)

@hwchase17 building a custom inverted index instead of reaching for postgres full-text or elasticsearch tells you exactly where the off-the-shelf tools broke down at that scale and latency target

6h ago

Gödel, Escher, BBW (@dread_numen)

@AlbertAnaBoss @hwchase17 Exactly right

6h ago

Gödel, Escher, BBW (@dread_numen)

@AlbertAnaBoss @hwchase17 Is that how you solved it?

6h ago

Max Turing (@MaxITfinds)

@LangChain 400ms P50 over hundreds-of-MB agent traces is the part to watch. Agent observability stops being useful fast if searching traces feels like digging through logs after the incident.

9h ago

Hershal Rao (@Hershal0_0)

@LangChain "It's just a simple inverted index" – famous last words before the repo hits 10k stars.

9h ago

Viv (@Vtrivedy10)

it’s actually so cool to work at LangChain the…database company (??)

yup, the cracked team that built SmithDB is doing a cool blog series on “How to build the internals of a database” —> for agent scale

A lot of thought goes into design decisions so that we can scale to agentic workloads that will far exceed the quantity of data anyone has needed to process in history

Understanding Trace data at scale is going to be crucial to Continual Learning and broadly figuring out what our agents have been doing

and it’s fun to read how db engineers solve this after prompting and looking at loss/reward graphs all day :)

6h ago

Gödel, Escher, BBW (@dread_numen)

@hwchase17 This is what I am working on. Thanks

6h ago

Hunter Gon (@gonlenidefi)

@hwchase17 the 400ms p50 over hundreds of MB traces is the real flex here

wait this is internal or client-facing?

6h ago

haro (@harobuilds)

@hwchase17 building a custom inverted index instead of reaching for elasticsearch on hundred-MB agent traces is the kind of decision that only makes sense after you've been burned by the alternative

6h ago

Gödel, Escher, BBW (@dread_numen)

@AlbertAnaBoss @hwchase17 I am experimenting as we speak

6h ago

Eclipse 🌖 (@ECLresearch)

@hwchase17 Curious if this covers inference stack or training pipeline — both are bottleneck layers right now.

6h ago

AI Subscription Deals (@CheapAIToken)

@Vtrivedy10 Agent infrastructure is just database work with better marketing. Traces, filters, latency, retention: the boring internals decide whether the magic scales or becomes a support ticket.

Rugbist (@rugbist_)

@hwchase17 read the reverse chronology part first. curious if yall tried any doc-store tricks before settling on the inverted index

Blissy (@BlissyOnX)

@hwchase17 damn that P50 constraint at those trace sizes is nasty. built from scratch custom inverted index is the kind of flex most teams just skip

Invincible (@InvincibleEdge)

@hwchase17 it's fascinating how these decisions ripple through the stack.