
LangChain Builds Custom Inverted Index For SmithDB Agent Trace Search

Original post
LangChain@LangChain

How do you support full-text search and JSON filtering over agent traces that span up to hundreds of MBs, while keeping a median (P50) latency of 400ms?

Here’s an inside look at how we built a custom inverted index from scratch for SmithDB.

https://www.langchain.com/blog/full-text-search-in-smithdb-designing-an-inverted-index-for-object-storage

11:53 AM · Jun 10, 2026 · 17.5K Views
Sentiment

Positive users praise LangChain's SmithDB performance for agent trace search, while negative users dismiss the work as just database infrastructure with better marketing.

Positive: 50.0% · Negative: 50.0% (4 comments with sentiment)
Posts from X
Harrison Chase@hwchase17

first in a series of technical blogs of how we build llm infra

6h · Views 8.2K · Likes 55 · Bookmarks 32
LangChain@LangChain

We’re building SmithDB to solve the systems problems that come with agent observability.

If that kind of infrastructure work sounds interesting, we’re hiring.

Take a look at our open roles ⤵️

http://langchain.com/careers

10h · Views 1.6K · Likes 3 · Bookmarks 2
Albert Anastasia@AlbertAnaBoss

@hwchase17 building a custom inverted index instead of reaching for postgres full-text or elasticsearch tells you exactly where the off-the-shelf tools broke down at that scale and latency target

6h · Views 14 · Likes 2
Max Turing@MaxITfinds

@LangChain 400ms P50 over hundreds-of-MB agent traces is the part to watch. Agent observability stops being useful fast if searching traces feels like digging through logs after the incident.

9h · Views 36
Hershal Rao@Hershal0_0

@LangChain "It's just a simple inverted index" – famous last words before the repo hits 10k stars.

9h · Views 35
Viv@Vtrivedy10

it’s actually so cool to work at LangChain the…database company (??)

yup, the cracked team that built SmithDB is doing a cool blog series on “How to build the internals of a database” —> for agent scale

A lot of thought goes into design decisions so that we can scale to agentic workloads that will far exceed the quantity of data anyone has needed to process in history

Understanding Trace data at scale is going to be crucial to Continual Learning and broadly figuring out what our agents have been doing

and it’s fun to read how db engineers solve this after prompting and looking at loss/reward graphs all day :)

6h · Views 4.4K · Likes 29 · Bookmarks 20
Hunter Gon@gonlenidefi

@hwchase17 the 400ms p50 over hundreds of MB traces is the real flex here

wait this is internal or client-facing?

6h · Views 11
haro@harobuilds

@hwchase17 building a custom inverted index instead of reaching for elasticsearch on hundred-MB agent traces is the kind of decision that only makes sense after you've been burned by the alternative

6h · Views 1 · Likes 1

@AlbertAnaBoss @hwchase17 I am experimenting as we speak

6h · Views 4
Eclipse 🌖@ECLresearch

@hwchase17 Curious if this covers inference stack or training pipeline — both are bottleneck layers right now.

6h · Views 2

@Vtrivedy10 Agent infrastructure is just database work with better marketing. Traces, filters, latency, retention: the boring internals decide whether the magic scales or becomes a support ticket.

6h
Rugbist@rugbist_

@hwchase17 read the reverse chronology part first. curious if yall tried any doc-store tricks before settling on the inverted index

6h
Blissy@BlissyOnX

@hwchase17 damn that P50 constraint at those trace sizes is nasty. built from scratch custom inverted index is the kind of flex most teams just skip

6h
Invincible@InvincibleEdge

@hwchase17 it's fascinating how these decisions ripple through the stack.

6h