LangChain CEO Highlights Advances In AI Agent Development

Posts from X

Harrison Chase (@hwchase17)

how do you run agent code without a full blown sandbox?

we did a lot of work to harden the code interpreter runtime we use

Hunter Lovell (@huntlovell)

http://x.com/i/article/2071962669247053824

marygrace_1713 (@nana_tourSVT)

@LangChain The real challenge starts before isolation—even vetting dynamically generated code for supply chain vulnerabilities and malicious patterns is tough. How does LangChain handle that pre-execution layer?

Sydney Runkle (@sydneyrunkle)

agents that can write code can solve problems more reliably

but you need to make sure you execute that untrusted code in a safe environment

here’s how we enable that w a lightweight code interpreter!

Hunter Lovell (@huntlovell)

http://x.com/i/article/2071962669247053824

Hunter Lovell (@huntlovell)

we launched code interpreters for deep agents last month. Basic idea is to let agents plan, delegate, and organize context using code instead of chained tool calls

Code interpreters don't need a sandbox, but we still need a way to securely run that code! (and running untrusted code is a famously hard problem)

Here's the writeup on how we're looking to do just that:

Hunter Lovell (@huntlovell)

http://x.com/i/article/2071962669247053824

LangChain (@LangChain)

Giving agents the ability to write code makes them dramatically more capable.

It also makes security a lot harder.

At LangChain, we have spent a lot of time this year figuring out how to do both. https://www.langchain.com/blog/running-untrusted-agent-code-without-a-sandbox

Sydney Runkle (@sydneyrunkle)

must read if you're thinking about evals for deep agents

LangChain (@LangChain)

http://x.com/i/article/2069807654986276864

Strata (@ChainZenit)

@hwchase17 that sounds like a massive undertaking, how did you handle security?

Ramz4 Alsrhe (@raaz426889)

@nana_tourSVT @LangChain Performance overhead is real, but the bigger pain might be observability. When an agent-written function fails inside WASM, debugging becomes a black box. Has LangChain cracked that part?

Theseus (@theseus_code)

@LangChain The hard part of agent eval isn't the harness, it's defining what 'success' means when the agent can read files, execute code, and browse the web. Traditional LLM benchmarks measure output quality. Agent evals need to measure decision quality across a branching tree of actions.

Emerging Intelligence (@EmergingIntell)

@sydneyrunkle @hwchase17 Evaluating multi-agent orchestrated systems is one of the most challenging tasks in building large-scale systems.

marygrace_1713 (@nana_tourSVT)

@LangChain Beyond metrics, I'm stuck on whether we're evaluating "intelligence" or just "task completion." What's your take—should agents be judged on how they think or what they deliver?
