One of the hardest aspects of agentic RL is managing / scaling environments...
🧵 [1/6]
Agents are given access to a set of tools, and these tools mediate how the LLM interacts with its external environment. Notably, the environment is stateful, so tool calls can change the environment's state.
Arbitrary environment dynamics can be encoded in the tool-calling logic. The agent infers what is happening in the environment as a result of its actions / tool calls only from observations (i.e., tool outputs). These observations (e.g., error logs, file information, failed tests) are just a lossy representation of the environment's actual container state.
🧵 [2/6]
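To make the tool-mediated, stateful loop concrete, here is a minimal Python sketch. It assumes a toy in-memory "container", and the names `SandboxEnv`, `write_file`, `list_files`, `run_tests`, `call_tool`, and the pass/fail rule are all hypothetical illustrations, not part of any real agent framework. Tools are the only way to touch the state. `write_file` changes that state, and every tool returns a string observation that exposes only part of the hidden state (sizes but not contents, failing file names but not why they fail).

```python
from dataclasses import dataclass, field


@dataclass
class SandboxEnv:
    """Hypothetical stand-in for a stateful container environment."""

    # Full (hidden) environment state: file path -> file contents.
    files: dict[str, str] = field(default_factory=dict)

    # --- Tools: the only interface the agent has to the environment. ---

    def write_file(self, path: str, content: str) -> str:
        # State-changing tool: mutates the environment, returns a short observation.
        self.files[path] = content
        return f"wrote {len(content)} bytes to {path}"

    def list_files(self) -> str:
        # Read-only tool: reports names and sizes but NOT contents (a lossy view).
        return "\n".join(f"{p} ({len(c)} bytes)" for p, c in sorted(self.files.items()))

    def run_tests(self) -> str:
        # Hypothetical toy "test suite": every .py file must define `main`.
        # Only the failing file names leak into the observation.
        failing = [
            p for p in sorted(self.files)
            if p.endswith(".py") and "def main" not in self.files[p]
        ]
        return "all tests passed" if not failing else "FAILED: " + ", ".join(failing)

    def call_tool(self, name: str, **kwargs) -> str:
        # Dispatch an LLM-issued tool call by name; unknown tools become error observations.
        tools = {
            "write_file": self.write_file,
            "list_files": self.list_files,
            "run_tests": self.run_tests,
        }
        tool = tools.get(name)
        if tool is None:
            return f"error: unknown tool {name!r}"
        return tool(**kwargs)


# Example episode: the agent only ever sees the returned strings.
env = SandboxEnv()
print(env.call_tool("write_file", path="app.py", content="print('hi')\n"))  # wrote 12 bytes to app.py
print(env.call_tool("run_tests"))   # FAILED: app.py
print(env.call_tool("list_files"))  # app.py (12 bytes)
```

The design choice worth noting is that `files` is never returned wholesale: from the agent's side, the environment is only ever known through the observation strings, which is the lossy-observation point made in [2/6].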