Today I'm launching a new project called SynthTraces 🔥
It is a minimal codebase to generate synthetic coding agent session traces using Pi (from @badlogicgames)
I wanted a large number of coding-agent traces, so I built a tiny harness where two models talk to each other:
- an open model (served via HF Inference Providers) plays the coding agent. It gets read + bash access to a real open source codebase (the huggingface OSS projects)
- a small local model (llama.cpp) plays the human user, asking simple questions like "how do I run this?" or "how is CI set up?"
The result is more than 2,000 Pi session traces which can be used to train or fine-tune LLMs, and optimize them for Pi 🤯
And ofc everything is published on @huggingface ✅