Converts an input image into 3D meshes, Gaussian splats, and audio files using Claude skills and external AI APIs.
Archives experiments in which AI agents autonomously tune optimizers, schedules, and hyperparameters to reach a target validation loss in the fewest steps on a small LM benchmark.
Integrates Ghostty into a macOS terminal with vertical tabs, sidebar metadata, OSC notifications, and an in-app browser for AI coding agents.
Provides an AI agent toolkit with a coding CLI, a unified LLM API, TUI and web UI libraries, and Slack bot support.
Automates articulated 3D asset creation by prompting LLMs to generate executable Python code that defines parts, geometry, and joints.
Constructs capability calibration datasets via repeated sampling and implements LLM confidence-estimation methods such as verbalized confidence and response consistency.
Implements a lightweight training runtime, primitives, and Qwen MoE model layers for Megatron.
Evaluates language models on reconstructing complete codebases from compiled binaries and documentation.
Benchmarks language agents on long-horizon tasks using 600+ tools in Docker-isolated software environments.
Benchmarks autonomous agents on recovering historical human speedups in GPT-2 pretraining via LLM-judged submissions and retimed runs.