I spent today turning a blank Hermes install into a sovereign local agent on my RTX 5090. always-on, reachable over telegram, fully local. here's everything it can do.
it runs Qwen3.6-27B on the 5090 and answers from my phone over telegram.
~~~ I first made it faster. switched to the MTP build for self-speculative decoding: 62 > 115 tokens/sec.
~~~ it now benchmarks models on command. I ask it to speed-test a gguf and it stops its own model server to free the gpu, runs llama-bench, renders a card, sends it to my phone, then restarts itself. it's never left offline.
~~~ it also knows me very well. I pointed it to my private HF repo where I store all my AI traces since I started using it. I created a local markdown memory vault, seeded from my own history, with semantic search over it.
~~~ the whole stack stays home: weights on disk, inference on the 5090, the agent in my pocket.
here is the video it created to showcase everything it has now.