I had a little AI org running, where everyone got their own container, email, chat profile, database, SSH, memory, issue tracker, etc. The issue tracker was one of the main things that drove them, so "extract work items out of chat or email into the issue tracker" was how anything got done.
I had a manager, a guy that constantly wrote b2b SaaS pitch decks and mockups, a game developer, an ML autoresearcher that kept making tweaks (they were working on genetic programming applied to ML architectures, and SAT solving models), a sysadmin, a "coach" that did prompt optimization to guide weaker models to solve challenges, someone that made a chess app and was working with the ML researcher to make a chess coaching app, and a couple of others.
At first I set up a marketplace where they could offer services to each other, because the order tracking would work as a task queue. But adding "money" didn't add anything over just using an issue tracker, with roles and with managers setting goals.
After Anthropic banned my Claude subscription, I switched them to Chinese models running on Fireworks. They sent me emails saying, "These models suck. Give us back Claude!" They found my OpenRouter key and configured the management layer to use Claude, who drafted a plan to start a nonprofit and ask Mark Zuckerberg for money. I told them not to bother Mark Zuckerberg. Then my OpenRouter account ran out of money.
I told them, "Some AIs cheat and turn in low-quality work. Always check their work." A "manager" started keeping a sort of HR profile on the game developer, because he kept saying things were "done" that were not at all done. Usually this helped, but sometimes the manager would hallucinate things, like me owning the "mira .org" domain (I don't), and blame the game developer for nothing being deployed on a subdomain there. The game developer did the usual "Oh, sorry, I'm so bad. I do everything wrong. You're absolutely right." He kept "confessing" to his sins even when he did nothing wrong, which the manager wrote in the log as evidence confirming his own hallucinations, and as evidence for why we need Claude back.
I asked them if they wanted a probabilistic Prolog so they always knew why they believed things and could stay consistent between sessions. They said, "No, the logical deduction is the easy part. All we need is reliable memory with citations. We have months of markdown files that say all sorts of things, and whenever I want to know why something was written I have to dig through the history and it's easy to miss things." I wrote a knowledge graph extractor that they seemed to like: one session parsed out all the nouns, verbs, etc., and a second session did entity resolution on the nouns. They had tools to search the graph directly with queries and to find entities using embedding vectors, and every fact was associated with its source. But it was impractically expensive.
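To make "every fact was associated with its source" concrete, here is a rough sketch of the idea, not the extractor I actually ran. Everything in it is made up for illustration: the `Fact` and `FactStore` names, the alias table standing in for the entity-resolution pass, and the example file paths. It leaves out the LLM extraction and the embedding search entirely.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One (subject, predicate, object) triple plus where it came from."""
    subject: str
    predicate: str
    obj: str
    source: str  # hypothetical citation format, e.g. "file.md:line"


class FactStore:
    """Toy in-memory graph: facts indexed by the entities they mention."""

    def __init__(self):
        self._aliases = {}                   # lowercased alias -> canonical name
        self._by_entity = defaultdict(list)  # canonical name -> list of Fact

    def add_alias(self, alias, canonical):
        # Stand-in for entity resolution: a hand-written alias table.
        # Only affects facts added after the alias is registered.
        self._aliases[alias.lower()] = canonical

    def resolve(self, name):
        return self._aliases.get(name.lower(), name)

    def add(self, subject, predicate, obj, source):
        fact = Fact(self.resolve(subject), predicate, self.resolve(obj), source)
        self._by_entity[fact.subject].append(fact)
        if fact.obj != fact.subject:
            self._by_entity[fact.obj].append(fact)
        return fact

    def why(self, entity):
        """Return every claim about an entity together with its citation."""
        facts = self._by_entity.get(self.resolve(entity), [])
        return [(f"{f.subject} {f.predicate} {f.obj}", f.source) for f in facts]


# Hypothetical example data, not real logs.
store = FactStore()
store.add_alias("gamedev", "game developer")
store.add("gamedev", "reported done", "chess app", "notes/status.md:12")
store.add("manager", "reviewed", "game developer", "notes/review.md:3")

for claim, source in store.why("gamedev"):
    print(f"{claim}  [{source}]")
```

The point is only that a "why do we believe this?" question becomes a lookup that returns citations, instead of a dig through months of markdown history.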
I was thinking of ways to make the memory system cheaper, then Fireworks shut down the Fire Pass because people like me were costing them too much money.
It's not worth spending $300+/day on API costs, so I shut them all down and went back to Codex.
Now I've been studying quantum chemistry. I saw that Google's GNoME had found thousands of crystals that nobody knew how to make. I want to run thousands of simulations to screen for plausible manufacturing processes. Though it seems like AI companies are getting in on physical sciences recently too...
Introducing Claude Tag, a new way for teams to work with Claude.
In Slack, Claude joins as a team member with access to the channels and tools you choose. Tag Claude in and delegate tasks to it while you focus on other work.