OpenAI and Thrive just built a self-improving tax agent with up to 97% accuracy.
Tax AI processed 7,000 returns across 30+ accounting firms, saved about one-third of preparation time, reached up to 97% accuracy, and raised throughput by about 50%.
The hard part was not reading W-2s or 1099s, but handling messy K-1s, rental schedules, notes, spreadsheets, prior-year files, and values that must match across documents.
The system records the full trace: source file, extracted field, citation, tax-engine mapping, accountant correction, and final filed value.
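As a rough illustration of what one such trace entry might hold, here is a minimal sketch. The class name, field names, and sample values are all hypothetical; the post lists only the six kinds of information recorded, not how the system stores them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldTrace:
    """One traced tax field, from source document to filed value (hypothetical schema)."""
    source_file: str                      # document the value was read from
    extracted_field: str                  # name of the extracted field
    extracted_value: str                  # value as extracted
    citation: str                         # where in the source the value appears
    engine_mapping: str                   # tax-engine slot the field was mapped to
    accountant_correction: Optional[str]  # None when the accountant accepted the value
    final_filed_value: str                # value that ended up on the return

    @property
    def was_corrected(self) -> bool:
        # A correction exists only when the accountant overrode the extracted value.
        return self.accountant_correction is not None


# Made-up sample entry; file name, field, and mapping are placeholders.
trace = FieldTrace(
    source_file="k1_partner_a.pdf",
    extracted_field="ordinary_business_income",
    extracted_value="12500",
    citation="page 1, box 1",
    engine_mapping="engine.partnership_income",
    accountant_correction="12050",
    final_filed_value="12050",
)
print(trace.was_corrected)
```

Keeping the correction and the filed value next to the extraction and mapping is what lets a later step ask where in the chain a value went wrong.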
Repeated corrections become eval targets, so Codex gets a narrow task with evidence, code, tests, and a pass condition.
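One way to picture that step is to group corrections by a shared pattern and promote only the patterns that recur. The sketch below assumes a hypothetical `Correction` record, a grouping key of document type, field, and mapping, and an arbitrary threshold of three; none of these details come from the post.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Correction:
    """A single accountant correction (hypothetical record)."""
    doc_type: str
    field: str
    engine_mapping: str


def eval_targets(corrections, min_repeats=3):
    """Return correction patterns seen at least `min_repeats` times, most frequent first."""
    counts = Counter((c.doc_type, c.field, c.engine_mapping) for c in corrections)
    return [
        {"pattern": pattern, "occurrences": n}
        for pattern, n in counts.most_common()
        if n >= min_repeats
    ]


# Made-up sample data: one pattern repeats three times, another appears once.
corrections = [
    Correction("rental_schedule", "depreciation", "engine.rental_depreciation"),
    Correction("rental_schedule", "depreciation", "engine.rental_depreciation"),
    Correction("k1", "ordinary_business_income", "engine.partnership_income"),
    Correction("rental_schedule", "depreciation", "engine.rental_depreciation"),
]
targets = eval_targets(corrections)
print(targets)
```

A one-off correction stays out of the list, which keeps the resulting tasks narrow: each target names one pattern that can be paired with its evidence, tests, and a pass condition.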
A wrong tax field can come from many places: bad extraction, weak mapping, unsupported workflow, prior-year carryover, or human judgment.
The clever part was not simply using Codex to write fixes, but building a product environment where repeated practitioner corrections became bounded, testable engineering tasks.
In the rental-property example, the agent could inspect source documents, extraction traces, mapper behavior, expected outputs, and regression tests before proposing a change.