WallStreetPrep ran a practical AI benchmarking exercise for real-world finance.
It tested financial modeling agents on a real analyst assignment, not a toy prompt with a neat answer key.
The assignment: build Apple’s historical and forecast financial statements, cite sources, link assumptions, add schedules, and make the workbook auditable.
Primer, an AI financial modeling tool, came out ahead in this test, but the more useful point is why: its output looked less like a spreadsheet patched together cell by cell and more like a connected financial system that could be audited.
Primer treats Excel as the final output format, not the agent’s working language, so the AI can build a stronger 3-statement financial model first and then convert it into an auditable spreadsheet.
Primer represents the workbook as structured records such as revenue, cost of sales, cash, debt, assumptions, formulas, source links, comments, and dependency checks.
That means the AI can query and validate the finance logic directly, for example “show me every formula feeding cash flow” or “find balance sheet plugs,” instead of visually navigating Excel and editing fragile cell references one by one.
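To make the idea concrete, here is a minimal Python sketch of what querying structured records could look like. It is not Primer's actual schema or API, which the benchmark write-up does not show. The `LineItem` record, its field names, the sample figures and sources, and the narrow definition of a "plug" (a balance sheet value with neither a formula nor a source) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """Hypothetical record for one line of a 3-statement model."""
    name: str
    statement: str      # "assumption", "income", "cash_flow", or "balance"
    inputs: tuple = ()  # names of the records this formula reads
    formula: str = ""   # empty string means the value is hard-coded
    source: str = ""    # citation for a hard-coded value


# Illustrative records only; names and sources are made up.
ITEMS = [
    LineItem("revenue_growth", "assumption", source="analyst assumption (hypothetical)"),
    LineItem("cost_ratio", "assumption", source="analyst assumption (hypothetical)"),
    LineItem("prior_revenue", "income", source="annual report (hypothetical reference)"),
    LineItem("opening_cash", "balance", source="annual report (hypothetical reference)"),
    LineItem("revenue", "income", ("prior_revenue", "revenue_growth"),
             "prior_revenue * (1 + revenue_growth)"),
    LineItem("cost_of_sales", "income", ("revenue", "cost_ratio"),
             "revenue * cost_ratio"),
    LineItem("net_income", "income", ("revenue", "cost_of_sales"),
             "revenue - cost_of_sales"),
    LineItem("operating_cash_flow", "cash_flow", ("net_income",), "net_income"),
    LineItem("cash", "balance", ("opening_cash", "operating_cash_flow"),
             "opening_cash + operating_cash_flow"),
    LineItem("other_assets", "balance"),  # hard-coded, no formula, no source
]
MODEL = {item.name: item for item in ITEMS}


def formulas_feeding(target, model):
    """Names of every formula that directly or indirectly feeds `target`."""
    seen = set()
    stack = list(model[target].inputs)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            stack.extend(model[name].inputs)
    # Keep only computed lines; hard-coded inputs are not formulas.
    return sorted(n for n in seen if model[n].formula)


def find_plugs(model):
    """Balance sheet lines with no formula and no cited source (assumed definition)."""
    return sorted(
        item.name
        for item in model.values()
        if item.statement == "balance" and not item.formula and not item.source
    )


print(formulas_feeding("operating_cash_flow", MODEL))
# ['cost_of_sales', 'net_income', 'revenue']
print(find_plugs(MODEL))
# ['other_assets']
```

Both questions become a short traversal or filter over named records. Nothing in the sketch depends on where a value ends up in the spreadsheet, so the checks would keep working even if rows and columns move during export.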
I am seeing the same pattern in many areas: professional AI agents will be judged less by chat quality and more by whether their artifacts survive audit.
