Tech · 45d ago

Databricks AI Research forms team for AI agent evaluation

Julia Neagu leads Databricks AI Research in assembling a team to measure and improve AI agents operating on enterprise data at scale. The initiative focuses on converting evaluation results into better performance across development, training, and production deployment. Work targets complex analytical tasks in domains such as biotech and finance.

Original post

Jonathan Frankle#118

Julia Neagu@julianeagu

I'm building a new team at @databricks AI Research and we're hiring.

We're focused on one of the hardest open problems in AI right now: how do you measure and continuously improve agents that operate on enterprise data at scale. We're looking for founding engineers to build the flywheel that turns evaluation results directly into better agents — from development and training all the way to production.

If you want to work on problems that actually matter at the frontier of AI research, I'd love to talk.

Link in comments 👇

12:38 PM · May 15, 2026 · 157.9K Views

Sentiment

Positive users see Databricks building a team to evaluate enterprise AI agents as tackling a great problem worth solving, while negative users worry the work may amount to benchmark theater that fails real analytical workflows.

Positive: 50.0%

Negative: 50.0%

4 comments with sentiment.

Posts from X

Views: 12.2K · Bookmarks: 21 · Likes: 68 · Replies: 4

Matei Zaharia@matei_zaharia

This is a fantastic team. Check it out if you want to help build agents and models that reliably answer the most challenging analytical questions in biotech, finance, etc. for our customers.

Andrew Drozdov@mrdrozdov

An incredibly exciting opportunity. Come work with us! Lots of fun and impactful problems to work on.

Julia Neagu@julianeagu

Apply here: https://www.databricks.com/company/careers/engineering---pipeline/staff-software-engineer---agent-quality-8532681002

Thomas Tao@Thomas_Tao_1

@julianeagu @databricks This problem is where a lot of agent demos fall apart. Static evals look fine, then the data shifts or permissions get weird. I keep thinking enterprise agent quality is mostly a continuous eval problem.

Tidianez@Tidianez

This resonated with me.

I’ve been working on agent measurement and production improvement from two angles.

First, I created AnthroMetrics AI and the HEWU framework, focused on measuring AI/agent output as human-equivalent work, productivity, and enterprise value.

HEWU research paper: https://doi.org/10.5281/zenodo.19353678

AnthroMetrics demo: https://anthrometrics-ai.preview.emergentagent.com

Second, I’m building SafeRun AI around the production-control loop:

Replay → Understand → Create Rule → Prevent

My view is that the agent-improvement flywheel needs measurement + replay: measure what agents produce, replay what they actually did, understand failures, create rules, and improve/control the system in production.

Your team’s focus on measuring and continuously improving agents on enterprise data at scale is exactly the kind of frontier problem I’ve been thinking about deeply.

I’d love to connect and explore whether I could contribute to what you’re building at Databricks AI Research.

SafeRun: https://saferun.dev

John Martin@dialectforge

I’m working on an AI, not an LLM. Doesn’t use a transformer, doesn’t need retraining, and the evaluation problem you’re describing doesn’t really exist in it because every output already shows you exactly which knowledge atoms fired. Different architecture entirely. Happy to share more if you’re curious.

tralallama@Tralallama

@julianeagu Remote friendly or in person?

Julia Neagu@julianeagu

@craig_certo @databricks yes!

Teddy Albina@talb_bluecurve

@julianeagu @databricks I'm building StellarFS, an experiment in building a compiled semantic layer: https://github.com/BlueCurveCorp/StellarFS/tree/dropzone

Osman R.@UsmanReads

@julianeagu @databricks Hi Julia, just a day ago I did some research in a similar space and published my findings in this article. I will definitely apply through the careers link, but I am also open to a conversation here in case you find the work below interesting.

Alexander Johansen@AlexRoseJo

@julianeagu @jefrankle @databricks How important is agentic memory management to you? We built a new memory system that we’re presenting at ACM CAIS. We would love to discuss memory optimization, interpretability, and on-GPU databases: https://arxiv.org/abs/2605.06997

giyu_codes@giyu_codes

@julianeagu @databricks It's easy, you just look at the data.

Jokes aside, it's a good problem to tackle.

NOIR_BD@KanishkNoir

@julianeagu @databricks Oh, I use Genie almost daily for my SLM fine-tuning, and I work rigorously on optimizing multi-agent workflows as well. I'm particularly interested in the memory architecture of AI assistants and stateful agentic workflows.

PS: I don't have 6 years of experience, though. I'm in the 1-3 year range.

Barath Velmurugan@barathvelmu

@julianeagu @databricks Hi Julia, I'm interested in the team and would love to learn more! Just sent a DM :)

Mog@GlockyMog

@julianeagu @databricks MIT engineer here. Interested!

Julia Neagu@julianeagu

@Tralallama in person!

saurabh singh rajput@SaurabhSin15850

@julianeagu @databricks Hi Julia,

I’m Saurabh Singh, a second-year student and an aspiring AI/full-stack developer. I’m still a fresher, but I’m deeply interested in AI systems, agents, and scalable engineering.

I’d love to connect and learn more about the team

Ajjay@Ajjay763856

@julianeagu @databricks Applied to this link, never received a response.

Farhan H@FarhanSoftware

@julianeagu @databricks Would you sponsor a visa or support a remote role?
