🚨 New paper: AI evaluation is structurally unsuitable for continual learning (CL). To address this, evaluation should be centred on the "behavioural trajectories" that CL systems develop, with the goals of characterising possible behaviours and forecasting their evolution. 🧵
Lorenzo Pacchiardi leads a paper arguing that conventional AI evaluation methods based on static models are structurally inadequate for continual learning systems, and proposes recentering evaluation on behavioral trajectories.
Reposted by Gavin Leech and discussed among AI safety accounts.
Delighted to have coauthored this paper as part of a great team led by @LPacchiardi. What happens if we get continual learning to actually work in frontier AI models? Much of our current governance is based on periodic evaluation of static models. Such governance will break. We propose a direction for addressing this.
Yes, it's uncertain whether continual learning in frontier AI will be achieved, even if company leaders like Amodei are confident. But the evaluation and governance communities are struggling to keep up with the pace of change; we need to change that and start planning not just for what's here now, but what the research community is targeting as goals. Skate to where it looks like the puck is going, not where it is now.
(When I'm back in the autumn, I might work with the team on a more governance-focused companion piece).
Joint work with @prpaskov @S_OhEigeartaigh @NandoMartinezP @katie_m_collins @FazlBarez, Jonathan Prunty, Matteo Mecattaf, @zfountas @RistoUuk @sanmikoyejo @CUdudec, José Hernández-Orallo
Paper website→https://cl-eval.github.io/
Pointers to related work & questions welcome🙏

What is "continual learning"? Three levels, by what changes:
🔹 CL1 (in-context): info accumulates within a session
🔹 CL2 (storage-based): persistent memory, RAG, agent skills
🔹 CL3 (parameter-based): weights change post-deployment
CL1 & CL2 are already here. CL3 is coming.

CL failures over long interaction sequences:
• alignment and safety guardrails erode
• propensity cross-contamination (e.g., OpenAI's "goblins")
• unbalanced capability specialisation
• cross-domain capability transfer
• capability degradation

Pre-deployment trajectory sandbox + live predictive monitoring is a feasible alternative to continuously re-evaluating evolving systems.
They are effectively layered with input/output filters, transparent evolution methods, and broad indicators of CL systems' impacts on society.

How does current evaluation fall short?
By relying on pre-release benchmarks and red-teaming, it assumes systems don't change after deployment. This ignores the trajectory the system develops once deployed, leaving us with an incomplete understanding of its behaviour.

But 2 obstacles from dynamical systems may bite:
🌪️ Chaotic sensitivity: small state/input changes diverge→forecasts fail beyond a horizon.
🌀 Multiplicity of attractors: sandboxes cover only a subset of reachable basins.
Whether they affect CL systems is an empirical question.
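
To make "forecasts fail beyond a horizon" concrete, here is a minimal sketch using the logistic map as a stand-in for a CL system's state update. The map, the function names, and the tolerance are illustrative assumptions, not anything taken from the paper. It counts how many steps two nearly identical starting states take to drift visibly apart.

```python
def logistic_step(x, r=4.0):
    """One update of the logistic map (a toy stand-in for a state update)."""
    return r * x * (1.0 - x)


def divergence_horizon(x0, r=4.0, eps=1e-9, tol=0.1, max_steps=200):
    """Steps until two trajectories starting eps apart differ by more than tol.

    Returns None if they never separate within max_steps.
    """
    a, b = x0, x0 + eps
    for t in range(1, max_steps + 1):
        a, b = logistic_step(a, r), logistic_step(b, r)
        if abs(a - b) > tol:
            return t
    return None


# Chaotic regime (r=4): the tiny initial gap grows until forecasts are useless.
chaotic_horizon = divergence_horizon(0.2, r=4.0)
# Contractive regime (r=2.5): both trajectories settle on the same fixed point.
stable_horizon = divergence_horizon(0.2, r=2.5)
print(chaotic_horizon, stable_horizon)
```

In the chaotic setting the horizon is finite and short. In the contractive setting the function returns None, which is the regime where long-range forecasting stays meaningful.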

Our way forward:
• Start trajectory evaluation on today's CL systems: learn where chaos & multi-attractor regimes bite
• Co-design CL methods *amenable* to evaluation (contractive updates, intrinsic objectives, gated adaptation, circuit-breakers)
=> virtuous co-evolution
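
As a rough illustration of what "amenable to evaluation" could look like, the sketch below combines a gated (step-bounded) update with a circuit-breaker that halts learning once the state drifts too far from a reference. The scalar state, `gated_update`, `max_step`, and `drift_budget` are all hypothetical names chosen for this example, not mechanisms specified in the paper.

```python
def gated_update(state, proposed, reference, max_step=0.05, drift_budget=0.5):
    """Apply a bounded update; refuse it if drift from the reference is too large.

    Returns (new_state, learning_allowed).
    """
    # Gate: clip the requested change so no single update moves the state far.
    delta = max(-max_step, min(max_step, proposed - state))
    candidate = state + delta
    # Circuit-breaker: reject the update and signal a halt if the budget is exceeded.
    if abs(candidate - reference) > drift_budget:
        return state, False
    return candidate, True


# A stream of inputs that keeps pushing the state in one direction.
state, reference, allowed, steps = 0.0, 0.0, True, 0
while allowed and steps < 100:
    state, allowed = gated_update(state, proposed=10.0, reference=reference)
    steps += 1
print(state, allowed, steps)
```

Because every accepted state stays within `drift_budget` of the reference, an evaluator only has to characterise behaviour inside that bounded region rather than over everything the system could otherwise reach.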

Instead of evaluating the released checkpoint, evaluators of CL systems should ask two questions:
🗺️ Landscape characterisation: what behaviours can the system reach & with what probability?
🔮 Trajectory forecasting: how will a deployed instance evolve from its current state?
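
The two questions can be sketched on a toy system with two attractors. Everything here is an assumption made for illustration: the double-well dynamics, the behaviour labels, and the noise level. Landscape characterisation estimates how often random starts end in each behaviour, and trajectory forecasting repeats the same simulation from one instance's current state.

```python
import random
from collections import Counter


def step(x, noise, lr=0.1):
    """Toy noisy update with two stable states near -1 and +1."""
    return x + lr * (x - x**3) + noise


def run_trajectory(x0, n_steps, rng, noise_scale=0.05):
    x = x0
    for _ in range(n_steps):
        x = step(x, rng.gauss(0.0, noise_scale))
    return x


def label(x):
    """Map the final state to a coarse behaviour label (hypothetical names)."""
    return "behaviour_A" if x < 0 else "behaviour_B"


def estimate(start_sampler, n_runs, n_steps, seed):
    """Monte Carlo estimate of the probability of each final behaviour."""
    rng = random.Random(seed)
    counts = Counter(
        label(run_trajectory(start_sampler(rng), n_steps, rng))
        for _ in range(n_runs)
    )
    return {k: v / n_runs for k, v in counts.items()}


def characterise_landscape(n_runs=2000, n_steps=200, seed=0):
    """Landscape: which behaviours are reachable from a spread of starts?"""
    return estimate(lambda rng: rng.uniform(-0.5, 0.5), n_runs, n_steps, seed)


def forecast(x_now, n_runs=500, n_steps=200, seed=1):
    """Forecast: where will one deployed instance go from its current state?"""
    return estimate(lambda rng: x_now, n_runs, n_steps, seed)


print(characterise_landscape())
print(forecast(0.8))
```

The landscape estimate spreads probability over both behaviours, while the forecast for an instance already deep inside one basin concentrates on a single behaviour.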

How to operationalise this?
1️⃣ Trajectory elicitation sandboxes: controlled interactions, freezing learning and benchmarking at intervals to chart the evolving behaviour.
2️⃣ Predictive monitors: forecast future behaviour from current state + upcoming inputs.
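
A minimal sketch of how the two components might fit together, on a toy learner that updates on every prompt it sees. `ToyCLSystem`, `benchmark`, `elicit_trajectory`, and `forecast_next` are hypothetical names, and the linear extrapolation is just the simplest possible monitor, not the paper's method. Learning is frozen during each benchmark so that measuring the system does not change it.

```python
from dataclasses import dataclass


@dataclass
class ToyCLSystem:
    skill: float = 0.0
    learning_enabled: bool = True

    def respond(self, prompt: float) -> float:
        out = self.skill * prompt
        if self.learning_enabled:
            # The toy system learns from every prompt it processes.
            self.skill += 0.1 * (prompt - self.skill)
        return out


def benchmark(system, questions):
    """Score in [0, 1]: 1 minus the mean error against the target answer q."""
    errors = [abs(system.respond(q) - q) for q in questions]
    return 1.0 - sum(errors) / len(errors)


def elicit_trajectory(system, inputs, questions, every=10):
    """Feed inputs; at intervals, freeze learning, benchmark, then resume."""
    checkpoints = []
    for t, prompt in enumerate(inputs, start=1):
        system.respond(prompt)
        if t % every == 0:
            system.learning_enabled = False  # freeze so the test does not teach
            checkpoints.append((t, benchmark(system, questions)))
            system.learning_enabled = True
    return checkpoints


def forecast_next(checkpoints):
    """Naive monitor: extrapolate the last two checkpoints one interval ahead."""
    (t0, s0), (t1, s1) = checkpoints[-2], checkpoints[-1]
    slope = (s1 - s0) / (t1 - t0)
    return s1 + slope * (t1 - t0)


system = ToyCLSystem()
trajectory = elicit_trajectory(system, [1.0] * 50, questions=[0.5, 1.0])
print(trajectory)
print(forecast_next(trajectory))
```

Without the freeze, the benchmark prompts themselves would pull `skill` towards 0.5 and contaminate the trajectory being charted.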
@S_OhEigeartaigh @LPacchiardi great to see more work on this idea!