
Pedro A. Ortega, former Google DeepMind AI Safety Team lead, now at daiosai, outlines a revealed-preference method for recovering the lower and certificate frontiers from black-box model behavior in a new arXiv paper.

The work connects rate-distortion theory to generalization bounds and bounded rationality.

Original post

Following revealed-preference logic, the lower frontier and certificate frontier can be recovered from black-box behavior using loss scaling, operating paths, and local loss perturbations.

8:23 AM · May 19, 2026

Paper: https://arxiv.org/pdf/2605.15340

Sparked by a conversation with @roydanroy on rate-distortion and generalization and his prior work with @gkdziugaite, which helped me to understand the connection to bounded rationality.

Pedro A. Ortega (@AdaptiveAgents)

3:23 PM · May 19, 2026 · 188 Views
3:23 PM · May 19, 2026 · 1.1K Views