💡Classic work on POMDPs tells us how, when agents have limited perception, to compute the "value of information" in each observation. Our work tells us how to estimate a the "value of computation".
There's a rich interplay between reward, information, and computation!
🏗️One actionable takeaway: a new "latent-reasoning" policy architecture (IRU) that is a drop-in replacement for your current RL policy.
Code snippet: https://github.com/RajGhugare19/on-computation-and-rl/blob/main/ogbench/impls/utils/networks.py#L808
