@jeffreyhuber compressing many, many trajectories into O(100K) tokens is always gonna be lossy. tokens are a very expensive form of memory, in that a small number of bits gets expanded into a large memory footprint (the KV cache) via a static transformation, vs. model weights themselves, where params ~= bits
@willccbb sure, but that could just be bad compaction / memory
assume perfect context - what’s the limit?
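
To put rough numbers on the "small number of bits gets expanded into a large KV" point from the first message, here is a back-of-the-envelope sketch. Every configuration value (vocabulary size, layer count, KV heads, head dimension, fp16 storage) is a hypothetical assumption chosen for illustration, not a figure from the thread or from any specific model. It treats log2(vocab size) as an upper bound on the information a single token can carry.

```python
import math


def token_info_bits(vocab_size: int) -> float:
    """Upper bound on bits carried by one token: log2 of the vocabulary size."""
    return math.log2(vocab_size)


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_value: int) -> int:
    """KV-cache bytes for one token: one key and one value vector per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


# Hypothetical model configuration (assumptions, not from the thread).
VOCAB_SIZE = 128_000
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
BYTES_PER_VALUE = 2        # fp16
CONTEXT_TOKENS = 100_000   # the O(100K) context mentioned above

bits_in = token_info_bits(VOCAB_SIZE)
kv_bytes = kv_bytes_per_token(N_LAYERS, N_KV_HEADS, HEAD_DIM, BYTES_PER_VALUE)
expansion = kv_bytes * 8 / bits_in  # KV bits stored per bit of token information

print(f"bits per token (upper bound): {bits_in:.1f}")
print(f"KV bytes per token:           {kv_bytes}")
print(f"expansion factor:             {expansion:,.0f}x")
print(f"KV cache for full context:    {kv_bytes * CONTEXT_TOKENS / 2**30:.1f} GiB")
print(f"raw token information:        {bits_in * CONTEXT_TOKENS / 8 / 1e3:.0f} kB")
```

Under these assumed numbers, each token carries at most about 17 bits but occupies 128 KiB of KV cache, which is the asymmetry being described. For weights the ratio of stored bits to parameters is fixed by the numeric precision instead.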
