@DimitrisPapail @Samhanknr wouldn't the "tokens_turn_3" activations also include "pointers" to previous locations, which would get messed up when masking?
what do you mean? the point is the following: if i have <tokens_turn_1> <tokens_turn_2>, mask turn one, <tokens_turn_3>, the KVs of the tokens in turn 3 still carry non-textual information about turn 1.
now if you do <tokens_turn_1> <tokens_turn_2> <tokens_turn_3>, idle session, then recompute KV states on <tokens_turn_2> <tokens_turn_3>, you lose everything from turn 1 that's not captured by the TOKENS of turns 2 and 3.
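
to make that concrete, here's a rough toy sketch (not any real model or serving stack): two causal self-attention layers in NumPy with random weights, no positional encoding, no MLP or layer norm. all names (`forward`, `layers`, `turn1`, ...) and sizes are made up for illustration. it compares the KVs of turns 2 and 3 from the full run against KVs recomputed on turns 2 and 3 alone.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size, arbitrary for this toy


def make_layer():
    # Hypothetical random projections; a real model would have trained weights.
    return {name: rng.normal(size=(d, d)) / np.sqrt(d) for name in ("Wq", "Wk", "Wv")}


layers = [make_layer(), make_layer()]


def forward(x):
    """Run the causal attention layers and return the per-layer (K, V) caches."""
    cache = []
    n = x.shape[0]
    causal = np.tril(np.ones((n, n), dtype=bool))  # token i attends to tokens <= i
    for p in layers:
        q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
        cache.append((k, v))
        scores = np.where(causal, q @ k.T / np.sqrt(d), -np.inf)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        x = x + w @ v  # residual update mixes in information from earlier tokens
    return cache


# Stand-ins for the embedded tokens of each turn (random, for illustration only).
turn1 = rng.normal(size=(3, d))
turn2 = rng.normal(size=(2, d))
turn3 = rng.normal(size=(2, d))
n1 = turn1.shape[0]

full = forward(np.concatenate([turn1, turn2, turn3]))  # original session
recomputed = forward(np.concatenate([turn2, turn3]))   # recompute without turn 1

# Layer 1 keys depend only on each token's own embedding, so they match.
layer1_same = np.allclose(full[0][0][n1:], recomputed[0][0])
# Layer 2 keys are built from hidden states that already attended to turn 1.
layer2_same = np.allclose(full[1][0][n1:], recomputed[1][0])

print("layer 1 keys match:", layer1_same)
print("layer 2 keys match:", layer2_same)
```

in this toy the first-layer keys match but the second-layer keys don't: the cached KVs of turns 2 and 3 had already absorbed turn 1 through attention, and recomputing from the tokens alone drops that. with positional encodings the positions would shift too, which this sketch ignores.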

