Freeform preferences let the supervisor define relevant axes and then specify preferences along those axes.
Axes can either come from a fixed rubric or be described in freeform language.
This eliminates ambiguity, allows thorough coverage of all axes, and provides denser supervision.
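To make the idea concrete, here is a minimal sketch of how a single freeform preference label over two candidates might be represented. It is an illustration under stated assumptions, not the authors' implementation: the class names `AxisPreference` and `FreeformPreference`, the candidate labels "A" and "B", and the example axes are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

VALID_CHOICES = ("A", "B", "tie")  # hypothetical labels for two candidates


@dataclass
class AxisPreference:
    axis: str       # axis name, from a fixed rubric or freeform language
    preferred: str  # which candidate is preferred on this axis


@dataclass
class FreeformPreference:
    """One supervisor label: a preference per axis instead of a single verdict."""
    preferences: List[AxisPreference] = field(default_factory=list)

    def add(self, axis: str, preferred: str) -> None:
        if preferred not in VALID_CHOICES:
            raise ValueError(f"preferred must be one of {VALID_CHOICES}")
        self.preferences.append(AxisPreference(axis, preferred))

    def summary(self) -> Dict[str, int]:
        # Count how many axes favor each candidate.
        counts = {choice: 0 for choice in VALID_CHOICES}
        for pref in self.preferences:
            counts[pref.preferred] += 1
        return counts


# Hypothetical axes chosen by the supervisor for this comparison.
label = FreeformPreference()
label.add("task completion", "A")
label.add("speed", "B")
label.add("smoothness", "A")
counts = label.summary()
```

A single overall preference would record only that "A" was chosen; the per-axis form also keeps the information that "B" was preferred on speed.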
Sparse rewards, progress metrics, and preferences are popular, but they:
- often neglect many aspects of a task
- collapse many axes into one measure
- frequently yield ambiguity and disagreement across annotators
We instead propose freeform preference learning.


