I guess there are a few directions I could take this: 1) try a larger corpus, 2) make cutting-plane discovery faster, or 3) drop the pretokenizer entirely. What's most interesting?
Thanks @Jan55028368 for suggesting I try an interior-point method (IPM). Hilariously, my final LP solve was simply stalling forever before this switch. I guess the perfectly cut LP is very degenerate 😅