@DimitrisPapail Amazing, fable could one shot this, but may ban you.
Another question was a Chebyshev LR schedule to avoid momentum
Is momentum allowed?
@_arohan_ pushing to Idea descent by codex. see what happens..
Comparing both methods would double Papailiopoulos's compute burn rate.
Participants welcomed the suggestions about momentum and Chebyshev learning-rate schedules, and one fondly recalled prior work at Anthropic.
OK i am allowing momentum. let's see
still looking bad
@DimitrisPapail This should go much better
@_arohan_ not allowed yet but i will allow it in 2-3 hours when i lose hope about vanilla SGD (and by hope i mean money)
@_arohan_ do you think fable would be legitimately better? i can give it a shot in parallel but that means 2x burn rate :(
@giffmana no init play, not sure if it's allowed in the original speedrun leaderboard thingy
@DimitrisPapail But, did it play with warmup and inits at all?

@DimitrisPapail Yeah being at Anthropic and one shotting stuff was really fun. I was pretty addicted

@DimitrisPapail If you push to github, we can use it as ledger. I am going on a vacation soon, so will have some time

@_arohan_ ok i'll try it
@giffmana it's going a bit better but not crazy better. trying weird LR schedules and grad normalization and clipping.

@_arohan_ good idea, will do that later today