Since the pseudoinverse was discussed recently,
here are some interesting modifications from Fletcher, along with his comments on Marquardt.
The post is from 2020, so like 6 years ago. Just thought of it while riding the Pinocchio ride at Disneyland.
History: ⏳ Fletcher modified this method in 1971. His insight was that instead of mixing in eps * I, one could use eps * diag(JJtr).
This neat insight is also useful when implementing Shampoo, where you can blend between Shampoo and a diagonal AdaGrad variant instead of SGD.
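Rough sketch of the eps * I vs eps * diag(...) swap, in case it helps. Everything here is my own illustration, not Fletcher's code: the name damped_step, the use_diag flag, and the toy data are made up. I'm also assuming J is the (m x n) Jacobian of the residuals r, so under that convention the Gram matrix shows up as J.T @ J.

```python
import numpy as np

def damped_step(J, r, eps, use_diag=True):
    # Assumption: J is the (m x n) Jacobian of the residuals r,
    # so the Gram matrix A = J.T @ J is (n x n).
    A = J.T @ J
    g = J.T @ r
    if use_diag:
        D = np.diag(np.diag(A))   # damping with eps * diag(A)
    else:
        D = np.eye(A.shape[0])    # damping with eps * I
    # Solve (A + eps * D) step = -g
    return np.linalg.solve(A + eps * D, -g)

# Toy problem with badly scaled columns (hypothetical data).
rng = np.random.default_rng(0)
J = rng.normal(size=(20, 3)) * np.array([1.0, 10.0, 100.0])
r = rng.normal(size=20)

step_identity = damped_step(J, r, eps=1e3, use_diag=False)
step_diag = damped_step(J, r, eps=1e3, use_diag=True)
```

As eps gets large, the eps * I version tends to -g / eps (a plain gradient step), while the diag version tends to -g / (eps * diag(A)), i.e. a per-coordinate scaled step. That is the same flavor as falling back to a diagonal preconditioner instead of SGD. The diag version is also invariant to rescaling the parameters: scaling column j of J by s_j just scales component j of the step by 1 / s_j.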



