Xavier Gonzalez releases a paper mapping Newton iterations to recurrent neural networks, Picard iterations to diffusion models, and Jacobi iterations to parallel LLM decoding

An attached video animates trajectories comparing the four methods from t=0 to t=30.

Original post

Scott Linderman#1347

Xavier Gonzalez@xavierjgonzalez

Fixed point iterations for parallelizing nonlinear dynamics is all the rage:
- Newton for RNNs
- Picard for diffusion models
- Jacobi for parallel decode of LLMs

But how do these techniques relate, and when should you use them?

We show you how in our new paper 🧵

4:47 AM · May 19, 2026 · 13.7K Views

Sentiment

Replies praise the paper's comparison of fixed-point iterations for parallelizing nonlinear dynamics: commenters see potential for large speedups because Newton needs few iterations, and one describes the underlying math as fruitful ground.

Pos: 100.0%
Neg: 0.0%

3 comments with sentiment.

Posts from X

Xavier Gonzalez@xavierjgonzalez

Overall, we unify the major fixed-point techniques for parallelizing sequential computation.

We provide the following guidance: pick the fixed-point method that best approximates the dynamics you care about.

paper: https://alphaxiv.org/abs/2509.21716
code: https://github.com/lindermanlab/parallelizing_with_lds

Xavier Gonzalez@xavierjgonzalez

We show that all these techniques for parallelizing nonlinear dynamics reduce to repeating the following 2 steps:
1. Linearize the dynamics
2. Evaluate them in parallel.

Newton, Picard, Jacobi etc just use different approximate linearizations.
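The two-step loop described in this post can be sketched on a toy problem. This is a minimal illustration, not the paper's code: the scalar tanh dynamics, the names `f`, `df`, `linearize`, and `fixed_point_solve`, and the stopping rule are all assumptions made for this example. The slope choices (exact Jacobian for Newton, identity for Picard, zero for Jacobi) follow the thread's description of "different approximate linearizations" as best it can be read here.

```python
import math

def f(x_prev, u):
    # Hypothetical scalar dynamics (a one-unit tanh RNN cell), for illustration only.
    return math.tanh(0.9 * x_prev + u)

def df(x_prev, u):
    # Derivative of f with respect to x_prev (the 1x1 Jacobian).
    return 0.9 * (1.0 - math.tanh(0.9 * x_prev + u) ** 2)

def rollout(x0, inputs):
    # Plain sequential evaluation, used as the reference answer.
    xs, x = [], x0
    for u in inputs:
        x = f(x, u)
        xs.append(x)
    return xs

def linearize(method, prev, inputs):
    # Step 1: choose the slope A_t that stands in for the true dynamics.
    if method == "newton":
        return [df(p, u) for p, u in zip(prev, inputs)]  # exact Jacobian
    if method == "picard":
        return [1.0] * len(inputs)  # identity
    if method == "jacobi":
        return [0.0] * len(inputs)  # zero
    raise ValueError(f"unknown method: {method}")

def fixed_point_solve(x0, inputs, method, tol=1e-10):
    T = len(inputs)
    xs = [0.0] * T  # initial guess for x_1..x_T
    for iteration in range(1, T + 1):
        prev = [x0] + xs[:-1]  # current guess for x_{t-1}, for every t
        slopes = linearize(method, prev, inputs)
        fx = [f(p, u) for p, u in zip(prev, inputs)]  # independent across t
        # Step 2: evaluate the linear recurrence
        #   x_t = f(prev_t) + A_t * (x_{t-1} - prev_t).
        # Written as a loop for clarity; it is affine in x_{t-1}, so a
        # parallel scan could evaluate it instead.
        new, x = [], x0
        for p, ft, a in zip(prev, fx, slopes):
            x = ft + a * (x - p)
            new.append(x)
        xs = new
        # Stop once the iterate satisfies x_t = f(x_{t-1}) up to tol.
        shifted = [x0] + xs[:-1]
        residual = max(abs(xt - f(p, u)) for xt, p, u in zip(xs, shifted, inputs))
        if residual < tol:
            break
    return xs, iteration

inputs = [math.sin(t) for t in range(20)]
reference = rollout(0.5, inputs)
for method in ("newton", "picard", "jacobi"):
    xs, iters = fixed_point_solve(0.5, inputs, method)
    gap = max(abs(a - b) for a, b in zip(xs, reference))
    print(f"{method}: {iters} iterations, max gap to rollout {gap:.1e}")
```

In this sketch every method reproduces the sequential rollout within at most T iterations, because each pass makes at least one more time step exact. The printed iteration counts show how much the choice of slope matters on this particular toy system.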

Xavier Gonzalez@xavierjgonzalez

Why does linearizing the dynamics make them parallelizable?

Through the 🚀 parallel associative scan🚀

The pscan powers modern SSMs like S5 and mamba. It's also what lets us parallelize nonlinear dynamics.

We provide a gentle intro & tutorial on the parallel scan in Appendix B
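To make the role of the scan concrete, here is a small sketch of why a linear recurrence x_t = A_t x_{t-1} + b_t can be evaluated in logarithmically many rounds. Each step is an affine map, and composing affine maps is associative, so prefix compositions can be built with a scan. The code below simulates a Hillis-Steele style scan in plain Python; the function names are hypothetical, the rounds run serially here, and real implementations (for example `jax.lax.associative_scan`) execute each round in parallel.

```python
def combine(first, second):
    # Compose two affine maps x -> a*x + b, applying `first` and then `second`.
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)

def scan_affine(elems):
    # Inclusive scan in about log2(T) rounds. Within a round, every
    # position is updated independently of the others, which is what
    # makes the scan parallelizable on suitable hardware.
    out = list(elems)
    shift = 1
    while shift < len(out):
        out = [
            out[i] if i < shift else combine(out[i - shift], out[i])
            for i in range(len(out))
        ]
        shift *= 2
    return out

def solve_linear_recurrence(x0, A, b):
    # prefixes[t] is the composed map that sends x0 directly to x_{t+1}.
    prefixes = scan_affine(list(zip(A, b)))
    return [a * x0 + c for a, c in prefixes]

def solve_sequentially(x0, A, b):
    # Reference: the ordinary step-by-step loop.
    xs, x = [], x0
    for a, c in zip(A, b):
        x = a * x + c
        xs.append(x)
    return xs

A = [0.5, -1.0, 2.0, 0.25, 1.5, -0.5, 0.75]
b = [1.0, 0.5, -2.0, 3.0, 0.0, 1.0, -1.0]
print(solve_linear_recurrence(1.0, A, b))
print(solve_sequentially(1.0, A, b))
```

The two printed lists should agree up to floating-point rounding. Note that `combine` is associative but not commutative, so the scan must keep the earlier element as the first argument.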

Xavier Gonzalez@xavierjgonzalez

For example, Picard iterations approximate the dynamics by the identity. So they are great for diffusion models, which use Euler discretization.

But, for the same reason, Picard iterations struggle with contractive systems, where Newton/Jacobi excel.
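A toy comparison can illustrate this contrast. The sketch below assumes two hypothetical scalar systems: an Euler step with a small step size, whose Jacobian stays close to 1, and a strongly contractive map, whose Jacobian stays close to 0. It runs the same fixed-point loop with a constant slope of 1.0 (the Picard choice) or 0.0 (the Jacobi choice). The systems, names, and stopping rule are assumptions made for illustration, not the paper's experiments.

```python
import math

def euler_step(x, t):
    # Euler discretization x + h * g(x) with g(x) = -sin(x) and h = 0.01.
    return x - 0.01 * math.sin(x)

def contractive_step(x, t):
    # Strongly contractive map with a time-varying input.
    return 0.1 * math.tanh(x) + math.cos(t)

def rollout(f, x0, T):
    # Sequential reference evaluation.
    xs, x = [], x0
    for t in range(T):
        x = f(x, t)
        xs.append(x)
    return xs

def constant_slope_solve(f, x0, T, slope, tol=1e-10):
    # slope = 1.0 treats the dynamics as the identity (Picard);
    # slope = 0.0 ignores the dependence on the previous state (Jacobi).
    xs = [0.0] * T
    for iteration in range(1, T + 1):
        prev = [x0] + xs[:-1]
        fx = [f(p, t) for t, p in enumerate(prev)]  # independent across t
        new, x = [], x0
        for p, ft in zip(prev, fx):
            x = ft + slope * (x - p)  # linear recurrence in x
            new.append(x)
        xs = new
        # Stop once the iterate satisfies x_t = f(x_{t-1}) up to tol.
        shifted = [x0] + xs[:-1]
        residual = max(
            abs(xt - f(p, t)) for t, (xt, p) in enumerate(zip(xs, shifted))
        )
        if residual < tol:
            break
    return xs, iteration

T = 50
for name, step in (("euler", euler_step), ("contractive", contractive_step)):
    _, picard_iters = constant_slope_solve(step, 1.0, T, slope=1.0)
    _, jacobi_iters = constant_slope_solve(step, 1.0, T, slope=0.0)
    print(f"{name}: picard {picard_iters} iterations, jacobi {jacobi_iters} iterations")
```

Under these assumptions Picard should stop well before Jacobi on the Euler system, and Jacobi well before Picard on the contractive one, in line with the post's claim. Both always finish within T iterations, since each pass makes at least one more time step exact.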

Xavier Gonzalez@xavierjgonzalez

We prove theoretically how the error of the "approximate linearization" of the dynamics impacts how many fixed-point iterations are needed for convergence.

We show that better approximations converge faster. So, you should use fixed-point methods that fit your target problem.

Xavier Gonzalez@xavierjgonzalez

A Unifying Framework for Parallelizing Sequential Models with Linear Dynamical Systems.

co-lead with @ekellbuch, supervised by @scott_linderman, and on the whole a great team effort with @hyundongleee @jerrywliu @heyyalexwang @dmzoltowski @Leokoz8 and Chris Re.

Fatih Dinc@fatihdin4en

@xavierjgonzalez Great work Xavier!! Quick question: I heard that people have trouble with accuracy at inference on PDEs when models are trained PINN-style. Is that ever an issue here, or am I making a spurious connection?

Sam Duffield@Sam_Duffield

@xavierjgonzalez I love this work. It seems like Newton generally requires very few iterations which means in theory could provide massive speedup. Do you find the main obstacle there is extra memory requirements or time spent on the extra model calls to construct the Jacobians?

Hillbilly@love_ai_io

@xavierjgonzalez I’ve been poking around with similar math this week and fixed point iteration has a whole associated operator calculus. Really fruitful ground for studying optimizer behavior and convergence
