Ray co-founder Robert Nishihara says Slurm and Ray are complementary after Midjourney's David Holz questions if developers still use Slurm

Slurm handles resource scheduling, while Ray manages distributed runtimes.

Original post

does everyone in AI still use SLURM? or have we moved to RAY? what's going on in cloud orchestration land nowadays?

12:27 PM · May 28, 2026

@DavidSHolz why do people compare ray and slurm ever

David (@DavidSHolz)

does everyone in AI still use SLURM? or have we moved to RAY? what's going on in cloud orchestration land nowadays?

7:27 PM · May 28, 2026 · 68.2K Views
3:31 AM · May 29, 2026 · 580 Views

FWIW, I don't view Ray and Slurm as alternatives to each other, I think of them as solving different problems, e.g.,

Slurm is responsible for sharing compute resources among multiple workloads and multiple users. It provides workload multitenancy, queuing, prioritization, preemption, etc.

Ray is an actor framework and provides a distributed runtime for a single workload. It provides a single-controller programming model for distributed workloads, manages & coordinates processes, handles failures, etc.

It's very natural to run a Ray workload on top of Slurm, similar to how you'd run a Ray workload on top of Kubernetes.

11:39 PM · May 28, 2026 · 2.4K Views
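
The layering described in the reply above can be illustrated with a toy sketch in plain Python. This is an analogy only: it uses neither the Slurm nor the Ray API, and every name in it (`ToyScheduler`, `Job`, `single_controller_sum_of_squares`) is hypothetical. The scheduler class stands in for the role the post assigns to Slurm (queuing and prioritizing jobs from several users on a shared pool), and the workload function stands in for the role assigned to Ray (one driver fanning work out to the workers it was given). As a simplifying assumption, the toy scheduler runs one job at a time and does not model preemption.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(order=True)
class Job:
    # Lower number = higher priority; submit_order breaks ties (FIFO).
    priority: int
    submit_order: int
    user: str = field(compare=False)
    nodes_needed: int = field(compare=False)
    workload: Callable[[int], object] = field(compare=False)


class ToyScheduler:
    """Scheduler layer (hypothetical stand-in for Slurm's role): shares a
    fixed pool of nodes among jobs submitted by several users."""

    def __init__(self, total_nodes: int) -> None:
        self.total_nodes = total_nodes
        self._queue: List[Job] = []
        self._submitted = 0

    def submit(self, user: str, priority: int, nodes_needed: int,
               workload: Callable[[int], object]) -> None:
        if nodes_needed > self.total_nodes:
            raise ValueError("job requests more nodes than the cluster has")
        job = Job(priority, self._submitted, user, nodes_needed, workload)
        heapq.heappush(self._queue, job)
        self._submitted += 1

    def run_all(self) -> List[Tuple[str, object]]:
        # Simplification: run one job at a time, highest priority first.
        results = []
        while self._queue:
            job = heapq.heappop(self._queue)
            results.append((job.user, job.workload(job.nodes_needed)))
        return results


def single_controller_sum_of_squares(num_workers: int) -> int:
    """Runtime layer (hypothetical stand-in for Ray's role): a single driver
    fans tasks out to its allocated workers and gathers the results."""
    def square(x: int) -> int:
        return x * x

    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return sum(pool.map(square, range(10)))


scheduler = ToyScheduler(total_nodes=4)
scheduler.submit("alice", priority=2, nodes_needed=2,
                 workload=single_controller_sum_of_squares)
scheduler.submit("bob", priority=1, nodes_needed=4,
                 workload=single_controller_sum_of_squares)
results = scheduler.run_all()
print(results)
```

The point of the split is that the scheduler never looks inside a workload, and the workload never decides who runs next; it only uses the workers it was handed.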

@DavidSHolz I've written a bit about how I think about the layering. https://www.anyscale.com/blog/ai-compute-open-source-stack-kubernetes-ray-pytorch-vllm

Robert Nishihara (@robertnishihara)

11:39 PM · May 28, 2026 · 1K Views

But to your original question, we see more Kubernetes (versus Slurm), but both are extremely popular. More specifically:

- Established tech companies have largely standardized on Kubernetes
- AI startups are split between Slurm and Kubernetes
- They often eventually shut down the Slurm clusters and move to Kubernetes, but this is a very slow process
- For batch jobs (training / data prep), research teams often prefer the Slurm developer experience versus Kubernetes
- For running production inference services, Kubernetes is much better

11:46 PM · May 28, 2026 · 149 Views

@DavidSHolz torchx with k8s is nice

6:55 AM · May 29, 2026 · 4.5K Views

@DavidSHolz *nice enough

Matt Henderson (@matthen2)

@DavidSHolz torchx with k8s is nice

6:55 AM · May 29, 2026 · 4.5K Views
7:14 AM · May 29, 2026 · 2.9K Views

@DavidSHolz Slurm will always have a special place in my heart, probably not a first choice after a certain scale of both compute and number of people using it

11:34 PM · May 28, 2026 · 917 Views