Stability AI co-founder Emad Mostaque seeks CLI-accessible, on-demand B200 GPUs, but Alex Smola warns of high costs
Story Overview
Stability AI co-founder Emad Mostaque, now building sovereign AI at Intelligent Internet, asked on X for on-demand B200 GPU rentals reachable via CLI so agents could provision instances themselves. Alex Smola replied directly that spinning up such hardware on demand carries far higher costs than long-term provisioned compute and suggested separating agent runtimes while using frameworks like vLLM or SGLang for inference.
Provisioning choices shape true compute expenses
Smola flagged that on-demand B200 usage could multiply expenses several times over compared with reserved capacity, while noting a single card would under-utilize the interconnect fabric needed for serious training runs.
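The trade-off Smola describes can be made concrete with a little arithmetic. The sketch below is illustrative only: the $5.00/hr on-demand figure is the public B200 price quoted by one reply in the thread, while the $2.00/hr reserved rate and the 730-hour month are hypothetical placeholders, not figures from any provider. The function names are likewise invented for this example.

```python
def break_even_utilization(on_demand_per_hr: float, reserved_per_hr: float) -> float:
    """Fraction of the month a reserved GPU must be busy to match on-demand cost."""
    if on_demand_per_hr <= 0 or reserved_per_hr <= 0:
        raise ValueError("hourly rates must be positive")
    return reserved_per_hr / on_demand_per_hr


def monthly_cost(rate_per_hr: float, busy_hours: float, reserved: bool,
                 hours_in_month: float = 730.0) -> float:
    """Reserved capacity bills every hour; on-demand bills only the busy hours."""
    billed_hours = hours_in_month if reserved else busy_hours
    return rate_per_hr * billed_hours


if __name__ == "__main__":
    on_demand = 5.00  # quoted in the thread for one provider's B200
    reserved = 2.00   # HYPOTHETICAL long-term rate, for illustration only
    print("break-even utilization:", break_even_utilization(on_demand, reserved))
    for busy in (40, 292, 730):
        print(busy, "busy hours:",
              "on-demand", monthly_cost(on_demand, busy, reserved=False),
              "reserved", monthly_cost(reserved, busy, reserved=True))
```

Under these assumed rates, on-demand is cheaper only while the card is busy for less than the break-even fraction of the month. Past that point every busy hour costs the full on-demand multiple, which is the regime Smola's warning addresses.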
Agent-friendly CLI remains unconfirmed across providers
Replies listed platforms such as Vast.ai and RunPod yet supplied no verified details on CLI-driven agent provisioning, leaving open whether any service currently matches the exact workflow Mostaque described.
Many users welcomed the Stability AI founder's push for on-demand B200 GPU rentals with CLI access, citing fast spin-up and sensible burst pricing, while one flagged the poor economics compared with provisioned hardware.
Most Activity

@EMostaque Prime Intellect

@EMostaque http://Vast.ai is good.

@EMostaque http://dapp.gpu.net now!

@EMostaque Prime Intellect. They have a CLI.
This is awful economics to spin up B200 on demand. You’re going to pay many multiples of cost relative to longtime provisioned compute.
(Speculating, since I don’t know your use case) you might not want to collocate the agent runtime. For inference there are so many better tools, such as vLLM or SGLang. For training, a single B200 is a waste. You’re not taking advantage of the connectivity.
What’s the best place to rent on demand B200s?
Ideally CLI so agents can spin them up

@EMostaque Need this yesterday.
Preferably without a procurement side quest.

@EMostaque Also @lium_io https://lium.io/

@EMostaque besides the usual aws ones, runpod usually has them ready but its hit or miss on availability

@EMostaque @TargonCompute https://targon.com/inventory

@EMostaque would you mind them being b300s just responded to someone not even an hour ago who was looking to rent them out somehow

@EMostaque prime, vast, nebius. aws also obviously

@EMostaque @modal @PrimeIntellect @sfcompute

@EMostaque @akt
B200 (180 GB HBM3e) at $5.00/hr on their public pricing page right now. There’s also a B300 option at $6.00/hr

@EMostaque CLI for scripts/agents on PyPI

@EMostaque I ended up running them on RunPod. Spins up in under 60 seconds, cli is solid, and the cost structure makes sense for burst workloads

@EMostaque Gke and set up with terraform and auto scaling . I had my lab set up I could activate and turn off a GPU node by deleting one # or adding it back in my repo . https://github.com/sirius0xdev/gcloud-lab

My PI skill just in case.
Use this skill before any Prime Intellect paid action: disk creation, CPU pod, GPU pod, storage attachment, SSH repair, teardown, or infrastructure drill.
Hard Rules
Do not create, attach, terminate, or mutate Prime resources without explicit user approval for that exact action.
Read the current project's source-of-truth files, runbooks, configs, and operator notes before choosing resources or commands.
Do not rely on stale chat memory for live IDs, prices, images, or keys.
Prefer local runner/code validation before paid rental.
Apply the paid-run parts of world-class-code: contracts, dependency freeze, canaries, throughput/utilization measurement, telemetry, checkpoints, resume, and teardown must exist before scale.
Do not let paid resources sit idle while code, schemas, or operator commands are still being designed.
Before renting paid compute, know the exact dependency plan for the job: Python version, system packages, CUDA/driver expectations, framework versions, package indexes, model/runtime libraries, install commands, and smoke-test commands. Do not discover dependency compatibility on an expensive rented pod unless the drill is explicitly for dependency discovery and cost-bounded.
Treat Prime availability, pricing, wallet balance, disk status, pod status, SSH key settings, and provider/location compatibility as drift-prone. Refresh immediately before paid action.
Keep artifacts no-secret and privacy-safe for the project. Never paste API keys, private keys, credentials, private datasets, raw URLs/query strings, token IDs, or other project-sensitive payloads into chat or public files.
Provisioning Drill
Before a real Prime run, create a drill plan that records:
wallet, active pod list, disk list, GPU availability, disk availability;
unfiltered GPU availability and disk-filtered GPU availability for any retained disk that must be attached;
selected provider, location/datacenter, disk ID, GPU resource ID, image, vCPU/RAM/ephemeral disk sizing, persistent disk sizing, and hourly price;
SSH public key name in Prime UI/API and local private key path;
persistent disk ID, expected mount path, and proof command for /data;
exact bootstrap dependencies and whether the image already includes CUDA, Python, Torch, and git; include pinned versions, install sources/indexes, compatibility rationale, and a one-command import/CUDA/model smoke test;
exact code transfer method; default to source-only rsync from the local public repo unless a fresh authenticated git clone is known to be faster and reliable for that pod;
teardown plan and closeout proof: no running pods, retained disks named, wallet or billing delta captured.
Do the smallest paid drill that proves the missing property. If SSH/key handling is the question, prove SSH on a tiny pod before renting an expensive A100. If disk/GPU compatibility is the question, use the smallest compatible disposable disk/pod pair unless the project source-of-truth explicitly says a retained disk or specific resource must be used.
Correct Prime Sequence
Refresh read-only state with CLI/API/UI evidence.
Confirm no unexpected running pods. Stop and ask if any exist.
List available GPUs first, then filter by the retained disk before selecting:
    prime --plain availability list --output json
    prime --plain availability list --output json --disks <disk-id>
If needed, add --provider, --gpu-type, and --gpu-count.
Select compute only after recording the disk-compatible resource ID, provider, location, GPU count/type, vCPU/RAM, ephemeral disk minimum/default, image options, and price. Do not reuse remembered IDs without a fresh disk-filtered check.
Confirm SSH key configuration before creation:
Prime CLI: prime config view and prime config set-ssh-key-path.
Prime UI: profile/settings key name and generated SSH command.
Local: private key path exists and has restrictive permissions.
Create only the approved resource. For CLI pod creation, pass explicit --disk-size and --image; the CLI may otherwise prompt and cancel in non-interactive agent runs. Confirm valid values from fresh availability, UI prompts, or official docs instead of assuming defaults.
Wait for status ready, then prove SSH with a short non-mutating command.
Prove disk mount with df -h, mount, and a bounded write/read marker under /data.
Bootstrap from the prewritten pinned dependency plan before running the job. The plan must already name Python, system packages, CUDA/driver constraints, ML framework versions, package indexes, model/runtime library versions, and smoke commands.
Verify Python version, Torch CUDA version, torch.cuda.is_available(), and device name before model work.
Transfer code. Default to source-only rsync of runnable repo surfaces (for example source packages, tests/, configs/, docs/, and packaging files) and exclude .git, generated reports, virtualenvs, caches, raw/private artifacts, and large model/cache files. Use git clone only when the repo is public/authenticated from the pod, the exact commit is needed in .git, and clone is proven reliable.
Run only the approved canary command.
Sync or record approved no-raw evidence.
Terminate pods after canary/failure/completion unless the user explicitly says to keep them running.
Verify closeout: pods=[] or no unexpected running pods; retained disks are deliberate and named.
Paid Job Gate
For any paid Prime job:
Build and test everything that can be validated locally before rental unless there is a deliberate written reason to rent early.
Freeze the dependency/bootstrap plan before rental. If the plan is incomplete, stop and complete it locally or in a cheap explicit drill before starting an expensive GPU run.
Do not rent an expensive GPU while CPU/network/setup work is still discovering whether enough GPU-ready work exists to keep the accelerator busy.
Run a bounded canary before scale.
Stop after canaries or medium tranches to review throughput, yield/output quality, disk/cache growth, error classes, GPU telemetry or API concurrency, cost per useful unit, and resume evidence before scaling.
Keep paid accelerators highly utilized during the GPU phase: pre-stage data where possible, use appropriate batching/prefetching, and collect utilization plus input-wait telemetry.
Failure Response
If SSH fails, do not keep provisioning variants blindly. Stop after one failed approved attempt, preserve pod/disk status evidence, close out paid resources if they are unusable, and identify whether the root issue is key selection, image, provider/location, status not ready, network, or Prime-side configuration.
If a disk warning appears, distinguish quota/allocation warnings from actual filesystem fullness. Verify with Prime disk details plus remote df -h /data when SSH is available.
Do not terminate a retained final disk unless the user explicitly approves that exact destructive action.
For command patterns and source links, read references/prime-intellect-operator.md only when doing Prime work.
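The availability commands quoted in the skill above could be assembled programmatically, so that an agent builds the same read-only query every time instead of hand-typing flags. This is a minimal sketch under stated assumptions: the subcommand and the --plain, --output, --disks, --provider, --gpu-type, and --gpu-count flags are copied from the post and have not been checked against Prime Intellect's CLI documentation, and the helper name availability_argv is hypothetical. The function only builds an argument list; it does not run anything or touch paid resources.

```python
from typing import List, Optional


def availability_argv(disk_id: Optional[str] = None,
                      provider: Optional[str] = None,
                      gpu_type: Optional[str] = None,
                      gpu_count: Optional[int] = None) -> List[str]:
    """Build the argv for a read-only availability listing.

    Flags are taken verbatim from the skill text (an assumption, not a
    verified CLI reference). Omitted filters are simply left out.
    """
    argv = ["prime", "--plain", "availability", "list", "--output", "json"]
    if disk_id is not None:
        # Disk-filtered check, required before reusing a retained disk.
        argv += ["--disks", disk_id]
    if provider is not None:
        argv += ["--provider", provider]
    if gpu_type is not None:
        argv += ["--gpu-type", gpu_type]
    if gpu_count is not None:
        if gpu_count < 1:
            raise ValueError("gpu_count must be at least 1")
        argv += ["--gpu-count", str(gpu_count)]
    return argv


if __name__ == "__main__":
    # Unfiltered listing first, then the disk-filtered one, as the skill orders them.
    print(availability_argv())
    print(availability_argv(disk_id="<disk-id>", gpu_type="B200", gpu_count=1))
```

Passing the resulting list to subprocess.run without shell=True avoids shell quoting issues. Whether the CLI accepts "B200" as a --gpu-type value is not stated in the thread, so that value is a placeholder.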

@EMostaque You guys know what IaC is right? Terraform? Pulumi? Crossplane? Have you heard of any of these?

@EMostaque Go work for Nvidia

@EMostaque @lium_io and @TargonCompute