Fast custom simulators plus aggressive RL training is already a core strategy at scale. NVIDIA Isaac Lab, MuJoCo-based setups, and similar GPU-parallel sims power locomotion and control work at Boston Dynamics, Figure, and Agility, as well as in research labs. The approach delivers the no-real-data start and fast iteration you describe, and it often yields compact policies.
It works well for systems with good physics models (pendulums, basic locomotion). For large-scale, complex robotics, the main limit is the sim-to-real gap in contacts, friction, and perception, so many teams still add domain randomization or fine-tuning on real-world data. Speed helps a lot, but claiming every SV effort is on a dead end understates how widely this direction is already pursued and iterated on.
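To make the domain randomization point concrete, here is a minimal sketch in plain Python, using the pendulum case mentioned above. It is not how Isaac Lab or MuJoCo implement it. The names `PendulumParams`, `sample_params`, `step`, and `rollout` are hypothetical, the parameter ranges are made-up illustrative values, and the zero torque stands in for whatever a trained policy would output.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class PendulumParams:
    length: float   # pendulum length [m]
    mass: float     # bob mass [kg]
    damping: float  # viscous friction coefficient [N*m*s/rad]


def sample_params(rng):
    """Draw one randomized parameter set (ranges are illustrative assumptions)."""
    return PendulumParams(
        length=rng.uniform(0.8, 1.2),
        mass=rng.uniform(0.8, 1.2),
        damping=rng.uniform(0.0, 0.1),
    )


def step(theta, omega, torque, p, dt=0.01, g=9.81):
    """One semi-implicit Euler step; theta = 0 is hanging straight down."""
    inertia = p.mass * p.length ** 2
    gravity_torque = -p.mass * g * p.length * math.sin(theta)
    alpha = (torque + gravity_torque - p.damping * omega) / inertia
    omega = omega + alpha * dt
    theta = theta + omega * dt
    return theta, omega


def rollout(p, theta0, steps=100):
    """Simulate one environment from rest at angle theta0."""
    theta, omega = theta0, 0.0
    for _ in range(steps):
        torque = 0.0  # placeholder: a trained policy would choose this
        theta, omega = step(theta, omega, torque, p)
    return theta, omega


# Each environment gets its own physics, so a policy trained across all of
# them cannot overfit to one exact length, mass, or friction value.
rng = random.Random(0)
envs = [sample_params(rng) for _ in range(8)]
finals = [rollout(p, theta0=0.5) for p in envs]
```

The same start state ends up in a different place in each environment, which is the spread a policy has to tolerate. GPU-parallel simulators apply the same idea across thousands of environments at once and typically also randomize contact and sensor properties.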