
Pieter Levels points out that AI voice generation systems, even those from leaders like ElevenLabs, do not incorporate background noise or environmental reverb.

In AI video models, too, audio quality lags behind the photorealistic visuals.

Original post

I'm just as surprised nobody in AI voice tech has realized a voice needs background and environmental noise to sound realistic. Even @ElevenLabs, the leader in voice AI, cannot produce voice with background noise or environmental reverb. AI voices are always going to sound non-passable as human if they don't have that. And it's only me and this other guy even talking about it.

12:01 PM · May 19, 2026
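To make the post's point concrete, here is a minimal pure-Python sketch of one conventional way to give a "dry" voice signal room character and ambience: convolve it with a room impulse response, then mix in background noise at a chosen signal-to-noise ratio. This is an illustration under stated assumptions, not how ElevenLabs or any other vendor works. The exponentially decaying noise impulse response and the Gaussian "ambience" are hypothetical stand-ins for a measured impulse response and recorded background noise, and all function names are invented for this example.

```python
import math
import random


def synthetic_room_ir(sample_rate: int, rt60: float, seed: int = 0) -> list[float]:
    """Hypothetical room impulse response: exponentially decaying Gaussian noise.

    rt60 is the time in seconds for the response to decay by 60 dB;
    exp(-6.91) is about 1e-3, i.e. -60 dB in amplitude.
    """
    rng = random.Random(seed)
    n = max(1, round(sample_rate * rt60))
    ir = [rng.gauss(0.0, 1.0) * math.exp(-6.91 * i / n) for i in range(n)]
    ir[0] = 1.0  # direct path: the unreflected sound that arrives first
    # Normalize to unit energy so the reverberant output stays at a similar level.
    energy = math.sqrt(sum(h * h for h in ir))
    return [h / energy for h in ir]


def convolve(x: list[float], h: list[float]) -> list[float]:
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def mix_at_snr(signal: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Scale `noise` so signal power / added-noise power equals snr_db (in dB), then add it."""
    if len(noise) < len(signal):
        raise ValueError("noise must be at least as long as signal")
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(v * v for v in noise[: len(signal)]) / len(signal)
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * v for s, v in zip(signal, noise)]


def add_room_and_noise(
    dry: list[float], sample_rate: int, rt60: float, snr_db: float, seed: int = 0
) -> list[float]:
    """Reverb first (the room colors the voice), then ambience at a fixed SNR."""
    wet = convolve(dry, synthetic_room_ir(sample_rate, rt60, seed))
    rng = random.Random(seed + 1)
    ambience = [rng.gauss(0.0, 1.0) for _ in wet]  # stand-in for recorded background noise
    return mix_at_snr(wet, ambience, snr_db)


if __name__ == "__main__":
    sr = 8000
    # A 0.25 s sine tone as a stand-in for a synthesized voice clip.
    dry = [math.sin(2 * math.pi * 220 * n / sr) for n in range(sr // 4)]
    out = add_room_and_noise(dry, sr, rt60=0.1, snr_db=20.0)
    print(len(dry), len(out))
```

The direct-form convolution is O(N·M) and only practical for short clips; for real audio an FFT-based convolution such as `scipy.signal.fftconvolve` is the usual choice. Applying reverb before noise reflects the assumption that the voice is shaped by the room, while the ambience is a separate source.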

I've been wondering this, and my best guess is one of two things:
1. Human perception of errors. Small errors in the pixel intensities of an image might go unnoticed, whereas in audio they may be much more impactful.
2. If diffusion is the dominant modeling choice, it could be related to the typical spectra of audio vs. images and some of the nuances around adding noise to slowly destroy the signal.

9:59 PM · May 19, 2026 · 346 Views
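Point 2 can be made concrete with a little arithmetic. Assuming a DDPM-style forward process x_t = sqrt(ᾱ)·x_0 + sqrt(1 - ᾱ)·ε with white Gaussian ε, the noise adds equal variance to every frequency bin, so a bin with signal power P(f) has SNR ᾱ·P(f)/(1 - ᾱ), and low-power bins are drowned out first. The sketch below compares two hypothetical power-law spectra: slope 2 is the shape commonly cited for natural images, while slope 1 is an arbitrary flatter spectrum for comparison, not a measured audio spectrum. The simplified one-sided bin model and all function names are assumptions for illustration.

```python
def power_law_spectrum(n_bins: int, slope: float) -> list[float]:
    """Power P(k) proportional to k**-slope for bins k = 1..n_bins, scaled to mean 1.

    Scaling to mean power 1 models a unit-variance signal, so spectra are
    compared by shape, not by overall loudness.
    """
    raw = [k ** -slope for k in range(1, n_bins + 1)]
    mean = sum(raw) / n_bins
    return [p / mean for p in raw]


def per_bin_snr(power: list[float], alpha_bar: float) -> list[float]:
    """SNR per bin for x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.

    eps is white unit-variance noise, so under an orthonormal Fourier
    transform it contributes variance 1 to every bin equally.
    """
    if not 0.0 < alpha_bar < 1.0:
        raise ValueError("alpha_bar must be strictly between 0 and 1")
    return [alpha_bar * p / (1.0 - alpha_bar) for p in power]


def drowned_fraction(power: list[float], alpha_bar: float) -> float:
    """Fraction of bins whose SNR has fallen below 1 at this noise level."""
    snr = per_bin_snr(power, alpha_bar)
    return sum(1 for s in snr if s < 1.0) / len(snr)


def crossover_alpha_bar(p: float) -> float:
    """alpha_bar at which a bin of power p reaches SNR 1: a*p/(1-a) = 1 -> a = 1/(1+p).

    Higher-power bins cross at smaller alpha_bar, i.e. they survive until
    later (noisier) steps of the forward process.
    """
    return 1.0 / (1.0 + p)


if __name__ == "__main__":
    for slope in (2.0, 1.0):
        spectrum = power_law_spectrum(100, slope)
        frac = drowned_fraction(spectrum, 0.5)
        print(f"slope={slope}: {frac:.2f} of bins below SNR 1 at alpha_bar=0.5")
```

With 100 bins at ᾱ = 0.5, the slope-2 spectrum has 93% of its bins below SNR 1 versus 81% for the slope-1 spectrum, so the steeper spectrum loses more of its high-frequency detail at the same noise level. This only shows that spectral shape changes which parts of a signal survive the noising; it does not establish that this explains the audio quality gap.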