it's BASED. they are linearly projecting raw samples into the transformer as patches for audio conditioning and it's working. no freq domain priors, all the redundant phase info still present at the input, not even a hardcoded STFT decomposition. 25 patches per second
Prime Intellect's kalomaze highlights an audio conditioning method that projects raw waveforms directly into transformers
The input is a linear projection of raw sample patches at 25 patches per second, with no STFT or other frequency-domain front end.
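To make the idea concrete, here is a minimal sketch of linearly projecting raw waveform patches into transformer-width vectors. Only the 25 patches per second figure comes from the post. The 16 kHz sample rate, the 512-wide model, non-overlapping patches, and the names `patchify` and `project` are all assumptions for illustration, not the method's actual configuration.

```python
import numpy as np

SAMPLE_RATE = 16_000       # assumption: the post does not state a sample rate
PATCHES_PER_SECOND = 25    # from the post
PATCH_LEN = SAMPLE_RATE // PATCHES_PER_SECOND  # 640 raw samples per patch
D_MODEL = 512              # assumption: hypothetical transformer width


def patchify(waveform: np.ndarray, patch_len: int = PATCH_LEN) -> np.ndarray:
    """Split a mono waveform into non-overlapping patches, dropping any tail."""
    n_patches = len(waveform) // patch_len
    return waveform[: n_patches * patch_len].reshape(n_patches, patch_len)


def project(patches: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Apply one linear layer to each patch: no STFT, no filterbank."""
    return patches @ weight + bias


rng = np.random.default_rng(0)
# Randomly initialised stand-in for a learned projection matrix.
weight = rng.normal(scale=PATCH_LEN ** -0.5, size=(PATCH_LEN, D_MODEL))
bias = np.zeros(D_MODEL)

waveform = rng.uniform(-1.0, 1.0, size=2 * SAMPLE_RATE)  # 2 s of placeholder audio
tokens = project(patchify(waveform), weight, bias)
print(tokens.shape)  # (50, 512)
```

Under these assumptions, two seconds of audio become 50 tokens, each a learned linear mix of 640 consecutive samples, so phase information reaches the model untouched rather than being discarded by a magnitude spectrogram.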