Torchaudio Transforms. Dimension (..., freq, time), where freq is n_fft // 2 + 1 where