New Frequency Encoder (a.k.a. Cochlea Encoder)

Hi all,

Thomas Miconi and I made a Frequency Encoder, loosely inspired by how the human cochlea encodes sound. The design and properties of the FrequencyEncoder are detailed in the README.

Additionally, there is a notebook where this encoder is applied to a collection of phonemes.

Let me know what you think :slight_smile:

Marion


Thanks Marion,

Coincidentally, I’m struggling with something like that right now :slight_smile: I had thought of that approach, but I don’t know if it is robust enough to variation. If you add some noise, the resulting SDR might not be close enough to respect the “rules”. Additionally, if you change the voice (i.e. change the glottal fundamental frequency), the SDR representation might change significantly for the same phoneme.

In any case, I think the ear has higher frequency resolution in the lower bands. Perhaps the frequency bins could be non-linear, for example log-spaced, as in the sketch below.
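A minimal numpy sketch of what I mean (the frequency range and bin count here are arbitrary choices, nothing from the FrequencyEncoder):

```python
import numpy as np

# Log-spaced bin edges give finer resolution at low frequencies and
# coarser resolution at high frequencies (roughly cochlea-like).
# The 100 Hz - 8 kHz range and the bin count are arbitrary examples.
num_bins = 16
low_hz, high_hz = 100.0, 8000.0
bin_edges = np.logspace(np.log10(low_hz), np.log10(high_hz), num_bins + 1)
print(bin_edges)  # edges crowd together at the low end
```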

I was also thinking of the “LPC way” (i.e. formant analysis). Have you considered that approach?
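For concreteness, here is a rough autocorrelation-LPC sketch (numpy/scipy); the order, the 90 Hz / 400 Hz thresholds, and the helper name lpc_formants are just illustrative defaults, not anything from the FrequencyEncoder:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_formants(frame, sr, order=12):
    """Rough formant sketch using the autocorrelation LPC method.

    `frame` is assumed to be a pre-emphasized, windowed speech frame
    (1-D numpy array) sampled at `sr` Hz.
    """
    # Autocorrelation of the frame (lags 0..N-1).
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Solve the Toeplitz system R a = r for the LPC coefficients.
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    # Roots of the prediction-error polynomial 1 - a1 z^-1 - ... - ap z^-p.
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]            # one of each conjugate pair
    freqs = np.angle(roots) * sr / (2 * np.pi)   # pole angles -> Hz
    bws = -sr / np.pi * np.log(np.abs(roots))    # pole radii -> bandwidths
    # Keep sharp, low-bandwidth poles above ~90 Hz as formant candidates.
    return sorted(f for f, b in zip(freqs, bws) if f > 90 and b < 400)
```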

Hi @vpuente,

To increase robustness to noise, I am using the freqBinW parameter (it sets the parameter w of the scalar encoder used inside the frequency encoder). There is one phoneme example with w=1 and another with w=3. You can increase the value further, and also play with freqBinN (the parameter n of the scalar encoder).
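To make the mapping concrete, this is roughly how those two parameters relate to NuPIC’s ScalarEncoder. It is only an illustration, not the FrequencyEncoder internals; the minval/maxval below are placeholders for whatever range the bin powers are scaled into:

```python
from nupic.encoders.scalar import ScalarEncoder

# Illustration only: each frequency bin's (scaled) power is encoded with
# a ScalarEncoder whose w and n come from freqBinW and freqBinN.
freqBinW = 3   # wider buckets -> nearby power values share more active bits
freqBinN = 11  # output bits per frequency bin

binEncoder = ScalarEncoder(w=freqBinW, n=freqBinN,
                           minval=0.0, maxval=1.0, forced=True)
print(binEncoder.encode(0.4))  # SDR fragment for one bin's power value
```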

In the phoneme experiment, this does appear to increase robustness to noise.

However, there might be better ways to extract frequency information than the power spectrum. I have not tried other approaches. You can look at the method getFreqs(); the logic for extracting frequencies could be changed there.
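For reference, here is a simplified sketch of the kind of logic that lives in that step (not the actual getFreqs() code): chop the FFT power spectrum into bins and take the mean power per bin. A different front end (mel filterbank, LPC, etc.) would only need to replace this part:

```python
import numpy as np

def get_power_spectrum(signal, numBins):
    """Sketch only: bin the real-FFT power spectrum of `signal` into
    numBins frequency bins and return the mean power per bin."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    # Drop trailing samples that don't fill a whole bin, then average.
    binSize = len(power) // numBins
    binned = power[:binSize * numBins].reshape(numBins, binSize)
    return binned.mean(axis=1)
```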

Marion

Hello @marion,

That’s pretty cool.

Do you think it’d be possible to set it up to use alternate number encoders as an option, like the adaptive scalar or log encoders? I think it’d be interesting to see which number encoder performs best at recognizing different audio.

Hi @SimLeek,

Absolutely. In place of the scalar encoder, you could use the RandomDistributedScalarEncoder. I used the ScalarEncoder here because it makes it a bit easier to see the power spectrum discretization in the encoder visualizations.
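A rough sketch of that swap (the parameter values are just illustrative starting points, not anything tuned for the phoneme data):

```python
from nupic.encoders.random_distributed_scalar import RandomDistributedScalarEncoder

# Illustrative drop-in alternative to the ScalarEncoder: the RDSE's
# `resolution` plays the role that the min/max bucketing plays in the
# ScalarEncoder, and w/n control the size of the per-bin SDR fragment.
rdse = RandomDistributedScalarEncoder(resolution=0.05, w=3, n=100)
print(rdse.encode(0.4))  # SDR fragment for one bin's power value
```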

You could check whether the phoneme test (see the IPython notebook in the first post) improves when using the RDSE.

Marion