Hello!
Today I wanted to share the project I’ve built for my bachelor’s thesis (which is still in progress, but should be finished soon).
It combines a cochlea simulation (CARFAC) with the HTM model to recognise notes played on a piano. I find this very interesting because it completes a loop of perception in one tiny package, all of it biologically inspired and with some neuroscience to potentially back it up. It also suggests that CARFAC might be a great way to encode sound into SDRs for various tasks. There are many interesting thoughts here, but I’ll write those up in my thesis.
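To give a rough idea of what I mean by encoding sound into SDRs, here is a minimal sketch of the thresholding step, written against the community C++ classes (htm.core-style `SDR`); the flat float buffer, the names, and the threshold value are illustrative, not the exact code from my repo:

```cpp
// Minimal sketch: turn one CARFAC stabilized-auditory-image frame into
// an SDR by thresholding and binarizing it. 'saiFrame' must hold
// rows * cols values; the threshold is just a placeholder value.
#include <htm/types/Sdr.hpp>
#include <vector>

htm::SDR encodeSaiFrame(const std::vector<float> &saiFrame,
                        htm::UInt rows, htm::UInt cols,
                        float threshold = 0.2f) {
  htm::SDR sdr({rows, cols});
  std::vector<htm::Byte> dense(saiFrame.size(), 0);
  for (size_t i = 0; i < saiFrame.size(); ++i)
    dense[i] = saiFrame[i] > threshold ? 1 : 0; // binarize each SAI pixel
  sdr.setDense(dense);
  return sdr;
}
```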
I also wanted to thank Numenta for developing the Thousand Brains Theory and the HTM model; I believe this work is the way forward and will at some point be revolutionary. Thanks as well to the NuPIC community for creating such a clean C++ repository - the code quality, interfaces, and ease of use are much appreciated.
So far I’m proud of what I did, and even in its current state I like the project, but it has one last unfinished quirk - I wanted to train multi-label prediction (i.e. classifying multiple notes at the same time), but the classifier seems to output only one label anyway: the PDF has one class at around 0.9 and everything else very low. I’ve tried to figure out why this happens, but I don’t have much time left, so I’m leaving it there for now. If you have suggestions or are willing to look into my repo, that would also be appreciated.
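For context, the multi-label step is roughly this (a simplified sketch around the htm.core-style `Classifier`; variable names are illustrative, not the exact code from my repo):

```cpp
// Simplified sketch of multi-label training/inference with an
// htm.core-style Classifier. 'tmOutput' is the SDR coming out of
// temporal memory; 'activeNotes' holds the indices of every note
// sounding in the current frame.
#include <htm/algorithms/SDRClassifier.hpp>
#include <htm/types/Sdr.hpp>
#include <vector>

void trainStep(htm::Classifier &clsr, const htm::SDR &tmOutput,
               const std::vector<htm::UInt> &activeNotes) {
  // One learn() call receives the whole list of labels for this frame.
  clsr.learn(tmOutput, activeNotes);
}

std::vector<htm::Real64> predictStep(const htm::Classifier &clsr,
                                     const htm::SDR &tmOutput) {
  return clsr.infer(tmOutput); // PDF over all note classes
}
```

One thing I still want to verify: if `infer()` really returns a softmax-style PDF that sums to 1, then two simultaneous notes would at best show up as two medium peaks rather than two values near 0.9, which may be part of what I’m seeing.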
This is awesome. I don’t understand what everything means, which could be because I haven’t encoded sound yet. Is the top-right square the SDR you’re passing into the spatial pooler?
I don’t know what the 2 blue squares below it are, or what the red/green channels mean. Could you please explain?
It looks like you really are feeding in sound in real time, which means holding down a single note will produce dozens of steps with almost the same SDR coming in before the next note is pressed, and the predictions below are just that: the same note as the one you’re pressing. This pretty much makes the temporal memory aspect useless in your app, unless I’m missing something? I haven’t used NuPIC, but I imagine it doesn’t expect your application to be real-time; it would need a lot more tweaking to be useful here. That’s fine - I personally think the Temporal Memory part of HTM is wrong.
Where’s your repo?
Yes, the small black square is the thresholded and binarized output from CARFAC; the large image is the stabilized auditory image, also known as the auditory correlogram. The first blue square is the SDR after the spatial pooler and the second one is the SDR out of temporal memory. That SDR is a 3D structure, so the colors are a color map of the activations when viewed top-down, looking through the depth.
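In code, the two blue squares come from roughly this step (a simplified sketch with htm.core-style `SpatialPooler` and `TemporalMemory`; dimensions, parameters, and names are illustrative rather than the exact code in the repo):

```cpp
// Simplified sketch of the two blue squares: the first is the spatial
// pooler's active columns; the second is temporal memory's active cells,
// a 3D SDR of shape {column dims..., cellsPerColumn} that gets
// color-mapped in the top-down view.
#include <htm/algorithms/SpatialPooler.hpp>
#include <htm/algorithms/TemporalMemory.hpp>
#include <htm/types/Sdr.hpp>

void hierarchyStep(htm::SpatialPooler &sp, htm::TemporalMemory &tm,
                   const htm::SDR &encoded, // binarized SAI frame
                   htm::SDR &spColumns,     // first blue square
                   htm::SDR &tmCells) {     // second blue square (3D)
  // spColumns must be sized to the SP's column dimensions and tmCells
  // to {column dimensions..., cellsPerColumn} before calling this.
  sp.compute(encoded, /*learn=*/true, spColumns);
  tm.compute(spColumns, /*learn=*/true);
  tm.getActiveCells(tmCells);
}
```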
Yes, you are correct that temporal memory isn’t very useful in this particular case, except that it is a memory. So songs like metal or lofi basically do not activate TM. But! If we take the harder task of detecting a melody in a song… then it will be crucial.
Also, I think TM might be a little useful here as well, providing temporal stability and correlating multiple frames into the current prediction.
As I’ve said, I didn’t have much time to experiment with this, but there are certainly many more things to do.