HTM for music analysis

Welcome Jamie. A few of us HTM enthusiasts have placed a variety of information into the Community project [1]. On my side, a lot of that was written while I was just starting to develop knowledge of sub-cortical and cortical auditory processing, HTM theory, and other machine intelligence and learning techniques. It's certainly a place that can be used to build on the HTM-related aspects of music/audio/speech analytics, including MIR.

Regards, Richard.



Thanks Richard, what a great resource. I am probably more in the symbolic music data space, but I am getting really interested in these audio projects. Thanks so much for sharing this!

Hey Jos. This sounds really interesting. I am still mostly playing around with nupic tutorials and have not done anything substantial yet, but this kind of thing is where I am hoping to head. Can I just ask, what does the structure of your input data look like?

I have recently been coding up a Node module that converts MusicXML into timestamped JSON, to try to deal with the input format nupic seems to like (??). An example of the kind of data it produces is at:

The example data is a Keith Jarrett jazz piano solo, which has lots of intricate melodies but also repetitive chord structures sitting underneath them. There is also no repetition in the improvised melodic phrases (really unusual for a jazz musician), but within melodic phrases a lot of similar structures start to appear, so it's an interesting kind of data set to work with.
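Since the question is how to get symbolic data into a shape nupic accepts, here is a minimal sketch of flattening timestamped note events into the three-header-row CSV layout used in nupic's OPF example files (field names, then field types, then special flags), as I understand that format. The JSON schema (`onset` in beats, `pitch` as a name) and the function name `events_to_nupic_csv` are hypothetical; the real module's output may look quite different.

```python
import csv
import io
import json

# Hypothetical input: note events with an onset (in beats) and a pitch name.
# The actual MusicXML-to-JSON module's schema may differ.
EXAMPLE_JSON = """[
  {"onset": 0.0, "pitch": "E4"},
  {"onset": 0.5, "pitch": "G4"},
  {"onset": 1.0, "pitch": "C5"}
]"""


def events_to_nupic_csv(json_text):
    """Write note events as CSV with nupic-style header rows:
    field names, field types, then special flags (left blank here)."""
    events = json.loads(json_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["onset", "pitch"])   # field names
    writer.writerow(["float", "string"])  # field types
    writer.writerow(["", ""])             # no special flags (no reset/sequence field)
    # Emit events in time order, one row per note.
    for ev in sorted(events, key=lambda e: e["onset"]):
        writer.writerow([ev["onset"], ev["pitch"]])
    return out.getvalue()


if __name__ == "__main__":
    print(events_to_nupic_csv(EXAMPLE_JSON))
```

A chord in this flat layout would need either one row per note with a shared onset or a single combined field; which works better for the temporal memory is an open question.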

Anyway, I feel like I have wandered off the Introduce Yourself topic, but I would be interested to hear more about this.


Hey Jamie. For nupic, I only needed to encode each note/chord combination into a binary vector. Timing wasn't important in my case, because I simplified Bach's chorales to quarter-note notes/chords only. I tried several encodings, because the rule is that notes or chords that are almost the same should have a lot of overlapping bits when encoded. That didn't work very well, because notes that are close to each other in pitch can make a big harmonic difference. If you take a C chord (CEG) and change the C into a C#, you get C#EG, a diminished C# chord, which is something really different. So for the notes I used simple one-hot structures over the 12 pitch classes, like C = 100000000000, C# = 010000000000, D = 001000000000, … , B = 000000000001. For the chords I did the same, with one bit per chord tone: a Cmin chord (C, D#/Eb, G) becomes 100100010000, plus an extra bass note to show the difference between chords like CEG, EGC and GCE. I didn't look at the octaves. Because the input for the SP needed to be a lot bigger, each zero and one was replaced by a lot of zeros/ones.
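The encoding described above can be sketched in a few lines of Python. This is a minimal illustration assuming sharps-only pitch-class names, a 12-bit one-hot vector per note, and a chord made of a 12-bit pitch-class set followed by a one-hot bass note; the function names and the `width` used for widening are hypothetical, not nupic's encoder API.

```python
# Pitch classes in semitone order from C (octaves ignored, as in the post).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def encode_note(name):
    """One-hot 12-bit vector: a single bit at the note's pitch class."""
    bits = [0] * 12
    bits[PITCH_CLASSES.index(name)] = 1
    return bits


def encode_chord(notes, bass):
    """12-bit pitch-class set for the chord, followed by a one-hot bass note.

    The bass block separates inversions such as CEG, EGC and GCE,
    which share the same pitch-class set.
    """
    pcs = [0] * 12
    for n in notes:
        pcs[PITCH_CLASSES.index(n)] = 1
    return pcs + encode_note(bass)


def widen(bits, width):
    """Repeat every bit `width` times so the vector is large enough for the SP."""
    return [b for b in bits for _ in range(width)]


if __name__ == "__main__":
    cmin = encode_chord(["C", "D#", "G"], bass="C")
    print("".join(map(str, cmin[:12])))  # 100100010000
    print(len(widen(cmin, 20)))          # 480
```

Because every pitch class gets its own block of bits, C and C# share no active bits at all, which matches the goal of not treating neighbouring semitones as "almost the same".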

Hey guys, I started this new topic about HTM for music to give you all a place to chat about it outside the Introduce Yourself thread (#htm-hackers is more visible to people than the #other-topics:off-topic forum).

Thanks for creating this new topic (forum anomaly detection…spooky 😀)

Thanks Jos, this is really cool and I can see what you mean. I am really interested to see where this goes as you feed in more information and add more layers.

I imagine this kind of thing must get really complicated very quickly, right? For example, if you have a large corpus of symbolic data from musical works, how do you capture the similarity between a Cmaj -> C#dim progression in one place and a Gmaj -> G#dim progression in another?
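One common way to make those two progressions look alike is to encode each chord change relative to the previous chord (root movement in semitones plus the two chord qualities) rather than by absolute pitch. A minimal sketch, assuming chords are already labelled as hypothetical `(root, quality)` pairs; `relative_step` is an illustrative name, not part of nupic.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def relative_step(prev_chord, next_chord):
    """Describe a chord change independently of key.

    Each chord is a (root, quality) tuple, e.g. ("C", "maj").
    Returns (root movement in semitones mod 12, previous quality, next quality).
    """
    prev_root, prev_quality = prev_chord
    next_root, next_quality = next_chord
    interval = (PITCH_CLASSES.index(next_root) - PITCH_CLASSES.index(prev_root)) % 12
    return interval, prev_quality, next_quality


if __name__ == "__main__":
    a = relative_step(("C", "maj"), ("C#", "dim"))
    b = relative_step(("G", "maj"), ("G#", "dim"))
    print(a)       # (1, 'maj', 'dim')
    print(a == b)  # True
```

The trade-off is that absolute key information is lost, so in practice a relative encoding like this might be fed alongside an absolute one rather than instead of it.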

Some (but not all) of the issues I am struggling with in dealing with this type of data are:

  • The problem that the human brain can infer a tonal centre and then move it somewhere else (hearing a new key centre as the music modulates), so the semantics of the individual pitches depend on the local context of the music, and this needs to be accounted for.

  • The problem of rhythm: not only do offbeats and onbeats change how music is heard, but time signatures also affect music by placing different weightings on certain beats (e.g. 6/8 sounds really different from 4/4).

  • The problem of similarity: for example, a melodic theme that recurs during a symphony, perhaps shifting from major to minor or being augmented in some way, is still identifiable by the brain but looks like a completely different pattern once it's converted into data.
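For the rhythm point, one option is to give the encoder a note's position within the bar together with a metric weight that depends on the time signature, so the same onset is represented differently in 4/4 and in 6/8. The sketch below assumes onsets are integers counted in eighth notes; the weight tables and function name are hypothetical and illustrative, not taken from any standard or from nupic.

```python
# Hypothetical metric-weight tables, one entry per eighth note in the bar.
# Higher numbers mean a stronger (more accented) position.
METRIC_WEIGHTS = {
    # beat 1 strongest, beat 3 next, beats 2 and 4 weaker, offbeat eighths weakest
    "4/4": [4, 1, 2, 1, 3, 1, 2, 1],
    # two dotted-quarter beats: eighth 1 strongest, eighth 4 next, the rest weak
    "6/8": [4, 1, 1, 3, 1, 1],
}


def encode_metric_position(onset_eighths, meter):
    """One-hot position within the bar, followed by a one-hot metric weight (1-4)."""
    weights = METRIC_WEIGHTS[meter]
    pos = onset_eighths % len(weights)
    position_bits = [0] * len(weights)
    position_bits[pos] = 1
    weight_bits = [0] * 4
    weight_bits[weights[pos] - 1] = 1
    return position_bits + weight_bits


if __name__ == "__main__":
    # The fourth eighth note is a weak offbeat in 4/4 but a main beat in 6/8.
    for meter in ("4/4", "6/8"):
        print(meter, METRIC_WEIGHTS[meter][3], encode_metric_position(3, meter))
```

These bits could be concatenated with a pitch or chord encoding, so the same note on a strong and a weak beat would produce overlapping but distinct inputs.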

I guess all this speaks to the more general question of how on earth you encode the semantics of symbolic music data for nupic-type analysis. I would love to hear people's ideas on this.

I saw a paper from ISMIR 2009 that people might find interesting:

I think that you have to start with simple things. I hope that nupic can be used in music for simple things like remembering a melody (already done), finding chords for a simple melody (which I hope to do) or learning some simple rhythms (which seems doable). Maybe later some of those pieces can be combined to solve more complicated problems. Nupic is good at learning patterns, and arranged hierarchically it can learn patterns of patterns of patterns, and in that way hopefully solve some difficult problems.

Look at neural networks. Single NNs didn't work very well, but when people started combining lots of them into larger networks, they suddenly became capable of solving more difficult problems. Likewise, when you compare nupic (a single encoder, SP and TM) with a part of your brain (a region), you know that only several regions working together can solve more complicated things. The visual system in your brain needs loads of regions, each passing its results on to other regions.

Thanks for the article. One thing I noticed, though, is that the authors were very positive about their HSMM project in the conclusion. However, when you look at newer articles by the same authors, you see that their next article only appeared 4 years later, and it doesn't mention HSMM. See