Learning a collection of features where order doesn't matter

This is the first of a series of tangential theories based on HTM, exploring what I believe to be capabilities that could help in simulating parts of the sensorimotor system. These may have no basis in reality, and may ultimately have no bearing on that system, but could still be useful for other machine learning problems. The purpose of these threads is to collaborate in exploring different problems in the hope of distilling useful information.

This first theory involves tweaking elements of HTM so that it can be used to learn and recognize a collection of features in a system where order doesn’t matter. This would be useful in a system where there is some finite number of features, and the order in which they are input can be random. It assumes that the order the features are input doesn’t matter – what matters is the overall collection of features.

In the explanation below, I use examples of touching features of a physical object with a finger. Touch is not a necessary element of this theory, but it is helpful in explaining and exploring the theory (and is how I originally thought about it). I decided to explore touch rather than vision because it’s relatively easier for me to think about. Initially, I am also focusing only on sensing differences in pressure. Obviously there is a lot more that can be detected through touch, like temperature, softness, stickiness, etc., but I am starting with pressure for simplicity.

One thought I had is that objects can be learned and recognized without involving the motor control system. For example, if I close my eyes and hold out my finger while someone moves an object against it randomly, I can just as easily recognize a pen, coffee cup, etc., as if I were moving my finger to explore the object. I can also learn new objects this way. The important thing to note about this scenario is that I have no way of knowing in what order the inputs will come, but I can still recognize and learn objects nearly as well as if I were controlling the input through my own motor commands. My thought was to see if I could take this simpler case even a step further and design a system for recognizing an object without requiring depictions of “object space” or “sensor space” at all.

The first idea I had was that it might be possible to use the semantic meaning of SDR bits to avoid the need for transforming and translating between coordinate spaces. The idea would be that similar sensory inputs should have similar semantic encodings, regardless of orientation. Imagine a square pressure sensor at the end of your finger. You can detect an edge from many different orientations. Regardless of the orientation, the same edge should always have virtually the same semantic encoding.

Another example is three inputs that should have virtually the same semantic encoding (maybe this could be something like a gap between a point and an edge).

And one more example (the side of something long, like a pen).

The idea is that an object consists of some collection of semantically dissimilar features. As such, the object can be depicted as a union of those features. So for example, the above three features might be encoded as (using dense SDRs here for explanation purposes):

(The three feature encodings were shown as images in the original post.)

By touching the object in various orientations, I can learn a union of these three features, which would be something like:
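As a rough stand-in for the original illustration, here is a minimal Python sketch of the idea, representing each SDR as a set of active bit indices (the specific indices are made up purely for illustration):

# Three feature SDRs, written as sets of active bit indices.
# The indices here are invented for illustration only.
feature_edge = {3, 17, 42, 56, 71}    # an edge, in any orientation
feature_gap  = {5, 23, 64, 70, 88}    # a gap between a point and an edge
feature_side = {9, 31, 48, 77, 93}    # the side of something long, like a pen

# The object is simply the union of its features -- no ordering involved.
object_sdr = feature_edge | feature_gap | feature_side
print(sorted(object_sdr))   # 15 bits, one collection, no sequence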

From the above, you can see that learning an object doesn’t require any particular order. This is where the theory starts to diverge from typical sequence learning. In sequence learning, the difference between ABC and CBA matters, but in this theory, when learning an object, both are the same.

So to move from a system where the order of input matters to one where the order doesn’t matter, and instead the unique collection of features is what matters, a few tweaks have to be made (this is where we start to diverge from HTM).

The first thing we need to do is move away from thinking in terms of discrete time (T-1, T, T+1, T+2, etc.) and instead have cells learn within some range of time (which I will refer to as the “learning period” for lack of a better term). Reinforcement of synapses should not be limited to pairs where one cell is currently active and the other was active at T-1. Synapses should be reinforced between any cells that were active during the learning period.

Next, the order of activation does not matter. Synapses with cells that become active after a particular cell was active during the learning period should be reinforced, as should synapses with cells that were active before it. From a sequence perspective, that would be like learning ABC ABC ABC ABC… At this step in sequence learning, C would reinforce its synapses with B. In the new system, A and B would also reinforce their synapses with C. In other words, a future input of B should put both A and C into the predictive state, not just C.

What this means is that inputting one feature of an object should result in the cells which represent all other features of that object going to the predictive state. In other words, any of the features of the object might be sensed next, in no particular order. Therefore the prediction should be all other features of the object.
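To make the tweak concrete, here is a minimal Python sketch, assuming a sliding window of recently active cells and a simple scalar permanence per synapse (the window length and increment here are arbitrary choices of mine, not anything prescribed by HTM):

LEARNING_PERIOD = 10    # assumed window length, in timesteps
INCREMENT = 0.1         # assumed permanence increment

permanence = {}         # (cell_a, cell_b) -> synapse permanence
recently_active = []    # list of (timestep, cell) pairs inside the window

def learn(t, active_cells):
    """Reinforce synapses symmetrically between the cells active now and
    every cell active within the learning period, in either order."""
    global recently_active
    # Forget activity that has fallen outside the learning period.
    recently_active = [(ts, c) for (ts, c) in recently_active
                       if t - ts <= LEARNING_PERIOD]
    recently_active.extend((t, c) for c in active_cells)
    for _, prior in recently_active:
        for cell in active_cells:
            if cell != prior:
                # Both directions are reinforced: order does not matter.
                for key in ((prior, cell), (cell, prior)):
                    permanence[key] = permanence.get(key, 0.0) + INCREMENT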

Obviously this is a very rough theory, and may have no basis in reality, so I am looking for folks to poke holes in it as a way of hopefully distilling out something useful. I’ll start the hole-poking myself by pointing out one of the main problems with this theory, which is the concept of the “learning period”. As written, it would require learning one object at a time, and waiting some time between objects for the “learning period” to end. If I were to start touching a second object too quickly, its features would become encoded along with the first object’s. This becomes a particular problem if you try to extend this theory to vision, where jumping quickly between objects is the norm and not the exception.

Another problem is when a feature is shared between multiple objects, and that feature is input. Initially, the features of all objects which share that feature would be predicted (which should be expected), but I haven’t thought of how sensing other features would “narrow down” the prediction to one object. I am tackling this problem first before I go back to the “learning period” problem.

I’m also developing a simple proof-of-concept app to try and see how well the theory holds up in practice, so I’ll post a link to that when it is complete.

I thought I would talk about where I am at with solving the case of a feature shared between multiple objects. I have started looking at the problem from the perspective of a system that has already learned two objects. Each object consists of three distinct features. One feature is shared between the objects. My thought is if I can determine how the system should behave in this scenario, I should be able to work backwards from there to determine how the objects should be learned in the first place.

Let’s call the features A, B, C, D, and E. Object 1 consists of features A, B, and C. Object 2 consists of features C, D, and E. Feature C is shared between the two objects.

One thing that will help is if we can distinguish between feature C when it is part of Object 1 versus when it is part of Object 2. We have something like this in sequence memory (a C that comes after a B which came after an A is different from a C which comes after a D which came after an E). The difference is that with this theory, order doesn’t matter. So we should be able to distinguish a C which is part of collection ABC from a C which is part of collection CDE. Unlike sequence memory, however, a C which is in ABC should be the same as a C which is in BCA, BAC, CBA, etc. And a C which is in CDE should be the same as a C which is in DCE, EDC, etc.
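One way to picture the requirement (just a toy illustration of the invariance, not a proposed mechanism): tag a feature with the unordered set of features it was learned alongside, so that order cannot matter by construction:

def feature_in_context(feature, collection):
    # frozenset discards order, so ABC, BCA, CBA, ... all give one context
    return (feature, frozenset(collection))

c_in_obj1 = feature_in_context("C", ["A", "B", "C"])
c_in_obj1_reordered = feature_in_context("C", ["B", "C", "A"])
c_in_obj2 = feature_in_context("C", ["C", "D", "E"])

print(c_in_obj1 == c_in_obj1_reordered)   # True: same collection, any order
print(c_in_obj1 == c_in_obj2)             # False: different collections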

Let’s say the system is not in a “learning phase” (I don’t particularly like that concept, but I’ll tackle that problem later). In other words, there are no cells in a predictive state. With that in mind, the first input of C should result in bursting columns, which causes the other features of both objects to enter the predictive state:

C: (diagram in the original post)

From there, if the next input is A or B, I can reinforce the synapses predicting Object 1 and weaken those predicting Object 2:

A: B: (diagrams in the original post)

Or if instead the next input were D or E, I can reinforce the synapses predicting Object 2 and weaken those predicting Object 1:

D: E: (diagrams in the original post)
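A toy Python sketch of this narrowing-down behaviour, assuming each object is reduced to a bare set of feature labels with a single score standing in for its synapse strengths (a large simplification of the cellular state):

objects = {"object1": {"A", "B", "C"},
           "object2": {"C", "D", "E"}}
scores = {name: 1.0 for name in objects}   # crude proxy for synapse strength

def sense(feature, delta=0.2):
    """Strengthen objects containing the feature, weaken the others."""
    for name, feats in objects.items():
        scores[name] += delta if feature in feats else -delta

def predicted_features(seen):
    """Predict the remaining features of the best-scoring object."""
    best = max(scores, key=scores.get)
    return objects[best] - set(seen)

sense("C")                                 # ambiguous: both objects share C
sense("A")                                 # evidence for object1 only
print(predicted_features({"C", "A"}))      # -> {'B'}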

Paul,
A comment on this statement.
“One thought I had is that objects can be learned and recognized without involving the motor control system. For example, if I close my eyes and hold out my finger while someone moves an object against it randomly, I can just as easily recognize a pen, coffee cup, etc., as if I were moving my finger to explore the object.”

I don’t agree. You are able to recognize the pen because you have previously seen/touched/sensed the pen using your own motor controls. Further, you are moving the object against your fingers, so there is motor information playing into the senses here; those are the same senses you use when you move your fingers across an object. Try the experiment again. Get the help of a friend. Blindfold yourself and try objects that you have not seen or do not use often. Also, have your friend just push part of the object against your finger rather than move it against your finger. Tell us what you learn. It is possible to recognize objects which have prominent features that can be identified in the 1 sq cm space of your fingertip. So try larger objects with spatially dispersed features. I think you will not be able to identify them.

My second point is that sensorimotor methods are essential, but the word “motor” is misleading. You need the ability to identify features and the relative position and direction between those features. “Motor” is a necessity of biological sensory organs, and may or may not be needed with computational systems where the data is already captured.

Regards
Chandan

Excellent point, and something I had tried. Without sensing directional movement, a 3-dimensional mental model of the object cannot be built.

I am making a couple of points that I hope can be used to allow tackling a small subset of the larger problem. Firstly, I don’t need to control the input to be able to learn or recognize objects (so I can ignore the “copy” of my own motor commands that is mentioned in a few videos, as it is not a necessary component of learning an object). And secondly (more controversially), even without sensing movement (dragging), objects can still be learned and recognized, as long as they include enough distinct (small) features that can be felt with the end of one finger.

Of course, I won’t be able to build a 3D mental model of the object without a concept of object space, and I won’t be able to recognize the object visually when I look at it later, so of course coordinate information is important. But can useful elements of the larger system be understood and simulated without solving those other aspects of the system? Maybe not – coordinate spaces may be the entire foundation of the system. :slight_smile:

I have renamed this thread and updated the original post to be more precise about the scope and purpose of the theory. I have several other tangential theories that will make more sense to post in separate threads for discussion rather than mixing them all together in one thread. Hopefully that will help with the flow of conversation, and reduce possible confusion.

Hi @Paul_Lamb,

Thanks for sharing your ideas here. I agree with @chandan.maruthi that touch might not be the best example for your thought experiment, because touch is almost invariably dependent on the detection of relative motion of object or sensor.

Perhaps a better example would be using hearing to identify an animal by the sounds it makes. Thus a cat would be identified if one heard purring and miaowing and scratching at the door, while a dog would be indicated if one heard panting, sniffing, and scratching at the door. The common feature would be the scratching, the other features would be needed to make the correct decision.

This example certainly does not require any idea of ordering, as the choice of which sound to make is up to the animal, and can be considered random from our point of view. I hope this might serve as a clearer example for your idea, if it avoids the most important weaknesses of the example of touch.

Regarding your description of the connection to HTM, I don’t agree that there is a divergence. We don’t yet have a consensus on how Temporal Pooling works, but one thing most of us agree on is that a version of TP must exist for sets of SDRs which may appear in arbitrary order (or at least which do not always appear in the same order). Saccading over a face is an example which is commonly used for this version of TP.

To make this kind of Temporal Pooling work, all you need is a mechanism to accumulate evidence (in the form of activation) over multiple timesteps in the pooling layer, and for the lower layer to learn to predict all the common transitions from one member of the set to another. In my cat-dog example, panting would predict scratching and sniffing, purring would predict miaowing and scratching, while scratching would predict sounds from both cats and dogs.
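A minimal sketch of that accumulation idea in Python, with features simply voting for the objects that contain them (the decay constant is an arbitrary illustrative choice):

sounds = {"cat": {"purring", "miaowing", "scratching at the door"},
          "dog": {"panting", "sniffing", "scratching at the door"}}
activation = {"cat": 0.0, "dog": 0.0}      # pooled evidence per object

def step(heard, decay=0.9):
    for animal in activation:
        activation[animal] *= decay        # old evidence slowly fades
        if heard in sounds[animal]:
            activation[animal] += 1.0      # new evidence accumulates

for heard in ["scratching at the door", "sniffing", "panting"]:
    step(heard)                            # any order gives the same winner
print(max(activation, key=activation.get)) # -> 'dog'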

The features in an unordered sequence could also be encoded with positional information. As in my other thread, Learning an ordered sequence out of order, this would rely on the concept of “current position”, which would be provided by some external system or process. Current position could be a number of things: the slider bar on a media player, positions on an object, etc.

“Current position” could be encoded along with each feature using the spatial pooler. For example, connections could be grown from the columns to another set of inputs that represent the current position. This would give the ability to control the weight of the current position in the generated SDR. The result would be not just a collection of features, but also their positions on the object.
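Sketching that in Python: concatenate the feature bits and the position bits into one input vector before spatial pooling, with the segment sizes controlling the relative weight of position (the sizes here are arbitrary assumptions):

import numpy as np

FEATURE_BITS = 1024     # assumed width of the feature encoding
POSITION_BITS = 256     # assumed width of the position encoding

def combined_input(feature_sdr, position_sdr):
    """Build one input vector holding both feature and position bits,
    so spatial pooler columns can grow connections to either segment."""
    vec = np.zeros(FEATURE_BITS + POSITION_BITS, dtype=bool)
    vec[list(feature_sdr)] = True
    vec[[FEATURE_BITS + b for b in position_sdr]] = True
    return vec

x = combined_input({3, 17, 42}, {5, 9})
print(int(x.sum()))     # -> 5 active bits across the two segments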

That’s correct, and you could also encode the “movement” which causes the transition. This is one of the areas of research in sensorimotor integration. One theory is that, early in life, the subcortical systems generate movements, the cortex first learns by receiving copies of both movement and sensory inputs, and then learns to generate motor outputs which generate the same outcomes.

I’ve given the “learning period” some thought, and have a couple of strategies to address the problem.

The first thought I had was that the “learning period” doesn’t need to be very long. It could actually be a very short “cooldown” period after a neuron was active. I had been thinking of the problem from the perspective of associating every feature of an object with every other feature of the object. This is actually not necessary. Instead, we can have each feature of an object associated with a couple of other features of the object, which are themselves associated with a couple of other features, and so on. The more time spent encountering the object’s features in various different orders, the more features will be associated with each other (and the easier it will be to recognize the object later). But the point is that strong associations can be learned over time, and don’t need to happen all at once.

Another benefit of this weaker association and short learning period shows up when I go from a recognized set of features to a set of features that was not predicted (for example, I was expecting to sense feature A at a certain position, and instead sensed feature C because I moved to another object). This enables transitions between two different objects without having to “turn off” learning or wait a long time between objects to prevent encoding features of the second object as part of the first. It should also make the transition from wrong predictions to correct ones faster, since fewer of the second object’s features are encoded into the first object. It also provides a pathway for remembering associations between two objects that are frequently encountered close to each other in time.
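A quick sketch of why the short window is enough: with a cooldown of one step, each random-order pass over the object only associates neighbouring features, but the pairwise associations accumulate across repeated encounters (the pass count and window size below are arbitrary):

import random
from itertools import combinations

features = ["A", "B", "C", "D", "E"]
WINDOW = 1                  # cooldown: associate only with the previous input
pairs = set()               # learned symmetric associations

for _ in range(20):         # repeated encounters, each in a random order
    order = random.sample(features, len(features))
    for i, feat in enumerate(order):
        for prev in order[max(0, i - WINDOW):i]:
            pairs.add(frozenset((prev, feat)))

total = len(list(combinations(features, 2)))
print(f"{len(pairs)} of {total} feature pairs associated")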

Paul,

This is an interesting and very amusing exercise, thank you for sharing your ideas. I just want to interject a brief comment of something that occurred to me. As I was reading over your theories and ideas the thought that occurred to me is:

Is there any experiential occurrence that happens without order?

Is it even possible to process an existential instance that occurs outside the context of time? This eventuality was discussed in On Intelligence using the example of a person waking up with their hand in a bucket of sand and not knowing what their hand was in. Essentially, until the person moves their hand and fingers, and the individual grains of sand move over their skin exciting receptive sensors on their hand, they are unable to tell what it is that their hand is in. An interesting thought.

EDIT: Also I’d like to entertain the realization that 1.) The existence of a thing, and 2.) The observation of the existence of a thing, are two different things.

You make an excellent point about the concept of “order” in a general sense. To clarify, what I specifically mean by order in the context of this theory is the sequential order in which features are experienced, not a more general sense of order and design in the object or system as a whole. There is of course that second type of order as well, and it does matter in the context of this theory. There could even be a sequential element to it, as I theorized on my other thread, or that “order” could just be information about position of features without any sequential element to it.

The point I am theorizing here is that although I experience inputs as sequences over time, the order of those sequences is not always important, and does not always need to be learned (depending on the problem, of course). For example, I can touch the features of an object in any random sequential order (and I can spend more time in the sequence touching certain features than others). I don’t need to remember what order I touched things in, only that they are all associated with the object. Hopefully that clarifies my thinking on this theory.

EDIT: I should also point out that “where order doesn’t matter” doesn’t imply “where time doesn’t matter”. Inputs changing over time are how things are learned in HTM, and that central concept still applies in a system “where order doesn’t matter”.

It also occurred to me that if I were to wake up with two pins stuck into the end of my finger, I probably wouldn’t need to move my hand to recognize it (although that’s one theory I’m not going to test :smile:)


Hi Paul,

That’s a great point (the difference between sequence and “order”). I think I was collapsing the ideas of sequence and order into the same thing. I wonder, when touching the cylindrical body of a pen and its tiny “pocket clip” with one’s eyes closed, whether the “size” of the thing you touch eliminates a wide range of possibilities, and must be realized first before one can say with certainty that what one is touching is a pen?

Said differently, I wonder if there has to be a preliminary context, such as: now I’m in my hallway, and it’s supper time, so I can expect to smell certain “dinner smells” coming from the kitchen. And whether context, as a first occurrence, is part of a necessary sequence, even though one is merely judging that what one is smelling is dinner.

And so I wonder if context could be counted, and prepended to isolated events to, in effect, grant them occurrence within a sequence? Just another thought…

Context is another thing that has fascinated me. My thinking is similar to yours: it is itself just another collection of features (perhaps more abstract, or a higher-order collection of lower-order objects). In your example, the “object” includes features of your location, time of day, and odors. Fergalbyrne gave another great example of this, where two “objects” share some features (at the door + scratching) and have some distinct features (panting + sniffing, or purring + miaowing). You could think of the two shared features as the context (features that predict multiple objects), or they could also be thought of as simply part of the object.

A simpler example of context to think about is Pavlov’s studies of classical conditioning. Dogs were fed upon the ringing of a bell, and later began salivating when they heard a bell. The bell ringing is a feature of an object which includes both bell ringing and food. It is also a context which the dog recognizes and from which it comes to predict that there will be food.

“This first theory involves tweaking elements of HTM so that it can be used to learn and recognize a collection of features in a system where order doesn’t matter. This would be useful in a system where there is some finite number of features, and the order in which they are input can be random. It assumes that the order the features are input doesn’t matter – what matters is the overall collection of features.”

So, here is a working implementation of that idea, for the cat/dog example:

– start with this knowledge:
sa: dump
|context> => |context: animal sounds>
sounds-it-makes |cat> => |purring> + |miaowing> + |scratching at the door>
sounds-it-makes |dog> => |panting> + |sniffing> + |scratching at the door>

– input “scratching at the door”
– and observe it could be equally likely to be a cat or a dog
sa: normalize similar-input[sounds-it-makes] |scratching at the door>
0.5|cat> + 0.5|dog>

– this time input “scratching at the door” and “sniffing”
– and observe it is now more likely to be a dog
sa: normalize similar-input[sounds-it-makes] (|scratching at the door> + |sniffing>)
0.667|dog> + 0.333|cat>

Anyway, a trivial example, but it is easy to extend to larger ones. For example, I have a worked example where, given face features, we can guess whose face it might be.

So, the relevance to HTM?
Well, we can map this example to “SDR space” and get the same result:

– build up some knowledge:
context animal sound SDR’s
full |range> => range(|1>,|100>)

– encode our concepts as random SDRs with 5 bits on out of 100:
encode |purring> => pick[5] full |range>
encode |miaowing> => pick[5] full |range>
encode |scratching at the door> => pick[5] full |range>
encode |panting> => pick[5] full |range>
encode |sniffing> => pick[5] full |range>

– generate cat and dog sounds SDR’s:
– in this case just adding SDR’s but we could alternatively union them
sounds-it-makes |cat> => encode (|purring> + |miaowing> + |scratching at the door>)
sounds-it-makes |dog> => encode (|panting> + |sniffing> + |scratching at the door>)

– have a look at what we now know:
sa: dump
|context> => |context: animal sound SDR’s>

full |range> => |1> + |2> + |3> + |4> + |5> + |6> + |7> + |8> + |9> + |10> + |11> + |12> + |13> + |14> + |15> + |16> + |17> + |18> + |19> + |20> + |21> + |22> + |23> + |24> + |25> + |26> + |27> + |28> + |29> + |30> + |31> + |32> + |33> + |34> + |35> + |36> + |37> + |38> + |39> + |40> + |41> + |42> + |43> + |44> + |45> + |46> + |47> + |48> + |49> + |50> + |51> + |52> + |53> + |54> + |55> + |56> + |57> + |58> + |59> + |60> + |61> + |62> + |63> + |64> + |65> + |66> + |67> + |68> + |69> + |70> + |71> + |72> + |73> + |74> + |75> + |76> + |77> + |78> + |79> + |80> + |81> + |82> + |83> + |84> + |85> + |86> + |87> + |88> + |89> + |90> + |91> + |92> + |93> + |94> + |95> + |96> + |97> + |98> + |99> + |100>

encode |purring> => |49> + |16> + |8> + |95> + |46>
encode |miaowing> => |90> + |1> + |49> + |57> + |43>
encode |scratching at the door> => |44> + |26> + |13> + |9> + |39>
encode |panting> => |23> + |14> + |19> + |65> + |24>
encode |sniffing> => |43> + |33> + |85> + |99> + |44>

sounds-it-makes |cat> => 2|49> + |16> + |8> + |95> + |46> + |90> + |1> + |57> + |43> + |44> + |26> + |13> + |9> + |39>
sounds-it-makes |dog> => |23> + |14> + |19> + |65> + |24> + |43> + |33> + |85> + |99> + 2|44> + |26> + |13> + |9> + |39>

– now input the “scratching at the door” SDR:
– and observe it could be a cat or a dog, taking noise into account.
sa: normalize similar-input[sounds-it-makes] encode |scratching at the door>
0.545|dog> + 0.455|cat>

– this time input “scratching at the door” and “sniffing”
– and observe it is now more likely to be a dog
sa: normalize similar-input[sounds-it-makes] encode (|scratching at the door> + |sniffing>)
0.625|dog> + 0.375|cat>

– and once again, let’s see if we have a cat from “purring” and “miaowing”:
sa: normalize similar-input[sounds-it-makes] encode (|purring> + |miaowing>)
0.909|cat> + 0.091|dog>

And we see our example works as desired. So the question becomes, how would you reproduce this example in HTM?

As for the “learning period”, my guess is that it differs at different places in the brain. In the visual system, say watching TV, it is roughly 25 Hz; anything faster blurs together, which is why TVs work. When in conversation with a friend, the learning period is probably the time it takes to hear a phrase or sentence. When looking at a face, the learning period is probably the time it takes to scan the eyes, nose, lips, hair, and so on. So we would still have discrete time steps, but with clocks running at different speeds. And the SDRs seen within a single time step on these clocks are added/unioned together to make a compound SDR.
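For what it’s worth, the “compound SDR per slow-clock tick” part is easy to sketch in Python (the grouping of fast-clock inputs into one slow tick is invented here for illustration):

def compound_sdr(fast_sdrs_in_one_slow_tick):
    """Union all SDRs seen on the fast clock within one slow-clock tick."""
    out = set()
    for sdr in fast_sdrs_in_one_slow_tick:
        out |= sdr
    return out

# e.g. three fast-clock inputs unioned into one compound SDR
print(sorted(compound_sdr([{1, 5, 9}, {5, 12}, {3, 9}])))  # [1, 3, 5, 9, 12]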

Just my 2c.

I can extend my example to reproduce that. To do this, we need a couple more layers (i.e., operators), making it a 7-neuron system: 5 “encode” neurons and 2 “sounds-it-makes” neurons.

– input panting and observe scratching and sniffing are the most likely:
sa: similar-input[encode] sounds-it-makes similar-input[sounds-it-makes] encode |panting>
0.4|scratching at the door> + 0.4|sniffing> + 0.333|panting> + 0.067|miaowing>

– input purring and observe purring and miaowing is the most likely:
sa: similar-input[encode] sounds-it-makes similar-input[sounds-it-makes] encode |purring>
0.4|purring> + 0.4|miaowing> + 0.333|scratching at the door> + 0.133|sniffing>

– finally, input scratching and all the sounds are predicted:
sa: similar-input[encode] sounds-it-makes similar-input[sounds-it-makes] encode |scratching at the door>
0.37|scratching at the door> + 0.279|sniffing> + 0.218|miaowing> + 0.182|panting> + 0.182|purring>

Let’s tidy it up by defining an operator, so that it reads more like natural language.
First, we need this operator (yeah, now we are up to a 6 layer system):
I-predict-from |*> #=> list-to-words drop-below[0.18] similar-input[encode] sounds-it-makes similar-input[sounds-it-makes] encode |_self>

– now use it:
sa: I-predict-from |panting>
|scratching at the door, sniffing and panting>

sa: I-predict-from |purring>
|purring, miaowing and scratching at the door>

sa: I-predict-from |scratching at the door>
|scratching at the door, sniffing, miaowing, panting and purring>

I guess this is somewhat dense and abstract, but underneath it all, the fundamental data type in this notation is the SDR. The essential idea is that we are using operators to step/map from SDR to SDR, starting from the input at the right-hand side. In the operator-composition regard, it shares some similarity with APL, Forth, or pipes on the command line. So perhaps the easiest way to explain it is to unpack it operator by operator:

– map “panting” to its SDR:
– which is sparse matrix multiplication in disguise, where “encode” is the matrix, and “panting” is the only on bit in the input vector
sa: encode |panting>
|23> + |14> + |19> + |65> + |24>

– use our similarity measure to compare the panting SDR against all patterns defined with respect to the “sounds-it-makes” operator:
– in this case “cat” and “dog” making this a 2-neuron system
sa: similar-input[sounds-it-makes] encode |panting>
0.333|dog>

– apply the “sounds-it-makes” matrix to the “dog” vector:
sa: sounds-it-makes similar-input[sounds-it-makes] encode |panting>
0.333|23> + 0.333|14> + 0.333|19> + 0.333|65> + 0.333|24> + 0.333|43> + 0.333|33> + 0.333|85> + 0.333|99> + 0.667|44> + 0.333|26> + 0.333|13> + 0.333|9> + 0.333|39>

– use our similarity measure again, this time against all patterns defined with respect to the “encode” operator:
– in this case “miaowing”, “panting”, “purring”, “scratching” and “sniffing”, making this a 5-neuron system
sa: similar-input[encode] sounds-it-makes similar-input[sounds-it-makes] encode |panting>
0.4|scratching at the door> + 0.4|sniffing> + 0.333|panting> + 0.067|miaowing>

– remove from the SDR all elements with coefficient below 0.18:
sa: drop-below[0.18] similar-input[encode] sounds-it-makes similar-input[sounds-it-makes] encode |panting>
0.4|scratching at the door> + 0.4|sniffing> + 0.333|panting>

– apply the list-to-words operator that converts |a> + |b> + |c> + |d> to |a, b, c and d>:
sa: list-to-words drop-below[0.18] similar-input[encode] sounds-it-makes similar-input[sounds-it-makes] encode |panting>
|scratching at the door, sniffing and panting>

Here is the knowledge that we have in the background.
The “encode” matrix that maps concepts to random SDR’s:

sa: matrix[encode]
[ 1  ] = [  1  0  0  0  0  ] [ miaowing               ]
[ 8  ]   [  0  0  1  0  0  ] [ panting                ]
[ 9  ]   [  0  0  0  1  0  ] [ purring                ]
[ 13 ]   [  0  0  0  1  0  ] [ scratching at the door ]
[ 14 ]   [  0  1  0  0  0  ] [ sniffing               ]
[ 16 ]   [  0  0  1  0  0  ]
[ 19 ]   [  0  1  0  0  0  ]
[ 23 ]   [  0  1  0  0  0  ]
[ 24 ]   [  0  1  0  0  0  ]
[ 26 ]   [  0  0  0  1  0  ]
[ 33 ]   [  0  0  0  0  1  ]
[ 39 ]   [  0  0  0  1  0  ]
[ 43 ]   [  1  0  0  0  1  ]
[ 44 ]   [  0  0  0  1  1  ]
[ 46 ]   [  0  0  1  0  0  ]
[ 49 ]   [  1  0  1  0  0  ]
[ 57 ]   [  1  0  0  0  0  ]
[ 65 ]   [  0  1  0  0  0  ]
[ 85 ]   [  0  0  0  0  1  ]
[ 90 ]   [  1  0  0  0  0  ]
[ 95 ]   [  0  0  1  0  0  ]
[ 99 ]   [  0  0  0  0  1  ]

The “sounds-it-makes” matrix that maps animals to their SDR’s, made by adding the relevant concept SDR’s.
Cat is miaowing + purring + scratching at the door
Dog is panting + sniffing + scratching at the door

sa: matrix[sounds-it-makes]
[ 1  ] = [  1  0  ] [ cat ]
[ 8  ]   [  1  0  ] [ dog ]
[ 9  ]   [  1  1  ]
[ 13 ]   [  1  1  ]
[ 14 ]   [  0  1  ]
[ 16 ]   [  1  0  ]
[ 19 ]   [  0  1  ]
[ 23 ]   [  0  1  ]
[ 24 ]   [  0  1  ]
[ 26 ]   [  1  1  ]
[ 33 ]   [  0  1  ]
[ 39 ]   [  1  1  ]
[ 43 ]   [  1  1  ]
[ 44 ]   [  1  2  ]
[ 46 ]   [  1  0  ]
[ 49 ]   [  2  0  ]
[ 57 ]   [  1  0  ]
[ 65 ]   [  0  1  ]
[ 85 ]   [  0  1  ]
[ 90 ]   [  1  0  ]
[ 95 ]   [  1  0  ]
[ 99 ]   [  0  1  ]

Or, in my notation:
encode |purring> => |49> + |16> + |8> + |95> + |46>
encode |miaowing> => |90> + |1> + |49> + |57> + |43>
encode |scratching at the door> => |44> + |26> + |13> + |9> + |39>
encode |panting> => |23> + |14> + |19> + |65> + |24>
encode |sniffing> => |43> + |33> + |85> + |99> + |44>

sounds-it-makes |cat> => 2|49> + |16> + |8> + |95> + |46> + |90> + |1> + |57> + |43> + |44> + |26> + |13> + |9> + |39>
sounds-it-makes |dog> => |23> + |14> + |19> + |65> + |24> + |43> + |33> + |85> + |99> + 2|44> + |26> + |13> + |9> + |39>

Just a tweak to our operator to remove our input from the predicted output:

I-predict-from |*> #=> list-to-words drop-below[0.18] (similar-input[encode] sounds-it-makes similar-input[sounds-it-makes] encode |_self> + -|_self>)

sa: I-predict-from |panting>
|scratching at the door and sniffing>

sa: I-predict-from |purring>
|miaowing and scratching at the door>

sa: I-predict-from |scratching at the door>
|sniffing, miaowing, panting and purring>