Preliminary details about new theory work on sensory-motor inference

I did a rough back-of-envelope calculation a few weeks ago; it suggested that a CC L4 could learn 100,000 location/feature pairs. This is limited but not teeny. It could, for example, represent 1,000 objects with knowledge of 100 locations on each object. In a test system we should be able to show the learning of complete objects in a single CC; in a biological system it would almost certainly require some sort of feature decomposition in a hierarchy.
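The arithmetic behind that estimate can be sketched in a couple of lines. The 100,000-pair capacity and the 100-locations-per-object sampling density are the rough figures from the post; everything else here is illustrative:

```python
# Hedged back-of-envelope for single-column capacity, assuming each
# learned item is one location/feature pair (figures from the post).
pairs_capacity = 100_000        # rough L4 capacity in location/feature pairs
locations_per_object = 100      # sampled locations stored per object

objects_representable = pairs_capacity // locations_per_object
print(objects_representable)    # 1000 objects at this sampling density
```

The trade-off is direct: sampling objects more densely (more locations each) shrinks the number of distinct objects a single column can hold.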

You can get these goggles; we borrowed one for a few days at the Redwood Neuroscience Institute. You started noticing improvements in your performance within minutes. I have always assumed that the ability to learn the new reality was just a function of continuous learning, nothing special.

[quote=“BrainConstellation, post:46, topic:697”]
my intuition tells me that while all cortical regions have universal properties at all levels of the hierarchy, the representations (SDRs) that emerge at much higher levels of the hierarchy are for very different semantic objects and concepts than SDRs at the primary levels. In other words, all cortical regions at all levels of the hierarchy in the neocortex have universal functions, but the semantic content being represented in those regions differs very greatly, depending on the level in the hierarchy. And considering that we also know that the synapses of distal dendrites have varying degrees of permanence, it does not seem all that unlikely that every time we learn something new from our experiences, we may be adding entire new levels to certain hierarchical regions, perhaps by reassigning (rededicating) certain less-used regions. Do you think these ideas are plausible?
[/quote]

I also assume that the SDRs at different levels of the hierarchy represent different things. And neurons are always forming new synapses and deleting others. But the empirical evidence strongly says that the arrangement of regions in the hierarchy is not changing. All regions are being used all the time, but what the regions represent surely changes with every experience. The one exception I know of: I once read a paper about people born blind, with no input to the cortex from the eyes. What would normally be V1, V2, and V4 learned to represent touch and audio. However, the surprising thing was that the hierarchical connections between these regions became reversed, such that V1 was hierarchically higher than V2, etc. My guess is that this sort of wholesale rewiring can only occur during critical development periods right after birth and not in an adult. There are examples of adults whose lifelong blindness was reversed via surgery. They never learn to see properly. A fascinating book on this is “Crashing Through” by Robert Kurson.

Hi Jeff,
Thank you very dearly for your very helpful response. In order to understand the neocortex, it is indeed very important to understand the empirical evidence on which structures have higher permanence, or even life-long permanence. Based on your response, it seems to me that the large-scale hierarchies are all formed in very early childhood, some perhaps already during embryonic development. This would also help explain why learning a second or third language never quite reaches the level of performance of a native language learned in the first 7 years of life. I myself speak English and Spanish as native languages, and German I learned in college. I use German the most on a daily basis, because I live and work in Germany. While my abilities and fluency in German are almost as good as a native speaker’s, there are always little, hard-wired quirks, sometimes just in a given transition of phonemes or in the usage of a particular noun. Interestingly, if I focus all my attention on the one flaw, I can train myself to fully eliminate it. We humans are great imitators, and since I also have two native languages as a reference base, that enriches my pool of phonemes and grammars. I am fascinated by the plasticity (formability) we are endowed with, and yet the fixed structures we also have. Thank you very much for recommending the book “Crashing Through” by Robert Kurson; I will make sure to read it. Regarding my language use, I find it interesting that focused attention is enough to overcome any single flaw. (Getting them all right is another thing. Life is just not long enough, or motivation also has its limits. :slight_smile: With my current German, it is interesting that Germans ask me whether I am from Holland. I take this as a great compliment, since Dutch is very close to German.)
I am also very positively surprised to learn that the researchers at the Redwood Institute tested those inverse-vision goggles. That is one experiment that has never left my mind since high school, and I am now 50. On a more personal level, I wanted to share an interesting anecdote. Back in the 1970s and 1980s I attended American schools in Spain, which belonged to the Department of Defense (DODDS). My high school was inside a US Air Force base (Torrejon, near Madrid), and my mother taught elementary school on the same base. One day, while waiting to get a ride with her after sports, I went into the newspaper and magazine store called “Stars and Stripes” and saw the issue of Scientific American with the special on the brain and neuroscience. I dug in my pocket for all the cents and dollars I had and bought it, which was a rare exception, since I was subscribed to Discover Magazine. This issue, which you mentioned in several talks, also inspired me and was probably what turned this subject into a lifelong pursuit. Even when I studied Computer Science at the University of Connecticut, I had a special meeting with the Dean of Engineering and discussed getting an individualized major in AI, because in CS they only had expert-systems approaches and I insisted on a more biological and linguistic approach. But she succeeded in scaring me away from that path, with threats of having to get only straight A’s and never stopping until I reached a PhD. I regret every day that I did not take that path. Now in Germany, after working for an automotive giant in normal IT for years, I have been offered the chance to join a new lab for research in machine learning. I have opened a discussion on the paths toward machine intelligence and am advocating the HTM concept, while educating on the pitfalls of the other mathematical approaches. My transition to the lab has not yet taken place; it is still pending some approvals, for I need someone to replace me.
I dearly hope that I will get a chance to dedicate my work toward this goal. Joe

I believe that this thread has primarily concentrated on sensory input and its interpretation in the neocortex, but little on how to “transform” that into an action. I find through my own work (Connectomic AI) that sensory input to the cortex is a much simpler problem than motor output. When Numenta discussed transforms in the Office Meeting video, it took me to the same linear algebra I use in programming quadcopters. This could be very useful, but from my work, the problem of managing cortex output to a motor action is not trivial. In brief, and for what it’s worth, I have found, using nervous system emulations and robotics, that there are a number of things that need to be considered. To name a few:

  1. There is a right/left/dorsal/ventral component in sensory input and motor output that is built into animal nervous systems. There are “rules” that seem to govern the interconnections between these regional aspects. The symmetrical regions play a significant role in how we process our world and how we react.
  2. Just as sensory input is a reduction (a limited number of peripheral/sensory nerves) that expands into the neocortex (a much vaster number), the neocortex reduces into motor neuron/muscle output. There are billions of neurons in our brain but only 640 skeletal muscles.
  3. There is a temporal output (muscle) sequence for given types of input. In the connectome, this is pretty much wired so the network temporal dynamics can determine how we move our muscles. Deciphering this from a connectomic point of view is a true nightmare. There are several muscle movements that are the same at some point in a given action but we use those same muscle movements for different results. As an example, picking up a cup of coffee to take a sip could be similar to the same movement to pick up a marker to draw on the board.

For my neurorobotics/neuroapplications work, I have found that there are patterns of motor neuron to muscle output that I can capture in order to “decipher” and control the actions of what I want to accomplish. Even with a small set of muscles, I see distinct temporal patterns given specific sensory input (usually sensory input is in pairs, left/right or like SPL0001/SPR0001). I capture the resulting temporal sequences and then use those patterns to determine motor control. For example, one can get a pattern of MUSL001, MUSL002, MUSR001, MUSL002, … that has a completely different meaning than MUSL001, MUSL002, MUSL003, MUSL002, …; i.e. just subtle sequence differences can be the difference between lifting a cup of coffee or lifting a pen to write on the board. This is a tedious process but allows me to control a robot or app by keeping “built-in” control features in place and be able to override those features to obtain primary goals. As an example, I can put a prospecting robot in the desert to wander around with a metal detector attached to it and the connectomic structure can allow the robot to roam without getting stuck or falling off a cliff but I can override the roaming behavior when the detector finds a precious metal and I want to mark the GPS coordinates.
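As a rough illustration of the pattern-capture idea described above (this is not the actual Connectomic AI code; the muscle identifiers follow the post’s naming, and the pattern-to-action table is invented), matching a captured firing sequence against stored temporal patterns might look like:

```python
# Toy sketch: classify a captured motor-neuron firing sequence by exact
# match against known temporal patterns. Subtle sequence differences map
# to entirely different actions, as described in the post.
KNOWN_PATTERNS = {
    ("MUSL001", "MUSL002", "MUSR001", "MUSL002"): "lift_cup",
    ("MUSL001", "MUSL002", "MUSL003", "MUSL002"): "lift_pen",
}

def classify_sequence(firing_sequence):
    """Return the action whose stored pattern matches the captured
    temporal sequence, or None if no pattern matches."""
    return KNOWN_PATTERNS.get(tuple(firing_sequence))

print(classify_sequence(["MUSL001", "MUSL002", "MUSR001", "MUSL002"]))  # lift_cup
print(classify_sequence(["MUSL001", "MUSL002", "MUSL003", "MUSL002"]))  # lift_pen
```

A real system would need tolerant matching (timing jitter, partial sequences), but the table makes the core point visible: two sequences differing in a single element can mean entirely different actions.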

I feel that using the cortex alone as a means to determine output will not yield anything very interesting; you need other structures in place to create a true sensory-input-to-motor-output system. I.e., you have encoders and now you need decoders, but unlike sensory input, you either need one big decoder to handle all actions or a series of decoders that can handle simultaneous actions, and most likely a hierarchical array of decoders. I am following your efforts with great interest because it is a real problem and one I have been struggling with for the last few months. Although I have found a solution, I’m not that pleased with it from a biological POV.

Thanks,

Tim

I have a few questions:

  1. Could the “what” pathway include information about movement relative to the sensors, or should frame of reference have no effect on the representation?
  2. What sort of problems were there before the new ideas? I’m wondering if the conversion to object space needs to be before SP, TM, or TP.
  3. If I’m correct, LGN core-type cells only project to V1. Does that mean matrix cells are involved in the “where” pathway, V1 is responsible for both pathways, one pathway is higher in the hierarchy, or something else?

8 posts were split to a new topic: Synaptic Pruning

I just wanted to add that I have now finished reading “Crashing Through” by Robert Kurson. Thank you very much for the recommendation; the book is as informative as it is emotional. Kurson does a great job explaining this fascinating case study of long-term vision deprivation, as well as delivering a very touching real-life account of a very exceptional person. I enjoyed it both for its science and for the very emotional biographical account.

Regarding the neurobiological understanding of vision, I am currently reading a monograph called The Visual Neuroscience of Robotic Grasping, by E. Chinellato and A.P. del Pobil, Cognitive Systems Monographs 28, DOI 10.1007/978-3-319-20303-4_2, Springer International Publishing, Switzerland, 2016. This is very highly recommended reading for NuPIC collaborators, especially Chapter 2: The Neuroscience of Action and Perception. In it you get all the theoretical background on the two main visual information pathways, the ventral stream and the dorsal stream. It turns out that Ungerleider and Mishkin (1982) detailed the structures and labelled the ventral stream the “what” pathway and the dorsal stream the “where” pathway, a scheme that Goodale and Milner later recast as “what” versus “how”. This corresponds very closely to this latest discussion thread. Check this great document out if you are into the neuroscience. Joe

@BrainConstellation Thank you for that pointer - it does appear to be relevant to our current work. Looks like it was a PhD thesis and the PDF is available on the web. The problem of grasping objects reliably given visual cues requires a number of reference frame transformations that are analogous to the ones Jeff has discussed. The thesis contains a nice review of relevant neuroscience literature.

Hi Subutai, thanks for your feedback. I can indeed confirm that the PhD work you linked above contains the exact same material as the publication I referenced. It is very interesting, in-depth reading, in my opinion. Not being a neuroscientist, this reading takes me much longer to grasp (no pun intended). I would be very interested to read your opinion as to how the newest HTM research fits in with, and possibly complements, this publication and its collection of sources on sensory-motor research, like Rizzolatti and Luppino. Here is a direct link: http://old.unipr.it/arpa/mirror/pubs/pdffiles/Luppino-Rizzolatti%202000.pdf It is my impression that, while this thesis provides very comprehensive coverage of regional specialization in the cortex, mostly based on fMRI and TAM explorations, it does not go as deep into the columnar activity patterns in focus in HTM research. But it may provide some good tips on selecting specific cortical targets, allowing HTM researchers to focus more precisely on key regions for specific sensory-motor transformations. I certainly understand that this type of analysis would take a good deal of time. I would be very interested if, some day, we can read some comments on this. Joe
PS: Interesting that the PhD student is from Castellón, Spain. I am now living in Germany, but I have lived in Spain in the early years of my life and speak the language as a native speaker as well.

Just one question: would you reject (or discard) the two-streams hypothesis (Goodale & Milner, 1992), which postulates that visual information in the brain is processed along two parallel pathways, based on your current empirical evidence? Just curious to know where you stand on this. Thanks in advance.

Joe

@BrainConstellation Thanks for that reference. I do think it is pretty relevant. Another, more recent paper I read by Rizzolatti et al is [1]. Graziano’s work [2] is also very appropriate. One of the properties these studies show is that reference frame transformations must be going on somewhere between levels of the motor hierarchy.

For example, cells in M1/F1 code for movements that are very local, such as motion of a particular finger joint in a particular direction. But in area F4, cell firing is related to reaching a target location in global body coordinates. For example, regardless of where your hand happens to be, cell firing in F4 might correspond to moving your hand to the front of your mouth. F4 might be specific to a body part, such as your right hand. Cells in F5, however, can be independent of body part and might correspond to either hand going up to your mouth, or even to lowering your head until your mouth is in front of your hand. In order to actually reach a goal, such as putting food in your mouth, you have to be able to transform high-level intentions in body-centered reference frames down to individual joint movements. If you want to understand a movement, you have to go the other way. This is all very consistent with our recent work, where a cortical region represents multiple coordinate frames and must contain mechanisms for transforming between them.
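One way to make the “high-level intention down to individual joint movements” transformation concrete is the textbook two-link planar arm. This is only a toy sketch of a reference frame transformation, not Numenta’s proposed mechanism; the link lengths and target point are made up:

```python
import math

# Toy reference frame transformation: a body-centered goal ("hand at
# this x, y") converted into joint-level commands for a planar
# two-link arm, and back again.
L1, L2 = 0.3, 0.25  # upper-arm and forearm lengths in metres (hypothetical)

def joints_for_target(x, y):
    """Inverse kinematics: body-centered (x, y) -> (shoulder, elbow) angles."""
    c2 = (x * x + y * y - L1 * L1 - L2 * L2) / (2 * L1 * L2)
    theta2 = math.acos(max(-1.0, min(1.0, c2)))        # clamp for safety
    theta1 = math.atan2(y, x) - math.atan2(L2 * math.sin(theta2),
                                           L1 + L2 * math.cos(theta2))
    return theta1, theta2

def hand_position(theta1, theta2):
    """Forward kinematics: joint angles -> body-centered hand position."""
    x = L1 * math.cos(theta1) + L2 * math.cos(theta1 + theta2)
    y = L1 * math.sin(theta1) + L2 * math.sin(theta1 + theta2)
    return x, y

# Round trip: the body-centered goal and the reconstructed hand position agree.
goal = (0.35, 0.20)
angles = joints_for_target(*goal)
print(hand_position(*angles))  # approximately (0.35, 0.20)
```

Going “down” the hierarchy is the inverse problem (goal to joints); understanding a movement is the forward problem (joints to goal), matching the two directions described above.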

The general work on what/where (dorsal/ventral) pathways is also very relevant. As I see it, the what pathway represents objects and object centered information, whereas the where pathway represents ego-motion and body centered information. Separating out the two solves a bunch of scaling problems (see Jeff’s video at the top of this thread) but it also requires that you have mechanisms for transforming back and forth between the various frames. The latter is a difficult problem that we are currently working on.

–Subutai

[1] G. Rizzolatti, L. Cattaneo, M. Fabbri-Destro, S. Rozzi, Cortical mechanisms underlying the organization of goal-directed actions and mirror neuron-based action understanding, Physiol. Rev. 94 (2014) 655–706. doi:10.1152/physrev.00009.2013. http://physrev.physiology.org/content/94/2/655.long

[2] M.S.A. Graziano, Ethological Action Maps: A Paradigm Shift for the Motor Cortex, Trends Cogn. Sci. (2015). doi:10.1016/j.tics.2015.10.008. https://www.princeton.edu/~graziano/graziano_2015.pdf

3 posts were split to a new topic: Are there specifically mapped motor areas in M1

Thank you so much for that explanation! It makes much more sense after reading your comments on how the two pathways tie in with the HTM-laminar focus on layers 4 and 5. The only part that is difficult to reconcile between these two levels of abstraction is the fact that the two-stream theory proposes a regional specialization, while the HTM-laminar neuro-computational premise is that sensory-motor inference and motor commands are universal in all parts of the cortex. However, based on your answer, I can see that despite the universality of the laminar functions of the cortex, there can still be a split in the input paths (afferent nerve bundles of axons), which then forces the different regions to specialize on the different tasks (the “what” task and the “where” task).

The confusing, or just challenging, part to reconcile here is that in Jeff’s explanations above, at least in my first interpretation, most of the “what” and “where” transformations are being accounted for within the laminar structure (layers 4 and 5) of a single CC mini-region. This suggested to me that every single CC mini-region is already doing both the “what” and the “where” processing; then we get this two-stream theory with the dorsal and ventral streams, each specializing at a macro level of the cortex, which seems to suggest a hierarchical level of specialization in the separation of the “what” and the “where” processing. I probably missed some very important cues in Jeff’s video, which I will re-watch very soon; I learn a lot from each pass, because it is so full of information. But I take your words, “Separating the two (streams) solves a bunch of scaling problems”, to also imply that this separation of the streams does fit in with the laminar functions of layers 4 and 5, which are universal in all parts of the cortex, but are probably processing different “content” (the what and where inputs) depending on the macro-region they are in.
As you mention, the challenge then is finding the mechanisms for transforming back and forth between the various frames. Perhaps this is solved with some specific hierarchical organization. Then we should find some confluent regions in which the two are merged.

I will read up on your suggestions in the links and re-watch Jeff’s video to close my gaps in understanding this paradigm. Thanks for your pointers and comments.

Joe

I see your confusion. I think there are two separate issues, and it might be worth spelling them out. The first is that each stream has its own hierarchy. The what pathway has a well-documented hierarchy of increasing levels of abstraction for objects. The examples I gave for the motor cortex could be seen as examples of increasing levels of abstraction for body motions.

The second issue has to do with interactions between the two streams and the operations performed within every region and every level of the hierarchy. In the what pathway, converting body-centered information into object-centered information allows a region (independent of level) to make accurate predictions with a lot less training. Imagine recognizing a coffee cup by touching it with your fingers. If you do this conversion, your object representation can be independent of whether your finger is pointing down, up, or sideways. You don’t need to touch the cup in every finger configuration to form an accurate model, even though your sensations are actually completely different in different finger configurations. This is a huge win from a scaling standpoint.
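That scaling win can be sketched with a toy example (the object, locations, and coordinates here are entirely hypothetical): once a sensed location is converted into object-centered coordinates, a single stored model answers for every pose of the object relative to the body.

```python
# Toy illustration of object-centered conversion. The learned model
# maps object-centered locations to features (e.g. on a coffee cup).
cup_model = {(0, 0): "handle", (0, 5): "rim", (0, -5): "base"}

def object_centered(sensor_location, object_origin):
    """Convert a body-centered sensor location into object coordinates."""
    return (sensor_location[0] - object_origin[0],
            sensor_location[1] - object_origin[1])

# The cup sits in two different places in body coordinates; the finger
# touches the rim both times. One stored model covers both cases.
for origin, touch in [((10, 20), (10, 25)), ((-3, 7), (-3, 12))]:
    loc = object_centered(touch, origin)
    print(loc, cup_model[loc])  # (0, 5) rim, both times
```

A real cortical mechanism would also have to handle rotation and learn the transform itself; the sketch only shows why factoring out the body-centered frame avoids relearning the object for every pose.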

This second issue is more fine grained. It’s the focus of Jeff’s video and our current research. The hypothesis is that this conversion has to occur in every region, regardless of location within the hierarchy. The conversion must be happening in both what and where pathways.

Hopefully this helps clear up the confusion. I am still struggling with how to explain these concepts more clearly. :smile:

A post was merged into an existing topic: Are there specifically mapped motor areas in M1

Thanks for spelling out the separation of the two issues we are discussing. I can now see how they both tie into the object-centered transformations, while the two main pathways (ventral and dorsal) each have their separate tasks and goals.

I can imagine that the object-centered transformations require some involvement of object memory. Object recognition can only take place for objects that have previously been learned.

I am very excited about the new progress being made with sensorimotor inference! I’ve watched the Office hour and Jeff’s latest whiteboard talk a few times now to better understand it. I’ve been working on my own application of HTM for a couple years that I can’t wait to update with an implementation of ‘real’ sensorimotor theory. Recently I’ve been trying to model L4 sensorimotor temporal context by simply connecting inputs of motor command SDRs to distal dendrites of the L2/3 pyramidal cells, which I think is a naïve approach that Numenta attempted at one point (the idea came from old discussions on nupic-theory). In my application, motor commands consist of encoded representations of the function API of ELF executables. It’s still a work in progress. Trying to abstract this proprioceptive/somatic transformation to my non-biological application is mindbending.
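The naive approach mentioned above, motor-command SDRs on distal dendrites putting cells into a predictive state, can be sketched roughly as follows. The cell names, segment contents, and threshold are all invented for illustration; this is not NuPIC code:

```python
# Rough sketch: a cell becomes predictive when its distal segment
# overlaps the incoming motor-command SDR by at least a threshold.
ACTIVATION_THRESHOLD = 2  # matching distal synapses needed to predict

# Each cell's distal segment: the set of motor-SDR bits it has learned.
distal_segments = {
    "cell_A": {3, 17, 42},
    "cell_B": {5, 9, 51},
}

def predicted_cells(motor_sdr):
    """Cells whose distal segment overlaps the motor SDR enough to predict."""
    return {cell for cell, segment in distal_segments.items()
            if len(segment & motor_sdr) >= ACTIVATION_THRESHOLD}

print(predicted_cells({3, 17, 60}))  # {'cell_A'}
```

Real temporal memory cells carry many segments each and learn them online; the one-segment-per-cell version above only shows the overlap-and-threshold mechanic of using a motor command as distal context.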

Or those found to be close enough by an implied category or analogy?

[quote=“cogmission, post:75, topic:697”]
Or those found to be close enough by an implied category or analogy?
[/quote]
Yes, I would fully agree with that statement, since SDRs that overlap (even slightly) need to be closely related. In my opinion, it is the hierarchical level that determines whether it is the object category that matches or a more specific individual object. But I guess that in some specific regions, it may be the amount of overlap that determines whether only the general category is being recognized or a narrower identification is taking place.
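The idea that the amount of overlap could separate a category-level match from a specific-object match can be illustrated with toy SDRs (the bit sets and thresholds here are entirely hypothetical):

```python
# Toy SDRs: two mugs share category-level bits; a repeat sighting of the
# same mug shares (nearly) all bits.
mug_a  = {1, 4, 9, 16, 25, 36}
mug_b  = {1, 4, 9, 16, 49, 64}   # same category, different object
mug_a2 = {1, 4, 9, 16, 25, 36}   # the same object seen again

def overlap(sdr1, sdr2):
    """Number of active bits shared by two SDRs."""
    return len(sdr1 & sdr2)

CATEGORY_THRESHOLD = 4  # enough shared bits to call it "a mug"
OBJECT_THRESHOLD = 6    # near-total overlap: this specific mug

print(overlap(mug_a, mug_b))   # 4 -> category-level match
print(overlap(mug_a, mug_a2))  # 6 -> specific-object match
```

With realistically sparse, high-dimensional SDRs, even a modest overlap is statistically meaningful, which is what makes a graded category-versus-instance reading plausible.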

One of the difficulties I am seeing in trying to test my own theories is the fact that when position is encoded with feature, you tend to see a lot of overlap due to the cells representing the same position on different objects.

Now of course “position” can be encoded in the active columns, and “context” unique to different objects can be encoded in the active cells within those columns, but the difficulty comes when the position columns burst or have multiple predictive cells, and you end up encoding significant numbers of the same cells into multiple different object representations.

This is actually a similar problem to one I am seeing with sequence memory (which I mentioned on another thread), so I am guessing there is probably a common solution for both cases.