Cortical.io encoder algorithm docs

Is a description of the cortical.io encoder algorithm (word -> SDR) available somewhere, or is it a trade secret?
thanks

3 Likes

I’d like to know too!

I’d also like to know, if it’s out there somewhere.

Yes, their whitepaper is available on arXiv.
And if you didn’t know already, you can demo their API in the browser: api.cortical.io.

2 Likes

Brilliant! Thanks @alavin !!!

The exact algorithm is AFAIK proprietary, but it involves a sequence of steps that are simple applications of old ML ideas. First, the corpus of documents (imagine sections of Wikipedia pages) is turned into a bag-of-words. Then a simple TF-IDF process is applied to connect words and documents. Finally, a self-organising map (SOM) is used to produce a 2D representation for each word. The pixel coordinates in the SOM represent documents (really, documents grouped by the SOM), and the intensities are the TF-IDF scores for each document group. This is k-sparsified using global inhibition to produce a binary map, which is the Retina SDR.
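
Here is a rough sketch of that pipeline in plain numpy, just to make the steps concrete. This is my own reconstruction of the description above, not Cortical.io's actual implementation; the toy corpus, map size, sparsity, and the dot-product projection onto the map are all made-up placeholder choices:

```python
import numpy as np

docs = ["the cat sat on the mat",
        "dogs and cats are pets",
        "stocks fell on the market",
        "the market rallied after the news"]

# Bag-of-words: count of each vocabulary word in each document.
vocab = sorted({w for d in docs for w in d.split()})
w2i = {w: i for i, w in enumerate(vocab)}
counts = np.zeros((len(docs), len(vocab)))
for di, d in enumerate(docs):
    for w in d.split():
        counts[di, w2i[w]] += 1

# TF-IDF: each word ends up with a vector of scores, one per document.
tf = counts / counts.sum(axis=1, keepdims=True)
idf = np.log(len(docs) / (counts > 0).sum(axis=0))
tfidf = (tf * idf).T                       # shape: (n_words, n_docs)

# Tiny SOM over the document axis: each map cell learns a prototype
# document profile, so neighbouring cells become "document groups".
side = 8                                   # 8x8 retina (made up)
rng = np.random.default_rng(0)
som = rng.random((side * side, len(docs))) * 0.01
coords = np.array([(x, y) for x in range(side) for y in range(side)])
for t in range(2000):
    v = tfidf[rng.integers(len(vocab))]
    bmu = np.argmin(((som - v) ** 2).sum(axis=1))     # best matching unit
    dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
    lr = 0.1 * np.exp(-t / 2000)
    sigma = 2.0 * np.exp(-t / 2000)
    som += lr * np.exp(-dist2 / (2 * sigma ** 2))[:, None] * (v - som)

def word_sdr(word, k=8):
    """Project a word's TF-IDF profile onto the map and keep the top-k cells."""
    act = som @ tfidf[w2i[word]]           # intensity per document group
    sdr = np.zeros(side * side, dtype=np.uint8)
    sdr[np.argsort(act)[-k:]] = 1          # global inhibition / k-sparsification
    return sdr.reshape(side, side)

print(word_sdr("market"))
```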

This is based on what I’ve heard in public from people in cortical.io. Please correct any errors.

2 Likes

Thanks… what do you think of the following algorithm (off the top of my head, just looking for feedback)?

Two scalar encoders: one for the word number (from a dictionary built by simply collecting all the words of the corpus, sequentially) and one for the document number.

word_sdr = words.encode(word)
doc_sdr = docs.encode(doc_id)
sdr = join(word_sdr, doc_sdr)  # splice the two together

then use SpatialMapper/SOM (http://ifni.co/spatial_mapper.html) to train with those SDRs.
You have your word->SDR encoder, i.e. the Spatial Mapper.
(In the NuPIC case that would be the Spatial Pooler, I suppose.)
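
For what it's worth, here is a minimal runnable sketch of that proposal: two simple bucketed scalar encoders and a plain concatenation for the join. The encoder sizes and the toy corpus are my own assumptions, and the SOM / Spatial Pooler training step is left out:

```python
import numpy as np

def scalar_encode(value, max_value, n_bits=256, w=16):
    """Bucketed scalar encoder: w contiguous active bits whose position tracks the value."""
    start = int((value / max_value) * (n_bits - w)) if max_value else 0
    sdr = np.zeros(n_bits, dtype=np.uint8)
    sdr[start:start + w] = 1
    return sdr

corpus = [("doc0", "the cat sat"), ("doc1", "the dog ran")]

vocab = {}
for _, text in corpus:
    for word in text.split():
        vocab.setdefault(word, len(vocab))       # word -> sequential id

def encode(word, doc_id):
    word_sdr = scalar_encode(vocab[word], len(vocab) - 1)
    doc_sdr = scalar_encode(doc_id, len(corpus) - 1)
    return np.concatenate([word_sdr, doc_sdr])   # "join": splice the two together

sdr = encode("cat", 0)    # feed SDRs like this into the SOM / Spatial Pooler
```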

PS: What if the document ID were a 2D coordinate of document number and paragraph number within that document? Would that be beneficial?
In that case you might not need TF-IDF at all, maybe?

The inner workings of how Cortical IO builds the retina database are a proprietary algorithm. Without seeing the goods, I have to assume that Cortical IO forms a Self-Organizing Map (SOM) with the snippets (word pairs).

There are some guiding principles that I am using to come up with an online learning version that is compatible with HTM principles.

The mini-columns are not able to reach the entire map space with their dendrites. Yes, you could do this in a toy model, but we know that the brain does not, so I will try to discover the method it uses to do the same task. That is what we are doing here, eh?

This means that the encoder will have to distribute some factors of the perceived word over the map to be processed.

Extending this concept to how the brain does it, the word store unit (Wernicke’s area) has massive links to a grammar store unit (Broca’s area). This forms a template to parse the parts of speech.

One of the concepts I keep coming back to is the Visual Thesaurus tool. In my mind’s eye, I can see how the recall of related words would form links to context as the new words are introduced. As a long-time connectionist I can see how this would be implemented in neural networks.

It would be up to the “grammar unit” to place the activation in the correct parts of “word store” area as the word sounds are introduced.

The other bit is how to get a SOM function with Calvin Tile hex-grids.

I am thinking of an “inside-out” SOM. Inside-out from what? A normal SOM is a big map where activation is directed to some XY location in the map. The global connections are adjusted to shift the winning unit in concept space to pull it into some desired logical configuration. While the logical function is very desirable the implementation is not very biologically plausible.

The inside-out part is taking a large global pattern and tweaking the local connections to move the pattern in representation space. This pattern could cover a significant part of a single map. The cortex is all about local functions and distributed representation. What does that look like for Calvin tiles implemented with hex-grids?

[image: tiling]

Recall the basic signalling methods of a hex-grid. The left side is the “before” version, the right is the “after” version, with the red being the signal and the black being a reference grid:



Instead of adjusting map-wide global connections to form the SOM, what gets tweaked are the local connections that anchor the nodes of the hex-grid to particular mini-columns. These changes are distributed as learning at the local node level. The basic operation of a SOM is to reward the strongest winner AND partially reward the nearby units. In this case the nodes are physically nearby, but in the special sense of the phase spaces of the nodes defined by this signaling protocol; the training rewards are distributed to the mini-columns that are part of the resonating hex-grid and to the nearest neighbors of these nodes. The net effect is to add the columns that match the spacing/phase/angle of the imposed hex-grid pattern. Other stored patterns in the sea of mini-columns would not be affected.
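
To make that rule concrete, here is a hedged sketch in numpy: reward the mini-columns that are part of the resonating hex-grid, give a smaller reward to their immediate neighbors, and leave everything else alone. The array shapes, permanence increments, and the square neighborhood are my own placeholder choices, not a worked-out Calvin-tile model:

```python
import numpy as np

n_cols = 32 * 32                          # 32x32 sheet of mini-columns
perm = np.random.rand(n_cols, 128) * 0.2  # each column's local connection strengths

def reward_hex_grid(perm, grid_cols, input_bits,
                    winner_inc=0.05, neighbor_inc=0.01, radius=1):
    """SOM-style update confined to the nodes of one hex-grid pattern."""
    side = int(np.sqrt(len(perm)))
    for col in grid_cols:                 # columns resonating in the grid
        perm[col, input_bits] += winner_inc
        x, y = divmod(col, side)
        for dx in range(-radius, radius + 1):      # partial reward nearby
            for dy in range(-radius, radius + 1):
                nx, ny = x + dx, y + dy
                if (dx or dy) and 0 <= nx < side and 0 <= ny < side:
                    perm[nx * side + ny, input_bits] += neighbor_inc
    np.clip(perm, 0.0, 1.0, out=perm)     # other columns are untouched

# Example: a made-up set of grid nodes and active input bits.
reward_hex_grid(perm, grid_cols=[33, 37, 41, 161, 165, 169],
                input_bits=np.array([3, 17, 42, 99]))
```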

[image: 3mmarray]

The reason I say phase space is that this is a repeating pattern distributed over some larger areas. The larger pattern is built out of the cooperative efforts of the individual lateral connections in each sub-cell of the grid.

So in practice, the grammar unit steers activation to this part of the map; the responding units are sequentially activated by context words and then learn the new target word. The learned items are distributed according to part of speech (the action of the grammar unit) and sequential proximity in the corpus of text.

It goes without saying that the grammar unit is trained in parallel with the word store unit. I am thinking that the concepts discussed in the “Dad’s song” group apply here. I expect that high-frequency “sugar words” heavily influence the context in the grammar unit.

There are many details to be worked out but this is the basic concept I am chasing.

5 Likes