HTM Cheat Sheet

This is a community-created and maintained reference card on HTM theory and implementation, for easy lookup of common information. Please add to it if you have something interesting to contribute.
If you see a “?” and know the answer, please fill it in.

Rule-of-thumb numbers (in powers of 10):

| item | count | capacity | misc |
|---|---|---|---|
| pattern | ...10001010... | (2000 choose 40) | 2000/40 |
| synapse | equivalent to a bit | | 10 synapses to detect a pattern |
| dendrite segment | 100 synapses | 10 patterns | |
| neuron | 100 dendrite segments | 1000 patterns | 10,000 synapses |
| mini-column | 100 neurons | | ~50 µm wide |
| mini-column capacity | 100 n * 100 segs * 10 pat | 100,000 patterns | |
| macro-column (?) | 100 mini-cols | | ~500 µm wide |
| cortical column | 2000 mini-cols | 5M transitions | |
| cortex | 150,000 CC | 750B transitions | 20B neurons |
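
These figures multiply out as follows; a quick back-of-the-envelope check in Python (these are rules of thumb, not exact biological counts):

```python
from math import comb

# Number of distinct SDRs with 40 active bits out of 2000
# (the "capacity" entry for a pattern): on the order of 10**84.
print(f"{float(comb(2000, 40)):.2e}")

# Capacity chain from the table above.
patterns_per_segment = 10
segments_per_neuron = 100
neurons_per_minicolumn = 100
minicolumn_patterns = neurons_per_minicolumn * segments_per_neuron * patterns_per_segment
print(minicolumn_patterns)            # 100,000 patterns per mini-column

sparsity = 0.02                       # ~40 of 2000 mini-columns active
cc_transitions = minicolumn_patterns / sparsity
print(int(cc_transitions))            # 5,000,000 transitions per cortical column

print(int(150_000 * cc_transitions))  # 750 billion transitions for ~150k CCs
```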

Brain

  • Weighs ~3.3 lbs (1.5 kg)
  • About 2% of a human’s total body weight
  • Uses 20% of the body’s oxygen
  • Total number of neurons in the brain: ~86 billion
  • Energy consumption: ~20 W
  • 1 mm² contains ~170,000 neurons

Neocortex

  • ~75% of the brain
  • Total number of neurons in the cerebral cortex: 21-26 billion
  • Total number of neurons in the neocortex: ??
  • Total number of connections: ??
  • Total number of synapses: 100-180 trillion (depends on sex and age)
  • The neocortical sheet is about 1000 cm² in area and 2.5 mm thick
  • One mm² of cortex contains about 100K neurons and 1B synapses
  • The cortex is (horizontally) divided into 6 layers (actually more; counting sublayers, there are nine: 1, 2, 3a, 3b, 4, 5a, 5b, 6a, 6b)

Neuron

  • It’s also referred to as “cell” in the HTM theory
  • Pyramidal neurons are the most common type of neuron in the neocortex
  • The pyramidal neuron is the core information processing element of the neocortex
  • A pyramidal neuron has dendritic segments
  • Each dendritic segment possibly has multiple synapses (or, in layman’s terms, “connections”) to other neurons
  • Synapses are the “substrate of memory”
  • Number of synapses: 5,000 to 30,000
    • 10% proximal, 90% distal
    • 8-20 coactive synapses (enough to detect a pattern) generate a dendritic spike
  • Number of detectable patterns: ~hundreds
  • Average number of dendritic segments: ??
  • Average number of synapses per dendritic segment: ??

The following is a diagram of an HTM neuron (representing a real, biological pyramidal neuron of the neocortex).

Note: need to add apical dendrites and synapses to the cell model visualization.

Cortical Region

  • Although different regions of the neocortex process different inputs (vision, hearing, touch, language, etc.), at a fundamental level, these are all variations of the same problem, and are solved by the same neural algorithms.
  • The regions are defined by connectivity.
    • Regions pass information to each other by sending bundles of nerve fibers into the white matter just below the neocortex. The nerve fibers reenter at another neocortical region.
    • The connections between regions define a logical hierarchy.
  • Neuroanatomy tells us that every region of the neocortex has both sensory and motor functions.
    • Therefore, vision, hearing, and touch are integrated sensory-motor senses:
      • we can’t build systems that see and hear like humans do without incorporating movement of the eyes, body, and limbs.

Cortical Column

  • A cortical column is about 1.0-1.5 mm² in area and contains roughly 2000+ mini-columns
  • ~ 100k neurons
  • ~ 500M synapses (1 mm^2)
  • 10 cellular layers
  • It is also called hypercolumn or macrocolumn
  • In the current HTM model, 2048 mini-columns are associated with 1 spatial pooler (see the Spatial Pooler section below)
  • All cortical columns learn a complete model of the world from everything they are exposed to, and they all do it in parallel.
    • Each cortical column basically learns the same thing in parallel and votes (via layer 2 communication).

Mini-column

  • About 30-50 microns wide with 100-120 neurons across all 6 layers
  • Capacity: 100 neurons * 100 segments * 10 patterns = 100,000 patterns
    • For a mini-column field (say 2000 mini-columns):
      Capacity = (neurons * segments * patterns) / sparsity
      C = 100,000 / 0.02 = 5 million transitions
  • ~150M-200M mini-columns in the cortex
  • In the current HTM model, there are 32 cells per mini-column in the layer 3 implementation

Encoder

  • Takes sensor inputs and converts them into SDRs
  • HTM Examples: Scalar Encoder, Random Distributed Scalar Encoder (RDSE)
  • Takes stimuli from the environment and translates them into a stream of SDRs, analogous to the neural activity going to the brain.
  • An encoder takes some type of data (a number, a time, a temperature, an image, or a GPS location) and turns it into a sparse distributed representation that can be digested by the HTM learning algorithms.
  • The HTM learning algorithms will work with any kind of sensory data as long as it is encoded into proper SDRs.
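
As a concrete illustration, here is a minimal scalar encoder sketch in Python. It mimics the basic idea behind NuPIC’s ScalarEncoder (a contiguous block of ON bits whose position tracks the value), but the function and parameter names are illustrative, not NuPIC’s actual API:

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n=400, w=21):
    """Encode a scalar as an SDR: a contiguous run of `w` ON bits out of
    `n` total, positioned according to the value. Nearby values share ON
    bits, which is the overlap property the HTM algorithms rely on."""
    value = max(min(value, max_val), min_val)   # clip to the encoder's range
    start = int(round((value - min_val) / (max_val - min_val) * (n - w)))
    sdr = [0] * n
    sdr[start:start + w] = [1] * w
    return sdr

a, b = encode_scalar(30.0), encode_scalar(31.0)
print(sum(x & y for x, y in zip(a, b)))  # close values -> high overlap (here 18 of 21 bits)
```

The RDSE mentioned above achieves a similar effect without a fixed range, by assigning each value bucket a pseudo-random (non-contiguous) set of bits, with neighboring buckets sharing most of them.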

Biological Examples:

  • Vision: Retina
  • Hearing: Cochlea
  • Touch: Nerves

SDR

  • A data structure which represents the activity in the neocortex, which is sparse and distributed
    • It also represents the input (from the sensors, e.g. the eyes)
    • Hence, it can be thought of as “the data structure of the brain”
  • It can be implemented as a binary vector (or matrix), whose elements are either 1 (on) or 0 (off)
  • An iSDR (indexed SDR) is a different representation, which can be implemented as a tuple (size, idx1, idx2, …) where each ‘idx’ tells which bits are ON, e.g. 10100 <=> (5, 1, 3) with 1-based indices
  • The bits in an SDR representing an input (from a sensor) are associated with “features” of the input (i.e. some real or abstract object)
    • The 1 bits of an SDR represent the fact that the specific SDR (representing some specific object) contains the corresponding feature
    • The bits of other SDRs (e.g. the ones that the spatial pooler outputs) represent minicolumns
      • 1 (or 0) bits represent the fact that the corresponding minicolumn is active (or inactive)
  • An SDR is sparse because the percentage of 1 bits is very low compared to the percentage of 0 bits
  • SDRs have useful properties
    • High capacity (i.e., a lot of stuff can be represented)
    • Robustness to noise (because of their mathematical properties)
    • Efficient storage (as it’s sufficient to only store the “on” (or 1) bits)
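
A small sketch of these ideas: converting between the dense and indexed representations, and checking that overlap degrades gracefully under noise (indices here are 0-based, unlike the 1-based (5, 1, 3) example above):

```python
import random

def to_indexed(sdr):
    """Dense binary vector -> (size, idx1, idx2, ...) with 0-based indices."""
    return (len(sdr), *(i for i, bit in enumerate(sdr) if bit))

def overlap(a, b):
    """Number of ON bits two equally sized SDRs share."""
    return sum(x & y for x, y in zip(a, b))

print(to_indexed([1, 0, 1, 0, 0]))   # (5, 0, 2)

# Noise robustness: corrupt 20% of a sparse vector's ON bits and the
# overlap with the original stays far above what chance would give.
random.seed(0)
n, w = 2000, 40
on_bits = random.sample(range(n), w)
a = [0] * n
for i in on_bits:
    a[i] = 1
noisy = a[:]
for i in random.sample(on_bits, w // 5):
    noisy[i] = 0
print(overlap(a, noisy))             # 32 of the 40 bits still match
```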

Spatial Pooler

The following descriptions are, to some extent, a simplification of what the SP actually does.

  • It’s a learning algorithm
  • Its function is to identify common spatial patterns in the input
  • In the current HTM theory, it’s an algorithm that is “executed” in the L3 a/b layer
  • It receives an SDR as input and produces another SDR as output
  • It maintains a fixed sparsity:
    • the number of 1 bits in the SP output (which is an SDR) is always constant, even if the sparsity of the input SDR continuously changes
    • So it can be thought of as a “normalizer”
  • The meaning of the input SDR is maintained in the output SDR:
    • the overlapping properties of the input SDR are maintained in the output SDR
  • The bits in the output SDR of the spatial pooler represent minicolumns
    • The 1 (or 0) bits represent the active (or inactive) minicolumns
  • The minicolumn (associated with a bit of the output SDR) has a “potential pool”, which is a term used to indicate the set of bits of the input SDR which that minicolumn may be “connected” to
    • Initially, the potential pool of each minicolumn is usually randomly initialized:
      • In other words, the subset of input bits that may be connected to a specific minicolumn is, initially, randomly initialized
    • The potential pool can contain both 1 and 0 bits
  • Each of the bits of the input SDR that belongs to that potential pool has a numerical value associated with it, called “permanence value”
    • The permanence value determines if that input bit is going to be “connected” to that minicolumn or not
      • If the permanence value is greater than a “connection” threshold, then it is connected, otherwise, it is not
  • The number of connections between a minicolumn (associated with a bit of the output SDR) and the 1 (or “ON”) bits of the input SDR is called the “overlap score” (of that minicolumn for that specific input)
    • Given that the potential pool is randomly initialized, the overlap score changes from column to column
    • The overlap score also depends on the specific input SDR
    • Hence, for a specific input (an SDR), the overlap score of each SP minicolumn induces a ranking of the SP minicolumns
    • Minicolumns that have an overlap score over a certain threshold are called “active columns”, the other columns are called “inactive columns” (assuming a global inhibition area, i.e. every minicolumn is a neighbor of every other minicolumn of the SP)
  • Active columns can now “learn”
    • Inactive columns (i.e. the ones whose overlap score does not exceed the threshold) do not learn
  • Active columns learn by
    • incrementing the “permanence value” of the connections to the 1 bits (of the input), and
    • decrementing the permanence value of the connections to the 0 bits (of the input).
    • This implies the formation and destruction of connections (during the learning phase).
      • Which connections are formed or destroyed during the learning phase depends on several factors:
        • The specific inputs
        • The random initialization of the SP
  • “Boosting” and “inhibition” are regulatory mechanisms (the HTM counterpart of “homeostasis”) which balance the contribution of all minicolumns
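
To make the above concrete, here is a heavily simplified sketch of one SP step (global inhibition, fixed output sparsity, no boosting). All names and parameter values are illustrative, not NuPIC’s implementation:

```python
import numpy as np

rng = np.random.default_rng(42)
n_inputs, n_columns = 400, 128
n_active = max(1, int(0.02 * n_columns))        # fixed ~2% output sparsity

# Each mini-column's potential pool: a random 50% of the input bits,
# each with a permanence initialized near the connection threshold (0.5).
potential = rng.random((n_columns, n_inputs)) < 0.5
permanence = np.where(potential, rng.uniform(0.4, 0.6, (n_columns, n_inputs)), 0.0)

def sp_step(input_sdr, learn=True, threshold=0.5, inc=0.05, dec=0.01):
    connected = (permanence >= threshold) & potential
    overlap = connected.astype(int) @ input_sdr        # overlap score per column
    active = np.argsort(overlap)[-n_active:]           # global inhibition: top-k win
    if learn:                                          # only active columns learn
        on, off = input_sdr == 1, input_sdr == 0
        for c in active:
            permanence[c, on & potential[c]] += inc    # reinforce ON input bits
            permanence[c, off & potential[c]] -= dec   # punish OFF input bits
        np.clip(permanence, 0.0, 1.0, out=permanence)
    output = np.zeros(n_columns, dtype=int)
    output[active] = 1
    return output

x = (rng.random(n_inputs) < 0.05).astype(int)          # a sparse input SDR
print(sp_step(x).sum())                                # always exactly n_active ON bits
```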

See the HTM School videos on the Spatial Pooler for a more fluid (but still simplified) exposition of these concepts.

Temporal Memory

  • In the current HTM theory, it’s an algorithm that is “executed” in the L3 a/b layer
  • Implements a variable-order Markov chain over the input sequences
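
A toy illustration of the core activation rule (a correctly predicted column activates only its predicted cells, preserving sequence context; an unpredicted column “bursts”). Segment matching and learning are omitted, and the names are mine, not NuPIC’s:

```python
CELLS_PER_COLUMN = 32

def tm_activate(active_columns, predictive_cells):
    """Pick active cells from the SP's active columns, given the set of
    (column, cell) pairs that were depolarized (predicted) last step."""
    active_cells = set()
    for col in active_columns:
        predicted = {(col, i) for i in range(CELLS_PER_COLUMN)} & predictive_cells
        if predicted:
            active_cells |= predicted           # input was anticipated
        else:                                   # surprise: the whole column bursts
            active_cells |= {(col, i) for i in range(CELLS_PER_COLUMN)}
    return active_cells

# Column 3 was predicted (via cell 5); column 7 was not and bursts.
print(len(tm_activate([3, 7], {(3, 5)})))       # 1 + 32 = 33 active cells
```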

HTM Implementation Parameters

  • Num Columns (N): 2048
  • Num Cells per Column (M): 32
  • Number of active bits (w): 41
  • Sparsity (w/N): 2%
  • Dendritic Segment Activation Threshold (θ): 15
  • Initial Synaptic Permanence: 0.21
  • Connection Threshold for Synaptic Permanence: 0.5
  • Synaptic Permanence Increment and Decrement: +/- 0.1
  • Synaptic Permanence Decrement for Predicted Inactive Segments: 0.01
  • Maximum Number of Segments per Cell: 128
  • Maximum Number of Synapses per Segment: 128
  • Maximum Number of New Synapses Added at each Step: 32
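
For orientation, these roughly correspond to the constructor arguments of NuPIC’s TemporalMemory. The mapping below is a sketch (names follow nupic.algorithms.temporal_memory.TemporalMemory; verify against the NuPIC source for authoritative names and defaults):

```python
# Sketch: the cheat-sheet values expressed as NuPIC-style TemporalMemory
# parameters. Treat this mapping as approximate.
tm_params = {
    "columnDimensions": (2048,),        # N, number of mini-columns
    "cellsPerColumn": 32,               # M
    "activationThreshold": 15,          # θ, active synapses needed on a segment
    "initialPermanence": 0.21,
    "connectedPermanence": 0.5,         # connection threshold
    "permanenceIncrement": 0.1,
    "permanenceDecrement": 0.1,
    "predictedSegmentDecrement": 0.01,  # punishment for predicted-but-inactive segments
    "maxSegmentsPerCell": 128,
    "maxSynapsesPerSegment": 128,
    "maxNewSynapseCount": 32,           # new synapses added per learning step
}
```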

CC algorithm

  1. Motor input arrives before the sensory input and is processed by the location layer, which consists of grid cell modules. If this layer has an active location representation, it uses the motor input to shift the activity in each module, computing the sensor’s new location.

  2. This updated grid cell activity propagates to the sensory layer and causes a set of predictions in that layer.

  3. The sensory layer receives the actual sensory input. The predictions are combined with the sensory input. The new activity is a union of highly sparse codes, each representing a single sensory feature at a specific location that is consistent with the input so far.

  4. The sensory layer activity propagates to the location layer. Each module activates a union of grid cells based on the sensory representation. The location layer will then contain a union of sparse location representations that are consistent with the input so far.

After the fourth step, the next motor action is initiated and the cycle repeats.
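
A toy, runnable caricature of this cycle for a single grid cell module on a 1-D ring world. Everything here is invented for illustration and is far simpler than the real model described in Numenta’s columns papers:

```python
world = {0: "A", 1: "B", 2: "C", 3: "A"}    # feature at each location
n_locs = len(world)
candidates = set(world)                      # initially ambiguous: union of all locations

for movement, sensed in [(1, "B"), (1, "C"), (1, "A")]:
    # 1. motor input shifts every candidate location (path integration)
    candidates = {(loc + movement) % n_locs for loc in candidates}
    # 2. the shifted locations predict which feature should be sensed next
    predictions = {loc: world[loc] for loc in candidates}
    # 3. the actual sensory input prunes the predictions: keep only
    #    locations consistent with what was sensed
    candidates = {loc for loc, feat in predictions.items() if feat == sensed}
    # 4. the narrowed union feeds back as the new location estimate
    print(sensed, sorted(candidates))        # ambiguity shrinks as evidence accumulates
```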


I moved this from #other-topics to #htm-hackers.

Not a bad idea… maybe we change the title of this topic to “HTM Cheat Sheet” and make it a wiki?


Good idea. It should specify

  • the respective name of the parameter in Nupic,
  • its estimated size in the cortex, and
  • its suggested size in the implementation, especially whether it’s absolute or relative in nature (e.g. depending on the overall region size / # of columns).

I’ve actually been working on my own HTM notes to help wrap my head around the theory. Some of it may be direct quotes from members of this forum (Jeff comes to mind). It’s still very much a WIP, but I hope some of it is of benefit to the cheat sheet! I will be updating as I go so hopefully I’ll remember to update this post, too.

I moved the rest of this content up into the main wiki - Matt

Side Question: Why are permanence values floats from 0.0 to 1.0? Why not an int8 from 0 to 100?


@ddigiorg This is awesome! Thanks for sharing.

> Dendritic Segment Activation Threshold (θ): 15

I suppose this is the number of synapses per segment that need to be active (= permanence exceeds threshold) for the segment to become active, correct?

> Initial Synaptic Permanence: 0.21

Here my understanding was that the initial permanence would be a random value chosen around the threshold. Citing the SP whitepaper:

> Prior to receiving any inputs, the code is initialized by computing a list of initial potential synapses for each column. This consists of a random set of inputs selected from the input space. Each input is represented by a synapse and assigned a random permanence value. The random permanence values are chosen with two criteria. First, the values are chosen to be in a small range around connectedPerm (the minimum permanence value at which a synapse is considered “connected”).

The above refers to proximal synapse initialization, though; I assume distal would be the same.

> Synaptic Permanence Increment and Decrement: +/- 0.1
> Synaptic Permanence Decrement for Predicted Inactive Segments: 0.01

Why are two different increments/decrements needed? I didn’t see that in the whitepaper.

> Maximum Number of Segments per Cell: 128
> Maximum Number of Synapses per Segment: 128

If these are the maximum values, what are the values you start with?

Also, I’d expect these values to have a relationship to the overall number of cells. So if the region has 2048*32=65k cells, the number of synapses would be defined as 0.2% of the total cell count. The reason I’m stressing this point is that in the implementation, this actually should not be a separate parameter to be set because it’s automatically derived from the column/cell count.

> Side Question: Why are permanence values floats from 0.0 to 1.0? Why not an int8 from 0 to 100?

With ints you would always have only 100 steps, but with floats the range is endless, depending on how small you set the increment (learning rate).
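
That’s the trade-off in a nutshell; scaled integers behave identically up to quantization of the learning rate. A tiny illustration (hypothetical, not how NuPIC stores permanences):

```python
# Float permanence in [0.0, 1.0] vs. an int permanence in [0, 100]:
# identical behavior as long as the increment is a multiple of 1/100,
# but the int version cannot express a finer learning rate than 0.01.
perm_f, inc_f, connected_f = 0.48, 0.03, 0.50
perm_i, inc_i, connected_i = 48, 3, 50          # the same values, scaled by 100

perm_f += inc_f
perm_i += inc_i
print(perm_f >= connected_f, perm_i >= connected_i)   # True True: both connect
```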


Thanks @lindmatt! I got the HTM Implementation Parameters from the appendix of Numenta’s “Continuous Online Sequence Learning with an Unsupervised Neural Network Model”.

> I suppose this is the number of synapses per segment that need to be active (= permanence exceeds threshold) for the segment to become active, correct?

Yes, the Dendritic Segment Activation Threshold means there must be at least 15 active synapses (with permanence above the connection threshold) for the segment to become active.

> Here my understanding was that the initial permanence would be a random value chosen around the threshold.

I agree that the Initial Synaptic Permanence should be a random value chosen around the threshold. I’m not sure why it wouldn’t be, and I think it is actually implemented that way, but I’m not certain.

> The above refers to proximal synapse initialization, though; I assume distal would be the same.

The proximal synapses for each column are initialized to a random subset of input bits. However, the basal synapses for each cell are not initialized. Instead (please correct me if I’m wrong), basal synapses start with 0 segments per cell and 0 synapses per segment, and grow or die dynamically in the Temporal Memory algorithm, because initializing so many basal synapses all at once takes a looooong time and uses up a lot of memory.

> With ints you would always have only 100 steps, but with floats the range is endless, depending on how small you set the increment (learning rate).

True, but then again one could just use larger integers for a wider range, right? I thought ints are generally a faster representation (totally making this up and probably really embarrassing myself on a basic Comp Sci concept so I will have to research this) but I guess it doesn’t really matter in the end. It’s just a representation of bits after all.

EDIT: I should probably mention that everything I say should be taken with a grain of salt. I am still learning and it’s very possible I could be missing something. Please correct me if I’m wrong because I’d like to learn as much as possible!

Dave

@ddigiorg

> The proximal synapses for each column are initialized to a random subset of input bits. However, the basal synapses for each cell are not initialized. Instead (please correct me if I’m wrong), basal synapses start with 0 segments per cell and 0 synapses per segment, and grow or die dynamically in the Temporal Memory algorithm, because initializing so many basal synapses all at once takes a looooong time and uses up a lot of memory.

I think we’re talking about 2 different things here:

  1. Initial Permanence values – I was referring to both proximal and distal permanences being initialized with a random value around the threshold.

  2. Initial Segment + Synapse structure – you’re referring to how many segments and synapses per segment the algorithm starts with.

Of course, both are linked, i.e. if you do start from 0 segments and 0 synapses, you don’t need to initialize any permanences. :grinning:


Oops, you’re right. It’s midnight on a Friday night and my reading comprehension has been found wanting…

Time for beer. It kills the weaker brain cells first! :laughing:


Ok all, I converted @mraptor’s first post to a wiki and moved @ddigiorg’s content into it as well. It is now up to you to organize the content and plan how it should be structured. Use this thread to discuss the wiki, but treat the first post as the actual wiki content.


A Cortical Column is equivalent to a hierarchy of HTM regions, right?
And a mini-column is equivalent to a column inside an HTM region?


Yes, that’s right. Let me be explicit just in case I’m mistaken: a Cortical Column (a “macrocolumn” or “hypercolumn” in the biological neocortex) consists of many minicolumns. The HTM we’re used to, with a Spatial Pooler and Temporal Memory, is a model of layer 3 in a Cortical Column. In an HTM Region, many Cortical Columns are connected in parallel and “vote” on the Region’s output. It’s the Regions themselves that are hierarchical.

I definitely will need to make a better graphic to reflect this a bit more explicitly. So many things to keep in mind!


I thought it was the other way around!

An HTM Region (SP+TM) is 2048 mini-columns, and a CC is a hierarchy of HTM Regions.

I modified the graphic under the “Neocortex” heading. I think this is more clear.

In HTM, per my understanding, a Cortical Column has 2048 mini-columns. The Spatial Pooler selects active mini-columns across the entire CC. Temporal Memory replicates layer 3 neurons in the biological neocortex. The hierarchy comes from the connections between Cortical Regions.

I’d love to have someone from Numenta confirm or correct this.


Hey @Paul_Lamb, have you thought about updating this with more of the graphics you’ve created? You might post them here for discussion if you think they are relevant.

Sure, that’s probably a good way to discuss components of HTM and fact-check any assumptions. I’ll put some together to try and depict graphically where the HTM Implementation Parameters apply in the system.


Does somebody know those numbers?

Perhaps this paper could help:


3 posts were split to a new topic: Do minicolumns span layers?

9 posts were split to a new topic: Cell segments vs synapses

It should be noted and emphasized that some of these numbers (if not all) are rough estimates. For example, the figure of 86 billion neurons in the brain is an estimate (as stated in the BAMI book).
