Documentation on Dimensions, Links, LinkPolicy and splitter maps?


#1

I am working on nupic.cpp in the HTM Community Github, the community version of the nupic.core. I am adding an SPRegion class implemented in C++. In the process of writing the unit tests I find that the links are not working quite like I would expect in some of the corner cases. The problem is that I don’t know how they SHOULD work. There is one short paragraph about links in the Network API Guide but not enough to be useful. The source code and unit tests are not very helpful in understanding how they work either without a lot of reverse engineering.

Question: Is there another document someplace that talks about Links and how dimensions are set across a link, how the LinkPolicy should work, and what splitter maps can do? Or at least something that describes what features should exist in links?


#2

To complicate things, the Network API Guide contains this paragraph:

Note: in the current Region model, where a Region is a collection of nodes, a Link performs the node-level mapping between Output and Input. Specifically, it can generate a splitter map, or directly generate the input for a specific destination node. This functionality is not described here because it may be removed. A Link contains a LinkPolicy, also not described here, which is responsible for generating the splitter map.

Question 2: Should I simplify Links as part of this code effort and do that removal? Should links be just simple one-to-one output to input connections? What about automatic data type conversions?


#3

@David_Keeney please show us your code for unit test, so that I can better know how your link policy is used.


#4

My repository is a bit of a mess at the moment, but here is what I have


This is a CPP only (without python or C# interface) compiling on Visual Studio 2017.


#5

Sorry the docs in this area are not good. @breznak might be able to help when he comes back. I honestly don’t use NuPIC at that level so I don’t know.


#6

Thanks for your code. Could you please show me which line in the unit test failed?
From my experience there is no automatic data type conversion in the Network API, so the data type of an output must be identical to the data type of the input it connects to. As C or C++ programmers we sometimes assign a float variable to a double without a second thought, but that is not allowed with the C++ Network API.
Maybe check data type compatibility first…


#7

@David_Keeney sorry, I did not find your SPRegion implementation in the repository! If you need it, I can send you my implementation.


#8

My implementation for SPRegion is there… it is at nupic.core/src/nupic/regions/SPRegion.cpp
The unit test is at nupic.core/src/test/SPRegionTest.cpp

The only unit test that is currently failing is TEST(SPRegionTest, testSerialization) but that is because I have not completed the debugging of the YAML serialization.

I made the Link tests work, but to do so I had to guess how the Links SHOULD work. SPRegion uses singleNodeOnly = true; in the Spec, which appears to imply that the dimensions should be set to 1, which in turn implies a wildcard. With a wildcard, the actual Input buffer’s size comes from the Output of the source region. What I don’t know is what should happen when the far end is something that uses dimensions of various types. What should happen if the link type specification is something other than UniformLink? And what should happen if dimensions are specified for the link? With a wildcard dimension, could someone feed my SPRegion with more than one link to my input? The point is that if there were some documentation someplace I would not have to guess what the design intention was. If there isn’t, then I will have to spend a little more time studying the code and make my best guess as to how it was intended to work.

For one test I hooked up the VectorFileSensor and VectorFileEffector and got those to work. Another test I used a ScalarSensor to feed data to my SPRegion. I can make them work but I would rather not have to guess as to how it should work.


#9

As for the automatic conversions…
I did not see any automatic data type conversions in the links; however, in the Python code those conversions are hidden in the way numpy handles arrays (from sp_region.py):
inputVector = numpy.array(rfInput[0]).astype('uint32')
outputVector = numpy.zeros(self._sfdr.getNumColumns()).astype('uint32')

The SP algorithm uses a native input of UInt32 values. The Spec for the SPRegion says the input is Real32 so there must be a conversion someplace. I am currently doing it in the SPRegion::compute() just like the Python code does. Since the Link also does a copy, that means the input is copied twice before it reaches the SP algorithm.

What I was thinking is that if the link’s copy could also do type conversion then we could specify the Link type be UInt32 on the SPRegion side and Real32 on the encoder side and the conversion could happen as the Link makes its copy from the source Output buffer to the Destination Input buffer.
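
To make the idea concrete, here is a minimal sketch of a converting copy. The function name copyWithConversion is hypothetical and std::vector stands in for the real Output and Input buffers; nothing like this exists in the Link code today as far as I can tell. It assumes the source values are non-negative (an SDR of 0's and 1's), since converting a negative float to an unsigned integer is not well defined.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy a Real32 source Output buffer into a UInt32
// destination Input buffer, converting each element during the copy.
// Assumes every source value is non-negative and fits in 32 bits.
void copyWithConversion(const std::vector<float>& srcOutput,
                        std::vector<std::uint32_t>& destInput) {
  destInput.resize(srcOutput.size());
  std::transform(srcOutput.begin(), srcOutput.end(), destInput.begin(),
                 [](float v) { return static_cast<std::uint32_t>(v); });
}
```

With something like this in the Link, the data would be copied once instead of twice.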

Just thinking.


#10

My current focus:

As soon as I wrap up the SPRegion class I will start in on the TMRegion class to match that found in the Python code. Then I will do the same for the encoders and the classifier. The intention is to provide the modules needed so that someone could write some apps using C++ (no Python) using the Network API.

With these in place I could then finish the C# wrappers so that apps could be written in C# using the core library. Christian(@chhenning) has the Python 3.0 interface partly finished.

We removed the Cap’n Proto serialization and replaced it with YAML to simplify the code. We are using boost and std libraries to clean up a lot of the code, so just about everything is getting touched.
The intention is that this will all compile under Windows (Visual Studio 2017) as well as Linux.

Hopefully we will have something useful that we can check into the Community Github before too long. If anyone would like to help or has suggestions let us know.


#11

Nobody pointed out documentation that I missed, so I wrote up some text describing how I guessed that it worked after studying the code. I have concluded that the complicated stuff (the Link Policy and Dimensions) is all obsolete. I really hope I have it right.

Please review what I have here and unless I hear otherwise this is how the C++ version of SPRegion and TMRegion classes will work in the Community version of nupic_cpp.

Link
A Link is logically a connection from a source region’s Output to a destination region’s Input. Your application is created by declaring the functional building block (the regions), giving those regions the parameters that dictate their behavior, and then connecting them up with links that determine how data (normally SDR’s) will flow between your regions.

Each region implementation maintains a Spec structure which, along with the parameter definitions, defines each input data flow it expects and each data output it will generate. Each Output node and each Input node are the ends of a Link and each represent a buffer of data.

Some destination Inputs declared in the Spec are optional, but at runtime every Input specified as required must have a Link created for it before the network’s initialize() function is called. At initialization time the links are analyzed to check that everything is consistent, determine buffer sizes and types, allocate the input and output buffers, create the linking pointers, and determine the execution phases. The phases determine the order in which the Network will execute the regions so that the data flows smoothly from source to destination along the line of execution.

Each time the network performs a run cycle, it will do the following:

  1. Copy one buffer full of data from the Output node on the source to the Input buffer on the destination region for each link feeding that region.
  2. Execute the region.
  3. Then repeat for the next region until all regions have been executed in the order of the phases.
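
The steps above can be sketched as a small loop. This is a toy model under my own assumptions, not the nupic.core implementation: ToyRegion, ToyLink and runOnce are hypothetical names, the regions are assumed to be stored in phase order already, and the input buffers are assumed to be sized before the call.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for illustration; not the nupic.core classes.
struct ToyRegion {
  std::vector<float> input;                 // destination Input buffer
  std::vector<float> output;                // source Output buffer
  std::function<void(ToyRegion&)> compute;  // the region's work for one step
};

struct ToyLink {
  std::size_t src;     // index of the source region
  std::size_t dest;    // index of the destination region
  std::size_t offset;  // where this link writes within the destination input
};

// One run cycle over regions that are already sorted by phase.
void runOnce(std::vector<ToyRegion>& regions,
             const std::vector<ToyLink>& links) {
  for (std::size_t r = 0; r < regions.size(); ++r) {
    // 1. Copy each incoming link's source Output into this region's Input.
    for (const ToyLink& link : links) {
      if (link.dest != r) continue;
      const std::vector<float>& out = regions[link.src].output;
      assert(link.offset + out.size() <= regions[r].input.size());
      std::copy(out.begin(), out.end(),
                regions[r].input.begin() +
                    static_cast<std::ptrdiff_t>(link.offset));
    }
    // 2. Execute the region.
    if (regions[r].compute) regions[r].compute(regions[r]);
    // 3. The loop then moves on to the next region in phase order.
  }
}
```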

Declaration of Outputs and Inputs: In the Spec, each Output and Input definition has a unique name which identifies the node within the region. So a node is then uniquely identified by the Output/Input node name and the name of the region on which it resides. A Link is then created by calling Network.link() with the names of the source and destination regions and the names of the Output and Input nodes (along with LinkType and LinkParams). Note that the Output node or Input node names may be omitted from the call if they are the ones configured as the Default Output or Default Input respectively for their regions.

Data Type: The definitions in the Spec for each Output and Input will declare the data type that is expected or generated. The data types MUST be the same on both ends of a link so they are normally set to NTA_BasicType_Real32 which is somewhat universal. Often the data is actually an SDR which is just 1’s and 0’s but they are converted to the specified type to traverse the link.

Dimensions: The dimensions feature provided in the HTM engine was a means of defining a multidimensional array structure of buffers for Links. This feature is obsolete. Most region implementations set the field singleNodeOnly=true at the top of the Spec. This implies a fixed dimension of [1]. This means that an output has only one buffer and it is connected to a single input buffer with one-to-one mapping.
If a region still requires a dimension, specify a single dimension of [1].

Source Output Buffer width: The buffer width for any Output of any region can be defined in the corresponding Output definition in the Spec of the source region’s implementation (the ‘count’ field). However, if this is 0 (and it most often is), the ‘count’ field of the Spec Input definition at the other end of the link, in the destination region’s implementation, is checked. If that is also 0 (and it most often is), the width is determined by asking the source region’s implementation what it intends to generate, by calling the required function getNodeOutputElementCount(). If this returns 0, the buffer width cannot be determined and an error is generated.

Destination Input Buffer width: The input buffer width can also be set in the destination region’s Input Spec using the ‘count’ field. If the ‘count’ is 0 (and it most often is) then it uses the width of the source output buffer to which it is connected.
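
The fallback order for the two widths can be written down as a pair of small functions. This is only a sketch of the rules as I understand them: the function and parameter names are hypothetical, and regionReportedCount stands for whatever the source region returns from getNodeOutputElementCount().

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Resolve the element count of a source Output buffer.
// Order: source Spec 'count', then destination Spec 'count', then the
// value reported by the source region. A result of 0 is an error.
std::size_t resolveOutputWidth(std::size_t srcSpecCount,
                               std::size_t destSpecCount,
                               std::size_t regionReportedCount) {
  if (srcSpecCount != 0) return srcSpecCount;
  if (destSpecCount != 0) return destSpecCount;
  if (regionReportedCount != 0) return regionReportedCount;
  throw std::runtime_error("cannot determine output buffer width");
}

// Resolve the element count of a destination Input buffer fed by one link.
// A Spec 'count' of 0 means "take the width of the connected Output".
std::size_t resolveInputWidth(std::size_t destSpecCount,
                              std::size_t srcOutputWidth) {
  return destSpecCount != 0 ? destSpecCount : srcOutputWidth;
}
```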

Normally the links are configured such that an output from the source region is fed directly into the input of a destination region. However, an Output may be connected to multiple Inputs and an Input may be fed by multiple Outputs. This is determined by the Links that are created.

If multiple source Outputs are fed into a single destination Input, the Input buffer is essentially a concatenation of data from all connected source output buffers, concatenated in the order that the links were created. The link keeps track of its offset where it should put its contribution within the destination input buffer. The destination region just sees this wide buffer for its input.
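
A rough sketch of the fan-in case, assuming each link simply remembers an offset into the destination buffer. FanInLink, assignOffsets and gatherInput are hypothetical names, and std::vector stands in for the real buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical link record: which Output it reads and where it writes.
struct FanInLink {
  const std::vector<float>* source;  // the connected source Output buffer
  std::size_t offset;                // start position in the Input buffer
};

// Assign offsets in link-creation order and return the total Input width.
std::size_t assignOffsets(std::vector<FanInLink>& links) {
  std::size_t next = 0;
  for (FanInLink& link : links) {
    link.offset = next;
    next += link.source->size();
  }
  return next;
}

// Build the wide Input buffer the destination region would see.
std::vector<float> gatherInput(std::vector<FanInLink>& links) {
  std::vector<float> input(assignOffsets(links), 0.0f);
  for (const FanInLink& link : links) {
    std::copy(link.source->begin(), link.source->end(),
              input.begin() + static_cast<std::ptrdiff_t>(link.offset));
  }
  return input;
}
```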

Link Delays: A link delay, or propagation delay, is a feature which delays sending output buffers of data over the link for a specified number of time steps (number of network run iterations). At each time step the output is pushed into the bottom of the queue and the buffer at the top of the queue is sent. The delay buffers, if any, are initially populated with 0’s. This feature is specified by setting the ‘propagationDelay’ parameter of the Spatial Pooler.
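
The queue behavior can be sketched with a std::deque. This is a toy model under my own assumptions, not the actual Link code: DelayedLink and step are hypothetical names, and a delay of 0 is taken to mean the output is delivered in the same time step.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical delayed link: holds 'delay' buffers, all zeros at the start.
class DelayedLink {
public:
  DelayedLink(std::size_t delay, std::size_t width)
      : queue_(delay, std::vector<float>(width, 0.0f)) {}

  // Push this step's output in at the bottom of the queue and return the
  // buffer at the top, which is what the destination receives this step.
  std::vector<float> step(const std::vector<float>& output) {
    queue_.push_back(output);
    std::vector<float> delivered = queue_.front();
    queue_.pop_front();
    return delivered;
  }

private:
  std::deque<std::vector<float>> queue_;
};
```

With a delay of 2, the first two steps deliver zero-filled buffers and the third step delivers the output produced at the first step.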

LinkType and LinkPolicy: Originally the LinkType and LinkPolicy arguments to the Network::link() function controlled how a structure of buffers was distributed among multiple Output and multiple Input nodes over a single link between two regions. Now that each region has only one Input node and one Output node per link, this is no longer used. So for all links use a link type of “UniformLink” and a link policy of “”, which means just copy the source Output buffer to the destination Input buffer. Things like the TestFanIn2 link policy, Splitter Maps, and Region Level vs Node Level are all obsolete.


#12

Very thorough write up! This code isn’t modified too often these days and as you have discovered, some of the functionality is not currently used and may not even work. The one recent change was the addition of link/propagation delays.

Overall that’s a very nice write up. Perhaps you can find a place in the repo for this documentation? Or work with @rhyolight to get it into the docs that live at http://nupic.docs.numenta.org/


#13

Yeah, where should this stuff be in http://nupic.docs.numenta.org/stable/api/network/index.html ?


#14

Maybe somewhere in here:
http://nupic.docs.numenta.org/stable/guides/network.html


#15

Please review: