Brainstorming: Modernizing HTM layer APIs


#1

As I’m working on my HTM framework, I have been using the same API design that NuPIC uses; most implementations seem to use it too. It works. But thinking about it, I’m wondering if there can be a better API design.

For example, the current TM/SP compute() method basically works as follows:

tm.compute(input_sdr, true); //true means learning enabled
auto predictive_cells = tm.getPredictiveCells();

This presents several problems:

  1. The compute() function can never be marked as const, because constness is fixed at compile time but the decision to learn is made at runtime.
  2. Even setting the first point aside, compute() still can’t be marked as const, because the method itself modifies the internal state of the TM (the currently active and predictive cells).
  3. Batch processing becomes impossible, as the state is modified on every call. This may or may not be a problem: for online HTM services, batch processing may bring extra performance to the system; for building AGIs, it doesn’t matter.

So I propose a more PyTorch-like API (requiring C++17):

auto [predictive_cells, active_cells] = tm.compute(sdr, last_active_cells);
tm.learn(last_active_cells, active_cells);
last_active_cells = std::move(active_cells);

Under this design, the compute() function (maybe it should be renamed to inference, predict, or forward?) can be marked as const; it doesn’t modify any internal state but returns the new cell states instead. So the compiler may be able to optimize it better, and batch processing becomes possible. It also simply feels more intuitive to me.
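A minimal sketch of the split described above, assuming a toy stand-in for a temporal memory. `ToyTM`, `CellSet`, and the first-order transition table are hypothetical names made up for illustration; they only show the shape of the API and are neither NuPIC code nor a real TM algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>

using CellSet = std::set<std::size_t>;  // hypothetical: indices of active cells

// Toy first-order sequence memory; NOT a real temporal memory.
class ToyTM {
public:
    // const: reads the learned transitions and returns the new cell states
    // instead of storing them in the object. The second parameter is unused
    // in this toy; a real TM would use it as temporal context.
    std::pair<CellSet, CellSet> compute(const CellSet& input,
                                        const CellSet& /*last_active*/) const {
        CellSet active = input;
        CellSet predictive;
        for (std::size_t cell : active) {
            auto it = transitions_.find(cell);
            if (it != transitions_.end())
                predictive.insert(it->second.begin(), it->second.end());
        }
        return {predictive, active};
    }

    // non-const: the only place where learned state changes.
    void learn(const CellSet& last_active, const CellSet& active) {
        for (std::size_t prev : last_active)
            transitions_[prev].insert(active.begin(), active.end());
    }

private:
    std::map<std::size_t, CellSet> transitions_;
};

// Usage: learn the sequence 1 -> 2, then check that 1 predicts 2.
inline void demo() {
    ToyTM tm;
    CellSet last_active;
    for (const CellSet& sdr : {CellSet{1}, CellSet{2}}) {
        auto [predictive, active] = tm.compute(sdr, last_active);
        tm.learn(last_active, active);
        last_active = std::move(active);
    }
    const ToyTM& frozen = tm;  // inference needs only const access
    auto [predictive, active] = frozen.compute({1}, last_active);
    assert(predictive == CellSet{2});
}
```

Because the caller owns `last_active`, several independent sequences can share one `const ToyTM&` in this sketch, which is what makes batching thinkable.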

The API design should hopefully be portable to other HTM implementations, as they fundamentally work in the same way. Everyone benefits from this discussion. I need your feedback. Is this a good API? What do you think? Can anything be improved?
Best,


#2

I like your idea for this potential optimization. You could add this improvement without making a breaking change to the API if you do it inside the compute() function. For example:

  • const runCompute(): runs the logic and returns the new state without altering anything
  • compute(): calls runCompute() and updates the state properly

This way the user never knows and optimizers can take advantage.
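One way this could look, as a rough sketch. `Layer`, `State`, `learnSteps()`, and the trivial logic inside `runCompute()` are placeholders made up to show the structure; only the names `compute()`, `runCompute()`, and `getPredictiveCells()` come from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical cell state; a real TM would track much more.
struct State {
    std::vector<std::size_t> active;
    std::vector<std::size_t> predictive;
};

class Layer {
public:
    // Pure step: returns the next state and changes nothing in the object.
    State runCompute(const std::vector<std::size_t>& input) const {
        State next;
        next.active = input;
        next.predictive.reserve(input.size());
        // Placeholder rule: predict the cell after each active cell.
        for (std::size_t cell : input)
            next.predictive.push_back(cell + 1);
        return next;
    }

    // Existing entry point keeps its signature and stores the result.
    void compute(const std::vector<std::size_t>& input, bool learn) {
        State next = runCompute(input);
        if (learn)
            ++learn_steps_;  // stand-in for synapse updates
        state_ = std::move(next);
    }

    const std::vector<std::size_t>& getPredictiveCells() const {
        return state_.predictive;
    }
    std::size_t learnSteps() const { return learn_steps_; }

private:
    State state_;
    std::size_t learn_steps_ = 0;
};

// Usage: peeking through a const reference leaves the stored state untouched.
inline void demo() {
    Layer layer;
    const Layer& view = layer;
    State peek = view.runCompute({3});
    assert(layer.getPredictiveCells().empty());
    layer.compute({3}, true);
    assert(layer.getPredictiveCells() == peek.predictive);
}
```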


#3

@marty1885 I am wondering why you do not use nupic.cpp from HTM-community? Many new ideas for a modern API are being implemented there…


#4

@thanh-binh.to I started my project for fun. No offense intended, but some NuPIC designs are bugging me. For example:

  1. Very inconsistent API. For example, this is the compute method for the SP:
virtual void compute(const UInt inputVector[], bool learn, UInt activeVector[]);

And for the TM:

virtual void compute(size_t activeColumnsSize, const UInt activeColumns[],
                       bool learn = true,
                       const vector<UInt> &extraActive  = {std::numeric_limits<UInt>::max()},
                       const vector<UInt> &extraWinners = {std::numeric_limits<UInt>::max()});

From my experience building HTM algorithms, I can understand why the TM is taking in indices and the SP is taking in binary arrays. And I understand why they are taking in raw pointers. But that’s annoying and inconsistent. It feels like a source of bugs and confused users down the line. This could be solved with the xtensor library (a C++ N-D array library that can work in place on NumPy arrays), but that would introduce a huge API change.

  2. Not really parallelizable

Most loops in NuPIC.cpp are not straightforward for loops. Most of them are loops with extra conditions or with dependencies between iterations. This makes parallelizing them very hard, if not impossible.

  3. AGPL.

Why limit yourself to AGPL even for non-commercial use? It makes using MIT/BSD code with NuPIC somewhat complicated. I know Numenta makes their profit this way. But NuPIC is free for non-profit use, and you need to pay for the patents for commercial use anyway.

  4. No preallocation in NuPIC.cpp

This is just a missed optimization opportunity, but NuPIC.cpp just pushes objects into a vector in a loop, which causes the vector to copy its data every time it hits its storage capacity.
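To illustrate the preallocation point in general terms, here is a small sketch that counts how often a `std::vector` has to grow its buffer with and without a prior `reserve()`. `countReallocations` is a hypothetical helper written for this example and is not taken from NuPIC.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how often push_back had to grow the buffer while appending n items.
std::size_t countReallocations(std::size_t n, bool preallocate) {
    std::vector<int> v;
    if (preallocate)
        v.reserve(n);  // one allocation up front, sized for the whole loop

    std::size_t reallocations = 0;
    std::size_t capacity = v.capacity();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.capacity() != capacity) {  // buffer was reallocated and moved
            ++reallocations;
            capacity = v.capacity();
        }
    }
    assert(v.size() == n);
    return reallocations;
}
```

With `preallocate` set, the loop never reallocates; without it, the count depends on the standard library's growth factor, so the exact number is implementation specific.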


#5

@marty1885 I believe that, except for the last one, all the remaining problems can be solved in nupic.cpp. @dmac is working on this topic…


#6

The last one should be the easiest to handle, and it is pretty performance critical. It’s just that there are a lot of them in NuPIC’s codebase.
The problems are definitely solvable. And I appreciate your work on maintaining the library. It’s amazing. But I think I’ll stick with what works for me until NuPIC.cpp is ready.


#7

@marty1885: OK, thanks


#8

Hi Marty,

I agree that the nupic API is difficult to work with. I too wrote my own HTM project, because I wanted to better understand how the theory works and also because nupic is big, complex, and, as you’ve pointed out, “very inconsistent”. For my project I came up with my own API solutions. Now I’m working to implement my solutions in the community fork of nupic.

“From my experience building HTM algorithms, I can understand why the TM is taking in indices and the SP is taking in binary arrays.”

One of the new features in the community fork is a class which handles Sparse Distributed Representations. It has a lot of cool features. Here is a link to an introduction for it: https://github.com/htm-community/nupic.cpp/wiki/Sparse-Distributed-Representations Huge DISCLAIMER: this isn’t a finished product, this is a preview. And of course comments are welcome.


#9

One potential problem with your idea for batch processing is that typically the HTM uses both the const inference and the mutable learning at the same time, since the HTM is always learning. Would batch processing have learning disabled?


#10

Sorry for the late reply.
So an API change is needed. In the proposed API, compute is always const, while state changes are returned to the user and synapse changes are applied by a separate function call.

auto X = get_some_sdr_in_batch();
auto predictive_cells = empty();
auto active_cells = empty();
auto last_active_cells = empty();
for(const auto& sdr : X) {
    // Cell states are returned to the user
    std::tie(predictive_cells, active_cells) = tm.compute(sdr, last_active_cells); //TM state doesn't change here
    if(learn)
        tm.learn(last_active_cells, active_cells);
    last_active_cells = active_cells; // carry the active cells into the next step
}
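For the learning-disabled case asked about above, batch inference could be sketched as follows. `StubTM`, `Sdr`, and `finalPredictions` are hypothetical names, and the stub's "predict the next index" rule is a placeholder rather than a real TM; only the `compute(sdr, last_active_cells)` shape comes from the proposal.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using Sdr = std::set<std::size_t>;  // hypothetical SDR type: active indices

// Stand-in with the proposed signature. It "predicts" the cell after each
// active cell and ignores the previous state. Not a real TM.
struct StubTM {
    std::pair<Sdr, Sdr> compute(const Sdr& input,
                                const Sdr& /*last_active*/) const {
        Sdr predictive;
        for (std::size_t cell : input)
            predictive.insert(cell + 1);
        return {predictive, input};
    }
};

// Runs inference over independent sequences. The model is only read, so each
// sequence carries its own cell state and the sequences do not interact.
std::vector<Sdr> finalPredictions(const StubTM& tm,
                                  const std::vector<std::vector<Sdr>>& batch) {
    std::vector<Sdr> out;
    out.reserve(batch.size());
    for (const auto& sequence : batch) {
        Sdr last_active;
        Sdr predictive;
        for (const Sdr& sdr : sequence) {
            auto [pred, active] = tm.compute(sdr, last_active);
            predictive = std::move(pred);
            last_active = std::move(active);
        }
        out.push_back(std::move(predictive));
    }
    assert(out.size() == batch.size());
    return out;
}
```

Since nothing inside the outer loop writes to `tm`, the per-sequence iterations are candidates for running in parallel; whether that pays off for a real TM is not something this sketch shows.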