RE: Triadic Memory: why's everybody so excited?

Respectfully: a meaningful thread cannot exceed a couple of dozen messages.
Now, to memorize a sequence as an SDR, a net has to memorize a specific set (optimized or not, i.e. bigger or smaller) of overlapping subsequences, NOT substrings.

Example: abcdefgh is perfectly encoded by abcd, cdef, efgh. Only the original sequence will “excite” (hit) all of those substrings (substrings are simpler than subsequences), given that the “sensor” size is 4 and (!) {abcd} stays excited for 8 more ticks (cdef for 4) while the sequence moves against the sensor (that is what sustained spike trains do).
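
A minimal sketch of the example above, using plain substrings and ignoring the timing (the "stays excited for N ticks" part is not modelled). The function names `chunks` and `excited` and the stride of 2 are assumptions made for illustration only:

```python
def chunks(seq, size=4, stride=2):
    """Cut seq into overlapping substrings of length `size`."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, stride)]


def excited(candidate, stored, size=4):
    """Slide a sensor of width `size` over candidate and
    return the stored chunks it hits, in order of appearance."""
    windows = [candidate[i:i + size] for i in range(len(candidate) - size + 1)]
    return [w for w in windows if w in stored]


stored = chunks("abcdefgh")
print(stored)                                 # ['abcd', 'cdef', 'efgh']
print(excited("abcdefgh", set(stored)))       # all three chunks are hit
print(excited("abcdxxgh", set(stored)))       # only 'abcd' is hit
```

A corrupted sequence excites only some of the stored chunks, so the number of hits can serve as a crude match score.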

Memorizing a sequence as a set of subsequences effectively converts the input from the time domain into the frequency domain, if we are kind enough to also memorize the number of occurrences of each subsequence.
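
One way to picture that conversion is a rough sketch that counts subsequences (gaps allowed) inside a sliding window. The name `subsequence_counts` and the parameters `window` and `k` are hypothetical, and anchoring each subsequence at the first symbol of the window is just one convenient choice to avoid double counting:

```python
from collections import Counter
from itertools import combinations


def subsequence_counts(seq, window=4, k=3):
    """Count length-k subsequences (gaps allowed) seen through a sliding window.

    Each subsequence starts at the first symbol of its window, so it is
    counted once per start position in seq.
    """
    counts = Counter()
    for start in range(len(seq) - window + 1):
        win = seq[start:start + window]
        for rest in combinations(range(1, window), k - 1):
            counts[win[0] + "".join(win[j] for j in rest)] += 1
    return counts


counts = subsequence_counts("abcabc", window=3, k=2)
print(counts["ab"], counts["ac"])   # 'ac' is a subsequence but not a substring
```

The order of events in time is traded for a table of how often each pattern occurred.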

That’s all: I’ve been experimenting with that conversion for a long time, creating efficient stochastic machines like LLMs (sentiment, generation, classification) by collecting subsequences and their frequencies.

The key difference from Transformers is that all the nodes/entries of such hierarchical dictionaries are associated with particular multimodal patterns. Transformers’ nodes are not; they stay stochastic machines forever. Hierarchical dictionaries, as plastic layered networks with labelled nodes, enable different kinds of node manipulation (thinking, if you will). They explain lateral connections and are explainable themselves.

A POC is easy to implement; efficient hierarchical dictionaries are hard. Massive creation/deletion of nodes/connections calls for efficient RAM management. Conventional hashtables, trees or sorted arrays are not good enough. So Python is not the answer; there is a need for something like a garbage collector, so I use Java.

Even a simple POC implementation memorizes 100k-1M integer sequences in a space of 100M-10B frequencies, effectively compares sets/vectors of up to 1M integer components (as opposed to 20k-bit hyperdimensional binary vectors), implements SDR associative memory and is very much neuro-feasible. It solves the [stochastic] MLCS problem on multiple sources (not all MLCS, but many).

I can surely go on. I can explain how intuition/emotions can be modeled by those hierarchical dictionaries.
The question I’m trying to deal with now is “what’s next?”

One (or some) out-of-system individual(s) create a nice AGI theory and POC. Curiosity is satisfied. There is no single scenario where such a “group of dedicated guppies” can benefit from trying to make their discovery public. It is an individual vs. system problem: the system (academia, investors, companies) will feel disgraced and fight back. Poor individuals :(

Anyway: use sets of subsequences, count frequencies, inflate them when they recur, deflate them with time. That’s easy.
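
A minimal sketch of "inflate on recurrence, deflate with time", assuming a multiplicative decay per time step and a cut-off below which an entry is forgotten. The class name `DecayingCounts` and the `decay` and `floor` parameters are hypothetical, not taken from any existing implementation:

```python
class DecayingCounts:
    """Pattern weights that grow when a pattern recurs and fade otherwise."""

    def __init__(self, decay=0.5, floor=0.2):
        self.weights = {}
        self.decay = decay    # multiplicative deflation per time step (assumed)
        self.floor = floor    # entries below this weight are forgotten (assumed)

    def observe(self, patterns):
        """One time step: deflate every entry, then inflate what was seen."""
        for key in list(self.weights):
            self.weights[key] *= self.decay
            if self.weights[key] < self.floor:
                del self.weights[key]
        for p in patterns:
            self.weights[p] = self.weights.get(p, 0.0) + 1.0


memory = DecayingCounts()
memory.observe(["ab", "cd"])
for _ in range(3):
    memory.observe(["ab"])      # 'ab' keeps recurring, 'cd' never returns
print(memory.weights)           # 'cd' has faded away, 'ab' has grown
```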

I know it’s outside the topic of the thread you posted… but I want to know what kind of filter the brain uses to process information. Only the reduced information goes into the brain core (I suppose). And please give me any methods for creating favourable plasticity in a model.

Consider a binary sequence and a 3-unit sensor. There are 2^3 - 1 = 7 possible input patterns the sensor can pass to the next levels {001, …, 111}. You may represent those patterns as a directed graph with 3 and 7 vertices in two levels. You might also represent that graph as a dictionary with 7 entries with keys {001, …, 111}.
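
The two representations can be sketched as follows. The edge rule (an input unit connects to every pattern in which its bit is 1) is an assumption made for illustration, as are the variable names:

```python
from itertools import product

SENSOR = 3

# All non-empty binary patterns of the sensor: 001 ... 111 (000 is dropped).
patterns = ["".join(bits) for bits in product("01", repeat=SENSOR)][1:]

# Dictionary view: one entry per pattern; the value is left open here.
dictionary = {p: 0 for p in patterns}

# Graph view: level 1 holds the 3 input units, level 2 the 7 pattern nodes.
edges = [(unit, p) for p in patterns for unit in range(SENSOR) if p[unit] == "1"]

print(len(dictionary))   # 7
print(len(edges))        # 12, since each unit is active in 4 of the 7 patterns
```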

What values those dictionary entries contain is up to you: it could be as little as pattern frequencies (unsupervised), frequencies associated with a particular class (supervised) or levels of reward(s), be the reward a scalar or a vector (well, the reinforcement paradigm).
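
As an illustration of the supervised case, here is a rough sketch in which each entry holds per-class frequencies and classification is a simple vote. It uses character substrings of size 3 for brevity rather than subsequences, and the `train` and `classify` names are hypothetical:

```python
from collections import Counter, defaultdict


def train(samples, size=3):
    """Build pattern -> {class: frequency} from (text, label) pairs."""
    table = defaultdict(Counter)
    for text, label in samples:
        for i in range(len(text) - size + 1):
            table[text[i:i + size]][label] += 1
    return table


def classify(table, text, size=3):
    """Sum the class frequencies of every known pattern found in text."""
    votes = Counter()
    for i in range(len(text) - size + 1):
        votes.update(table.get(text[i:i + size], {}))
    return votes.most_common(1)[0][0] if votes else None


table = train([("good day", "pos"), ("bad day", "neg")])
print(classify(table, "goodness"))   # pos
print(classify(table, "zzz"))        # None: no known pattern was hit
```

Replacing the class labels with reward values would give the reinforcement variant of the same table.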

I omit neuro-feasibility for simplicity, but it is there.

Now, if you work with text, the number of combinations is 26^3 - 1 (in raw text it is ~100^3 patterns/nodes/entries). If you work with text at the word level it is ~300,000^3 (skip-3-grams). And a sensor of size 3 will not produce a meaningful dictionary (net). On the character level the [stochastic] meaning starts to emerge with a sensor of at least size 7: a 100^7 search space. The best for a personal computer is 11-12, though: best classification or generation. At size 13 a computer chokes as the combinatorial explosion catches up.
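
To get a feel for those numbers, the search-space sizes can be computed directly. This is plain arithmetic on the figures quoted above; the helper name `search_space` is hypothetical:

```python
def search_space(alphabet_size, sensor_size):
    """Number of distinct patterns a sensor of the given size can see."""
    return alphabet_size ** sensor_size


cases = [
    ("letters, size 3", 26, 3),
    ("raw characters, size 3", 100, 3),
    ("words, size 3", 300_000, 3),
    ("raw characters, size 7", 100, 7),
    ("raw characters, size 12", 100, 12),
    ("raw characters, size 13", 100, 13),
]
for label, alphabet, size in cases:
    print(f"{label}: {search_space(alphabet, size):.1e}")
```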

So you start active “pruning” to keep your dictionary-net within RAM constraints. You may build hierarchical dictionaries (add layers to your net) and/or keep your net in external memory. I’ve worked with 18-character-long patterns (not substrings! mind the wildcards) on a 24GB PC.
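
The simplest form of such pruning could look like the sketch below, assuming that "least frequent" is the discard criterion and that a fixed entry budget stands in for the RAM limit. Both are assumptions, as is the `prune` name:

```python
import heapq


def prune(counts, max_entries):
    """Keep only the max_entries most frequent patterns."""
    if len(counts) <= max_entries:
        return counts
    keep = heapq.nlargest(max_entries, counts.items(), key=lambda kv: kv[1])
    return dict(keep)


counts = {"abc": 5, "bcd": 1, "cde": 3, "def": 1}
print(prune(counts, 2))   # {'abc': 5, 'cde': 3}
```

Calling it only when the dictionary crosses a size threshold keeps the cost of the selection pass off the hot path.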

The short motto is “do not filter anything when perceiving; filter when you have an idea of what is important and what is not”. I’ve seen your random-connection-principle-based architecture: you might try connect-everything-first-then-discard. The combinatorics is manageable, though not trivial.

Hope that helps.

By using dictionaries and graphs with vectors, is it possible to retrieve 50% of information that is somewhat related to the input information (by providing only 50% to the model)? In short, can 50% of the input retrieve 100% of the output, where 50% of that output is somewhat related to the input information and the other 50% is totally related to it?
How do we make the model conscious of the information it is receiving?
By conscious I mean a sense of differentiating, grouping, eliminating duplication, sort of…

That’s right, unfortunately it got too long.
Since I just received a rewarding badge for over 50 clicks on that thread, here’s why I think it happened.

  • Peter Overmann (@POv here) bothered to write a relatively comprehensive paper about that idea, and by coincidence I worked on a related algorithm to understand what it does. I usually have a hard time understanding most code or papers.
  • The concept behind it is quite simple, easy to code and understand. That’s why quite a few folks here found the time and disposition to replicate it in several programming languages. Having a first Python example in ~150 lines helped.
  • The algorithm works (within its limits) and has a relatively good tolerance to noise/errors.

Regarding your performance/scalability issue: without understanding your idea, I gather that it is related to hash tables and garbage collection.

  • a preliminary question that might help in figuring out a solution is whether the limitation stems from memory bandwidth, memory latency, compute or simply space limits. It is probably a combination of (hopefully no more than) two of the above, with one of these four factors dominant.
  • hash tables are designed to be precise and not to forget. An unused (or least used) element you want to get rid of must first be identified and then explicitly deleted, which comes with significant overhead.
  • that’s why I think they’re sensitive to multithreading. Special care must be taken (locks? pipelines?) if you want several threads to use an ordinary hash table simultaneously. That impacts performance for both compute and bandwidth.

So, do you need your dictionary to be perfect, i.e. error-intolerant? What if, instead of garbage collecting, you simply write, accepting the risk of occasional but statistically insignificant errors? E.g. if two features, one “significant” and one “insignificant”, collide on the same memory address…

  • if “significant” means “repeating often” and “insignificant” means “once or twice”, the chances that one reads the right feature could be sufficient.
  • if “intelligence” needs sampling billions of locations, then it has to be tolerant of a certain level of confusion, forgetting or noise. Think of people seeing red in nines and blue in twos: it’s a clear unintended overlap, but rare and at a tolerably low level.
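
One possible reading of "just write and accept the occasional error" is a fixed-size table with no explicit deletion, where a colliding write weakens the resident entry and takes the slot over once its count reaches zero. This is a sketch under that assumption, not a description of anyone's actual implementation; the `LossyCounter` name is hypothetical and `zlib.crc32` is used only to get a deterministic hash:

```python
import zlib


class LossyCounter:
    """Fixed-size frequency table that tolerates collisions instead of resolving them."""

    def __init__(self, n_slots):
        self.keys = [None] * n_slots
        self.counts = [0] * n_slots

    def _slot(self, key):
        return zlib.crc32(key.encode("utf-8")) % len(self.keys)

    def write(self, key):
        i = self._slot(key)
        if self.keys[i] == key:
            self.counts[i] += 1          # same feature: reinforce
        elif self.counts[i] == 0:
            self.keys[i] = key           # free or worn-out slot: take it over
            self.counts[i] = 1
        else:
            self.counts[i] -= 1          # collision: weaken the resident entry

    def read(self, key):
        i = self._slot(key)
        return self.counts[i] if self.keys[i] == key else 0


table = LossyCounter(n_slots=1)          # a single slot forces every write to collide
for key in ["sig"] * 5 + ["noise"] + ["sig"] * 5 + ["noise"]:
    table.write(key)
print(table.read("sig"), table.read("noise"))   # 8 0
```

The frequent feature survives with a slightly understated count, while the rare one is simply not remembered; memory use stays constant and nothing ever needs to be garbage collected.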