Intelligence is embarrassingly simple

Gentlemen (and Ladies),
I’m not gonna flabbergast you here, this is my last message - I promise.

Imagine a flexible LED matrix - like in a TV - but elastic. Instead of diodes
it is made of neurons. Let's say the left side is where the input sensors are.
The rows to the right are where activations go. Each of those rows dedicates
a new node for every new pattern it observes. For example, an "abc" input produces
2^3 patterns. Sounds awfully many? It does - the count expands as C(N, M), faster than exponential.

But do not fear :-), after a few rows it stops growing and starts shrinking, just
as C(N, M) suggests: N is the sensor size, M is the size of a combination.
The LED matrix takes a diamond shape, with simple [multimodal] patterns in the West and
complex patterns in the East.
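
To see the diamond quantitatively, here is a minimal counting sketch: for N binary sensors, C(N, M) patterns of size M are possible per row, growing up to M = N/2 and shrinking afterwards (N = 16 and the class name are mine, purely for illustration):

```java
import java.math.BigInteger;

// Counts how many M-element patterns N binary sensors can form per "row".
// C(N, M) grows until M = N/2, then shrinks - hence the diamond silhouette.
public class DiamondCount {
    static BigInteger choose(int n, int m) {
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= m; i++) {
            result = result.multiply(BigInteger.valueOf(n - m + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        int n = 16; // sensor size N
        for (int m = 1; m <= n; m++) {
            System.out.printf("combination size %2d -> C(%d,%d) = %s%n", m, n, m, choose(n, m));
        }
    }
}
```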

Human language defines around a few thousand labeled Eastern patterns - that is
an approximation of the diamond's size.

That describes "unsupervised" pattern learning: just collect frequencies.

Obviously, growing the "LED" matrix is a continual [Hebbian] process:
get new input - create new nodes; get recurrent input - increase synaptic
strength.
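
A minimal sketch of that bookkeeping, assuming nothing more than "unseen pattern -> new node, repeated pattern -> stronger counter" (class and method names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal Hebbian-style bookkeeping: new pattern -> new node,
// recurring pattern -> stronger "synapse" (here just a counter).
public class PatternMemory {
    private final Map<String, Integer> strength = new HashMap<>();

    public void observe(String pattern) {
        // merge() creates the node on first sight and increments it afterwards
        strength.merge(pattern, 1, Integer::sum);
    }

    public int strengthOf(String pattern) {
        return strength.getOrDefault(pattern, 0);
    }

    public static void main(String[] args) {
        PatternMemory net = new PatternMemory();
        for (String p : new String[]{"ab", "cd", "ab", "abcd", "ab"}) {
            net.observe(p);
        }
        System.out.println("ab: " + net.strengthOf("ab"));   // 3
        System.out.println("cd: " + net.strengthOf("cd"));   // 1
    }
}
```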

Now, each pattern represented by a neuron can be associated with classes
(or rewards) - think additional, dedicated synapses.

By design, every next input both modifies the Network and performs inference,
by using those "associated" synapses.

That is an insect level. The Net gets input - it responds.

By design, dedicated nodes (one node per pattern) can message laterally
(sideways), thereby generating new combinations of nodes (patterns) and
producing new, more complex combinations - meaning thinking without input.
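
A toy rendering of that lateral step, under my reading of it: two currently active pattern nodes propose a joint pattern with no new sensory input involved (the example patterns and names are invented):

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Two active pattern nodes combine laterally into a candidate pattern,
// without any new sensory input ("thinking" as recombination).
public class LateralStep {
    public static void main(String[] args) {
        Set<String> knownPatterns = new LinkedHashSet<>(Set.of("hot", "ice", "hot ice"));

        String a = "hot";     // active node A
        String b = "coffee";  // active node B
        String candidate = a + " " + b;

        if (knownPatterns.add(candidate)) {
            System.out.println("new combination created: " + candidate);
        } else {
            System.out.println("combination already known: " + candidate);
        }
    }
}
```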

And I must tell you - irresponsible thinking can produce pretty stupid
and dangerous new idea-combinations.

By design, dedicated nodes (one node per pattern) can also message backwards -
to "predict, inject context, hypnotize" - which undedicated nodes
cannot do.

That is pretty much it.

There are a few logistics problems:
how to grow that Net, how to contain its growth, how to assign and use associations,
and how to grow a motor Net.

Those are solvable. I’ve done it, so can you. Best!

2 Likes

Naturally, a working example would go a long way to demonstrating how well this works.
Perhaps in a robot?

2 Likes

No robots - cannot afford it. Text/images are nice proving grounds. Here are working examples:
https://youtu.be/_NjjKeGltBw (Sep 2020, breakthrough with finding a distance measure working in feature spaces ~10^6 dimensionality)
IMDB Reviews Dataset - YouTube (same thing - IMDB)
https://youtu.be/CJY0zgMBwb0 (Amazon sentiment)

Fresh repo (used as a backup; kind of messy, but working: classification, generation):

Last repo (a backup again, the simplest implementation: local continual Hebbian learning, no BP, a single epoch, structurally and synaptically plastic, emulating spiking messaging). It does online clustering of token streams (characters, words, tokens, integers). Want to play? Start the jar and tune the config file: number of streams and more.

Here is a video of how it works (on LinkedIn):

Here is a video of how the structural (LED) matrix is built:

Here are examples of generations (a nano-generator, ~200 lines of Java, on top of the trained "language model" - a stochastic parrot):

“Never again she be able to get a better look at the man who had been in the same way as the malfunctioning one.”(c)

“I suspect real gangbangers do not wear T-shirts outside on new years eve in northern Europe.”(c)

Some musings on how lateral messaging between dedicated neurons could work:

(side propagation :-) )

And if you are not tired yet, here is an attempt (#17) to describe what's going on and how it works:

Enjoy.

Images are not that obvious - I have no publishable posts about them. But I have no reason to mislead you - it works (trains, classifies, generalizes) on the same platform.

I'm leaving tomorrow morning for a week - so, do not miss me much; when I'm back I will respond :-)

2 Likes

That is out of context. I never said intelligence is LLM-like. I said that language might reside at the top layers (Eastern) of the structurally plastic net. Got to go, later.

1 Like

Right, sorry. I should’ve read more carefully.

I still think this applies:

How do you take an object which you've never seen before and mentally rotate it? Something to do with the more abstract patterns, but just taking patterns of patterns won't get you there, because there is a temporal aspect to it, or at least something more.

Have a good trip.

2 Likes

Interesting work, but the performance seems pretty lacking - 83% isn't a good score, especially with 1 BLN (Billion?) parameters - rule-based systems can trivially outperform that…

1 Like

Thank you for "interesting work". It was 87% accuracy on IMDB sentiment, plus ~40% on fine-grained sentiment (8 classes) while learning continually. The 87% is comparable to "vowpal wabbit" trained continually, but there are a few advantages: it was achieved on the first 10% of samples, and a net trained on IMDB recognized Amazon sentiment. The same engine processes images by converting them to a sequence of BLOBs. It's a big talk, got to go. BTW, 1 BLN (Billion, correct) parameters (synapses) were generated in an hour on a 128 GB RAM PC. The max I did was 18 BLN, by dumping to disk and then merging. I was pissed off by OpenAI(?) bragging about a couple of BLN trained parameters… think of knowledge transfer :-) (merging RAM and disk).
Anyways, have a good week everybody. Cheers!

2 Likes

I would like to pose a question: why does the net have to grow with new inputs?
Let's say we have 10 hours of active input getting into our brain - does the next 1 hour of input also get inside the brain? Why?

1 Like

If-except-if trees can generate viable sentences with correct spelling using almost no computational resources. You simply scan through the data once or twice and update some entries in a hash table. The context length is only a few characters, though.
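
For reference, the hash-table part can be sketched in a few lines: one pass over the text records which character follows each short context, and generation is repeated lookup (the sample text and context length of 3 are arbitrary):

```java
import java.util.*;

// One-pass character model in a hash table: context (a few chars) -> observed next chars.
// Generation is just repeated lookup; spelling comes out mostly right, meaning doesn't.
public class HashTableLM {
    public static void main(String[] args) {
        String text = "the cat sat on the mat and the cat ate the rat ";
        int ctx = 3; // context length: only a few characters

        Map<String, List<Character>> table = new HashMap<>();
        for (int i = 0; i + ctx < text.length(); i++) {
            String key = text.substring(i, i + ctx);
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(text.charAt(i + ctx));
        }

        Random rnd = new Random(42);
        StringBuilder out = new StringBuilder("the");
        for (int i = 0; i < 60; i++) {
            String key = out.substring(out.length() - ctx);
            List<Character> nexts = table.get(key);
            if (nexts == null) break;
            out.append(nexts.get(rnd.nextInt(nexts.size())));
        }
        System.out.println(out);
    }
}
```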
I see what you mean that there are potentially simpler LLMs that could beat neural-network LLMs. Given that less compute should be required, let the person who can imagine such a construct implement it and show proof.

Though I have to agree that showing proof by theory and implementation, by someone outside the social hierarchies involved, can simply not be enough.
But then the last laugh is on the higher-ups.

I was just thinking about how to explain one aspect of neural networks: how a composition of weighted sums of weighted sums of … differs from a simple weighted sum, since the composition is reducible to a simple weighted sum by basic linear algebra. The difference is that, during the simplification of a composition of weighted sums, each weight becomes the result of multiple terms multiplied together. That means there is exponentiation going on: some of the weights will decay down to zero and others will inflate.
If the weights in each weighted sum in a composition of weighted sums are drawn from the uniform random distribution, that isn't going to be the case after simplification.
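
A concrete check of that reduction: composing two weighted sums is one weighted sum whose entries are sums of products of the original weights, which is where the decay/inflation comes from (the numbers are arbitrary):

```java
// Composing two weighted sums W2(W1 x) is one weighted sum (W2*W1) x:
// each effective weight is a sum of products of the original weights,
// so repeated composition drives some weights toward zero and inflates others.
public class ComposeSums {
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, k = b.length, m = b[0].length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                for (int t = 0; t < k; t++)
                    c[i][j] += a[i][t] * b[t][j];
        return c;
    }

    public static void main(String[] args) {
        double[][] w1 = {{0.5, 1.5}, {2.0, 0.1}};
        double[][] w2 = {{0.9, 0.2}, {1.1, 0.3}};
        double[][] effective = multiply(w2, w1); // acts on x exactly like w2 applied after w1
        for (double[] row : effective) {
            System.out.println(java.util.Arrays.toString(row));
        }
    }
}
```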

1 Like

This is very interesting; I also think growing a connectome on new data, instead of training an arbitrarily large network, has several advantages.

Instead of (or before) published papers, maybe some intermediate description of the key concepts might attract interest from people willing to explore it.

1 Like

There are robot environments with various degrees of complexity in OpenAI's Gym. I am not sure whether there are any Java equivalents or APIs on top of it.

1 Like

There are a couple of people saying I cannot explain what I know. Let's see.
Let me know what point is difficult to understand.

a) The conventional neuron model suggests a neuron has
a set of presynaptic (input) weighted links (addresses, pointers, IDs)
with real-number inputs,
a set of postsynaptic links, and an activation function broadcasting a
real-number output.

I suggest that a neuron has two sets of presynaptic inputs:
a structural set of "binary" links, activating the neuron only
if all the inputs are "ones". That set answers the questions "what", "where" and "when".
The second set is conventional analog inputs, reflecting the value of
the neuron to its owner. Activation is the conventional linear combination,
but it fires only if the "structural" inputs are all "ones" (logical AND).
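
Here is how I would sketch that neuron in code: a structural AND gate over binary links plus a conventional weighted sum that only counts when the gate opens (class and field names are mine):

```java
// A neuron with two presynaptic sets, as described above:
// - structural binary links: the neuron is eligible only if ALL of them are 1 (logical AND),
//   answering "what/where/when";
// - conventional analog weighted inputs: the usual linear combination,
//   reflecting the value of the pattern to the owner.
public class TwoInputNeuron {
    private final double[] weights;

    public TwoInputNeuron(double[] weights) {
        this.weights = weights;
    }

    public double activate(boolean[] structural, double[] analog) {
        for (boolean link : structural) {
            if (!link) return 0.0; // structural AND gate not satisfied
        }
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * analog[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        TwoInputNeuron n = new TwoInputNeuron(new double[]{0.7, -0.2});
        System.out.println(n.activate(new boolean[]{true, true}, new double[]{1.0, 3.0}));  // ~0.1
        System.out.println(n.activate(new boolean[]{true, false}, new double[]{1.0, 3.0})); // 0.0
    }
}
```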

b) The neural net is organized in several layers with initially disconnected nodes.
Input creates connections. For example, "abcd" creates "ab" and "cd" on the first layer,
and "ab" + "cd" = "abcd" - the single second-layer node uniting "ab" and "cd".

The bigger the sensor, the more nodes it generates.
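
To illustrate b): adjacent patterns pair up into new nodes on the next layer until a single top node covers the input. A toy sketch; the strict adjacent-pairing scheme is my simplification:

```java
import java.util.ArrayList;
import java.util.List;

// Toy layer-building: adjacent patterns pair up into new nodes on the next layer,
// e.g. "ab" + "cd" -> "abcd", until a single top node covers the whole input.
public class LayerBuilder {
    public static void main(String[] args) {
        List<String> layer = new ArrayList<>();
        for (char c : "abcdefgh".toCharArray()) {
            layer.add(String.valueOf(c)); // layer 0: raw sensor symbols
        }
        int depth = 0;
        while (layer.size() > 1) {
            List<String> next = new ArrayList<>();
            for (int i = 0; i + 1 < layer.size(); i += 2) {
                next.add(layer.get(i) + layer.get(i + 1)); // new node uniting two children
            }
            if (layer.size() % 2 == 1) {
                next.add(layer.get(layer.size() - 1)); // carry an unpaired node upward
            }
            layer = next;
            System.out.println("layer " + (++depth) + ": " + layer);
        }
    }
}
```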

c) Here we must remember that the two major ML watersheds are Model Based and
Instance Based approaches (Google them). Point b) extends and exacerbates the
problems of IBL - but solves many more.
Instead of memorizing "instances", the Net memorizes all the "sub-instances"
of instances. That requires more memory (RAM), but it solves comparison of
instances of different length [irregularly sampled],
allows explicitly calculating Kolmogorov complexity
(as a function of the set of sub-instances of different sizes),
solves continual learning and knowledge transfer, employs local learning,
and allows associating a node with either an unassociated frequency [unsupervised],
a frequency associated with particular [multiple] classes [supervised], or
a frequency associated with complex [delayed] [multiple] rewards - reinforcement learning.
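
One payoff from that list, comparing instances of different length, can be illustrated crudely: overlap of sub-instance sets. Plain Jaccard here is only a stand-in for the engine's actual distance measure:

```java
import java.util.HashSet;
import java.util.Set;

// Compare instances of different length by the overlap of their sub-instance sets.
// Jaccard similarity here is only a stand-in for the actual distance measure.
public class SubInstanceSimilarity {
    static Set<String> subInstances(String s, int maxLen) {
        Set<String> subs = new HashSet<>();
        for (int len = 1; len <= maxLen; len++) {
            for (int i = 0; i + len <= s.length(); i++) {
                subs.add(s.substring(i, i + len));
            }
        }
        return subs;
    }

    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) inter.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> x = subInstances("the movie was great", 4);
        Set<String> y = subInstances("a great movie", 4);
        Set<String> z = subInstances("terrible service", 4);
        System.out.println("x vs y: " + jaccard(x, y));
        System.out.println("x vs z: " + jaccard(x, z));
    }
}
```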

d) Incarnated IBL seems impossible because of the number of potential
nodes/connections needed to form the "structural" net. Experiments show that
the exponential growth can be contained by removing irrelevant nodes and
with the help of the natural recurrence of patterns.
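
Point d) in miniature: nodes that do not recur often enough get pruned, so only naturally recurring patterns survive (the threshold and counts are invented):

```java
import java.util.HashMap;
import java.util.Map;

// Containing growth: nodes that do not recur often enough are pruned away,
// so only naturally recurring patterns survive. The threshold is arbitrary here.
public class PruneNodes {
    public static void main(String[] args) {
        Map<String, Integer> frequency = new HashMap<>();
        frequency.put("th", 120);
        frequency.put("he", 95);
        frequency.put("qz", 1);   // noise pattern, seen once
        frequency.put("xj", 2);   // noise pattern, seen twice

        int threshold = 5;
        frequency.entrySet().removeIf(e -> e.getValue() < threshold);

        System.out.println("surviving nodes: " + frequency.keySet());
    }
}
```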

e) Inference, made by activating a sparse number of "structural" nodes and
accumulating their analog "attitudinal" connections, makes stochastic inference
explainable. One knows "who/what" the voters are and "how strong" they vote. Tested in experiments.
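
And e) as code: the activated pattern nodes vote with their class-associated weights, and because every voter is a named pattern, one can print exactly who voted and how strongly (the patterns and weights are invented):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Explainable inference: each activated "structural" node is a named pattern
// carrying class-associated weights; prediction is their sum, and every vote is inspectable.
public class VotingInference {
    public static void main(String[] args) {
        // pattern -> weight toward the "positive sentiment" class (invented numbers)
        Map<String, Double> activatedVoters = new LinkedHashMap<>();
        activatedVoters.put("great movie", 0.8);
        activatedVoters.put("not bad", 0.3);
        activatedVoters.put("too long", -0.4);

        double score = 0.0;
        for (Map.Entry<String, Double> vote : activatedVoters.entrySet()) {
            System.out.printf("voter '%s' contributes %+.2f%n", vote.getKey(), vote.getValue());
            score += vote.getValue();
        }
        System.out.printf("total = %+.2f -> %s%n", score, score >= 0 ? "positive" : "negative");
    }
}
```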

f) Unlike Model Based Learning, IBL enables meaningful messaging between "structural"
nodes, because every node bears a particular pattern. At the top layers a node
bears a complex idea/concept. Communication between nodes then enables creating new
connections without sensory input. Backward connections support "context injection",
up to hypnosis; sideways messaging implements "thinking".
IBL bridges stochastic inference with symbolic inference - no MBL will ever do that.
By design.

g) All that messaging could be implemented as a spiking protocol, because spikes
can convey both "binary" (barcode) and analog (integratable) information.
That is the essence of the "neural code".
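
Point g), very speculatively, as a message that carries both a pattern identity (the "barcode") and an integratable magnitude (the analog part); the record fields are purely illustrative:

```java
// Speculative sketch of g): one spike message carries both a binary "barcode"
// (which pattern fired) and an analog value (spike count / rate to integrate).
// The field names are illustrative, not the actual protocol.
public class SpikeMessage {
    final long patternBarcode; // binary identity of the firing pattern node
    final int spikeCount;      // analog, integratable part

    SpikeMessage(long patternBarcode, int spikeCount) {
        this.patternBarcode = patternBarcode;
        this.spikeCount = spikeCount;
    }

    public static void main(String[] args) {
        SpikeMessage msg = new SpikeMessage(0b101101L, 7);
        System.out.println("pattern " + Long.toBinaryString(msg.patternBarcode)
                + " fired with strength " + msg.spikeCount);
    }
}
```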

h) Simple IBL containers can be implemented as a short program on a conventional
von Neumann PC with time complexity O(number of nodes). HW platforms like Intel
Loihi would learn and infer with time complexity O(number of layers).

In one sitting we are here addressing the "neural code" and Kolmogorov complexity,
enabling local continual training, dismissing principal differences between UL, SL and RL,
making inference explainable, and bridging the stochastic with the symbolic.

Many experiments/sources are listed above. What is not clear, I wonder?
I’ll keep my peace now.

2 Likes

This misses much of the known behavior of cells.
It's late and I need to go to bed, but I will just say that you miss known temporal behavior and some known spatial processing.
It may work like a fancy perceptron, and that is sufficient for many things, but I expect my neural models to do more.

2 Likes

Pardon my mind's slowness; one of the things I find hard to understand is the following, especially the highlighted part:

  • What/who is a neuron's owner?
  • From the following paragraph, it seems more like a threshold value meant to activate the whole neuron - much like a node's bias value, but a bias (like the weights) is just another trainable parameter, not an input. Or, in your case, you simply describe it as an AND function over binary inputs, which again is neither analog nor an input.

PS: it can be read as if the analog input is like an attention bias, so it allows the mysterious "owner" to look for/highlight specific input patterns. This could be powerful - a substrate to implement attention - yet it opens a Pandora's box of questions about who, and by what logic, is controlling the attention patterns.

1 Like

Well, if it works, it could be a base on top of which to implement either or both. As in transformers, a certain structuring/encoding of the input(s) allows them to process both spatial (images) and sequential (text) data, on top of and without having to change the fundamentals of the matrix dot product.

2 Likes

I was convinced the neuron's owner is the brain's owner (an intellectual entity?), which tends to adapt and survive.

Presynaptic nodes "A" and "B" must activate (or create) the node "AB", which is not how a bias works.
I can only reiterate my crude visualization:

1 Like

Yes… but having 1B parameters means nothing when NNs can do this task with ~70M params. At the billion-parameter regime we get generalist models which can learn in-context. Those capabilities are missing here.

1 Like

But those 70M parameters belong to "faceless" nodes associated with unexplainable real numbers - they mean nothing by themselves, just parameters of an equation.
My 1B parameters belong to "self-labeled" nodes associated with particular patterns (Instance Based Learning). The 1B can be reduced to 70M by careful selection; I've started that work, but I have to drive my truck to Texas tomorrow. Will finish when back :-).
The main point, though, is that there are interesting symbolic operations over "named" node-patterns, while "faceless" nodes will stay purely stochastic forever. A shortcut to a dead end. BP is not local.

1 Like

Intelligence is a product of the content of neural messages, not their form. Carbon-based protocols could be very different from silicon-based ones. Thus there are two valid approaches:
a) conventional-systemic: studying the "behavior of cells" and then understanding the neural code (carbon to silicon);
b) risky: conjecturing what content the messaging might convey and experimentally checking the completeness of the speculated code (silicon to carbon?).
You do a), I do b).
IMHO the set of "carbon behavior" features in my silicon implementation is the most complete at present. One single engine/platform that is: Hebbian-local, explainable (IBL), continually learning, multimodal, stable to drifts, structurally and synaptically plastic, and maybe something else. It is late, got to go :-). Oh, well, it also explains some psychological phenomena: curiosity, hypnosis, awe and sorrow. Too big a topic, so little time (c). Thanks for commenting, later!

1 Like

Ohh, you did write down how your program works, it’s right here: https://www.linkedin.com/pulse/myths-instance-based-learning-alex-semenov

That’s actually pretty neat!

If I could offer a piece of advice: you should take that blog post and turn it into a more complete article. I recommend putting a lot more effort into one great explanation, instead of trying to explain your ideas ten times to ten different people.

About that blog post:

  • I would add a sentence or two explaining the difference between instance-based and model-based learning.
  • Also, you should show a simplified example of how a sentence gets broken down into basic tokens and how the basic tokens get combined into more complex tokens. In the example, label each neuron to show what it represents. A simple picture/diagram would help the casual reader.
5 Likes