Chaos/reservoir computing and sequential cognitive models like HTM

I think the power of chaotic systems is that, once trained properly, they can generate novel but still plausible instances of their input.

I believe what @robf is saying is more like constructive/destructive interference of ripples generating “groups”, but I think there also has to be nonlinearity and damping in the system.


Yes. The ripples analogy goes like this: when you throw a stone into a lake, the initial impact (size, speed, direction, lake shape, etc.) determines a unique pattern of ripples that attenuates in time. If you throw several stones at different points, their patterns overlap. By measuring wave states at several random “output” points in the lake, the output pattern tends to be unique to the input sequence of stones. While you cannot easily compute the initial conditions, similarity between inputs tends to propagate towards their respective outputs, so you can train a simple linear classifier/regressor to identify input temporal patterns.

In practice there are many implementation ideas, from fixed non-trainable RNNs to physical analog circuits. Even a damped pendulum was shown to exhibit reservoir properties: [2201.13390] Machine Learning Potential of a Single Pendulum


This leads me to a deeper, more philosophical wonder: that “similarity” seems to be an effect imposed on the observer through perception, some qualia, rather than any innate property of the “inputs”. That is to say, you “feel” the inputs are “similar” in order for them to be so, and the mechanism that produces your “feeling-of-similarity” is what really matters, regardless of any other relationship (however physical/true/significant otherwise) between those inputs.

Then, as human feelings converge at large, so do the semantics conveyed by utterances (regardless of which surface language they are in); it’s no wonder some unhuman mechanisms can converge similarly?

It may certify the false faith in back-propagation w.r.t. AI today, but I still feel the discovery of such unhuman mechanisms is not guaranteed, with HTM or other biologically plausible/inspired attempts.


Novel predictive power, yes. This novel recombination mechanism can never have the same predictive power as physical reality. But it can find “objects” in that world. And the ability to resolve the world into different “objects” has a novelty of its own.

I guess this is what we see when we appreciate great art. We see new ways of resolving the world. These new ways in which an artist resolves the world don’t exceed the power of the world to generate its own destiny, but they can be things that the world has not bothered to spell out by explicitly grouping them until the artist, or simply original thinking person, comes along.

There’s nothing pessimistic in being the agent which expresses a new “object” or “truth” just because it was always latent in the world. It’s the fact it was latent in the world, but not expressed before, which makes it great art.

The ripples won’t so much have interference, because they don’t have to all occur at the same time. It only becomes interference if you try to “learn” them, like a transformer. And I imagine a transformer just hashes them all, distinguished by different contexts, and keeps them separate that way.

So I think transformers deal with this “interference” of contradictions by hiding it from us.

That’s why transformers work so much better than GOFAI. GOFAI tries to learn a single underlying pattern. The interference is what washes symbolic representations out, and makes GOFAI ineffective.

Ah, that was a twist on echo state machines I didn’t know: this combination of states to reproduce a sequence.

Anyway, to the extent they reduce to a lookup “hash”, that puts more credit on this recombination mechanism I posit evolved to work on top of them: the “Google low-light photography”-like “stacking” mechanism. In that sense it will only be their dynamical nature which is directly significant to what I am talking about in their operation.


Yeah! Great art as well as beautiful scientific discoveries, they come to you, according to no one’s plan. Comforting enough, at least.


Well, with a “stacking” mechanism, “similarity” could be whatever you want in principle, yes. That then generates the “feeling-of-similarity”, so you might be right in that sense. But that’s not the same as saying we humans get to choose what “feeling-of-similarity” we’re going to perceive. I think we’re locked in to this shared context/prediction grouping property. That’s something specific. We will only perceive objects in the world if they share some kind of predictive destiny. Like the different atoms in a football, say. In fact, and also perceptually for our visual system, what makes them an object will be that the different parts of a football tend to share contexts and predictive effects.

But that’s not the only way it could be done. Another of my favourite references on this is some work on “reverse engineering” I saw in a talk some years back. In it, the presenter proposes dynamic, generative, relational principles to generate new perceptual “objects” (as I recall: locality preservation, and some kind of fractal transform… or maybe it’s the fractal transform which is the locality-preserving one!). It’s only because the way he determines “objects” is a dynamical, generative property, like the grouping principles I am proposing, that I see what he is doing as an analogue to human perception. These “objects”, and the relational principles on which they are generated, are totally non-human. Humans cannot see them. But he finds them useful for other purposes, purposes humans have not evolved to carry out effectively. For instance, he uses them to find images in binary data, or cryptographic keys: stuff humans are completely blind to, at best learning a few common patterns with time. It’s a completely inhuman relational principle. Like the mechanism I’m suggesting for human perception, it also constantly generates new “objects”, but these are non-human objects.

I think of it as an example of how an alien intelligence might differ from ours (at least in the sense of the way it might resolve the world into perceptual objects).

Christopher Domas The future of RE Dynamic Binary Visualization


Just a comment on this @cezar_t. What you say about reservoir computing needing to be below the level of chaos may be true because reservoir computing is just a hash. But the “enhancement” I suggest developed on top of it would not just be hashing. It would actually be “stacking” past events in new ways. I think these ways could be different attractors of a chaotic system. There’s no reason why they shouldn’t be, anyway. Because you do want new structure with these new “stackings”. With reservoir computing you probably don’t want actual new structure. You just want a maximally sensitive hash to known structure. But with this enhancement, it might (occasionally?) step over the line and generate new structure. And this new structure could be at once new, chaotic, and meaningful: meaningful because there would be meaning in the way of generating novel combinations.


Sounds like this guy is putting a lot of effort into making hacking/reverse engineering actually look like what we see in movies… I love it.


Yeah. Very cool. And I love thinking about the objects he generates as perceptual objects for some potential alien being.

But the really cool thing will be when we implement this for a system which works like human vision!

Something which generates “objects” that correspond to objects that people see. Generating them. Novel objects. The way the objects Domas generates are novel, but still meaningful.

Something that doesn’t depend on trying to label every last speck of dust on the planet so it can be “learned” beforehand, the way we do now.

I see what Tesla is doing, heaps of money thrown at artificial situation generators, to generate beforehand every conceivable thing that might happen, so every conceivable thing can be “learned” before it happens.

Or language transformers, only novel by interpolation over stuff that’s been said before.

So much money and effort being wasted.


Similarity in this context is about metrics, i.e. a measurable distance between two arbitrary data points.
Define some metric for inputs, and some (same or different) metric for outputs. Then “similarity is preserved” simply means that whenever

Din(A,B) < Din(C,D)

then also

Dout(f(A),f(B)) < Dout(f(C),f(D))

where

  • A,B,C,D are arbitrary input data points,
  • f() is a function that propagates similarity e.g. f(X) is the output of input X,
  • Din(X,Y) is distance between X and Y points from input space,
  • Dout(U,W) is distance between U and W points from output space

Sure, the actual metrics can be arbitrary, but when you apply them consistently and the function f() really preserves similarity, you might be surprised to notice it does so for many distance functions.
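As a concrete sketch of this idea (all names and sizes here are illustrative), one can check numerically how well a map f() preserves the *ordering* of distances. Here f is a fixed random linear projection, which is known to approximately preserve Euclidean distances:

```python
import numpy as np

rng = np.random.default_rng(0)

# f(): a fixed random linear projection from 100-D input space to 10-D output space.
# Random projections approximately preserve pairwise distances.
W = rng.normal(size=(10, 100)) / np.sqrt(10)

def f(x):
    return W @ x

def din(x, y):   # input-space metric: Euclidean
    return np.linalg.norm(x - y)

def dout(u, w):  # output-space metric: also Euclidean here, but it need not be
    return np.linalg.norm(u - w)

# Count how often the ordering Din(A,B) < Din(C,D) survives in the output space.
trials = 1000
preserved = 0
for _ in range(trials):
    A, B, C, D = rng.normal(size=(4, 100))
    if (din(A, B) < din(C, D)) == (dout(f(A), f(B)) < dout(f(C), f(D))):
        preserved += 1
print(f"ordering preserved in {preserved}/{trials} trials")
```

For random Gaussian points the ordering survives well above chance despite the 10x dimensionality reduction; a map that did not preserve similarity at all would sit at 50%.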

PS: hash functions in general do not preserve similarity; e.g. cryptographic hashes are explicitly designed to do the opposite. Those that do are locality-sensitive hashes.
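A minimal illustration of the locality-sensitive case (a SimHash-style construction with random hyperplanes; everything here is a toy example, not any particular library's API):

```python
import numpy as np

rng = np.random.default_rng(1)

# SimHash: k random hyperplanes; each bit records which side a vector falls on.
# Nearby vectors (small angle between them) tend to share most bits.
planes = rng.normal(size=(32, 50))

def simhash(x):
    return (planes @ x > 0).astype(int)

def hamming(a, b):
    return int(np.sum(a != b))

x = rng.normal(size=50)
x_near = x + 0.05 * rng.normal(size=50)   # small perturbation of x
x_far = rng.normal(size=50)               # unrelated vector

near_diff = hamming(simhash(x), simhash(x_near))
far_diff = hamming(simhash(x), simhash(x_far))
print("bits differing, near:", near_diff)
print("bits differing, far: ", far_diff)
```

The nearby vector differs in almost no bits, while the unrelated one differs in roughly half of them, which is exactly the "locality-sensitive" property a cryptographic hash deliberately destroys.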

Also, there is a bit more to reservoirs than just hashes. They can be used for prediction and for generating sequences that look like their inputs. Which means they are able, when paired with simple learners, to model the “outside world”.

Let’s say we have a time series with a single scalar value, e.g. temperature, which is fed into a reservoir, and the reservoir’s state (or output) is used to train a simple regression ML model that predicts the next temperature reading. So here we have prediction. And we can feed the predicted temperature back in as input to predict the temperature at the next time step, and so on. Of course, with each new loop the predicted temperature drifts further from the “real” one, but if we compare plots/charts of the self-predicted loop (imagined future) with real data, we often see they look similar, which means the reservoir-predictor pair (not the reservoir alone) is modelling (mimicking) the outer process that produces the fluctuations in temperature.
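A toy version of this setup can be sketched in numpy (the reservoir size, constants, and the sine-wave "temperature" series are all illustrative choices, not prescriptions):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200  # reservoir size

# Fixed random reservoir, rescaled so its spectral radius is below 1
# (the usual echo-state heuristic).
W = rng.normal(size=(n, n))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
Win = rng.normal(size=n)  # fixed random input weights

def step(x, u):
    """One reservoir update for scalar input u."""
    return np.tanh(W @ x + Win * u)

# A toy "temperature" series: a noisy sine wave.
T = 2000
u = np.sin(np.arange(T) * 0.05) + 0.01 * rng.normal(size=T)

# Drive the reservoir with the series and record its states.
x = np.zeros(n)
states = np.zeros((T, n))
for t in range(T):
    x = step(x, u[t])
    states[t] = x

# Train a ridge-regression readout: state at time t -> value at time t+1.
washout = 100  # discard the initial transient
X, y = states[washout:-1], u[washout + 1:]
Wout = np.linalg.solve(X.T @ X + 1e-3 * np.eye(n), X.T @ y)
print("one-step training error:", np.mean(np.abs(X @ Wout - y)))

# Closed loop ("imagined future"): feed each prediction back in as input.
preds = []
val = x @ Wout  # prediction for the first unseen step
for _ in range(200):
    preds.append(val)
    x = step(x, val)
    val = x @ Wout
preds = np.array(preds)
```

Only `Wout` is trained; the reservoir itself stays fixed and random. Plotting `preds` against a continuation of the real series is the comparison described above.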


Not quite true for what I’m proposing @cezar_t . For most of what is done for training conventional neural nets, yes. But not for the similarity I’m proposing.

The similarity I’m proposing is one of shared context.

I don’t think shared context (connectivity) can be plotted as distance between points in a space.

I might be wrong. Perhaps you can show me how.

Like the clusters in the Brazilian paper. Can the clusters in the Brazilian paper be characterized as having similar distance between two points?

I think the difference is something like that between a cosine or dot-product and a cross-product. The cross-product doesn’t project dimensions on each other, it generates new dimensions. Things are similar, but similar in a new way. Not similar by sharing some old way.


I just clarified what a function that preserves similarity from inputs to outputs means, to keep people informed on (my understanding of) how reservoir computing is used in many, if not most, research works.
This meaning isn’t limited to neural nets. E.g. it is used in dimensionality reduction, reverse image search, and word embeddings (GloVe, word2vec).

Why not? E.g. GloVe explicitly defines similarity between two words as how often they’re found in the same or close contexts. The resulting vectors (aka embeddings) of synonyms are very close to each other.

I’ll have to check the Brazilian paper you mentioned.

Anyway, however you define similarity, if you want that definition to be useful, you’ll have to be very specific so it can be measured/computed.

PS: I guess the Wikipedia page on reservoir computing is more useful for illustrating the core ideas and the various approaches to implementation.


What is reservoir computing?
Reservoir computing is a type of machine learning technique that is used for processing sequential data. Unlike traditional neural networks, which require significant amounts of training data and time to adjust their weights, reservoir computing uses a fixed, randomly generated set of connections between input and hidden layers called the “reservoir.”

The input data is fed into the reservoir and allowed to reverberate through the network, resulting in a complex pattern of activations that can be read out by a linear output layer to produce the final output. The output weights are the only part of the network that are trained, making it much faster and easier to train than traditional neural networks.

Reservoir computing has been used successfully in a wide range of applications, including speech recognition, time-series prediction, and control systems. Its ability to process sequential data in real-time with low computational cost has made it a popular choice for tasks such as signal processing and robotics.

Can the reservoir be sparse?
Yes, the reservoir in reservoir computing can be sparse. In fact, the sparsity of the reservoir is one of its key features that makes it computationally efficient and easier to implement.

A sparse reservoir is one in which only a small fraction of the possible connections between neurons in the input and hidden layers are active. This sparsity can be achieved by randomly pruning connections in the network or by using specific algorithms to generate sparse connections.

Sparsity has been shown to have several advantages in reservoir computing. First, it reduces the computational complexity of the network, making it easier to implement and faster to train. Second, it allows for a better separation of input patterns, which can improve the network’s ability to classify or predict different input sequences. Finally, it can improve the robustness of the network to noise and other types of input variability.

Overall, sparsity is an important design feature in reservoir computing, and researchers continue to explore the best ways to generate and optimize sparse reservoirs for different applications.
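For concreteness, a sparse reservoir of the kind described above can be sketched in a few lines of numpy (the density and spectral-radius values are common heuristics, not prescriptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, density = 500, 0.02  # only ~2% of possible connections are nonzero

# Sparse random reservoir: dense Gaussian weights masked down to the desired density.
mask = rng.random((n, n)) < density
W = np.where(mask, rng.normal(size=(n, n)), 0.0)

# Rescale to a spectral radius just under 1, a common echo-state heuristic.
rho = np.max(np.abs(np.linalg.eigvals(W)))
W *= 0.95 / rho

nonzero_fraction = np.count_nonzero(W) / W.size
print("nonzero fraction:", nonzero_fraction)
print("spectral radius: ", np.max(np.abs(np.linalg.eigvals(W))))
```

With only ~2% of entries nonzero, the matrix-vector product in each reservoir update can also be done with sparse linear algebra, which is where the computational saving comes from.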

What do you mean when you say reverberate?
In the context of reservoir computing, “reverberate” refers to the way that input signals propagate through the network and create a complex pattern of activations that can be used to generate the final output.

When the input data is fed into the reservoir, it interacts with the randomly generated connections between the input and hidden layers. These interactions cause the input signals to be transformed in a nonlinear way, resulting in a complex pattern of activations that reverberate through the network.

This reverberation allows the network to integrate information from multiple time steps of the input sequence and create a rich representation of the input data. The output layer of the network then reads out this representation and uses it to generate the final output.

The term “reverberation” is often used because the pattern of activations in the network can persist for a short period of time after the input signal has been removed, similar to the way that sound waves can reverberate in a physical space. This persistence of activity allows the network to process sequential data in real-time and make predictions or classifications based on the current and past inputs.

Does this imply that it has to be used with sequential data?
Reservoir computing was originally designed to process sequential data, but it can also be used for other types of data. In fact, one of the advantages of reservoir computing is its flexibility and ability to handle a wide range of input types and modalities.

While the original application of reservoir computing was focused on time-series data, researchers have also explored its use for other types of data, such as images, graphs, and text. For example, reservoir computing has been applied to image classification tasks by using the input pixels as a sequence of inputs, and to natural language processing tasks by using the words in a sentence as a sequence of inputs.

In general, reservoir computing can be used for any task that requires the processing of complex input data, regardless of whether the data is sequential or not. However, the specific design of the reservoir and the readout layer may need to be modified to accommodate the particular characteristics of the input data.

What is the best method to train the output layer?
The output layer of a reservoir computing network is typically trained using a linear regression method. The goal of training the output layer is to learn the weights that map the pattern of activations in the reservoir to the desired output.

The training process involves presenting a set of input sequences to the network and computing the corresponding reservoir activations. These activations are then used to train the output layer using linear regression or a similar method, such as ridge regression or Lasso regression. The trained output layer can then be used to generate the final output for new input sequences.

Linear regression is a popular choice for training the output layer because it is computationally efficient and can be easily implemented. It also allows for a straightforward interpretation of the learned weights, which can provide insights into the underlying relationship between the input and output variables.

However, other methods for training the output layer have also been explored, such as support vector regression and neural networks. The choice of method may depend on the specific characteristics of the task and the amount and complexity of the training data available.

Overall, linear regression is often considered to be the best method for training the output layer of a reservoir computing network due to its simplicity, efficiency, and interpretability.
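A minimal numpy sketch of the ridge-regression readout described above (`states` and `targets` here are synthetic stand-ins for real reservoir activations and labels):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: one row of reservoir activations per time step, plus targets
# generated from a known linear map so we can check recovery.
states = rng.normal(size=(1000, 50))
true_w = rng.normal(size=50)
targets = states @ true_w + 0.01 * rng.normal(size=1000)

# Closed-form ridge regression: Wout = (S^T S + lambda * I)^-1 S^T y
lam = 1e-2
Wout = np.linalg.solve(states.T @ states + lam * np.eye(50), states.T @ targets)

err = np.mean((states @ Wout - targets) ** 2)
print("readout MSE:", err)
```

The regularization term `lam` is what distinguishes ridge from plain linear regression; it keeps `Wout` well behaved when reservoir states are strongly correlated, which they typically are.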

Can I use hebbian learning?
Yes, Hebbian learning can be used to train the connections within the reservoir of a reservoir computing network. Hebbian learning is a type of unsupervised learning that modifies the weights between neurons based on their activity patterns. The idea behind Hebbian learning is that when two neurons are activated at the same time, the connection between them should be strengthened, and when they are not activated together, the connection should be weakened.

In reservoir computing, Hebbian learning can be used to adjust the weights between the neurons in the input and hidden layers of the network. This can allow the network to adapt to the statistical properties of the input data and generate more effective reservoir activations for a given task.

However, it’s important to note that Hebbian learning can be slower and less stable than other training methods, such as linear regression. It can also be more difficult to control and interpret the resulting weights. Therefore, the choice of training method may depend on the specific task requirements and the available training data.

Overall, while Hebbian learning is a valid option for training the connections within the reservoir of a reservoir computing network, it may not always be the best choice depending on the specific needs of the task at hand.
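As an illustration of the stability issue mentioned above, here is a small sketch of Oja's rule, a normalized variant of plain Hebbian learning (plain Hebb, `dw = lr * y * x`, grows weights without bound; the constants here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Oja's rule adds a decay term to the Hebbian update that keeps the weight
# vector normalized; the weights converge to the first principal component
# of the input distribution.
w = rng.normal(size=2)
lr = 0.01
for _ in range(5000):
    x = rng.normal(size=2) * np.array([3.0, 0.5])  # most variance along axis 0
    y = w @ x                                      # neuron output
    w += lr * y * (x - y * w)                      # Hebb term minus Oja decay

print("learned weights:", w)
```

After training, the weight vector aligns with the high-variance input direction (axis 0), which is the kind of unsupervised adaptation to input statistics the answer above refers to.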

How about back propagation?
Backpropagation is typically not used to train the connections within the reservoir of a reservoir computing network.

The reason for this is that backpropagation requires a feedforward neural network architecture with a fixed number of layers and connections, whereas reservoir computing is based on a recurrent neural network architecture with randomly generated connections. In a reservoir computing network, the connections between the input and hidden layers are fixed and randomly generated, and only the weights between the hidden layer and the output layer are trained.

Therefore, instead of backpropagation, the connections within the reservoir are typically trained using unsupervised learning methods, such as Hebbian learning, or they are randomly generated and kept fixed throughout the training process. The output layer of the network is then trained using a linear regression method, as described in the previous answer.

However, there are some variations of reservoir computing that incorporate feedback connections and allow for a form of backpropagation-like training, such as echo state networks with feedback and liquid state machines. These variations may use feedback connections to allow the network to learn more complex dynamics and patterns in the input data, and may use a form of backpropagation to train these feedback connections. However, the overall architecture and training methods used in these variations of reservoir computing are still fundamentally different from standard backpropagation-based neural networks.

Are there any methods based on one shot or even zero shot learning?
Reservoir computing is typically not associated with one-shot or zero-shot learning, as it is a form of supervised learning that requires labeled training data to learn the mapping between inputs and outputs. However, there are some variations of reservoir computing that can be adapted for few-shot learning and zero-shot learning tasks.

Few-shot learning involves training a model to generalize to new classes with only a few examples, while zero-shot learning involves training a model to recognize new classes without any examples. These types of learning can be challenging for traditional supervised learning methods, but they have been explored in recent years using techniques such as meta-learning and transfer learning.

One approach for adapting reservoir computing for few-shot learning is to use a small subset of the labeled training data to initialize the reservoir weights, and then fine-tune the output weights using the remaining data. This approach has been explored in some studies and has shown promise for few-shot learning tasks.

Another approach for adapting reservoir computing for zero-shot learning is to use a semantic representation of the output classes, such as word embeddings or attribute vectors, to train the output layer. This allows the network to generalize to new classes based on their semantic similarity to the training classes, rather than relying solely on the input data.

Overall, while reservoir computing is not traditionally associated with one-shot or zero-shot learning, there are some adaptations and variations that can be used for these types of tasks. However, the effectiveness of these methods may depend on the specific task and the available data.

Are there any similarities to transformer architecture?
While reservoir computing and transformer architectures are both based on neural network models that can process sequential data, they have some fundamental differences in their design and training methods.

Reservoir computing is based on a recurrent neural network architecture with randomly generated connections between the input and hidden layers. The reservoir is a dynamic system that generates complex temporal patterns in response to the input data, which are then fed into a linear output layer for classification or regression.

Transformers, on the other hand, are based on a self-attention mechanism that allows the network to attend to different parts of the input sequence and capture long-range dependencies. Transformers have been shown to be highly effective for tasks such as natural language processing, where the input data consists of sequences of words or tokens.

However, there are some similarities between the two architectures. Both reservoir computing and transformers can handle input sequences of varying lengths, and both can capture complex temporal patterns and dependencies within the input data. Additionally, both architectures can be adapted for tasks such as classification, regression, and sequence-to-sequence prediction.

Moreover, recent research has explored the possibility of combining elements of both architectures, such as using a learned attention mechanism in the reservoir of a reservoir computing network, or incorporating a reservoir-like structure within a transformer architecture. These hybrid models have shown promise in certain tasks and are an area of active research.

Overall, while there are some similarities between reservoir computing and transformer architectures, they have different underlying principles and design choices, and each approach may be better suited for different types of problems and data.

Is there a possible cross-over to transformers?
Yes, there is a possible cross-over between reservoir computing and transformers. Both approaches are based on neural network architectures that can handle sequential data, although they differ in their specific design and training methods.

Reservoir computing is based on a recurrent neural network architecture with randomly generated connections between the input and hidden layers. The reservoir is a dynamic system that generates complex temporal patterns in response to the input data, which are then fed into a linear output layer for classification or regression.

Transformers, on the other hand, are based on a self-attention mechanism that allows the network to attend to different parts of the input sequence and capture long-range dependencies. Transformers have been shown to be highly effective for tasks such as natural language processing, where the input data consists of sequences of words or tokens.

While the specific design and training methods of reservoir computing and transformers are different, there are some potential cross-overs between the two approaches. For example, the random connections in reservoir computing could be replaced with a learned attention mechanism inspired by transformers, allowing the network to attend to different parts of the input sequence and generate more informative reservoir activations. Similarly, the self-attention mechanism in transformers could be combined with the linear output layer of reservoir computing to generate more effective output predictions.

Overall, while there are some potential cross-overs between reservoir computing and transformers, it is an area of active research and more work is needed to explore the potential benefits and drawbacks of combining these approaches.


Is this ChatGPT answering?


I am using it almost daily to develop a feel for what it can and cannot do.

I try to ask it about things that I think I know so I can get a feel for how well it does in coming up with answers.

In general it knows “about” things; it is just very iffy about the exact details. Considering that it is abstracting a statistical choice of the words that appear in a subject area, it will be able to offer the central tendency in general terms. For this to work, there will have to be a lot of similar things said about the thing I am asking.

For obscure things where there are not a lot of words said about the thing I am asking, it tends to be “off the wall” and occasionally, a total confabulation.

I double check any reference it offers and a surprising fraction of named references are totally bogus. If I am not too specific on the search terms I usually find that there are a lot of papers “like” the reference offered, just not the named author or the exact named paper. It feels a bit like when I am searching for a paper that I have saved on my hard drive where I can’t remember the exact name or who the author was; “I remember reading a paper about that!”

Most of the people here know about the reservoir computing method. Feel free to judge for yourself, but I am in agreement with the answers it offered. I pushed it at the edges to see what it would do - I am satisfied that the answers it offered were appropriate.

I particularly like how it keeps the context of the conversation. I have not really seen that in prior chatbots that I have played with. That is one of the features that I try to push in conversations and so far - it has done a really good job of following the focus of a discussion. Notice that I only used the word reservoir twice directly in the start of the entire exchange. It did not jump to transformers when I introduced that.

Look at how well it answered the “crossover” question. It knew that the main topic was reservoir and did a fair job of identifying the key elements that corresponded and what part of “this one” would work in “that one.” Color me impressed.


You’re right. I used a similarity metric for my first attempt at this “amplification” of predictive coding. The “vectors” I tried to use to condense the network connections had values based on similarity metrics. I used some variations of my own, and then settled on one by Dekang Lin:

Dekang Lin, An Information-Theoretic Definition of Similarity, In Proceedings of the 15th International Conference on Machine Learning, 1998, pages 296–304, Morgan Kaufmann.
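From memory, Lin's measure scores the similarity of two items by the information content of what they have in common, relative to what each carries individually (check the paper for the exact formulation):

```latex
\mathrm{sim}(A,B) = \frac{2 \cdot I\big(\mathrm{common}(A,B)\big)}{I(A) + I(B)},
\qquad I(X) = -\log P(X)
```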

How that’s applied is described in my old Vector Parser paper, earlier pre-print here:

Parsing using a grammar of word association vectors

But, obvious in hindsight, I later realized that a single similarity value washed out a lot of structure. I thought I was capturing different dimensions which could be constructed by combining two vectors. But two words can be similar in some contexts and different in others. I finally came to the conclusion it is not possible to abstract the context information more compactly than the network itself.

Each projection you make from that network can be a kind of similarity metric on it. But it is the number of… dimensions of such similarity which can be projected out which really interests me.

So, yes, you’re right, similarity of context will define a kind of metric on sub-sequences. But that similarity won’t be fixed. I think the dimensions possible may be infinite. Or at least as big as all different possible combinations of contexts of said sub-sequences.

I don’t know. Is that the same as talking about distance between two points? Perhaps. It is just that the actual distance measures will vary wildly, and unlike ANNs and “learning”, I think that wild variation is the interesting bit, not the similarity itself.

Reference again:

A Network of Integrate and Fire Neurons for Community Detection in Complex Networks, Marcos G. Quiles, Liang Zhao, Fabricio A. Breve, Roseli A. F. Romero


I agree that the big problem with these transformer chatbots is that they, completely deadpan, jam absolute fictions in with a lot of facts.

In experiments I did, it seemed more work to pick apart what might be false than it was to use a conventional search engine.

Why deliberately mix your search results with random falsehoods? It means you have to laboriously check everything. And some falsehoods might slip through.

Anyway, on the relationship between transformers and reservoir computing.

From memory:

The parallel of reservoir computing as a kind of RNN seems valid.

I’m not sure there is any “learning” taking place in reservoir computers though. Is there? No deliberate structuring, anyway. The whole idea of a reservoir, in my understanding, is that it is unstructured and dynamic.

(The main difference I recall between HTM and traditional ANNs was also this focus on a “learning” mechanism. HTM rejected back-propagation specifically(?) as biologically implausible. Maybe HTM has also drifted in the direction of internal structuring in favour of “learning” recently. I haven’t kept up. But originally it benefited from not having the same constraints for “learning” as ANNs.)

Anyway, from my point of view, it was the “learning” constraint which crippled RNNs. Originally the problem was that RNNs could not be learned. This is how I remember them being conceived through the 1990s: a solution to model language in principle, but in practice unlearnable.

As I understand it, the history through RNN, to LSTM, to now transformers, was basically one of reducing an infinite context. To make RNNs learnable.

LSTM (Schmidhuber?) was the first to do this by reducing the infinite context to a single… vector(?) carried forward.

But with LSTM you were forced to carry the context forward all the time. And that context would get overwritten.

The big advance of transformers, as I understand, was that the “attention” mechanism essentially put the focus fully on arbitrary context, and removed the whole “carry it forward” burden. So transformers kept the advantage of LSTM over RNN, that it did not require the system to keep an infinite context. But it lost the disadvantage of LSTM that context was constantly overwritten as it was carried forward.
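For reference, the attention step being described reduces to a few lines (a generic scaled dot-product sketch, not any particular library's API): each query position looks back over the whole context at once, rather than squeezing history into one carried-forward state.

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: similarity scores between queries and keys,
    # softmax-normalized per query, then used to mix the values.
    scores = Q @ K.T / np.sqrt(K.shape[1])
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
T, d = 6, 8                   # sequence length, model width
x = rng.normal(size=(T, d))   # one vector per position
out = attention(x, x, x)      # self-attention over the full sequence
print(out.shape)
```

Because the softmax weights are computed fresh over all positions at every step, nothing is overwritten as context is carried forward; that is the contrast with the LSTM state being drawn here.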

So the historical progression from RNN, to LSTM, to transformers, was all about innovations in ways to deal with “learning” over what is an essentially infinite context of a whole sequence.

If you don’t try to “learn” though, you can pare it back to the metal. You can have things dynamic as in reservoir computing. You can keep that infinite context. And you can structure over it, and structure dynamically too.

Which I think is necessary, because all the different ways of structuring may be chaotic.

All you need is a meaningful prior to do the structuring.

Notice also though that it is the way they handle novelty which is the big issue with transformers.

This is also my focus. How to deal with novelty. How can we constrain novelty to be “meaningful”? What makes novelty “meaningful”?

I think novelty is at the core of it. Transformers just interpolate between learned forms. At the limit (“not a lot of words said about the thing I am asking”, not enough data points to get a good similarity overlap?) their interpolation is random.

By contrast, I make novelty central. It’s not the point at which similarity metrics start to fade and become random anymore. “Meaning” is no longer only by reference to what is already known, equated to a similarity metric with a standard/label. The essential operation isn’t comparison, but recombination. I don’t need a lot of existing data points to specify a concept. I actively generate new “concepts” as new combinations of contexts. I replace “learning” with the construction of novel (even chaotic) groupings. And I do that by dynamically projecting out different groupings of contexts.

The fact that shared context is in itself meaningful, makes all these new combinations of contexts, inherently meaningful, even though they are new.


I observed that as well, but it quickly shifts context as it makes things up. Still impressive though.


I find this very inspiring! Yet the problem definition may/shall shift to “making the meaning of machine models converge to human-compatible comprehensions”?

Novel utilities are great things so long as they are for human purposes, or at least comprehensible/controllable by humans.

Apparently we can label things per our knowledge, though I’m afraid we don’t really know “how” such knowledge is represented/possessed in our minds, so as to grant nonhuman mechanisms the same ability “to know” the “label” for specific inputs.

Numeric methods (among which back-propagation is a classic) can make machines mimic our reasoning when trained on labels we gave them. Nevertheless, it seems “similarity” should be inherently multi-dimensional, with each “scalar numeric distance metric” reflecting one particular aspect of similarity (as far as an observing subject is concerned). Then can I say:

“higher intelligence” can discover/know the “similarity-metric dimensions”, while “lower intelligence” just computes each dimension’s “scalar metric function”?

“What to metric” w.r.t. similarity would ultimately fall onto “purpose making”? I.e. what the real concerns are and what’s not, e.g. survival, pleasure, joy.
