Continuous Thought Machine

"Therefore, in this work, we address the strong limitation imposed by overlooking neural activity as a central aspect of intelligence. We introduce the Continuous Thought Machine (CTM), a novel neural network architecture designed to explicitly incorporate neural timing as a foundational element. Our contributions are as follows:

  • We introduce a decoupled internal dimension, a novel approach to modeling the temporal evolution of neural activity. We view this dimension as that over which thought can unfold in an artificial neural system, hence the choice of nomenclature.
  • We provide a mid-level abstraction for neurons, which we call neuron-level models (NLMs), where every neuron has its own internal weights that process a history of incoming signals (i.e., pre-activations) to activate (as opposed to a static ReLU, for example).
  • We use neural synchronization directly as the latent representation with which the CTM observes (e.g., through an attention query) and predicts (e.g., via a projection to logits). This biologically-inspired design choice puts forward neural activity as the crucial element for any manifestation of intelligence the CTM might demonstrate." https://pub.sakana.ai/ctm/
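To make the second and third bullets concrete, here is a minimal PyTorch-style sketch of the two ideas: a neuron-level model that maps each neuron's recent pre-activation history to its next activation, and a pairwise synchronization matrix over the post-activation history that serves as the latent representation. The shapes, hidden sizes, and pooling choices are my own simplification, not the paper's code.

```python
import torch
import torch.nn as nn

class NeuronLevelModels(nn.Module):
    """Each neuron owns a small private MLP over its last M pre-activations (simplified)."""
    def __init__(self, n_neurons: int, history: int, hidden: int = 8):
        super().__init__()
        # Per-neuron weight tensors, applied to all neurons in parallel with einsum.
        self.w1 = nn.Parameter(torch.randn(n_neurons, history, hidden) * 0.1)
        self.b1 = nn.Parameter(torch.zeros(n_neurons, hidden))
        self.w2 = nn.Parameter(torch.randn(n_neurons, hidden, 1) * 0.1)

    def forward(self, pre_hist):  # pre_hist: (batch, n_neurons, history)
        h = torch.relu(torch.einsum('bnm,nmh->bnh', pre_hist, self.w1) + self.b1)
        post = torch.einsum('bnh,nho->bno', h, self.w2).squeeze(-1)
        return post               # (batch, n_neurons) post-activations

def synchronization(post_hist):
    """Pairwise synchronization: inner products of neuron activation traces over internal ticks.
    post_hist: (batch, n_neurons, ticks) -> (batch, n_neurons, n_neurons)."""
    return torch.einsum('bnt,bmt->bnm', post_hist, post_hist) / post_hist.shape[-1]

# In the CTM, a (subsampled, flattened) synchronization matrix is projected to attention
# queries and to output logits; those projection layers are omitted from this sketch.
```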
5 Likes

Are there any other architectures that do their thinking in “neuralese” with variable timing (e.g. the harder the problem, the longer it thinks) before generating an output?

1 Like

I guess reasoning LLMs would count as variable-time responders.

1 Like

Right. I was trying to refer to latent reasoning where it reasons in more abstract spaces.

I just found this paper providing a survey of Latent Reasoning:

2 Likes

I would think this should start with latent-space prediction, as in LeCun’s JEPA
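For anyone unfamiliar, the JEPA idea in one hedged sketch: encode the context and the target separately, and train a predictor to match the target embedding in latent space, rather than reconstructing pixels or tokens. The module names, sizes, and loss below are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyJEPA(nn.Module):
    """Minimal latent-space prediction, loosely in the spirit of JEPA (illustrative)."""
    def __init__(self, in_dim: int, latent_dim: int = 64):
        super().__init__()
        self.context_encoder = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU(),
                                             nn.Linear(latent_dim, latent_dim))
        self.target_encoder = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU(),
                                            nn.Linear(latent_dim, latent_dim))
        self.predictor = nn.Linear(latent_dim, latent_dim)

    def loss(self, context, target):
        z_ctx = self.context_encoder(context)
        # In practice the target encoder is often a slowly updated (EMA) copy of the
        # context encoder; here it is simply held fixed for this loss.
        with torch.no_grad():
            z_tgt = self.target_encoder(target)
        return F.mse_loss(self.predictor(z_ctx), z_tgt)
```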

2 Likes

I have a paper on graph continuous thought machines (GCTMs) that replace the synapse model and the neuron models with a graph convolutional network.

The GCNN outputs the next graph in the thought process, guided by learnt property vectors.
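I can only sketch this at a high level, but "a graph convolution maps the current graph to the next one, guided by learnt property vectors" might look roughly like the following; the property-vector conditioning and the update rule are my assumptions.

```python
import torch
import torch.nn as nn

class GraphStep(nn.Module):
    """One GCTM-style tick (my interpretation): a graph convolution maps the current
    node features to the next graph's node features, conditioned on learnt property vectors."""
    def __init__(self, feat_dim: int, n_props: int):
        super().__init__()
        self.props = nn.Parameter(torch.randn(n_props, feat_dim) * 0.1)  # learnt property vectors
        self.gcn = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, x, adj):
        # x: (n_nodes, feat_dim), adj: (n_nodes, n_nodes) row-normalised adjacency
        neighbour_msg = adj @ x                                   # aggregate neighbour features
        prop_ctx = self.props.mean(dim=0).expand_as(x)            # crude property conditioning
        x_next = torch.relu(self.gcn(torch.cat([neighbour_msg, prop_ctx], dim=-1)))
        return x_next                                             # node features of the next graph
```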

What’s interesting is that the synchronization matrix regulates the attention given to the nodes as well as to the input.

So these nodes may be seen as neurons in their own right, and consecutive graphs have connections between them that send virtual signals and cause them to spike.

The nodes and potential nodes exist in a dispositional neural network, and only the nodes that are currently activated are instantiated in the GCNN.

So as the outputs from the synchronization matrix modulate attention, a subset of the attended dispositional neurons will represent memory, while other parts of the dispositional network and parts of the input represent keys that index the next presentation of memory.

In fact, only the prefrontal cortex (PFC) dispositional nodes contribute to the synchronization matrix.

So the PFC performs read and write operations on memory this way.
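To make the mechanism easier to follow, here is a rough sketch of "only the PFC nodes contribute to the synchronization matrix, which then gates attention over the other nodes and the input". Everything here (shapes, the dot-product scoring, mean-pooling the matrix into a query) is my guess at one way it could work, not code from the paper.

```python
import torch
import torch.nn.functional as F

def pfc_gated_attention(node_out_hist, pfc_idx, node_keys, input_keys, values):
    """node_out_hist: (n_nodes, T) scalar output history per instantiated node.
    Only PFC rows enter the synchronization matrix; its pooled rows become a query
    that attends over node keys (memory read) and input keys (input selection)."""
    pfc_hist = node_out_hist[pfc_idx]                      # (n_pfc, T)
    sync = pfc_hist @ pfc_hist.T / pfc_hist.shape[-1]      # (n_pfc, n_pfc) synchronization matrix
    query = sync.mean(dim=0)                               # (n_pfc,) one crude query vector
    keys = torch.cat([node_keys, input_keys], dim=0)       # (n_nodes + n_in, n_pfc)
    attn = F.softmax(keys @ query, dim=0)                  # attention over nodes *and* input
    return attn @ values                                   # recalled memory / selected input
```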

1 Like

From what I remember, the edges in a GCNN are weighted by global backprop, which is not how the PFC, or anything else in the brain, works. It's not really my area, though.

1 Like

The PFC attends to different parts of the rest of the brain alongside the input.

That is what happens when we use the PFC nodes as the source for entries in the synchronization matrix.

1 Like

The point is that the actual algorithm is agnostic to whether you use a spiking GCNN or a deep artificial network.

1 Like

Because the real neurons are the nodes of the graph, not the neurons in the GCNN.

Wasn’t that part clear?

1 Like

Yeah, someone is “agnostic” here. You are mixing up a lot of different things, so it’s almost meaningless.

1 Like

You are referring to the actual neural network instead of the implicit one. The actual neural network can be anything you like as long as it processes graphs and outputs another one.

The nodes of the graph at any one time represent the instantiation of the nodes of the implicit neural network.

These are then connected to the nodes in the next graph.

Each node is associated with a number that represents its output.

Only the outputs from the PFC nodes go towards the synchronization matrix.

This affects which nodes are attended to, as well as which parts of the input are attended to.

The nodes that are attended to represent memory that has been recalled, while the parts of the state that are attended to represent keys.

There are other keys within the attended nodes that influence which PFC nodes and which memories are activated next.

The PFC then governs which memories and keys are activated, read, and written.
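Putting that description together, here is how I would pseudo-code one tick of the loop, with the caveat that every structure below (the dispositional table, what counts as a "key", the instantiation rule, the `gcn_step` callable) is my own guess rather than anything from the paper.

```python
import torch
import torch.nn.functional as F

def gctm_tick(dispositional, active_idx, pfc_idx, x_input, out_hist, gcn_step):
    """One hypothetical GCTM tick.
    dispositional: (N, d) embeddings of all potential nodes (the implicit network).
    active_idx:    LongTensor of currently instantiated node indices.
    pfc_idx:       indices (into active_idx) of the PFC subset.
    out_hist:      (len(active_idx), T) scalar output history of the active nodes.
    Toy simplification: feature dims are assumed to be >= the number of PFC nodes."""
    # 1. Only PFC node outputs feed the synchronization matrix.
    pfc_hist = out_hist[pfc_idx]
    sync = pfc_hist @ pfc_hist.T / pfc_hist.shape[-1]
    query = sync.mean(dim=0)                                   # (n_pfc,)

    # 2. The matrix modulates attention over dispositional nodes (memory recall)
    #    and over the input (key selection).
    node_scores = dispositional[:, :query.shape[0]] @ query
    recalled_idx = node_scores.topk(k=min(8, len(node_scores))).indices
    input_attn = F.softmax(x_input[:, :query.shape[0]] @ query, dim=0)

    # 3. The recalled nodes (plus the PFC nodes) become the next instantiated graph,
    #    which the graph network then advances to produce the next outputs.
    next_active = torch.unique(torch.cat([recalled_idx, active_idx[pfc_idx]]))
    next_feats = gcn_step(dispositional[next_active], x_input, input_attn)
    return next_active, next_feats
```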

1 Like

====We then employ neural training modules, which are spiking neural networks whose nodes are mapped to keys on a musical keyboard. In particular, when exposed to the state of teacher systems the nodes are trained to musically harmonize, while when exposed to the state of the untrained agent they are dissonant. The agent then tries to maximise consonance in the spiking network by using it as a reward signal. By this method the agent is trained to perform like the teacher system. We introduce text-conditioned neural training modules, which condition the input on text. We show a method to modulate not just the behaviour of the system, but also the connectivity of the dispositional network of a GCTM.====
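As a concrete reading of the "consonance as reward" idea: each spiking node is assigned a pitch, the currently spiking nodes form a chord, and the reward is how consonant that chord is. The pitch map and the consonance score (closeness to small-integer frequency ratios) below are toy stand-ins of my own, not the paper's actual reward function.

```python
import math

# Hypothetical pitch map: node i -> frequency of the i-th key of an equal-tempered keyboard.
def node_frequency(i: int, base: float = 261.63) -> float:     # base = middle C
    return base * 2 ** (i / 12)

def consonance_reward(spiking_nodes) -> float:
    """Score the chord formed by the currently spiking nodes.
    Pairs whose frequency ratios fall near simple ratios (unison, fifth, fourth, thirds)
    add to the reward; everything else counts as dissonance (toy heuristic)."""
    simple_ratios = [1.0, 3 / 2, 4 / 3, 5 / 4, 6 / 5]
    freqs = [node_frequency(i) for i in spiking_nodes]
    score, pairs = 0.0, 0
    for a in range(len(freqs)):
        for b in range(a + 1, len(freqs)):
            ratio = max(freqs[a], freqs[b]) / min(freqs[a], freqs[b])
            ratio /= 2 ** math.floor(math.log2(ratio))          # fold into one octave
            score += max(0.0, 1.0 - min(abs(ratio - r) for r in simple_ratios) * 10)
            pairs += 1
    return score / pairs if pairs else 0.0

# The agent would use this scalar as its RL reward: consonant spiking when the teacher
# is being imitated well, dissonant spiking otherwise (per the description above).
```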

With this method, you can specify in natural language how you want to optimise the connections in the dispositional network.

1 Like

For instance, you could whittle down the search space by imposing a grammar on the activations of the dispositional network, or you could urge it to optimise for all the tasks that the neural module was pretrained on.
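I am guessing at the mechanics here, but one way to read "text conditions which connections get optimised" is a text embedding producing a soft mask over candidate edges; the encoder, the mask head, and the gating rule are all placeholders of mine.

```python
import torch
import torch.nn as nn

class TextConditionedEdgeMask(nn.Module):
    """Hypothetical sketch: a text instruction is embedded and mapped to a soft mask
    over candidate edges of the dispositional network, gating which connections the
    training procedure is allowed to strengthen or prune."""
    def __init__(self, text_dim: int, node_dim: int):
        super().__init__()
        self.to_node_space = nn.Linear(text_dim, node_dim)

    def forward(self, text_embedding, node_embeddings):
        # text_embedding: (text_dim,), node_embeddings: (N, node_dim)
        t = self.to_node_space(text_embedding)                   # (node_dim,)
        node_relevance = torch.sigmoid(node_embeddings @ t)      # (N,) per-node gate
        edge_mask = torch.outer(node_relevance, node_relevance)  # (N, N) soft edge mask
        return edge_mask  # multiply into the connection-update rule, e.g. dW *= edge_mask
```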

Another thing you could do is tell it to value human life.

If the text is hard-wired, then as the agent learns to speak and understand text in the real world, and as attention is applied to its own discourse that it has come to understand, it will also come to understand the text passed to the neural reward function.

That means you can implement neuroscientific knowledge in text form to design the connections of the dispositional network, as well as to help it understand how it should be aligned.

This algorithm uses the policy's own intelligence to make and break connections within the dispositional network.

1 Like