Continuous Thought Machine

"Therefore, in this work, we address the strong limitation imposed by overlooking neural activity as a central aspect of intelligence. We introduce the Continuous Thought Machine (CTM), a novel neural network architecture designed to explicitly incorporate neural timing as a foundational element. Our contributions are as follows:

  • We introduce a decoupled internal dimension, a novel approach to modeling the temporal evolution of neural activity. We view this dimension as that over which thought can unfold in an artificial neural system, hence the choice of nomenclature.
  • We provide a mid-level abstraction for neurons, which we call neuron-level models (NLMs), where every neuron has its own internal weights that process a history of incoming signals (i.e., pre-activations) to activate (as opposed to a static ReLU, for example).
  • We use neural synchronization directly as the latent representation with which the CTM observes (e.g., through an attention query) and predicts (e.g., via a projection to logits). This biologically-inspired design choice puts forward neural activity as the crucial element for any manifestation of intelligence the CTM might demonstrate." https://pub.sakana.ai/ctm/
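To make the second and third bullets concrete, here is a minimal PyTorch-style sketch of the two ideas: a neuron-level model that maps each neuron's recent pre-activation history to its next activation, and a pairwise synchronization matrix over the post-activation history that serves as the latent representation. The shapes, hidden sizes, and pooling choices are my own simplification, not the paper's code.

```python
import torch
import torch.nn as nn

class NeuronLevelModels(nn.Module):
    """Each neuron owns a small private MLP over its last M pre-activations (simplified)."""
    def __init__(self, n_neurons: int, history: int, hidden: int = 8):
        super().__init__()
        # Per-neuron weight tensors, applied to all neurons in parallel with einsum.
        self.w1 = nn.Parameter(torch.randn(n_neurons, history, hidden) * 0.1)
        self.b1 = nn.Parameter(torch.zeros(n_neurons, hidden))
        self.w2 = nn.Parameter(torch.randn(n_neurons, hidden, 1) * 0.1)

    def forward(self, pre_hist):  # pre_hist: (batch, n_neurons, history)
        h = torch.relu(torch.einsum('bnm,nmh->bnh', pre_hist, self.w1) + self.b1)
        post = torch.einsum('bnh,nho->bno', h, self.w2).squeeze(-1)
        return post               # (batch, n_neurons) post-activations

def synchronization(post_hist):
    """Pairwise synchronization: inner products of neuron activation traces over internal ticks.
    post_hist: (batch, n_neurons, ticks) -> (batch, n_neurons, n_neurons)."""
    return torch.einsum('bnt,bmt->bnm', post_hist, post_hist) / post_hist.shape[-1]

# In the CTM, a (subsampled, flattened) synchronization matrix is projected to attention
# queries and to output logits; those projection layers are omitted from this sketch.
```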
5 Likes

Are there any other architectures that do their thinking in “neuralese” with variable timing (e.g. the harder the problem, the longer it thinks) before generating an output?

1 Like

I guess reasoning LLMs would count as variable-time responders.

1 Like

Right. I was trying to refer to latent reasoning where it reasons in more abstract spaces.

I just found this paper providing a survey of Latent Reasoning:

2 Likes

I would think this should start with latent-space prediction, as in LeCun’s JEPA
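For anyone unfamiliar, the JEPA idea in one hedged sketch: encode the context and the target separately, and train a predictor to match the target embedding in latent space, rather than reconstructing pixels or tokens. The module names, sizes, and loss below are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyJEPA(nn.Module):
    """Minimal latent-space prediction, loosely in the spirit of JEPA (illustrative)."""
    def __init__(self, in_dim: int, latent_dim: int = 64):
        super().__init__()
        self.context_encoder = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU(),
                                             nn.Linear(latent_dim, latent_dim))
        self.target_encoder = nn.Sequential(nn.Linear(in_dim, latent_dim), nn.ReLU(),
                                            nn.Linear(latent_dim, latent_dim))
        self.predictor = nn.Linear(latent_dim, latent_dim)

    def loss(self, context, target):
        z_ctx = self.context_encoder(context)
        # In practice the target encoder is often a slowly updated (EMA) copy of the
        # context encoder; here it is simply held fixed for this loss.
        with torch.no_grad():
            z_tgt = self.target_encoder(target)
        return F.mse_loss(self.predictor(z_ctx), z_tgt)
```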

2 Likes

I have a paper on graph continuous thought machines (GCTMs) that replace the synapse model and the neuron models with a graph convolutional network.

The GCNN outputs the next graph in the thought process, guided by learnt property vectors.
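I can only sketch this at a high level, but "a graph convolution maps the current graph to the next one, guided by learnt property vectors" might look roughly like the following; the property-vector conditioning and the update rule are my assumptions.

```python
import torch
import torch.nn as nn

class GraphStep(nn.Module):
    """One GCTM-style tick (my interpretation): a graph convolution maps the current
    node features to the next graph's node features, conditioned on learnt property vectors."""
    def __init__(self, feat_dim: int, n_props: int):
        super().__init__()
        self.props = nn.Parameter(torch.randn(n_props, feat_dim) * 0.1)  # learnt property vectors
        self.gcn = nn.Linear(2 * feat_dim, feat_dim)

    def forward(self, x, adj):
        # x: (n_nodes, feat_dim), adj: (n_nodes, n_nodes) row-normalised adjacency
        neighbour_msg = adj @ x                                   # aggregate neighbour features
        prop_ctx = self.props.mean(dim=0).expand_as(x)            # crude property conditioning
        x_next = torch.relu(self.gcn(torch.cat([neighbour_msg, prop_ctx], dim=-1)))
        return x_next                                             # node features of the next graph
```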

What’s interesting is that the synchronization matrix regulates the attention given to the nodes as well as to the input.

So these nodes may be seen as neurons in their own right, and consecutive graphs have connections between them that send virtual signals and cause them to spike.

The nodes and potential nodes exist in a dispositional neural network, and only the nodes that are currently activated are instantiated in the GCNN.

So as the outputs from the synchronization matrix modulate attention, a subset of the attended dispositional neurons will represent memory, while other parts of the dispositional network and parts of the input represent keys that index the next presentation of memory.

In fact, only the prefrontal cortex (PFC) dispositional nodes contribute to the synchronization matrix.

So the PFC performs read and write operations on memory this way.
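To make the mechanism easier to follow, here is a rough sketch of "only the PFC nodes contribute to the synchronization matrix, which then gates attention over the other nodes and the input". Everything here (shapes, the dot-product scoring, mean-pooling the matrix into a query) is my guess at one way it could work, not code from the paper.

```python
import torch
import torch.nn.functional as F

def pfc_gated_attention(node_out_hist, pfc_idx, node_keys, input_keys, values):
    """node_out_hist: (n_nodes, T) scalar output history per instantiated node.
    Only PFC rows enter the synchronization matrix; its pooled rows become a query
    that attends over node keys (memory read) and input keys (input selection)."""
    pfc_hist = node_out_hist[pfc_idx]                      # (n_pfc, T)
    sync = pfc_hist @ pfc_hist.T / pfc_hist.shape[-1]      # (n_pfc, n_pfc) synchronization matrix
    query = sync.mean(dim=0)                               # (n_pfc,) one crude query vector
    keys = torch.cat([node_keys, input_keys], dim=0)       # (n_nodes + n_in, n_pfc)
    attn = F.softmax(keys @ query, dim=0)                  # attention over nodes *and* input
    return attn @ values                                   # recalled memory / selected input
```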

1 Like

From what I remember, the edges in a GCNN are weighted by global backprop, which is not how the PFC, or anything else in the brain, works. It's not really my area, though.

1 Like

The PFC attends to different parts of the rest of the brain alongside the input.

That is what happens when we use the PFC nodes as the source for entries in the synchronization matrix.

1 Like

The point is that the actual algorithm is agnostic to whether you use a spiking GCNN or a deep artificial network.

1 Like

Because the real neurons are the nodes of the graph, not the neurons in the GCNN.

Wasn’t that part clear?

1 Like

Yeah, someone is “agnostic” here. You are mixing up a lot of different things, so it’s almost meaningless.

1 Like

You are referring to the actual neural network instead of the implicit one. The actual neural network can be anything you like as long as it processes graphs and outputs another one.

The nodes of the graph at any one time represent the instantiation of the nodes of the implicit neural network.

These are then connected to the nodes in the next graph.

Each node is associated with a number that represents its output.

Only the outputs from the PFC nodes go towards the synchronization matrix.

This affects which nodes are attended to, as well as which parts of the input are attended to.

The nodes that are attended to represent memory that has been recalled, while the parts of the state that are attended to represent keys.

There are other keys within the attended nodes that influence which PFC nodes and which memories are activated next.

The PFC then governs which memories and keys are activated, read, and written.
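Putting that description together, here is how I would pseudo-code one tick of the loop, with the caveat that every structure below (the dispositional table, what counts as a "key", the instantiation rule, the `gcn_step` callable) is my own guess rather than anything from the paper.

```python
import torch
import torch.nn.functional as F

def gctm_tick(dispositional, active_idx, pfc_idx, x_input, out_hist, gcn_step):
    """One hypothetical GCTM tick.
    dispositional: (N, d) embeddings of all potential nodes (the implicit network).
    active_idx:    LongTensor of currently instantiated node indices.
    pfc_idx:       indices (into active_idx) of the PFC subset.
    out_hist:      (len(active_idx), T) scalar output history of the active nodes.
    Toy simplification: feature dims are assumed to be >= the number of PFC nodes."""
    # 1. Only PFC node outputs feed the synchronization matrix.
    pfc_hist = out_hist[pfc_idx]
    sync = pfc_hist @ pfc_hist.T / pfc_hist.shape[-1]
    query = sync.mean(dim=0)                                   # (n_pfc,)

    # 2. The matrix modulates attention over dispositional nodes (memory recall)
    #    and over the input (key selection).
    node_scores = dispositional[:, :query.shape[0]] @ query
    recalled_idx = node_scores.topk(k=min(8, len(node_scores))).indices
    input_attn = F.softmax(x_input[:, :query.shape[0]] @ query, dim=0)

    # 3. The recalled nodes (plus the PFC nodes) become the next instantiated graph,
    #    which the graph network then advances to produce the next outputs.
    next_active = torch.unique(torch.cat([recalled_idx, active_idx[pfc_idx]]))
    next_feats = gcn_step(dispositional[next_active], x_input, input_attn)
    return next_active, next_feats
```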

1 Like

====We then employ neural training modules, which are spiking neural networks whose nodes are mapped to keys on a musical keyboard. In particular, when exposed to the state of teacher systems the nodes are trained to musically harmonize, while when exposed to the state of the untrained agent they are dissonant. The agent then tries to maximise consonance in the spiking network by using it as a reward signal. By this method the agent is trained to perform like the teacher system. We introduce text-conditioned neural training modules, which condition the input on text. We show a method to modulate not just the behaviour of the system, but also the connectivity of the dispositional network of a GCTM.====
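As a concrete reading of the "consonance as reward" idea: each spiking node is assigned a pitch, the currently spiking nodes form a chord, and the reward is how consonant that chord is. The pitch map and the consonance score (closeness to small-integer frequency ratios) below are toy stand-ins of my own, not the paper's actual reward function.

```python
import math

# Hypothetical pitch map: node i -> frequency of the i-th key of an equal-tempered keyboard.
def node_frequency(i: int, base: float = 261.63) -> float:     # base = middle C
    return base * 2 ** (i / 12)

def consonance_reward(spiking_nodes) -> float:
    """Score the chord formed by the currently spiking nodes.
    Pairs whose frequency ratios fall near simple ratios (unison, fifth, fourth, thirds)
    add to the reward; everything else counts as dissonance (toy heuristic)."""
    simple_ratios = [1.0, 3 / 2, 4 / 3, 5 / 4, 6 / 5]
    freqs = [node_frequency(i) for i in spiking_nodes]
    score, pairs = 0.0, 0
    for a in range(len(freqs)):
        for b in range(a + 1, len(freqs)):
            ratio = max(freqs[a], freqs[b]) / min(freqs[a], freqs[b])
            ratio /= 2 ** math.floor(math.log2(ratio))          # fold into one octave
            score += max(0.0, 1.0 - min(abs(ratio - r) for r in simple_ratios) * 10)
            pairs += 1
    return score / pairs if pairs else 0.0

# The agent would use this scalar as its RL reward: consonant spiking when the teacher
# is being imitated well, dissonant spiking otherwise (per the description above).
```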

With this method, you can specify in natural language how you want to optimise the connections in the dispositional network.

1 Like

For instance, you could whittle down the search space by imposing a grammar on the activations of the dispositional network, or you could urge it to optimise for all the tasks that the neural module was pretrained on.
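I am guessing at the mechanics here, but one way to read "text conditions which connections get optimised" is a text embedding producing a soft mask over candidate edges; the encoder, the mask head, and the gating rule are all placeholders of mine.

```python
import torch
import torch.nn as nn

class TextConditionedEdgeMask(nn.Module):
    """Hypothetical sketch: a text instruction is embedded and mapped to a soft mask
    over candidate edges of the dispositional network, gating which connections the
    training procedure is allowed to strengthen or prune."""
    def __init__(self, text_dim: int, node_dim: int):
        super().__init__()
        self.to_node_space = nn.Linear(text_dim, node_dim)

    def forward(self, text_embedding, node_embeddings):
        # text_embedding: (text_dim,), node_embeddings: (N, node_dim)
        t = self.to_node_space(text_embedding)                   # (node_dim,)
        node_relevance = torch.sigmoid(node_embeddings @ t)      # (N,) per-node gate
        edge_mask = torch.outer(node_relevance, node_relevance)  # (N, N) soft edge mask
        return edge_mask  # multiply into the connection-update rule, e.g. dW *= edge_mask
```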

Another thing you could do is tell it to value human life.

If the text is hard-wired, then as the agent learns to speak and understand text in the real world, and as attention is applied to its own discourse that it has come to understand, it will also come to understand the text passed to the neural reward function.

That means you can implement neuroscientific knowledge in text form to design the connections of the dispositional network, as well as to help it understand how it should be aligned.

This algorithm uses the policy's own intelligence to make and break connections within the dispositional network.

1 Like