ML and Deep Learning to automatically create AGI?

a representation of everything you have in memory about your grandma as a kind of schema which supports causality, constraints and other parts of intuitive physics (in the broad meaning of it)

This sounds a bit like the stapler example from the Frameworks paper - the idea that a system learns the behaviors of an object as part of its model, and it must generate behaviors to interact with the object in order to do so. Does that seem close to the idea you’re circling here, @spin and @Paul_Lamb?

1 Like

Exactly. When I refer to temporal information, I am not just talking about passive observation. Behaviors and their effect on the relationships between components of a composite also consist of spatial (potentially high-dimensional) and temporal information. Behaviors can themselves be compressed into a (2D) spatial encoding and become components of other composites.

1 Like

That’s a good example, but it’s not only about physical parts and behavior: abstract things are also connected, and not always dynamically.

1 Like

When we talk about objects in the context of HTM theory, we are not talking about only physical objects. If you buy into the idea that the cortex is running the same algorithm everywhere, then even highly abstract concepts are also considered objects.

3 Likes

For example, consider a hot cup of coffee. There are the relatively simple spatial relationships between the cup and its contents, as well as simple temporal/behavior information (turn the cup, and the coffee pours out).

There are also more complex relationships. If you consider a temperature axis like [cold, hot], then there exists, in the model, an abstract displacement on the temperature axis between the contents of the cup and the cup itself, as well as abstract behaviors: the coffee cools over time, blowing on it makes it cool faster, etc. This abstract axis can be dealt with and encoded in exactly the same way as an axis in 3D space.

This can be applied to anything. We humans tend to talk about everything in spatial terms, and I think the reason is because that is how our brains process everything.
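A minimal sketch of that idea (a toy encoder, not any particular HTM library; the sizes are arbitrary): a value on the abstract [cold, hot] axis is encoded exactly like a 1D spatial coordinate, as a contiguous block of active bits whose position tracks the value's position on the axis.

import numpy as np

def encode_scalar(value, min_val, max_val, n_bits=100, n_active=10):
    # Sparse code for a scalar: a block of n_active bits whose position
    # along the vector tracks the value's position along the axis.
    value = float(np.clip(value, min_val, max_val))
    start = int(round((value - min_val) / (max_val - min_val) * (n_bits - n_active)))
    sdr = np.zeros(n_bits, dtype=np.uint8)
    sdr[start:start + n_active] = 1
    return sdr

# The same encoder serves a spatial axis and the abstract temperature axis.
cup_position = encode_scalar(0.3, 0.0, 1.0)      # location along a 1D spatial axis
coffee_temp = encode_scalar(70.0, 0.0, 100.0)    # location along the [cold, hot] axis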

6 Likes

Fair warning - this is rather abstract:

I have been reading through this thread and see concepts described as unitary things.
If I understand the sensory hierarchy correctly, each tranche of maps (layer? level?) compares and contrasts the stream and adds the output to the representation building in the association region. A side effect of this is that the object is parsed as it traverses the maps. The features that are abstracted at a given map are clusters of whatever that map specializes in.

I see that the end effect is a multiply connected manifold, with each map-to-map connection being a possible portal, each pointing in some direction dictated by the content of what is being parsed. I know that this will sound bizarre, but I think of this as a big PVC pipe structure where the pipes are the fiber tracts and the joints are sort of like a channel tuner playing some program to the next joint. The finished product is a high-level concept that represents the contents of perception. Early at one end of the visual pipe complex is a channel that signals red.

Somewhere in the same plumbing (a little higher up) is a texture property map. The contents of one of those nodes have been probed and visualized in this paper:


I suspect that this same basic technique can be used to probe some of the other map contents in the processing stream.
This process continues until it all terminates in the hub of the parietal lobe.

Similar things are going on in the temporal lobe and frontal lobe, with the lingua franca being hex-grid coding. The difference in the frontal lobe is that the high-level coding is the starting point, and it is unfolded until it reaches the motor drivers along the central sulcus.

Summarizing: the contents of perception are the current collection of these parsed feature streams assembled in the association region as a stable hex-grid. The features that make up that representation are parsed into a distributed representation, both within a map and along the hierarchy of maps. Part of what makes that work is the learned connection patterns between maps. I see the connections in this parse tree as bi-directional, so the features are reactivated by thinking of the object. The contents of the parse tree are built by experience and are not preloaded.

2 Likes

I’d agree with that. But the real bottom line is that we’re trying to find a way to represent (and continually re-represent/refactor, hence the need for autoencoding…) any given concept in the context of all that is known, and I think the real key to that is Sparse Distributed Representations.

SDR is essentially intelligence 101 because it serves as a base to make so many more things possible.
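As a toy illustration (pure Python, arbitrary sizes) of why sparsity buys so much: similarity falls directly out of bit overlap, and two unrelated sparse codes almost never collide by chance.

import numpy as np

def random_sdr(n=2048, n_active=40, rng=None):
    # Random sparse distributed representation: n bits, ~2% active.
    rng = rng if rng is not None else np.random.default_rng()
    sdr = np.zeros(n, dtype=np.uint8)
    sdr[rng.choice(n, size=n_active, replace=False)] = 1
    return sdr

def overlap(a, b):
    # Shared active bits: a simple similarity measure between SDRs.
    return int(np.sum(a & b))

rng = np.random.default_rng(0)
a, b = random_sdr(rng=rng), random_sdr(rng=rng)
print(overlap(a, a))   # 40 -- a representation overlaps fully with itself
print(overlap(a, b))   # near 0 -- unrelated random codes barely collide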

Speaking about spatial representations, it also seems quite fundamental. It seems that we’re always trying to define something in at least two contexts: what is the thing? And where (or in what context) is the thing?

That is, What is it made up of? What does it make up?

Feels like if you can answer those two questions you’ve got a pretty good handle on what it is.
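One crude way to picture those two questions as a data structure (the names are purely illustrative, not a proposal for the real representation): a bidirectional part-whole graph, where every concept can be queried both for its components and for the composites it participates in.

from collections import defaultdict

class PartWholeGraph:
    # Toy bidirectional composition graph: "what is it made of?" and
    # "what does it make up?" are the two directions of the same links.
    def __init__(self):
        self.parts = defaultdict(set)    # whole -> its components
        self.wholes = defaultdict(set)   # component -> composites it joins

    def add(self, whole, part):
        self.parts[whole].add(part)
        self.wholes[part].add(whole)

g = PartWholeGraph()
g.add("hot coffee", "coffee")
g.add("hot coffee", "cup")
g.add("breakfast", "hot coffee")
print(g.parts["hot coffee"])    # what it is made up of
print(g.wholes["hot coffee"])   # what it makes up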

2 Likes

This is a critical part of the architecture. Another way to think of this is that the visual cortex cannot know anything about the temperature of the coffee cup, just as the somatosensory cortex cannot know anything about its color. These properties must be parsed and transported to another area of the cortex to be associated.

3 Likes

In considering practical ways to implement such a testing environment, I’ve thought perhaps the best way is to create semantic encodings for a specific suite of environments from Python’s OpenAI Gym library.

What do you guys think?

The environment interface is exactly what I presume we’re after. For example, it is easy to instantiate an environment:

import gym
env = gym.make("CartPole-v1")
observation = env.reset()

And to put the bot into a sensory feedback mechanism with the environment is as simple as making a loop:

for _ in range(1000):
    env.render()
    # your agent here (this takes random actions)
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)

    if done:
        observation = env.reset()
env.close()

This is essentially the exact pattern we want: a sensorimotor feedback loop between environment and agent. The only thing missing is that the action space is not semantically encoded, and the representations of the environment state are not semantically encoded.

All we’d need to do to make OpenAI Gym fit our needs is wrap its input and output in engineered semantic representations (as if we’re creating sense organs for the brain). I’ve never used HTM’s libraries; does anyone have experience with the various types of encoders? Maybe you could point me in the right direction.
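As a rough sketch of what such a wrapper might look like for CartPole (reusing the toy block-scalar encoder idea from earlier in the thread rather than any particular HTM encoder; the value ranges, vector sizes, and class name are all assumptions):

import numpy as np
import gym

def encode_scalar(value, min_val, max_val, n_bits=100, n_active=10):
    # Toy scalar encoder: contiguous block of active bits positioned by value.
    value = float(np.clip(value, min_val, max_val))
    start = int(round((value - min_val) / (max_val - min_val) * (n_bits - n_active)))
    sdr = np.zeros(n_bits, dtype=np.uint8)
    sdr[start:start + n_active] = 1
    return sdr

class SemanticCartPole:
    # Wraps the Gym env so the agent only ever sees and emits sparse codes.
    RANGES = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]  # assumed bounds

    def __init__(self):
        self.env = gym.make("CartPole-v1")

    def encode_observation(self, obs):
        # One scalar encoding per observation variable, concatenated.
        return np.concatenate([encode_scalar(v, lo, hi)
                               for v, (lo, hi) in zip(obs, self.RANGES)])

    def encode_action(self, action):
        # Two discrete actions -> two non-overlapping blocks of active bits.
        sdr = np.zeros(40, dtype=np.uint8)
        sdr[action * 20:(action + 1) * 20] = 1
        return sdr

    def reset(self):
        return self.encode_observation(self.env.reset())

    def step(self, action):
        observation, reward, done, info = self.env.step(action)
        return self.encode_observation(observation), reward, done, info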

On top of this process we’d have a genetic algorithm or something mixing and mutating the SUI seed (essentially the DNA that produces the fractal nature of the agent’s brain). But that’s further down the line; first we must select a wide variety of environments and generate semantic encodings of their inputs and outputs (state representations and possible behaviors).
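For when that later stage arrives, the outer loop might be no more than this (SUISeed, crossover, mutate, and the fitness function are all hypothetical placeholders):

import random

def evolve(population, fitness, generations=100, elite=5):
    # Bare-bones evolutionary loop over hypothetical SUI seeds. 'fitness'
    # would score how well the network a seed grows models its suite of
    # environments (e.g. via anomaly scores).
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:elite]
        children = [random.choice(parents).crossover(random.choice(parents)).mutate()
                    for _ in range(len(population) - elite)]
        population = parents + children
    return max(population, key=fitness)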

And by the way I think it’s not a problem if the possible actions for the environment are in fact binary or discrete, just so long as they can be semantically encoded as such.

Once an agent is found that can learn the entire suite of our selected, encoded Gym environments, we can make more complicated environments to hone its general intelligence capabilities.

1 Like

I would have thought that this would need to happen first (after building an initial toolbox of functions that can be mixed and matched and tweaked). Before hooking up an agent and turning it loose in an environment, wouldn’t you need the mechanism in place to support its evolution?

Are you imagining environment-specific sensors? Or do you mean encoding the environment from the perspective of the virtual character (for example, in case of a game like SMB, encoding the world to how it would look from Mario’s perspective)?

This ties back to the interface. Initially you could start out just passively modeling the environment and itself (perhaps using anomaly scoring as a simple fitness function). But ultimately, this needs to somehow generate a high-level summary of:

  1. Areas of interest in its internal model of the world (anomalies and areas that haven’t been recently observed, for example)
  2. Associated motor programs for addressing each area of interest

Then the “sub-cortical” part of the agent would consume this summary, choose a motor program, and trigger the network of SUIs to execute it.
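One way that summary/consumer handshake could be typed out (the names and fields are purely illustrative):

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class AreaOfInterest:
    # One entry in the cortical summary: something worth attending to,
    # plus a motor program proposed for addressing it.
    description: str
    salience: float                     # e.g. anomaly score or staleness
    motor_program: Callable[[], None]

def subcortical_choose(summary: List[AreaOfInterest]) -> Callable[[], None]:
    # The "sub-cortical" consumer: pick the most salient area's program.
    return max(summary, key=lambda a: a.salience).motor_program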

1 Like

And that’s because proximity matters in the real world. Thus, relative proximity * similarity define compositional patterns: objects, processes, concepts. The problem is that the brain represents “what” and “where” in separate networks, which makes it very difficult to correlate them.

2 Likes

I would argue that representing “hot” and “red” in separate networks is a problem of similar difficulty. The system must be able to take scattered pieces of information and transport them to “hub” areas for association.

2 Likes

Yes, distributed representation is a horrible thing :). Biology has no alternative to that, but we do. That’s why I’ve been arguing for encapsulation, in a very fundamental way.

1 Like

Yes, you would. I was talking about the development timeline - what’s easiest and most prudent to do first is to get the environment-agent feedback loop set up.

Since I’m assuming we’ll choose environments that are fully observable, I’m imagining that those two things are one and the same. For instance, the simplest environment I can imagine is a number line from 0 to 999, so let’s use that as an example here:

If I’m the agent and I get the semantic encoding representing 123, then that is my ‘location’ in the state space of the number line. If I perform some action and then get a new representation of 023 then the action I performed must have been essentially equal to “-100.”

In other words, in a fully observable world, the number of digits needed to (non-semantically) represent the whole universe always remains the same, though the symbols change, ie. 000 - 999.
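The number-line world above, written against the same Gym-style interface (a toy sketch; the semantic encoding of the state is left out here):

class NumberLineEnv:
    # Fully observable toy world: the entire state is one number in 0..999,
    # so the whole universe fits in each observation.
    def __init__(self, size=1000):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 123            # e.g. start at 123, as in the example above
        return self.state

    def step(self, action):
        # action is a signed integer displacement, e.g. -100 takes 123 -> 023.
        self.state = (self.state + action) % self.size
        reward, done, info = 0.0, False, {}
        return self.state, reward, done, info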

Consider your SMB example in this context. It would be as if the agent sees the entire level from start to finish at all times, and then sees its actions affecting the movement of one of the items in the level - Mario. I’m not sure how that would be represented, but it really doesn’t matter; what matters is that the environment is fully observable.

Why do I think this? Because, as I see it, we’re trying to make the intelligence’s job as easy as possible: if we’re to have any hope of converging on an efficient and effective SUI design, we must be shooting for the simplest SUI design imaginable. The only way to do that is to reduce the complexity of its environment as much as possible. Once we have that, we can turn up the complexity of the environments.

Imagine representing the world as partially observable, meaning every given representation is only a small portion of the entire world. What you’ve essentially done is make a state space within a state space. Mario is traveling through the level, and the level is mutating as he performs behaviors. Let’s dumb it down for the poor agent; it’s working so hard as it is.

I think the bottom line I’ve beaten to death is: you want the agent to immediately see every possible change that has or hasn’t occurred at each timestep, so it can most easily learn how its behaviors mutate the environment.

1 Like

I got your point. If we start with a collection of worlds that are simple enough that a hand-crafted small SUI network can measurably model the world and its own behaviors, then you can evolve it to reach some target efficiency level before moving on to having it generate its own behavior (versus passively observing), introducing slightly more complicated worlds, etc.

1 Like

So I guess I’ve got two questions:

1. How do we categorize environments?

An environment can be very small, such that it only has a few possible states and a few possible transitions between them. An environment that is enormous seems, somehow, categorically different, because if you can’t simply memorize the whole thing in one node, you must use a network and distribute the information throughout.

So, always staying within the context of fully observable environments: an environment can be large or small. It could also be simple or complex - meaning entropic or not. Perhaps there are as many states in the environment as particles in the universe (it’s large), but all the states relate to one another in uniform ways (it’s simple); in that case, as soon as the uniform rule is deduced, the agent no longer needs to remember much.

Are there any other ways to categorize environments? Any other spectrums along which they sit? I’m sure there are, but if not, then the suite of worlds we need to make consists, at a minimum, of four:

a. Small, Simple (like a numberline)
b. Large, Simple (like a Rubik’s Cube; because it has symmetrical behaviors, its state-to-state transitions are simple.)
c. Small, Complex (like what?)
d. Large, Complex (like what?)

If we’re able to think of other categories for environments, we’ll have to take the Cartesian product of those and have an environment for each combination.
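Enumerating that grid of categories is a one-liner with itertools.product (just the two axes named above; any further axes get appended to the lists):

from itertools import product

size = ["small", "large"]
complexity = ["simple", "complex"]
# Add further category axes here as they come up.
for combo in product(size, complexity):
    print(combo)
# -> ('small', 'simple'), ('small', 'complex'), ('large', 'simple'), ('large', 'complex')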

2. What are the necessary components of the Smallest Unit of Intelligence?

We must have a preliminary list of its essential components in order to come up with a first SUI design that can be iterated over by the system.

The SUI is a microcosm of the whole network. So it has to have, at least in part, I would assume, all the faculties that the entire network can carry out.

I made a short list here:

But what components are most vital and most general in your guys’ opinion?

If we model the cortical column as the SUI, we know it must learn patterns of sensory data (spatial patterns and temporal sequences) within higher-level (scale hierarchy) sensory data contexts.

That structure alone is a good start, but perhaps not good enough. That’s basically my understanding of it, which really only encompasses 2 of the ~6 layers. I’m not sure what else needs to be added.
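A deliberately naive sketch of just those two functions (nothing like a real cortical column or an HTM implementation; the class name is made up): memorize spatial patterns, and learn which pattern tends to follow which.

from collections import defaultdict, Counter

class ToySUI:
    # Remembers spatial patterns (as hashable tuples of active bits) and
    # learns first-order temporal transitions between them.
    def __init__(self):
        self.known_patterns = set()
        self.transitions = defaultdict(Counter)   # pattern -> successor counts
        self.previous = None

    def observe(self, pattern):
        pattern = tuple(pattern)
        surprise = pattern not in self.known_patterns   # crude anomaly flag
        self.known_patterns.add(pattern)
        if self.previous is not None:
            self.transitions[self.previous][pattern] += 1
        self.previous = pattern
        return surprise

    def predict(self):
        # Most frequent successor of the current pattern, if any has been seen.
        successors = self.transitions.get(self.previous)
        return max(successors, key=successors.get) if successors else None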

1 Like

Maybe I am way off base here, but your thread starts out with ‘AGI’, and the environments you are describing are more suited to - say - a plant.

It may be that ‘plant’ is a local minimum that one could consider a counter-productive trap; there may not be any usable path from ‘plant’ to ‘AGI.’

5 Likes

In my opinion, the integration of multiple SUIs is important enough that it should be addressed from the start. The bare minimum to touch on each of the basic functions might look something like this (probably still too small):

[image: sketch of a bare-minimum network of SUIs]

2 Likes

I think these pertinent questions stem from the lack of a perfectly well-defined task. What exactly is the agent being asked to do? I push this because it seems there are many key levels to a system which generates intelligent-looking behavior, such as:

  • the encoding system for the raw data (ex: an agent’s “eyes”, sensing a physical environment with sonar & lidar)

  • the decision-making system, which maps incoming inputs to motor commands (ex: the agent’s RL policy, attempting to best navigate the environment to achieve a goal, like reaching a location or retrieving an object)

  • the overall system goal guiding the RL policy (ex: the fitness function, evaluating how well its policy-driven movements are achieving the agent’s purpose)

I know the goal here is to create a more general agent, not specific to certain environments or task types, but my hunch is that it’s best to start with a very well-defined task that must navigate real-world complexity – like moving through a busy physical environment towards a goal.

I think this will help to ensure that we’re on the exact same page with the terms we’re using (what we mean by them), and to lay bare all the different things that need to be either hard-coded or adaptively optimized.

I bet it’s really hard to adaptively optimize all the ‘levels’ of intelligent-looking behavior at once, so it may be fruitful to highly simplify certain things at first, like the agent’s overall goal and range of potential movements.

I find myself in favor of an agent that’s simpler in its behaviors, but can successfully navigate complex, real-world-like environments towards useful goals. I think achieving that first and then scaling up the agent’s range of motions may be a practical way to go.

4 Likes

I’m surprised it’s so strange to so many people that the path to AGI should start as simply as possible. To me that is intuitive; how else ought it be done?

I suppose I’m working off the premise that the simplest unit of intelligence looks somewhat similar to the most complex unit of intelligence. Therefore, if we can find the simplest, we can iterate to find the deluxe model.

1 Like