So, I can’t go to sleep for a few hours without you bringing the party over here, is what you are saying?
17 messages, really?
Nah… great conversation guys
Now I have to decide whether I shall be nitpicky or not
Lemme see…
I think what @spin’s talking about with respect to eye movements is that ANNs used for vision seem perfectly capable of working without them for recognition. I’d agree with him that the chemistry of receptors adapting to the current illumination level is not necessarily something the camera of a T800 would have to replicate.
Yet, until we know otherwise, it is not necessarily something we can leave out either, in a model trying to get a clue how it actually works.
Although, up to a point, we ourselves are perfectly able to massively parallel our way through image recognition without a saccade. A literate reader does not scan through letters and certainly does not saccade across every bar and loop of the calligraphy. Whole words are seemingly photographically recognized, per fixation.
@rhyolight, HTM 2016 ain’t much concerned about saccades either. What matters to the temporal thing of the model is that the input flows. You could as well fixate on a TV screen or look through a train’s window. Now I don’t know of Numenta’s more recent SMI insights. I hope you guys tell us about them very soon.
I’m not necessarily aware of each and every paper on this as of 2018. I’m not sure we could say we know nothing of those, though.
I’m tempted to reply to that with two, orthogonal, yesbuts.
Yes, I think it is fair to say that the response of V1 simple cells is very specific and covers only a tiny subset of all possible mathematical functions we could dream of applying to a local area of an image, but…
- But that’s also quite precisely the reason why they are worth studying. Why did they wire to this particular choice of a function? And how is it relevant to an AGI substrate that they do so?
- But again, simple cells are not the only cells in V1. You’ll find many others concerned with other aspects. Still a subset of all possible functions, hence still worth exploring how and why, though.
They marked out those based on rough limits for their responses to a stimulus, before testing inside those rough outlines with specific stimuli.
If you look at the following video from the timepoint I linked to (tried to… or about 18 minutes in, if that did not work)… I believe the insert on the bottom-right is a video archive of their actual protocol back in the day.
I think you’d agree that a convolution kernel being applied to a local square does not prevent its sensitivity to any specific subset of shapes within that square either.
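To make that concrete, here is a toy sketch (mine, not anything from the papers): a hypothetical 3x3 kernel applied to one local square. The kernel values and the three test patches are made up for illustration. The kernel "sees" the whole square, yet it only responds to one kind of shape inside it.

```python
def respond(kernel, patch):
    """Sum of elementwise products: one convolution step on one local square."""
    return sum(
        k * p
        for kernel_row, patch_row in zip(kernel, patch)
        for k, p in zip(kernel_row, patch_row)
    )

# Hypothetical kernel, values chosen by hand: excitatory middle column,
# inhibitory flanks, weights summing to zero.
VERTICAL_KERNEL = [
    [-1, 2, -1],
    [-1, 2, -1],
    [-1, 2, -1],
]

vertical_bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
horizontal_bar = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
uniform = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]

print(respond(VERTICAL_KERNEL, vertical_bar))    # 6
print(respond(VERTICAL_KERNEL, horizontal_bar))  # 0
print(respond(VERTICAL_KERNEL, uniform))         # 0
```

Same square, same kernel footprint, and still a strong preference for one orientation.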
Today, for more detailed outlines, we look at the wiring. The relationship between V1 dendrites and LGN axons makes inferring a precise receptive field in visual space quite a complicated matter, though.
Anyway. I expect that the wiring range is quite standard throughout V1 topology itself. But the fact that V1 retinotopy is distorted (giving much more space to the fovea than to the periphery) means that a given wiring range translates to very different sizes in visual space depending on eccentricity, as @Bitking pointed out.
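A rough back-of-the-envelope sketch of that effect, assuming an inverse-linear cortical magnification model M(E) = A / (E + E2) in mm of cortex per degree of visual angle. The model form, the default values of `a_mm` and `e2_deg`, and the 1 mm wiring range are all placeholders of mine, not measurements I'm vouching for.

```python
def magnification_mm_per_deg(ecc_deg, a_mm=17.3, e2_deg=0.75):
    """Assumed inverse-linear magnification: mm of cortex per degree of visual angle."""
    return a_mm / (ecc_deg + e2_deg)

def visual_span_deg(ecc_deg, wiring_range_mm=1.0):
    """Visual angle covered by a fixed cortical wiring range (local linear approximation)."""
    return wiring_range_mm / magnification_mm_per_deg(ecc_deg)

# Same hypothetical 1 mm of cortex, at increasing distance from the fovea.
for ecc in (0.0, 2.0, 10.0, 40.0):
    print(f"eccentricity {ecc:5.1f} deg -> span of about {visual_span_deg(ecc):.3f} deg")
```

Under those assumptions the same stretch of cortex covers a patch of visual space more than ten times wider at 10 degrees of eccentricity than at the center of the fovea.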
Studying V1 could give us clues into that black box also. At least I hope it will: LGN and V1 form quite intricate loops. I’m always happy to call @Bitking to the rescue on many of those questions.
I don’t believe cortex and those older parts would work the same at all. Retina to thalamus to V1 looks like a very delicate clockwork, designed by ‘evolution’. And they provide carefully tuned filters, surely very concretely helpful for processing visual data (center-surround, etc). But once in V1 we’re dealing with our elusive universal learner… quite a different matter.
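For anyone unfamiliar with center-surround, here is a minimal 1D sketch using a difference of Gaussians, which is one common way to model it (my choice of model, not a claim about the exact retinal circuitry). The kernel size and both sigmas are arbitrary illustrative values.

```python
import math

def gaussian(size, sigma):
    """Sampled Gaussian, normalized so that its weights sum to 1."""
    half = size // 2
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def center_surround(size=9, sigma_center=1.0, sigma_surround=3.0):
    """Narrow excitatory center minus broad inhibitory surround."""
    center = gaussian(size, sigma_center)
    surround = gaussian(size, sigma_surround)
    return [c - s for c, s in zip(center, surround)]

def respond(kernel, signal):
    """Response of the filter to one local stretch of input."""
    return sum(k * v for k, v in zip(kernel, signal))

kernel = center_surround()
uniform = [1.0] * 9                   # flat illumination
spot = [0, 0, 0, 0, 1, 0, 0, 0, 0]    # small bright spot on the center

print(respond(kernel, uniform))  # close to zero: flat light is ignored
print(respond(kernel, spot))     # positive: local contrast gets through
```

That is the kind of fixed, carefully tuned filter I mean: useful, but nothing in it learns.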