In a ReLU neural network (and similar architectures) there is an internal predicate, x>=0, which decides the switch state of the ReLU function.
f(x) = x if x >= 0 is true (connect).
f(x) = 0 if x >= 0 is false (disconnect).
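For concreteness, a minimal NumPy sketch of ReLU written out as a predicate-controlled switch (the function name is my own, not from the demo):

import numpy as np

def relu_switch(x):
    # ReLU viewed as a switch: the predicate x >= 0 decides whether
    # the input is connected through (x) or disconnected (0).
    connect = x >= 0                   # internal predicate, bound to x itself
    return np.where(connect, x, 0.0)

print(relu_switch(np.array([-1.5, 0.0, 2.0])))   # [0. 0. 2.]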
As mentioned in another post, there is a rather odd entanglement between the switching decisions and the construction of the wanted output of the network: the two are never allowed the full separation they might need. You are mixing two different things together and being optimistic that that is okay.
And maybe it is, I don’t know.
However, you can try a detachment: substitute locality sensitive hash based predicates, computed from the input vector, for the internally bound x>=0 predicates.
That removes a logical entanglement of switch states that may make training difficult, namely that each switch state depends on several prior ones.
With locality sensitive hashing, the switch states depend directly on the input vector only, not on each other.
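I don't know exactly how the linked demo implements this, but here is a small sketch of how I read the idea, using sign-of-random-projection hashing as the locality sensitive hash; all names, sizes, and the two-layer shape are assumptions of mine:

import numpy as np

rng = np.random.default_rng(0)
d_in, width = 16, 8

# Trainable weight matrices: the dot products being optimized (value path).
W1 = rng.normal(size=(width, d_in)) * 0.1
W2 = rng.normal(size=(width, width)) * 0.1

# Fixed random hyperplanes: one sign-of-projection hash bit per switch, per layer.
H1 = rng.normal(size=(width, d_in))
H2 = rng.normal(size=(width, d_in))

def detached_forward(x):
    # Switch states are hash bits of the input vector x only,
    # so no switch depends on any earlier switch decision or layer value.
    s1 = (H1 @ x) >= 0                  # hash-based predicates, layer 1
    s2 = (H2 @ x) >= 0                  # hash-based predicates, layer 2
    h1 = np.where(s1, W1 @ x, 0.0)      # value path, gated by s1
    h2 = np.where(s2, W2 @ h1, 0.0)     # value path, gated by s2
    return h2

print(detached_forward(rng.normal(size=d_in)))

The point of the sketch is only that the gating pattern is fixed by the input before any layer computes anything, while the W matrices remain free to be trained.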
Yet in a very simple test it seems to work fine:
View:
https://s6regen.github.io/detached/
Code:
The technical question is whether such a system can find, through optimization, smart dot products that can be shared via locality sensitive hashing to give better generalization than simple associative memory. I don't have a good line of reasoning to answer that yet, and the results of a practical experiment were not very clear. I did run a comparison against a net with a more conventional activation function, and that net gave much better generalization. Anyway, I'll leave it for the moment.