Hello @dorinclisu, I also believe that this is how reinforcement learning is implemented in biology. Some time ago I presented a structure similar to what you proposed [1]: reward signals altering permanence values. Dopamine is known to alter corticostriatal connections, i.e. connections that originate in cortical layers and target the basal ganglia (striatum). To my knowledge, however, the internal connections of cortical regions are not modulated by dopamine.
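To make the idea of reward signals altering permanence values concrete, here is a minimal Python sketch. It assumes an HTM-style segment whose synapses have permanences in [0, 1]. The function names and the PERM_INC, PERM_DEC and CONNECTED_THRESHOLD values are hypothetical and not taken from NuPIC, htm.core or any other library. The usual Hebbian update (active inputs up, inactive inputs down) is simply multiplied by a scalar reward, so a negative reward reverses it.

```python
from typing import List

# Hypothetical parameters; the values are illustrative, not from any HTM library.
PERM_INC = 0.05
PERM_DEC = 0.02
CONNECTED_THRESHOLD = 0.5


def reward_modulated_update(
    permanences: List[float],
    presyn_active: List[bool],
    reward: float,
) -> List[float]:
    """Return new permanence values for one segment's synapses.

    HTM-style Hebbian update (active inputs up, inactive inputs down),
    scaled by a scalar reward. reward > 0 reinforces the current pattern,
    reward < 0 reverses the update, reward == 0 leaves permanences unchanged.
    """
    updated = []
    for perm, active in zip(permanences, presyn_active):
        delta = PERM_INC if active else -PERM_DEC
        new_perm = perm + reward * delta
        # Keep permanences in [0, 1], as in HTM.
        updated.append(min(1.0, max(0.0, new_perm)))
    return updated


def connected(permanences: List[float]) -> List[bool]:
    """A synapse counts as connected once its permanence reaches the threshold."""
    return [p >= CONNECTED_THRESHOLD for p in permanences]
```

With `reward=0` nothing is learned, so learning only happens when a reward (or punishment) arrives. Whether plain Hebbian learning should keep running underneath the reward-gated part is a design choice this sketch does not settle.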
If you are wondering about the biological plausibility of your proposal, below are some excerpts from [2], with their references, suggesting that the plasticity of synapses between cortex and striatum is regulated by dopamine acting through D1 and D2 receptors.
The underlying neural plasticity hinges upon phasic dopamine release, signalling reward or its expectation (Montague et al. 1996; Schultz 1998, 2013), and acting mainly within the striatum of the BG to enhance or depress synaptic strength (Centonze et al. 2001; Reynolds and Wickens 2002).
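The "reward or its expectation" signal in the studies cited above (Montague et al. 1996; Schultz 1998) is usually modelled as a temporal-difference (TD) reward prediction error, δ = r + γV(s′) − V(s). Below is a minimal tabular TD(0) sketch on a fixed chain of states that starts with a cue and ends with a reward. The function names, the chain task and the parameter values (alpha, gamma) are illustrative assumptions. The cue's onset is treated as unpredictable, so the value of the pre-cue baseline is fixed at 0.

```python
from typing import List


def td_error(reward: float, v_current: float, v_next: float, gamma: float) -> float:
    """Reward prediction error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * v_next - v_current


def run_trial(values: List[float], reward: float,
              alpha: float = 0.1, gamma: float = 1.0) -> List[float]:
    """Run one trial on the chain cue -> state 0 -> ... -> state n-1 -> reward.

    Updates `values` in place with TD(0) and returns the prediction error at
    cue onset followed by the error at every subsequent step.
    """
    # Cue onset: the pre-cue baseline is unpredictable, so its value is fixed at 0.
    deltas = [td_error(0.0, 0.0, values[0], gamma)]
    last = len(values) - 1
    for s in range(len(values)):
        r = reward if s == last else 0.0
        v_next = 0.0 if s == last else values[s + 1]
        delta = td_error(r, values[s], v_next, gamma)
        values[s] += alpha * delta  # TD(0) value update
        deltas.append(delta)
    return deltas
```

Starting from `values = [0.0, 0.0, 0.0]` with `reward=1.0`, the first trial returns `[0.0, 0.0, 0.0, 1.0]`: all of the error occurs at reward delivery. After a few hundred trials the error at reward delivery approaches 0 and the error at cue onset approaches 1. This is the transfer of the phasic dopamine response from the reward to the predictive cue that TD models are used to explain.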
Apart from their opposing actions, a second key feature of the direct and indirect pathways is their differential regulation by dopamine (Albin et al. 1989; Gerfen and Surmeier 2011). The source of dopaminergic input to the striatum is the substantia nigra pars compacta (SNc), which is fed by a reciprocal input from the striatum but also by external sources, and acts as a modulatory gateway to BG circuits (Schultz 1998). In addition to mediating long-term plasticity, noted above, dopamine also has a short-term influence upon striatal activity; it enhances the excitability of dSPNs and has the opposite effect upon iSPNs.
This momentary regulation of SPN activity monitors the tonic level of dopamine afferent discharge, and is complemented by plastic changes of synaptic strength regulated by phasic dopamine signals (transitory peaks and troughs in the rate of dopaminergic discharge that reflect the presence and absence of reward (Schultz 2013)). Phasic activation of D1 and D2 receptors promotes LTP and LTD (long-term potentiation and depression) of glutamatergic synapses upon dSPNs and iSPNs, respectively; moreover, these actions are contingent upon recent spiking history, such that dopamine gates LTP or LTD of a synapse depending on recent conjunctions of pre- and post-synaptic depolarisation (Shen et al. 2008; Paille et al. 2013).
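The mechanism in the last excerpt is often described as a three-factor learning rule: a pre/post conjunction tags the synapse with a decaying eligibility trace, and a later phasic dopamine signal (the deviation from the tonic level) turns that tag into LTP or LTD. Here is a minimal Python sketch, assuming a discrete time step, an exponentially decaying trace and a weight clipped to [0, 1]. The class, its parameters (lr, tau, dt) and the choice that dopamine dips reverse the effect of bursts are assumptions for illustration, not taken from [2]. It also ignores the order of pre- and post-synaptic activity, which the cited experiments do examine.

```python
import math
from dataclasses import dataclass


@dataclass
class CorticostriatalSynapse:
    """Hypothetical three-factor synapse from cortex onto a striatal SPN."""
    weight: float
    d1: bool                 # True: onto a dSPN (D1); False: onto an iSPN (D2)
    eligibility: float = 0.0

    def step(self, pre: bool, post: bool, dopamine: float, baseline: float,
             lr: float = 0.1, tau: float = 5.0, dt: float = 1.0) -> None:
        # 1) The eligibility trace decays, and a pre/post conjunction tags the synapse.
        self.eligibility *= math.exp(-dt / tau)
        if pre and post:
            self.eligibility += 1.0
        # 2) Phasic dopamine is the deviation from the tonic baseline
        #    (burst > 0, dip < 0).
        phasic = dopamine - baseline
        # 3) Bursts potentiate D1 synapses and depress D2 synapses.
        #    Assumption: dips do the opposite.
        sign = 1.0 if self.d1 else -1.0
        self.weight += lr * sign * phasic * self.eligibility
        self.weight = min(1.0, max(0.0, self.weight))
```

A synapse that was never tagged by a conjunction is left unchanged by a dopamine burst, which is the gating the excerpt describes. Mapped back onto HTM, `eligibility` could stand for a segment having recently been active and `weight` for a permanence, though that mapping is speculative.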