# A computational model of basal ganglia and its role in memory retrieval in rewarded visual memory tasks {#sec-chapter:BG}
\chaptermark{A computational model of basal ganglia and memory retrieval}
#### Abstract {.unnumbered}
Visual working memory tasks involve a network of cortical areas such as
inferotemporal, medial temporal and prefrontal cortices. We suggest here
to investigate the role of the basal ganglia in the learning of delayed
rewarded tasks through the selective gating of thalamocortical loops. We
designed a computational model of the visual loop linking the perirhinal
cortex, the basal ganglia and the thalamus, biased by sustained
representations in prefrontal cortex. This model concurrently learns
different delayed rewarded tasks that require maintaining a visual cue
and associating it either with itself or with another visual object to obtain
reward. The retrieval of visual information is achieved through thalamic
stimulation of the perirhinal cortex. The input structure of the basal
ganglia, the striatum, learns to represent visual information based on
its association to reward, while the output structure, the substantia
nigra pars reticulata, learns to link striatal representations to the
disinhibition of the correct thalamocortical loop. In parallel, a
dopaminergic cell learns to associate striatal representations to reward
and modulates learning of connections within the basal ganglia. The
model provides testable predictions about the behavior of several areas
during such tasks, while proposing a new functional organization of
learning within the basal ganglia, putting emphasis on the learning of
the striatonigral connections as well as the lateral connections within
the substantia nigra pars reticulata. It suggests that the learning of
visual working memory tasks is achieved rapidly in the basal ganglia and
used as a teacher for feedback connections from prefrontal cortex to
posterior cortices.
## Introduction
During object-based visual search, target templates stored in visual
working memory (WM) can bias attentional processing in visual areas to
favor the relevant objects [@desimone1995; @Woodman2007b]. Visual WM
can be investigated through a number of different tasks in rats,
primates or humans, among which change detection, recall procedures,
delayed matching to sample (DMS), delayed nonmatching to sample (DNMS)
or delayed pair-association (DPA) tasks are frequently used. These
experiments have shed light on the psychophysical mechanisms
involved in visual WM [@luck1997a] and helped delineate the neural
substrates subserving these functions [@ranganath2006]. Visual WM has
several computational aspects: encoding of the relevant items
(potentially in an abstract manner), maintenance of the items through
time in the face of distractors, retrieval of the sensory content of the
item, and abstraction of the underlying rule. It faces both a structural
credit assignment problem (which item to store and retrieve) and a
temporal credit assignment problem (how to link encoding in WM with the delayed
delivery of reward).
Specific attention has been directed towards the prefrontal cortex, which
is well-known to be involved in WM maintenance and manipulation in
various modalities [@fuster1971; @funahashi1989]. Prefrontal lesions do
not totally eliminate visual WM but impair the ability to maintain it
over long delays or in the presence of distractors
[@Petrides2000; @DEsposito2006]. Neurons in PFC exhibit robust
object-specific sustained activities during the delay periods of visual
WM tasks like DMS or DNMS [@miller1996]. However, the informational
content of WM-related activities in PFC is still unclear
[@Romanski2007]. Inferotemporal (IT) neurons have been shown to encode
object-specific information [@Nakamura1994] as they are located at the
end of the ventral visual pathway [@ungerleider1982]. They have been
shown to be critical for visual WM [@Fuster1981; @Petrides2000] and also
exhibit sustained activation during the delay period, even if their
responses can be attenuated or cancelled by intervening distractors
[@miller1993], which can be partly explained by feedback cortico-cortical
connections originating from PFC [@Fuster1985; @Webster1994].
The medial temporal lobe (MTL, composed of perirhinal - PRh -,
entorhinal - ERh - and parahippocampal - PH - cortices) also plays an
important though not essential role in visual WM. Compared to IT, a
greater proportion of neurons in PRh and ERh exhibit sustained
activation during the delay-period [@Nakamura1995] and are robust to
distractors [@suzuki1997]. They are especially crucial when visual
objects are novel and complex [@ranganath2005]. Particularly, PRh cells
are more strongly involved in visual recognition when it requires visual
WM processes [@lehky2007]. They are reciprocally connected with IT
neurons and can provide them with information about novelty or category
membership, since they can rapidly encode relationships between visual
features [@Murray1999; @rolls2000], as well as the association of
objects to reward [@mogami2006]. @ranganath2006 provided a complete
account of the functional relationship between IT, PFC and MTL in visual
WM. He considers that the visual aspects of the remembered object are
maintained in the ventral pathway at various levels of complexity
(low-level features in V1 or V4, object-related representations in IT)
through sustained activation of cells. Top-down activation of these
neurons by MTL would provide them with information about novelty and
help to reconstruct a coherent mental image of the objects composing the
visual scene, thanks to the link between MTL and hippocampus. Top-down
activation by PFC helps the ventral stream to maintain representations
in the face of distraction and also allows stimulus-stimulus associations
(like in the delayed pair-association task) in IT [@Gutnikov1997].
A structure that is absent in this scheme but that is nevertheless very
important in visual WM is the basal ganglia (BG), a set of subcortical
nuclei at the base of the forebrain. Human patients with BG disorders (such as Parkinson’s
disease) show strong deficits in delayed response tasks [@Partiot1996].
Several experiments have recorded visual WM-related activities in
various structures composing the BG, especially the striatum (STR)
[@hikosaka1989; @Mushiake1995; @lewis2004; @Chang2007]. Almost all
cortical areas send projections to the input nuclei of BG (STR and the
subthalamic nucleus STN), while the output nuclei of BG (the internal
segment of globus pallidus GPi and the substantia nigra pars reticulata
SNr) tonically inhibit various thalamic nuclei, allowing selective
modulation of corticothalamic loops [@Parent1995a]. The BG are organized
through a series of closed loops, which receive inputs from segregated
cortical regions and project back to them quite independently (see
@Haber2003 for a review). The number and functional domain of these
loops is still an open issue
[@Alexander1986; @Lawrence1998; @Nambu2002], but two of them are of
particular relevance for our model. The executive loop involves the
dorsolateral part of PFC (dlPFC), the head of the caudate nucleus (a
region of the dorsal striatum), GPi-SNr and the mediodorsal nuclei of
thalamus (MD). The structures involved in this loop have all been shown
to be involved in WM processes in various modalities and provide a basis
for the maintenance and manipulation of items in cognitive tasks (see
@frank2001 for a review about the functional requirements of WM). The
visual loop involves the inferotemporal and extrastriate occipital
cortices, the body and tail of the caudate nucleus, SNr and the
ventral-anterior nucleus of the thalamus (VA)
[@Middleton1996; @Seger2008]. This loop is particularly involved in
visual categorization and visual discrimination, but also sends output
to premotor areas to link category learning with appropriate behavior.
In addition to IT neurons, the body of the caudate nucleus is involved
in visual WM tasks, which suggests a role of the entire visual loop in
visual WM [@Levy1997].
What remains unknown is how these two loops can interact in
order to subserve visual WM functions in the context of efficient
behavior. Previous models have particularly addressed the updating of
working memory content as part of the executive BG loop (e.g. @Brown1999
or @OReilly2006). We here focus on how such memory content can be used
to bias the visual loop allowing for a goal-directed memory recall in
the context of rewarded tasks such as DMS, DNMS or DPA. Among the
different mechanisms by which two BG loops can interact, we focus on the
overlapping projection fields of cortical areas: a cortical area
principally sends projections to a limited region of the striatum, but its
axons send collaterals along the surface of the striatum. In particular,
the body of the caudate, which is part of the visual loop and
principally innervated by inferotemporal projection neurons, also
receives connections from the dorsolateral prefrontal cortex
[@Selemon1985]. This model is thus composed of the visual loop linking
PRh with BG and the thalamus, while the executive loop is reduced to
sustained activation in dlPFC which projects on the region of the
striatum belonging to the visual loop. The model is alternatively
presented with specific combinations of visual cues and task symbols
that allow the system to perform actions leading to the delivery of
reward (as proposed by @gisiger2006). Our emphasis is on the
reward-modulated self-organization of connectivity between distributed
populations. The model provides hypotheses about how sustained
representations in dlPFC can bias learning in the visual loop so that
object-related activities in the ventral visual pathway can be retrieved
through thalamic stimulation in the context of a particular cognitive
task to provide anticipatory top-down signals for the visual system, as
observed physiologically [@naya2003; @Takeda2005]. In particular,
self-organization in the model relies on the competitive selection of
relevant cortical representations in the output structures of the BG.
## Material and Methods
### Architecture of the model
Each structure used in this model is composed of a set of dynamical
neurons, whose membrane potential is governed by a time-dependent
differential equation and transformed into a mean firing rate through a
non-linear transfer function. These neurons therefore exchange a real
instantaneous value instead of spikes, as this considerably reduces
computational costs and allows the use of efficient learning rules that are
not yet available for spiking neurons. Although we do not capture some
biophysical details, this paradigm is sufficiently complex to show the
emergence of dynamic behaviors through the interaction of distributed
computational units [@Rougier2009]. The differential equation that governs
the evolution of the activity of each neuron is discretized according to
the Euler method with a time-step of $1$ ms and is evaluated
asynchronously to allow stochastic interactions between functional units
[@rougier2006].
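
As an illustration of this integration scheme, the following minimal sketch (our own pseudo-implementation, not the original code of the model) performs one asynchronous Euler step for a generic population of rate-coded units:

```python
import numpy as np

dt = 1.0  # integration time step (ms)

def euler_step(m, weighted_input, tau):
    """One asynchronous Euler step of tau * dm/dt + m = I(t).

    `weighted_input(i)` computes the current total input of unit i from
    the present activities of the network; units are visited in a random
    order, so each one already sees the values updated before it.
    """
    for i in np.random.permutation(len(m)):
        m[i] += dt / tau * (weighted_input(i) - m[i])
    return np.maximum(m, 0.0)  # firing rates via the ()^+ transfer function
```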
Biological details gave us some insights on the choice of certain
parameters, such as the time constants for the different neurons, as we
know for example that striatal cells are faster than cortical cells
[@Plenz1996]. Other parameters have been set to bring the model into a
functionally meaningful range. Control simulations showed that minor
variations of their values do not qualitatively change the results
presented here.
The architecture of the model is depicted in @fig-ficn:model A. Visual
inputs are temporally represented in the perirhinal cortex (PRh), each
cell firing for a particular visual object. These perirhinal
representations project to the prefrontal cortex (dlPFC) where they are
actively maintained for the duration of the task. These sustained
activations in dlPFC are artificially controlled by a set of gating
signals, leaving unaddressed the temporal credit assignment problem. PRh
and dlPFC both project extensively to the caudate nucleus (CN), which
learns to represent them in an efficient manner according to the task
requirements. Depending on reward delivery over the time course of
learning, each active striatal cell learns to integrate perirhinal and
prefrontal information in a competitive manner due to inhibitory lateral
connections. This mechanism leads to the formation through learning of
clusters of striatal cells that represent particular combinations of
cortical information depending on their association to reward. These CN
cells send inhibitory projections to the SNr, whose cells are tonically
active and learn to become selective for specific striatal patterns.
This learning between CN and SNr is also dependent on reward delivery.
Learning of the lateral connections between SNr cells additionally
limits the number of simultaneously inhibited SNr cells. These
cells in SNr tonically inhibit thalamic cells (VA) which have reciprocal
connections with PRh. The connections from SNr to VA and between VA and
PRh are not learned but focused (one-to-one connection pattern), meaning
that the inhibition of one SNr cell leads to the thalamic stimulation of
a unique cell in PRh. A dopaminergic cell (SNc) receives information
about the delivered reward (R) and learns to associate it with striatal
activities. Its signal modulates learning at the connections between
cortical areas (PRh and dlPFC) and CN, between CN and SNr, as well as
within SNr. We now present in detail each structure and the differential
equations followed by their neurons.
![(A) Architecture of the model. Pointed arrows denote excitatory connections and rounded arrows denote inhibitory ones. Circular arrows within an area represent lateral connections between the cells of this area. (B) Timecourse of the visual inputs presented to the network. Top: rewarded trials like DMS, DNMS or DPA. Bottom: delay conditioning.](img/ficn/vitay_figure_1.png){#fig-ficn:model}
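
As a compact overview of this wiring (a hypothetical summary in code form, not part of the model itself), the projections of @fig-ficn:model A can be listed as follows:

```python
# Hypothetical summary of the projections of the model (@fig-ficn:model A).
# "plastic" projections are learned under dopaminergic modulation by SNc.
PROJECTIONS = [
    ("PRh",   "dlPFC", "excitatory", "fixed, gated working memory entry"),
    ("PRh",   "CN",    "excitatory", "plastic, DA-modulated"),
    ("dlPFC", "CN",    "excitatory", "plastic, DA-modulated"),
    ("CN",    "SNr",   "inhibitory", "plastic, DA-modulated"),
    ("CN",    "SNc",   "inhibitory", "plastic, reward prediction"),
    ("SNr",   "SNr",   "lateral",    "plastic, DA-modulated competition"),
    ("SNr",   "VA",    "inhibitory", "fixed, one-to-one"),
    ("VA",    "PRh",   "excitatory", "fixed, one-to-one, reciprocal"),
]
```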
### Perirhinal cortex
The input of our model is a high-level visual area with mnemonic
functions which is able to bias processing in the ventral visual stream.
In general, the area TE of the inferotemporal cortex is a potential
candidate, but we particularly focused on PRh, as it has been shown to
be preferentially involved in recognition tasks that require visual WM
[@lehky2007]. We previously designed a detailed computational model of
PRh that is able to learn object-related representations in clusters of
cells based on partial information [@Vitay2008]. These clusters linked
through lateral connections are able to exhibit sustained activation
when the dopamine (DA) level in the network is within an optimal range.
The visual information that they contain can also be easily retrieved
through a partial stimulation coming from the thalamus. We hypothesize
that this memory retrieval through thalamic stimulation under an
accurate level of DA can be a basis for the guidance of visual search.
Here, we reduced the size of PRh to $8$ cells, each of them representing
a particular object that is presented to the network (see @sec-ficn:tasks for the description of these objects). In our previous model,
PRh contained hundreds of cells and each object was represented by a
cluster of different cells. Each cell $i$ has a membrane potential
$m_i(t)$ and an instantaneous firing rate $u^{\text{PRh}}_i(t)$ which
are governed by the following equations:
$$
\tau \cdot \frac{d m_i(t)}{d t} + m_i(t) = V_i(t) + W^{\text{VA}}_{i} \cdot u^{\text{VA}}_i(t) + \displaystyle\sum_{j \in \text{PRh}} W^{\text{PRh}}_{i,j} \cdot u^{\text{PRh}}_j(t) + \epsilon(t)
$$ {#eq-ficn:mp-prh}
$$
u^{\text{PRh}}_i(t) = (m_i(t))^+
$$
where $\tau =20$ ms is the time constant of the cell, $V_i(t)$ its
visual input (see @sec-ficn:tasks) and $W^{\text{VA}}_{i} = 0.5$ the
weight of a connection coming from the corresponding thalamic cell whose
firing rate is $u^{\text{VA}}_i(t)$. $\epsilon(t)$ is an additional
noise whose value varies uniformly at each time-step between $-0.3$ and
$0.3$. The transfer function used for perirhinal cells is simply the
positive part of the membrane potential $()^+$. Each perirhinal cell
additionally receives inhibitory lateral connections from the seven
neighboring perirhinal cells with a fixed weight of
$W^{\text{PRh}}_{i,j} = -0.3$ to induce competition between the
perirhinal cells.
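
A direct transcription of @eq-ficn:mp-prh into code might look as follows (our own sketch; the array shapes and the asynchronous update order are assumptions):

```python
import numpy as np

TAU, DT = 20.0, 1.0   # membrane time constant and time step (ms)
W_VA = 0.5            # weight of the thalamocortical connection
W_LAT = -0.3          # lateral inhibition between the 8 PRh cells

def step_prh(m, u_prh, V, u_va):
    """One asynchronous Euler step of @eq-ficn:mp-prh for the 8 PRh cells."""
    for i in np.random.permutation(8):
        lateral = W_LAT * (u_prh.sum() - u_prh[i])       # the 7 other cells
        noise = np.random.uniform(-0.3, 0.3)             # epsilon(t)
        inp = V[i] + W_VA * u_va[i] + lateral + noise
        m[i] += DT / TAU * (inp - m[i])
        u_prh[i] = max(m[i], 0.0)                        # ()^+ transfer function
    return m, u_prh
```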
### Dorsolateral prefrontal cortex
We do not model explicitly the executive loop and rather use a very
simple WM representation in dlPFC, including mechanisms of updating and
resetting. Future work will address these questions in the context of WM
gating in the executive loop [@frank2001; @gruber2006]. The dlPFC is
here composed of 8 cells which keep track of activity in PRh through
temporal integration:
$$
\begin{aligned}
\tau \cdot \frac{d m_i(t)}{d t} & = & G(t) \cdot W^{\text{PRh}}_{i} \cdot ( u^{\text{PRh}}_i(t) - 0.5 )^+ \\
u^{\text{dlPFC}}_i(t) & = &
\begin{cases}
0 & \text{if $m_i(t) < 0$} \\
m_i(t) & \text{if $0 \leq m_i(t) \leq 1$} \\
1 & \text{if $m_i(t) > 1$}
\end{cases}
\end{aligned}
$$ {#eq-ficn:mp-pfc}
where $\tau =10$ ms is the time constant of the cell and $G(t)$ a gating
signal allowing the entry of an item into working memory. Each dlPFC cell
receives only one connection from a PRh cell with the weight
$W^{\text{PRh}}_{i} = 1.0$. As soon as the activity of a PRh cell
exceeds 0.5, it is integrated in the corresponding prefrontal cell,
whose activity saturates to a maximum value of $1.0$ thanks to the
transfer function and stays at this value even if the perirhinal
stimulation ends. The gating signal $G(t)$ is manually set to a value of
$1.0$ when objects have to be maintained in WM and to a value of $0.0$
otherwise. The activity of the prefrontal cells is manually reset to
zero at the end of a trial.
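
In code, the gated integrator of @eq-ficn:mp-pfc reduces to a few lines (again a sketch with assumed conventions):

```python
import numpy as np

def step_dlpfc(m, u_prh, G, tau=10.0, dt=1.0):
    """One Euler step of @eq-ficn:mp-pfc: a gated, saturating integrator."""
    m += dt / tau * G * np.maximum(u_prh - 0.5, 0.0)   # W^PRh = 1.0
    u = np.clip(m, 0.0, 1.0)                           # piecewise-linear transfer
    return m, u
```

With $G(t) = 1$, a perirhinal activity above 0.5 charges the corresponding prefrontal cell until its rate saturates at 1.0 and stays there once the stimulation ends; setting `m` back to zero implements the manual reset at the end of a trial.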
### Ventral-anterior thalamus
The portion of the ventral-anterior nucleus of the thalamus we consider
here is represented by 8 cells that are reciprocally connected with
PRh. Each cell sends and receives a connection with a single perirhinal
cell, forming segregated thalamocortical loops. In a more biologically
detailed model, we would have to take into account the difference in the
number of cells between VA and PRh, as well as the more diffuse pattern
connections from thalamus to cortex. However, this simplification is
justified by our previous detailed model of PRh, where we have shown
that a thalamic cell can activate a functional cluster of cells
representing a single object [@Vitay2008]. The membrane potential and
firing rate of these thalamic cells are ruled by the following
equations:
$$
\begin{aligned}
\tau \cdot \frac{d m_i(t)}{d t} + m_i(t) & = W^{\text{PRh}}_{i} \cdot u^{\text{PRh}}_i(t) + W^{\text{SNr}}_{i} \cdot u^{\text{SNr}}_i(t) + M + \epsilon(t) \\
u^{\text{VA}}_i(t) & = (m_i(t))^+
\end{aligned}
$$ {#eq-ficn:mp-va}
where $\tau = 15$ ms and $M = 0.8$. In addition to the connection coming
from one PRh cell with a weight of $W^{\text{PRh}}_{i} = 0.5$, a
thalamic cell also receives an inhibitory connection from one cell of
SNr with a weight of $W^{\text{SNr}}_{i} = -0.7$.
### Caudate nucleus
The caudate nucleus of the striatum learns to represent the cortical
information in PRh and dlPFC in an efficient manner based on
dopaminergic signaling of reward-related information in SNc. Although
some evidence suggests that the DA level can even influence the firing
rate of striatal cells [@Nicola2000], we here exclusively focus on the
effect of DA on the synaptic learning of corticostriatal connections
[@DiFilippo2009]. The striatum is mostly composed of medium spiny
neurons that integrate cortical information and directly inhibit several
structures such as the substantia nigra or the globus pallidus. These
cells also have lateral inhibitory connections, either directly or
through fast-spiking interneurons [@Tepper2008]. CN here contains 64
cells ruled by the following equations:
$$
\begin{aligned}
\tau \cdot \frac{d m_i(t)}{d t} + m_i(t) & = \displaystyle\sum_{j \in \text{Cx}} W^{\text{Cx}}_{i,j}(t) \cdot u^{\text{Cx}}_j(t) + \displaystyle\sum_{j \in \text{CN}} W^{\text{CN}}_{i,j} \cdot u^{\text{CN}}_j(t) + M + \epsilon(t) \\
u^{\text{CN}}_i(t) & = (m_i(t))^+
\end{aligned}
$$ {#eq-ficn:mp-cn}
where $\tau = 10$ ms and $M = 0.3$. Each striatal cell receives
inhibitory lateral connections from the 63 other striatal cells with a
weight of $W^{\text{CN}}_{i,j} = -0.2$. The corticostriatal connections
$W^{\text{Cx}}_{i,j}(t)$, coming either from PRh or dlPFC, are learned
according to a homeostatic covariance learning rule:
$$
\begin{aligned}
\eta \cdot \frac{d W^{\text{Cx}}_{i,j}(t)}{d t} & = ( \text{DA}(t) - \overline{\text{DA}}) \cdot (u^{\text{CN}}_i(t) - \overline{\text{CN}} )^+ \cdot (u^{\text{Cx}}_j(t) - \overline{\text{Cx}}) \\
& - \alpha_i(t) \cdot ((u^{\text{CN}}_i(t) - \overline{\text{CN}} )^+ )^2 \cdot W^{\text{Cx}}_{i, j}(t)
\end{aligned}
$$ {#eq-ficn:weightcn}
where $\eta = 100$ is the rate of learning, $\text{DA}(t)$ represents
the synaptic level of DA (considered equal to the activity of the SNc
cell), $\overline{\text{DA}}$ the baseline activity of the SNc cell,
$u^{\text{CN}}_i(t)$ the firing rate of the striatal cell,
$\overline{\text{CN}}$ the mean firing rate of the CN cells,
$u^{\text{Cx}}_j(t)$ the firing rate of the cortical cell,
$\overline{\text{Cx}}$ the mean firing rate of the considered cortical
area and $\alpha_i(t)$ a cell-dependent regularization factor. The
weights are randomly initialized with a value between $-0.1$ and $0.1$.
The first term on the right-hand side of @eq-ficn:weightcn is a classical
Hebbian learning rule (correlation between the activities of the
presynaptic and postsynaptic cells) modulated by the DA level. The
positive function applied to the striatal activity ensures that only the
cells which are significantly activated compared to the rest of the
population will update their selectivity for cortical patterns. The
exact influence of DA on corticostriatal learning is still a matter of
debate and depends on the type of dopaminergic receptor (D1 or D2)
involved, the state of the membrane potential of the striatal cell (“up”
and “down” states) and on the cortical patterns [@Calabresi2007]. We do
not model in detail these mechanisms and consider that a phasic burst of
DA (transient activity of the SNc cell above its baseline) globally
favors long-term potentiation (LTP) of corticostriatal synapses,
while DA depletion (activity below baseline) globally induces long-term
depression (LTD) of the same synapses [@Reynolds2000].
The second term on the right-hand side of @eq-ficn:weightcn performs a
homeostatic regularization of the corticostriatal synapses. Its shape is
similar to the classical Oja learning rule [@Oja1982] to avoid an
infinite increase of the weight values, but the difference is that the
regularization factor $\alpha_i(t)$ is not fixed but varies with the
activity of the cell [@Vitay2008]. Homeostatic plasticity allows cells
to adapt their learning behavior to ensure stability [@turrigiano2004].
In our case, we want to prevent the striatal cells from firing too strongly
in order to save energy, by proportionally scaling down the weights of all
the connections. $\alpha_i(t)$ therefore becomes positive when the
firing rate of the cell exceeds a defined threshold $u^\text{MAX}$:
$$
\begin{aligned}
\tau \cdot \frac{d \alpha_{i}(t)}{d t} + \alpha_i(t) = (u^{\text{CN}}_i(t) - u^\text{MAX} )^+
\end{aligned}
$$ {#eq-ficn:alphanacc}
with $\tau = 20$ ms and $u^\text{MAX} = 1.0$. In addition to dynamically
and locally normalizing the afferent connections to the cells, this
homeostatic regularization term also sharpens the selectivity
of the cell. Homeostatic plasticity has been observed in the nucleus
accumbens, a part of the striatum [@Ishikawa2009].
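
The following sketch transcribes @eq-ficn:weightcn and @eq-ficn:alphanacc into vectorized form (our own illustration; `W` is assumed to have shape `(n_CN, n_Cx)`):

```python
import numpy as np

def update_corticostriatal(W, u_cn, u_cx, alpha, DA, DA_bar=0.5,
                           eta=100.0, u_max=1.0, tau_alpha=20.0, dt=1.0):
    """One Euler step of the DA-modulated covariance rule (@eq-ficn:weightcn)."""
    post = np.maximum(u_cn - u_cn.mean(), 0.0)   # only significantly active cells learn
    pre = u_cx - u_cx.mean()                     # covariance term on the cortical side
    hebb = (DA - DA_bar) * np.outer(post, pre)   # DA burst -> LTP, depletion -> LTD
    reg = (alpha * post**2)[:, None] * W         # homeostatic Oja-like regularization
    W += dt / eta * (hebb - reg)
    # Cell-dependent regularization factor (@eq-ficn:alphanacc):
    alpha += dt / tau_alpha * (np.maximum(u_cn - u_max, 0.0) - alpha)
    return W, alpha
```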
### Substantia nigra pars compacta
The dopaminergic cells contained in SNc have the property of responding to
the delivery of unexpected rewards by a phasic burst of activity above
baseline [@Mirenowicz1994]. However, in conditioning tasks, the
amplitude of this response to primary rewards gradually decreases
through learning and is transferred to the appearance of the conditioned
stimulus [@Pan2005]. In addition, when reward is omitted, these
dopaminergic cells show a phasic depletion of activity (below baseline)
at the time reward was expected [@Schultz1997]. Several theories have
tried to explain this behavior related to reward expectation, including
an analogy with the error signal of the temporal difference (TD)
algorithm of reinforcement learning [@Suri1999] or more biologically
detailed models [@Brown1999; @OReilly2007]. The TD analogy considers
that DA phasic activation or depletion at the time of reward delivery or
conditioned stimulus appearance are due to a unique mechanism. The more
biologically detailed approaches contrarily highlight the role of
afferent structures in the different components of this behavior: the
phasic activation to primary rewards may be due to excitatory
connections coming from the pedunculopontine tegmental nucleus, and its
amplitude is gradually decreased by the learning of the reward
expectation through inhibitory connections coming from the striatum. In
these models, the DA phasic activation for the appearance of a
conditioned stimulus is provoked by different mechanisms than for the
delivery of primary rewards. The depletion in DA activity when reward is
omitted is controlled by an external timing mechanism, presumably
computed by an intracellular calcium-dependent mechanism in striatal
cells [@Brown1999] or by an external signal computed in the cerebellum
[@OReilly2007]. We followed the assumptions of these models, but did not
model explicitly this timing signal.
We used only one cell in SNc, which receives information about the
received reward $R(t)$ and learns to predict its association with
striatal representations through learnable inhibitory connections. The
activity of this cell is ruled by the following equations:
$$
\begin{aligned}
\tau \cdot \frac{d m(t)}{dt} + m(t) & = R(t) + P(t) \cdot \displaystyle\sum_{j \in \text{CN}} W^{\text{CN}}_{j}(t) \cdot u^{\text{CN}}_j(t) + \overline{\text{DA}} \\
\text{DA}(t) & = (m(t))^+
\end{aligned}
$$ {#eq-ficn:mp-snc}
where $\tau = 10$ ms, $\overline{\text{DA}} = 0.5$. The reward $R(t)$
(set to 0.5 when received, 0.0 otherwise) and the timing of its
occurrence $P(t)$ (set to 1.0 when expected, 0.0 otherwise) are external
to the neuronal model. When reward is delivered, $R(t)$ will drive the
activity of the cell above its baseline but this effect will be reduced
by the learning of the inhibitory connections between the striatum and
SNc. When reward is expected but not delivered, the striatal inhibition
will force the cell to exhibit an activity below baseline. The
connections between CN and SNc are learned according to the following
rule:
$$
\begin{aligned}
\eta \cdot \frac{d W^{\text{CN}}_{j}(t)}{d t} & =& - f( \text{DA}(t) - \overline{\text{DA}} ) \cdot (u^{\text{CN}}_j(t) - \overline{\text{CN}})^+
\end{aligned}
$$ {#eq-ficn:weightsnc}
$$\begin{aligned}
f(x) & = &
\begin{cases}
x & \text{if $x > 0$} \\
5 \cdot x & \text{else.}
\end{cases}
\end{aligned}
$$ {#eq-ficn:fsnc}
where $\eta = 10000$. The weights are initialized with a value of $0.0$,
so that striatal representations initially have no association with
reward. When $\text{DA}(t)$ is above baseline (reward has been
delivered), the inhibitory connections are further decreased, which
means that the striatal representation increases its associative value.
When $\text{DA}(t)$ is below baseline (reward has been omitted), the
same striatal representation decreases its association to reward. This
dopaminergic signal is used to modulate learning in CN and SNr.
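
A minimal sketch of the SNc cell (@eq-ficn:mp-snc) and of the reward-prediction learning of its striatal afferents (@eq-ficn:weightsnc and @eq-ficn:fsnc), under the same assumed conventions as above:

```python
import numpy as np

def step_snc(m, W_cn, u_cn, R, P, DA_bar=0.5, tau=10.0, dt=1.0):
    """One Euler step of the SNc membrane potential (@eq-ficn:mp-snc)."""
    inp = R + P * np.dot(W_cn, u_cn) + DA_bar   # W_cn <= 0: learned striatal inhibition
    m += dt / tau * (inp - m)
    return m, max(m, 0.0)                       # DA(t) = (m(t))^+

def update_snc_weights(W_cn, u_cn, DA, DA_bar=0.5, eta=10000.0, dt=1.0):
    """Learning of the CN -> SNc weights (@eq-ficn:weightsnc)."""
    delta = DA - DA_bar
    f = delta if delta > 0 else 5.0 * delta     # omissions weighted 5x (@eq-ficn:fsnc)
    return W_cn - dt / eta * f * np.maximum(u_cn - u_cn.mean(), 0.0)
```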
### Substantia nigra pars reticulata
The output nuclei of the BG (GPi and SNr) have the particularity of being
tonically active (with an elevated firing rate of 25 Hz at rest and
pauses in firing when inhibited by striatal activity). They send
inhibitory projections to ventral thalamic nuclei as well as various
subcortical structures such as the superior colliculi. The SNr cells are
selective for particular motor programs and can disinhibit various
thalamocortical loops [@Chevalier1990]. Their selectivity is principally
due to the inhibitory connections originating from the striatum and GPe,
but they also receive excitatory inputs from the subthalamic nucleus.
However, the SNr cells also tonically inhibit each other, with a
particular connectivity pattern suggesting they may subserve an
important functional role [@Mailly2003]. When a SNr cell is inhibited by
striatal activation, it stops inhibiting the other SNr cells, which
consequently increase their firing rate and inhibit more strongly their
efferent thalamic cells. Inhibitory connections within SNr may therefore
help focus disinhibition on the desired thalamocortical loop
by suppressing the other competing loops [@Gulley2002]. Instead of
considering the inhibitory effect of high nigral activity, we modeled
this competition between SNr cells by an excitatory effect of low nigral
activity, which is functionally equivalent. The 8 cells in SNr evolve
according to the following equations:
$$
\begin{aligned}
\tau \cdot \frac{d m_i(t)}{d t} + m_i(t) & = \displaystyle\sum_{j \in \text{CN}} W^{\text{CN}}_{i,j}(t) \cdot u^{\text{CN}}_j(t) + \displaystyle\sum_{j \in \text{SNr}} W^{\text{SNr}}_{i,j}(t) \cdot (M - u^{\text{SNr}}_j(t) )^+ + M + \epsilon(t) \\
u^{\text{SNr}}_i(t) & =
\begin{cases}
0 & \text{if $m_i(t) < 0$} \\
m_i(t) & \text{if $0 \leq m_i(t) \leq M$} \\
\displaystyle\frac{1}{1 + e^{-\frac{m_i(t) - M}{20}}} + 0.5 & \text{if $m_i(t) > M$}
\end{cases}
\end{aligned}
$$ {#eq-ficn:mp-snr}
where $\tau = 10$ ms, $M = 1.0$ and $\epsilon(t)$ is an additional noise
randomly picked between $-0.3$ and $0.3$. The excitatory connections
from neighboring SNr cells are active when their corresponding activity
is below baseline. The transfer function ensures that activities
exceeding $M$ saturate to a value of 1.5 with a sigmoidal shape. The
inhibitory connections originating in CN are learned according to an
equation similar to @eq-ficn:weightcn. Even if little is known about
synaptic learning in SNr, the strong dopaminergic innervation of nigral
cells [@Ibanez-Sandoval2006] makes it reasonable to hypothesize that DA
modulates the learning of striatonigral connections in a way similar to
the corticostriatal ones.
$$
\begin{aligned}
\eta^{\text{inh}} \cdot \frac{d W^{\text{CN}}_{i,j}(t)}{d t} & = f(\text{DA}(t) - \overline{\text{DA}}) \cdot g( \overline{\text{SNr}} - u^{\text{SNr}}_i(t)) \cdot (u^{\text{CN}}_j(t) - \overline{\text{CN}})^+ \\
& - \alpha^{\text{inh}}_i(t) \cdot ( (\overline{\text{SNr}} -u^{\text{SNr}}_i(t) )^+ )^2 \cdot W^{\text{SNr}}_{i, j}(t)
\end{aligned}
$$ {#eq-ficn:weightgpi}
$$
f(x) =
\begin{cases}
x & \text{if $x > 0$} \\
10 \cdot x & \text{else.}
\end{cases}
$$ {#eq-ficn:fgpi}
$$
g(x) = \displaystyle\frac{1}{1 + e^{-\frac{x}{20}}} - 0.5
$$ {#eq-ficn:ggpi}
$$
\tau^{\text{inh}}_{\alpha} \cdot \frac{d \alpha^{\text{inh}}_{i}(t)}{d t} + \alpha^{\text{inh}}_i(t) = K^{\text{inh}}_{\alpha} \cdot ( m_i(t) )^-
$$
where $\eta^{\text{inh}} = 500$, $\overline{\text{SNr}}$ is the mean
activity of all the cells in SNr, $\tau^{\text{inh}}_{\alpha} = 10$ ms,
$K^{\text{inh}}_{\alpha} = 2.0$ and $()^-$ is the negative part of the
membrane potential. The weights are randomly initialized between $-0.15$
and $-0.05$ and later restricted to negative values. DA depletion (below
baseline) has been given a greater influence in the learning rule
through the $f()$ function, because at the beginning of learning DA
depletion has a much smaller amplitude than the DA bursts. Contrary to
the classical Hebbian learning rule, the postsynaptic activity
influences here the learning rule through a sigmoidal function $g()$,
which makes it closer to the BCM learning rule [@Bienenstock1982].
Similarly to BCM, there is a threshold (here the mean activity of the
nuclei) on the postsynaptic activity that switches the learning rule
from LTD to LTP. This learning rule is meant to increase the selectivity
of each SNr cell relative to its neighbors, as well as the
signal-to-noise ratio in the population. Another way for the nigral
cells to increase their selectivity is competition through their lateral
connections. There are two different learning rules used depending on
whether the DA level is above or below baseline. When DA is above its
baseline, the lateral connections are updated according to the following
equation:
$$
\begin{aligned}
\eta^{\text{lat}} \cdot \frac{d W^{\text{SNr}}_{i,j}(t)}{d t} & = (\text{DA}(t) - \overline{\text{DA}}) \cdot ( \overline{\text{SNr}} - u^{\text{SNr}}_i(t))^+ \cdot ( \overline{\text{SNr}} - u^{\text{SNr}}_j(t))^+ \\
& - \alpha^{\text{lat}}_i(t) \cdot ( (\overline{\text{SNr}} -u^{\text{SNr}}_i(t) )^+ )^2 \cdot W^{\text{SNr}}_{i, j}(t)
\end{aligned}
$$ {#eq-ficn:latgpipos}
where $\eta^{\text{lat}}= 500$. The weights are initially set to $0.0$.
This rule is similar to a classical anti-Hebbian learning, as it
favors the competition between two cells when they frequently have
simultaneously low firing rates. In the case of a DA depletion, an
important feature of the model is that the symmetry of the lateral
connections between two inhibited cells has to be broken. DA depletion
has then a punishing effect on the most inhibited cells, which will
later receive much more excitation from previously moderately inhibited
cells:
$$
\begin{aligned}
\eta^{\text{lat}} \cdot \frac{d W^{\text{SNr}}_{i,j}(t)}{d t} & = (\overline{\text{DA}} - \text{DA}(t)) \cdot \sqrt{( \overline{\text{SNr}} - u^{\text{SNr}}_i(t))^+} \cdot ( \overline{\text{SNr}} - u^{\text{SNr}}_j(t))^+ \\
& - \alpha^{\text{lat}}_i(t) \cdot ( (\overline{\text{SNr}} -u^{\text{SNr}}_i(t) )^+ )^2 \cdot W^{\text{SNr}}_{i, j}(t)
\end{aligned}
$$ {#eq-ficn:latgpineg}
In both cases, two simultaneously inhibited cells will increase their
reciprocal lateral connections. However, in the case of DA depletion,
the square root function applied to the postsynaptic activity breaks the
symmetry of the learning rule and the most inhibited cell will see its
afferent lateral connections relatively more increased than the other
cells. Thus, the inhibited cells which won the competition through
lateral connections but provoked a DA depletion will be more likely to
lose the competition at the next trial. The effect of these asymmetric
learning rules will be presented in @sec-ficn:competitionsnr, where we
will show that they are able to eliminate distractors. Both learning
rules use the same equation for the updating of the regularization
factor:
$$\begin{aligned}
\tau^{\text{lat}}_{\alpha} \cdot \frac{d \alpha^{\text{lat}}_{i}(t)}{d t} + \alpha^{\text{lat}}_i(t) & = & K^{\text{lat}}_{\alpha} \cdot ( m_i(t) - M)^+
\end{aligned}
$$ {#eq-ficn:alphagpilat}
where $\tau^{\text{lat}}_{\alpha} = 10$ ms and $K^{\text{lat}}_{\alpha} = 1.0$.
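
To make the interplay of these rules concrete, here is a sketch of the lateral learning within SNr, switching between @eq-ficn:latgpipos and @eq-ficn:latgpineg according to the DA level and updating the shared regularization factor of @eq-ficn:alphagpilat (our own transcription, with `W[i, j]` denoting the connection from cell `j` to cell `i`):

```python
import numpy as np

def update_snr_lateral(W, u_snr, m_snr, alpha, DA, DA_bar=0.5, eta=500.0,
                       M=1.0, tau_alpha=10.0, K_alpha=1.0, dt=1.0):
    """One Euler step for the lateral weights within SNr."""
    inh = np.maximum(u_snr.mean() - u_snr, 0.0)   # how inhibited each cell is
    if DA >= DA_bar:
        # Symmetric anti-Hebbian rule (@eq-ficn:latgpipos): cells that are
        # often inhibited together increase their mutual competition.
        hebb = (DA - DA_bar) * np.outer(inh, inh)
    else:
        # DA depletion (@eq-ficn:latgpineg): the square root on the
        # postsynaptic side breaks the symmetry between inhibited cells.
        hebb = (DA_bar - DA) * np.outer(np.sqrt(inh), inh)
    reg = (alpha * inh**2)[:, None] * W           # Oja-like regularization
    W += dt / eta * (hebb - reg)
    # Regularization factor (@eq-ficn:alphagpilat), driven by m_i above M:
    alpha += dt / tau_alpha * (K_alpha * np.maximum(m_snr - M, 0.0) - alpha)
    return W, alpha
```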
### Experiments {#sec-ficn:tasks}
In order to test the ability of our model to perform visual WM tasks, we
focused on three classical experimental paradigms: the delayed
matching-to-sample (DMS), the delayed nonmatching-to-sample (DNMS) and
the delayed pair-association (DPA) tasks. These three tasks classically
consist in presenting to the subject a visual object (called the cue),
followed after a certain delay by an array of objects, including a
target towards which a response should be made (either a saccade or a
pointing movement or a button press). In DMS, the target is the same
object as the cue; in DNMS, the target is the object that is different
from the cue; in DPA, the target is an object artificially but
constantly associated to the cue. These three tasks are known to involve
differentially IT, MTL, PFC and BG
[@Sakai1991; @Elliott1999; @Chang2002].
Similarly to the mixed-delayed response (MDR) task of @gisiger2006, we
want our model to acquire knowledge about contextual information,
allowing it to learn these three tasks concurrently with the same cued
visual objects. We therefore need to provide the network with a symbol
specifying which task has to be performed. The meaning of this symbol is
however initially not known by the model and must be acquired through
interaction with the tasks. The top part of @fig-ficn:model B shows
the time course of the visual inputs presented to the network during a
trial. Each trial is decomposed into periods of 150 ms. During the first
period, a cue is presented to the network, followed by a delay period
without visual stimulation. A visual object representing which task to
perform (DMS, DNMS or DPA) is then presented, followed by the same delay
period. During this presentation phase, the signal $G(t)$ in @eq-ficn:mp-pfc is set to 1.0 to allow the sustained activation in dlPFC of
these two objects.
In the choice period, two objects are simultaneously presented to the
network: the target (whose identity is defined by the cue and the task
symbol) and a distractor chosen randomly among the remaining cues. At
the end of this period, the response of the network is considered to have
been made, and reward is given accordingly through a probabilistic rule
during the following reward period. For the entire duration of this
reward period, the signal $R(t)$ in @eq-ficn:mp-snc is set to 0.5 if
reward is given and to 0.0 otherwise. $P(t)$ is set to 1.0, denoting
that reward is expected to occur. This reward period is followed by
another delay period, the activities in dlPFC being manually reset to
their baseline, allowing the network to go back to its resting state
before performing a new trial.
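
For reference, the trial protocol can be summarized as a schedule of 150 ms periods (a hypothetical encoding of the description above, here for a trial with A as a cue and DNMS as a task symbol):

```python
# Each period lasts 150 ms. V: objects presented to PRh (with intensity),
# G: dlPFC gating signal, P: reward-timing signal; R is set probabilistically
# during the reward period (see @eq-ficn:reward below).
TRIAL_A_DNMS = [
    ("cue",    dict(V={"A": 1.0},           G=1.0, P=0.0)),
    ("delay",  dict(V={},                   G=0.0, P=0.0)),
    ("task",   dict(V={"DNMS": 1.0},        G=1.0, P=0.0)),
    ("delay",  dict(V={},                   G=0.0, P=0.0)),
    ("choice", dict(V={"B": 0.5, "A": 0.5}, G=0.0, P=0.0)),  # target + distractor
    ("reward", dict(V={},                   G=0.0, P=1.0)),
    ("delay",  dict(V={},                   G=0.0, P=0.0)),  # dlPFC manually reset
]
```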
In these experiments, we use four different cues (labelled A, B, C and
D) and three task symbols (DMS, DNMS and DPA) that each stimulate a
different cell in PRh. The corresponding cells will therefore be
successively activated according to the timecourse of the trial
described on the top part of @fig-ficn:model B. In the Results section, we
will only consider subsets of combinations of cues and tasks. For
example, we define DMS-DNMS\_AB as a combination of four different
trials: A followed by DMS (A+DMS), A followed by DNMS (A+DNMS), B
followed by DMS (B+DMS) and B followed by DNMS (B+DNMS). These four
different trials are randomly interleaved during the learning period. In
the DMS trials, the target of the task is the same as the cue, the
distractor being chosen among the remaining possible cues. In the DNMS
trials, the target is the object that is different from the cue. In the
DPA task, the target is an object artificially associated with the cue. In
DMS-DPA\_AB, the target of the trial A+DPA is C and the one of B+DPA is
D.
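
In the same hypothetical style, the target of a trial can be written as a function of the cue and the task symbol (here for the A/B pair and the DPA associations A→C and B→D given above):

```python
DPA_PAIRS = {"A": "C", "B": "D"}   # arbitrary but fixed cue-target associations

def target(cue, task):
    if task == "DMS":
        return cue                        # match: same object as the cue
    if task == "DNMS":
        return {"A": "B", "B": "A"}[cue]  # non-match: the other object of the pair
    if task == "DPA":
        return DPA_PAIRS[cue]             # the paired associate
```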
Each PRh cell is stimulated by its corresponding visual object by
setting the signal $V_i(t)$ in @eq-ficn:mp-prh to a value of 1.0 during
the whole period. In the choice period, $V_i(t)$ is limited to 0.5 for
both cells (to mimic competition in the lower areas). To determine the
response made by the system, we simply compare the activities of the two
stimulated PRh cells at the end of the choice period. If the activity of
the cell representing the target is greater than for the distractor, we
hypothesize that this greater activation will feed back in the ventral
stream and generate an attentional effect that will guide a saccade
toward the corresponding object [@Hamker2004a; @Hamker2005]. We assume
that this selection is noisy, which is modeled by introducing a
probabilistic rule for the delivery of reward that depends on the
difference of PRh activity for the two presented stimuli.
If we denote by $u^{\text{target}}$ the activity of the PRh cell representing
the target at the end of the choice period and $u^{\text{dist}}$ the
activity of the cell representing the distractor, the signal $R(t)$ in
@eq-ficn:mp-snc has the following probability to be delivered during the
reward period:
$$
\mathcal{P}(R) = 0.5 + u^{\text{target}} - u^{\text{dist}}
$$ {#eq-ficn:reward}
This probability is of course limited to values between 0.0 and 1.0.
When the activities of the two cells are equal, reward is delivered
randomly, as we consider that a saccade has been performed randomly
towards one of the two objects, as the feedback from PRh to the ventral
pathway is not sufficiently distinct to favor one of the two targets.
When the activity of the target cell becomes relatively higher, the
probability of executing the correct saccade and receiving reward is
linearly increased. When reward is delivered, the signal $R(t)$ has a
value of 0.5 during the whole reward period, whereas it is set to 0.0
otherwise. We do not consider here the influence of rewards with
different amplitudes.
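
A direct transcription of this probabilistic rule might look as follows (a sketch; the random number generator is an assumption):

```python
import numpy as np

def reward_signal(u_target, u_dist, rng=np.random.default_rng()):
    """Value of R(t) during the reward period, following @eq-ficn:reward."""
    p = np.clip(0.5 + u_target - u_dist, 0.0, 1.0)   # P(R), limited to [0, 1]
    return 0.5 if rng.random() < p else 0.0          # R = 0.5 when delivered
```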
In delay conditioning, reward is delivered randomly with a fixed
probability during the presentation of a visual object (called X). The
timecourse of this task is depicted on the bottom part of @fig-ficn:model B.
This task is used in @sec-ficn:traceconditioning to study the
effect of the probability of reward delivery on striatal representations
and reward prediction in SNc.
In @sec-ficn:numbercells, we will study the influence of the number of
cells in SNr on the performance of the network. While this number is
equal to 8 in the previous experiments, we vary it here from 6 to 16.
When the number of cells in SNr exceeds 8, we simply added cells in SNr
which receive striatal inhibition and compete with the others, but which
do not inhibit any thalamic cell. When there are only 6 cells, we
removed from SNr and VA the cells corresponding to the objects DPA and
X, which are not used in this experiment.
## Results
### Concurrent learning of the different tasks {#sec-ficn:concurrentlearning}
![Different success rates. (A) Mean value and standard deviation of the last incorrect trial during learning of 50 randomly initialized networks for different combinations of cues and tasks: 1) DMS-DNMS\_AB; 2) DMS-DPA\_AB; 3) DMS-DNMS\_ABC; 4) DMS\_ABCD; 5) DNMS\_ABCD; 6) DPA\_ABCD. (B) Average success rate of 50 networks presented with DMS-DNMS\_AB. (C) Success rate of a particular network which learned DMS-DNMS\_AB, but computed only on the trials composed of A as a cue followed by DNMS as a task symbol.](img/ficn/vitay_figure_2.png){#fig-ficn:tasks-result}
@fig-ficn:tasks-result A shows the learning behavior of the model when
different combinations of tasks are presented. Each network was run on
1000 randomly interleaved trials. The Y-axis represents the rank of
the last trial during the learning sequence where the network produced a
incorrect answer, which is a rather conservative measurement of
behavior. After this last mistake, the performance of all networks are
stable, even when more than 1000 trials are presented as further tests
have shown. We represent here the performance of different combinations
of tasks: DMS-DNMS\_AB, DMS-DPA\_AB, DMS-DNMS\_ABC, DMS\_ABCD,
DNMS\_ABCD and DPA\_ABCD. For each combination of tasks, we used fifty
different networks that were initialized randomly. One can notice that
the different networks learn at very variable speeds, as shown by the
standard deviation. For example, for the DMS-DNMS\_AB task, some
networks converged after 200 different trials whereas a few others
needed 800 trials, which reflects the influence of initialization as well
as that of noise. The only significant difference between the
combinations of tasks is that DMS-DNMS\_AB is learned faster than
DMS-DNMS\_ABC, DMS\_ABCD, DNMS\_ABCD and DPA\_ABCD (two-sample K–S test,
$P < 0.05$). However, this can be simply explained by the fact that
DMS-DNMS\_ABC uses six different trials instead of four for DMS-DNMS\_AB
(C+DMS and C+DNMS have to be learned at the same time), and that
DMS\_ABCD, DNMS\_ABCD and DPA\_ABCD use a larger set of possible
distractors during the choice period. We will investigate in @sec-ficn:competitionsnr
the influence of distractors on performance. The
distributions of the number of trials needed to learn each
combination have no characteristic shape, though a Gaussian fit cannot be
rejected ($\chi^2$-test, $0.2 \leq P \leq 0.6$).
@fig-ficn:tasks-result B shows the average success rate of 50 networks
presented with the DMS-DNMS\_AB task. The success rate of a network is
computed after each trial during learning as the percentage of rewarded
trials for the last ten trials: if the last ten trials were rewarded,
the success rate is 100%, if only one trial was not rewarded, the
success rate is 90% and so on. All networks have reached the maximum
success rate before the $800^{th}$ trial, but some only need 200 trials.
At the beginning of learning, the success rate is 50%, as the network
does not really select a response and reward is given randomly according
to the probabilistic rule of reward we use. This success rate quickly
increases to a high value in around 300 trials, followed by a flatter
phase where the competition in SNr temporarily deteriorates the
performance of the networks.
This flattening of the average success rate can be explained by
observing @fig-ficn:tasks-result C. We represent the success rate of a
particular network which learned DMS-DNMS\_AB, but this success rate is
plotted for analysis purpose only from trials composed of A as a cue
followed by DNMS as a task symbol. We see that the network performs this
task accurately after only 40 trials and stays at this maximum until it
makes a mistake shortly before the $80^{th}$ trial. We will later show
that this temporary decrease in performance is due to the late
involvement of selection in SNr. To quantify this behavior, we examined
the success rates of the 50 networks used in @fig-ficn:tasks-result B and
decomposed them with regard to the four types of trials involved in the
learning phase (A followed by DMS and so on). We found that 32.5% of
trial-specific networks showed this type of behavior, by reaching
success in at least ten successive trials before making another
mistake. On average, these trial-specific networks reach stable success
after only 14 trials and stay successful for 17 trials before making
a mistake. They then need on average 47 further trials before definitively
reaching 100% success (last mistake after the $78^{th}$ trial). In
comparison, the other trial-specific networks (67.5%) make their last
mistake at the $64^{th}$ trial on average, which is significantly
earlier ($\chi^2$-test, $P \leq 0.05$).
### Temporal evolution of the activities after learning
![Temporal evolution of the activity of several cells in a network which successfully learned DMS-DNMS\_AB. The activities are plotted with regard to time (in ms) during a trial consisting of A as a cue, DNMS as a task symbol and B as a target. The first row represents the activities of three cells in PRh which are respectively selective for A (blue line), DNMS (red line) and B (green line). The second row shows the activities of two cells in CN, one being selective for the pair A+DMS (blue line), the other for the pair A+DNMS (green line). The third row represents the activities of three cells in SNr which are respectively selective for A (blue line), DNMS (red line) and B (green line). The fourth row represents the activities of three cells in VA which are respectively selective for A (blue line), DNMS (red line) and B (green line).](img/ficn/vitay_figure_3.png){#fig-ficn:timecourse}
@fig-ficn:timecourse shows the temporal evolution of some cells of a
particular network that successfully learned DMS-DNMS\_AB. The learning
phase consisted of 1000 randomly interleaved trials. At the end of
learning, the network was able to generate systematically correct
responses which all provoked the delivery of reward. The selectivity of
CN cells developed to represent the different combinations of cues and
task symbols through clusters of cells (see @sec-ficn:traceconditioning). SNr cells also became selective for some of these
clusters and the learned competition between them ensured that only one
SNr cell could be active at a time in this context. The temporal
evolution of the activity of the cells in @fig-ficn:timecourse was recorded
during the course of a trial using A as a cue and DNMS as a task symbol.
However, this pattern is qualitatively observed in every network that
successfully learned the task and similar activation patterns occur for
different tasks. The cells which are not shown in this figure do not
exhibit significant activity after learning.
When the object A is presented as a cue in PRh (and simultaneously
enters the working memory in dlPFC), it excites a cluster of cells in CN
which, in this example, represents the pair A+DMS (blue line). This
cluster inhibits the cell representing A in SNr which in turn stops
inhibiting the corresponding cell in VA. The thalamocortical loop is
then disinhibited and the two cells representing A in PRh and VA excite
each other. After 150 ms, the stimulation corresponding to the cue ends
and the activities of the cells representing A slowly decrease to their
baseline. At 300 ms, the object specifying the task (DNMS) stimulates a
cell in PRh and enters WM in dlPFC. This information biases processing
in CN so that a new cluster representing A+DNMS gets activated (green
line) and disinhibits through SNr the cell in VA representing the object
B, which is the target of the task. At 600 ms, when both objects A
(distractor) and B (target) stimulate PRh, the perirhinal cell A only
receives visual information, while the cell B receives both visual and
thalamic stimulation. Consequently, its activity is higher than that of
cell A and will be considered as guiding a saccade toward the object B. The
cell representing DNMS in SNr never gets inhibited because it has never
been the target of a task during learning. The corresponding thalamic
cell only shows a small increase during the presentation of the object
in PRh because of the corticothalamic connection. In the Discussion, we
will come back to the fact that, in this particular example, the system
has learned to select B instead of avoiding A as it should do in a DNMS
task.
Three features are particularly interesting in this temporal evolution
and have been observed for every network used in @sec-ficn:concurrentlearning. The first one is that the perirhinal and thalamic
cells corresponding to the object B are activated in advance of the
presentation of the target and the distractor. The network developed a
predictive code by learning the input, context and target association.
For example, the behavior of the perirhinal cell correlates with the
finding of pair-recall activities in IT and PRh during DPA tasks: some
cells visually selective for the associated object have been shown to
exhibit activation in advance of its presentation [@naya2003].
Similarly, the behavior of the thalamic cell can be compared to the
delay period activity of MD thalamic cells (part of the executive loop)
during oculomotor WM tasks [@Watanabe2004a]. The second interesting
observation is the sustained activation of the perirhinal cell B after
the disappearance of the target (between 750 and 900 ms on the figure)
which is solely provoked by thalamic stimulation (as the WM in dlPFC
still excites CN), whereas classical models of visual WM suggest that it
is due to direct feedback from dlPFC [@ranganath2006].
The third interesting feature is the fact that the network, when only
the cue was presented in PRh and dlPFC, already started to disinhibit
the corresponding thalamic cell, as if anticipating the DMS
task. We tested the 50 networks used in @sec-ficn:concurrentlearning
after learning the DMS-DNMS\_AB task and presented them with either A or
B for 200 ms. By subsequently recording the activity of the
corresponding cells in SNr, we noticed that they all tended to perform
DMS on the cue, i.e. to disinhibit the corresponding thalamic cell. This
can be explained by the fact that the representation of the cue in PRh
is also the correct answer to the task when DMS is required, and the
projection from PRh to CN therefore favors the selection of the
striatal cluster representing A+DMS compared to A+DNMS. This can be
interpreted as meaning that the “normal” role of the visual loop is to
maintain the visually presented objects, but that this behavior can be
modified by additional prefrontal biasing (here the entry of DNMS into
WM and its influence on striatal activation), as suggested by
@Miller2001.
### Effect of the competition in SNr {#sec-ficn:competitionsnr}
![Evolution of internal variables in SNr for trials surrounding the mistake performed by the network on @fig-ficn:tasks-result C. (A) Reward received at each trial. (B) Activity of four SNr cells at the time reward is received or expected during the trial. These cells are selective respectively for A (blue line), B (green line), C (red line) and D (turquoise line). (C) Striatal inhibition received by these four cells. (D) Competition term received by the same four cells.](img/ficn/vitay_figure_4.png){#fig-ficn:competition-closeup}
We now focus on what happens around the late incorrect trial in @fig-ficn:tasks-result C, to show that the first phase of learning corresponds to
the selective learning of connections from cortex to CN and from CN to
SNr, whereas the second corresponds to the learning of lateral
connections within SNr, which decorrelate the activities in the
structure. @fig-ficn:competition-closeup shows the evolution of some
internal variables of SNr cells in the trials surrounding the mistake
produced at trial number 77 of @fig-ficn:tasks-result C. These trials
are all composed of A as a cue, DNMS as a task symbol and therefore B as
a target. @fig-ficn:competition-closeup A shows that the preceding and
following trials were rewarded, but not trial 77. @fig-ficn:competition-closeup B shows the activity of four SNr cells at the exact
time when reward is delivered or expected to be delivered (750 ms after
the beginning of the trial on @fig-ficn:timecourse). These cells are
selective respectively for A (blue line), B (green line), C (red line)
and D (turquoise line). The four remaining cells in SNr are not plotted
for the sake of readability, but they are no longer active at this stage
of learning. @fig-ficn:competition-closeup C represents the inhibition
received by these cells at the same time, i.e. the weighted sum of the
inhibitory connections coming from CN. @fig-ficn:competition-closeup D
represents the competition term received by these cells, i.e. the
weighted sum of the lateral connections in SNr (see @eq-ficn:mp-snr).
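
To make the two plotted quantities concrete, they could be computed for
a single SNr cell as in the following NumPy sketch; the variable names
and shapes are our own assumptions, the actual dynamics being given by
@eq-ficn:mp-snr:

```python
import numpy as np

# Sketch of the two input terms plotted for one SNr cell i
# (illustrative names; see @eq-ficn:mp-snr for the real dynamics).
# w_cn_i  : weights of the inhibitory connections from CN to cell i
# w_lat_i : weights of the lateral connections from other SNr cells
# r_cn, r_snr : firing rates of the CN and SNr populations
def striatal_inhibition(w_cn_i, r_cn):
    return float(np.dot(w_cn_i, r_cn))    # weighted sum of CN afferents

def competition_term(w_lat_i, r_snr):
    return float(np.dot(w_lat_i, r_snr))  # weighted sum of lateral inputs
```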
Through learning in the first 76 trials, consisting of A followed by
DNMS, the cells B and C became strongly inhibited during the choice
period. In the rest of the article, we will call a cell “active” when it
is strongly inhibited and has an activity close to 0.0. Both cells
receive a strong inhibition from the same CN cluster, but they do not
yet compete strongly enough with each other for only one of them to
remain active. As B is a target, this provokes the disinhibition of the
thalamocortical loop corresponding to B, so that the cell B in PRh is
much more active than the cell A, leading to a correct response and
subsequent reward. The cell C is not involved in this particular task,
so it is just a distractor: its activation does not interfere with the
current task. This cell may, however, be useful in other tasks, and the
strong striatal inhibition it receives will make it harder to recruit
later. At trial 77, the cell C in SNr competes with the cell B strongly
enough that the activity of the cell B becomes close to its baseline
(around 0.7 on @fig-ficn:competition-closeup B). The difference between
the activities of cells A and B in PRh becomes small, leading to an
omission of reward on @fig-ficn:competition-closeup A, according to the
probabilistic rule we used. This omission has two effects through the
depletion of DA: first, it reduces the striatal inhibition received by
the two active cells, as seen on @fig-ficn:competition-closeup C;
second, it increases the competition between the two active cells, but
in an asymmetrical manner (@fig-ficn:competition-closeup D). According
to @eq-ficn:latgpineg, the excitatory connection from the cell B to the
cell C will be increased much more than the one from the cell C to the
cell B, as the cell C is much more inhibited than the cell B.
Consequently, at trial 78, the cell C receives much more excitation from
the cell B and its activity rises above baseline. The cell B is then
strongly inhibited by the same cluster in CN and generates a correct,
rewarded response. In the following trials, the cell B will further
increase its selectivity for this cluster, whereas the other cells in
SNr (including the cell C) will totally lose theirs and can become
selective for other clusters.
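
The asymmetry of this update can be illustrated by a toy computation.
The sketch below is not @eq-ficn:latgpineg itself, only a caricature
assuming that, during a DA depletion, the increase of a lateral weight
scales with how strongly the postsynaptic cell is inhibited:

```python
# Toy caricature of the asymmetric lateral update during DA depletion.
# Assumption: the weight increase j -> i scales with how far below
# baseline the postsynaptic cell i is (constants are illustrative).
BASELINE = 0.7
LR = 0.5

def lateral_increase(post_activity):
    return LR * max(BASELINE - post_activity, 0.0)

act_B, act_C = 0.65, 0.05       # B near baseline, C strongly inhibited
print(lateral_increase(act_C))  # B -> C: post is C, large increase (0.325)
print(lateral_increase(act_B))  # C -> B: post is B, small increase (0.025)
```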
What happened around this trial illustrates the selection of a unique
cell in SNr, even when the network already had a good performance. This
selection relies on four different mechanisms. First, the network must
initially have selected a set of cells in SNr which produce a correct
answer. These cells include the target, but also distracting cells that
are selective for the same cluster in CN but disinhibit irrelevant
thalamocortical loops. Second, as the network produces correct answers,
the cluster in CN becomes associated with a high reward-prediction value
in SNc. The amplitude of phasic DA bursts is accordingly reduced.
However, an omission of reward will generate a greater depletion of the
DA signal, compared to the beginning of learning, when CN clusters had
no association with reward and provoked no DA depletion. Third, omission
of reward reduces the striatal inhibition received by active cells in
SNr. However, if this were the only “punishing” mechanism, all the
active cells would lose their selectivity. In this particular example,
the cell B would gradually stop receiving inhibition from CN and all the
preceding learning would be lost. Fourth, the learning of lateral
connections in SNr is asymmetric with respect to DA firing: when a
distractor progressively wins the competition until the response
associated with the target is attenuated, this distractor becomes
disadvantaged in the competition with the target. This is an indirect
memory effect: as the cell corresponding to the target was previously
activated and provoked reward delivery, the cessation of its activation
(provoking reward omission) is transmitted to the other cells in SNr
through DA depletion; these cells “understand” that their activation is
irrelevant and “get out” of the competition.
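
The second mechanism amounts to a reward-prediction-error computation. A
minimal sketch of this idea follows, with constants and names that are
illustrative rather than the model's actual SNc equations:

```python
# Toy reward-prediction-error view of the SNc signal: DA deviates
# from baseline by (reward - prediction). Constants are illustrative.
DA_BASELINE = 0.5

def dopamine(reward, prediction):
    return max(DA_BASELINE + reward - prediction, 0.0)

print(dopamine(1.0, 0.0))  # naive network, reward: large burst (1.5)
print(dopamine(1.0, 0.9))  # trained network, reward: reduced burst (0.6)
print(dopamine(0.0, 0.0))  # naive network, omission: no depletion (0.5)
print(dopamine(0.0, 0.9))  # trained network, omission: deep depletion (0.0)
```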
It is important to note that this competition between cells in SNr stays
completely local to the cells: there is no winner-take-all algorithm or
supervising mechanism deciding which cell should be punished. The
competition emerges only through the interaction of the cells and the
learning of their reciprocal connections. As stated in @sec-ficn:concurrentlearning, the scheme described above occurs during learning
in 32.5% of the networks we studied: the target cell in SNr temporarily
loses the competition before being reselected. In the other cases, the
target directly wins the competition and the distractors fade, so there
is no degradation in performance, which can explain the great variability
in the number of trials needed to learn all the tasks correctly on @fig-ficn:tasks-result A.
![Magnitude of weight changes during learning of DMS-DNMS\_AB for two different networks, plotted here only for A+DMS trials. The top line corresponds to global weight changes in CN (projections from PRh and dlPFC), the middle one to the connections from CN to SNr, the bottom one to lateral connections within SNr. (A) Network showing a late competition mechanism in SNr that directly selects the correct target without provoking a mistake. (B) Network showing a late competition mechanism in SNr that led to mistakes and to a long period of instability. The amplitude of lateral weight changes has been thresholded during this unstable phase (it reaches up to 5000) in order to allow a better comparison with the first network.](img/ficn/vitay_figure_5.png){#fig-ficn:weightchange}
In order to better describe these two schemes of learning, we show in
@fig-ficn:weightchange the magnitude of weight changes in CN and SNr during
learning for two different networks. This magnitude is computed for each
trial of the learning session by summing the absolute values of the
discretized variations of the weight values ($|dW_{i,j}(t)|$ in [@eq-ficn:weightcn;@eq-ficn:weightgpi;@eq-ficn:latgpipos] and @eq-ficn:latgpineg)
over all neurons in the considered area and all computational timesteps
of the trial (1050 in our design).
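
In NumPy terms, this per-trial magnitude could be computed as in the
following sketch, assuming the weight matrix of a projection is recorded
at every timestep (the array name and shape are our own conventions):

```python
import numpy as np

def weight_change_magnitude(weights):
    """Per-trial magnitude of weight changes.

    `weights` is assumed to be an array of shape
    (timesteps, n_post, n_pre), e.g. 1050 timesteps for one trial.
    The magnitude sums |dW| over all synapses and timesteps.
    """
    dW = np.diff(weights, axis=0)   # discretized variations per timestep
    return float(np.abs(dW).sum())
```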
These two networks have both learned the DMS-DNMS\_AB task, but we
represent here only the magnitude of weight changes occurring during
A+DMS trials. The top row represents
the magnitude of weight changes for striatal cells (@eq-ficn:weightcn),