Two Applications and a Conclusion 325
In one set of experiments, the dopaminergic (DA) neurons of macaque monkeys were recorded as they learned that a light is predictive of the availability of a reward (juice, received by pressing a lever).82 In the absence of reward,
DA neurons exhibit a sustained level of activity, given by the baseline or tonic
firing rate. Prior to learning, when a reward was delivered, the monkeys’ DA
neurons showed a sudden, short burst of activity, known as phasic firing (Figure
11.2a, top). After learning, the DA neurons’ firing rate no longer deviated from
the baseline when receiving the reward (Figure 11.2a, middle). However, phasic activity was now observed following the appearance of the cue (CS, for conditioned stimulus).
One interpretation for these learning-dependent increases in firing rate is
that they encode a positive prediction error. In particular, the increase in firing rate at the appearance of the cue suggests that the cue itself eventually induces a reward prediction error (RPE). Even more suggestive of an
error-driven learning process, omitting the juice reward following the cue
resulted in a decrease in firing rate (a negative prediction error) at the time at
which a reward was previously received; simultaneously, the cue still resulted
in an increased firing rate (Figure 11.2a, bottom).
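This shift of the prediction error from reward time to cue time falls out of a small temporal-difference simulation. The following is only an illustrative sketch, not the published model: the trial length, cue and reward times, reward magnitude, and learning rate are all assumed, and the value function is a simple tapped-delay representation of within-trial time.

```python
import numpy as np

# Sketch of a TD(0) model of the cue/reward prediction-error shift.
# Trial timing, learning rate, and reward size are illustrative assumptions.
T = 20            # time steps per trial
CS, R = 5, 15     # cue onset and reward delivery times (assumed)
alpha, gamma = 0.2, 1.0

V = np.zeros(T + 1)  # learned value of each within-trial time step

def run_trial(reward_delivered=True, learn=True):
    """Simulate one trial; return the TD error (the modeled RPE) per step."""
    delta = np.zeros(T)
    for t in range(T):
        r = 1.0 if (reward_delivered and t == R) else 0.0
        # Predictions are carried only once the cue has appeared.
        v_now = V[t] if t >= CS else 0.0
        v_next = V[t + 1] if t + 1 >= CS else 0.0
        delta[t] = r + gamma * v_next - v_now
        if learn and t >= CS:
            V[t] += alpha * delta[t]
    return delta

early = run_trial()                      # naive model: error at reward time
for _ in range(500):                     # condition the model
    run_trial()
late = run_trial(learn=False)            # trained model: error at cue onset
omitted = run_trial(reward_delivered=False, learn=False)
```

Before training, the TD error peaks at the reward; after training, it peaks at the pre-cue-to-cue transition and vanishes at the (now predicted) reward; and omitting the reward produces a negative error at the expected reward time, mirroring the three panels of Figure 11.2a.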
The RPE interpretation was further extended when Montague et al. (1996)
showed that temporal-difference learning predicts the occurrence of a partic-
ularly interesting phenomenon found in an early experiment by Schultz et
al. (1993). In this experiment, macaque monkeys learned that juice could be
obtained by pressing one of two levers in response to a sequence of colored
lights. One of two lights (green, the “instruction”) first indicated which lever to
press. Then, a second light (yellow, the “trigger”) indicated when to press the
lever and thus receive an apple juice reward – effectively providing a first-order
cue.
Figure 11.2b shows recordings from DA neurons after conditioning. When
the instruction light was provided at the same time as the trigger light, the
DA neurons responded as before: positively in response to the cue. When the
instruction occurred consistently one second before the trigger, the DA neurons
showed an increase in firing only in response to the earlier of the two cues.
However, when the instruction was provided at a random time prior to the
trigger, the DA neurons now increased their firing rate in response to both
events – encoding a positive error from the unexpected instruction and a second positive error from the still-unpredictable trigger. In conclusion, varying
the time interval between these two lights produced results that could not be
82.
For a more complete review of reinforcement learning models of dopaminergic neurons and
experimental findings, see Schultz (2002), Glimcher (2011), and Daw and Tobler (2014).