Positive Reinforcement Training: Adding Up the Pluses and the Minuses

In my last article, “Equitation Science” in the November/December issue of Horse-Canada, I discussed the predominant role of negative reinforcement in horse training, and how we might do a better job of it. Here, I’ll consider the underlying principles of positive reinforcement, and where it could play a greater role in our interactions with horses. But first, let’s back up and consider why negative reinforcement came to be the mainstay of horse training.

WHY NEGATIVE REINFORCEMENT?

For most of the behaviours we ask of horses, we apply pressure (a mildly aversive stimulus) and, when the horse offers the desired response, we release it. As prey animals, horses are particularly motivated to escape aversive events, so evolution has primed them to comply with a rider’s cues in order to avoid the discomfort of pressure from the bit, leg, spur, or seat. Negative reinforcement, which rewards the desired behaviour by removing that pressure, forms the foundation of control that is pivotal to riding a horse under saddle (McGreevy, 2010).

WHY NOT POSITIVE REINFORCEMENT?

Positive reinforcement has been underused for three reasons. First, in ridden equitation it is difficult to offer a food reward from a horse’s back without disrupting the very behaviour you want to reward. Second, trainers are generally vague about what positive reinforcement is, or how to implement it. A study of top New Zealand horse trainers by equine behaviourists Natalie Warren-Smith and Paul McGreevy (2008) found that the majority erroneously believed that releasing pressure after a desired response was positive reinforcement. Third, some people view food rewards disparagingly and mistakenly believe that horses fed treats by hand will become nippy and aggressive.

WHY POSITIVE REINFORCEMENT?

A growing body of research suggests a role for positive reinforcement in contemporary horse training. Using a ‘secondary reinforcer’ (more on this later) gets around the problem of delivering positive reinforcement under saddle. Trainers and amateurs alike can also draw on a wealth of step-by-step guides to positive reinforcement training.

Further, there is no evidence that a horse’s tendency to bite increases with the use of positive reinforcement (Hockenhull & Creighton, 2010). On the contrary, as long as the horse is never reinforced for ‘mugging’ behaviours (such as searching clothing or grabbing for treats), positive reinforcement can help eliminate these behaviours instead of inducing them.

Finally, research indicates that there is room for far more positive reinforcement than we currently use. Studies have shown that, although most horses learn a task within the required time frame regardless of training method, positively reinforced horses generally learn more quickly, retain what they have learned longer, experience less stress, and react to humans more positively. They are also able to generalize their training across trainers and novel tasks, and over long periods of time.

In 2010, for example, Dr. Carol Sankey, a researcher in equine behaviour in France, trained 21 yearling ponies to back up using either positive or negative reinforcement. Sankey found that, compared with the negatively reinforced ponies, the positively reinforced ponies learned to back up more quickly, experienced less stress, and were more likely to respond agreeably to their trainer, and to a newcomer eight months later. Interestingly, by the third session, the negatively reinforced ponies showed elevated stress levels immediately upon entering the training area, before training had actually begun.

Several studies report similar results when comparing positive and negative reinforcement for training yearlings to accept grooming, tacking up, and long-lining, for retraining horses with severe trailer-loading histories, and for helping rescue horses approach frightening objects.

HOW WE TRAIN WITH POSITIVE REINFORCEMENT: CLICKER TRAINING

Clicker training is arguably the most effective method of using positive reinforcement to train animals. The clicker is simply a small, hand-held plastic device that makes a distinctive click when a metal spring is depressed and released. Through classical conditioning, in which the click is repeatedly followed by a food reward, the horse comes to understand that the click means good things will happen. In learning parlance, we have created an association between a primary and a secondary reinforcer.

Primary reinforcers are inherently rewarding, like food, water, sex and companionship, because they are central to our survival. Secondary reinforcers attain their value by being paired with one or more of these primary reinforcers. Money, for example, is a secondary reinforcer because it has come to be associated with many things that are inherently rewarding. In fact, this association is so strong that money is reinforcing all by itself. Just seeing and holding a $100 bill feels good. Similarly, the sound from the clicker that has been repeatedly paired with a food (or other salient) reward starts to feel good to our horses because they have learned that something good is going to follow.

Learning theory also tells us that the behaviours most likely to stick (or be “resistant to extinction”) are those maintained on a random, intermittent reinforcement schedule. Think of your response to a Coke machine that gives you a pop every time you insert money, compared with your response to a slot machine that pays out a big reward unpredictably and infrequently (known as a “variable ratio schedule” in operant conditioning). We don’t become addicted to Coke machines; in fact, when they let us down, we give up on them altogether. In practice, then, while teaching a behaviour we reward it every time it is performed. To retain that behaviour, we gradually move toward reinforcing it only some of the time, at random, and preferably when we get the very best execution of it.
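
To make the two phases concrete, here is a minimal Python sketch of a reward plan: every correct response is rewarded while the behaviour is being taught, after which each response has a 1-in-N chance of being rewarded, so the horse cannot predict which one will pay off. Strictly speaking, this coin-flip version is a "random ratio" schedule, a simple stand-in for the variable ratio schedule described above. The function name plan_rewards and its parameters (acquisition_trials, mean_ratio) are hypothetical, chosen for illustration; they are not taken from any trainer's program.

```python
import random

def plan_rewards(n_responses, acquisition_trials=10, mean_ratio=3, seed=None):
    """Return one True/False per correct response: True means click and reward.

    acquisition_trials -- hypothetical: responses rewarded every time while teaching
    mean_ratio         -- hypothetical: afterwards, about 1 response in mean_ratio is rewarded
    """
    if mean_ratio < 1:
        raise ValueError("mean_ratio must be at least 1")
    rng = random.Random(seed)  # seeded so a plan can be reproduced
    plan = []
    for i in range(n_responses):
        if i < acquisition_trials:
            # Teaching phase: continuous reinforcement, every response pays off.
            plan.append(True)
        else:
            # Maintenance phase: each response has a 1-in-mean_ratio chance, so the
            # horse cannot predict which one will be rewarded (a random ratio
            # schedule, a simple stand-in for a variable ratio schedule).
            plan.append(rng.random() < 1 / mean_ratio)
    return plan

if __name__ == "__main__":
    plan = plan_rewards(30, acquisition_trials=10, mean_ratio=3, seed=1)
    print("".join("R" if rewarded else "." for rewarded in plan))
```

Printed as a row of R (reward) and . (no reward), the first ten responses are all R, after which rewards appear at irregular intervals. In a real session you would also save rewards for the best efforts, which this simple sketch does not model.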

WHEN WOULD I USE CLICKER TRAINING?

Clicker training can help you develop a horse who stands quietly in the aisle without being tied, is calm and tractable for all veterinary and farrier procedures, lowers his head, tolerates clipping, and loads easily onto a trailer. Even if your horse does all these things already, there is still a place for training with positive reinforcement.

Establishing Good Ground Manners
Positive reinforcement principles can be applied to any behaviour we would like to see more often. For instance, if you want to teach your horse to stand without being tied, begin by clicking and rewarding him for standing beside you without moving for five seconds. Next, back away one step, have him stand for five seconds, then click and reward. Then back away two steps, wait five seconds, and click and reward. Gradually extend both the time you ask your horse to stand and your distance from him, introducing only one change at each progression (distance or length of time).
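
The progression above can be written out as a simple plan. This Python sketch is hypothetical: it assumes distance is counted in steps away from the horse and duration in seconds, and that the two criteria are raised in alternation. Alternation is just one possible ordering; the point it illustrates is that no stage raises both criteria at once.

```python
def stand_plan(max_steps, max_seconds, start_seconds=5, seconds_increment=5):
    """Return a list of (steps_away, seconds) stages for 'stand without being tied'.

    Hypothetical sketch: starts beside the horse (0 steps) for start_seconds,
    and raises only ONE criterion (distance or duration) at each stage.
    """
    if seconds_increment <= 0:
        raise ValueError("seconds_increment must be positive")
    steps, seconds = 0, start_seconds
    plan = [(steps, seconds)]
    raise_distance = True  # alternate which criterion is raised next
    while steps < max_steps or seconds < max_seconds:
        if raise_distance and steps < max_steps:
            steps += 1  # back away one more step, same duration
        elif seconds < max_seconds:
            # ask for a little more time at the same distance, capped at the target
            seconds = min(seconds + seconds_increment, max_seconds)
        else:
            steps += 1  # duration already at its target, so only distance is left
        raise_distance = not raise_distance
        plan.append((steps, seconds))
    return plan

if __name__ == "__main__":
    for steps, seconds in stand_plan(3, 20):
        print(f"{steps} step(s) away, stand {seconds} s, then click and reward")
```

For example, stand_plan(3, 20) starts at (0, 5), standing beside you for five seconds, and ends at (3, 20), three steps away for twenty seconds, changing one criterion per stage.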

Overcoming Phobias & Rehabilitation
Positive reinforcement can be used to build new reinforcement histories that help rehabilitate horses with problem behaviours or phobias. A horse who is terrified of clippers, for example, can be gradually conditioned to tolerate them by rewarding small increments of the behaviour we want to see. At first, he can be rewarded for standing quietly while the clippers run far enough away that he shows no reaction. Gradually, the horse can be moved one step closer to the clippers, followed by a click and reward, then another step, a click and reward, and so on. Eventually, you may pick up the clippers and run the hand holding them over the horse’s shoulder, and finally run the clippers themselves over his face.

THE MECHANICS OF CLICKER TRAINING

While a detailed “how-to” of clicker training is beyond the scope of this article, I highly recommend a visit to Shawna Karrasch’s website for a more detailed treatment. Her book, You Can Train Your Horse to Do Anything!, and its accompanying video provide an excellent, straightforward program that requires no prior experience or professional expertise. Other trainers who have applied this technique particularly effectively with horses include Alexandra Kurland and Karen Pryor (check out their websites listed at the end of this article). Although each clinician’s approach varies somewhat, they share common underlying principles that are worth discussing here.

The “Clicker” in Clicker Training
The term “clicker training” is misleading, because it places the emphasis on the clicker itself. In fact, any distinct audible stimulus (zoo handlers often use whistles, for example) can become a secondary reinforcer, provided it is predictably paired with a primary reinforcer. If, for example, you repeatedly said the word “asparagus” and reliably followed it with a food reward, “asparagus” would come to have meaning and value, and would work as a satisfactory secondary reinforcer to mark the desired behaviour. The clicker is effective because its sound is distinctive and novel. The horse has no prior expectations about the consequences of this sound, so we can build a strong reinforcement history from the ground up. The words “good boy,” on the other hand, have probably not reliably predicted a food reward in the past, and so carry little weight.

Building the Bridge: The Association Between Click and Reward
Karrasch (2000) outlines a first session that teaches the horse the association between the primary reinforcer (food) and the secondary reinforcer (the sound of the click), while also training him to keep away from the treat bucket. She calls the click the ‘bridge signal,’ because it bridges the gap between the exact behaviour you are rewarding and the delivery of the reward. Karrasch suggests timing the click and reward to follow any behaviour that shows an inclination away from the treat bucket. This may begin with a very small movement, even an eye movement. At the start, it may be necessary to push your horse away, then click and reward. Surprisingly quickly (usually within the first or second five-minute session), the horse learns a) that a click means good things are going to happen; and b) that staying away from the treat bucket will earn a food reward, whereas mugging behaviours will not.

Target Training
Once the association is firmly established, many trainers then teach the horse to touch a hand-held target (a visible object such as a small rubber ball fastened to the end of a wooden dowel). You can begin by reinforcing your horse for bumping into the target (or looking or sniffing at it), then gradually move the target farther away, up, or down, until the horse takes a step towards it, then two, then three, and so on. The target can then be used to guide the horse in different directions (backing up, moving sideways, or following you). You can also train your horse to walk away from you (clicking and rewarding one step at a time) to touch a stationary target in a different location. This technique forms the basis of teaching a horse to walk into a trailer, and is surprisingly effective even for horses with a history of being ‘unloadable’ (Hendrickson et al., 2008).

Working with the Clicker Under Saddle
Although it is relatively straightforward to offer a food reward immediately following a desired behaviour while working with your horse on the ground, this becomes more cumbersome while on his back. In fact, were you to reward a desired behaviour under saddle by stooping forward to offer a carrot, the behaviour you would likely see more of in the future would be your horse slowing down and turning his head, since this is the behaviour that immediately preceded the reward.

With clicker training, you can ‘mark’ the exact behaviour you want to reward. As the association strengthens and the response becomes more reliable, you can gradually lengthen the time between the click and the food reward, which is your eventual goal when working under saddle. Remember that, with time and consistency, the click itself starts to feel good (just like money), so it becomes less essential for the reward to follow immediately.

Clicker training has been used successfully to teach higher dressage movements (such as piaffe and passage), first in hand and then under saddle with the clicker (McGreevy & McLean, 2010). Karrasch also worked very effectively with Beezie and John Madden to help the top show jumper Judgment overcome his fear of water obstacles, by training him to respond to the clicker and then systematically clicking and rewarding closer and closer approaches to the jump, and eventually jumps over it.

Shaping: The Cornerstone of Clicker Training
In all of these examples, “shaping” forms the foundation of clicker training. Some behaviours occur spontaneously and can be rewarded immediately after they occur. To train your horse to urinate on command, for example, simply wait for the behaviour to occur (predictably after a ride, or on coming in from the paddock to a freshly bedded stall), name it with a consistent verbal command, then click and reward. Repeating this association will eventually lead your horse to perform the behaviour on cue within a consistent context, such as a freshly bedded stall. Most behaviours we want horses to perform do not occur by themselves, however, so we use the principles of shaping to reward closer and closer approximations of the eventual behaviour we want to see.

Karrasch emphasizes the need to “set the horse up for success” by making shaping increments small enough that the horse can succeed easily. In this way, we provide more opportunities to reinforce the desired response, minimize confusion by reducing or eliminating wrong responses, and establish the new behaviour more quickly and reliably.

Karen Pryor outlines “10 Laws of Shaping” as follows:

1. Raise criteria in increments small enough so that the subject always has a realistic chance of reinforcement.

2. Train one aspect of any particular behavior at a time. Don’t try to shape for two criteria simultaneously.

3. During shaping, put the current level of response on a variable ratio schedule of reinforcement before adding or raising the criteria.

4. When introducing a new criterion, or aspect of the behavioral skill, temporarily relax the old ones.

5. Stay ahead of your subject: Plan your shaping program completely so that if the subject makes sudden progress, you are aware of what to reinforce next.

6. Don’t change trainers in midstream. You can have several trainers per trainee, but stick to one shaper per behavior.

7. If one shaping procedure is not eliciting progress, find another. There are as many ways to get behavior as there are trainers to think them up.

8. Don’t interrupt a training session gratuitously; that constitutes a punishment.

9. If behavior deteriorates, “Go back to kindergarten.” Quickly review the whole shaping process with a series of easily earned reinforcers.

10. End each session on a high note, if possible, but in any case quit while you’re ahead.

~ From Chapter 2 of Don’t Shoot the Dog by Karen Pryor
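
Laws 1 and 9 can be read as a simple rule for when to raise or lower the criterion. The Python sketch below is a hypothetical, simplified interpretation, not Pryor's own procedure: the criterion is a number (seconds of standing, or steps closer to the clippers), it is raised by one small step only after a run of successes, and it drops back to the easiest level when the horse fails several times in a row. The function name and thresholds (raise_after=3, fail_limit=2) are arbitrary assumptions.

```python
def next_criterion(criterion, recent_results, *, step=1, raise_after=3,
                   fail_limit=2, easiest=1):
    """Suggest the next shaping criterion from recent trial outcomes.

    criterion      -- current requirement, e.g. seconds standing or steps toward the clippers
    recent_results -- outcomes at the current criterion, oldest first (True = met it)
    All thresholds are hypothetical defaults, not values from Pryor.
    """
    if raise_after < 1 or fail_limit < 1:
        raise ValueError("raise_after and fail_limit must be at least 1")
    # Law 9, simplified: several failures in a row means the behaviour is
    # deteriorating, so "go back to kindergarten" with the easiest criterion.
    if len(recent_results) >= fail_limit and not any(recent_results[-fail_limit:]):
        return easiest
    # Law 1: raise the criterion by one small step, and only after a run of
    # successes, so the horse keeps a realistic chance of earning the reward.
    if len(recent_results) >= raise_after and all(recent_results[-raise_after:]):
        return criterion + step
    # Otherwise keep practising at the current level.
    return criterion

if __name__ == "__main__":
    print(next_criterion(3, [True, True, True]))    # 4: raise by one step
    print(next_criterion(3, [True, False, False]))  # 1: back to the easiest level
    print(next_criterion(3, [True, False, True]))   # 3: hold
```

Because the function looks only at recent outcomes, clear the list of results whenever the criterion changes, so that successes at the old, easier level are not counted toward the new one.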

THE POSITIVES OF POSITIVE REINFORCEMENT

When training with negative reinforcement, our timing needs to be impeccable so that the release comes at the exact moment the horse performs the desired behaviour. Amateurs often lack this timing and so inadvertently punish their horses for performing the very behaviours they were seeking to reward. And amateurs and professionals alike, in their efforts to improve performance, may sustain the pressure longer than is comfortable or productive. With positive reinforcement, bad timing or unskilled administration costs relatively little. It may take longer for the horse to understand the association you are trying to make, but you are unlikely to see the agonistic or dangerous behaviours (such as biting, kicking, bucking, or bolting) that often result from faulty delivery of negative reinforcement.

The overwhelming plus of clicker training, however, becomes evident after only a few sessions. Your reward is seeing your horse anticipate his work eagerly and respond enthusiastically, while you gain a deeper understanding of him and a closer relationship with him. Indeed, it allows us to connect at an even deeper level with the horse’s tremendous and generous spirit, the spirit that keeps us heading out to the stable every single day of the year.

FOUR QUADRANTS OF OPERANT CONDITIONING

Operant conditioning, rooted in the work of Edward Thorndike and refined by B.F. Skinner, is so called because the animal operates on its environment, and the consequences of those actions shape what it does next.

Reinforcement refers to an event or stimulus that serves to strengthen the behaviour that led to it. Reinforcements can be both positive and negative; both serve to make the preceding behaviour more likely to occur in the future. Often, we think of negative and positive in the moral sense of good and bad, and this is where much of the confusion resides in making sense of learning lingo. Rather, negative refers to taking something away, and positive refers to adding something.

Positive Reinforcement occurs when a behaviour is strengthened because it is followed by the presentation of a rewarding stimulus.

Negative Reinforcement occurs when a behaviour is strengthened because it is followed by the removal of an aversive or unpleasant stimulus.

Punishment refers to an event or stimulus that serves to decrease the frequency of a behaviour – most typically an undesired behaviour. Punishments too can be both positive and negative.

Positive punishment, then, is the administration of an unpleasant stimulus to reduce or stop an unwanted behaviour.

Negative punishment is the removal of a desired stimulus to reduce or stop an unwanted behaviour.
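
Because the quadrants reduce to two yes/no questions (was a stimulus added or removed, and did the behaviour increase or decrease?), they can be captured in a short lookup. This Python sketch is illustrative only; the function name quadrant and its string labels are hypothetical.

```python
def quadrant(stimulus: str, behaviour: str) -> str:
    """Name the operant-conditioning quadrant.

    stimulus  -- "added" (positive) or "removed" (negative)
    behaviour -- "increases" (reinforcement) or "decreases" (punishment)
    """
    signs = {"added": "positive", "removed": "negative"}
    effects = {"increases": "reinforcement", "decreases": "punishment"}
    if stimulus not in signs or behaviour not in effects:
        raise ValueError("expected stimulus 'added'/'removed' and "
                         "behaviour 'increases'/'decreases'")
    # 'Positive' and 'negative' describe adding or removing, not good or bad.
    return f"{signs[stimulus]} {effects[behaviour]}"

if __name__ == "__main__":
    # Releasing leg pressure when the horse walks on: pressure removed, walking increases.
    print(quadrant("removed", "increases"))  # negative reinforcement
```

Releasing leg pressure when the horse moves forward is quadrant("removed", "increases"), which gives negative reinforcement; a click followed by a carrot is quadrant("added", "increases"), positive reinforcement.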
