A grey Andalusian stallion gallops at liberty across the Cavalia stage, comes to a sliding stop in front of his handler and dances with her in a breathtaking choreographed sequence. A grand prix dressage horse seems to skip effortlessly through one-tempi changes in time to a Rachmaninov piano concerto. An unbroken semi-feral pony from the research lab at the New Bolton Centre willingly stands for an injection without halter or handler. Incredulous observers ponder “How on earth were these feats accomplished?”
There is a great deal of learning theory jargon that is unnecessarily obscure – reinforcement contingencies, discriminative stimuli and negative reinforcement, to name a few. But learning theorists nailed it with the term shaping. Just as an artist shapes a colourless lump of clay into an exquisite vessel, complex behaviours are shaped in small increments, progressing ever closer to the eventual target behaviour. Shaping forms the foundation of all equine training, whether it be with positive reinforcement (+R), negative reinforcement (-R), or even when we have no intention of teaching our horses anything.
There are some things we want horses to do that they perform of their own accord (standing, moving, urinating, etc.). These behaviours can be “put on cue” by giving the behaviour a name, or an associated signal, and rewarding it when it naturally occurs (+R). This is known as “capturing.” After surprisingly few repetitions, the horse will perform the behaviour with the command or signal alone. Similarly, we can contrive a behaviour’s occurrence (e.g. pinching a horse’s tendon until he picks up a foot) and reward it with release (-R).
However, there are many more behaviours that we want from horses that will never occur spontaneously, nor through physical manipulation – jumping over 1.60 m obstacles, full canter pirouettes, or racing around a barrel, to name a few. This is where shaping comes in. In learning jargon, shaping is the rewarding of closer and closer approximations of the desired behaviour.
McLean stresses that all training, regardless of discipline, is built upon breaking down the task into single trainable units, repeating until the horse offers the correct response to the aid each time, and then building each additional unit until the final outcome is reached. Training units, or “criteria,” increase the difficulty, the duration, or the expected quality of the exercise, and so on. Shaping needs to be consistent and to occur in a particular and systematic order, and each component needs to be firmly established before progressing, in order to give the horse the opportunity to get the right answer. By making rewards contingent on desired behaviours, and ignoring unwanted responses, increasingly complex behaviours can be moulded or shaped.
Violating these principles exacts a cost, not only by extending the time it takes the horse to learn the desired response, but also by encouraging conflict behaviours (such as tail swishing, balking, shying, rearing, etc.). Chronic conflict states result in negative physiological consequences for horses such as ulcers, poor condition and self-mutilation. McLean reminds us that “When horses give wrong responses, you cannot expect them to know what is right – only you know that” (2007).
A shaping program is always modifiable. If it isn’t working, shift tactics or reduce criteria. There are many shaping avenues to reach the desired behaviour.
Positive and Negative Reinforcement
Horses, like people and most other animals, learn through both positive and negative reinforcement because their actions become associated with consequences.
Positive reinforcement (+R) occurs when a behaviour is strengthened because it is followed by the presentation of a rewarding stimulus. If you feed a horse that is barging and banging at his stall door, you positively reinforce the behaviour and make it more likely that you will see it every time you bring out the grain cart. If you shout at him to be quiet, you may even be able to get him to do it on command, as the shouting will become a cue that stall banging is expected and a food reward will follow.
Negative reinforcement (-R) occurs when a behaviour is strengthened because it is followed by the removal of an aversive or unpleasant stimulus. Many equine scientists have lamented this terminology because of the confusion it fosters when positive and negative are conceived of in their moral sense of good and bad, rather than the intended arithmetical sense of adding and subtracting. Think of -R as providing relief from an aversive event. In the previous example, when we feed the stall-banging horse, the banging ceases, and we have been relieved of the obnoxious event. The horse has trained us, through -R, to feed him quickly. Andrew McLean, a leading equine scientist, provides a salient visual of the pressure/release principle behind negative reinforcement in the words of Tom Roberts. Roberts asks, “When you sit on a pin, why do you get off?” Most people would answer, “Because it hurts,” but Roberts corrects them, saying “No, you get off because it STOPS hurting.”
Clickers, Climate Change and Secondary Reinforcers
Primary reinforcers, such as food, water, sex and companionship, are intrinsically rewarding to living creatures because they are critical to survival; we do not need to learn that these events are desirable. Secondary reinforcers are rewarding because they have been consistently presented together with a primary reinforcer and so come to have an associated value. Even though you cannot eat a $100 bill, it feels delicious because of the strong associations we have about what money can buy – like horses and Hermès accessories.
A clicker, a hand-held device that makes a distinctive sound with which the horse is unlikely to have formed previous associations, has no inherent value to a horse. However, if we consistently pair that sound with a food reward, a horse will come to associate the clicker with good events. The term “Clicker Training” is misleading because of its emphasis on the clicker. If I said “climate change” and always followed that with a food reward, the words “climate change” would become a reliable secondary reinforcer and eventually take on value for my horse.
The advantage of a secondary reinforcer is that we can mark the precise behaviour we want to reward, and eventually introduce a delay between click and reward. Shawna Karrasch, a prominent equine trainer specializing in +R training, calls the clicker a bridge signal, as it provides a bridge between the desired behaviour and the moment of reward. This is particularly useful when chaining a few behaviours together (such as the Cavalia horse’s performance) or when using +R under saddle (I can click and mark the behaviour I like – such as a good jumping effort – and reward when we are no longer airborne).
Shaping to Reduce Fear Responses
The horse is evolutionarily programmed to take scary things seriously, learn them quickly and not forget about them. And flee them. McLean comments “Fear is quickly learned, not easily forgotten, and is strongly associated with the movement of the horse’s legs” (2007). Fearful stimuli receive special attention in the amygdala region of the brain, directly triggering the flight response, which activates the horse’s entire body. Unlike almost anything else we teach horses to do, which requires numerous repetitions, the flight response can be learned in just one experience, and is extraordinarily resistant to extinction (i.e. it is difficult, probably impossible, to completely eradicate a fear response). Trial and error learning is not evolutionarily adaptive on the savannah where you could well be the next meal for a hungry predator.
Flight responses are not always volatile; they may also present as more subtle behaviours such as raising the head or stepping sideways. These milder reactive behaviours quickly escalate to dangerous ones, however, because they are often rewarded through -R. When any of these movements results in creating distance between the horse and the feared object, we are likely to see more of that behaviour in the future. Ending the aversive event is the reward. Ingrained phobias are often learned when reactive behaviours made the feared stimulus go away, and greater reactivity made it go away more quickly.
An established history that particular procedures mean bad things will follow can, however, also offer many shaping opportunities to relearn that the procedure is now associated with good events. The more we are able to reinforce and entrench this new agreeable history with both -R and +R, the less likely it is that the fear response, still lurking in the ancient amygdala region of the brain, will resurface. For most procedures, even unpleasant ones, the panic reaction and the subsequent forcible control are much worse than the procedure itself. Once the horse understands that the panic and the forcible control can be eliminated, the clipping, or the injection, or the farrier’s hammering becomes quite tolerable (see Clipping photos on page 48).
A last caveat here: because fear responses are so deeply embedded in the horse’s memory, in spite of our best efforts, they often reappear in what learning theorists call “spontaneous recovery.” Do not despair. Rewind and reshape the calm behaviour you are seeking. The shaping proceeds more quickly with each reshaping sequence, and further solidifies the new reinforcement history.
Critics often argue that you don’t need a lot of scientific mumbo-jumbo to train horses – just use good old common sense. Unfortunately, our good old common sense is coloured by our human-centred viewpoint, and does not do the best job of seeing the world from the horse’s perspective. Science can help us here.
When our horses are difficult, and even when they are fabulous, it is tempting to attribute their behaviour to their deficient or exemplary disposition. But we are always better served by learning theory explanations. Horses are always learning, whether we are intentionally training them or not. Shawna Karrasch reminds us of a simple truth about training horses: “If any behaviour, either desired or undesired is increasing in frequency, there is something in the environment that is reinforcing it.” It behooves us to do some sleuthing to discover where we might be reinforcing behaviours that we wish would go away, and to systematically shape and reward the behaviours we would like to see more often.