Training without Pain
I beg your pardon? Oh, all right then, if you insist.
Operant conditioning: Operant conditioning is a natural learning process which was formally described and applied primarily by the American psychologist, Burrhus F. Skinner, an exciting and controversial figure in the development of psychology as a science. Beginning round about 1938, he developed a series of experiments to investigate learning in animals, and succeeded in doing things such as, for example, training a pigeon to peck only at a yellow disc, and to ignore discs of all other colours. Although Skinner would probably not have phrased it this way, the central idea behind his theory is that animals (and people) learn by trial and error, and will modify their behaviour in order to obtain a reward or to avoid something unpleasant. In the course of his experiments, he defined the term positive reinforcement, and several other related terms.
While it is not necessary to have a deep academic understanding of Skinner's work in order to train a dog, understanding the most important terminology is extremely helpful, as the methods used are somewhat different from those of traditional dog handling. Skinner worked largely with pigeons, so I will try to explain his terminology using dog-handling exercises as examples.
Behaviour: A behaviour is something a dog does, such as sitting on command or pulling on the lead. It might be something you would like to get him to do, such as sitting on command. Getting him to do it the first time is called eliciting the behaviour. Getting him to learn to do it every time you ask is called establishing the behaviour. And getting him to keep on doing it instead of trying his luck is called maintaining the behaviour.
Alternatively, it might be something you want him to stop doing, such as pulling on the lead. Getting him to stop doing it as much is called weakening the behaviour. Getting him to stop doing it completely is called extinguishing the behaviour.
Positive and Negative Reinforcement: There are two ways of getting the dog to do something you want (establishing the behaviour), such as sitting on command. These are called positive reinforcement and negative reinforcement. Positive reinforcement means waiting for the dog to sit accidentally and giving him a reward, such as a food treat, as soon as he does (this is capturing the behaviour), or luring the sit and rewarding it when it occurs (this is eliciting the behaviour). He learns with remarkable rapidity that sitting on command will earn him a reward, and will repeat the behaviour if he continues to receive the reward, thus establishing the behaviour.
Negative reinforcement means doing something unpleasant to the dog, such as using a shock collar to shock him until he sits, and then immediately stopping the shock.
More simply put, positive reinforcement means giving him something good and negative reinforcement means taking away something bad. Reinforcement is always done immediately after the behaviour you want (i.e. as soon as he sits, you either give him the treat or stop shocking him), but never before (i.e. you don't give him the treat or stop shocking him to encourage him if he hasn't sat!).
Negative reinforcement works extremely well and produces behaviours which are extremely resistant to extinction (i.e. difficult to get rid of). Applying aversives correctly takes some skill, however, and getting it wrong can cause serious side effects such as stress, anxiety and even aggression, and there is also the training problem that the dog quickly learns to avoid the unpleasant stimulus which might result in his avoiding you! So unless you know exactly what you are doing, you’re probably better off sticking to positive reinforcement.
Punishment and Non-reinforcement: There are several ways of getting the dog to stop doing something you don't want him to do, such as pulling on the lead. Academic psychologists call the two most common ones punishment and extinction, but to avoid confusion I am going to call them punishment and non-reinforcement.
You can punish in two ways: either by doing something unpleasant to the dog, such as smacking him on the nose or spraying him with a citronella spray, or by removing something pleasurable that he has, such as his food, a toy or your attention.
Non-reinforcement means exactly what it says. You don't reinforce the behaviour, either positively or negatively. You ignore it completely, and you continue to ignore it. For ever.
Punishment works better in the short term, as it stops the unwanted behaviour immediately, but once the punishment has been over for a while, the behaviour will return (unless the punishment is severe and your timing is excellent!). This is called spontaneous recovery. In other words, if you thump your Dobe for pulling on the lead, he will stop pulling for about two minutes (if you're lucky) and then start again. Dogs and most other animals also get habituated to, or used to, a particular punishment (psychologists love long words, don't they?). In other words, the more often you punish a dog in the same way, the less effective the punishment gets. One can surmise that a stubborn dog such as a Dobermann habituates to a particular punishment very quickly, and soon learns to ignore it, in which case it no longer acts as a punishment. Shouting is a good example of a punishment to which the dog is habituated. (Ring any bells?)
Positive punishment has the same sort of fallout as negative reinforcement, and very few people apply it expertly enough to make it work well without either traumatising the dog or being so poorly applied as to be useless. It is thus an approach to be avoided except in extreme cases, and needs to be carefully thought out before being applied.
Negative punishment can have quite strong emotional consequences for the dog and is a very powerful technique. Again, it needs very good timing and understanding of what you are doing, so this article will concentrate on extinction, or non-reinforcement.
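For readers who find a few lines of code easier to follow than psycho-speak, the four terms used so far can be summed up in a small sketch. "Positive" simply means that something is added after the behaviour and "negative" means that something is taken away; whether the result counts as reinforcement or punishment depends on whether the thing is pleasant or unpleasant to the dog. The function name and its two parameters are my own invention for illustration and are not part of Skinner's terminology.

```python
def classify_consequence(stimulus_is_pleasant: bool, stimulus_is_added: bool) -> str:
    """Name the consequence that follows a behaviour.

    Both parameter names are hypothetical, chosen only for this sketch.
    """
    if stimulus_is_added:
        # Something is given to the dog after the behaviour.
        if stimulus_is_pleasant:
            return "positive reinforcement"  # e.g. a food treat when he sits
        return "positive punishment"         # e.g. a smack on the nose
    # Something is taken away from the dog after the behaviour.
    if stimulus_is_pleasant:
        return "negative punishment"         # e.g. removing your attention
    return "negative reinforcement"          # e.g. stopping the shock when he sits


if __name__ == "__main__":
    print(classify_consequence(True, True))    # treat given
    print(classify_consequence(False, False))  # shock stopped
```

Non-reinforcement does not appear in the sketch because nothing is added or taken away at all: the behaviour is simply ignored.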
Non-reinforcement won't have as much effect as punishment immediately, but over a longer period will often get rid of the behaviour completely or almost completely. It is also by far the best way to make sure that a behaviour you don't want doesn't get established in the first place. If you do not reward the dog when he pulls on the lead the first few times as a pup, he is far less likely to persist in pulling. (If that sounds rather odd to you, it's supposed to.)
Some behaviours can be extremely difficult to get rid of, or extinguish, though, because they are so well-established that even one reinforcement occasionally is enough to keep them going. In other words, if your dog has formed a really strong habit of pulling on the lead, you can ignore his pulling for three weeks and then reward him for it once, and he'll keep on doing it. (I hope you're becoming rather mystified at the moment, because of course you don't reward your dog for pulling on the lead. Or do you?)
Some behaviours are also intrinsically reinforcing, or are biologically motivated. For example, pulling on the lead is partly due to an opposition reflex called thigmotaxis, which means basically that the dog will tend to resist physical pressure by pushing or pulling in the opposite direction. Non-reinforcement can thus be very difficult to apply, and a third technique of extinguishing (getting rid of) behaviour called training an incompatible behaviour is sometimes needed.
Training an incompatible behaviour is a phenomenally complicated name for a fairly simple idea. Basically, if you want to get rid of a behaviour that you don't want, such as pulling on the lead, you use positive reinforcement to teach the dog a new, good behaviour which he can't do at the same time as doing the old, bad behaviour. The new behaviour is called the incompatible, or competing behaviour, because it competes with the old behaviour. (Original, that.) Because the new behaviour is reinforced using positive reinforcement, it becomes strongly established, and the old behaviour, which he can't do at the same time as the new behaviour, thus weakens and eventually disappears. In our example you would teach the dog to walk on a loose lead as a competing behaviour, because he can't walk on a loose lead and pull at the same time.
At this point, your mystification must be overwhelming, because you can't teach him to walk on a loose lead when he's pulling all the time. Or can you?
In order to explain this point, it's worth having a close look at how a good competition handler stops a dog from pulling. (Remember that punishment stops a behaviour immediately, but that the effect often doesn't last.) As the dog starts to forge ahead, the handler will release all the slack in the lead, swivel sharply on her left foot and take off fast in the opposite direction. The dog is brought up very sharply at the end of the lead and is often pulled off its feet if the handler moves fast enough. This is clearly a punishment. There is thus a short window during which the behaviour stops, i.e. the dog stops pulling and walks on a loose lead (new behaviour!). The reason this correction works is that a good handler will punish every incident of pulling and then immediately praise the dog for walking on a loose lead (positive reinforcement), thus establishing the new, good behaviour. If this is done quickly and emphatically enough, the dog will learn the new behaviour thoroughly and the old behaviour will disappear. In this case, the old behaviour (pulling) has been successfully replaced by a new behaviour (walking on a loose lead). In fact, what the handler has done is punishment followed by the establishment of a competing behaviour.
Unfortunately, most of us don't handle that well. We make two mistakes. We don't punish the dog each and every time he pulls, and we don't praise him quickly enough when he walks on a loose lead. Most importantly, though, what we fail to realise is that every time we let the dog pull, even if it is only for two yards, we are positively reinforcing the pulling behaviour, and thus making the behaviour harder to extinguish.
What's the difference? This is where the real difference between traditional training and positive reinforcement training starts to emerge, because of course we are not aware of rewarding the dog for pulling! We don't praise him or pat him or give him a treat when he pulls. So what's going on?
The question to ask at this point is why the dog pulls in the first place, and the simple answer is: because he wants to go somewhere. When you get towed along behind him, is he getting there? Of course he is! Every step you take behind a pulling dog reinforces him positively for pulling.
The next question to ask is how you stop reinforcing him, and the simple answer is: you stand still.
And suddenly, you have the means to stop him from pulling without hauling him around and dislocating your shoulder in the process, without shouting and yelling, without even raising your voice. Instead of applying punishment, you simply stop reinforcing the behaviour, and although this doesn't work as fast as punishment, it lasts a great deal longer.
Stopping a dog from pulling now becomes quite simple. All it requires from you is patience and consistency (remember, if you reinforce him even once, he's likely to retain the behaviour). Set out for a walk with him. As soon as he starts to pull, stop in your tracks and stand still. Don't shout at him, don't talk to him, don't yank on the lead, just stand there. When the lead slackens, start walking again and praise him effusively for every little bit that he manages to do on a loose lead. But as soon as he starts to pull again, stop. If he's a hardened puller, you might not get more than a few yards down the road on the first day, but don't give up; he'll get the idea more quickly than you think, and of course you will be reinforcing the new behaviour all the time.
Now why didn't I think of that 25 years ago?
In psycho-speak (which I hope you're learning fast), instead of punishment followed by the establishment of a competing behaviour, we now have non-reinforcement followed by the establishment of a competing behaviour. We get all the advantages of positive reinforcement without any of the disadvantages of punishment. It sounds like a good deal to me.
The real beauty of this method is that a strong-willed, determined, highly intelligent dog like a Dobermann won't stop trying to go somewhere, but he will realise very quickly that if pulling doesn't get him anywhere, he'll have to try something different; and because he's determined, he'll keep on trying until he finds something that works. It won't take long for him to discover that walking on a loose lead reinforces him in two ways: he gets to go where he wants to go and he gets praised for it! Suddenly, all that will power and energy is being put into finding out what you want him to do. The more stubborn and determined the dog, the harder he will work to find a way of getting the reward! Streets ahead of a Border Collie? Light-years ahead!
Living with positive reinforcement: When using positive reinforcement training, it's important to make that the basis of your entire relationship with the dog. The golden rule for achieving this is to stop giving him attention, affection and treats for free (no, it isn't harsh; bear with me) and make him work for them.
Dogs which are stroked and petted endlessly by their owners habituate to (get used to) the petting, so it loses its value as a reward. They also get bored and frustrated because they are usually not doing enough work, and will often start getting up to mischief such as excessive barking, chasing cats, digging etc. A dominant dog may also misinterpret petting as submissive behaviour, and can become very aggressive when the owner tries to discipline him, particularly if he has not been consistently handled.
To make matters worse, if your dog has an easy life being fed and petted without having to do anything in exchange, he's not going to enjoy it when you tie a chain around his neck, drag him around the garden or exercise field, push him around physically and shout at him, and will strenuously resist this treatment. Wouldn't you?
To use positive reinforcement really effectively, you need to turn every interaction with your dog into a lesson. Get him to do some small task, even if it's only a sit or his latest exercise, before you praise him, pet him, give him a treat or feed him, and don't let him demand attention from you; ignore him if he does.
There are several benefits to this:
- the dog comes to value a reward from you intensely because he has to work for it (and he really does love you, you know)
- he has to use his intelligence and ingenuity a great deal more to get the reward he wants, and is thus mentally stimulated and entertained, so the boredom behaviours tend to disappear
- his training becomes an extension of his lifestyle and so he enjoys it and doesn't resist his practice sessions
- because all the joy, pleasure and fun in his life, and none of the pain, is associated with working for you, he will come to love you and want to work for you with all his heart; he will be completely willing, responsive and attentive, and will follow you around in the hope of being asked to do something for you
- if you control all the good things in his life, status issues tend to get resolved as a side-effect, as he learns that he has to defer to you in order to obtain what he wants
- if he is ever really hurt or traumatised, some free affection and attention from you will mean vastly more to him and be a much greater source of comfort
- in the same way, because he is hardly ever punished, if you do ever need to raise your voice to him or punish him, it will have a dramatic effect on him
- finally, you will enjoy the training much more and thus be positively reinforced yourself, so you will be much more likely to persevere!
Try it and see.
Food treats and timing: Food treats are used extensively with this method. Some old obedience hands will question this for one of two reasons: first, that the dog becomes dependent on the treat, and second, that the dog gets used to the treat and stops working for it. They are both absolutely right and completely wrong. What is critical is the timing of the treats, and this has to be planned very carefully.
Skinner and his colleagues devoted a great deal of research to what they called reinforcement schedules, and came to the conclusion that this was by far the most important element of operant conditioning, possibly even more important than the degree of pleasure afforded by the reward.
Remember the three stages of teaching a desired behaviour such as sitting on command: eliciting, establishing and maintaining. During the first two stages, i.e. when the dog is learning a new behaviour, he should be reinforced continuously. In English, that means that you give him the treat every single time he does what you want. When teaching him to sit, give him a treat every single time he sits on command.
However, there are problems associated with continuous reinforcement schedules, one being that the dog sates quickly if he is being given a treat every time he performs the desired behaviour. He simply gets full, or he gets bored with the treat; whatever happens, after a while he will stop working for it.
On the other hand, if you stop reinforcing him altogether, you are practising non-reinforcement, and again, the behaviour will disappear. If the behaviour was originally established with a continuous reinforcement schedule, it will tend to disappear quite rapidly!
Skinner & co. found that by far the best way to maintain an established behaviour was to reward it sometimes using a variable ratio reinforcement schedule (what a mouthful!). The best way to illustrate what this means is by using an example.
The ratio is the proportion of sits which are rewarded, and the variable is the number of sits in between reinforcements. Suppose you have successfully taught your dog to sit on command and you want to maintain the behaviour. First you decide what ratio, or proportion of sits you want to reinforce. This means that you reinforce 1 in 5 sits, or 1 in 10 sits, or 1 in 20 sits, whatever ratio you decide on. Suppose you decide to reinforce 1 out of every 5 sits.
You then make sure that you average 1 reinforcement to every 5 sits, but that you vary the number of sits in between reinforcements (hence variable ratio; these names do make sense, sort of). It is very important not to reinforce him on every 5th sit, as he will see a pattern emerging. However, after a large number of sits he should have received on average 1 reward for every 5 sits.
So in 20 sits you would give 4 rewards, but NOT on the 1st, 6th, 11th and 16th sits! You might give one on the 2nd sit, one on the 9th, one on the 12th and one on the 17th; in other words, the number of sits which don't get rewarded is different each time, and the dog has no way of working out in advance which sit is going to be rewarded. This keeps him on his toes as he knows that the reward will turn up sometime in the future, but not when.
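If you would rather not work out the pattern in your head, the schedule described above can be sketched in a few lines of Python. This is only one possible way of doing it, and it rests on an assumption of my own: the sits are split into blocks of 5 (or whatever ratio you chose) and one sit is picked at random in each block, which keeps the average at exactly 1 reward per 5 sits while the gaps vary. The function name, its parameters and the block approach are all hypothetical and are not taken from Skinner's work.

```python
import random


def variable_ratio_schedule(total_sits, ratio, seed=None):
    """Return the 1-based sit numbers to reward.

    Assumption for this sketch: one sit is rewarded at random in each
    consecutive block of `ratio` sits, giving total_sits // ratio rewards.
    """
    if ratio < 2 or total_sits < ratio:
        raise ValueError("need ratio >= 2 and total_sits >= ratio")
    rng = random.Random(seed)
    while True:
        rewarded = [
            rng.randint(start, start + ratio - 1)
            for start in range(1, total_sits - ratio + 2, ratio)
        ]
        gaps = [b - a for a, b in zip(rewarded, rewarded[1:])]
        # Redraw if the rewards happen to be evenly spaced (e.g. every 5th
        # sit), because the dog could then spot the pattern.
        if len(rewarded) < 2 or len(set(gaps)) > 1:
            return rewarded


if __name__ == "__main__":
    # 20 sits at a ratio of 1 in 5: four rewards at irregular intervals.
    print(variable_ratio_schedule(20, 5))
```

Each run gives a different list of four sit numbers between 1 and 20, so neither you nor the dog can predict which sit earns the treat.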
Behaviours reinforced in this way become extremely resistant to extinction, even in the absence of reinforcement. In other words, they become very difficult to get rid of!
It should now be a bit clearer why a behaviour such as pulling on the lead can be so difficult to eliminate. An occasional and apparently unimportant slip-up like letting the dog pull you a few yards down the road is actually a variable ratio reinforcement, the most powerful way of maintaining a behaviour over a long period of time! Non-reinforcement really does mean non-reinforcement!
Positive reinforcement in practice: All right, how did I get Slug to restrain himself from leaping all over me when I praised him? It wasn't too difficult. First, I got out a reward (a food pellet) and made sure he knew I had it. Then I told him to sit, which he did very eagerly because he wanted the pellet. Once he was sitting, I started praising him in an excited tone of voice. As soon as he jumped up at me, I turned my back to him, folded my arms and stared at the wall for a few seconds. This is non-reinforcement of an unwanted behaviour, namely jumping up. Needless to say, he didn't get the pellet.
After a few moments, I turned back (to a very anxious dog!) and repeated the whole process. On the third try (these guys have got brains!) he managed to hold his bum on the floor for a couple of seconds and I immediately gave him the pellet (positive reinforcement) and made a fuss of him. You have never seen such a happy dog! On the fourth try, he stayed sitting for a moment again and then jumped up, just as he did on the third try, so I turned round and folded my arms. On the fifth try, he managed to sit for a bit longer so I rewarded him again. And so on.
(This little-by-little approach is called behaviour shaping. I haven't actually discussed it in this article, but it's quite easy: you reinforce the dog for doing something similar to the behaviour you're after, but if he repeats the similar behaviour on the next try you withhold the reward. If he then does something even closer to the behaviour you want, you reward him again, but if he repeats that behaviour, you withhold the reward again, and so on. This tells him that he's on the right track, but hasn't got there yet. Only once he is doing exactly what you want do you start reinforcing him continuously.)
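The shaping rule can also be written down as a short sketch. It assumes that each try can be given a score for how close it comes to the behaviour you want; here I have used the number of seconds the dog holds his sit. The scores, the target of 10 seconds and the function name are all invented for illustration, as I did not time Slug with a stopwatch.

```python
def shaping_rewards(attempts, target):
    """Decide, try by try, whether the dog earns a reward.

    attempts: a score per try (hypothetically, seconds held in the sit)
    target:   the score that counts as the finished behaviour
    """
    best_so_far = 0  # nothing has been rewarded yet
    rewards = []
    for score in attempts:
        if score >= target:
            # Exactly what you want: reinforce continuously from now on.
            rewards.append(True)
        elif score > best_so_far:
            # Closer than anything rewarded before: reward it.
            rewards.append(True)
            best_so_far = score
        else:
            # Only a repeat (or worse): withhold the reward.
            rewards.append(False)
    return rewards


if __name__ == "__main__":
    # Slug's first five tries, with made-up timings: jumped up at once
    # twice, held the sit briefly, repeated that, then held a bit longer.
    print(shaping_rewards([0, 0, 2, 2, 3], target=10))
    # prints [False, False, True, False, True]
```

The printed pattern matches the lesson as I described it: no pellet on the first two tries, a pellet on the third, none on the fourth and another on the fifth.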
Within a few minutes, Slug was determinedly holding himself in a sit while I praised him and made an enormous fuss of him verbally. He would not budge until I had given him a release command. At no time during the lesson did I touch him, except to add physical praise to the pellet, which of course increased the value of the reward.
In fact, by turning my back on him, I actually punished him for jumping up as well by removing my attention and praise, which he was thoroughly enjoying. (Remember that the second type of punishment is the removal of something pleasurable to the dog.)
Easy when you know how, isn't it? Ironically, I have been fascinated by the principles of operant conditioning for several years and have in fact successfully applied them to humans! As a result of using positive reinforcement in my previous career as an IT manager, I was several times treated to the extremely entertaining spectacle of five grown men sprinting down a corridor to make sure they were on time for a meeting run by a woman! But it's taken me until now to start applying these extremely kind, sensible and successful principles to my dogs. Still, as I've just demonstrated by writing this article, you can teach an old dog (of the female variety) new tricks!
By the way, did you manage to read through all the complicated theory in this article so you could find out how I taught Slug a reliable sit? Of course you did. Here's how I got you to do it. I dangled a treat in front of you by telling you what I did with Slug, but not how I did it. I reminded you that I had a treat for you when I suggested that you might have to wait for the next issue to get the answer, thereby keeping your attention. Then I dangled a similar treat in front of you a bit further along (the mystery bit about rewarding pulling on the lead), and a few paragraphs later I gave you the treat (the bit about stopping in your tracks). This was a reward: positive reinforcement for persevering with the behaviour I wanted from you, namely continuing to read the article. You thus learned that continuing to read was a behaviour which would be rewarded, and so you persevered until the end of the article and got your treat! I do hope it was worth it, and that you will find even greater satisfaction and reward in applying these principles to your own wonderful Dobermann.