Who Destroyed Three Mile Island? – Nickolas Means | #LeadDevLondon 2018

Thank you so much, Meri. It is quite the honor for me to be back a third year in a row. This is one of my favorite events all year, and I want to start by saying a heartfelt thanks to the team at White October for willing this event into existence. It’s clear by the fact that there are 1,100 people here that this event brings tremendous value to those of us who care about leading well in software, so I want to start by having everybody give them a hand and thank them for all the work that they do to put on this wonderful event.

So before we get started, I’m curious. Show of hands: who understands how a nuclear reactor generates power? A pretty good number of you. For those of you that don’t, no worries, I will explain the whole thing. It’s really quick, really easy.

So when I was a kid, my parents gave me this four-volume set of books. My dad is a mechanical engineer by training, and so I spent lots of time asking him how various complicated things work and listening to his very patient, very kind explanations of those things. They bought me these books because they wanted to encourage my curiosity and teach me to seek these answers on my own. I don’t have to look at them very often anymore; thanks to the magic of Wikipedia and the Internet, there’s just no need. But they do still have a treasured place on my bookshelf because they’re such an important part of what made me who I am.

I distinctly remember turning to these books when I was a kid and it was all over the local news that a new nuclear power station had come online fairly close to where I grew up. I had no idea how a nuclear reactor generated electricity, and I wanted to know, and so I turned to this page. I think it’s a good place for us to start today as well, with the actual reactor diagram I learned from when I was a kid. It’s this one right here.

And as it turns out, the basic mechanics of a nuclear power plant are essentially the same as a conventional power plant. You have a heat source that heats up water. In a conventional plant
it’s going to be natural gas or coal that’s burning. In a nuclear power plant, it’s a carefully controlled uranium chain reaction. High-pressure water circulating through the reactor carries the heat to the steam generator, where it’s used to boil water, converting it into steam. As water boils to steam, it expands, and that expansion propels it into the turbine, which is basically a giant fan in a tube. The steam spins the turbine around, which in turn turns the generator, and that’s where our electricity actually comes from. After that, the steam gets dumped into the condenser, where it turns back into water, and it goes through the whole process again.

Now, there are two primary families of nuclear reactors in the United States: boiling water reactors and pressurized water reactors. If you want to come up to me after this talk, I will talk to you about the reactors in this country; they’re quite a bit different. We’re looking at a pressurized water reactor here because that’s the kind that was in operation at Three Mile Island.

So what is it that makes this a pressurized water reactor? Well, the components that I just walked you through are in two separate coolant circulation loops. The primary loop, in orange, consists of the water that flows through the reactor vessel to gather heat and then through the steam generator, passing the heat to the secondary loop. The secondary loop, in blue, consists of the water that flows into the steam generator, boils off and turns into steam, goes through the turbine, back into the condenser, and back through the loop again. Water from these two loops never mixes; they’re completely isolated from one another.

The thing that makes this design a pressurized water reactor is that the water in the primary coolant loop is held at about 2,000 psi, and it’s a question of basic economics. A boiling water reactor has to have a much larger reactor pressure vessel because the water is actually boiling to steam in the reactor pressure vessel, so
you need that room. In a pressurized water reactor, because it holds the water at 2,000 psi, the water doesn’t boil even at the plant’s operating temperature of 300 degrees Celsius, so you don’t need all that room. Or at least it shouldn’t boil, which brings us to March 28, 1979.

Three Mile Island Nuclear Generating Station is a two-unit nuclear power plant in Londonderry Township, Pennsylvania. It’s built on a three-mile-long sandbar in the middle of the Susquehanna River, imaginatively named Three Mile Island, and it’s about 10 miles south of the capital of Pennsylvania, Harrisburg. Unit 2 is a 906-megawatt-electric pressurized water reactor designed by Babcock & Wilcox that went into commercial operation on December 30th, 1978. Early in the morning of March 28, 1979, it’s running at 97% capacity, and has been for three months straight since it came online. As they say in the nuclear power industry, this reactor was running hot, straight, and normal. No significant issues.

These four men are at the controls of Three Mile Island Unit 2 for the overnight shift on March

28th. Bill Zewe was the shift supervisor for the overnight shift. It was his plant that night; he was in charge. Fred Scheimann is the shift foreman for Unit 2. He’s running the operations at Unit 2; he’s basically second in command for that unit. Ed Frederick and Craig Faust were the control room operators on duty. They’re the ones that are actually sitting at the controls of the reactor.

The plant was running pretty normally that night, except for a small problem with one of the condensate polishers that the previous shift hadn’t been able to solve. So these are the condensate polishers. They’re a set of eight filtration tanks that filter the water coming out of the condenser before sending it back into the expensive and very delicate steam generators. The steam generators are made up of very tiny tubes, so if a fleck of rust or a fleck of dirt got into the water, it could clog up one of those tubes, and the only way to unclog them is to take the steam generator apart and rebuild it. So it’s very important that that water gets filtered. And I should point out, these are not the actual condensate polishers from Three Mile Island. As you can imagine, it’s very difficult to find a picture of a specific component of a specific nuclear power plant, but this is what condensate polishers look like. They’re big metal tanks.

Ten hours earlier, the swing shift started working on a clog in the number seven filtration tank. It was completely blocked. They were using the built-in backwash system, but unfortunately, when they built Three Mile Island, they didn’t build the backwash system powerful enough, so often the system would develop clogs that the backwash just couldn’t take care of. So the operators early on had developed sort of a backup system where they took air from the pressurized air system and also injected that into the condensate polishers and used it to break up the resin beads. This clog was proving to be particularly intractable, so they were
using both systems to try to unclog it.

At 3:59 in the morning, Fred Scheimann has gone down into the basement of the reactor building to see what the status of these tanks is. He’s climbed up on the side of the number seven tank, and he’s peering in the viewing port to see if they’re making any progress on the clog, when suddenly things get very quiet. Now, hundreds of tons of water per hour are moving through these tanks, so the quiet is very disconcerting, and he barely jumps free before a water hammer comes through and knocks the feedwater pipe for the condensate polisher tanks completely out of its moorings.

What’s happened is that, over 10 hours, a leaking one-way check valve has allowed water to go from the condensate polishers up to the manifold that controls the valves for the condensate polisher tanks. When the water finally reaches the manifold, it blocks the flow of air, and all eight valves close simultaneously, completely blocking the flow of water. Obviously this is not good.

To help us understand why, here is a schematic of Three Mile Island Unit 2. It’s a little more complicated than the diagram before, but I’ve colored it the same, and I’ll get you oriented real quick. Here in the center is the reactor vessel; that’s where the nuclear chain reaction generates heat. Right beside it, on either side, are the steam generators, where water in the primary loop transfers heat to the secondary loop and boils water to create steam. Over in the turbine building, you have your turbine and generator, where the electricity is made. Here’s the condenser, where the steam gets cooled down enough to change back into water. Right after that is the condensate polisher, and it’s completely blocked. There is no water to be pumped through the secondary cooling loop.

So the first thing that happens is the main feedwater pumps trip offline. It’s 36 seconds past 4:00 in the morning, the official start of the accident. Two seconds after the main feedwater pumps trip, the turbine senses it’s not going to be getting any more
steam, so the turbine and generator trip offline as well, and the plant’s main safety valves open. Now, this is one of the benefits of a pressurized water reactor: the water in the secondary loop that creates the steam is not radioactive, so it’s not a problem, and they just dumped thousands of tons of steam into the night sky. If you heard it, it would be pretty scary. It’s quite loud, but it’s not radioactive at all.

In the control room, Ed Frederick and Craig Faust are getting their first indications that something has gone awry. An alarm horn announcing the turbine trip starts going off, and several alarm indicators start to flash. A few seconds after the turbine and generator alarms go off, the pressure in the reactor vessel is starting to climb rapidly.

Now, the good thing is this pressure spike is expected. The reactor is designed to cope with just such an event. Without the secondary loop to remove heat, the primary loop heats up, and like I mentioned earlier, when water heats up, it expands, so it’s natural that the pressure is going to go up. The reactor immediately kicks in systems that are designed to counteract this pressure spike.

The reactor’s pressure control system is the first one to jump into action. There are two components to this system, and both are important to the accident sequence. First, we’ll talk about the pressurizer. In a pressurized nuclear reactor, the pressurizer serves several purposes. First, it regulates the system pressure. Because it’s the highest point in a closed system, when you need to adjust the pressure of the primary cooling loop, you just have to raise or lower the pressure in the pressurizer and adjust

the pressure system-wide. Second, when they designed this reactor, they designed it without water level instrumentation in the reactor vessel, and so they use the pressurizer as a proxy for that water level measurement. If there’s water in the pressurizer, you can reasonably assume that there’s water in the rest of the system, because it’s the highest point. If there’s no water in the pressurizer, well, then you’ve got something to worry about. The third purpose: steam is significantly more compressible than water, and so the pressurizer acts as a shock absorber for the system. They maintain a bubble of steam at the top of the pressurizer, so if there’s a sudden pressure spike somewhere in the system, that steam will compress and absorb the pressure spike, preventing a water hammer that could knock pipes loose.

So the steam in the pressurizer absorbs the initial shock of the secondary loop going offline. But the pressurizer is really only designed for fine pressure adjustments; it would take several minutes to regulate the kind of pressure that’s occurring in the reactor right now. It’s about 200 psi over standard operating pressure and continuing to climb.

So the second thing that happens is the pilot-operated relief valve opens. Now, if you’ve heard anything about the Three Mile Island accident, this is probably the component that sticks in your head. This is the one that gets all the press. In the event of a big pressure spike, the pilot-operated relief valve will open and release coolant into the drain tank on the containment building floor. The pilot-operated relief valve opens four seconds after the turbine and generator trip offline.

A few seconds later, the computer senses that reactor pressure, despite the pilot-operated relief valve being open, is still continuing to climb, so it takes another defensive action: it scrams the reactor. Now, to understand what this means, you have to understand what’s happening in the core of a nuclear reactor. What’s happening is there are neutrons flying
around everywhere, and occasionally those neutrons hit a uranium atom. When that happens, the neutron sticks to the uranium atom and causes it to fissure (that’s where we get the word fission from), so that uranium atom splits. When that happens, it releases tremendous heat and a couple more neutrons. Those neutrons in turn go off and hit other uranium atoms and cause them to split, and that’s your nuclear chain reaction. So it continues releasing heat.

The way that we control this reaction is through a set of cadmium control rods that can be inserted into the core of the reactor. Now, cadmium loves to absorb neutrons. They’ll bond to it very readily, much, much more easily than to the uranium atoms that are floating around the core of the reactor. And so the cadmium control rods are typically raised and lowered, kind of like a throttle, to adjust the amount of reactivity in the core, unless there’s an emergency. That’s where the scram comes in. The scram is like the reactor’s emergency stop. When the reactor is scrammed, these control rods are released from the control mechanism that raises and lowers them, and they drop by gravity to the bottom of the reactor pressure vessel. This stops the nuclear chain reaction almost instantly.

But it doesn’t stop heat production, at least not entirely. You see, when the nuclear chain reaction takes place, there are fission byproducts that are created, other radioactive elements that are created out of the primary fission process, and these other elements continue to decay even after the primary chain reaction shuts down. So even once you scram the reactor, the reactor core is still going to be producing about six and a half percent of the heat it was producing before the shutdown. So still a significant amount of heat. It takes several hours for this to cool down enough to be safe. In the hours after a reactor shutdown, it’s crucial that you continue carrying heat away from the core, because that decay heat is enough to cause significant damage to the core.

A few
seconds later, back in the control room, a light on the console turns from red to green to indicate that the pilot-operated relief valve has been signaled to close. At this point, after the scram, the reactor’s comfortable. It’s confident that everything is in control, and so are the operators. This is a situation that, while it doesn’t occur every day (and you wouldn’t want it to, because you want the plant to be producing electricity), is one that they prepare for and have procedures for. They’re very confident that the system is behaving exactly as it should be.

That confidence lasts for about two minutes, because two minutes later their world is thrown into chaos when the emergency core cooling system kicks in. Specifically, the high-pressure injection pumps start dumping a thousand gallons a minute of cold water straight into the core. Now, this was unexpected and a little bit confusing to Bill Zewe and his crew. The plant had gone from a state that they understood to one that they didn’t as soon as high-pressure injection kicked in. And the reason this confused them so much is that they were watching the pressurizer level, and it was rising. Seeing the water level in the pressurizer continue to rise told them there was plenty of water in the system, and so they couldn’t understand why the reactor thought that it needed more water.

And so, two and a half minutes after the high-pressure injection system kicked on, Fred Scheimann makes the call to turn it back off. Had he not made that decision, had he let it continue to run, the plant might have been down for a day or two before they could restart it. It would have been a minor incident.

We’re now five minutes into the accident, and there’s something that’s perplexing Bill Zewe at this point. You see, the water level in the pressurizer is still

continuing to rise, but the pressure in the reactor vessel is continuing to drop. Now, this is a problem, because if the pressure continues to drop enough, eventually the water in the core will start boiling, and if the water in the core starts boiling, it can’t effectively cool the core.

He has a hunch about what’s going on. He suspects that the pilot-operated relief valve might have stuck open, and that might be why the system is having trouble maintaining pressure. So he double-checks the pilot-operated relief valve indicator on the control panel. It shows closed. Just to double-check, he has one of the operators run around to the back of the instrument console and check the outlet temperature of the pilot-operated relief valve. The operator reads back 228 degrees Fahrenheit, about 109 Celsius, and so Zewe moves on.

There’s a problem with that decision, though, and the problem is that the plant operation manual indicates that any reading over 200 degrees Fahrenheit means that the pilot-operated relief valve is stuck open, and it requires that the manual block valve in front of it be closed. Had Zewe decided to close the block valve, he would have interrupted the loss-of-coolant accident that was in progress. Again, the plant would’ve been shut down for a day or two and then right back online. No big deal.

We’re now six minutes in. Five minutes later, at 4:11 in the morning, another alarm goes off. This one indicates that the sump in the containment building is filling up. The sump is a giant pit at the bottom of the reactor building that collects any water that might be vented or leaked from anywhere in the system. You don’t know if this water’s radioactive or not, so you want to collect it, filter it, and make sure it’s safe to discharge before you just dump it in the river. In this case, what was happening is that so much water had been released from the pilot-operated relief valve that it had overflowed the drain
tank on the floor of the reactor building and flowed into the sump, filling the sump up. Now, enough water in the sump is a very clear indication that the system has a significant leak, but the crew misses it, all of them.

The core is in serious trouble at this point, but the operators aren’t done yet. See, just after 5:00 in the morning, the floor of the control room starts to rumble. It’s subtle at first, but before long it becomes really difficult to ignore. What’s going on is that the primary coolant pumps of the reactor are starting to push around steam in addition to the water that they were designed to pump. The water in the reactor core is boiling, and the reactor pumps, in pumping the steam, are encountering significant turbulence, and this is causing the vibrations that the reactor operators are feeling.

Now, they know what their training says to do when this happens, in order to keep the very large, very expensive pumps from vibrating themselves to pieces and causing a coolant leak: they’re to shut them off. And so they hold off as long as they can, but 15 minutes in, Zewe can’t stand it any longer, and he makes the call to shut off one set of primary coolant pumps. That helps for a little while, but 30 minutes later the vibration is back and worse than ever, and so Bill Zewe makes the call to shut down the second set of pumps. It’s now 5:44 in the morning, and a nuclear reactor that was running at 97 percent of capacity less than two hours earlier now has no coolant whatsoever moving through its core.

Now, it doesn’t take very long for the effects of no circulation to make themselves known. At 6:00 in the morning, exactly two hours into the accident, a radiation alarm like this one starts going off in the containment building. There are a couple of things that we can infer from this radiation alarm going off. Number one, for a radiation alarm to be going off, one or more of the sealed fuel rods has ruptured. The fuel in the core of Three Mile Island is all encased in zirconium rods like
this. This isolates the fuel from the surrounding water, and it keeps the uranium from leaching into the cooling water. So if there’s a radiation alarm going off, it means that one of these rods is leaking. And second, if a fuel rod has been damaged, it’s almost certain that the water level has dropped below the top of the core. We know that we now have an exposed nuclear core.

By this point, plant leadership has started to make it to the plant. Gary Miller is the station manager, the chief executive of Three Mile Island. This is his plant. And George Kunder is the technical support manager for Three Mile Island Unit 2. He manages all the nuclear engineers, health physicists, chemists, etc. Almost as soon as they walk in the door, they join a conference call with Leland Rogers, the site representative for reactor designer Babcock & Wilcox, and one of the first questions that Leland Rogers asks them is, “They closed the block valve, right?” The block valve: the valve that Bill Zewe decided not to close earlier. George Kunder yells to somebody in the control room, “Is the block valve shut?” A few seconds later the answer comes back: “Yeah, it’s shut.” And so at 6:22 in the morning, because of a question from Leland Rogers, the block valve was finally shut, ending

the loss-of-coolant accident.

Now, this would have been the right thing to do about twenty minutes into the accident, but doing it now actually made things worse. You see, with all the coolant pumps turned off, closing the block valve eliminated the only source of cooling this poor reactor had left. The only way it was discharging any heat was by boiling the coolant off out through the pilot-operated relief valve. With the block valve closed, the heat in the core intensified rapidly. It took about eight minutes from this point for the top of the core to collapse. Subsequent calculations would show that by 7:00 in the morning the core was two-thirds uncovered, and the temperatures in the hottest part of the core were around 4,000 degrees Fahrenheit, about 2,200 degrees Celsius. Hot enough not only to melt the cladding around the fuel rods, but the fuel itself.

At 7:20 in the morning, the radiation alarm in the top of the containment building goes off, at the very top of the dome, indicating a reading of 800 rem per hour. Now, to give you some idea of what 800 rem per hour means: if one of the workers from Three Mile Island were standing in an 800-rem-per-hour radiation field, they would get their maximum yearly allowable radiation dosage in 20 seconds. It’s a lot of radiation.

The crew had largely been in denial about core damage after the first alarm, but the second alarm left them no doubt. They knew that the core had started to melt down at this point. So immediately after this alarm, they try to turn the high-pressure injection pumps back on, but they turn them back off after 18 minutes because they’re filling the pressurizer up again. They’re so concerned about filling the pressurizer. It wasn’t until 8:26 in the morning, after the situation continued to worsen, that they finally re-enabled high-pressure injection for good, largely out of desperation, not sure what else to do. It would take until 10:30 in the morning for the core to finally be covered again, ending the primary accident sequence.

Over the
next few days, there would be continued worry about a nuclear release at the plant, and so they’d keep monitoring the situation on the ground and by flying helicopters overhead with radiation detection equipment. But the redundant containment designed into the plant had done its job. There was no significant radiation release that ever occurred at the plant. There would be public concern about a potential hydrogen explosion in the containment building, because when the fuel rod cladding melted, it reacted with water and created a significant amount of hydrogen. But it turns out the calculations were wrong, and the fear was never really significant, never really a risk.

On Sunday, April 1st, four days after the accident, President Jimmy Carter and his wife Rosalynn would visit the plant to reassure the American public about the safety of nuclear power. He would later convene an investigatory commission that would result in this report on the accident, which I’ve drawn a lot of the facts for this talk from.

Three Mile Island Unit 2 would be written off as a total loss. About 20 tons of melted uranium ended up at the bottom of the reactor vessel. Another ten tons ended up sitting right in the middle of it. This is what they found when they began the initial cleanup in 1983. What you’re looking at in this picture is severed and melted fuel rods that ended up at the bottom of the reactor vessel. The final cost of that initial cleanup was just over a billion dollars, about seven hundred and fifty million pounds, and it took about 14 years.

And it’s still not finished. Three Mile Island Unit 2 is still standing in the middle of the Susquehanna River in Pennsylvania. This is a picture that one of my colleagues actually took earlier this year on a work trip to Pennsylvania. You can see Unit 1 on your right, still producing electricity. Unit 2 actually won’t be fully cleaned up until Unit 1 is shut down and decommissioned, currently scheduled for 2034.

So what happened? How did
these four men miss so many signs along the way that the reactor was in the middle of a loss-of-coolant accident? Why didn’t they just leave the emergency cooling system on when it activated? Why didn’t they close the block valve sooner? Did they not know what they were doing?

Maybe we’re looking at this the wrong way. Sidney Dekker’s wonderful book, The Field Guide to Understanding Human Error, is an in-depth guide to investigating and understanding what happened when things go wrong. In it, he introduces the concept of first stories and second stories. The story I’ve just told you is the first story of the accident at Three Mile Island, deliberately so. First stories focus on the humans in the story and what they should have done differently. A first story almost always lays the blame for an accident at the feet of the humans that were involved and the decisions that they made.

But there are a couple of problems with this, in the form of biases that we all have. The first is hindsight bias. This is the phenomenon where, when you review an event after it’s occurred and you know the outcome, you exaggerate your own ability to have predicted and prevented the outcome. Sometimes it’s referred to as the “I knew it all along” effect. A good example in this case is: well, I don’t know that much about nuclear reactors, but I think if I saw water pooling at the bottom of the reactor

building, I would have noticed that there was a leak.

The second is outcome bias, and this is where, once you know the outcome of the situation, you carry the full weight of that outcome into evaluating every decision that led up to it. It makes you more willing to judge those decisions and more likely to judge them harshly. A good example here is that turning off the emergency core cooling system early in the accident is obviously a stupid decision when you know that the outcome is a partial meltdown. But Fred Scheimann didn’t know that when he made that decision. Focusing on what he did know is the first step in finding a second story.

In the second story, human error is seen as an effect of systemic vulnerabilities deeper inside the organization, not a result of bad decision-making or somebody not following the rules. How do we get to a second story? First, we work from the participants’ reality. We dig into the decisions from the perspective of the people that made those decisions. We work to consider the messy reality that they were facing when they made them, not the clean-room conditions that your mind gives you in hindsight. And second, we assume positive intent. We come at this with the belief that everyone involved made the best decisions they could with the information they had. Like Jenny said earlier today, your team shouldn’t have to earn your trust; they should get it implicitly.

So let’s see if we can find some second stories from Three Mile Island, and let’s start with that decision that Fred Scheimann made early on to turn off the emergency core cooling system. Why did he make that decision five minutes into the accident? Well, we’ll find our answer in the pressurizer. In his deposition to the presidential inquiry, Fred Scheimann says he turned off the emergency core cooling system because it was causing the water level in the pressurizer to rise, and he was afraid that it was going to go solid. Go solid? What does that mean? Well, remember that one of the pressurizer’s purposes is
to absorb pressure shocks in the system. If you allow the pressurizer to fill completely with water, there’s no more shock absorption capability there, and that’s what he means by the pressurizer going solid. He’s afraid it’s going to fill completely with water and the system will be left unable to cope with any transient pressure spikes.

Well, that’s a problem, but it doesn’t seem like it’s as big a problem as the core melting down. So why is Fred Scheimann more concerned about the pressurizer than he is about the core? The answer to that question goes all the way back to Admiral Hyman Rickover and the nuclear Navy. Because, you see, Bill Zewe, Fred Scheimann, Ed Frederick, and Craig Faust were all former US Navy nuclear reactor operators. They’d all run nuclear reactors on submarines, and the naval reactor training created by Admiral Rickover emphasized that the primary job of a naval nuclear reactor operator was to keep the pressurizer from going solid. That was the single most important thing a reactor operator could focus on.

In a 1960s-era submarine reactor, that made all the sense in the world, because the 1960s submarine reactor produced about 12 megawatts of thermal energy. Three Mile Island produced 2,841 megawatts of thermal energy to produce its 906 megawatts of electricity. It’s pretty normal for a nuclear power reactor to have to produce way more thermal energy than electricity because of losses and inefficiencies in the system. Like I mentioned earlier, when you scram a reactor, the primary heat production stops immediately, but there’s still the decay heat being produced, and that takes a while to go away. In a submarine reactor, that energy is trivial: it’s around 780 kilowatts. If you shut down a submarine reactor core and dump all the water, nothing will happen. That energy can go into the air; it won’t cause any damage to the fuel. Three Mile Island, immediately after shutdown, was still producing 185 megawatts of thermal energy. That
is more than enough to melt the fuel.

In a submarine, a water hammer with no shock absorption is the worst-case scenario, where it could result in the loss of propulsion and a disabled ship. Carrying that mentality into the operation of a power reactor, where far worse things could happen, was an incredible systemic vulnerability, and it was one that was unrealized before the accident at Three Mile Island. We just didn’t know that that was a thing that would be in the minds of reactor operators until this happened. And so Fred Scheimann, faced with the rising pressurizer level, inferred that the system had plenty of water in it already, and that continuing to allow the emergency core cooling system to operate would overfill the system, risking a solid pressurizer. So he disabled emergency core cooling to keep the system safe, in his mind. That’s why he made that decision.

Let’s look at another decision. Why did Bill Zewe not close the block valve when he first checked the pilot-operated relief valve? If you’ll remember, the reported outlet temperature was 228 degrees Fahrenheit, and the plant operation manual called for the block valve to be closed for any temperature over 200 degrees. It’s an open-and-shut case. He didn’t follow the rules. He didn’t do what the book said to do. Except that the pilot-operated relief valve had been leaking almost from the point that Three Mile Island started operating. Very slightly, but enough steam that the crew

regularly observed outlet temperatures of over 200 degrees Fahrenheit so if they shut the block valve every time they saw temperature over 200 agrees fahrenheit they’d be shutting the reactor down all the time and so they made the decision that they would continue to operate the plant until the next refueling shut down the pilot-operated relief valve is on the nuclear side of the plant so there’s not any way to repair it while the plants operating and they didn’t want to take the plant offline to repair such a small problem so they kept it going now this meant that they regularly saw those high temperatures and so when bill Z we saw temperature of 228 degrees Fahrenheit and knew that the pilot-operated relief off had just been discharging scalding hot water seemed perfectly reasonable to him on top of that he also had a clear indication that the pilot-operated relief fouled was closed he had a light right on the console right in front of him but the thing bill Z we didn’t know is that that indicator didn’t actually indicate the position of the valve all it indicated is what the computer had told the valve to do a red light meant that the computer told the valve to open a green light meant the computer told the valve to close but there was actually no indication anywhere on the control panel of the valves actual position the only way to know the position of the valve was to infer it from the outlet temperature and so bill Z Lee assimilating all the information that he had had it had at his disposal and considering that closing the block valve would eliminate the system’s ability to release pressure decided to keep the block valve open in order to keep the reactor safe I made the best decision he could with the facts at hand let’s do one more quickly why did the crew not respond when the sump alarm went off how did they not know that they had a coolant leak and the answer to that one is really simple it turns out they never got the alarm the control room relays 
alarms to them in two ways. There’s a giant panel of lights, and there’s another one like this on the other side of the room, and each one of them represents an alarm. There’s a few problems with them. First, they’re really noisy. On a good day, when the plant’s operating exactly as it should be, 40 or 50 of these lights are lit up due to sensor miscalibration or alarm parameter miscalibration. Lots of different reasons, but 40 or 50 of these lights are always on. Second, there’s no rhyme or reason to their placement. The alarm light for reactor vessel pressure is right next to the alarm light that indicates a stuck elevator in the reactor building. And finally, they don’t indicate any chronology. You can’t tell from looking at them when they went off, and you can’t tell from looking at them which ones are new since the last time you looked at the panels.

Now, this one they actually had an answer for. You can see there’s a dot matrix printer right here in the foreground. That’s the alarm printer. Every time an alarm goes off, this printer would print a line indicating the alarm, so you get a log of events that have occurred in the reactor. The only problem is this printer runs over a 300 baud serial connection. It’s very, very slow. So less than an hour into the accident, there’s more than a hundred alarm lights illuminated and the printer is running about two and a half hours behind printing alarms. So there’s no way they can prioritize the flood of information that they’re getting. They can’t react to the sump alarm because they never know that it occurred. They never got the message that the sump had filled up.

So how do we implement this idea of first and second stories with our teams? Dr.
Dekker has some helpful advice for us. First, when we’re trying to figure out why something went wrong, we agree on a baseline rule that human error is never the cause. Human error is always a symptom of some underlying systemic problem or problems, so blaming an issue on human error keeps us from figuring out what actually went wrong. A helpful way of framing this is to ask this question: ask what is responsible for an accident, not whose fault it is.

Second, understand why it made sense. This goes back to assuming positive intent. You know that the people on your team don’t come to work intending to do a bad job. Chances are, when they make a decision you don’t understand or miss something obvious, there’s a good reason why they did what they did. Take the time to understand why it made sense to them, because if it made sense to them, it’ll make sense to somebody else later. So if you don’t understand why it made sense now, it’s gonna happen again.

Third, seek forward accountability, not backward. Our instinct when things go wrong is often to find who is responsible and punish them. When we try to move our organization away from blame and towards finding second stories, one of the most common objections is, "But what about holding people accountable?" Who here has implemented blameless post-mortems on their team? Anybody get this question when you started pushing for blameless post-mortems: "How do we hold people accountable? How will we make sure that the people that do something wrong don’t do it again?" It’s a very common question. One of the reasons that blameless post-mortems are so important is that removing punishment actually frees people up to candidly share their stories of what went wrong, so that you can learn from the stories instead of sweeping them under the rug to try to avoid punishment. But there’s another, deeper reason that this is important. It turns out that the act of telling the story of what happened, giving their account, is all the accountability a well-meaning person needs to change their behavior next time. They just need to tell their story and own their part in it. They don’t need any punishment from you to internalize the lesson. They’ve probably already beat themselves up plenty and don’t need you digging the hole deeper for them. Backward accountability looks to blame someone for past events. Forward accountability seeks to help people focus on the work necessary for change and improvement going forward.

The beauty of this technique is that it’s so broadly applicable. There’s always a second story if you’re willing to do the work to find it. It works when someone drops the production database, when the team misses an important deadline, when a key team member chooses to leave, or even when sales misses their quarterly target. It requires honesty and building trust, but it’s worth it, because finding the second story is such a powerful way for your team to grow and improve, and it allows you to treat your teammates with the humanity that they deserve.

It turns out that "Who destroyed Three Mile Island?" isn’t a fair question at all. The better question to ask is "What destroyed Three Mile Island?" And thankfully, that’s exactly the question that the President’s Commission asked when they wrote this report. Notice the subtitle of the report. This report is full of second stories, and those second stories revealed weaknesses in reactor design and operator training throughout the world. By getting past human error to the real causes, the President’s Commission actually made the world a safer place. If you’ve studied any about human factors and human factors engineering, a lot of that work comes out of the study that was done on the incident at Three Mile Island. This is the root of a lot of that line of thinking.

Take the time to find the
second stories for everything, not just your outages. You’ll not only address the things that impact your delivery speed and quality, you’ll make your organization a safer place for the people who work there, and you’ll empower them to do their best work. Best of luck.