Meltdown by Chris Clearfield, András Tilcsik (Book Summary)

What comes to mind when you hear the word meltdown – perhaps an accident in a nuclear reactor? Once you’ve gone through these chapters, you’ll be able to call on many examples of distinctly modern meltdowns, the reasons behind them, and how you can stop the same thing from happening to you or your organization.

We’re living in an era of extraordinary technical capability. In transport, commerce, medicine, power and beyond, the systems that surround us are more advanced – and more complicated – than ever before. That’s why authors Chris Clearfield and András Tilcsik wrote Meltdown: to explain how complexity causes failure, and to make the solutions available to all of us.

In these chapters, we’ll learn about the main factors behind modern system failure and how to address them. We’ll also examine the tools that make systems and organizations more failure-proof: structured decision-making, diversity, dissent, reflection, iteration and attention to warning signs.

Buy this book from Amazon

1 – Modern systems regularly fail for the same reasons in very different settings.

What do BP’s oil spill in the Gulf of Mexico, the Fukushima nuclear disaster and the global financial crisis have in common? True, they’re all crises – but they also share the same underlying causes.

Modern systems are more capable than ever before, but that increased capability has also made them more complex and less forgiving. Take the finance industry: the shift from face-to-face to computerized stock-market trading has cut operational costs, increased trading speed and given traders more control over transactions.

However, going digital has also made the system harder to understand and increased the chance of unexpected, unpredictable interactions. Finance has become a perfect example of what sociology professor Charles Perrow would call a complex, tightly coupled system.

Perrow was an expert on organizations, and in the late 1970s he was brought in to investigate the roots of a nuclear accident in Pennsylvania – the partial meltdown at Three Mile Island. What he found transformed the science of catastrophic failure.

Perrow identified a chain of small failures behind the disaster that interacted in a domino effect. Instead of blaming the plant’s operators or dismissing the accident as a freak occurrence, Perrow recognized that it had been caused by features built into the plant as a system: complexity and tight coupling.

Tight coupling is an engineering term for when a system has little slack or buffer between its parts. For instance, preparing a Thanksgiving dinner is a tightly coupled system: the meal requires many elements that depend on one another, like stuffing that cooks inside the turkey and gravy made from the roasted bird’s juices. And with just one oven in most houses, a single dish can set back all the rest.

Complexity in a system means that it’s nonlinear and hard to see inside. To stick with the Thanksgiving analogy, cooking a turkey is complex because you can’t see inside the bird to tell whether it’s done. Complexity makes it hard to spot problems and their knock-on effects.

The combination of complexity and tight coupling is what Perrow calls the danger zone. This is where meltdown – the failure or breakdown of a system – becomes most likely. That tightly coupled, complex Thanksgiving dinner could well be ruined unless precautions are taken.

Perrow’s complexity/coupling formula reveals the shared DNA behind all kinds of modern meltdowns, which means a failure in one industry can offer lessons to others. We’ll see how in the next chapters.
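Perrow’s framework is essentially a two-axis matrix. As a rough illustration – the numeric ratings and example systems below are invented for this sketch, not taken from the book – here is how a system could be placed on those axes:

```python
def risk_zone(complexity: float, coupling: float, threshold: float = 0.5) -> str:
    """Classify a system on Perrow's two axes (each rated 0.0-1.0).

    The 'danger zone' is the quadrant that is both complex
    (hard to see inside) and tightly coupled (little slack).
    """
    is_complex = complexity >= threshold
    is_tight = coupling >= threshold
    if is_complex and is_tight:
        return "danger zone"                    # e.g. nuclear plants, finance
    if is_complex:
        return "complex but loosely coupled"    # problems are visible late, but there's slack
    if is_tight:
        return "tightly coupled but linear"     # little slack, but easy to see inside
    return "linear and loosely coupled"         # forgiving on both axes

# Hypothetical ratings, for illustration only:
print(risk_zone(0.9, 0.9))  # danger zone
print(risk_zone(0.9, 0.2))  # complex but loosely coupled
```

The point of the exercise is not the numbers but the quadrant: anything that lands in the danger zone deserves extra buffers or simplification before it fails on its own schedule.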

2 – Failure can be prevented by decreasing complexity and increasing the buffer between a system’s parts.

When you drive, you wear your seatbelt even though you don’t know the exact nature of any accident that might occur. You know the danger is there, and that the seatbelt could save your life. In the same way, Perrow’s complexity/coupling formula helps avert failure without predicting the exact shape it will take.

This means you can prepare in advance to limit complexity, for example by increasing a system’s transparency. Failing to do so can result in serious accidents. Take the story of Anton Yelchin, the 27-year-old Star Trek actor who died in 2016 after getting out of his Jeep Grand Cherokee: the vehicle rolled and pinned him against a brick pillar.

The cause of the tragedy was the design of the car’s gearshift. It was stylish, but it didn’t clearly show whether the car was in “park,” “drive,” or “reverse.” In other words, the system was needlessly opaque and complex, which led Yelchin to wrongly assume that the car would stay put. The disaster could have been prevented if the gearshift had been designed transparently, showing clearly which mode the Jeep was in.

Transparency lessens complexity, making it harder to do the wrong thing – and easier to notice when you’ve made a mistake.

Sometimes, though, transparency isn’t possible. Think of an expedition to climb Mount Everest: there are countless unknown risks, from crevasses and falling rocks to avalanches and sudden weather changes. The mountain will always be an opaque system. That’s why mountaineering companies troubleshoot small problems – delayed flights, supply difficulties, digestive ailments – before they can snowball into major crises. This keeps such problems from delaying the final climb, where there’s little margin for error.

When the complexity won’t change, there’s always the buffer.

Gary Miller, a nuclear engineer turned management consultant, tells how he saved a bakery chain from a failed expansion by increasing its buffer. Before the rollout, he saw that the new menu was too complex and depended on a complicated network of suppliers. When the company refused to simplify, Miller convinced them to relax their aggressive launch schedule instead, which gave them enough slack to solve problems when they inevitably appeared.

Perrow’s complexity/coupling formula helps you see whether – and where – a project or business is vulnerable to failure. It identifies vulnerabilities in a system even if it can’t say precisely what will go wrong. As Miller says: “You don’t have to predict it to avert it.”

3 – Using structured decision-making tools can help you avoid disasters big and small.

We often go through life making snap judgments and relying on our instincts. There’s no harm in that – until we find ourselves working in a complex system.

Engineers at the Fukushima Daiichi nuclear plant in Japan relied on their instincts when they wrongly estimated the required height of their tsunami defense wall. On March 11, 2011, an earthquake produced a wave several meters higher than anything they had planned for, flooding the generators responsible for cooling and causing the world’s worst nuclear accident in 25 years.

So how could the engineers have done better?

Defenses like these are huge and very expensive to build, and since they couldn’t build an infinitely tall wall, the engineers had to settle on a height they were confident would work. To do this, they used a confidence interval – an estimate based on judging the highest likely wave height and the lowest. The problem is that humans are not very good at making these kinds of predictions: the ranges we draw are far too narrow.

One solution is a structured decision-making tool called SPIES, short for Subjective Probability Interval Estimates. It pushes us to consider a wider range of outcomes. Instead of estimating only the best and worst possible scenarios, SPIES assigns a probability to many outcomes across the whole range of possibilities.

It isn’t perfect, but research has repeatedly shown that this approach hits the correct answer more often than other forecasting methods. If the Fukushima engineers had used SPIES, they could have guarded against overconfidence and been less likely to dismiss the seemingly implausible scenario that overwhelmed their defenses.
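The chapters describe SPIES in words rather than as an algorithm, but the mechanics can be sketched. In this illustration – the wave-height bins, the probabilities and the tail-trimming rule are my own assumptions, not from the book – you spread your judgment over the entire range of outcomes, then keep the central mass as your interval:

```python
def spies_interval(bins, probs, confidence=0.9):
    """Derive an interval estimate from SPIES-style judgments.

    bins:  list of (low, high) outcome ranges covering ALL possibilities
    probs: subjective probability assigned to each range (sums to 1)

    Rather than guessing two endpoints directly, we trim the extreme
    tails and keep the central `confidence` share of probability.
    """
    tail = (1 - confidence) / 2
    cumulative = 0.0
    low = high = None
    for (lo, hi), p in zip(bins, probs):
        prev = cumulative
        cumulative += p
        if low is None and cumulative > tail:
            low = lo      # first bin whose mass crosses the lower tail
        if high is None and cumulative >= 1 - tail and prev < 1 - tail:
            high = hi     # bin whose mass crosses the upper tail
    return low, high

# Hypothetical wave-height bins (meters) and subjective probabilities:
bins = [(0, 4), (4, 8), (8, 12), (12, 16), (16, 20)]
probs = [0.20, 0.45, 0.25, 0.08, 0.02]
print(spies_interval(bins, probs))  # (0, 16)
```

Because even the “implausible” 12–16 m bin carries some probability, the resulting interval is wider than a gut-feel best-to-worst guess – which is exactly the overconfidence correction SPIES is meant to provide.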

Another structured decision-making tool is the use of predetermined criteria, which lets us concentrate on the factors that actually matter. Take the Ottawa Ankle Rules: this set of predetermined criteria, created in Canada in the early 1990s, reduced doctors’ use of unnecessary foot and ankle X-rays by a third. By focusing only on pain, age, weight-bearing and bone tenderness to decide whether an X-ray was needed, doctors avoided being side-tracked by irrelevant factors such as swelling.
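Predetermined criteria turn a fuzzy judgment into a fixed checklist. The toy sketch below is only loosely modeled on the idea of the Ottawa Ankle Rules – the actual clinical rule is more detailed, and this is not medical guidance – but it shows the key property: distracting factors never enter the decision.

```python
def xray_needed(malleolar_pain: bool, bone_tenderness: bool,
                can_bear_weight: bool) -> bool:
    """Toy decision rule in the style of predetermined criteria.

    Only the agreed-upon criteria are consulted; anything else
    (e.g. how swollen the ankle looks) is deliberately ignored.
    """
    if not malleolar_pain:
        return False
    return bone_tenderness or not can_bear_weight

# Pain plus bone tenderness -> image it, even if the patient can walk:
print(xray_needed(True, True, True))    # True
# Dramatic swelling but no pain near the ankle -> no X-ray:
print(xray_needed(False, False, True))  # False
```

The discipline comes from deciding the criteria in advance, in calm conditions, so that under pressure the decision reduces to checking them off.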

In complex systems like medicine and tsunami prediction, the effects of our decisions are hard to understand or learn from, and our instincts regularly fail us. That’s when tools like SPIES and predetermined criteria can interrupt business as usual, letting us approach our choices systematically.

4 – Complex systems emit warning signs that can be used to save lives, money and reputations.

We often choose to overlook warning signals. If your toilet is blocked, do you treat it as a slight inconvenience – or as a warning sign of a looming flood?

Sometimes, disregarding warning signs leads to terrible outcomes. In 2005 in Washington, DC, three metro trains came within a few feet of colliding deep under the Potomac River. Only luck and fast action by the train operators saved the day.

Engineers suspected the root cause was a problem with the track sensors, but by the time they arrived to fix it, the issue had disappeared. So they created a testing procedure to ensure the same malfunction couldn’t happen again elsewhere. The problem was, their bosses soon forgot about this near miss and stopped running the tests. Four years later, the same malfunction occurred in a different spot, causing a terrible crash and the deaths of nine people.

We routinely ignore the signs in small errors, as long as things turn out OK. That 2005 near disaster was a warning signal that the metro organization chose to disregard. A key feature of complex systems is that we can’t uncover all their problems just by thinking about them. Fortunately, before things fall apart, most systems emit warnings.

Unlike the DC metro, the commercial airline industry is a prime example of how attending to small errors can pay off. By doing so, it has collectively reduced serious accidents from 40 per one million departures to two per ten million over the past 60 years, through a process known as anomalizing.

Here’s how it works:

First, data must be collected on all flights. Next, issues need to be raised and resolved – incident reports shouldn’t gather dust in a suggestion box. The third step is to understand and fix root causes instead of treating mistakes as a series of isolated incidents. For instance, if pilots on a particular route keep flying dangerously low, there could be an underlying chart or signage problem.

The final step is to share what’s been learned. This sends a clear message that mistakes are common, and it lets colleagues anticipate problems they’re likely to encounter one day.
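The third step of anomalizing – looking for systemic causes rather than isolated events – is basically pattern detection over collected reports. A minimal sketch, with entirely hypothetical report data and a made-up recurrence threshold:

```python
from collections import Counter

def recurring_anomalies(reports, threshold=3):
    """Surface incident patterns that suggest a systemic cause.

    reports: list of (location, issue) tuples gathered in step one.
    Returns the patterns that recur at least `threshold` times.
    """
    counts = Counter(reports)
    return {key: n for key, n in counts.items() if n >= threshold}

# Hypothetical incident reports:
reports = [
    ("RWY-09", "low approach"), ("RWY-09", "low approach"),
    ("RWY-09", "low approach"), ("RWY-27", "hard landing"),
]
print(recurring_anomalies(reports))
# Three low approaches on the same runway point at the chart or
# signage, not at three careless pilots.
```

The single hard landing stays below the threshold: it may still be worth resolving individually (step two), but it isn’t evidence of a systemic root cause.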

In systems like air and metro travel, we can learn from specific operational incidents. Business owners, meanwhile, can learn from a dedicated team or a trusted adviser appointed to scan for threats from competitors, technological disruption and regulatory change.

5 – Encouraging dissent makes groups more effective and systems more robust.

Speaking up isn’t easy. In fact, neuroscience shows that a desire to conform is hard-wired into our brains. But that doesn’t make dissent any less valuable. Here’s why:

In a strange-but-true study of airline crew errors, the US National Transportation Safety Board (NTSB) found that between 1978 and 1990, almost three-quarters of major accidents occurred when it was the captain’s turn to fly – not the less experienced first officer’s. That was surprising: since captains were flying 50 percent of the time, their share of errors should have been the same as or smaller than their deputies’.

So the NTSB dug deeper.

They found that captains weren’t worse at their jobs – far from it – but that their seniority meant their mistakes were going unchallenged. First officers lacked the tools to give captains feedback, and were keeping their worries to themselves or dropping vague hints instead of raising alarms. The hierarchy was putting lives in danger.

So the airlines created a groundbreaking training program called Crew Resource Management (CRM). Pilots thought it so basic that they jokingly called it charm school, but it broke the taboo around raising concerns. CRM dramatically reduced the number of accidents and evened responsibility to 50:50 between pilots and their deputies. By democratizing safety, the program encouraged everyone from cabin crew to baggage handlers to voice their worries, harnessing the motivational power of shared responsibility.

So how do you boost dissent in other kinds of organizations?

One effective approach is open, as opposed to directive, leadership. Directive leaders state their own preferred solution at the start of a conversation and tell their colleagues that the aim is to reach agreement.

An open leader withholds their own opinion until last and encourages colleagues to discuss as many viewpoints as possible. Remarkably, this simple method has been shown to produce more possible solutions and surface almost twice as many facts, enabling a better-informed discussion. Simple!

If hierarchies, social pressures and even our brains’ wiring work against dissent, then leaders must offer more than just an open-door policy. As dissent expert Jim Detert explains, you have to actively encourage people to speak up – otherwise, you’re discouraging them.

6 – Building diverse teams helps organizations reduce risk and improve outcomes.

Most people agree that diversity is a just and positive thing, right? But did you know it’s also been shown to reduce risk for organizations?

In a landmark 2014 study, scientists demonstrated the advantages of ethnic diversity in decision-making. In a simple stock-market simulation, they studied dozens of diverse and homogeneous groups in locations as far apart as Singapore and Texas, and evaluated the accuracy of their trading. Guess what? The diverse groups did far better than the homogeneous ones, pricing stocks more accurately and making fewer mistakes.

Interestingly, the study revealed that crashes were more severe and price bubbles more frequent in homogeneous markets. That’s because homogeneous groups put too much faith in each other’s decisions, so errors compounded. In the diverse markets, participants were more critical of each other’s decisions and copied one another less often, which led to more rational decision-making.

Looking back, we can use the lessons of this study to understand the financial crash of 2007 and 2008. In a 2014 interview, former Citigroup CFO Sallie Krawcheck said that those responsible for the crash weren’t “a group of evil geniuses” able to foresee the downturn; they were “peas in a pod.” Krawcheck blamed the lack of diversity for the poor decisions leading up to the crash, and argued that diversity makes it more permissible to ask questions without looking stupid or worrying that you’ll lose your job.

So why don’t all companies have mandatory diversity schemes? Surely they must work, right?

Wrong. In a 2016 Harvard Business Review paper, sociologists Frank Dobbin and Alexandra Kalev found that the most commonly used diversity programs failed to get results. Worse: across more than 800 US firms over the past three decades, mandatory diversity schemes actually made organizations less diverse. Managers rebelled against mandatory schemes because they felt they were being monitored, and resisted hiring diversely simply to assert their autonomy.

Fortunately, other solutions do work. Over the same three decades, voluntary – as opposed to mandatory – mentoring schemes proved successful in helping diverse candidates progress. These schemes naturally reduce bias through positive framing: managers felt they were being given access to new talent pools rather than having their hiring decisions policed.

The study also found that formal mentoring schemes were more effective than informal ones, because white male executives didn’t feel comfortable approaching young women and minority men informally. Assigning them mentees removed this awkwardness and enabled them to mentor a diverse range of junior employees.

With benefits like increased accuracy and a better chance of avoiding the next financial crash, diversity is the safe alternative to homogeneity and groupthink. The healthy skepticism that accompanies diversity makes organizations and their decisions stronger.

7 – Reflection and iteration are vital coping strategies for high-pressure situations.

We’ve all been there: with the end of a project in sight, there’s a strong urge to rush to the finish, even if conditions have changed.

Pilots call this get-there-itis. The technical term is plan continuation bias, and it’s a frighteningly common factor in airline accidents. If you’re just 15 minutes from your destination and the weather changes, it’s much harder to divert to a nearby airport than it would have been at the start of the flight. Perhaps you’ve felt the same effect while working toward a deadline?

Pilot Brian Schiff knew the risks of get-there-itis when he refused to take a furious Steve Jobs on his charter flight. Despite pressure and entreaties, Schiff recalled his training and calculated that hot weather, heavy luggage and hilly terrain would make takeoff in their small plane dangerous. Schiff still remembers how intimidated he felt as a scrawny 20-year-old in the firing line of Jobs’ rage. But he didn’t budge, and refused to fly. Schiff stood up to a very important customer – yet instead of being punished, he was rewarded for pausing and prioritizing safety in a high-pressure situation: he was paid double for the flight.

Sometimes, though, there’s no time for reflection when circumstances change. In an emergency room, for example, medics need to balance caregiving tasks such as resuscitation and administering medication with monitoring the patient’s overall condition. This is where an iterative process becomes essential.

An effective iteration has three basic stages: carrying out tasks, monitoring, and then diagnosis – offering a solution.

The three stages are then repeated in a cycle to assess and improve solutions on the go. 
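The task–monitor–diagnose cycle can be sketched as a simple loop. This is my own schematic rendering of the three stages, not code from the book, with a deliberately trivial example standing in for real tasks:

```python
def iterate(state, do_tasks, monitor, diagnose, max_cycles=10):
    """Repeat the three stages until monitoring says we're done.

    state:    current situation (anything: a patient chart, a task list)
    do_tasks: stage 1 - carry out the current tasks
    monitor:  stage 2 - step back and assess the overall situation
    diagnose: stage 3 - adjust the plan before the next pass
    """
    for _ in range(max_cycles):
        state = do_tasks(state)
        if monitor(state):
            return state
        state = diagnose(state)
    return state

# Toy example: nudge a value toward a target, one cycle at a time.
result = iterate(
    state=0,
    do_tasks=lambda s: s + 1,   # do a unit of work
    monitor=lambda s: s >= 5,   # are we there yet?
    diagnose=lambda s: s,       # in this toy case, keep the same plan
)
print(result)  # 5
```

The value of the structure is the forced monitoring step: you never execute tasks indefinitely without stepping back to reassess, which is exactly the habit that counters plan continuation bias.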

A great domestic example comes from the authors of a paper called “Agile Practices for Families: Iterating with Children and Parents.” To tame the chaos of their family’s morning routine, the Starr family decided to hold regular family meetings to discuss what went well that week, what they could improve, and what they would commit to improving in the following week.

After committing to changes, they would revisit the same questions at the next meeting, letting them converge on the most effective solutions over time. The experiment was so successful that when New York Times columnist Bruce Feiler visited their house, he described theirs as “one of the most amazing family dynamics I have ever seen.”

You can use iteration to check in any time you have a backlog of tasks and deadlines. The key is to go through the steps and then re-evaluate once you’ve attempted a solution.

Meltdown: Why Our Systems Fail and What We Can Do About It by Chris Clearfield and András Tilcsik – Book Review

We live in the golden age of meltdowns, but it’s within our power to bring that age to an end. The solutions highlighted in these chapters can be hard to implement, because they often go against our natural instincts and accepted organizational and cultural norms. But if we give due thought to complexity and tight coupling, we can unlock greater innovation and productivity in modern systems while avoiding tragic failure.

Give yourself a pre-mortem.

You’ve probably heard of a post-mortem – but did you know that its opposite, a pre-mortem, can help prevent failure?

When planning a project, try imagining that it has already failed. Research shows you’ll then come up with far more potential problems than if you’d imagined what success would look like. This method harnesses what psychologists call prospective hindsight, and it’s also useful for finding more concrete and precise reasons for an outcome.

So next time you’re at the planning stage, instead of asking, “How can we make this work?” try, “What could have caused this to fail spectacularly?”



Savaş Ateş

I'm a software engineer. I like reading books and writing summaries. I like to play soccer too :)
