February 08, 2019

Commercial Aviation Part 7

A couple of installments ago, when I discussed safety, I talked about the standard procedure for addressing airplane-level structural problems found before they cause a crash. But the majority of aviation incidents today are more complicated, mostly because of how good we are at finding purely mechanical problems before they bring down a plane. I’m going to neglect terrorism and other outside factors throughout, as that’s a rather different discussion.

A rare crash with a happy ending

When people think of plane crashes, they tend to think of the plane slamming into the ground, and everyone onboard being killed immediately. While this does happen, it’s actually fairly rare. All of 2017 passed without an incident of this type, although there were three in 2018, most notably the Lion Air crash, and another in 2019. In the US, the last mainline crash where everyone onboard was killed was that of American Flight 587, in November of 2001. That crash was due to structural failure after the co-pilot overused the rudder to counter wake turbulence from the plane taking off ahead of them.

Somewhat more common are various incidents where the plane impacts the ground slowly enough that some or all of the people onboard survive. The most prominent recent example of this is Asiana Flight 214, the 777 that crashed when the pilots brought it in short at San Francisco airport in 2013. This kind of incident ranges from extreme cases where only one passenger survives (surprisingly common) to cases where everyone is evacuated safely, such as the US Airways flight that ditched in the Hudson. And then there are the cases where something goes badly wrong, but the plane itself is only damaged (although it may be subsequently written off), and everyone aboard survives.

To illustrate these better, I’m going to look at some prominent incidents, and what they tell us about what causes planes to crash. This will be weighted towards recent events, although I may throw in a few older ones.¹

Air France Flight 447: An A330 flying from Brazil to Paris crashed in the Atlantic in June of 2009, killing all 228 onboard. The investigators were unable to find the ‘black boxes’ for two years, which delayed the final report until 2012. The case was particularly mysterious because there were no obvious problems with the airplane before it hit the water. The sequence as eventually pieced together runs as follows:

1. The pitot tubes (the devices that provide airspeed data) iced up as the airplane was flying at the upper edge of a storm over the Atlantic.

2. The flight control computer detected the loss of airspeed data, disabled the autopilot, and switched to an alternate control law. (The Airbus fly-by-wire system has different control laws for different situations, like a lack of air data.)

3. The pilots overcorrected for a roll induced by the turbulence of the storm, and put the airplane into a steep climb for reasons that are not really understood.

4. The pilots ignored several stall warnings, and continued to try to climb. The Airbus flight computer normally prevents the pilot from flying into a stall, but it was inoperative due to the use of the alternate law, so the airplane stalled. Also, the stall warnings became inoperative due to the extremely high angle of attack.²

5. The pilots were unable to return the airplane to normal flight before it hit the ocean.

Ultimately, the cause of the crash was that the pilots did not fly the airplane properly. Reducing angle of attack during a stall is the standard and only way to resolve the problem. Why they continued to pull up is unknown, as this is piloting 101. A possible contributor is the design of the fly-by-wire system. In normal mode, it provides stall protection, not allowing the pilot to fly into a stall, and the pilots may not have realized that this protection was no longer in place. Obviously, if the pitot tubes had not iced up, there would have been no crash.

The so-called ‘swiss cheese model’ was developed to describe air crashes, and has since been extended to other domains. In this model, there are a number of layers (pilots, airplane, maintenance, etc.), each with holes in them. When the holes line up, an accident occurs. Good systems have few or small holes, while bad systems have larger holes. In this case, we can identify at least three layers: the pilots themselves, the pitot tube icing, and poor procedure/training in alternate flight modes. Had any one of these three not occurred, the airplane would have been saved.
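The arithmetic behind the swiss cheese picture can be sketched in a few lines of Python. This is a toy illustration only: the layer names come from the discussion above, but the probabilities are made-up numbers, the function name `accident_probability` is hypothetical, and treating the layers as independent is a simplifying assumption that real accident chains often violate.

```python
from math import prod


def accident_probability(hole_probabilities):
    """Chance that the holes in every layer line up at once.

    Assumes the layers fail independently (a simplification), so the
    combined probability is just the product of the per-layer values.
    """
    return prod(hole_probabilities)


# Illustrative, made-up per-flight probabilities that each layer has a hole.
layers = {
    "pitot tube icing": 0.001,
    "pilots mishandle the upset": 0.01,
    "training gap in alternate law": 0.05,
}

baseline = accident_probability(layers.values())

# Closing any single hole (probability 0) blocks the whole chain.
no_icing = accident_probability([0.0, 0.01, 0.05])

print(f"all three holes line up: {baseline:.1e}")
print(f"with pitot icing eliminated: {no_icing:.1e}")
```

Under these assumptions, shrinking any one hole by a factor of ten cuts the overall risk by the same factor, which is one way to see why improvements to any single layer pay off.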

Asiana Airlines Flight 214: I mentioned this above, but it’s another case where poor piloting and external factors interacted to cause a crash. Asiana is a South Korean airline,³ and Flight 214 was a 777 from Seoul to San Francisco. On the day of the accident, the instrument landing system (ILS) on the runway was out of service for maintenance. This forced the pilots to use a visual approach. The pilot flying the plane was new to the 777, but experienced on other airplanes. The command pilot had a lot of time on the 777, but was new to flight instruction. They came in too low and slow, partially due to a belief that the autothrottle was running when it was in fact not.⁴ The landing gear and tail struck the seawall, while the rest of the plane slid almost half a mile down the runway before coming to a halt. Two of the escape slides deployed inside the fuselage, and had to be punctured to clear evacuation routes. Of the 307 onboard, only three died. Two of them were not wearing their seatbelts on landing and were thrown clear of the airplane, dying on impact. (This is why you should always wear your seatbelt during takeoff and landing.) The third was run over by a rescue vehicle, possibly after also being thrown out of the airplane.

Our slices of cheese are again more complicated than simple pilot error. The crew failed basic piloting here too, but the missing ILS, the misunderstood autothrottle, and the crew's inexperience were probably all contributing factors. Many Asian airlines are known for placing more reliance on automation than US and European airlines do, which probably played a part.⁵

Both of the crashes discussed here are the result of interactions of human and mechanical failure. This is intentional. For all the talk of unmanned airliners, humans are the best backup to problems with the automated systems, while automation can help to bridge the gaps when humans falter. But thanks to incredible efforts by manufacturers, operators, and regulators, crashes are so rare that they can be individually analyzed and dissected, instead of being a simple fact of life.

1 Shortly after I wrote this, I decided to end this series, so these never got written. Sorry.

2 John Schilling: Since aerodynamic stall is caused by high angle of attack, this is a major oversight.

3 Bean: North Korea has one airline, Air Koryo, which is famous among airline geeks for being rated as the worst airline in the world.

4 John Schilling: There is some disagreement whether an autothrottle should be used at all on a visual approach.

5 John Schilling: The common factor in these two is that when the automation stopped working, the human pilots didn’t understand what was expected of them.

Comments

February 08, 2019 NealSchier said...
Excellent work on keeping an even-handedness in this look at safety.

I remember in the late 1980s, while I was on AF active duty, attending a required "stand down" safety brief after a C-141 had crashed in Florida. The Ops Group Commander (a full Colonel) was up on the stage at the base theatre (where the event was held) telling us how these gents had screwed up. Well...obviously something had gone wrong as the crew had died and the airframe was destroyed.

I found this, even though I was a new aviator, to be extraordinarily unenlightened. The question needed to be, if it was human error, why did the pilots make it?

Fortunately the industry no longer takes that approach and instead asks why? This mindset has led to advances in operating the aircraft, the design of the aircraft, and investigation when things go wrong. Almost all parties have realized that it is best, to start, to fix the problem instead of the blame.

This is why I commend your thinking here. Mistakes were obviously made, but we need to know what to do so that not only that specific problem, but anything similar, does not occur again.

Should one wish to get into more detail on the Air France 447 accident, I recommend Captain Bill Palmer's book Understanding AF 447. It has become one of the standard works on this accident. He talked to Airbus test pilots, and while they would not pin down an exact altitude, Bill gained the impression that the stall was so deep that if the crew had not been well into the recovery by around FL200 to FL250 (roughly 20 to 25 thousand feet) that they would not have been able to recover anyway. Wow...

Sadly we will probably never know, as you stated, why the flying pilot (in the right seat at the time) pulled back so much. He really was aggressive in the pull.
February 09, 2019 Lambert said...
Regarding Asiana Airlines Flight 214, do you mean to say that the crash threw several people outside the plane?
Out of the hole in the back, I presume.

And the swiss cheese model is an interesting way of looking at things. I remember learning in school about the Titanic, and how around half a dozen things had to go wrong for the disaster to be so bad. Any one of them could have saved most of the people onboard, if not prevented the collision in the first place.
February 09, 2019 bean said...
@Neal

Thank you. Another good book, for anyone interested in human factors in aircraft accidents, is Breaking the Mishap Chain. It's a NASA book on human factors in experimental aircraft accidents, and it's free online.

@Lambert

I believe they were ejected through the hole in the tail, as all three were in the last two rows of the airplane.
February 09, 2019 Inky said...
Thanks again for this excellent series.
Re: the Lion Air crash. It seems that every time the holes in the cheese line up it is "automatic system failure > undocumented behavior of the aircraft > failure of pilots to deal with it > crash". And, for the most part, the problem is dealt with by addressing the system failure, making it even more failure-proof, testing everything that can be tested. And this is a totally valid approach, I mean it got us where we are now, which is pretty amazing.
But it creates an environment heavily biased towards putting the automata even more in power. The cornerstone behind reliable engineering is supposed to be graceful degradation, right? So that even if the automated system fails, it fails in a way that humans are able to pick up the slack. Crash happens when humans fail to do so. And this is understandable too! As machinery becomes better and better at not causing crashes and all the obvious points of failure are ironed out, the places where things go wrong become more obscure. Sometimes the safety system malfunction becomes the cause of failure, up to disrupting the pilot's attempts to handle the situation (as far as I understood this was what happened to the Lion Air flight).
At the same time, as systems become more advanced and reliable, and airline competition becomes more severe, the pilots face increased pressure to work longer hours, take less breaks, which leaves them more stressed and less ready for the blowup when it will happen. And it is, in a way, perfectly logical too. After all, however many hours a person might spend in simulation, testing different failure scenarios, 99% of the time a pilot will control the aircraft well within the safety margins, as it should be. In the ideal world, airline pilots would be former military pilots, having a wealth of experience in dealing with the unplanned and thinking on their feet, but in practice, as more and more pilots are needed, airlines will turn to anyone who can handle the craft and accept the terms.
Blowups happen. And will happen, I guess. Ever rarer, though.
February 10, 2019 bean said...
My take on Lion Air is that it was a basic failure of airmanship. The problem should have been controllable, and was controlled by the previous flight crew. For a while, it was also controlled by the flight crew of the accident airplane. Then they stopped controlling it. No idea why, nor why they didn't follow the checklist they had that should have fixed the problem.

The biggest problem is that different parts are under the control of different actors. The manufacturers have a very strong incentive to have no crashes, but they can't control what kind of yahoos are going to be given command of their airplanes. So they automate. But the system works best when backed up by good pilots, which usually come from either the military or an active general aviation community. John Schilling has described it better than I can, and it's worth noting the safety record of airlines from places with those features is really, really good.

At the same time, as systems become more advanced and reliable, and airline competition becomes more severe, the pilots face increased pressure to work longer hours, take less breaks, which leaves them more stressed and less ready for the blowup when it will happen.

There are strict rules on the lengths of shifts and the amount of rest pilots must take. Neal can speak to this better than I can.
February 10, 2019 Neal said...
The famed attorney F. Lee Bailey, when asked what the secret of his success was, answered "Preparation, preparation, and more preparation." When it comes to transportation safety (and a good number of other fields) there can be no truer words.

We call it "training," but I have never met a pilot who can honestly claim to have had too much of it. Not only initial training, but timely and pertinent recurring training is a must for airline crews. Cliché that it might sound like, better to be over-trained than under. Training, and plenty of it, is the absolute foundation for air safety. Obviously a great deal else is important, but Boeing or Airbus' best design efforts are for naught if there is not good training underlying it all.

One of the things that the AF 447 accident brought out was that most airline pilots, unless they are flying/instructing on the side, have not performed stall work in years or even decades--except possibly in the simulator during a transition to a new aircraft type. Yet AF 447 crashed due to the pilots stalling a perfectly airworthy aircraft.

The problem is: What should the airlines train their pilots for? One cannot cover every eventuality and possible emergency and it would be foolish to try to run down every rabbit hole in pursuit of covering all the bases.

What one can do however, is work on the biggest and most serious threats. These include engine failures on takeoff, rapid decompression, fires, and electrical failure. Woven into these scenarios is practice on crew communication and checklist discipline. The manufacturers and airline training departments have worked very hard on constantly refining the topics covered in ground school and in the simulator. It is, and will remain, an ongoing subject and there is a lot more to it than I described here. Bottom line: train for types of contingencies and build the airmanship, knowledge, and communication skills that will be there if the unexpected happens.

The good thing is that in the last twenty five years a great deal of emphasis has been put on communication--both within and without the cockpit. The days of the autocratic know-it-all captain are in the past and good leaders foster an air of open and frank exchange of information. Sadly, we saw a complete breakdown in this area in AF 447 as well as the Asiana accident at SFO. Not from an autocratic captain fortunately, but rather unsatisfactory communication.

There has been some good reporting out regarding the Lion Air accident. I am extremely reluctant to place blame on this, but the 737 has a trim wheel on each side of the center console that should have been the key that something was amiss--for it spins vigorously when the trim is moving. You would, without a doubt, hear/feel/see this wheel moving if the trim were running away. You can then, as is the procedure in every Boeing that I have flown, use the trim cutout switches to, yes, cut out the trim. I, and just about everyone else who has read about this, wonder why the pilots did not do this when they were having trouble. Yes, the new trim system should have been described to the pilots in at least a training bulletin, but again, those who have flown a Boeing would be inclined (especially if directed by a checklist) to use the cutout switches. I am anxious, now that the voice recorder has been recovered, to read more to see why they did not do this.

As far as duty time and fatigue, the FAA and airlines finally pulled themselves out of the 1940's and into at least the 1990's by addressing rest and duty time. There had always been 30 and 365 day limits on flight hours in FAR Part 121 operations (FAR = Federal Aviation Regulations). The FAA came up with something called FAR 117 that dictates the rest times that must be afforded to crews. I post the following link as an illustration only--the actual regs are the authority. It was designed by someone at USAirways (now folded into American Airlines of course) as a help in understanding the rules. As you can see it is VERY complicated: https://far117understanding.files.wordpress.com/2014/01/qr-sheet-final.pdf

The sad thing is that it has been a dirty little secret in many industries how they kept their crews on duty for indefinite times. Truck drivers, pilots, doctors on call, etc. have been a staple of the "way things are done" for far too long. I was sickened when I read about the U.S. Navy accidents in the Western Pacific with the McCain. I had no idea that the duty periods were so convoluted on board a ship--it seems to be so against common sense as to be unbelievable.

This is not just a beef either as there is a ton of science behind how shift workers should be scheduled. All blithely ignored in some walks of life to this day apparently--except with the airlines where 117 has brought some sanity to the work/rest cycles. Not perfect, but a start.
February 11, 2019 bean said...

You would, without a doubt, hear/feel/see this wheel moving if the trim were running away. You can then, as is the procedure in every Boeing that I have flown, use the trim cutout switches to, yes, cut out the trim. I, and just about everyone else who has read about this, wonder why the pilots did not do this when they were having trouble. Yes, the new trim system should have been described to the pilots in at least a training bulletin, but again, those who have flown a Boeing would be inclined (especially if directed by a checklist) to use the cutout switches. I am anxious, now that the voice recorder has been recovered, to read more to see why they did not do this.

This is particularly confusing because the preliminary accident report shows them resetting the trim manually for 10 minutes before the crash, then stopping a little while before they hit the water. Either the trim cutout switch wasn't working (I suppose this is technically possible, given the problems on the previous flight) or they didn't think to use it. Dispatching the plane with the switch INOP, particularly after issues with the system on the previous flight, would have been pretty stupid even if they aren't MEL, but budget airlines outside the jurisdiction of the FAA (or its equivalent in other western countries) have done stupider things. Not following the checklist is totally inexplicable, but we won't know until they start talking about the contents of the CVR.
February 11, 2019 bean said...
Thinking this over more, I'm becoming intrigued by the decision to not release anything from the CVR until the investigation is done. The cynic/conspiracy theorist in me would point out that the decision was made by the Indonesian authorities, who might well have a motive to keep their country's aviation industry from getting dragged through the mud if the pilots did something like not following the checklist. They might be hoping that the public won't be paying attention when they release the report in a few months. But I also am aware that I'm massively biased in favor of Boeing on this one, so it might be wishful thinking on my part.
February 11, 2019 ADifferentAnonymous said...
Googling a bit about this topic led me to this blog post (https://mmsba.wordpress.com/2017/05/15/airbus-or-boeing-part-2-5/) in which a pilot describes Airbus's design philosophy as 'pull means up'. Seems relevant to Air France 447?
February 11, 2019 bean said...
That seems a pretty good summary of Airbus's views on flight controls. I think it's not a particularly good one. In Part 2 of that series, he says that the problem with Airbus is that the planes tend to occasionally revert to being normal airplanes, just when you need the protections the most. That's exactly what happened with AF447. There were a couple of crashes in the early days of the A320 which were pretty much caused by pilots saying "the Magic Fly-By-Wire will protect me" and doing stupid stuff they wouldn't normally have done. Airbus, to its credit, seems to have done a pretty decent job of explaining to pilots since then that, no, FBW isn't a substitute for good airmanship, and don't count on it to bail you out of stupid choices. This was one case where the pilots didn't listen.
February 13, 2019 doctorpat said...
I'm reminded of Toyota's problems with stuck throttles.

The issue of the throttle itself (or of people getting the pedals confused) is of some interest, but what really made me wince was that people could not turn the cars off.

These were push-button start cars (which I never understood the attraction of, but whatever) where you push the button to start, and push the button to stop.

EXCEPT.... in an emergency. In an emergency the button controls change so that the normal method of turning the engine off (push the start/stop button) stops working. Now you have to push the button and hold it in for some period of seconds. This is no doubt explained on line 5, page 17 of the user manual, which every driver will have memorized.

The idea was that the push-to-stop button would only be used when the car is parked and stationary. When the car is driving then nobody would turn it off, so they probably bumped the button by accident and having the car turn off at high speed would be bad. So once the car is moving the engine controls change (in the background, without telling you) to need a multi-second push. I think it's only 2 full seconds or so, but to a panicking driver that would feel like 2 minutes.

To someone who works every day with industrial machines that have nice big red emergency stop buttons on them to shut down instantly if something goes wrong, the idea of both a 2 second hold time and an off control that changes function when the system is going fast just makes me shudder.
February 14, 2019 bean said...
@doctorpat

Aren't those emergency stop buttons usually caged and/or in a place someone doesn't normally have hands? Shutting the engine off accidentally while the vehicle is moving is definitely a bad thing, and for every case where someone gets the pedals confused and needs to shut the car down because they can't take their foot off the pedal and try again, you're going to have a bunch where someone accidentally hits the button. From both safety and liability perspectives, the on/off button needs to either be caged or have some other mechanism to reduce the chance of someone hitting it accidentally.
February 14, 2019 CatCube said...
I think one solution would be to have a normal shut-off take the hold-down; people will whine about how illogical it is for a bit, but they'll learn. Then, you don't have to have the button change functionality behind the scenes.
February 15, 2019 bean said...
That would work, too. Actually, I'm kind of surprised they didn't do that.

Also, in unrelated aviation news, the A380 is dead. This is a bad day for Airbus, but probably a good day to be at Boeing.
February 15, 2019 Doctorpat said...
E-stop buttons may well be different for different types of machine, but all the ones I'm familiar with (over a range of different machine types, different manufacturers, made in different countries over different decades... quite a range actually) the buttons are very easy to get to. The idea is that you can wildly slap at them while your hand is caught in the mechanism or something and be sure that it will definitely shut down right now. And there are often a number of different buttons distributed around the machine, so that no matter where you are standing you can reach one easily. However, there are a couple of elements to make it less likely to be triggered accidentally (though I certainly have used machines that did have an E-stop (emergency stop) that kept being set off because it was in a spot that wasn't ideal.) One point was that it isn't a gentle prod with the fingers. It takes a big hard push.
Note: in many designs you don't use the E-stop to turn the machine off normally. There is a normal shutdown procedure that is slower and gentler on the system. Emergency stop is EMERGENCY stop. Though with other machines it's just the only stop button. Depends on the design. There is also often a bit of a protective shield so you can't just bump it by leaning against the machine. But not too much because you need to be able to hit it without looking. CatCube's idea is a good one. Indeed I have many small systems where shutdown is a 2 second push on the on/off button. You do grow used to that very quickly.

The rotary motion of turning the key is one that people don't "accidentally" do, so that was an advantage of the old system.

I also wonder if there is an issue with Left-hand-drive cars, where the start/stop control is in the middle of the dashboard, compared to correct-hand-drive cars where the start/stop is hidden between the steering wheel and the driver's door.
