
Southwest Airlines System Failure: A Lesson For IAM Disaster Recovery


Shiran Kleiderman




Southwest Airlines could have made headlines for weathering Winter Storm Elliott, had it paired IT infrastructure built for business continuity with smart, proactive backup and recovery practices. Instead, it made the news for a full-blown fiasco.


The Storm of Q4 2022 Brought Airlines to Their Knees & Left Southwest Flat on Its Face


Southwest Airlines has always been a customer-centric, experience-focused brand that stayed profitable during tough times, particularly the pandemic, when other airlines were sinking into debt, being acquired, or in crisis.


For decades, its digital experience was designed to deliver the best outcomes for both critical sides of the operation: internal employees and consumers. The Southwest brand stands for easy booking, smooth flight transfers, reliable baggage arrival (or real effort to retrieve lost bags), swift service, quick travel, and a strong overall user experience.


What happens when one damaged database file corrupts multiple airline systems, throwing an entire operation like Southwest Airlines into mayhem, with over 2,500 canceled flights and a brand reputation permanently tarnished?



Both external and introspective on Southwest’s part.


How One Corrupt File and Old IT Infrastructure Led to Southwest’s Downfall


Dozens of media articles have covered the now-notorious system failure during Winter Storm Elliott, speculating about how and why Southwest was destined for an event like this.


The truth is that organizations of Southwest’s scale and seniority have to be on top of their game when it comes to their tech stack and the infrastructure housing it. In the digital enterprise, an agile, secure IT infrastructure is paramount and good cyber hygiene is ideal, but neither would have prevented what one damaged database file in the NOTAM system did to Southwest’s entire operation, let alone to other airlines. Beyond agility, what matters is the ability to back up, recover, and bounce back into action. Without a robust solution and best practices that enforce data restoration, even the most reputable brand can see its reputation collapse.


FAA’s System Failure Exacerbates Southwest’s Struggle with Old IT Infrastructure


NBC recently profiled the event, reporting that the Notice to Air Missions (NOTAM) system outage, which affected multiple airlines (with Southwest’s reputation heading south indeed), was traced to a damaged database file.



Quoting Sen. Maria Cantwell who heads the Commerce Committee overseeing the FAA:


“[…] The No. 1 priority is safety,” Cantwell stated, “As the Committee prepares for FAA reauthorization legislation, we will be looking into what caused this outage and how redundancy plays a role in preventing future outages. The public needs a resilient air transportation system.”


U.S. Travel Association’s President and CEO Geoff Freeman also stated:


“Today’s FAA catastrophic system failure is a clear sign that America’s transportation network desperately needs significant upgrades,” while also adding, “Americans deserve an end-to-end travel experience that is seamless and secure. And our nation’s economy depends on a best-in-class air travel system.


“[…] We call on federal policymakers to modernize our vital air travel infrastructure to ensure our systems are able to meet demand safely and efficiently,” he said.


Well, there you have it, folks. In writing, redundancy is a nuisance. In the world of data and technology, with its complex IT infrastructures, endless files, and vulnerabilities that can bring operations to a hard stop, redundancy for business continuity is a blessing and a critical necessity.


Protecting the mission-critical assets residing within organizational networks and infrastructure with a powerful backup and recovery platform could mean the difference between success and failure when push comes to shove and vulnerabilities arise. Outdated infrastructures are already a risk. Add the lack of backup and recovery solutions to the mix, and you’ve got seriously risky business at play. Business continuity isn’t a goal. It’s the essence of daily operations ebbing and flowing as needed to ensure profitability and reinforce brand reputation and stability.


Technology’s a Friend and Foe: How Southwest’s Oversight Made It Their Enemy


Southwest’s transportation system was built on a point-to-point (P2P) flight network. What got swept under the rug for decades was the critical insight that P2P networks lack hubs, a central element that can simplify the management of transportation networks. Hubs can optimize network management, but they can also lengthen travel times and make processes more cumbersome, which is potentially why Southwest opted out and ran on P2P altogether. Meanwhile, technology that badly needed a serious update was simply neglected, and wrongfully so.


Health checks, updates and monitoring of IT infrastructures, in addition to backup and recovery, are vital to keep business continuity pumping in the thriving enterprise, for Southwest and other organizations alike. Creating regular, automated, prescheduled backups of the infrastructure is both best practice and wise.


In a recent article from Forbes, Forrester’s VP of Emerging Tech Portfolio, Brian Hopkins, shares:


“To keep its system optimized, Southwest depends on a complex suite of analytics and software. An analytics application called SkySolver likely provides crew routing recommendations to the Crew system, which can move crew members from one location to another to staff flights. Multiple sources also mentioned Southwest’s antiquated process for tracking crew locations: using the telephone.”


“[…] Two decades of neglect takes several years to overcome.” The software engineer corroborated this story of neglect: “[The software] went offline due to its outdated software packages and overutilized server resources, aka CPU, memory, and disk space.”


There are endless stories and excuses corporations can use to cover up the truth, but at the end of the day, their IT infrastructure and lack of backup and recovery were truly the cause of the fiasco.


What’s Next? Take Action with a Backup & Disaster Recovery Platform Hand in Hand


Organizations with global customer reach that affect the essence of their customers’ daily whereabouts, belongings, their time with family, business, and life overall have an obligation: take action before the disaster happens. Southwest missed this one.


Disaster recovery, and the backup that makes it easy, is not something to be taken lightly or for granted. In a chaotic, technology-driven workplace, we have to know that vulnerabilities are literally everywhere, and that they come in different forms, shapes, sizes and file types, and at the worst time. All it takes is bad weather coinciding with a damaged database file, and airline giants like Southwest have every stakeholder in their operation frantically treading ice-cold water. Companies as reputable as Southwest Airlines open themselves up to catastrophic fiascos that make headlines for the wrong reasons instead of for excellence.


And the million-dollar question is: why?


Cost-effective, viable, and robust platforms that protect IT infrastructures with backup and recovery are accessible and often easy to implement. It is a technological sin, if not a self-inflicted plague on one’s own business, to simply not acquire a smart, innovative and easy-to-deploy backup and recovery solution on the assumption that your business will never need technological triage or emergency care.


Guidelines to Streamline Business Continuity Procedures: Best Practices for Backup and Disaster Recovery


  • Use Automation & Make Backup Your Friend on A Regular Basis


Robustness is no longer a buzzword; it’s the essence of a well-rounded IT environment secured with the right technologies to run a powerhouse operation. Reinforcing the security of your mission-critical assets, networks, systems, resources and daily functionality with the ability to bounce back and restore data systems quickly is not just a good idea, it’s a necessity. Automating backups on a daily or hourly schedule, while regularly checking health and monitoring security posture and infrastructure alongside user activity and behavior, can help prevent breaches and data file disasters from causing permanent, irreparable damage. With automated backup you can sleep better at night knowing that your IT infrastructure is never left to fate when a damaged data file strikes. Airlines, invest in a tool that automates backup and restores data storage and mission-critical assets, so no flight is too hard to handle in stormy times and cloudy skies. And speaking of the cloud...


  • Consider a Buoyant, Agile and Resilient Cloud IT Infrastructure for your Business


Cloud IT infrastructures have become a popular choice for many enterprises because of the flexibility they provide, though those with highly sensitive data often prefer a hybrid or on-prem model for discretion. Even so, security approaches like Zero Trust and admin-granted permissions have made it possible for enterprises to house IT infrastructure in the cloud. The cloud allows swift, easy collaboration from one end of the world to the other, and it also provides a foundation for data storage that’s easy to both back up and recover when all hell breaks loose and storms like Elliott and vulnerabilities arise. Many organizations have opted for cloud infrastructure from leading providers like Microsoft or AWS. Whatever you decide, ensure that your cloud IT infrastructure is backed up and ready for the best and worst of moments, and you’re ahead of the game.


  • Get A Grip: Adopt & Implement Backup & Recovery Strategy For Business Continuity


So you think you need to hire a consultant and outsource the thinking to ensure business continuity is prioritized? You don’t. When it comes to your IT infrastructure, it’s easier than you think. Just adopt the daily practices listed above and a platform that covers your data backup and recovery needs.


Finding the means to do so is no longer a choice. It’s a critical aspect of enterprise IT responsibility. The digital, modern enterprise must protect its IT ecosystem and network with a solution that’s designed to enforce security of assets with backup, recovery and restoration capabilities, so business as usual isn’t just a phrase – it’s a mantra to live by.
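
To make the automated-backup guideline above concrete, here is a minimal Python sketch, not a production tool and not a description of any airline's or vendor's actual system. It copies a file into a backup folder, records a SHA-256 checksum next to each copy, keeps only the newest few copies, and restores from the most recent copy whose checksum still matches. The function names, the `.sha256` sidecar files, and the `keep` retention parameter are all hypothetical choices made for illustration. Scheduling (hourly, daily) is assumed to be handled by an external scheduler such as cron.

```python
import hashlib
import shutil
import time
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_backups(backup_dir: Path) -> list:
    """Return backup copies (not checksum sidecars), oldest first."""
    return sorted(p for p in backup_dir.iterdir() if p.suffix != ".sha256")


def back_up(source: Path, backup_dir: Path, keep: int = 24) -> Path:
    """Copy source into backup_dir, record its checksum, prune old copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.time_ns()
    copy = backup_dir / f"{stamp:020d}-{source.name}"
    while copy.exists():  # avoid overwriting a copy made in the same instant
        stamp += 1
        copy = backup_dir / f"{stamp:020d}-{source.name}"
    shutil.copy2(source, copy)
    # Store the checksum beside the copy so later damage can be detected.
    Path(str(copy) + ".sha256").write_text(sha256_of(copy))
    # Zero-padded timestamps sort by age, so everything before the
    # last `keep` entries is the oldest material and can be removed.
    for old in list_backups(backup_dir)[:-keep]:
        old.unlink()
        Path(str(old) + ".sha256").unlink(missing_ok=True)
    return copy


def is_intact(copy: Path) -> bool:
    """True if the copy still matches the checksum recorded at backup time."""
    sidecar = Path(str(copy) + ".sha256")
    return sidecar.exists() and sidecar.read_text() == sha256_of(copy)


def restore_latest(backup_dir: Path, target: Path) -> Path:
    """Restore the newest backup whose checksum still matches."""
    for copy in reversed(list_backups(backup_dir)):
        if is_intact(copy):
            shutil.copy2(copy, target)
            return copy
    raise RuntimeError("no intact backup found")
```

The checksum step is the point of the sketch: a backup that has itself been damaged is skipped at restore time, so one bad file does not silently become the recovery source. A real deployment would also need off-site copies, access controls, and regular restore drills, none of which are shown here.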




