When the Internet Stumbles: Lessons from the CrowdStrike Outage 

July 19, 2024, was a typical Friday at work… until it wasn't. From airlines to healthcare facilities and public transit, computer systems experienced an outage that was felt around the globe. In an instant, millions of Windows systems crashed to the blue screen of death, bringing services to a grinding halt for people worldwide. The event, one of the biggest IT outages in recent history, was caused by a bug in software from CrowdStrike, an endpoint security vendor, and it disrupted operations on an unprecedented scale. It is now known as the CrowdStrike Incident.

What Was the CrowdStrike Incident?

At the heart of the colossal disruption was a faulty update to CrowdStrike's Falcon software that was pushed out to the company's customers. In its statement, CrowdStrike said the defect was found in a single content update for Windows hosts and noted that Mac and Linux systems were not impacted. This seemingly routine update set off a disastrous ripple effect of outages, delays and massive tech issues at companies running Windows. It quickly disrupted air traffic, hospital systems, banking programs, and media outlets, not to mention the thousands of other businesses that use Windows.

According to TechTarget Online, “The outage was not a Microsoft Windows flaw directly, but rather a flaw in CrowdStrike Falcon that triggered the issue.” Falcon ties into the Microsoft Windows OS as a kernel process with high privileges, so when the logic flaw caused Falcon to crash, it took Windows down with it and left users facing the dreaded blue screen of death.

Microsoft estimated that approximately 8.5 million Windows devices were directly affected by the CrowdStrike logic error. While that may be less than 1% of Microsoft's global Windows install base, it impacted critical operations in industries where mistakes could mean the difference between life and death, such as air travel and healthcare.


Lessons from the CrowdStrike Outage

In every cyber incident, whether it is a breach, glitch, bug or other technical issue, there are lessons that can be learned to make our digital world safer in the future. The CrowdStrike Incident showed the world how interconnected we really are, as well as how dependent we all are on this particular cybersecurity organization. Takeaways that organizations should glean from this incident include the importance of backups and redundant systems, enhanced monitoring and real-time alerts, phased rollouts, and improved communication in the wake of future incidents.

The Need for Backup & Redundant Systems 

This outage demonstrated to business leaders across the globe the importance of having a solid backup plan and backup process as well as utilizing redundant systems to protect their services and critical data. The concept of redundancy involves having multiple, independent systems in place to ensure that if one fails, others can take over. Businesses should implement backup solutions and alternative security measures to avoid single points of failure, like those that were experienced during this particular incident. 
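
To make the idea of redundancy concrete, here is a minimal Python sketch of failover between independent systems. The names call_with_failover, primary and standby are hypothetical and stand in for whatever services an organization actually runs; this illustrates the pattern only and is not a production design.

```python
from typing import Callable, Sequence


def call_with_failover(providers: Sequence[Callable[[], str]]) -> str:
    """Try each independent provider in order and return the first success."""
    errors = []
    for provider in providers:
        try:
            return provider()
        except Exception as exc:  # a failed provider must not stop the others
            errors.append(exc)
    raise RuntimeError(f"all {len(providers)} providers failed: {errors}")


# Hypothetical stand-ins for a primary system and an independent standby.
def primary() -> str:
    raise ConnectionError("primary system is down")


def standby() -> str:
    return "served by standby"


result = call_with_failover([primary, standby])
```

In this sketch the request is still served even though the primary fails, because the standby does not share the primary's point of failure. If every provider failed, the function would raise an error instead of returning.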

Enhanced Monitoring & Real-Time Alerts 

Being able to respond quickly in a technical crisis can mean the difference between being “down” for a few hours versus a few days. In the world of technology, downtime means lost business and revenue, both of which can spell disaster. Detecting anomalies quickly can help mitigate the impact of an incident. However, to respond quickly, users must be alerted to the activity so they can reroute traffic or switch to backup systems.

One tip for small and medium-sized businesses is to be familiar with Downdetector, an application that offers real-time status and uptime monitoring for hundreds of services, including telecommunication outages (internet, phone and TV service), online banking problems, websites that go down and apps that aren't working. This app monitors over 12,000 services in more than 45 countries, giving businesses monitoring data in real time.

Importance of Phased Rollouts 

To avoid leaving systems inoperable, as happened to the 8.5 million Windows devices in the CrowdStrike Incident, it is advisable to roll out updates in phases. A phased rollout delivers an update to a small group of systems first and expands to larger groups only if no problems appear, so a faulty update is caught before it reaches everyone. With a controlled rollout, companies can reduce risk and still reap the rewards of new updates.
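
The following Python sketch shows one way a phased rollout could work. The phased_rollout function, the ring sizes (1%, then 10%, then everyone) and the 5% failure limit are assumptions chosen for illustration; they do not describe how CrowdStrike or any other vendor actually deploys updates.

```python
from typing import Callable, List, Sequence, Tuple


def phased_rollout(
    hosts: Sequence[str],
    ring_fractions: Sequence[float],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.05,
) -> Tuple[List[str], bool]:
    """Update hosts ring by ring; halt when a ring's failure rate is too high.

    ring_fractions are cumulative shares of the fleet, e.g. (0.01, 0.10, 1.0).
    Returns the hosts that received the update and whether the rollout completed.
    """
    touched: List[str] = []
    start = 0
    for fraction in ring_fractions:
        # Each ring holds at least one host and never runs past the fleet.
        end = min(len(hosts), max(start + 1, int(len(hosts) * fraction)))
        ring = hosts[start:end]
        if not ring:
            continue
        failures = sum(1 for host in ring if not apply_update(host))
        touched.extend(ring)
        if failures / len(ring) > max_failure_rate:
            return touched, False  # stop before the next, larger ring
        start = end
    return touched, True


# Hypothetical fleet of 100 machines and an update that crashes every host.
fleet = [f"host-{i}" for i in range(100)]
bad_touched, bad_completed = phased_rollout(fleet, (0.01, 0.10, 1.0), lambda host: False)
good_touched, good_completed = phased_rollout(fleet, (0.01, 0.10, 1.0), lambda host: True)
```

With these assumed ring sizes, the faulty update stops after the first ring and reaches one machine instead of all 100, while the healthy update proceeds through every ring.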

Communication is Key 

Clear and timely communication is key when technology issues arise, no matter what the cause. Communication between the company causing the issue and its stakeholders should be transparent and rapid to limit problems and losses. While CrowdStrike was able to identify and deploy a fix for the issue in 79 minutes, the impacts were felt at airports, banks, hospitals and businesses for days. TechTarget estimates that many businesses were able to apply the fix within a few days. “However, the process was not straightforward for all, particularly those with extensive IT infrastructure and encrypted drives. The use of the Microsoft Windows BitLocker encryption technology by some organizations made it significantly more time-consuming to recover as BitLocker recovery keys were required.” In the end, some organizations could take months to recover.

After July 19th, many businesses and organizations found themselves revisiting their backup and disaster recovery plans to be sure they would be prepared for the next tech interruption. Incidents like this one are a sobering reminder that while the cloud and software applications make broad-ranging updates possible and efficient, they also carry a level of risk.

How prepared is your organization for the next glitch, bug, hack, or breach? Do you have redundancies, backups, and disaster recovery plans that are up to date? If you cannot confidently answer yes to these questions, it may be time to reach out to your IT department or Managed IT Service Provider to review them. Talk to our team here at Spectra Networks by contacting us online or via phone at 978.219.9752.