Learning from the IT crisis: Improving internal communications at Microsoft and CrowdStrike

The recent global IT outage, caused by a flawed CrowdStrike software update that also impacted Microsoft, illustrated the critical importance of better communication between cybersecurity companies and their partners.

Although the problem was fixed quickly, the event exposed significant communication gaps that, if addressed, could help avoid future crises.

One key area for improvement is the pre-update communication protocol. Due to an unexpected compatibility issue with the Windows operating system, CrowdStrike’s routine update resulted in extensive disruption.

A more rigorous pre-update testing and communication strategy could have identified potential conflicts before the update was deployed.

A collaborative testing environment involving both CrowdStrike and Microsoft could ensure that updates are thoroughly vetted in real-world scenarios, minimising the risk of such disruptions.

Furthermore, enhancing real-time communication channels during crises is essential. Once the issue was identified, mitigating its impact depended on CrowdStrike and Microsoft communicating clearly and promptly. The response process can be streamlined by establishing a dedicated crisis communication team that includes representatives from both companies, so that information flows freely and decisions are made without delay.

Documentation and transparency also play a vital role. Detailed and accessible documentation on update procedures, potential risks, and contingency plans should be shared between the companies. This transparency ensures that all stakeholders are aware of the steps being taken and can prepare accordingly.

From the start of the event, Microsoft maintained ongoing communication with customers, CrowdStrike, and external developers to collect information and expedite solutions. Engaging with CrowdStrike to automate their work on developing a solution and deploying hundreds of Microsoft engineers and experts to work directly with customers were crucial steps. Collaborating with other cloud providers, such as Google Cloud Platform (GCP) and Amazon Web Services (AWS), to share awareness and inform ongoing conversations helped mitigate the impact further.

Lastly, ongoing training and drills for crisis scenarios can improve preparedness. Regular joint exercises between CrowdStrike, Microsoft, and other partners can help identify weaknesses in the communication chain and develop robust response strategies.

While the rapid response to the recent outage was commendable, enhancing internal communication protocols, establishing dedicated crisis response teams, and maintaining transparent documentation can significantly reduce the risk of similar incidents in the future. These steps are crucial for fostering a resilient and collaborative IT ecosystem.

Comms Room Staff
A new knowledge platform and website aimed at assisting the communications industry and its professionals. Contribute your op-ed, press releases, how-to articles, videos and infographics at media@commsroom.co