Step-by-Step: How to Identify Systems Hit by the CrowdStrike Outage in Your On-Premises or Cloud Environment

On July 19, 2024, a global technology outage triggered by a defective software update from CrowdStrike caused widespread disruption across many sectors. The incident affected systems running Microsoft Windows: it grounded flights, disrupted hospitals, and halted business operations, exposing how heavily modern organizations depend on their cybersecurity infrastructure.

The Catalyst: A Defective Update

The outage was traced to a single content update (Channel File 291) that CrowdStrike, a leading cybersecurity firm, issued for its Falcon sensor on Windows. The update contained a defect that caused affected Windows systems to crash, setting off a ripple effect of IT outages worldwide. The impact reached airlines and banks as well as hospitals and individual computer users.

Immediate Impacts and Chaos

The immediate repercussions were severe and widespread:

  • Airlines: Thousands of flights were canceled, causing massive delays and disruptions at airports worldwide.
  • Hospitals: Medical facilities faced critical operational challenges as their IT systems went offline, complicating patient care and medical procedures.
  • Financial Institutions: Banks and other financial entities experienced significant disruptions, impacting transactions and day-to-day operations.
  • Media Outlets: Several media organizations found themselves unable to access their networks, hindering news dissemination and media operations.

The chaos was further compounded by the interconnected nature of global IT infrastructures, highlighting vulnerabilities that can lead to such widespread crises.

Steps to Determine the Impact

CrowdStrike and Microsoft have both been working to address the issue. The steps below show how to identify and assess affected systems using two queries in Advanced Event Search.

Step 1: Determine Impacted Channel File

Start by identifying the impacted channel file version, which determines the scope of affected systems. Run the following query in Advanced Event Search with the search window set to seven days.

#event_simpleName=ConfigStateUpdate event_platform=Win
| regex("\|1,123,(?<CFVersion>.*?)\|", field=ConfigStateData, strict=false)
| parseInt(CFVersion, radix=16)
| groupBy([cid], function=([max(CFVersion, as=GoodChannel)]))
| ImpactedChannel:=GoodChannel-1
| join(query={#data_source_name=cid_name | groupBy([cid], function=selectLast(name), limit=max)}, field=[cid], include=name, mode=left)

Explanation:

  • Filter Events: The query selects ConfigStateUpdate events from the Windows platform.
  • Extract Version: A regular expression pulls the Channel File 291 version out of the ConfigStateData field (the 123 in the pattern is 291 written in hexadecimal).
  • Parse Version: parseInt with radix=16 converts the extracted hexadecimal version to an integer.
  • Group by Customer ID (cid): Versions are grouped by customer ID, and the highest one is kept as GoodChannel.
  • Determine Impacted Channel: ImpactedChannel is GoodChannel minus one, which relies on the fixed version being exactly one higher than the defective one.
  • Add Customer Name: The join looks up each cid in cid_name so the results show the customer name.
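
The Step 1 logic can be reproduced offline as a sanity check. The Python sketch below applies the same pattern to ConfigStateData strings, parses the captured version as hexadecimal, keeps the highest version per cid, and subtracts one. The sample ConfigStateData strings are hypothetical and only illustrate the shape the regex expects; real event data may look different.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Same pattern as the query's regex(); Python writes named groups as (?P<name>...).
# "123" is hexadecimal: int("123", 16) == 291, i.e. Channel File 291.
CF_PATTERN = re.compile(r"\|1,123,(?P<CFVersion>.*?)\|")


def channel_version(config_state_data: str) -> Optional[int]:
    """Return the Channel File 291 version as an int, or None if it is absent."""
    match = CF_PATTERN.search(config_state_data)
    if match is None:
        # With strict=false the query keeps such events, but they carry no CFVersion.
        return None
    return int(match.group("CFVersion"), 16)  # like parseInt(CFVersion, radix=16)


def impacted_channel_by_cid(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Mimic groupBy([cid], max(CFVersion)) followed by ImpactedChannel := GoodChannel - 1."""
    good_channel: Dict[str, int] = {}
    for cid, config_state_data in events:
        version = channel_version(config_state_data)
        if version is None:
            continue  # events without a version do not affect the maximum
        good_channel[cid] = max(version, good_channel.get(cid, version))
    return {cid: version - 1 for cid, version in good_channel.items()}


# Hypothetical (cid, ConfigStateData) pairs for illustration only.
sample_events = [
    ("cid-a", "|1,123,1f|"),
    ("cid-a", "|1,123,20|"),
    ("cid-b", "|1,123,20|"),
    ("cid-b", "|1,7,5|"),
]
print(impacted_channel_by_cid(sample_events))  # {'cid-a': 31, 'cid-b': 31}
```

Here version 0x20 (32) is the highest seen for each cid, so the sketch reports 31 as the impacted version, the same arithmetic the query performs.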

Make note of the value in the “ImpactedChannel” column; you will need it in the next step. The Step 2 query uses 31 as its default value.

Step 2: Execute Detailed Query to Identify Impacted Systems

With the impacted channel file version identified, run a more detailed query to find the systems affected during the impact window.
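
The impact window is hard-coded in the query as Unix epoch timestamps in milliseconds (1721362140000 and 1721366820000). Converting them is a quick way to confirm or adjust the window. This small Python sketch converts in both directions; the helper names are illustrative only.

```python
from datetime import datetime, timezone

# Window bounds copied from the Step 2 query (epoch milliseconds).
WINDOW_START_MS = 1721362140000
WINDOW_END_MS = 1721366820000


def ms_to_utc(epoch_ms: int) -> datetime:
    """Convert an epoch-milliseconds value to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)


def utc_to_ms(dt: datetime) -> int:
    """Convert an aware datetime back to epoch milliseconds for use in a query."""
    return int(dt.timestamp() * 1000)


print(ms_to_utc(WINDOW_START_MS))  # 2024-07-19 04:09:00+00:00
print(ms_to_utc(WINDOW_END_MS))    # 2024-07-19 05:27:00+00:00
print(utc_to_ms(datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)))  # 1721362140000
```

The window therefore runs from 04:09 to 05:27 UTC on July 19, 2024. The query compares with strict inequalities, so the boundary instants themselves are excluded.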

Query to Identify Impacted Systems:

// Get ConfigStateUpdate and SensorHeartbeat events
#event_simpleName=/^(ConfigStateUpdate|SensorHeartbeat)$/ event_platform=Win
// Narrow search to Channel File 291 and extract version number; accept all SensorHeartbeat events
| case{
    #event_simpleName=ConfigStateUpdate | regex("\|1,123,(?<CFVersion>.*?)\|", field=ConfigStateData, strict=false) | parseInt(CFVersion, radix=16);
    #event_simpleName=SensorHeartbeat | rename([[@timestamp, LastSeen]]);
}
// Restrict results to hosts that were online during impacted time window
| case{
    #event_simpleName=ConfigStateUpdate | @timestamp>1721362140000 AND @timestamp<1721366820000 | CSUcounter:=1;
    #event_simpleName=SensorHeartbeat | LastSeen>1721362140000 AND LastSeen<1721366820000 | SHBcounter:=1;
    *;
}
| default(value="0", field=[CSUcounter, SHBcounter])
// Make sure both ConfigState update and SensorHeartbeat have happened
| selfJoinFilter(field=[cid, aid, ComputerName], where=[{ConfigStateUpdate},
{SensorHeartbeat}])
// Aggregate results
| groupBy([cid, aid], function=([{selectFromMax(field="@timestamp",
include=[CFVersion])}, {selectFromMax(field="@timestamp", include=[@timestamp])
| rename(field="@timestamp", as="LastSeen")}, max(CSUcounter, as=CSUcounter),
max(SHBcounter, as=SHBcounter)]), limit=max)
// Perform check on selfJoinFilter
| CFVersion=* LastSeen=*
// ////////////////////////////////////////////////////////// //
// UPDATE THE LINE BELOW WITH THE IMPACTED CHANNEL FILE NUMBER //
// ////////////////////////////////////////////////////////// //
| in(field="CFVersion", values=[0,31])
// Calculate time between last seen and now
| LastSeenDelta:=now()-LastSeen
// Optional threshold; 3600000 is one hour; this can be adjusted
| LastSeenDelta>3600000
// Format LastSeenDelta as a human-readable duration
| LastSeenDelta:=formatDuration("LastSeenDelta", precision=2)
// Convert LastSeen time to human-readable format
| LastSeen:=formatTime(format="%F %T", field="LastSeen")
// Enrich aggregation with aid_master details
| aid=~match(file="aid_master_main.csv", column=[aid])
| aid=~match(file="aid_master_details.csv", column=[aid],
include=[FalconGroupingTags, SensorGroupingTags])
// Convert FirstSeen time to human-readable format
| FirstSeen:=formatTime(format="%F %T", field="FirstSeen")
// Convert ProductType to a human-readable value
| $falcon/helper:enrich(field=ProductType)
| drop([Time])
| default(value="-", field=[MachineDomain, OU, SiteName, FalconGroupingTags,
SensorGroupingTags], replaceEmpty=true)
// Create conditions to check for impact
| case{
    LastSeenDelta>3600000 | Details:="OK: Endpoint seen in past hour.";
    CSUcounter=0 AND SHBcounter=0 | Details:="OK: Endpoint did not receive channel file during impacted window. Endpoint was offline.";
    CSUcounter=0 AND SHBcounter=1 | Details:="OK: Endpoint did not receive channel file during impacted window. Endpoint was online.";
    CSUcounter=1 AND SHBcounter=1 | Details:="CHECK: Endpoint received channel file during impacted window. Endpoint was online. Endpoint has not been seen online in past hour.";
}
// Create one final groupBy for easier export to CSV
| groupBy([cid, aid, ComputerName, LastSeen, CFVersion, LastSeenDelta, Details,
AgentVersion, aip, event_platform, FalconGroupingTags, LocalAddressIP4, MAC,
MachineDomain, OU, ProductType, SensorGroupingTags, SiteName,
SystemManufacturer,SystemProductName, Version], limit=max)
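
The final filters and status rules of the query are easier to reason about with a small model. The Python sketch below is a simplified, hypothetical re-expression of them (the Host class and its field names are invented for illustration). It applies the channel-version filter and the optional threshold as filters, then models the three counter-based status rules.

```python
from dataclasses import dataclass
from typing import Optional

ONE_HOUR_MS = 60 * 60 * 1000  # 3600000 ms, the query's default threshold


@dataclass
class Host:
    """Hypothetical per-host summary, roughly one row after groupBy([cid, aid])."""
    cf_version: int          # CFVersion
    last_seen_delta_ms: int  # now() - LastSeen, in milliseconds
    csu_counter: int         # 1 if a ConfigStateUpdate fell inside the impact window
    shb_counter: int         # 1 if a SensorHeartbeat fell inside the impact window


def classify(host: Host, impacted_channel: int,
             threshold_ms: int = ONE_HOUR_MS) -> Optional[str]:
    """Return a Details-style status, or None when the host would not be reported."""
    # Mirrors in(field="CFVersion", values=[0, <ImpactedChannel>]).
    if host.cf_version not in (0, impacted_channel):
        return None
    # Mirrors the optional threshold: hosts seen recently are filtered out.
    if host.last_seen_delta_ms <= threshold_ms:
        return None
    if host.csu_counter == 0 and host.shb_counter == 0:
        return ("OK: Endpoint did not receive channel file during impacted window. "
                "Endpoint was offline.")
    if host.csu_counter == 0 and host.shb_counter == 1:
        return ("OK: Endpoint did not receive channel file during impacted window. "
                "Endpoint was online.")
    if host.csu_counter == 1 and host.shb_counter == 1:
        return ("CHECK: Endpoint received channel file during impacted window. "
                "Endpoint was online. Endpoint has not been seen online in past hour.")
    # csu_counter == 1 with shb_counter == 0 matches none of the query's rules.
    return None


print(classify(Host(31, 2 * ONE_HOUR_MS, 1, 1), impacted_channel=31))
```

A host that received the channel file during the window but sent no heartbeat there matches none of the counter rules, which is why the sketch returns None for that combination.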

Explanation:

  1. Filter Events: The query selects events with the names ConfigStateUpdate and SensorHeartbeat on the Windows platform.
  2. Extract and Parse Version: It extracts the configuration file version from the ConfigStateUpdate events and parses it as an integer.
  3. Identify Time Window: It checks which events occurred within the impact window, 04:09 to 05:27 UTC on July 19, 2024 (epoch milliseconds 1721362140000 to 1721366820000).
  4. Counter Flags: Sets CSUcounter to 1 for a ConfigStateUpdate inside the window and SHBcounter to 1 for a SensorHeartbeat inside the window; both default to 0 otherwise.
  5. Join Filters: Ensures both ConfigStateUpdate and SensorHeartbeat events occurred for the same systems.
  6. Aggregate Results: Aggregates the results, including the configuration file version and the last seen timestamp.
  7. Check Impacted Versions: Filters systems running the impacted channel file version (CFVersion) identified earlier.
  8. Calculate Time Deltas: Calculates how long ago each system was last seen, keeps only systems not seen within the threshold (one hour by default), and formats the duration for readability.
  9. Enrich Data: Matches additional details from master files and converts timestamps to human-readable formats.
  10. Impact Assessment: Sets detailed status messages based on the system’s update and online status during the impact window.
  11. Export Results: Groups the final results for easier export to CSV.
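
After exporting the results to CSV, the rows that need follow-up are the ones whose Details value begins with CHECK. Below is a minimal Python sketch for pulling those rows out. It assumes the export has a header row that uses the field names from the final groupBy (ComputerName, aid, Details, and so on); the sample rows are invented for illustration.

```python
import csv
import io
from typing import Dict, List, TextIO


def hosts_to_check(export: TextIO) -> List[Dict[str, str]]:
    """Return rows whose Details value starts with "CHECK" (hosts needing follow-up)."""
    reader = csv.DictReader(export)
    # "or ''" guards against rows where Details is missing (DictReader fills None).
    return [row for row in reader if (row.get("Details") or "").startswith("CHECK")]


# Invented sample export containing a subset of the query's output columns.
sample_csv = io.StringIO(
    "ComputerName,aid,CFVersion,Details\n"
    'HOST-01,aaa111,31,"CHECK: Endpoint received channel file during impacted window. '
    'Endpoint was online. Endpoint has not been seen online in past hour."\n'
    'HOST-02,bbb222,31,"OK: Endpoint did not receive channel file during impacted window. '
    'Endpoint was offline."\n'
)

for row in hosts_to_check(sample_csv):
    print(row["ComputerName"], row["aid"])  # HOST-01 aaa111
```

To run it on a real export, open the file with open(path, newline="") and pass the file object to hosts_to_check; the path is whatever you saved the export as.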

Important: Replace 31 in the following line with the ImpactedChannel value from Step 1:

| in(field="CFVersion", values=[0,31])

The output of this query lists systems that last reported running an impacted version of Channel File 291 and have not been seen in the past hour. The one-hour threshold is set by the line that follows the "Optional threshold" comment:
// Optional threshold; 3600000 is one hour; this can be adjusted
| LastSeenDelta>3600000
The value is in milliseconds (60 × 60 × 1000 = 3600000, or one hour). Raise or lower it to suit your needs; for example, 1800000 sets a 30-minute threshold.
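
If you change these values often, a small Python helper can do the millisecond arithmetic and print both edited query lines ready to paste. The query_edits function and its parameters are hypothetical conveniences, not part of the query above.

```python
def query_edits(impacted_channel: int, threshold_minutes: int = 60) -> str:
    """Render the channel-file filter and LastSeen threshold lines for the Step 2 query."""
    threshold_ms = threshold_minutes * 60 * 1000  # minutes -> milliseconds
    return "\n".join([
        f'| in(field="CFVersion", values=[0,{impacted_channel}])',
        f"| LastSeenDelta>{threshold_ms}",
    ])


print(query_edits(31))
# | in(field="CFVersion", values=[0,31])
# | LastSeenDelta>3600000
print(query_edits(31, threshold_minutes=30))
# | in(field="CFVersion", values=[0,31])
# | LastSeenDelta>1800000
```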

By following these steps and using the provided queries, you can scope and identify the systems impacted by the CrowdStrike outage, so that each affected system can be assessed and remediated promptly to restore normal operations.