Corporate Network Viruses & Disaster Recovery Implementation
When dealing with viruses or any other variety of malware in a live business environment, proactivity is the key to success. However, infections are almost inevitable, and even a minor infection of a corporate network can be devastating. Organizations that rely on 24/7 availability of their IT resources need a documented Disaster Recovery (DR) plan that coordinates their resources to fight infections and restore services with minimal downtime, while simultaneously ensuring data integrity.
An overall plan requires non-technical upper management and internal communications departments to relay information at the appropriate times to employees, business partners, customers, and the public. This article focuses on the standards that let you know when the infecting virus or Trojan has been neutralized and systems can be restored to production safely.
At some point in the course of operating a network on the Internet, your organization’s infrastructure very likely will become infected. As soon as an infection is identified, the incident-response team must swing into action, quarantining affected systems or files, and then remediating any problems caused by the infection.
The IT department may learn of the infection through an alert from a security monitoring tool, such as a desktop firewall or antivirus software, or from active users on the network. The software will indicate that it has encountered a file, email message, connection, connection attempt, or other activity that is suspicious and bears closer scrutiny.
Identification of the infection also may come from Help Desk calls complaining of unexpected behavior on desktop or laptop PCs. It usually takes more than one call before people realize what’s happening; unfortunately, this lag time gives the infection time to propagate through the network, compromise hosts, steal or corrupt data, and build its footprint.
After the infection has been verified, the next step is to quarantine all infected hosts, limiting the ability of the attacker to spread, transmit data outbound, or receive instructions from command-and-control servers managing a potential bot network. Depending on how far and wide the infection has spread, the quarantine may need to include hosts, subnets, or entire domains. It’s possible that restricting the flow of the virus may require disconnecting the entire network from the Internet, at least temporarily. The infected organization must avoid two risks:
- The attacker may alter, destroy, or transmit the organization’s data off-network.
- The infected network may serve as a jumping-off point for the infection to affect other networks.
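As a rough illustration of deciding how wide the quarantine should be, the sketch below groups infected addresses by subnet and escalates from isolating individual hosts to isolating a whole subnet once a threshold is crossed. The function name `quarantine_scope`, the threshold `max_hosts_per_subnet`, and the sample addresses are all assumptions made for this example; a real incident team would pick its own escalation rule.

```python
import ipaddress


def quarantine_scope(infected_ips, subnets, max_hosts_per_subnet=2):
    """Suggest which subnets and which individual hosts to isolate.

    Assumption for this sketch: if a subnet has more infected hosts than
    max_hosts_per_subnet, the whole subnet is isolated; otherwise only the
    infected hosts themselves are isolated.
    """
    networks = [ipaddress.ip_network(s) for s in subnets]
    by_network = {net: [] for net in networks}
    unmatched = []

    for ip in infected_ips:
        addr = ipaddress.ip_address(ip)
        # Find the first known subnet that contains this address, if any.
        net = next((n for n in networks if addr in n), None)
        if net is None:
            unmatched.append(str(addr))
        else:
            by_network[net].append(str(addr))

    isolate_subnets = []
    isolate_hosts = list(unmatched)  # hosts outside known subnets are handled singly
    for net, hosts in by_network.items():
        if len(hosts) > max_hosts_per_subnet:
            isolate_subnets.append(str(net))
        else:
            isolate_hosts.extend(hosts)

    return sorted(isolate_subnets), sorted(isolate_hosts)


if __name__ == "__main__":
    subnets_to_isolate, hosts_to_isolate = quarantine_scope(
        ["10.0.1.5", "10.0.1.9", "10.0.1.20", "10.0.2.7"],
        ["10.0.1.0/24", "10.0.2.0/24"],
    )
    print("Isolate subnets:", subnets_to_isolate)
    print("Isolate hosts:", hosts_to_isolate)
```

The output is only a recommendation. Actually cutting off a host or subnet is done on switches, firewalls, or by unplugging cables, none of which this sketch touches.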
Simultaneously, the staff must log all witnessed behavior and symptoms of the infection. Post this info on the walls of the incident command room to ensure that all known information is shared among staff investigating and fighting the infection. Given the speed of virus propagation and the fact that viruses can lie dormant for varying periods of time, the data-gathering process must continue while you’re quarantining hosts and disinfecting machines.
Remediation is the process of cleaning the infected hosts or mitigating all witnessed behavior and symptoms of the infection. A host is deemed “clean” if any of the following statements are true:
- A means of deleting the infecting virus, Trojan, or other malware is known, and reinfection has been prevented.
- The host can be compared to a clean image, and data integrity is maintained.
- The host can be rebuilt from a clean image.
These options may not always be available in the heart of an infection, especially in the case of zero-day attacks or infections from “unpopular” viruses against which signatures are not actively being developed.
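The second criterion, comparing a host to a clean image, can be approximated by comparing file hashes. The sketch below is one possible approach, not a prescribed method: it assumes both the clean image and the suspect host's disk are mounted as directories, and the function names `build_manifest` and `compare_to_clean` are invented for this example.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_to_clean(clean, host):
    """Report files that were added, removed, or changed relative to the clean manifest."""
    added = sorted(set(host) - set(clean))
    removed = sorted(set(clean) - set(host))
    changed = sorted(p for p in clean.keys() & host.keys() if clean[p] != host[p])
    return {"added": added, "removed": removed, "changed": changed}


if __name__ == "__main__":
    # Hypothetical manifests; real ones would come from build_manifest().
    clean = {"system/a.dll": "1111", "system/b.exe": "2222"}
    host = {"system/a.dll": "1111", "system/b.exe": "9999", "password.txt": "3333"}
    print(compare_to_clean(clean, host))
```

Legitimate user data and patches will also show up as differences, so the report is a starting point for review rather than a verdict. Reading whole files into memory keeps the sketch short; large files would need chunked hashing.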
The IT team may want to stay offline until every infected host has been thoroughly cleaned or rebuilt, even if this process takes a week or longer. However, in many organizations such a loss of computing resources may simply be intolerable. On the other hand, businesses certainly shouldn’t operate with an active virus on the network. You need a “middle ground” that allows for resuming operations while disinfection activities are underway. The following section describes such a system.
Mitigation and Countermeasures
With mitigation, security countermeasures are put in place to block or deny each witnessed action and behavior of the infection. This is where the running list of the infection’s activity comes into play. For example, let’s assume that the following is a listing of witnessed actions and behaviors of the infection:
- Changes to normal operations:
  - Windows Explorer breaks
  - Unexpected reboots
  - Regular and frequent reboots
  - Machines shut down upon initial infection
  - Changes to log settings
- System configuration changes:
  - Windows Registry settings edited
  - New user accounts created
- Software installed on host:
  - Keystroke logger
- Files created on infected hosts:
  - “Password” text file seen on infected hosts
  - Filenames corresponding to the name of the virus
- Unusual communication attempts:
  - Attempted FTP connections from infected machines to multiple unrecognized addresses
  - SMB connections to and between PC shares originated by infected hosts to potentially clean hosts (potential means of virus propagation)
Some of this behavior can be observed directly by IT personnel or gathered from user reports. Other evidence requires research into the virus itself. If files are being written to infected hosts, for example, it’s helpful to track the path in the directory structure where the files are written, as well as the naming convention used for the directories and files, so this information can be used to search other hosts for signs of the same infection.
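Once a naming convention is known, searching another host for the same files can be scripted. The following is a minimal sketch that assumes the host's file system is reachable as a local directory. The function name `find_indicator_files` and the patterns shown (`password*.txt`, `examplevirus*`) are placeholders for illustration, not signatures of any real virus.

```python
import fnmatch
from pathlib import Path


def find_indicator_files(root, name_patterns):
    """Return relative paths of files under root whose names match any pattern.

    Matching is done on lowercased names so that it is case-insensitive on
    every platform.
    """
    root = Path(root)
    lowered = [pattern.lower() for pattern in name_patterns]
    hits = []
    for path in root.rglob("*"):
        if path.is_file() and any(
            fnmatch.fnmatch(path.name.lower(), pattern) for pattern in lowered
        ):
            hits.append(path.relative_to(root).as_posix())
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical patterns derived from the incident log.
    patterns = ["password*.txt", "examplevirus*"]
    for hit in find_indicator_files(".", patterns):
        print("Suspicious file:", hit)
```

A match only flags a file for inspection. Filenames are easy for malware to change, so this kind of search complements antivirus scanning rather than replacing it.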
Once we know what the virus does, we can design, test, deploy, and verify security measures to block these symptoms. For example, the following measures can address the symptoms from the preceding list:
- Configure and run antivirus software with the latest signatures and rules to block the creation, writing, and execution of all suspected bad files.
- Run scripts in loops to delete all virus-related files and kill its processes and services.
- Prevent repopulation of infected files after deletion.
- Restore Windows Registry settings to correct/default settings.
- Verify normal operations:
  - Windows Explorer operating normally
  - No unrecognized rebooting
- Delete unauthorized accounts.
- Prevent further outbound FTP connection attempts to known “bad” IP addresses.
- Change system and domain administrator passwords.
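Because mitigation depends on every witnessed behavior having at least one countermeasure, it can help to check that mapping explicitly. The sketch below is one way to do so. The behavior labels, countermeasure names, and the function `uncovered_behaviors` are shorthand invented for this example, loosely based on the two lists above.

```python
def uncovered_behaviors(observed, countermeasures):
    """Return observed behaviors that no countermeasure claims to address.

    countermeasures maps a countermeasure name to the list of behaviors
    it is meant to block or reverse.
    """
    covered = set()
    for addressed in countermeasures.values():
        covered.update(addressed)
    # Preserve the order of the running incident log.
    return [behavior for behavior in observed if behavior not in covered]


if __name__ == "__main__":
    observed = [
        "registry settings edited",
        "new user accounts created",
        "keystroke logger installed",
        "password text file created",
        "outbound FTP attempts",
        "SMB connections between PC shares",
    ]
    countermeasures = {
        "restore registry defaults": ["registry settings edited"],
        "delete unauthorized accounts": ["new user accounts created"],
        "antivirus with latest signatures": [
            "keystroke logger installed",
            "password text file created",
        ],
        "block FTP to known bad addresses": ["outbound FTP attempts"],
    }
    for gap in uncovered_behaviors(observed, countermeasures):
        print("No countermeasure yet for:", gap)
```

In this made-up data set, the SMB connections between shares are reported as a gap, which would prompt the team to add a control for that propagation path before restoring connectivity.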
Testing and Adjusting the Final Solution
To ensure that countermeasures are working, network connectivity and services can be restored in a deliberate, phased approach, with testing taking place at each stage to ensure that the infection is held in check. For example, we might allow a server to operate disconnected from the network for a period of time, such as three hours or half of a working day, and monitor its behavior. If we see normal behavior, we can take the further step of restoring internal connections, such as connecting a mitigated LAN or host to the server again, while monitoring operations for evidence of virus activity. All business rules should remain implemented and functioning properly. Throughout this period, IT should also look for signs that the virus is still active or is being blocked. Users should be closely monitored and contacted to ensure data integrity and a safe working environment.
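The phased approach described above can be thought of as a sequence of gates: each phase is monitored for a set window, and the next phase starts only if no virus activity is seen. The sketch below models that logic under stated assumptions. The phase names, the function `restore_in_phases`, and the `observe` callback are hypothetical; in practice `observe` would stand in for hours of monitoring by staff and tools.

```python
def restore_in_phases(phases, observe):
    """Advance through restoration phases, halting at the first sign of virus activity.

    observe(phase) should return a list of alerts seen during that phase's
    monitoring window; an empty list means the phase passed.
    """
    completed = []
    for phase in phases:
        alerts = observe(phase)
        if alerts:
            # Stop here; later phases are not attempted.
            return {"completed": completed, "halted_at": phase, "alerts": alerts}
        completed.append(phase)
    return {"completed": completed, "halted_at": None, "alerts": []}


if __name__ == "__main__":
    phases = ["server isolated", "internal LAN reconnected", "Internet reconnected"]

    def observe(phase):
        # Hypothetical monitoring results for this example.
        if phase == "internal LAN reconnected":
            return ["blocked SMB connection attempt from a workstation"]
        return []

    print(restore_in_phases(phases, observe))
```

Halting on any alert is a deliberately conservative choice for this sketch. A team might instead distinguish alerts showing that a countermeasure blocked the virus from alerts showing that the virus succeeded, and halt only on the latter.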