Published on Nemertes Research (http://www.nemertes.com)
Virtualization Risk Analysis

A risk analysis of large-scale, dynamic virtual server environments

By Andreas M Antonopoulos, Senior Vice President & Founding Partner

Executive Summary

As virtualization has gained acceptance in corporate data centers, security has gone from afterthought to serious concern. Much of the focus has been on the technologies of virtualization rather than the operational, organizational and economic context. This comprehensive risk analysis examines the areas of risk in deployments of virtualized infrastructures and provides recommendations.

Read the study below, then read the FAQ [1]!

The Issue

Server virtualization promises to revolutionize corporate data centers, transforming the relationship between the operating system and the hardware. Companies are deploying server virtualization to gain many different benefits such as server consolidation, faster provisioning, higher utilization and lower energy costs. With all the hype surrounding server virtualization come the inevitable security concerns: are virtual servers less secure? Are we introducing higher risk into the data center? For server virtualization to deliver benefits we have to examine the security risks. As with any new technology there is much uncertainty mixed in with promise. Part of the uncertainty arises because most companies do not have a good understanding of the real risks surrounding virtualization.

State of Deployment

Concerns around server virtualization security are increasing because reliance on virtual servers for critical applications is increasing. As recently as three years ago, most companies employed server virtualization sparingly, mostly in test and development environments and in limited “pilot” deployments. Today, however, server virtualization is rapidly moving into production systems. Our research shows that among those who have adopted virtualization technologies, up to a quarter are deploying them on production systems.

As the number of companies using virtualization in production systems grows, we can see some early trends among the companies that had a head start: the early adopters. Among companies implementing server virtualization, the early gains and return on investment (ROI) come primarily from server consolidation. A server virtualization program can demonstrate ROI in less than 18 months, based solely on increasing utilization of existing servers and postponing new hardware purchases. Beyond consolidation, companies see major benefits from the “secondary” effects of virtualization. The encapsulation of server images into virtual machines makes those server images portable, replicable, and hardware independent. Those features translate into significant operational savings which, in most cases, exceed the cost benefits of consolidation by a wide margin. As a result, once companies gain experience with server virtualization, they quickly move beyond consolidation towards the use of virtualization to create a flexible on-demand data center.

Moreover, these secondary benefits of virtualization exhibit a “network effect” (see footnote 1): the more servers are virtualized, the larger the pool of virtual hosts and the greater the benefits from virtualization. If a server image is standardized and made portable using virtualization, then it can be deployed to any server in the infrastructure. Therefore, as more servers are converted to virtualization “hosts”, the reach and benefits of virtualization rise almost exponentially.

Stages of Deployment

Early adopters of virtualization have already seen this network effect and have moved aggressively to make virtualization the standard way of deploying servers across the entire data center. This progression of deployment is characterized by four milestones:

  • Test and development only – early pilots. Virtualization is deployed only in test environments to make software development and application testing easier. No production servers are virtualized.

  • Basic Services - Low hanging fruit. Virtualization is deployed on selected servers where low-utilization workloads are then consolidated. Companies take servers running file/print, directory, simple web-serving and virtualize them. Large numbers of servers are consolidated with virtual/physical ratios reaching 20:1.

  • Production pools. Virtualization is deployed on large pools of servers for production use. Companies start using virtual infrastructure management platforms to provision, deploy and migrate virtual machines on-demand. Pools of virtual servers become flexible deployment targets for many different types of applications. Server images are standardized (identical clones).

  • Complete virtualization. Separate pools of virtualized servers are consolidated into one or more very large server pools. Almost all servers are now virtualized. Deployment onto virtual machines is the required standard for most if not all applications. Combined with SAN and synchronous replication, the server pool is used as a dynamic CPU resource and allocated on-demand.

As virtualization deployments progress through the four stages, we see a completely new data center architecture emerge. In this new environment some of the traditional security approaches are no longer applicable. Therefore, in any risk analysis of virtualization we have to consider not just the individual virtual server, but the risks inherent in each of the four stages of deployment and their corresponding architectures and operational models.

Hybrid environments

In addition to the four stages of deployment discussed above, we have to look beyond the textbook data center. In the real world a data center will contain a mix of technologies, hardware platforms and operating systems. Additionally, a real data center will always contain a mix of “real” and virtual servers, either during transition from one state to another, or as a result of application-based choices (for now, not all applications can or should be virtualized). Finally, even the best laid plans are often interrupted by reality – even if you have a beautifully standardized virtual data center, your board of directors may decide to ruin your plans by acquiring a company with a very “physical” view of the data center. So in considering the risk inherent in virtualization we must also consider it in the context of a constantly changing mix of physical and virtual servers.

Dynamic environments

Another important factor to consider is the dynamic nature of a virtualized data center. When a server “workload” is virtualized, it becomes independent of the underlying hardware. This makes it possible to move the virtual workload from one physical server to another. In fact, live-migration technology (e.g., VMware’s VMotion) can do this in near real time without disrupting the service provided by the virtual server. Server image portability is used for maintenance (moving loads from a server that needs a hardware fix), recovery (moving loads to other data centers) and load balancing (redistributing loads among servers).

Furthermore, virtual machines can be provisioned very quickly by making a clone of an existing server (often a standardized server “template”) and booting it up as a new server. With just a few clicks, what used to take days or weeks can now be accomplished in a matter of minutes. The result is quite predictable – to test any new service it is easy to just deploy a new temporary virtual server. The ease of deployment results in a completely new deployment model – the “transient” virtual server. In a virtualized data center there may be a large number of these transient servers flitting in and out of existence.

Such an environment, characterized by fluctuating loads and dynamic moves and changes, has its own unique security issues.

Virtualization Risk Assessment

Risk assessment is the most important step in risk management, but also the most difficult. There are many approaches to estimating risk, though most boil down to just two factors: probability of loss (loss of time, loss of protected information, loss of investor or consumer confidence) and impact of that loss. Of those, the probability of loss is the harder to estimate accurately because it depends on both vulnerability to a threat (or all threats) and the rate of occurrence of an attack. The difficulty arises both from the fact that not all vulnerabilities are known (and unknown vulnerabilities have to be presumed to exist) and from the fact that the rate of occurrence of a security attack is very difficult to estimate. However, we can use formulas to estimate the relative risk inherent in different environments if we assume that the rate of occurrence of an attack is the same and keep it constant. Thus we can compare relative risk without directly quantifying absolute risk.

In mathematical terms we can express risk as a function of loss impact and loss probability:

$R_i = I_i \cdot p(L_i)$

$p(L_i) = p_{attack} \cdot V_i$

where $L_i$ is the loss affecting asset $i$, $I_i$ is the impact of that loss, $p(L_i)$ is the probability of that loss occurring, $p_{attack}$ is the probability of an attack and $V_i$ is the vulnerability of the asset.

When looking at a collection of systems (e.g., a data center), the total risk would be the sum of the risks of each component:

$R_{system} = \sum_i R_i$

So while we can’t easily estimate the probability of an attack, if we keep $p_{attack}$ constant, we can compare relative risk between systems based on vulnerability and loss impact.
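The relative comparison described above can be sketched numerically. The Python below is only an illustration under stated assumptions: the asset names, impact scores and vulnerability values are invented for the example and do not come from the study, and the attack probability is held at an arbitrary constant, as the text suggests.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    impact: float         # I_i: loss impact in arbitrary units (assumed value)
    vulnerability: float  # V_i: between 0.0 and 1.0 (assumed value)


def asset_risk(asset: Asset, p_attack: float) -> float:
    """R_i = I_i * p(L_i), where p(L_i) = p_attack * V_i."""
    return asset.impact * p_attack * asset.vulnerability


def system_risk(assets: list[Asset], p_attack: float) -> float:
    """R_system is the sum of R_i over all assets."""
    return sum(asset_risk(a, p_attack) for a in assets)


def relative_risk(a: list[Asset], b: list[Asset], p_attack: float = 0.1) -> float:
    """Ratio R_a / R_b. The constant p_attack cancels out of the ratio."""
    return system_risk(a, p_attack) / system_risk(b, p_attack)


# Hypothetical environments; every number here is made up for illustration.
test_env = [Asset("dev-vm", impact=2.0, vulnerability=0.5)]
prod_env = [
    Asset("dns", impact=8.0, vulnerability=0.25),
    Asset("file-print", impact=4.0, vulnerability=0.25),
]

ratio = relative_risk(prod_env, test_env)
print(f"production risk is about {ratio:.1f}x the test environment risk")
```

Because the attack probability appears in both the numerator and the denominator, changing its value leaves the ratio unchanged, which is why the comparison works without an absolute estimate.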

Risk Assessment

  • Single Instance (Hypervisor Risk)

Most analysis of security in virtualized environments focuses on the hypervisor. The hypervisor underpins the virtual guest machines and provides a working environment. As such it is responsible for managing all input and output from the server. Therefore, it is seen as a primary target for an attack: get the hypervisor and you have control and visibility over all the virtual workloads.

However, the conventional wisdom overlooks two important issues. First, hypervisors are purpose-built software with a small and specific set of functions. A hypervisor is smaller and more focused than a general-purpose operating system, and less exposed, having few or no externally accessible network ports. A hypervisor does not undergo frequent change and does not run third-party applications. The guest operating systems, which may be vulnerable, do not have direct access to the hypervisor. In fact, the hypervisor is completely transparent (invisible) to network traffic, with the exception of traffic to and from a dedicated hypervisor management interface. Second, at present there are no documented attacks against hypervisors, which reduces the likelihood of attack.

So while the impact of a hypervisor compromise is great (compromise of all guests), the probability is low because both the vulnerability of the hypervisor and the probability of an attack are low.

  • Test and Development

In this stage the impact of loss is lower than in production environments. On the other hand, the vulnerability of each system is likely to be higher because test systems are not habitually hardened against attack and may run a number of development-related services with lesser authentication requirements. Also, the use of shared or temporary passwords is more common in development environments. Finally, test environments are not usually customer facing, but they might contain customer data (or other sensitive data), which may be stolen in an attack.

  • Basic Services

Once moved into a production environment, virtualization creates a greater risk. Basic infrastructure services are the first to be virtualized in many companies because of their low CPU utilization. As “low-hanging fruit” they create the greatest ROI for consolidation projects. However, from a security perspective, basic services may underpin other applications and therefore represent a higher loss impact. Loss of DNS services, for example, may affect critical enterprise applications even if those critical applications are not themselves virtualized yet. On the other hand, basic applications (file/print/static web) and infrastructure services (AD, DNS, DHCP) are generally more mature, more hardened and change less often than enterprise applications, which may make them less vulnerable to attack.

  • Production Pools

When companies create a pool of virtualized servers for production use, they also change their deployment and operational practices. Given the ability to standardize server images (since there are no hardware dependencies), companies consolidate their server configurations into as few “gold” images as possible, which are used as templates for creating common server configurations. Typical images would include baseline OS images, web server images, application server images, etc. This standardization introduces an additional risk factor: monoculture. All the standardized images will share the same weaknesses.

Whereas in a traditional data center there are firewalls and intrusion prevention devices between servers, in a virtual environment there are no physical firewalls separating the virtual machines. What used to be a multi-tier architecture with firewalls separating the tiers ends up being a pool of servers. A single exposed server can lead to a rapidly propagating threat that can jump from server to server. Standardization of images is like dry tinder to a fire: a single piece of malware can become a firestorm engulfing the entire pool of servers. The loss impact and vulnerability increase with the size of the pool, in proportion to the number of virtual guests, each of which brings its own vulnerabilities, creating a higher risk than in a single-instance virtual server. Moreover, the risk of the whole is greater than the sum of the risks of the parts because the vulnerability of each system is itself subject to a “network effect”. Each additional server in the pool multiplies the vulnerability of other servers in the pool.
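One way to picture this monoculture effect is with a toy model. The sketch below is not the study's method; it assumes, purely for illustration, that each server is breached independently with the same probability and that in a monoculture pool a single breach spreads to every identical server. The function names and the numbers are hypothetical.

```python
def isolated_risk(n: int, impact: float, v: float) -> float:
    """Expected loss when each of n servers can only be compromised directly."""
    return n * impact * v


def monoculture_risk(n: int, impact: float, v: float) -> float:
    """Expected loss when one compromise spreads to every identical server.

    Toy assumption: servers are breached independently with probability v,
    and any single breach propagates to the whole pool.
    """
    p_any_breach = 1.0 - (1.0 - v) ** n
    return n * impact * p_any_breach


# Compare the two models for growing pool sizes (illustrative values only).
for n in (1, 10, 100):
    isolated = isolated_risk(n, impact=1.0, v=0.01)
    pooled = monoculture_risk(n, impact=1.0, v=0.01)
    print(f"n={n}: isolated={isolated:.3f} monoculture={pooled:.3f}")
```

Under these assumptions the two models agree for a single server, and the monoculture figure pulls ahead as the pool grows, which is the sense in which each added server raises the exposure of the others.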

  • Complete Virtualization

The complete virtualization model is also geared towards enabling on-demand and dynamic computing. Beyond simple consolidation, this deployment model seeks to leverage virtualization to achieve rapid application deployment, business agility and operational efficiency through automation.

On top of the risks inherent in a large pool of virtualized servers, complete virtualization adds risks related to the dynamic nature of on-demand computing. On the one hand, dynamic computing is much harder to secure using traditional static perimeter security. The perimeter security devices have limited visibility into the pool and are modeled on fixed associations between IP addresses, MAC addresses and physical location (e.g., ACLs based on IP/MAC address). On the other hand, applying traditional static security mechanisms curtails dynamic actions (moves and on-demand provisioning), the very benefits sought by this deployment model. Beyond the fact that traditional security is not up to the task, there are risks associated with the virtual machine lifecycle. The operational model encourages rapid provisioning and fleeting instances of virtual machines. Managing vulnerabilities and patches is therefore much harder than just running a scan, as the rate of change is much higher than in a traditional data center.
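The scanning problem can be illustrated with a small sketch. The Python below assumes a hypothetical inventory of virtual machine lifetimes and a fixed weekly scan schedule; the names, timings and data structure are invented for the example and are not taken from any real management platform.

```python
from dataclasses import dataclass


@dataclass
class VmLifetime:
    name: str
    created: float    # hours since the start of observation (hypothetical)
    destroyed: float  # hours since the start of observation (hypothetical)


def unscanned_vms(vms: list[VmLifetime], scan_times: list[float]) -> list[str]:
    """Return the names of VMs that were not running at any scan time."""
    missed = []
    for vm in vms:
        seen = any(vm.created <= t < vm.destroyed for t in scan_times)
        if not seen:
            missed.append(vm.name)
    return missed


# A weekly scan (every 168 hours) over four weeks.
weekly_scans = [168.0 * k for k in range(1, 5)]

vms = [
    VmLifetime("long-lived-db", created=0.0, destroyed=1000.0),
    VmLifetime("test-clone-a", created=10.0, destroyed=14.0),
    VmLifetime("test-clone-b", created=200.0, destroyed=230.0),
]

missed = unscanned_vms(vms, weekly_scans)
print("never scanned:", missed)
```

In this example the two short-lived clones start and stop between scheduled scans, so a periodic scan never sees them, while the long-lived server is covered.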

  • Assessment Summary

Based on the above analysis, we see that security in a virtualized data center is more than just security of the hypervisor. In fact, as enterprises continue their rapid adoption of virtualization, it is the large scale and new operational models that introduce unique challenges and risks.

This conclusion might appear to be pessimistic at first glance. However, note that we are comparing various stages of deployment of virtual servers. A large deployment of physical servers will suffer from many of the same challenges as the “Complete Virtualization” environment. What is new here is that there are fewer solutions for providing virtual security than there are for providing physical security with firewalls and intrusion prevention appliances in the network. On the other hand, the cost of implementing virtualized security can be significantly lower than the cost of dedicated hardware appliances, just as the cost of managing a virtual server is lower than that of managing a physical server.

Conclusions and Recommendations

Companies are moving towards comprehensive deployments of server virtualization (and other forms of virtualization) because of the significant business benefits that flow from it. The fact that security is still not as mature in this space is a reason for more investment, not less deployment. For comparison purposes, imagine a similar risk analysis two decades ago on whether companies should connect to the Internet: from a single computer, in a controlled DMZ, on their full network and on fixed and mobile devices. Clearly the idea of ubiquitous connectivity on fixed and mobile devices would have earned a “High Risk” rating. Yet businesses have proceeded in that direction because the benefits far outweigh the risks. So, in the above risk analysis, one must also consider that the benefits of virtualization far outweigh the risks. The question is not so much whether companies should proceed with virtualization – the market is already answering that resoundingly in the affirmative. The question is how to do that while minimizing the risk inherent in such a strategy.

In the long run, virtualized security solutions will not only help mitigate the risk of broadly deployed infrastructure virtualization, but will also provide new and innovative approaches to information security that are themselves virtual. The dynamic, flexible and portable nature of virtual servers is already leading to a new generation of dynamic, flexible and portable security solutions.

1 http://en.wikipedia.org/wiki/Network_effect [2] (Metcalfe's law states that the value of a network is proportional to the square of the number of users of the system, n².)

The Nemertes Research Group Inc. Copyright ©2002-2008

Source URL (retrieved on 2009-01-06 02:44): http://www.nemertes.com/issue_papers/virtualization_risk_analysis

Links:
[1] http://www.nemertes.com/faq_nemertes_issue_paper_virtualization_risk_analysis
[2] http://en.wikipedia.org/wiki/Network_effect