The Virtual Facility© is a simulator for data centers, allowing you to evaluate the immediate and long-term engineering impacts of any physical change to the data center in past, present,…
Webinar Today 1pm ET - Predictive #DCIM: A necessity to protect IT service availability and #datacenter capacity
At $10 to $30M per MW, data center capacity is a major capital investment for any company. After the investment has been made, each unit of capacity must be put to good use in the same way that companies need production from each employee. However, most data centers never fully utilize their capacity potential. In fact, on average 30% of data center capacity is lost to non-optimal management of IT resources.
This raises the questions: How is capacity lost? Once lost, can it be reclaimed? Can lost capacity be avoided altogether?
Answers to these questions will be presented through an in-depth product demonstration of 6SigmaFM, the industry’s first and only predictive #DCIM solution.
This Industry Perspectives article is the second in a series of three that analyzes the network-related issues being caused by the Data Deluge in virtualized data centers, and how these are having an effect on both cloud service providers and the enterprise. The focus of the first article was on the overall effect server virtualization is having on storage virtualization and traffic flows in the datacenter network. This article dives a bit deeper into the network challenges in virtualized data centers as well as the network management complexities and control plane requirements needed to address those challenges.
Server Virtualization Overhead
Server virtualization has enabled tens to hundreds of VMs per server in data centers using multi-core CPU technology. As a result, packet processing functions, such as packet classification, routing decisions, encryption/decryption, etc., have increased exponentially. Because discrete networking systems may not scale cost-effectively to meet these increased processing demands, some changes are also needed in the network.
Networking functions that are implemented in software in network hypervisors are not very efficient, because x86 servers are not optimized for packet processing. The control plane, therefore, needs to be scaled somehow by adding communications processors capable of offloading network control tasks, and both the control and data planes stand to benefit substantially from hardware assistance provided by such function-specific acceleration.
The table below shows the effect on packet processing overhead of virtualizing 1,000 servers. As shown, by mapping each CPU core to four virtual machines (VMs), and assuming 1 percent traffic management overhead with a 25 percent east-west traffic flow, the network management overhead increases by a factor of 32 times in this example of a virtualized data center.
This table shows the effect on network management overhead of virtualizing 1,000 servers.
Virtual Machine Migration
Support for VM migration among servers, either within one server cluster or across multiple clusters, creates additional management complexity and packet processing overhead. IT administrators may decide to move a VM from one server to another for a variety of reasons, including resource availability, quality-of-experience, maintenance, and hardware/software or network failures. The hypervisor handles these VM migration scenarios by first reserving a VM on the destination server, then moving the VM to its new destination, and finally tearing down the original VM.
Hypervisors cannot generate address resolution protocol (ARP) broadcasts quickly enough to announce VM moves, especially in large-scale virtualized environments. The network can even become so congested from the control overhead occurring during a VM migration that the ARP messages fail to get through in a timely manner. Because rapid changes in connections, ARP messages and routing tables have such a significant impact on network behavior, existing control plane solutions need an upgrade to more scalable architectures.
Multi-tenancy and Security
Owing to the high costs associated with building and operating a data center, many IT organizations are moving to a multi-tenant model where different departments or even different companies (in the cloud) share a common infrastructure of virtualized resources. Data protection and security are critical needs in multi-tenant environments, which require logical isolation of resources without dedicating physical resources to any customer.
The control plane must, therefore, provide secure access to data center resources and be able to change the security posture dynamically during VM migrations. The control plane may also need to implement customer-specific policies and Quality of Service (QoS) levels.
Service Level Agreements and Resource Metering
The network-as-a-service paradigm requires active resource metering to ensure SLAs are maintained. Resource metering through the collection of network statistics is useful for calculating return on investment, and evaluating infrastructure expansion and upgrades, as well as for monitoring SLAs.
The network monitoring tasks are currently spread across the hypervisor, legacy management tools, and some newer infrastructure monitoring tools. Collecting and consolidating this management information adds further complexity to the control plane for both the data center operator and multi-tenant enterprises.
The next article in the series will examine two ways of scaling the control plane to accommodate these additional packet processing requirements in virtualized data centers.
DCIM Yields Return on Investment
By: Michael Potts
DCIM will not transform your data center overnight, but it will begin the process. While it isn’t necessary to reach the full level of maturity before seeing benefits, the areas of benefit are significant and can bring results in the short term. The three primary ways in which DCIM provides ROI are:
- Improved Energy Efficiency
- Improved Availability
- Improved Manageability
DCIM LEADS TO IMPROVED ENERGY EFFICIENCY
In his blog, Dan Fry gets right to the heart of DCIM’s role in improving energy efficiency when he says, “To improve energy efficiency inside the data center, IT executives need comprehensive information, not isolated data. They need to be able to ‘see’ the problem in order to manage and correct it because, as we all know, you can’t manage what you don’t understand.”
The information provided by DCIM can help data center managers in reducing energy consumption:
MATCHING SUPPLY WITH DEMAND
Oversizing is one of the biggest roadblocks to energy efficiency in the data center. In an APC survey of data center utilization, only 20 percent of respondents had a utilization of 60 percent or more, while 50 percent had a utilization of 30 percent or less. One of the primary factors for oversizing is the lack of power and cooling data to help make informed decisions on the amount of infrastructure required. DCIM solutions can provide information on both demand and supply to allow you to “right-size” the infrastructure, reducing overall energy costs by as much as 30 percent.
IDENTIFYING UNDER-UTILIZED SERVERS
As many as 10 percent of servers are estimated to be “ghost servers,” servers which are running no applications, yet still consume 70 percent or more of the resources of a fully-utilized server. DCIM solutions can help to find these under-utilized servers, which could be decommissioned, re-purposed or consolidated, as well as servers which do not have power management functionality enabled. Addressing both reduces IT energy usage and delays the purchase of additional servers.
MEASURING THE IMPACT OF INFRASTRUCTURE CHANGES
DCIM tools can measure energy efficiency metrics such as Power Usage Effectiveness (PUE), Data Center Infrastructure Efficiency (DCiE) and Corporate Average Datacenter Efficiency (CADE). These metrics serve to focus attention on increasing the energy efficiency of data centers and to measure the results of changes to the infrastructure. In the white paper “Green Grid Data Center Power Efficiency Metrics: PUE and DCiE,” the authors lay out the case for the introduction of metrics to measure energy efficiency in the data center. The Green Grid believes that several metrics can help IT organizations better understand and improve the energy efficiency of their existing data centers as well as help them make smarter decisions on new data center deployments. In addition, these metrics provide a dependable way to measure their results against comparable IT organizations.
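As a minimal sketch of how the first two metrics relate, PUE is total facility power divided by IT equipment power, and DCiE is its reciprocal expressed as a percentage. The function names and the power readings below are illustrative assumptions, not output from any DCIM product.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def dcie(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Data Center Infrastructure Efficiency: IT power as a percentage of total power."""
    return 100.0 / pue(total_facility_kw, it_equipment_kw)


# Hypothetical readings taken before and after an infrastructure change.
before = pue(total_facility_kw=2000.0, it_equipment_kw=1000.0)
after = pue(total_facility_kw=1600.0, it_equipment_kw=1000.0)
print(f"PUE before: {before:.2f}, after: {after:.2f}")
print(f"DCiE after: {dcie(1600.0, 1000.0):.1f}%")
```

Comparing the two PUE values over the same IT load is one simple way to quantify the effect of a change to the power or cooling infrastructure.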
DCIM solutions can improve availability in the following areas:
Understanding the Relationship Between Devices
A DCIM solution can help to answer questions such as “What systems will be impacted if I take the UPS down for maintenance?” It does this by understanding the relationship between devices, including the ability to track power and network chains. This information can be used to identify single points of failure and reduce downtime due to both planned and unplanned events.
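The UPS question above can be sketched as a walk over a power chain. The device names and the `FEEDS` map are hypothetical, and the sketch assumes a device stays up as long as at least one of its power sources is still up.

```python
from collections import deque

# Hypothetical power chain: each key feeds the devices listed in its value.
FEEDS = {
    "UPS-A": ["PDU-1", "PDU-2"],
    "UPS-B": ["PDU-3"],
    "PDU-1": ["server-01", "server-02"],
    "PDU-2": ["server-03"],
    "PDU-3": ["server-02", "server-04"],
}


def sources_of(device, feeds):
    """Return the set of devices that feed power to `device`."""
    return {src for src, targets in feeds.items() if device in targets}


def impacted_by(outage, feeds):
    """Return the devices that lose every power source when `outage` goes down."""
    down = {outage}
    queue = deque(feeds.get(outage, []))
    while queue:
        device = queue.popleft()
        if device in down:
            continue
        # A device with a redundant feed survives while any source is still up.
        if sources_of(device, feeds) <= down:
            down.add(device)
            queue.extend(feeds.get(device, []))
    return sorted(down - {outage})


print(impacted_by("UPS-A", FEEDS))
```

In this example server-02 is fed from both PDU-1 and PDU-3, so it survives the loss of UPS-A, while server-01 and server-03 are single points of failure on that chain.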
Improved Change Management
When investigating an issue, examination of the asset’s change log allows problem managers to recommend a fix over 80 percent of the time, with a first fix rate of over 90 percent. This reduces the mean time to repair and increases system availability. DCIM systems which automate the change management process will log both authorized and unauthorized changes, increasing the data available to the problem manager and increasing the chances the issue can be quickly resolved.
Root Cause Analysis
One of the problems sometimes faced by data center managers is too much data. Disconnecting a router from the network might cause tens or hundreds of link-lost alarms for the downstream devices. It is often difficult to find the root cause amidst all of the “noise” associated with cascading events. By understanding the relationship between devices, a DCIM solution can help to narrow the focus to the single device (the router, in this case) which is causing the problem. By directing focus on the root cause, the problem can be resolved more quickly, reducing the associated downtime.
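One way to picture this filtering is to discard any alarm whose device sits downstream of another alarming device. The sketch below assumes a simple tree topology in which each device has a single upstream device, and it assumes the failed router itself appears in the alarm set; the names are hypothetical.

```python
# Hypothetical topology: each device maps to the device directly upstream of it.
UPSTREAM = {
    "router-1": None,
    "switch-a": "router-1",
    "switch-b": "router-1",
    "server-01": "switch-a",
    "server-02": "switch-a",
    "server-03": "switch-b",
}


def root_causes(alarming, upstream):
    """Keep only alarming devices with no alarming device anywhere upstream."""
    alarming = set(alarming)
    roots = []
    for device in alarming:
        parent = upstream.get(device)
        # Climb the chain until we reach the top or another alarming device.
        while parent is not None and parent not in alarming:
            parent = upstream.get(parent)
        if parent is None:
            roots.append(device)
    return sorted(roots)


alarms = ["server-01", "server-02", "server-03", "switch-a", "switch-b", "router-1"]
print(root_causes(alarms, UPSTREAM))
```

Six alarms collapse to the single device at the top of the cascade, which is where the operator's attention belongs.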
DCIM solutions can improve manageability in the following areas:
Data Center Audits
Regulations such as Sarbanes-Oxley, HIPAA and CFR-11 increase the requirements for physical equipment audits. DCIM solutions provide a single source of the data to greatly reduce the time and cost to complete the audits. Those DCIM tools utilizing asset auto-discovery and asset location mechanisms such as RFID can further reduce the effort to perform a physical audit.
DCIM can be used to determine the best place to deploy new equipment based on the availability of rack space, power, cooling and network ports. It then can be used to track all of the changes from the initial request through deployment, system moves and changes, all the way through to decommissioning. The DCIM solution can provide detailed information on thousands of assets in the data center including location, system configuration, how much power it is drawing, relationship to other devices, and so on, without having to rely on spreadsheets or home-grown tools.
With a new or expanded data center representing a substantial capital investment, the ability to postpone new data center builds could save millions of dollars. DCIM solutions can be used to reclaim capacity at the server, rack and data center levels to maximize space, power and cooling resources. Using actual device power readings instead of the overly conservative nameplate values will allow an increase in the number of servers supported by a PDU without sacrificing availability. DCIM tools can track resource usage over time and provide much more accurate estimates of when additional equipment needs to be purchased.
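The nameplate point can be illustrated with simple arithmetic. All figures below (PDU capacity, nameplate rating, measured peak draw and the 80 percent loading limit) are assumed values chosen for the example, not measurements or recommendations.

```python
def servers_per_pdu(pdu_capacity_w: int, per_server_w: int, max_load_pct: int = 80) -> int:
    """Number of servers a PDU can carry while staying under a loading limit."""
    usable_w = pdu_capacity_w * max_load_pct // 100
    return usable_w // per_server_w


PDU_CAPACITY_W = 5000    # hypothetical PDU rating
NAMEPLATE_W = 750        # hypothetical nameplate value per server
MEASURED_PEAK_W = 400    # hypothetical measured peak draw per server

by_nameplate = servers_per_pdu(PDU_CAPACITY_W, NAMEPLATE_W)
by_measurement = servers_per_pdu(PDU_CAPACITY_W, MEASURED_PEAK_W)
print(f"Planned from nameplate: {by_nameplate} servers")
print(f"Planned from measured peak: {by_measurement} servers")
```

With these assumed numbers, planning from measured peak draw doubles the number of servers the same PDU can support while keeping the same loading limit.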
How Do I Select a DCIM Tool to Fit My Data Center?
By: Michael Potts
It is also important to remember that DCIM cannot single-handedly do the job of data center management. It is only part of the overall management solution. While the DCIM tools, or sometimes a suite of tools working together, are a valuable component, a complete management solution must also incorporate procedures which allow the DCIM tools to be effectively used.
CHOOSING A DCIM SOLUTION
It is important to remember that DCIM solutions are about providing information. The question which must be asked (and answered) prior to choosing a DCIM solution is “What information do I need in order to manage my data center?” The answer to this question is the key to helping you choose the DCIM solution which will best suit your needs. Consider the following two data centers looking to purchase a DCIM solution.
DATA CENTER A
Data Center A has a lot of older, legacy equipment which is being monitored using an existing Building Management System (BMS). The rack power strips do not have monitoring capability. The management staff currently tracks assets using spreadsheets and Visio drawings. The data has not been meticulously maintained, however, and has questionable accuracy. The primary management goal is getting a handle on the assets they have in the data center.
DATA CENTER B
Data Center B is a new data center. It has new infrastructure equipment which can be remotely monitored through Simple Network Management Protocol (SNMP). The racks are equipped with metered rack PDUs. The primary management goals are to (1) collect and accurately maintain asset data, (2) monitor and manage the power and cooling infrastructure, and (3) monitor server power and CPU usage.
DIFFERENT DCIM DEPLOYED
While both data center operators would likely benefit from DCIM, they may very well choose different solutions. The goal for Data Center A is to more accurately track the assets in the data center. They may choose to pre-load the data they have in spreadsheets and then verify the data. If so, they will want a DCIM which will allow them to load data from spreadsheets. If they feel their current data is not reliable, they may instead choose to start from scratch and collect all of the data manually.
If so, loading the data from a spreadsheet might be a desirable feature but is no longer a hard requirement. Since the infrastructure equipment is being monitored using a BMS, they might specify integration with their existing BMS as a requirement for their DCIM.
Data Center B has entirely different requirements. It doesn’t have existing data in spreadsheets, so they need to collect the asset data as quickly and accurately as possible. They may specify auto-discovery as a requirement for their DCIM solution. In addition, they have infrastructure equipment which needs to be monitored, so they will want the DCIM to be able to collect real-time data down to the rack level. Finally, they want to be able to monitor server power and CPU usage, so they will want a DCIM which can communicate with their servers.
Prior to choosing a DCIM solution, spend time determining what information is required to manage the data center. Start with the primary management goals such as increasing availability, meeting service level agreements, increasing data center efficiency and providing upper-level management reports on the current and future state of the data center. Next, determine the information that you need to accomplish these high-level goals. A sample of questions you might ask includes the following:
- What data do I need to measure availability?
- What data do I need to measure SLA compliance?
- What data do I need to measure data center efficiency?
- What data do I need to forecast capacity of critical resources?
- What data do I need for upper-level management reports?
These questions will begin to define the scope of the requirements for a DCIM solution. As you start to narrow down the focus of the questions, you will also be defining more specific DCIM requirements.
For example, you might start with a requirement for the DCIM to provide real-time monitoring. This is still rather vague, however, so additional questions must be asked to narrow the focus.
How do you define “real-time” data? To some, real-time data might mean thousands of data points per second with continuous measurement. To others, it might mean measuring data points every few minutes or once an hour. There is a vast difference between a system which does continuous measurement and one which measures once an hour. Without knowing how you are going to use the data, you will likely end up buying the wrong solution. Either you will purchase a solution which doesn’t provide the data granularity you want or you will over-spend on a system which provides continuous measurement when all you want is trending data every 15 minutes.
What data center equipment do you want to monitor? The answer to this question may have the biggest impact on the solution you choose. If you have some data center equipment which communicates using SNMP and other equipment which communicates using Modbus, for example, you will want to choose a DCIM solution which can speak both of these protocols. If you want the DCIM tool to retrieve detailed server information, you will want to choose a DCIM solution which can speak IPMI and other server protocols. Prior to talking to potential DCIM vendors, prepare a list of the equipment from which you want to retrieve information.
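A prepared equipment list makes it easy to check a candidate product against your environment. The sketch below is a hypothetical illustration: the inventory and the protocol lists are made up, and it only compares protocol names rather than talking to any device.

```python
# Hypothetical equipment inventory: device name -> protocol it speaks.
EQUIPMENT = {
    "ups-1": "SNMP",
    "crac-1": "Modbus",
    "rack-pdu-7": "SNMP",
    "server-01": "IPMI",
}


def uncovered(equipment, supported_protocols):
    """Return the devices a candidate solution could not monitor."""
    supported = {p.lower() for p in supported_protocols}
    return sorted(
        name for name, protocol in equipment.items()
        if protocol.lower() not in supported
    )


# Compare two hypothetical candidates by the protocols they claim to support.
print(uncovered(EQUIPMENT, ["SNMP"]))
print(uncovered(EQUIPMENT, ["SNMP", "Modbus", "IPMI"]))
```

Any device left in the uncovered list would need either a different product or a separate monitoring system.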
Similar questions should be asked for each facet of DCIM — asset management, change management, real-time monitoring, workflow, and so on — to form a specific list of DCIM requirements. Prioritize the information you need so you can narrow your focus to those DCIM solutions which address your most important requirements.
Important Functionality for DCIM Solutions
By: Michael Potts
With more than 100 companies offering some type of Data Center Infrastructure Management (DCIM) solution (see Appendix 1 of the DCK Guide to DCIM for a partial list of vendors), it is difficult to narrow down a defined set of functional components. There are some critical elements found in many of the solutions, which include:
Asset management is a key component of DCIM. A data center can contain thousands of assets, from servers, storage and network devices to power and cooling infrastructure equipment. Tracking these assets is an ongoing and often monumental task. A Digital Realty Trust survey asked data center managers how long it could take to find a server that has gone down. Only 26% of the respondents said they could locate the server within minutes. Only 58% could find the server within 4 hours and 20% required more than a day. The inability to locate equipment in the data center increases the mean time to repair (MTTR) for the equipment and decreases the overall availability.
Asset management encompasses more than simply locating a data center asset, however. It also involves knowing detailed information about the asset’s configuration. Consider a server, for example. It may be powered by one or more rack power strips. Disconnecting these power sources will shut down the server. The server may be connected to one or more switches or routers.
Rerouting these network devices may make the server unreachable. The server may host multiple virtual machines. Shutting down the server will disable these virtual machines. Without knowing the details of the server configuration, it is very difficult to make reasonable decisions concerning that server and its supporting infrastructure. Changes to any part of the configuration may render the server, and its associated services, unusable.
In order to accurately manage assets and their detailed configurations, we must also manage change. It is estimated that change is often the cause of as much as 80% of system downtime and that 80% of mean time to repair (MTTR) is used trying to determine what changed. Change management therefore becomes an important part of a DCIM solution. In the book The Visible Ops Handbook: Implementing ITIL in 4 Practical and Auditable Steps, the authors examined a number of high performing IT organizations and found that by just looking at the scheduled and authorized changes for an asset (as well as the actual detected changes on the asset) problem managers could recommend a fix to the problem over 80% of the time, with a first fix rate of over 90%. The authors also found that organizations which implemented automated change auditing were “surprised and alarmed to see how many changes are being made ‘under the radar’.” The ability to track both authorized changes and detected changes (changes made but not necessarily authorized) is key DCIM functionality which can reduce MTTR and increase overall system availability.
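The distinction between authorized and detected changes can be sketched as a comparison of two lists. The record format here, an (asset, change) pair, is an assumption made for illustration; a real change log would carry timestamps, requesters and approval details.

```python
def unauthorized_changes(authorized, detected):
    """Return detected changes that have no matching authorized change record."""
    approved = set(authorized)
    return [change for change in detected if change not in approved]


# Hypothetical change records as (asset, change) pairs.
authorized = [
    ("server-01", "firmware upgrade"),
    ("switch-a", "port 12 moved to VLAN 30"),
]
detected = [
    ("server-01", "firmware upgrade"),
    ("server-01", "power cord moved to PDU-3"),
    ("switch-a", "port 12 moved to VLAN 30"),
]

for asset, change in unauthorized_changes(authorized, detected):
    print(f"Under the radar: {asset}: {change}")
```

When an asset fails, the short list of recent changes to it, authorized or not, is the first place a problem manager would look.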
There are three categories of real-time monitoring systems in the data center:
- Building Management System (BMS): A BMS is typically a hardware-based system utilizing Modbus, BACnet, OPC, LonWorks or Simple Network Management Protocol (SNMP) to monitor and control the building mechanical and electrical equipment. These are often custom-built systems priced on the number of individual data points being monitored (a data point might be the output load on a UPS or the return temperature on a computer room air conditioner unit). In some cases, the BMS is extended into the data center to monitor and control power and cooling equipment.
- Network Management System (NMS): An NMS is typically a software-based system utilizing SNMP to monitor the network devices in the data center. Network devices can usually be auto-discovered, so installation can be automated to some degree.
- Data Center Monitoring System (DCMS): A DCMS can be hardware-based and/or software-based and is used to monitor a data center or computer room. Device communication is typically done using SNMP, although some data center monitoring systems can also communicate using Modbus, IPMI or other protocols.
There are some important attributes to consider when evaluating the real-time monitoring capabilities of a DCIM solution. One of the key considerations is what devices you intend to monitor. The answer to this question may have the biggest impact on the solution chosen.
If, for example, you want to monitor some devices which use SNMP to communicate and others which use Modbus, it would be important to choose a solution which supports both SNMP and Modbus protocols. Avoid solutions which only work with one vendor’s specific equipment as you will then need to purchase multiple disparate systems to monitor your entire data center. Ideally, you want a DCIM solution that can work with a wide variety of hardware “out of the box” — in other words, without any vendor customization — and can also integrate with other existing monitoring systems such as a BMS.
Another attribute to consider is whether or not the real-time monitoring utilizes a hardware component. There is nothing inherently wrong with a hardware-based system. In fact, a hardware-based system may be capable of gathering data more quickly and frequently than a software-based system. Depending on the number of hardware components required and the price of each component, however, the hardware cost may cause the overall DCIM solution to become prohibitively expensive.
One additional attribute to consider is whether or not the system supports auto-discovery of devices. Auto-discovery provides many benefits, including faster, easier installation and less chance for user error in manually configuring a device. It is important to note that not all devices can be auto-discovered as discovery is dependent on the device configuration and the communication protocol used (SNMP devices can usually be discovered while Modbus devices cannot, for example).
Many data centers have implemented at least some level of ITIL-like processes. A DCIM solution can help you to orchestrate these processes. For example, the installation of a new server typically has multiple steps, some of which may be performed by different groups within the data center.
A DCIM solution might allow tracking of the various steps, with different groups able to report status of their individual tasks in order to verify that all required steps have been completed. In this case, workflow functionality will coordinate the server installation steps so that all preparatory work has been completed before the technician installs the server in the rack, thereby streamlining the entire process.
It is important that the workflow functionality provided by the DCIM tool is adaptable to work within your defined process structure rather than having to modify your processes to match a pre-defined workflow.
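The server installation example can be sketched as an ordered list of steps in which each step waits on everything before it. The step names and owning groups are hypothetical; because the list is plain data, it can be edited to match your own process rather than a pre-defined one.

```python
# Hypothetical installation workflow: (step name, owning group), in required order.
STEPS = [
    ("reserve rack space", "facilities"),
    ("provision power", "facilities"),
    ("assign network ports", "network"),
    ("install server", "operations"),
]


def ready_for(step_name, steps, completed):
    """Return (ready, blocking) where blocking lists unfinished earlier steps."""
    names = [name for name, _group in steps]
    index = names.index(step_name)
    blocking = [name for name in names[:index] if name not in completed]
    return (not blocking, blocking)


# Groups report their tasks as done; the technician checks before going on site.
completed = {"reserve rack space", "assign network ports"}
ready, blocking = ready_for("install server", STEPS, completed)
print("Ready to install" if ready else f"Waiting on: {', '.join(blocking)}")
```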
Analytics and Reporting
Another important capability of a DCIM solution is data analysis and reporting. With thousands of devices in the data center each reporting multiple measurements, the amount of data collected can quickly become overwhelming. It is imperative that the DCIM tool can quickly sort through this data and provide actionable recommendations for the management team. These recommendations can be presented in the form of alarm messaging, graphing of historical data to show changes over time, dashboards and reports. The DCIM tools may come with pre-defined reports but should also support ad hoc reporting based on user-selectable parameters.
Visualization of the Physical and Virtual Infrastructure
One important component of a DCIM solution is the ability to view the physical and virtual infrastructure. The DCIM tools on the market today vary widely in their capabilities here. Some interact with visualization tools such as AutoCAD or Visio, while others provide a visual editor to allow you to lay out your infrastructure entirely within the tool. While most of the current solutions provide top-down views, some also provide 3-D views with the ability to “fly through” the data center. Many solutions provide various layered views of the data center with the ability to view various parameters such as temperature, rack utilization, power and so on.
This view typically extends down to the rack level, with DCIM tools showing the devices in each rack. It shows the actual location of a device within a rack and also provides additional data such as the temperature at various points in the rack and the power usage within the rack.
If DCIM boils down to information, a good DCIM user interface boils down to providing that information in such a way as to allow the user to make informed decisions. In his white paper Five Essential Components of an Elegantly Engineered Data Center Operating System, Kevin Malik describes the importance of the DCIM user interface, saying “It is essential for a data center operating system to have an intuitive interface so users can quickly navigate through alerts, review environmental levels and review other detailed analytics.” He goes on to add, “Companies should be able to customize the views of real-time data of mechanical, power, cooling and electrical usage so decision-makers see information needed based on their roles to optimize data center operations.”
Like the visualization component, DCIM user interfaces vary widely in both their look and feel and their overall capabilities. While most DCIM products are web-based, allowing access to the data from anywhere, the user interfaces can take many forms, including dashboards, touch screen technology, and application support for hand-held devices such as iPads and smart phones.
One of the primary uses for the data collected by DCIM applications is to provide information for capacity planning. Data centers operate most efficiently when they maximize the use of key resources, particularly power and cooling. By storing the resource consumption over time and analyzing growth patterns, data center managers can more accurately predict when a given resource will be exhausted. Through the use of DCIM tools, data center builds can frequently be postponed due to more effective management of key resources.
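A minimal sketch of this kind of prediction fits a straight line to stored readings and projects when the line reaches capacity. It assumes evenly spaced monthly readings and linear growth, which is a simplification; the readings and the capacity figure are hypothetical.

```python
def months_until_exhausted(history_kw, capacity_kw):
    """Project when a least-squares trend line through the readings hits capacity.

    Returns the number of months after the last reading, or None when the
    trend is flat or falling. Assumes one reading per month.
    """
    n = len(history_kw)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history_kw) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history_kw))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    month_at_capacity = (capacity_kw - intercept) / slope
    return max(0.0, month_at_capacity - (n - 1))


# Hypothetical monthly power draw in kW against a 200 kW capacity.
readings = [100.0, 110.0, 120.0, 130.0]
print(months_until_exhausted(readings, capacity_kw=200.0))
```

Real growth is rarely this regular, so a projection like this is best treated as a planning signal to be refreshed as new readings arrive.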
Integration with Other Data Center Management Solutions
Contrary to what some DCIM vendors might have you believe, DCIM solutions will likely never replace all of the management tools available for the data center space. Typical management solutions include change management, CFD modeling, asset management, building management systems, maintenance management and a number of other third-party or in-house developed tools. A good DCIM solution will provide some type of integration with external systems, ranging from loading Excel spreadsheets to direct database interaction to a sophisticated web-based API (application programming interface) which might allow data to be passed both into and out of the DCIM solution.