Future Facilities recently wrote on the impact of predictive modeling for data centers. From the article:
Predictive Modeling: The Next Frontier in Data Center Condition Maintenance
Dave King, Senior Data Center Engineer, EMEA, Future Facilities
Steve Davies, Product Marketing Manager, Future Facilities
Given the fast pace of change within any company, the chances are fairly small that the IT plans created by the design consultant bear any resemblance to the equipment that is actually installed in the current facility. Add to this IT disparity the various energy efficiency drives which will have changed the infrastructure from the original design, and data center managers are left trying to fit square pegs into round holes.
Adding more environmental monitoring will have yielded some useful information, helping to make a few informed choices that have reduced the number of critical events, but putting out fires remains too large a part of the job. Many of those fires could be avoided if the right information had been available. This is precisely what engineering analysis and predictive modeling provide.
Engineering analysis and predictive modeling are essential tools in a data center operator’s fight against downtime. Data provided from modeling different scenarios in a Virtual Facility can provide crucial information that is simply not available using any other method. In addition, calculating your Availability, Capacity and Efficiency (ACE) Data Center Performance Score provides a simple way to analyze, compare and communicate the effect different options have on your very complex system.
Understanding Predictive Modeling
Predictive modeling is the process of using a computer model to derive information about the future state of a system, in this case a data center. This computer model is based on the mathematical descriptions of the physical components within the data center. While this sounds like a complicated premise, in reality you run models of this kind in your head all the time. For example, you know that to work out what the current draw will be in a particular rack after an installation, you just add the current draw of the new server to the existing current draw of the rack. That is a predictive model. You analyze this model to predict whether the breakers can handle the extra current when the server is installed and, if not, you act accordingly.
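The rack-current reasoning above can be written down as a few lines of code. This is a minimal sketch for illustration; the breaker rating, derating factor and current draws are made-up values, not figures from the article:

```python
# Predict whether a rack's breaker can handle an additional server.
# The predictive model is simply: future draw = existing draw + new draw.
# All numbers below are hypothetical, for illustration only.

BREAKER_RATING_A = 30.0   # breaker rating in amps (assumed)
SAFETY_FACTOR = 0.8       # common practice: load breakers to ~80% of rating

def breaker_ok(existing_draw_a: float, new_server_draw_a: float) -> bool:
    """Return True if the predicted future draw stays within the derated limit."""
    predicted_draw = existing_draw_a + new_server_draw_a
    return predicted_draw <= BREAKER_RATING_A * SAFETY_FACTOR

print(breaker_ok(18.0, 4.0))   # True: 22 A fits under the 24 A derated limit
print(breaker_ok(22.0, 4.0))   # False: 26 A exceeds the 24 A derated limit
```

If the check fails, you act accordingly before the install, which is exactly the "predict, then decide" loop the Virtual Facility automates for the far harder problem of airflow.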
Unfortunately, air is not as well behaved as electric current. The rules for working out how it flows around a data center are many times more complicated than a human can work out in their head. Luckily, the complicated rules needed to predictively model airflow are exactly the kind of math that a computer loves to chew through. This is where the Virtual Facility comes in.
The Virtual Facility is an accurate 3D computer model of the data center that is built by data center engineers at a predictive modeling firm and is then maintained by the facility staff. It will integrate with your current data center management toolset, be that one or more spreadsheets, or a full data center infrastructure management (DCIM) suite. The powerful computing engine built into the Virtual Facility, using well understood models of the different physical properties of air (temperature, pressure etc.), takes that 3D model and uses it to work out the conditions and the state of the cooling system. This data is then presented back in an easy-to-understand way.
Figure 1. Take the laws of physics and the future configuration of the data center, then put them into a 3D model of the facility. You end up with a model that shows the impact of future configuration changes on your data center’s performance.
One of the major benefits of this system is that the Virtual Facility can be set up in any way and the conditions calculated and analyzed. This means that an operator can investigate any scenario the data center might be faced with. This can include future IT layouts, cooling failure scenarios, or energy efficiency measures. An accurate Virtual Facility gives the operations team a window into the future – they can visualize the consequences of changes in the facility on the availability of IT, the efficiency of the cooling delivery and the remaining physical capacity, before they actually take place.
The ACE Data Center Performance Score
At a high level, the data center is a trade-off between three intertwined variables: availability of IT, physical capacity and cooling efficiency (ACE). Running simulations in a Virtual Facility generates a great deal of data, and analyzing the information through an ACE score will help to interpret it. An ACE score takes the information provided by the Virtual Facility and condenses it into three numbers and a triangle that easily conveys the most important information:
1. Availability – How much of the IT load remains available under all designed failure conditions. Expressed as a percentage; a score below 100% indicates a single point of failure somewhere in the cooling delivery system.
2. Capacity – The amount of the design capacity that is available for use. This looks into the future of the data center and tells you how far you will be able to load your racks towards the design value before hot spots prevent you from going further.
Figure 2. An example of a data center’s current ACE score and the changes that would occur if a new cabinet layout were implemented. Data from the Virtual Facility as interpreted by the ACE Score shows a dramatic decrease in IT availability and physical capacity without any benefit to cooling. By modeling the layout in the Virtual Facility, the company avoids making a costly error.
3. Efficiency – The efficiency of the cooling delivery. The ACE score takes the data center’s previously established efficiency metric (Power Usage Effectiveness, or perhaps something more customized), and adds to the mix how efficiently air is being delivered to the IT. See Figure 2 for an example of an ACE Score.
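The three components above can be pictured as a simple rollup of Virtual Facility results. The sketch below is purely illustrative: ACE is Future Facilities' metric, and these formulas are assumptions about its general shape (percentages for availability and capacity, PUE for efficiency), not the company's actual calculation:

```python
# Hypothetical sketch of condensing simulation results into three ACE-style
# numbers. The formulas are illustrative assumptions, not the real ACE math.

def ace_score(available_load_kw: float, total_load_kw: float,
              usable_capacity_kw: float, design_capacity_kw: float,
              it_power_kw: float, facility_power_kw: float) -> dict:
    # Availability: share of IT load that survives all designed failures
    availability = 100.0 * available_load_kw / total_load_kw
    # Capacity: share of design capacity usable before hot spots intervene
    capacity = 100.0 * usable_capacity_kw / design_capacity_kw
    # Efficiency: here, plain Power Usage Effectiveness (facility / IT power)
    pue = facility_power_kw / it_power_kw
    return {"availability_%": round(availability, 1),
            "capacity_%": round(capacity, 1),
            "pue": round(pue, 2)}

print(ace_score(available_load_kw=450, total_load_kw=500,
                usable_capacity_kw=700, design_capacity_kw=1000,
                it_power_kw=500, facility_power_kw=900))
# → {'availability_%': 90.0, 'capacity_%': 70.0, 'pue': 1.8}
```

Condensing the simulation output this way is what makes the result communicable: three numbers are far easier to put in front of stakeholders than a full CFD solution.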
Data center operators rarely get to decide what equipment is going into the building. By utilizing the Virtual Facility and analyzing the results from predictive modeling and the ACE score, they can enjoy greater control. By being able to predict the impact of proposed changes on IT availability, cooling efficiency and long-term capacity, and then communicate that information back to the stakeholders, it becomes a business decision to either accept any issues or work to find a solution.
In the operations team of most data centers, the focus is almost exclusively on the availability of the IT: a data center engineer's job depends on there being no loss of IT service in the facility due to a failure in the infrastructure. In real terms, this means ensuring that every piece of IT hardware in the facility can get enough power and enough cooling under all circumstances. The challenge is implementing a system to ensure that this happens.
Figure 3: Engineering Analysis and Predictive Modeling with a Virtual Facility: Before (top) and After CRAH Failure (bottom). In this example, the equipment in the rack is cooled by a single CRAH (top image). The operator wishes to know whether it would remain resilient were that CRAH to fail. However, this model shows how the failure of the feeding unit (bottom image) would result in the storage rack pulling in hot air between gaps in the containment. This hot air (red and orange streamlines) originates from the exhausts of the IT in the racks opposite! This could result in hard drive failures, shutdown of the unit or loss of data. In this case, the operator discovered this only because they had predictively modeled it in the Virtual Facility – remedial action can be taken before downtime occurs.
Engineers strive to provide each piece of IT equipment with two sources of power, and there will be schematic single line diagrams for the entire power system that can be inspected to see any pinch points or single points of failure. The impact of any failure or maintenance on elements in the power system can be seen. On the cooling side, there are usually a number of redundant cooling units (maybe N+1 or N+2), which means that some CRAHs (computer room air handlers) can be lost and heat will continue to be removed from the facility. However, this does not always guarantee that the cooling can still be delivered to all the IT. What is truly needed is a line diagram for the airflow, which is exactly what the Virtual Facility provides. The Virtual Facility not only models temperature and pressure, but also models where the air goes. This means that it can be used to visualize how IT equipment is being cooled and identify any potential pinch points or points of failure. See Figure 3 for a visual example of how a Virtual Facility tracks airflow from the IT inlet back to the source.
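The bulk-capacity reasoning behind N+1 redundancy can be sketched as a simple check: enumerate every single-CRAH failure and verify that the remaining units still cover the heat load. The unit capacities below are hypothetical, and, as the article stresses, passing this check does not guarantee that air is actually delivered to every rack; that is precisely the gap the Virtual Facility's airflow "line diagram" fills:

```python
# Bulk-capacity check for redundant cooling: after ANY single CRAH failure,
# does total remaining capacity still cover the heat load?
# Note: this says nothing about where the air goes - a facility can pass
# this check and still starve individual racks, as airflow modeling reveals.

def survives_any_single_failure(crah_capacities_kw: list, heat_load_kw: float) -> bool:
    total = sum(crah_capacities_kw)
    # Check every single-unit failure scenario
    return all(total - c >= heat_load_kw for c in crah_capacities_kw)

crahs = [100, 100, 100, 100]                     # four hypothetical 100 kW units
print(survives_any_single_failure(crahs, 300))   # True: 300 kW remains after any failure
print(survives_any_single_failure(crahs, 350))   # False: only 300 kW remains
```

This is the cooling analogue of inspecting a single-line power diagram: it finds gross shortfalls, while the Virtual Facility finds the delivery-path pinch points that raw capacity numbers hide.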
Conclusion
Working in the operations team in a modern, mission critical data center is a difficult job. With no control over the equipment being installed, yet with complete responsibility for the infrastructure supporting it, staff are placed in a difficult position while trying to maintain 100 percent uptime. This in turn has a dramatic effect on the running efficiency and long term capacity of the data center. Implementing an engineering analysis on the results of predictive modeling and the ACE performance score through the use of a Virtual Facility is the best way to regain control and start actively managing IT service availability. The ACE score provides an invaluable way to communicate complex engineering conclusions to all the stakeholders in the data center, illustrating to IT the consequences of their actions in an easy-to-understand format, allowing both groups to work together toward the goal of 100 percent IT service availability.
For more information contact Future Facilities.