Most of us are familiar with the quote that “if you cannot measure you cannot manage”. In all fields, spanning technology and management, a set of metrics are established to measure against stated objectives. The metrics should tell the stakeholders about how the system is performing. The metrics on a business can be from several different perspectives: financial, customer satisfaction, environmental impact etc. Just one aspect, such as financials, do not tell the whole story. If the board of a company looks at only the financial aspect ignoring other areas, it may be myopic. Today a company may be doing fine from financial metrics such as EPS, revenue, profitability numbers. However, if customer satisfaction index and its brand value due to environmental impact are poor, it doesn’t augur well for the company. Similarly, a data center needs to be viewed from different angles: cost efficiency, power consumption, reliability, customer satisfaction to make the measurement all rounded.
PUE – is that the only metric needed in a data center?
PUE – Power Usage Effectiveness is the most well-known of all data center metrics. At the core of the data center are the computing units – server, storage, switches, which runs the application, stores the data, and communicates internally/externally. One of the primary cost of running a data center is the power consumed. The power consumption has two components: power consumed by computing units and power consumed by rest of the facilities equipment such as cooling. The PUE is calculated by dividing the total power consumed by the data center with power consumed by the computing units. The lower the PUE the more efficient the data center is. If the PUE of a data center is 2 it means 50% of the power is used by computing units. Now if we can bring down the total power assuming that the power drawn by computing units remain the same, then we have increased the efficiency by reducing the overhead of such functions as cooling.
The importance of PUE cannot be denied and every data center should strive to get it as close to 1 as possible. However, PUE is not the only metric. The data centers have to consider several other metrics. Furthermore, PUE can also be deceptive. For e.g., if one replaces the computing units by something which consumes less power , the total power drawn will be less but PUE will increase. For similar reasons PUE cannot be used to compare data centers. If a data center is running mostly on renewable energy then its impact on environment is marginal even though its PUE may be slightly worse than PUE of comparable data centers running on conventional energy.
Reliability and availability
A data center not only needs to be efficient from a cost and power perspective, it needs to be reliable and available, considering that most data centers are running business critical applications as more and more applications are hosted on the cloud. No customer will tolerate partial downtime, let alone for the whole data center. Hence the metrics which measure reliability and availability are important. The metrics that measure availability for assets such as MTBF (Mean Time between Failures) and MTTR (Mean Time to Repair) are important and should be measured. The other measure of reliability is the number and category of alarms being raised in the data center and how quickly the alarms are being responded to.
A data center needs to be customer centric, gone are the days when a data center ran outside the glare of the core business. Today it is intimately connected with a business whether it is a captive data center or a data center providing facilities for others. A captive data center runs the core business of different LOBs and it needs to respond to the needs of the LOBs. A data center, which provides colocation and hosting services, has to be customer centric in its operations. It has to ensure customer provisioning requests are satisfied and any customer ticket closed with satisfactory SLA. So for data centers, captive or otherwise, compliance with SLA is extremely important and that can be measured by provisioning request or service tickets that fall outside the SLA – percent not meeting SLA. Closely tied with customer satisfaction is the capacity of a data center. As long as the data center has sufficient capacity in terms of power, cooling and resources it will be able to service provisioning request quickly. Hence measuring the capacity at all times is paramount for a data center.
I had recently hosted a panel discussion on data center metrics and the panelists pretty much concluded that metrics is extremely important for a data center operations and the metrics need to be viewed for the different areas, as outlined above. Also with the availability of DCIM software from companies such as Greenfield it is easy to capture and view these metrics on a real time basis. Greenfield’s software GFS Crane provides dashboard with key metrics such as PUE, availability, capacity utilization etc. In addition, one can have drill down reports to see a granular view. With automation, provided by such software such as GFS Crane, it is easy to stay on top of things and react with agility as situation changes or take pro-active steps wherever possible.