Thoughts on British ICT, energy & environment, cloud computing and security from Memset's MD
Normally our data centres have plenty of over-capacity in their air-conditioning systems. Cooling is one of the big design challenges for a data centre: each of our 1 metre square racks draws around 4 kW, all of which ends up as heat. That is roughly the same as four electric fire bars, and standing behind a rack is positively toasty! Believe it or not, we are fairly conservative in how densely we stack the servers, too; a rack full of blade servers might easily double that figure.
The record temperatures this month have caused problems, though. When the outside air temperature increases it becomes harder for the air-con units to dump heat: the external units have to be hotter than the ambient air in order to shed heat into it, and the problem is compounded by the fact that the space they are trying to cool is itself being warmed by the weather. When temperatures spiked to well over 30 degrees Celsius earlier this week, one overworked air-conditioning unit at our Fareham site failed. The data centre team was swift to respond and the unit was back up and running within an hour. Under normal circumstances, losing one unit would have left us well within our safety margin of over-capacity. This time, with the remaining units running less efficiently and the ambient temperature generally higher, there was not quite enough cooling for that brief period.
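To make the arithmetic concrete, here is a rough sketch of how a comfortable margin can vanish. Every number in it is an assumption chosen for illustration (five units rated at 30 kW each, a 100 kW heat load, and a 20% loss of capacity on a very hot day); none of them are our real figures, and the function name is made up for the example.

```python
def cooling_headroom_kw(units, rated_kw, derate, load_kw):
    """Spare cooling capacity in kW; a negative result means a shortfall."""
    # Each unit delivers its rated capacity less the hot-weather derating.
    return units * rated_kw * (1.0 - derate) - load_kw


# Illustrative assumptions only, not measured values.
LOAD_KW = 100.0       # heat produced by the servers
RATED_KW = 30.0       # rated capacity of one air-con unit
HOT_DAY_DERATE = 0.2  # fraction of capacity lost on a very hot day

scenarios = [
    ("mild day, all units running", 5, 0.0),
    ("mild day, one unit failed", 4, 0.0),
    ("hot day, all units running", 5, HOT_DAY_DERATE),
    ("hot day, one unit failed", 4, HOT_DAY_DERATE),
]

for label, units, derate in scenarios:
    headroom = cooling_headroom_kw(units, RATED_KW, derate, LOAD_KW)
    print(f"{label}: {headroom:+.1f} kW headroom")
```

With these made-up numbers either problem alone still leaves spare capacity; it is only the combination of a failed unit and a hot day that tips the balance into a shortfall.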
The result was a small rise in the building’s internal temperature, which then compounded itself. As the temperature crept up, the hotter-running servers had to increase their fan speeds to keep cool, and hence used more energy. On top of that, CPUs tend to become less efficient as they heat up, again using more power. More power usage means more heat generation, and suddenly you have a positive-feedback loop, although thankfully quite a slow-acting one.
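The feedback loop described above can be sketched as a toy simulation. This is a deliberately crude model under stated assumptions, not a description of our actual thermal behaviour: the load, cooling capacity, the 1% extra power per degree and the room's heat capacity are all invented, and `simulate` is a hypothetical helper.

```python
def simulate(base_load_kw, cooling_kw, gain_per_degree, minutes,
             heat_capacity_kwh_per_degree=2.0):
    """Return the room's temperature rise (degrees C) after each minute."""
    rise = 0.0
    history = []
    for _ in range(minutes):
        # Faster fans and warmer CPUs draw more power as the room heats up.
        load_kw = base_load_kw * (1.0 + gain_per_degree * rise)
        # Heat that the cooling cannot remove goes into warming the room.
        surplus_kw = load_kw - cooling_kw
        step_hours = 1.0 / 60.0
        rise += surplus_kw * step_hours / heat_capacity_kwh_per_degree
        rise = max(0.0, rise)  # never model cooling below the starting point
        history.append(rise)
    return history


# Illustrative assumptions: 100 kW of servers, 95 kW of cooling available.
no_feedback = simulate(100.0, 95.0, 0.0, 60)
with_feedback = simulate(100.0, 95.0, 0.01, 60)

print(f"rise after an hour without feedback: {no_feedback[-1]:.2f} C")
print(f"rise after an hour with feedback:    {with_feedback[-1]:.2f} C")
```

Without feedback the temperature climbs at a steady rate; with it, each minute's rise is slightly larger than the last, which is the signature of a positive-feedback loop. With a small gain like the one assumed here the acceleration is gentle, in keeping with the slow-acting loop we saw.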
Thanks to the swift response no serious harm was done. However, one of our busier machines did manage to pull a whopping 400 watts and helped to trigger a power trip which, frustratingly, caused an unscheduled reboot for the handful of servers on that power bar.
Along with increasing energy costs and a moral responsibility to battle climate change, this sort of technical consideration in the face of ever hotter summers is yet another reason why IT hardware and infrastructure providers need to have energy firmly on the agenda. We certainly do.