Kate's Comment

Thoughts on British ICT, energy & environment, cloud computing and security from Memset's MD

Why We Don’t Need Icelandic Data Centres

Better cooling is definitely one reason why data centre operators are attracted to colder climates like Iceland, but there is a hole in this argument. Whilst free cooling is very efficient, adiabatic cooling works really well in most mainland European climates and allows you to achieve almost the same efficiency.

In mainland Europe and the UK it never gets hot and humid at the same time. On a hot summer's day in the UK, 35°C for example, the wet-bulb temperature (the lowest temperature you can reach by evaporating water into the air) never actually gets above about 23°C. Even accounting for global warming, a wet-bulb temperature of 25°C really is the maximum you need to design for.

Adiabatic cooling takes advantage of this by spraying water into the dry air coolers (big radiators) located on the roof. The evaporation normally buys you about a 10°C reduction, enabling you to maintain a temperature of 35°C even on the hottest summer day.

We’ve designed, built and operate a data centre that makes use of that theory, which is also recommended under the ASHRAE guidelines. We have no need of back-up DX cooling (traditional compressor-based air conditioning) except for our plant room. The reason is that we know the maximum wet-bulb temperature will never go above 25°C, so worst case we can hold about 35°C, and modern servers are warranted up to 45°C – therefore we have 10°C of headroom even in the event of extreme temperature excursions. The plant room needs to be kept at a constant 20–22°C because of the UPS batteries, so if you're planning a new data centre, don't put your UPSes in the same space as your servers.
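To sanity-check that headroom arithmetic, here is a minimal back-of-the-envelope sketch in Python. The 25°C design wet-bulb, the roughly 10°C adiabatic approach and the 45°C server warranty limit are the figures quoted above; treat it as an illustration rather than a design calculation.

```python
# Back-of-the-envelope check of the cooling headroom described above.
# Figures come from the post; this is an illustrative sketch, not a design tool.

MAX_WET_BULB_C = 25.0            # worst-case wet-bulb assumed for the UK / mainland Europe
ADIABATIC_APPROACH_C = 10.0      # assumption: roughly how far above wet-bulb an
                                 # adiabatically-assisted dry air cooler can hold supply
SERVER_WARRANTY_LIMIT_C = 45.0   # inlet temperature modern servers are warranted up to

worst_case_supply_c = MAX_WET_BULB_C + ADIABATIC_APPROACH_C
headroom_c = SERVER_WARRANTY_LIMIT_C - worst_case_supply_c

print(f"Worst-case supply air temperature: {worst_case_supply_c:.0f} °C")
print(f"Headroom before the warranty limit: {headroom_c:.0f} °C")
# -> 35 °C supply and 10 °C of headroom, matching the figures in the post
```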

Letting the data centre run hot, with the occasional excursion up to 35°C on a few days of the year, brings significant cost savings – and not just in needing less DX cooling kit. If you have a higher temperature differential between your dry air coolers and the ambient air, the waste heat is transferred more quickly. The cooling system therefore does not have to work as hard overall, e.g. the fans and water pumps can run slower. The energy required to move air (or water) tends to go up as the square of velocity (in a turbulent system, which these are), so everything is much more efficient when running slowly.
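To see why slower fans and pumps pay off so handsomely, here is a toy comparison. It treats the speed-to-power exponent as a parameter: the post quotes a square-of-velocity relationship for the energy needed to move air, while fan affinity laws are usually quoted with shaft power scaling as the cube of speed.

```python
# Toy illustration of why running fans and pumps slower saves so much energy.
# The post quotes a square-of-velocity relationship; classic fan affinity laws
# put shaft power at the cube of speed, so the exponent is left as a parameter.

def relative_power(speed_fraction: float, exponent: float = 2.0) -> float:
    """Power draw relative to full speed, assuming power ~ speed**exponent."""
    return speed_fraction ** exponent

for exponent in (2.0, 3.0):
    for speed in (1.0, 0.8, 0.6):
        print(f"exponent={exponent:.0f}  speed={speed:.0%}  "
              f"relative power={relative_power(speed, exponent):.0%}")
# e.g. at 60% speed the fans draw ~36% (square law) or ~22% (cube law) of full power
```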

Overall we expect our data centre to operate at a PUE of roughly 1.2 when full. Given that even the best free-cooled data centre is unlikely to manage better than a 1.1 PUE, we think it's worth the extra 10% on our power bill. In addition, being located in Dunsfold Park in Surrey, which houses a large-scale solar farm, means our data centre is partially solar powered, which more than makes up for the 10% difference.
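For the PUE comparison, the arithmetic looks like this (the 1 MW IT load is an arbitrary illustration; the PUE figures are the ones quoted above):

```python
# What the PUE difference means for the power bill, for an arbitrary 1 MW IT load.
# PUE = total facility power / IT equipment power.

it_load_kw = 1000.0      # illustrative IT load, not a Memset figure
pue_adiabatic = 1.2      # expected PUE of the data centre described above, when full
pue_free_cooled = 1.1    # roughly the best a free-cooled facility manages

total_adiabatic_kw = it_load_kw * pue_adiabatic
total_free_cooled_kw = it_load_kw * pue_free_cooled
extra_fraction = total_adiabatic_kw / total_free_cooled_kw - 1

print(f"Adiabatic design draws {total_adiabatic_kw:.0f} kW vs "
      f"{total_free_cooled_kw:.0f} kW free-cooled "
      f"(~{extra_fraction:.0%} more total power)")
# -> roughly 9% more, i.e. the ~10% premium mentioned in the post
```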

Memset’s data centre was built to the old CESG IL3 standards (IL4 for physical security) and they signed off on the resilience and architecture based on our designs & recommendations at the time. It was encouraging to find them so enlightened given conventional wisdom.

 

Toasty servers

Most people think there is an increase in hardware failures at higher temperatures. The components with the highest Annual Failure Rate (AFR) are rotary hard disks (HDDs): we typically see an AFR of about 2% for HDDs younger than three years, compared with roughly 0.5% for SSDs. Google did a study of a population of 100,000 HDDs and surprisingly found that HDDs actually have a lower AFR at higher temperatures, up to about 45°C (see chart). HDD failures are not generally a big issue for us anyway, since on our VPS and dedicated servers everything is at least RAID-1 and hot-swap, and on our OpenStack and Memstore platforms all data is triplicated.

HDD AFR temperature distribution – Google
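To put those AFR figures in context, here is what they mean for a fleet; the fleet size below is invented purely for illustration.

```python
# Expected annual drive replacements for a hypothetical fleet, using the AFR
# figures quoted above. The fleet size is invented for illustration only.

fleet_size = 10_000
afr = {"HDD (<3 years old)": 0.02, "SSD": 0.005}

for drive_type, rate in afr.items():
    expected_failures = fleet_size * rate
    print(f"{drive_type}: ~{expected_failures:.0f} failures/year "
          f"in a fleet of {fleet_size}")
# With RAID-1 / triplicated data, individual failures at these rates are routine events
```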

Server chassis (CPU, PSU, etc.) failures are potentially more disruptive to our customers. We typically see an AFR of 0.9% for machines under three years old. The University of Toronto did a study of server failure correlations with temperature spanning over a dozen data centres across three organisations. They found that RAM and node failures showed no overall correlation with temperature, and that the AFR of server internals increased only linearly with temperature up to about 50°C, contrary to the generally accepted wisdom that failure rates climb much more steeply.

If you're planning on operating at a higher temperature you do need to adjust the behaviour of your servers, though. They generally tend to crank up the fans as soon as the temperature rises a little, and because server fans are very small they have to spin very fast to move a useful amount of air. This means they can use a quite astonishing amount of power (easily 20% of total server energy draw). You need to tweak the BIOS fan settings to modify this behaviour, otherwise the fans will negate the positive effects.
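Exactly how you tame the fan curves depends on the vendor's BIOS/BMC, but once you have changed the settings it is worth checking the effect across the estate. Below is a minimal monitoring sketch; it assumes ipmitool is installed and that the BMC exposes readings via "ipmitool sdr type Fan" and "ipmitool sdr type Temperature" (sensor names and output formats vary by vendor, so the parsing is indicative only).

```python
# Rough check that fans aren't spinning flat-out at modest inlet temperatures.
# Assumes ipmitool is installed and the BMC exposes standard SDR records;
# sensor names and output formats vary by vendor, so treat this as a sketch.
import subprocess

def read_sdr(sensor_type: str) -> dict[str, str]:
    """Return {sensor name: reading} for one SDR sensor type, e.g. 'Fan'."""
    out = subprocess.run(
        ["ipmitool", "sdr", "type", sensor_type],
        capture_output=True, text=True, check=True,
    ).stdout
    readings = {}
    for line in out.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 5 and fields[2] == "ok":
            readings[fields[0]] = fields[4]   # e.g. "5400 RPM" or "24 degrees C"
    return readings

if __name__ == "__main__":
    for name, value in read_sdr("Temperature").items():
        print(f"{name}: {value}")
    for name, value in read_sdr("Fan").items():
        print(f"{name}: {value}")
```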

 

Latency

For most services, cloud-based or otherwise, latency is a big factor. Ideally, you want your data centre to be close to your customers; generally, being on the same major landmass is a good rule of thumb. That's why you don't tend to find US-based data centres serving content to UK businesses or customers. Amazon's Ireland-based data centres are an exception.

Some services are more latency-sensitive, especially gaming. If you're in the UK, having your data centre located near a major Point of Presence (PoP) like London will help keep latency under 20ms.

When you start having to use undersea links or go to remote locations (colder countries like Iceland, for example), you definitely have to factor in a big latency difference. Even when browsing a website, a human can perceive a delay of 50 milliseconds, especially with modern, complex, image-heavy web pages.
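For a feel of how much of that is down to sheer distance, here is a sketch of the propagation-delay floor. Light in optical fibre travels at roughly two-thirds of its vacuum speed (about 200,000 km/s); the route lengths below are rough assumptions, and real fibre paths plus routing add more on top.

```python
# Lower bound on round-trip time from fibre propagation alone.
# Route lengths are rough assumptions; real paths are longer and routers add delay.

SPEED_IN_FIBRE_KM_PER_MS = 200.0   # ~2/3 of the speed of light in a vacuum

def rtt_floor_ms(route_km: float) -> float:
    """Best-case round-trip time for a given one-way fibre route length."""
    return 2 * route_km / SPEED_IN_FIBRE_KM_PER_MS

routes_km = {
    "Surrey -> central London (Telehouse)": 60,           # rough assumption
    "London -> Reykjavik (subsea + terrestrial)": 2000,   # rough assumption
}
for route, km in routes_km.items():
    print(f"{route}: >= {rtt_floor_ms(km):.1f} ms round trip")
# The Iceland hop alone eats ~20 ms before any routing or server time
```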

That’s not to say that you need to be right in the middle of an urban centre – that just adds unnecessary costs. Out here in rural Surrey, outside the M25, we get about a 1.5 millisecond latency back to central London hubs (Telehouse etc).

 

Risking Power Supply

The advantage of being in relatively built-up areas is the mature and resilient power infrastructure you have access to. It is a gross generalisation, but common sense, that the further you get from major population centres (and the further north you go, the sparser the population becomes), the worse your power connectivity tends to be.

This is slightly mitigated by the fact that backup power systems are good, but any data centre operator knows you don't want to be using them unless you have to. If you are having to deal with one or two brown-outs a month, you are elevating the risk factor, whereas near urban areas you'd be unlucky to see two brown-outs a year.

 

In Summary

In summer, you don't need to go somewhere cold. You just need to let your data centre run hot and spray some water into your dry air coolers. That way you avoid the potential risks around your power supply and you don't have to compromise on latency.
