Virtualisation and the crowded elevator
Over the last few weeks I’ve been sitting in on Sun’s advanced administrator courses in a “Train the Trainer” capacity, so that I can deliver them to the next batch of unsuspecting students. Given that virtualisation is a hot topic at the moment, it’s not surprising that there is demand for courses covering these technologies. Right now, I’m all over topics such as Solaris Containers, LDoms on the CMT servers, and hard partitioning on the high-end platforms. Although these are technical courses aimed at hard-core Solaris admins, I figured I’d throw in a quick overview of how Virtualisation saves money, even if it’s just so the engineers can explain to their managers that it’s more than just really cool technology, and to justify being sent on more courses.
The classes are run in a third-party training facility, which provides the rooms, desks, network connectivity and break-out areas replete with lukewarm drip-filter coffee. By keeping an ear open on the other classes running concurrently, I’ve learnt about the next revolution in chocolate marketing, the funky electronic control systems that the process control engineers are mastering, and the pending successes and fortunes of the “Yes You Can!” self-helpers as they are visualised from a prone position on the floor of their darkened room.
Last Friday was a little different. Nearly every room was booked by the proctors of the English Proficiency Test that overseas medical professionals must pass in order to practise.
The facilities provider was, naturally enough, pretty happy to be fully booked. I was slightly less happy, as the exam candidates drank all the drip-filter coffee, and in order to go outside to grab my double shot skinny latte with one, I had to queue with this mass of humanity for one of the two functioning lifts, accompanied by all the pushing and shoving that you would expect from an overseas doctor facing a must-pass English exam.
Anyway, back to Virtualisation, and a quick reminder of how it saves you money. Feel free to skip the following paragraph if numbers scare you.
Somebody somewhere determined that your typical datacenter server has a CPU utilisation of approximately 10%. This is a nice round figure, so let’s run with it. Depending on the technology, the hypervisor or host OS or other “shim” will add an overhead of between 5% and 15%. Let’s call it 10%, because it’s another nice round number. That leaves 90% of your CPU cycles for guest operating systems, and knowing that each one only needs 10% of a full CPU, we can consolidate our server count down by 9-to-1. This reduces datacenter rack space, power and cooling costs, server maintenance costs, and, if you standardize platforms, management overhead as well. These are the sort of figures that everybody wants to hear in these tough economic conditions, with the possible exception of the server vendors.
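If that arithmetic went by too quickly, here’s the same back-of-the-envelope sum written out as a few lines of Python (the percentages and the 90-server estate are the same illustrative round numbers, not measurements):

```python
# Back-of-the-envelope consolidation arithmetic, using the round numbers above.
# Both percentages are illustrative assumptions, not benchmark results.
guest_cpu_demand = 10        # % of one CPU each existing server actually uses
hypervisor_overhead = 10     # % of the host CPU lost to the virtualisation layer

usable_capacity = 100 - hypervisor_overhead             # 90% of the host remains
guests_per_host = usable_capacity // guest_cpu_demand   # 9 guests fit per host

servers_today = 90                                      # hypothetical server estate
hosts_needed = -(-servers_today // guests_per_host)     # ceiling division -> 10 hosts
print(f"{servers_today} servers consolidate onto {hosts_needed} hosts "
      f"at {guests_per_host}-to-1")
```

Of course, the sum only counts CPU; as we’re about to see, the other resources in the box don’t divide up quite so neatly.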
If that last paragraph left your head spinning, allow me to paraphrase (steal) an analogy I heard during a virtualisation pitch. (I can’t remember from which vendor, so can’t attribute it, but please let me know if it’s yours):
“Imagine what the CEO would say if you’d authorised the purchase of a ten storey building, but then only used the ground floor? What if you bought nine such buildings and only used the ground floor in each? You’d be sacked, right? So why is it acceptable to only use 10% of the server assets you’ve purchased?”
Here the office building is analogous to a server, an individual floor to a single application, and floor space to CPU resources. Because we are only occupying one floor, 90% of the building is going to waste. If we had nine such buildings, we could move everybody into one and sell the other eight to developers for swank inner-city apartments, reducing our land-tax bill, council rates and other ongoing expenses.
Let me stretch the analogy a little bit.
The front foyer door is the connectivity to the outside world. Everything coming in and out of the building must pass this way. The internal elevators move everybody from the foyer to their particular floor, where they congregate in functional groups – accounting, sales, marketing etc.
The front foyer obviously represents our connectivity to the outside world: our IP network, FC disk and tape. The internal lifts form the physical I/O subsystem: the internal buses, ASICs and drivers that get the information in and out of the physical server. The people working in the building are data, and their functional areas are applications.
Now, let’s look back at the training centre during last week’s peak. While the building (CPU) was fully utilized with many concurrent exams (applications) spread across multiple Floors (Virtual Machines), there was a bottleneck in the lifts (I/O subsystem) while the examinees (data) were loaded in. This was fine for the testing (a load and run batch process), but prevented me from getting to my coffee (a real-time process unable to access external resources).
If you’ve ever come between me and coffee, you understand how critical this situation really is.
The upshot of all this?
While virtualisation technologies can be used to consolidate several physical machines, reducing business costs such as server hardware maintenance, power and cooling, it’s not always as simple as marketing suggests.
- Understand that CPU is just one resource inside a server; different applications use different resources in different ways, and all of them must be accommodated.
- Dig out the block diagram of your server. Find out how the PCIe buses are distributed. Look for different ports and populated slots sharing ASICs. Look for bottlenecks in the backplane or I/O bus. Move HBAs, CNAs and network connections to minimize conflict.
- Recent servers tend towards more independent I/O channels with wider paths, but CPU speeds, core counts and multithreading technologies are developing much faster than I/O technologies. While a CPU upgrade may be as simple as dropping in a faster chip, improving I/O requires the entire mainboard to be redesigned, so be aware that these issues may reappear as older models get longer in the tooth.
- Run up a test platform with a production-like instance of the applications you want to co-host, and stress-test those that are likely to run simultaneously. Remember week-end, month-end and year-end cycles and test as if they all fall on the same day.
- Test your I/O – your network responsiveness, your disk access, your backup and archiving procedures (a rough disk-throughput sketch follows below).
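On the disk side of that last point, even something as crude as the following will tell you whether co-hosted guests are fighting over the same storage. It’s a minimal sketch rather than a proper benchmark: the file path and size are made-up values to adjust for your environment, and a purpose-built load-testing tool will give you far better numbers.

```python
# Crude sequential disk throughput probe -- a smoke test, not a benchmark.
# TEST_FILE and SIZE_MB are placeholder values; point them at the storage you
# actually care about, and size the file larger than guest memory, otherwise
# the read pass will mostly be served from the filesystem cache.
import os
import time

TEST_FILE = "/var/tmp/io_probe.dat"
SIZE_MB = 512
CHUNK = b"\0" * (1024 * 1024)      # write in 1 MiB chunks

start = time.time()
with open(TEST_FILE, "wb") as f:
    for _ in range(SIZE_MB):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())           # push the data to disk, not just the cache
write_secs = time.time() - start

start = time.time()
with open(TEST_FILE, "rb") as f:
    while f.read(1024 * 1024):
        pass
read_secs = time.time() - start

os.remove(TEST_FILE)
print(f"write: {SIZE_MB / write_secs:.1f} MB/s  read: {SIZE_MB / read_secs:.1f} MB/s")
```

Run it inside each guest while the others are under load; if the figures collapse compared with running it on an idle box, you’ve found your lift queue.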
Finally, consolidation is but one aspect of the virtualisation story, so even if your key applications aren’t a target for consolidation, there may be other reasons to bring it into the virtual world.
I’m off for a coffee.