“Introduction to the World of Virtualization and Defragmentation Part 2.”
Complete Transcript of
Michael Materie – Rick Cadruvi – Diskeeper Interview
on Let’s Talk Computers
Host Alan Ashendorf
June 6, 2008
Alan: Today on Let’s Talk Computers we will continue our conversation on “Introduction to the World of Virtualization.” Our guests today are Michael Materie, Spokesperson for Diskeeper Corporation, and Rick Cadruvi, Senior Software Engineer with Diskeeper. Welcome back to Let’s Talk Computers, guys.
Michael: Hi, Alan. Thanks for having us on.
Alan: The last time, we touched on just some of the benefits and why companies and corporations are turning to virtual computers, and you talked about the fact that virtualization is just in its infancy.
Are there any kinds of standards that all companies must adhere to, or is each company trying its own way of virtualization?
Michael: There definitely are a number of common types of virtualization, three that we could cover really quickly. There is the hypervisor type of solution, which would include the VMware ESX technology and the Microsoft Hyper-V functionality, which is coming out within the next few months from Microsoft for Server 2008. More common in the developer arena, or in the gamer or home user arena, are some of the freeware tools like Virtual PC that Microsoft puts out, or the VMware Server product that VMware has, which is free.
Parallels would be a great example of a popular product that Mac users like to use. And then there is application virtualization, which would include companies like Altiris that just virtualize an application itself and kind of sandbox it.
Alan: When you are looking at it from an Admin’s point of view, I know that you have graphs and you have statistics on where the files are on the hard drive, but when you’re talking about virtualization, you could be running an email server, you could be running an SQL server, all on the same physical box, but each one of those is like a separate virtual process. How can you tell which is being defragged and which needs to be defragged?
Michael: The best way to handle that is to let the software make the decision for you. It takes the administrator’s management overhead or required intervention out of the picture and makes it much simpler to come to a solution.
One of the technologies that we have in, for example, the Diskeeper product is InvisiTasking, which allows the Diskeeper software to run with zero impact on the computer. That is definitely vital in a virtualized arena. It allows Diskeeper, running on, for example, that SQL server’s virtual machine instance, to determine within the logic of the product itself what it needs to do to keep that SQL server operating system running at peak performance.
And on the email server, it allows it to figure out what it needs to do there to keep it at peak performance, as well.
Alan: You’re talking about your InvisiTasking and you’re talking about being “at idle,” which means it’s not going to have any impact on the computer system; but in a virtual world, aren’t we always doing something? So how does it decide when the best time is to jump in and do defragging?
Rick: In the virtual world, you are still going to have some form of idle cycles. How many depends upon the servers that are running, but generally speaking, there will still be some. With InvisiTasking and the work that we’re in the process of doing, we have some capability to look at the resources that are being used and figure out not only what resources are available in one particular instance of a virtual machine, but what resources are available globally, across all the instances and across the server as a whole.
That lets us try to fit in and use just those resources that are available at any one point in time, as opposed to having to work on the basis of the data from a single instance. We take a more global approach. We look at this as a whole, across all of the instances of virtual machines, rather than just on a single server.
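The difference between a per-instance view and the global view Rick describes can be illustrated with a small sketch. This is not InvisiTasking or any Diskeeper code; the function name, the VM names, the usage figures, and both thresholds are hypothetical values chosen only for the example.

```python
# Hypothetical illustration only: decide whether background work should run
# by looking at the whole host, not just one virtual machine.
# Thresholds and usage figures are made-up assumptions.

def should_run_background_work(vm_cpu_share, this_vm, host_limit=60.0, vm_limit=20.0):
    """vm_cpu_share maps each VM name to the percent of host CPU it is using."""
    host_total = sum(vm_cpu_share.values())
    vm_is_quiet = vm_cpu_share[this_vm] < vm_limit
    host_is_quiet = host_total < host_limit
    return vm_is_quiet and host_is_quiet

busy_host = {"email": 5.0, "sql": 70.0}
quiet_host = {"email": 5.0, "sql": 10.0}

# The email VM looks idle in both cases, but only the second host has room.
print(should_run_background_work(busy_host, "email"))   # False
print(should_run_background_work(quiet_host, "email"))  # True
```

A check that looked only at the email instance would say "idle" in both cases; adding the host-wide total is what changes the answer.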
Alan: I wouldn’t even try to guess how many hours upon hours of testing and benchmarking it has taken to figure out exactly the best way to do this. Just kind of give us an idea of how this new technology came to market?
Michael: As Rick mentioned earlier, it has been a couple of years now that we have been working on the technology, so it’s been on our plate, and it is something that we have been working on closely with Microsoft. We’ve had numerous occasions where we have sat down, face to face, with their lead product managers over their virtualization technologies and figured out how we can really best work together to optimize performance. It’s a back and forth knowledge exchange.
It’s definitely been conducive to our research, and I think we have definitely provided some good information and feedback to Microsoft and the other virtualization vendors that we’re working with. The end result will come from combining our knowledge of systems and our expertise in disk performance with the work of the vendors who provide virtualization, which obviously relies on the disk and on getting peak performance out of it.
I think the end result will be that this infant market of virtualization will really come to fruition and be much more readily adoptable by an administrator who just wants to deploy and not have to micro-manage or figure out all the specifics of whether something will work or not. Basically, they will get a technology that they can really plug and play and get peak performance.
Alan: Can you share some of the worst mistakes that somebody makes when they are going in to try to virtualize a server? What kinds of things do you not want to do?
Michael: That’s actually one of the trickiest parts, especially in the enterprise: figuring out what can be virtualized and what cannot be. VMware, for example, and some of the major vendors out there have really good tools to help the administrator do so, but the key consideration really is, what are the physical resources? You don’t want to put so many virtual machines on a computer that the CPU is always running at 100%. That’s a bad scenario. It will kill every machine on that computer.
So, you have to know what a particular operating system’s requirements are going to be, and the applications within it and what kind of resources they need, and make sure that you use a virtual environment only where it’s not going to overtax the actual physical hardware. So, for an end user, say a gamer for example, they are more than likely only going to be running one virtual machine at a time or so. So, it’s less of a consideration.
But in the Enterprise, you may have multiple running at the same time, all needing to do production type of activity.
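The rule of thumb Michael gives, not loading a host until its CPU is pinned at 100%, can be sketched as a simple capacity check. This is a minimal illustration and not a tool from VMware or Diskeeper; the VM names, the peak CPU figures, and the 80% headroom threshold are assumptions made up for the example.

```python
# Hypothetical capacity check: do the planned VMs fit on one physical host?
# All names, core counts, and the headroom threshold are illustrative assumptions.

def fits_on_host(vm_peak_cpu_cores, host_cores, headroom=0.8):
    """Return True if the summed peak CPU demand stays within the headroom limit."""
    demand = sum(vm_peak_cpu_cores.values())
    return demand <= host_cores * headroom

# Peak number of CPU cores each planned VM is expected to need.
planned_vms = {"email": 2.5, "sql": 3.0, "web": 1.0}

print(fits_on_host(planned_vms, host_cores=8))   # False: 6.5 cores needed, 6.4 allowed
print(fits_on_host(planned_vms, host_cores=16))  # True: 6.5 cores needed, 12.8 allowed
```

The same kind of check would apply to memory and disk throughput; CPU is used here only because it is the resource named in the answer.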
Alan: From a gamer’s standpoint, being in a gamer’s world is completely different than anything else that we have seen out there. How do they use virtualization?
Rick: I would say for the most part, it’s probably not being used for gaming. The key consideration is that one of the key concepts of virtualization is to emulate hardware. A gamer’s going to spend $500 to $1000 plus just on their video card. That high-end video card is not going to be virtualized or emulated exactly to its full specifications in a virtual environment. More than likely, the games with high-end graphics requirements are going to be run right on the bare metal. There’s still some technology that can be added to virtualization for the gaming kind of needs.
Alan: Well, when we did an interview with AMD, talking about their new Spider technology, the motherboard that they suggest runs four video cards at the same time, can have massive amounts of hard drive and massive amounts of memory, and all of this has to be watched over to make sure that everything is running smoothly, doesn’t it?
Rick: Those kinds of levels of resources become a nightmare for the administrator, the person who is administering the thing, because you have a huge amount of resources and you have environments that want to use those resources, and you have to make them available and you have to think about the consequences of the use of those resources.
Alan: It’s like a common mistake or fallacy that people have that, “If I start using massive amounts of hard drive space, like Terabytes of hard drive space, especially in a virtual environment, I really don’t have to keep my machine defragged because there’s plenty of free space on the drive. I’m never going to have to worry about it.”
Michael: That may be a common misunderstanding of how file systems allocate file writes, given free space or lack of free space. Even if there’s a big, huge chunk of, say, a gigabyte of free space, there is no guarantee that a virtual machine on the hard drive is going to expand into that one gigabyte. It may, unfortunately, expand into a small, little 4K or 8K space directly in front of it.
Fragmentation is really just a fact of life with really any modern file system. Windows is one that, obviously, we have specific attachments to, because that’s what our products are designed for, NTFS and the FAT file systems, but there is no miracle functionality within the virtual environment that obviates the need for defrag.
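Michael’s point, that a growing file can land in small gaps even when a large free region exists, can be shown with a toy model. This is a deliberately simplified sketch and does not reflect how NTFS or FAT actually choose where to write; the function name, the gap positions, and the sizes are assumptions for illustration.

```python
# Toy model: a file grows into free extents in the order they appear on disk.
# This is NOT how NTFS or FAT really allocate space; it only shows why having
# plenty of free space does not by itself prevent fragmentation.

def allocate(free_extents, size_kb):
    """Fill free extents front to back; return the (start_kb, length_kb) pieces used."""
    pieces = []
    remaining = size_kb
    for start, length in free_extents:
        if remaining == 0:
            break
        used = min(length, remaining)
        pieces.append((start, used))
        remaining -= used
    if remaining:
        raise ValueError("not enough free space")
    return pieces

# Two small gaps (4K and 8K) sit in front of a 1 GB free region.
free = [(100, 4), (500, 8), (10_000, 1024 * 1024)]
fragments = allocate(free, size_kb=64)

print(len(fragments))  # 3
print(fragments)       # [(100, 4), (500, 8), (10000, 52)]
```

A 64K write that would have fit many times over in the 1 GB region still ends up in three pieces, because the small gaps were filled first.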
Alan: Is there a difference between, say the NTFS file system and the FAT32 file system, as far as which one needs to be defragged the most?
Rick: NTFS is definitely a more efficient file system in terms of the ability to deal with space, but it’s much more sophisticated and complex. And that level of complexity tends to also add to the level of fragmentation that happens. So, while in general, one can expect better performance from NTFS than from FAT, one can generally also expect a higher level of fragmentation, which means that running some kind of a defragmenter like Diskeeper becomes essential on NTFS file systems.
Alan: I can see why it would have to be essential on any kind of machine that’s going to be running any kind of virtualization. Let’s just take an example: We have a server that is a physical box and it has like an email server that is virtual, and the email server, of course, has to have some kind of anti-virus, anti-threat software running in the background, and the physical box has to have some kind of anti-threat software. And both of them are going to be hitting that hard drive like lightning, aren’t they?
Rick: Yes, and they’re going to be competing for resources and every time they have to hit a file that is fragmented, they are going to be adding significantly more work to be done on the hard drive and that’s going to slow things down and it’s just going to magnify itself when you have like the example you used earlier – an email and an SQL server, both running on the same virtual server, along with what’s going on down on the actual partition for the main operating system for the hypervisor that’s running the virtual systems.
Alan: And I imagine this has a major effect when you’re trying to do backups, because nowadays, you’re trying to do real-time backups to make sure that if anything happens, this oops factor, you can always get back in just as quickly as possible; because if you’re backing up files, you’re reading the whole file into memory to move it over to some place. And fragmentation can really slow things down, can’t it?
Rick: Oh, it can make things grind to a halt, and you’re right, as you add more of those kinds of things on a single system in a virtualized environment, things really grind to a halt. So, it’s absolutely essential that fragmentation be dealt with so that at least when you are trying to read things, you are reading from contiguous areas of the hard drive, as opposed to trying to read from places that are scattered throughout the hard drive, where you’re banging the head from one end to the other and requesting lots of additional I/Os to the hard drive in order to pick up the data, as you describe it, to make it available for backups.
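The cost Rick describes, extra head movement and extra I/Os for every scattered piece of a file, can be put into rough numbers. The sketch below is a back-of-the-envelope estimate, not a measurement; the 10 ms seek time, the 100 MB/s transfer rate, the file size, and the fragment counts are all assumed round figures.

```python
# Rough estimate of how fragmentation inflates read time on a spinning disk.
# seek_ms and mb_per_sec are assumed round numbers, not measurements.

def read_time_ms(file_mb, fragments, seek_ms=10.0, mb_per_sec=100.0):
    """One head movement per fragment plus the sequential transfer time."""
    transfer_ms = file_mb / mb_per_sec * 1000.0
    return fragments * seek_ms + transfer_ms

contiguous = read_time_ms(file_mb=500, fragments=1)
scattered = read_time_ms(file_mb=500, fragments=5000)

print(contiguous)  # 5010.0
print(scattered)   # 55000.0
```

Under these assumptions the same 500 MB backup read takes about 5 seconds when the file is contiguous and about 55 seconds when it is split into 5,000 pieces, with nearly all of the difference spent moving the head.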
Alan: When we first started looking at computers, we were lucky just to have a hard drive itself and now we’re looking at Terabytes of hard drives that we can hold in our hand and all of this has to be really efficient or else it becomes the bottleneck, itself.
Rick: The disk drives have always been the bottlenecks, since we started having disk drives, and they will continue to be; the CPUs have gotten to be so fast, and memory has become so cheap that you can have a massive amount of it.
Work has been done on the operating systems and in software applications to utilize more memory, to avoid hitting the disk. But you’re right; with terabytes or, even worse, petabytes of data in the enterprise marketplace, the disk becomes a huge bottleneck. Anything that slows that down, such as fragmentation, is definitely the bane of the enterprise server.
Alan: What are we looking at as far as the price of getting started taking care of our virtual computer systems?
Michael: Our products start at $29.95, which is primarily the consumer space, and they go all the way up to the top-of-the-line server products that somebody running a SAN may want to use, which go on up to $999.
Alan: And if somebody would like to find, say, some white papers on virtualization or defragmentation and the effect of fragmentation, can they go to your Website and find these white papers?
Michael: They can go to our home page, http://www.diskeeper.com
Alan: Michael and Rick, it’s been our pleasure to have you as our guests here today, and I look forward to having you both back on the air again real soon.
Michael: Well, thank you very much.
Rick: Thank you, Alan.