I felt compelled to write a little bit about this subject
after reading recently about some new updates to software SANs. The glamour of
the virtual platform layers and the cloud has somewhat overshadowed all of the
virtualization already occurring within storage, and the extra levels that are
added “below decks”. It’s a topic meriting some scrutiny from any storage
administrator committed to high performance.
Outside of the physical data store itself, every element of
the I/O path above it is virtual. It should also be noted that at essentially
each step along this I/O path, infrastructure customization and proprietary
technologies can (and often do) vary or add new virtual layers to the process.
All of these logical abstractions have evolved from various sources in the
storage ecosystem in order to drive scalability and agile responses to disaster
or growth.
Let’s consider a common hypothetical path that an I/O
request takes from a Windows client VM to the physical data store in a modern
infrastructure. In this example, the storage for the client VM is a virtual
RAID 5 built from LUNs on a SAN. An I/O request originating at the top OS
level, Windows in this case, passes through these underlying levels before
reaching the actual physical storage device: Windows > volume manager >
virtual RAID > SAN LUN > physical store (with the possibility of
additional abstraction levels based on storage customization).
Based upon how the storage infrastructure has been
established in this scenario, there is a virtual RAID 5 implemented above the
SAN LUN layer. That being the case, the volume manager directs the request to
the virtual RAID beneath it. Because of striping, I/O at this stage can end up
fractured (intentionally) by the RAID, based on how the array has been
provisioned. The I/O path has now become distributed, and may be replicated
even further on its way to physical storage.
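To make the striping step concrete, here is a minimal Python sketch of how a single logical request might be split across the members of a RAID 5 set. The function name split_request, the 64 KiB stripe unit, the three-member array, and the simple rotating-parity layout are all assumptions chosen for illustration; real arrays and software RAID layers use their own layouts and sizes.

```python
def split_request(offset, length, stripe_unit=64 * 1024, disks=3):
    """Split a logical byte range into (disk, disk_offset, size) chunks.

    Assumed layout (illustrative only): each stripe row holds one parity
    unit, parity moves one disk to the left on every row, and data units
    fill the remaining disks in ascending order.
    """
    data_disks = disks - 1
    chunks = []
    end = offset + length
    while offset < end:
        unit = offset // stripe_unit              # index of the data unit
        row = unit // data_disks                  # stripe row holding it
        parity_disk = (disks - 1 - row) % disks   # parity position in this row
        slot = unit % data_disks                  # position among data units
        disk = slot if slot < parity_disk else slot + 1  # skip the parity disk
        within = offset % stripe_unit             # offset inside the unit
        size = min(stripe_unit - within, end - offset)
        chunks.append((disk, row * stripe_unit + within, size))
        offset += size
    return chunks


if __name__ == "__main__":
    # One 128 KiB request from the volume manager becomes two member requests.
    for chunk in split_request(0, 128 * 1024):
        print(chunk)
```

Under these assumptions a single aligned 128 KiB request turns into two 64 KiB requests on different members, which is the intentional fracturing described above. An unaligned request is split at every stripe-unit boundary it crosses.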
The RAID sends its request to the SAN LUN below, another
abstraction that slices the physical store into basic
logical units. The SAN LUN layer completes the request directly to the physical
storage. The data is then returned along the same route to the requester.
Now, numerous solutions exist for managing communication and
throughput within this data pipeline. Administrators can tailor their RAID
presentation, ensure partition alignment, upgrade the underlying hardware, even
add new software abstraction layers intended to organize data better at lower
levels. However, an interesting concept emerges after review.
None of these solutions addresses the most basic, and one of the
most critical, vulnerabilities in the existing ecosystem’s performance: ensuring
that the file request is as sequential and rapid as possible at the point of origin. Whether
virtualized, as is so common today, or installed over direct-attached storage,
Windows read and write performance is degraded by file and free space fragmentation
at this top level, because fragmentation causes more I/O requests to occur. Each request through
all of the abstraction layers meets its first bottleneck at the outset, in how
contiguous the file arrangement is within Windows. Optimizing reads and writes
at this upper level helps ensure, in most cases, the fastest I/O path no matter
how much or how little storage abstraction has been structured beneath.
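The arithmetic behind "more I/O requests" can be sketched in a few lines of Python. This is a simplified model and not a measurement: the function name requests_needed and the 64 KiB maximum transfer size are assumptions, and it treats every fragment (extent) as needing at least one request of its own.

```python
import math


def requests_needed(extents, max_transfer=64 * 1024):
    """Count the I/O requests needed to read a file given its extents.

    extents is a list of (start_offset, length) pairs on the volume.
    Assumption: extents are not adjacent to each other, so each one needs
    its own request, and an extent larger than max_transfer is split.
    """
    return sum(math.ceil(length / max_transfer) for _, length in extents)


if __name__ == "__main__":
    one_mib = 1024 * 1024
    contiguous = [(0, one_mib)]                               # one extent
    fragmented = [(i * 8192, 4096) for i in range(256)]       # 256 x 4 KiB
    print(requests_needed(contiguous), requests_needed(fragmented))
```

In this model the same 1 MiB of data costs 16 requests when the file is contiguous and 256 requests when it is scattered in 4 KiB fragments. Each of those extra requests then travels the full path through the volume manager, RAID, and LUN layers.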
In a recent white paper, Diskeeper Corporation tested a
variety of I/O metrics over SAN storage with and without file fragmentation
being intelligently prevented and handled. In one such test, Iometer (an open
source I/O measurement tool) showed an improvement of over 200% in IOPS (I/Os
per second) after Diskeeper 2011 had handled Windows volume fragmentation.
Testing was performed on a SAN connected to a Windows Server 2008 R2 virtual
host.
You can read the entire white paper here: http://downloads.diskeeper.com/pdf/improve-san-performance.pdf