When most people hear the word “fragmentation,” they think of a problem with physical disk platters, where scattered blocks add latency because the drive head has to move more to read or write a file. What about solid-state drives (SSDs)?
Blocks can also become fragmented on an SSD, but the conventional thinking is that latency is no longer an issue because, unlike a mechanical disk, flash has no moving parts.
Once again, the assumption is that fragmentation is a physical-layer issue.
How does this relate to a SAN populated with SSDs, the most common application of flash in the data center?
The most misunderstood aspect of fragmentation in SANs is the physical layer. In a SAN environment, fragmentation has nothing to do with the physical media, whether that media is SSD or spinning disk. Rather, it’s an I/O overhead issue: a fragmented logical disk inflates the IOPS requirement for any given workload, regardless of the media used on the storage side.
Let’s look at this more closely.
What’s unique about the Windows OS in a SAN environment is that it is abstracted from the physical layer. All physical-layer management is left to the SAN itself. When a LUN or volume is presented to Windows, the OS sees a logical disk and controls how data is written to that logical disk software layer. The SAN, in turn, references the allocations within the logical disk and then makes its own decisions about how to physically manage and store the blocks.
SAN technologies have come a long way in their ability to combat the physical effects of fragmentation on either mechanical disks or SSDs. However, the performance penalty from fragmentation originates entirely outside the SAN, where it has no control. Fragmentation is inherent to the fabric of Windows. Even though the SAN might be mitigating the effects of fragmentation at the physical layer, it has no influence or control over how Windows fragments the logical layer. So what does this mean?
Since Windows is unaware of a file’s final size as it begins to write data, it takes a “one-size-fits-all” approach: it writes data to the next available logical address, even if that allocation is not a proper fit. When that allocation is full, the OS splits the file, fills another free allocation, splits again, and repeats until the whole file is written. As more data is written, modified and deleted over time, files end up in smaller and smaller pieces.
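To make the “next available address” behavior concrete, here is a minimal sketch of a toy logical disk. It is a hypothetical model, not the actual NTFS allocator: free space is a list of (start, length) runs of clusters, and a new file simply fills those runs in address order until it has been placed. The function name `allocate_next_available` is an illustrative assumption.

```python
def allocate_next_available(free_runs, size):
    """Toy "next available address" allocator (hypothetical, not NTFS).

    free_runs: list of (start, length) tuples of free clusters.
    size: number of clusters the new file needs.
    Fills free runs in address order, splitting the file whenever a run
    is used up. Returns (extents, remaining_free_runs).
    """
    extents = []
    remaining_free = []
    left = size
    for start, length in sorted(free_runs):
        take = min(length, left)
        if take:
            # Place the next piece of the file in this free run.
            extents.append((start, take))
            left -= take
        if take < length:
            # Whatever part of the run was not used stays free.
            remaining_free.append((start + take, length - take))
    if left:
        raise ValueError("not enough free space")
    return extents, remaining_free


# Free space left behind by earlier deletes: three small gaps and one big run.
free = [(0, 4), (10, 3), (20, 2), (30, 100)]
extents, free_after = allocate_next_available(free, 12)
print(extents)     # [(0, 4), (10, 3), (20, 2), (30, 3)]
print(free_after)  # [(33, 97)]
```

Even though the run starting at cluster 30 could have held all 12 clusters, the 12-cluster file lands in four pieces because the allocator takes the first free space it finds.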
This becomes problematic from an I/O efficiency perspective because every address or allocation within the logical layer requires its own dedicated I/O operation to process as either a read or a write. If Windows sees a file existing as multiple pieces at the logical layer, the server and SAN device will execute multiple I/O operations to process the whole file. Ultimately, it doesn’t matter what housekeeping a SAN does on the backend, nor does it matter what kind of storage media is used; the SAN has no control over fragmentation at the logical layer. If Windows sees a file in 20 pieces at the logical layer, the SAN will execute 20 separate I/O operations to read or write the whole file.
As you can see, in a SAN environment the fragmentation problem is not a physical-layer problem at all. It is an I/O overhead issue originating in the logical disk: it inflates the IOPS requirement for any given workload from server to storage by flooding systems with ever smaller, fractured I/O. That penalty affects SAN SSD performance just as much as it affects a SAN populated with spindles.
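The arithmetic behind “20 pieces means 20 I/O operations” can be sketched as follows. This is a simplified model under one stated assumption: a single I/O request can only cover a contiguous logical range, so each extent costs at least one request. The optional `max_io_clusters` parameter is a hypothetical cap on request size, included only to show that a large contiguous extent may still be split, but never into more requests than a heavily fragmented file needs.

```python
import math


def io_count(extents, max_io_clusters=None):
    """Minimum number of I/O requests to read or write a file.

    extents: the file's logical layout as (start, length) tuples.
    Assumption: one request covers only a contiguous logical range, so
    every extent needs at least one request. If max_io_clusters is set
    (a hypothetical per-request size cap), long extents are further
    split into requests of at most that many clusters.
    """
    total = 0
    for _start, length in extents:
        if max_io_clusters is None:
            total += 1
        else:
            total += math.ceil(length / max_io_clusters)
    return total


contiguous = [(0, 20)]                          # one 20-cluster extent
fragmented = [(i * 10, 1) for i in range(20)]   # the same 20 clusters in 20 pieces

print(io_count(contiguous), io_count(fragmented))  # 1 20
print(io_count(contiguous, max_io_clusters=16),
      io_count(fragmented, max_io_clusters=16))    # 2 20
```

The amount of data is identical in both layouts; only the number of logical pieces changes, and with it the number of requests the server sends and the SAN has to service.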
Unfortunately, IT professionals aren’t left with many options to address this problem. Some might try a traditional defragmentation process to restore a healthy relationship between I/O and data. However, no SAN vendor would recommend defragmenting a live, production SAN, because doing so creates more problems than it solves: the changed-block activity that “defragging” generates skews thin provisioning and triggers features like replication. For this reason, some admins take on the laborious task of migrating data off a volume, taking it offline, defragmenting it, and then bringing it back. Most admins, however, simply do what they’ve always done: throw more spindles or flash at the problem to mask the I/O penalty with brute force.
Virtualized customers should also be aware of the additional performance penalty from the “I/O blender” effect, which makes matters worse: I/O is not only unnecessarily small and fractured, but also mixed and randomized with other VM workloads at the hypervisor. Preventing fragmentation reduces the I/O requirement for any given workload, which in turn reduces the amount of I/O per GB that gets randomized at the hypervisor. For this reason, virtualized customers are looking to I/O reduction software that adds a server-side caching engine, cutting I/O to the SAN even more and lessening the “I/O blender” penalty.
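The “I/O blender” effect can be illustrated with a deliberately simplified sketch. It assumes each VM issues perfectly sequential single-block requests within its own address range, and that the hypervisor interleaves those streams round-robin onto one datastore; real hypervisor scheduling is far less regular. The helper names `vm_stream`, `blend` and `sequential_fraction` are hypothetical.

```python
from itertools import zip_longest


def vm_stream(base, count):
    """Hypothetical VM issuing `count` sequential single-block requests from `base`."""
    return [base + i for i in range(count)]


def blend(*streams):
    """Interleave per-VM request streams round-robin: a simplified stand-in
    for the hypervisor mixing I/O from several VMs onto shared storage."""
    merged = []
    for batch in zip_longest(*streams):
        merged.extend(addr for addr in batch if addr is not None)
    return merged


def sequential_fraction(requests):
    """Share of requests whose address immediately follows the previous one."""
    if len(requests) < 2:
        return 1.0
    hits = sum(1 for a, b in zip(requests, requests[1:]) if b == a + 1)
    return hits / (len(requests) - 1)


single = vm_stream(0, 8)
blended = blend(vm_stream(0, 8), vm_stream(1000, 8), vm_stream(2000, 8))

print(sequential_fraction(single))   # 1.0
print(blended[:6])                   # [0, 1000, 2000, 1, 1001, 2001]
print(sequential_fraction(blended))  # 0.0
```

Each VM’s stream is perfectly sequential on its own, yet once interleaved no request follows its predecessor. Every extra request a fragmented file generates is one more request thrown into that mix, which is why reducing I/O per workload also reduces what gets randomized.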
Although nearly every assumption about fragmentation concerns how blocks are physically stored, the problem is very different in a SAN environment. There, fragmentation has nothing to do with the physical layer, nor with the SAN itself. Yet it is the SAN that pays the penalty of a fragmented logical disk, since it has to work much harder to process any given workload.
For administrators who thought they had few viable options other than throwing more hardware at the problem, I/O optimization software can provide an easier and more economical path to improved performance while protecting existing CapEx investments.
Ultimately, the benefit is putting an end to this performance degradation and keeping systems running like new. Most admins are simply unaware of how much performance has degraded due to this I/O inefficiency. Fragmentation dampens overall performance by 25% or more on I/O-intensive applications, and by much more in severe cases. For some organizations, the issue is not just sluggish performance but reliability: they have to reboot servers regularly and run into problems with certain data sets. When spending tens to hundreds of thousands of dollars on SAN storage and new flash systems, why give back 25% of the performance you paid for when the problem can be solved so easily and inexpensively?
Fragmentation in a SAN environment is a real problem, but it is a logical-layer problem rather than a physical disk platter problem on the SAN itself.
The key thing to understand about fragmentation prevention is that it doesn’t touch the SAN at all. It’s an approach that adds a layer of intelligence to the Windows OS, helping it find proper allocations within the logical disk layer (instead of the very next available address regardless of size) so files are written in a contiguous manner, requiring minimal I/O. Although this happens outside the SAN, it’s the SAN that receives the most benefit as all the I/O overhead of small, fractured I/O is eliminated. This approach significantly improves throughput and reduces the dependency on IOPS since the relationship between data and I/O is no longer eroding at the logical layer.
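For contrast with the “next available address” behavior, here is a minimal sketch of size-aware allocation on the same kind of toy logical disk. It is a hypothetical illustration of the idea of finding a proper allocation, not the algorithm of any specific product: it places the file in the smallest single free run that can hold it (best fit) and falls back to filling runs in address order only when no single run is large enough. The function name `allocate_best_fit` is an assumption.

```python
def allocate_best_fit(free_runs, size):
    """Toy size-aware allocator (hypothetical illustration).

    free_runs: list of (start, length) tuples of free clusters.
    Places the file in the smallest single free run that can hold it, so
    it is written as one contiguous extent. Only if no run is big enough
    does it fall back to filling runs in address order (fragmenting).
    Returns (extents, remaining_free_runs).
    """
    candidates = [(length, start) for start, length in free_runs if length >= size]
    if candidates:
        length, start = min(candidates)  # smallest run that fits
        remaining = [(s, l) for s, l in free_runs if s != start]
        if length > size:
            remaining.append((start + size, length - size))
        return [(start, size)], sorted(remaining)

    # No single run is large enough: fill runs in address order.
    extents, remaining_free, left = [], [], size
    for start, length in sorted(free_runs):
        take = min(length, left)
        if take:
            extents.append((start, take))
            left -= take
        if take < length:
            remaining_free.append((start + take, length - take))
    if left:
        raise ValueError("not enough free space")
    return extents, remaining_free


free = [(0, 4), (10, 3), (20, 2), (30, 100)]
print(allocate_best_fit(free, 12))  # ([(30, 12)], [(0, 4), (10, 3), (20, 2), (42, 88)])
print(allocate_best_fit(free, 3))   # ([(10, 3)], [(0, 4), (20, 2), (30, 100)])
```

With the same free space that split a 12-cluster file into four pieces under a next-available policy, the file is written as a single extent here, so under the one-request-per-extent assumption it can be read or written with one I/O instead of four. Best fit also leaves the large run mostly intact for future large files.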