How to reclaim thin provisioned space on IBM FlashSystem V840

Reclaiming deleted VMware space on thin-provisioned LUNs has been discussed for a long time, ever since the capability was introduced in vSphere 5.0.  This VMware link includes some background on the efforts VMware made to address the problem of thin-provisioned volumes growing but never shrinking.

The status of this functionality for VMware today stands as described in this link, but I can save you the trouble and summarize.  The process requires a manual command to be executed with a VMFS volume specified and a number of blocks to process per iteration.  VMware creates a flat file on the VMFS volume that zeroes out blocks that are no longer used, and then instructs the supported storage array via a VAAI API that those blocks can be reclaimed.  For an administrator this means starting a CLI command per VMFS datastore and monitoring it for completion, since “If the UNMAP operation is interrupted (for instance by pressing CTRL-C), a temporary file may be left on the root of a VMFS datastore.”  The key to the VMware procedure is that the storage array must support the VAAI UNMAP primitive.  IBM XIV does; however, several other IBM storage products (IBM FlashSystem, Storwize, and SVC) do not.

I was working with a customer this week who was deploying their VMware environment onto storage volumes with IBM Real-time Compression enabled.  This has become a very popular option, since Real-time Compression on IBM FlashSystem V840 can reduce certain virtualized data sets by up to 80% while still providing microsecond latency.  A compressed volume is thin provisioned by default, so this customer was concerned about how to reclaim deleted space, since VMware UNMAP was unavailable.  Fortunately, compressed volumes CAN reclaim deleted space for VMware (or other OS/apps) without the VMware UNMAP functionality.  Here is how:

Real-time Compression runs a periodic maintenance process. I will remain vague here, as I have not yet spoken to the developers to understand the details of how this process runs. This maintenance cleans up deleted blocks and reclaims space automatically, provided it can tell that the blocks are clear, that is, zeroed out.

There are a couple of scenarios in which an administrator may want to reclaim space.  The first is changes within the guest VM file systems. Blocks can be zeroed out from within guests using a tool such as Microsoft SDelete, which zeroes out space that was previously consumed and then deleted.  The second scenario is changes at the VMFS level, for example moving VMs to/from volumes or creating/deleting VMs.  I will focus on this second scenario.

So the first thing I did was provision a 4 TB FlashSystem V840 compressed volume, add it as a VMFS datastore, and use a script to provision many VMs from a Windows 2008 R2 base template.  After that was all done, the stats for my datastore from VMware appeared as:

[Screenshot: RTC Reclaim - VMware space allocated]

4.00 TB capacity, 6.67 TB provisioned, 557.98 GB free.

And from my V840 they appeared as:

[Screenshot: RTC Reclaim - V840 space allocated]

4 TB capacity, 3.44 TB consumed before compression, 1.78 TB consumed after compression.

So for this base template Real-time Compression reduced the capacity from 3.44 TB to 1.78 TB, or a 48% reduction.
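
As a quick check of that arithmetic, here is a minimal Python sketch. The function name `reduction_percent` is my own for illustration; the two inputs are the before and after figures reported by the V840 above.

```python
def reduction_percent(before_tb: float, after_tb: float) -> float:
    """Return the percentage of capacity saved by compression."""
    if before_tb <= 0:
        raise ValueError("before_tb must be positive")
    return (before_tb - after_tb) / before_tb * 100.0


# Figures from the V840 view of the volume: 3.44 TB before, 1.78 TB after.
saved = reduction_percent(3.44, 1.78)
print(f"{saved:.0f}% reduction")  # prints "48% reduction"
```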

Next I deleted all of the VMs from my datastore so that it was 100% free from a VMware viewpoint.  From the V840 viewpoint, however, my usage did not change from the 1.78 TB in the previous screenshot, which is expected, as the V840 is not aware that those blocks are now empty.  So, similar to the process of VMware UNMAP or the in-guest SDelete referenced earlier, the VMFS datastore needs to be zeroed out.  One way to zero out the capacity is by provisioning an Eager Zeroed Thick (EZT) VMDK.

Since in this example the VMFS datastore is empty, my VMDK could be created to consume the majority of the capacity.  In a real environment I would probably provision the VMDK to consume all but 5% of the free capacity.
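
To make the "all but 5%" sizing concrete, here is a rough Python sketch. The helper name `zero_fill_vmdk_size_gb` and the `headroom_fraction` parameter are hypothetical names of mine, and rounding down to whole gigabytes is my own assumption to stay on the safe side.

```python
import math


def zero_fill_vmdk_size_gb(free_gb: float, headroom_fraction: float = 0.05) -> int:
    """Size (in whole GB) of a zero-fill VMDK that leaves some free space behind.

    headroom_fraction is the share of the current free capacity to leave
    untouched; 0.05 matches the "all but 5%" rule of thumb above.
    """
    if free_gb < 0:
        raise ValueError("free_gb must not be negative")
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")
    # Round down so the VMDK never exceeds the intended share.
    return math.floor(free_gb * (1 - headroom_fraction))


# Example: an empty 4 TB (4096 GB) datastore.
print(zero_fill_vmdk_size_gb(4096))  # prints 3891
```

On a datastore that still holds VMs, pass the current free capacity rather than the total capacity.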

My next step was to create a 4TB VMDK:

[Screenshot: RTC Reclaim - Create VMDK]

Something to consider is that the FlashSystem V840 will not actually write zero blocks to the flash capacity.  Our controllers detect zeros and discard them.  Plus, since we utilize VAAI block zero, the process for creating a large VMDK is off-loaded to the FlashSystem V840, where it executes extremely quickly.

Once my VMDK creation completed, I deleted the new VM from the datastore, which left my 4 TB VMFS datastore 100% free and zeroed out.  Then all I had to do was wait for Real-time Compression on the FlashSystem V840 to automatically perform the cleanup operation on the volume.  I actually went to bed for this part, and upon logging on the next morning my volume had been reclaimed back down to near its original size:

[Screenshot: RTC Reclaim - V840 reclaimed]

This process could easily be scripted to run over a weekend or downtime period, or simply used on an as-needed basis.  Since I don’t have a production environment, it is hard to get a feel for how frequently an administrator may want to perform this operation.  It will depend heavily on the data change rate in your environment.  But in any case, I hope I have demonstrated how easy this process is to use.
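
For anyone who wants to script it, here is one possible sketch in Python. I did the steps above by hand from the vSphere client, so this is an assumption about how it could be automated rather than what I actually ran. It assumes the script runs somewhere `vmkfstools` is available (for example the ESXi shell), and the datastore name `datastore1` and the file name `zerofill` are hypothetical. By default it only prints the commands.

```python
import subprocess
from typing import List


def build_reclaim_commands(datastore: str, size_gb: int,
                           name: str = "zerofill") -> List[List[str]]:
    """Build the two vmkfstools calls: create an eager-zeroed VMDK, then delete it."""
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    vmdk = f"/vmfs/volumes/{datastore}/{name}.vmdk"
    return [
        # -c creates a virtual disk of the given size, -d selects the disk format.
        ["vmkfstools", "-c", f"{size_gb}g", "-d", "eagerzeroedthick", vmdk],
        # -U deletes the virtual disk again once it has been zeroed.
        ["vmkfstools", "-U", vmdk],
    ]


def run_reclaim(datastore: str, size_gb: int, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order, stopping on failure."""
    for cmd in build_reclaim_commands(datastore, size_gb):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical datastore name and size; adjust for your environment.
    run_reclaim("datastore1", 3891)
```

Keeping the command construction separate from the execution makes it easy to review exactly what would run before pointing it at a real datastore.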

2 Responses to How to reclaim thin provisioned space on IBM FlashSystem V840

  1. MBH says:

    Alternatively, you could create a vdisk mirror of the volume (assuming you have enough free space), and the newly created vdisk will reclaim the space, since it will be forced to recalculate the zeroes and reduce the overall capacity. When mirroring is done, mark new as primary then delete the older image.

    This is useful if the storage admin is not the VMware admin, and the storage admin is responsible for the clean up jobs & housekeeping.

    Also, the method described in the blog is a bit risky because VMware recommends leaving 20-30% of each volume free for snapshots. If there are backup jobs running at night, they’ll need space to create snapshots of every VM. If you follow the blog method and make 1 large vDisk/VMDK, it’ll consume all space. You need to do this before backup time, and be sure it’ll finish before backup time.

    If at any time a VMware datastore (volume) is full, it’ll crash and halt all VMs running on it.

    vdisk mirror is a cleaner & safer approach, in my opinion.

    • rburbridge says:

      Yes, the volume mirror function will also reclaim space on both thin and compressed volumes. The advantage of the method in the post (assuming you’re using compression) is that it is a one-step process that can be executed solely from VMware. If you use the volume mirror function you still have to zero out the VMFS prior to creating the mirror. The other thing to consider with the volume mirror is that the copy rate is limited on a mirror operation (even when maximized), so if there are many of these datastores to clean up then it could be a long process.

      As for free space, that’s a good point. The only recommendation I can find is that VMFS requires 200MB at a minimum for free space. There will be little growth with snapshots used for backups, but if you had persistent snapshots on VMs (never a good idea anyway) then you want to maintain a 15-20% buffer of free space. As long as VMs are not thin provisioned, the risk of running out of space is that a snapshot would grow and affect that single VM, but that won’t result in VMFS crashing and bringing down all VMs. A large VMDK can be created rather quickly because there is no data being written to the actual Storage Pool; it’s all processing done in the controllers and zeroes being discarded, and as soon as it is completed it’s deleted. I guess I don’t see the risk.
