Warning – SVC Stretched Cluster with vSphere 5.5

A new feature called Permanent Device Loss (PDL) AutoRemoval was introduced in vSphere 5.5.  This KB outlines the feature, what it does, and why.  The basic idea behind it is to automatically remove devices from an ESXi host once they have entered a permanent device loss state.  It helps to keep things clean and tidy.

When I first heard of this feature I suspected there might be implications for vMSC environments (SVC stretched cluster running VMware), and this morning Duncan Epping from Yellow-Bricks confirmed this.

The recommendation is to disable PDL AutoRemoval for SVC stretched clusters.  If devices are auto removed it will cause undesirable behavior when access is restored: the volumes will not be accessible from a VMware perspective without manual intervention.
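On ESXi 5.5 this behavior is controlled by the Disk.AutoremoveOnPDL advanced setting, so one way to follow the recommendation is from the CLI (a sketch; verify the setting name against current VMware guidance for your build):

```shell
# Disable PDL AutoRemoval on an ESXi 5.5 host (vMSC / SVC stretched cluster)
esxcli system settings advanced set -o "/Disk/AutoremoveOnPDL" -i 0

# Confirm the current value
esxcli system settings advanced list -o "/Disk/AutoremoveOnPDL"
```

The same setting can also be changed per host in the vSphere Client under Advanced System Settings.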

Posted in VMware | Leave a comment

Need a storage solution for 120,000 Exchange 2013 users? Read on…

The phrase “we need storage for 120,000 Exchange 2013 users” is likely to cause most storage administrators to shudder, in some cases faint, and in extreme cases run away.  To be fair, Microsoft has made significant strides in reducing the storage performance requirements of Exchange with each release.  Exchange 2013, for example, has reduced IOPS by 50% in comparison to Exchange 2010.  But regardless of the efficiency of Exchange 2013, 120,000 users is still a large number of mailboxes to manage in terms of capacity, resiliency, and performance.

In case you are not familiar with Microsoft ESRP, here’s a little background.  The Microsoft Exchange Solution Reviewed Program (ESRP) is a program designed by Microsoft to facilitate third-party storage testing and validation of solutions for Exchange.  IBM was an early participant in this program and has since been very active.  The ESRP submissions are reviewed by Microsoft and include log files so customers can evaluate the performance themselves.

One of the things we try to do when putting together an ESRP solution is to use configurations that make sense to customers.  We ask what type of configuration a customer is likely to deploy (or could deploy with the storage) and use that in the solution.  The tests are not intended to be benchmarks of the storage systems; instead, they demonstrate a storage configuration that can support a given Exchange workload.

Late last week the Exchange 2013 ESRP for the IBM XIV Gen3.2 was published.  It can be accessed here.  The XIV is unique in its method of provisioning storage; “simple” is the single best word I can use to describe it.  When you are designing a storage solution for 120,000 Exchange users, “simple” is a welcome adjective.

I will leave the details of why XIV makes sense for Exchange to the white paper, but here are a few highlights of the solution that was tested:

- 120,000 mailboxes
- 2 GB per mailbox
- 0.16 IOPS per mailbox
- IBM XIV Gen3.2
- 360 x 4 TB disk drives
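To put those numbers in perspective, here is some quick back-of-envelope arithmetic based on the figures above (my own math, not additional numbers from the ESRP paper):

```shell
# Rough sizing math for the tested configuration
TOTAL_IOPS=$(awk 'BEGIN { printf "%d", 120000 * 0.16 }')      # mailboxes x IOPS per mailbox
MAILBOX_TB=$(awk 'BEGIN { printf "%d", 120000 * 2 / 1024 }')  # mailboxes x 2 GB, expressed in TB
RAW_TB=$((360 * 4))                                           # raw drive capacity

echo "Steady-state mailbox IOPS: $TOTAL_IOPS"    # 19200
echo "Mailbox capacity: ~${MAILBOX_TB} TB"       # ~234 TB
echo "Raw disk capacity: ${RAW_TB} TB"           # 1440 TB
```

Nearly 20,000 steady-state IOPS and roughly a quarter petabyte of mailbox data is why “simple” provisioning matters at this scale.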


Benefits of Atomic Test and Set (ATS) with IBM Storwize family

As the regular author of this blog, I wanted to provide a short introduction for a guest contributor.  Over the past few months I have been transitioning to new challenges, so my day-to-day work on VMware and IBM storage has become limited.  A new technical expert has taken on the role of authoring VMware papers and best practices, and he has graciously agreed to write about some of his recent work.  Thank you, Jeremy Canady, for the great contributions. – Rawley Burbridge

At this point, vSphere Storage APIs for Array Integration (VAAI) is old hat for most VMware administrators. In fact, this blog has already written about VAAI in a previous post and white paper back in 2011. Since that time there have been multiple releases of vSphere, and support for the various VAAI primitives, both Block and NAS, continues to increase among storage systems. With the latest code release for the Storwize family, we decided to revisit the topic using current vSphere versions and the new release.

I have no intention of diving into all the details of VAAI as there are many very good resources online as well as the full white paper. What I would like to highlight in this post is the testing and results of the Atomic Test and Set (ATS) primitive.

Atomic Test and Set

As you are likely aware, VMFS is a clustered file system that allows multiple hosts to simultaneously access the data located on it. The key challenge for a clustered file system is handling conflicting requests from multiple hosts. VMFS uses file-level locking to prevent conflicting access to the same file. The locking information is stored in the VMFS metadata, which must itself be protected from conflicting access. To lock the metadata, VMFS originally relied solely upon a SCSI reservation on the full VMFS volume. A SCSI reservation locks the complete LUN and prevents access from other hosts, so actions such as snapshot creation or virtual machine power-on temporarily lock the complete VMFS datastore. This locking mechanism poses performance and scalability issues.

Atomic Test and Set (ATS) provides an alternative to the SCSI reservation method of locking. Instead of locking the complete LUN with a SCSI reservation, ATS allows the host to lock only the single sector that contains the metadata it wants to update. As a result, only truly conflicting access is prevented.

With the update to the VAAI paper we wanted to show the actual benefits of ATS with the Storwize family. To do so, two tests were devised: one with artificially generated worst-case locking and one with a more real-world configuration. The setup consisted of two IBM Flex System x240 compute nodes running ESXi 5.1 sharing a single VMFS datastore provided via FC from a Storwize V7000. Each test was run three ways: with the locking load and ATS disabled, without the locking load, and with the locking load and ATS enabled.

Simulated Workload

To generate worst-case locking we needed a way to quickly generate metadata updates on the VMFS file system. To do this, a simple bash script was created that continually executed the touch command on a file located on the VMFS datastore. The touch command updates the timestamp of the file, resulting in a temporary metadata lock each time it is run.
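The script itself was not published with the paper, but a minimal sketch of the idea looks like this (the target path and iteration count are placeholders; in the actual test the loop simply ran until interrupted against a file on the shared VMFS datastore):

```shell
#!/bin/sh
# Generate continuous VMFS metadata updates by repeatedly touching a file.
# TARGET is a placeholder path; ITERATIONS bounds this example, whereas the
# real locking load looped indefinitely.
TARGET="${1:-/tmp/lockgen.tmp}"
ITERATIONS="${2:-1000}"

i=0
while [ "$i" -lt "$ITERATIONS" ]; do
    touch "$TARGET"    # each touch updates the timestamp -> a brief metadata lock
    i=$((i + 1))
done
```

Pointed at a file on the VMFS volume and left running, this produces a steady stream of metadata lock requests.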

To measure the impact, a VMDK located on the shared VMFS datastore was attached to a VM running Iometer. A simple 4 KB sequential read workload was applied to the VMDK and measured for each run. The results can be seen below.


As you can see, without ATS there is a severe impact on disk performance. When ATS was enabled, performance increased by over 400% and equaled the performance with no locking workload applied. This is to be expected, as the artificially generated locking should never conflict with access to the VMDK when ATS is enabled.

To measure the severity of the locking, esxtop was used to monitor the number of conflicts per second. As you can see, ATS all but eliminated the conflicts.


Real World Backup Snapshot Workload

Backup processes need to be quick and have the smallest impact possible. In a VMware environment most backup processes require a snapshot of the virtual machine to be created and deleted. Snapshot creation and deletion require metadata updates and when scaled out can cause issues on heavily populated datastores.

To test the effects of ATS on large-scale snapshot creation and deletion, a VMDK was placed on a VMFS volume that contained ten additional virtual machines. The VMDK was attached to a VM running Iometer with a simple 4 KB sequential workload applied to it. A script was created to continually create and remove snapshots from the ten running VMs, and the resulting Iometer performance was monitored while the snapshot workload was applied. The results can be seen below.
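As with the locking script, the snapshot churn script was not included in the paper, but on an ESXi host a vim-cmd loop along these lines could drive it (the VM IDs and snapshot name are placeholders; `vim-cmd vmsvc/getallvms` lists the real IDs):

```shell
#!/bin/sh
# Continually create and delete snapshots on a set of VMs from the ESXi shell.
# VMIDS below are placeholders; obtain real IDs with: vim-cmd vmsvc/getallvms
VMIDS="1 2 3 4 5 6 7 8 9 10"

while true; do
    for id in $VMIDS; do
        # args: vmid, snapshot name, description, includeMemory, quiesced
        vim-cmd vmsvc/snapshot.create "$id" "churn" "ATS test snapshot" 0 0
    done
    for id in $VMIDS; do
        vim-cmd vmsvc/snapshot.removeall "$id"
    done
done
```

Every create and delete forces metadata updates on the shared datastore, which is exactly the contention the test is meant to exercise.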


As you can see there was a sizable impact on performance with ATS disabled. Additionally notice that the performance with ATS enabled is nearly identical to the performance when no snapshot workload was running.

To measure the severity of the locking, esxtop was again used to measure the conflicts per second. As you can see below, the number of conflicts was significantly reduced.


I hope these simple tests show the huge improvement that ATS provides to the scaling of a single VMFS datastore.

A complete white paper on this topic as well as the other supported VAAI primitives is available here.


Using Microsoft Windows Server 2012 thin provisioning space reclamation on IBM XIV

Last month the IBM XIV 11.2 microcode was announced.  One of the features in this code release is support for the T10 SCSI UNMAP standard.  The benefit of SCSI UNMAP is that it solves a problem commonly associated with thin provisioned volumes.

Typical behavior of thin provisioned volumes is that as data is generated and space is consumed, the thin provisioned volume in turn consumes more physical capacity.  If data is removed from the thin provisioned volume (for example, by migrating a virtual machine away), that space is freed on the file system, but the physical capacity on the storage system remains consumed.  This puts your physical storage capacity in a state of perpetual growth.  There have been workarounds (usually manual) that consisted of running tools like SDelete and then mirroring the volume.

Enter SCSI UNMAP, which solves this problem by issuing SCSI commands to the physical storage when blocks are freed.  In simple functional terms, if I free up space on my file system, for example by migrating a virtual machine, then that space is also reclaimed from the physical capacity on the storage system.  More details on SCSI UNMAP are available in this Microsoft document.
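On the Windows side, you can check whether the OS will actually send UNMAP (delete notifications) using standard Windows tooling; this is generic Windows behavior, not something specific to XIV (a value of 0 means notifications are enabled):

```bat
:: Query whether Windows sends TRIM/UNMAP delete notifications (0 = enabled)
fsutil behavior query DisableDeleteNotify

:: If it was disabled, re-enable delete notifications
fsutil behavior set DisableDeleteNotify 0
```

If delete notifications are disabled, freed blocks will never be reported to the storage system and no physical capacity will be reclaimed.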

SCSI UNMAP is a standard, so the support built into XIV should enable other applications to take advantage of it in the future.  VMware, for example, released SCSI UNMAP support and then pulled it back after announcing issues; eventually this functionality should be available again.

Microsoft Windows Server 2012 included SCSI UNMAP thin provisioning support upon its release, and it is the key application supported by XIV in the 11.2 code.  IBM has released a comprehensive white paper which discusses this feature in greater detail along with implementation considerations.  The white paper is a good read for anyone, but particularly for those who are running or will be running Windows Server 2012 on IBM XIV.

Microsoft Windows Server 2012 thin provisioning space reclamation using the IBM XIV Storage System


Using the Storage Tier Advisor Tool (STAT) to gauge the effectiveness of Easy Tier

One of the valuable features included in the Storwize V7000 storage system at no extra cost is Easy Tier.  Easy Tier continuously monitors data access patterns and will automatically migrate high activity “hot spots” from Hard Disk Drives to Solid State Drives.  Easy Tier is a key feature for helping customers increase the cost effectiveness of SSD drives.

IBM developed a utility called the Storage Tier Advisor Tool (STAT) to interpret historical usage data from DS8K, SVC, and Storwize V7000 systems.  When Easy Tier is enabled on a storage pool, historical data is recorded regardless of whether SSDs are present.  Simply put, STAT reads the historical data files and outputs information that is helpful in determining whether there is value in adding SSD storage to a system.  I thought I would take a moment to document how to set up and use the STAT tool.

The first step is to download and install the STAT tool.  It was previously found at the following URL: http://www-01.ibm.com/support/docview.wss?uid=ssg1S4000935  Edit: this URL is no longer valid.  The download can now be obtained by searching for Storwize V7000 on the IBM Fix Central website (http://www-933.ibm.com/support/fixcentral/).

The next step is to enable Easy Tier monitoring on an existing or new storage pool.  This is done from the CLI with the following command:

IBM_2076:ISV7K7:admin>svctask chmdiskgrp -easytier on “storagepoolname”
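To confirm the change took effect, the pool's Easy Tier status can be checked from the same CLI (a sketch; the exact output fields vary by code level):

```shell
# Verify Easy Tier status on the pool (field names vary by release)
svcinfo lsmdiskgrp storagepoolname | grep -i easy_tier
```
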

Easy Tier will capture historical performance data over a 24-hour period.  The output file can then be downloaded from the support section of the V7000 GUI as shown below.

Once the output file is downloaded and placed in the STAT directory (Program Files\IBM\STAT), the STAT utility can be run to generate the reports.

The STAT tool generates a couple of HTML pages within the STAT\Data_Files\ directory (System Summary and heat_distribute) which can be used to determine the effect of adding SSD drives to a storage pool.  As you can see from my test example, 12% of the data was determined to be hot, and by migrating that data to SSD storage I would realize a 60-80% performance improvement.

System Summary

Heat Distribution

In case you are curious this storage pool is housing VMware View linked clone disks.  The virtual desktops have been running a simulated office worker workload.


IBM Flex System Manager Comparison video

In my last post I provided a short introduction and overview of the new IBM Flex System V7000 Storage Node.  I did not go into great detail about the IBM Flex or PureFlex platforms as there is already much more useful information available than I could provide.

I did want to share this video though, as it provides a good overview of the Flex System Manager interface and how it compares to a competitive offering.  If you look closely, and go back and watch any of the Flex launch videos, you can see how the interface has already been updated.


Introducing the IBM Flex System V7000 Storage Node

Those who follow the IBM announcements may find this old news, but this morning one of the many IBM announcements was the new IBM Flex System V7000 Storage Node.  You can read all about the product here.

Together with the IBM Storwize V3700, which we also announced within the last two weeks, this makes for a busy quarter for those of us who work primarily with (what I unofficially call) the “SAN Volume Controller and Storwize disk family”.

The IBM Flex System V7000 Storage Node adds further integration to the Flex/PureFlex offerings by bringing the V7000 storage controllers into the Flex chassis.  The Flex System V7000 Storage Node will ship with version 6.4 of the SVC/Storwize disk family software, and all features and functions available in the Storwize V7000 are also available in the Flex System V7000 Storage Node.  In my opinion, the major difference right now is the further integration of connectivity.  Here is a snippet from the announcement which describes this:

Host connectivity: Host connectivity to compute nodes is provided through optional Flex System V7000 control enclosure network cards that connect to the Flex System Enterprise Chassis midplane and its switch modules. Available options are:

  • 10Gb Converged Network Adapter (CNA) 2 Port Daughter Card for FCoE and iSCSI fabric connections
  • 8Gb Fibre Channel (FC) 4 Port Daughter Card for Fibre Channel fabric connections.

Both cards must be installed in pairs (one card per node canister) and the following configurations are supported:

  • Two or four 10Gb CNA cards
  • Two 8Gb FC cards
  • Two 10Gb CNA cards and two 8Gb FC cards

This is really cool because the IBM Flex System V7000 Storage Node connects directly to the Flex System Chassis midplane and the switches which are installed…no cabling necessary.

Look out on other IBMer blogs today for more information about this announcement.

