1.1 VCAP-DCA Study Guide - Implement and Manage Storage, part 2

Posted on 26 Jul 2011 by Ray Heffer

This is the second part of my VCAP-DCA study guide on storage (section 1.1 of the blueprint). As mentioned in part 1, these study notes focus on the essential learning points you need to know. This part has a big section on LUN masking using PSA-related commands, and an introduction to analysing performance stats with esxtop. Whilst esxtop appears in many of the VCAP-DCA blueprint sections, 1.1 is the first section where it is mentioned for assessing storage performance, so at the very end of this post I have included the topic ‘Using ESXTOP for Storage Performance Analysis’.

Knowledge Required

  • Identify RAID levels
  • Identify supported HBA types
  • Identify virtual disk format types

Key Focus Areas

  • VMware DirectPath I/O (see part 1)
  • NPIV (see part 1)
  • Storage Best Practices (see part 1)
  • Raw Device Mapping (see part 1)
  • Storage filters
  • VMFS resignaturing
  • LUN masking using PSA-related commands
  • I/O workloads

Key Learning Materials

  1. Configuration Examples and Troubleshooting for VMDirectPath
  2. Configuring VMDirectPath I/O pass-through devices on an ESX host
  3. How to Configure NPIV on VMware vSphere 4.0
  4. VMware Storage Best Practices
  5. Best Practices for Configuring Virtual Storage
  6. Performance Best Practices for VMware vSphere 4.0
  7. VMware Multipathing with the SAN Volume Controller and the Causes of SCSI-2 Reservation Conflicts
  8. Fibre Channel SAN Configuration Guide
  9. iSCSI SAN Configuration Guide
  10. Performance Characterization of VMFS and RDM Using a SAN
  11. VMware VMFS Volume Management
  12. ESX Configuration Guide (RDM, Storage Filters, Resignaturing)
  13. Masking a LUN from ESX and ESXi 4.x using the MASK_PATH plug-in
  14. vSphere Command-Line Interface Installation and Scripting Guide (good section on LUN masking)
  15. Identifying disks when working with VMware ESX
  16. Storage Workload Characterization and Consolidation in Virtualized Environments
  17. Using vscsiStats for Storage Performance Analysis
  18. VIFS reference document
  19. VMware KB 900: Moving or Copying Virtual Disks in a VMware Environment

Storage Filters

Storage filters are enabled by default in vCenter to avoid storage corruption: vCenter retrieves only the storage devices or LUNs that can (or should) be used. VMware KB article 1010513 details how to switch off vCenter storage filters, and the ESX Configuration guide (page 133) also covers this.

This explanation is taken from the aforementioned KB article:

This LUN filtering mechanism helps prevent LUN corruption that might occur if the following conditions are not met:

  • The same LUN cannot be used for a VMFS datastore and RDM simultaneously.
  • Two virtual machines cannot have access to the same LUN using two different RDM mapping files.

Follow these steps to switch off vCenter storage filters:

1) Within the vSphere client, go to Administration > vCenter Server Settings, then click on Advanced Settings.

2) Depending on the storage filter you want to switch off, add one or both of the following keys with a value of false:

config.vpxd.filter.rdmFilter; false – The RDM filter hides LUNs that are already assigned to a VM as an RDM.
config.vpxd.filter.vmfsFilter; false – The VMFS filter hides LUNs that already contain a VMFS volume.

Duncan Epping also has an excellent blog post on storage filters, where he details two other filters that can be used:

config.vpxd.filter.hostRescanFilter – Set to false, this disables the automatic rescan that occurs after a VMFS datastore is added.

config.vpxd.filter.SameHostAndTransportsFilter – Filters out LUNs that do not have the same masking applied as the original VMFS volume. It also controls which LUNs are available to add as an extent.

I actually have an excellent use case for the Host Rescan Filter, because switching it off saved the day when I experienced some horrendous problems a few years ago. I had a VI 3.5 cluster which was hosted externally, and there was a fault with the SAN. Each time a datastore was added, which incidentally was weekly due to DR testing, the database LUN went down and an externally hosted web application went offline. Not good. Adding config.vpxd.filter.hostRescanFilter to the vCenter advanced settings with a value of false prevented the automatic rescan on all of the hosts when a datastore was added.

VMFS Resignaturing

VMFS volume resignaturing usually comes up when you need to present a copy of an existing VMFS volume to an ESX host. For example, a DR site may have a replicated LUN, and you take a snapshot of that replica and present it to the host. The process has changed since VI 3.5 and was made easier with ESX/ESXi 4, as described in VMware KB article 1011387. I would recommend that you read page 120 of the ESX Configuration guide, as this contains more information on resignaturing a VMFS volume.

For the VCAP-DCA lab you should understand what to do when presented with a replica VMFS LUN as you’ll have three options:

  1. Keep the existing datastore (do not change the signature)
  2. Assign a new signature
  3. Format the disk (this will create a new VMFS volume)

You need to learn how to list volumes, resignature, or mount without resignaturing, all of which is detailed in KB article 1011387. Here is a summary of the commands you need to know:

List all volumes detected as snapshots (replica LUNs):

esxcfg-volume -l

Mount the volume without resignaturing the LUN (take the VMFS UUID or label from the output of esxcfg-volume -l). Use -M instead of -m to keep the volume mounted after a host reboot:

esxcfg-volume -m <VMFS UUID|label>

Mount the volume and resignature the LUN:

esxcfg-volume -r <VMFS UUID|label>
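To help the options stick, here is a minimal Python sketch that maps each operation to its esxcfg-volume flag. It only builds the argument list so you can read it before running anything on a host. The function name esxcfg_volume_cmd, the action labels and the datastore label replica_ds01 are my own inventions for illustration, not anything from the KB article.

```python
def esxcfg_volume_cmd(action, volume=None):
    """Build the esxcfg-volume argument list for one operation.

    action: 'list', 'mount', 'mount-persistent' or 'resignature'
            (hypothetical labels chosen for this sketch).
    volume: VMFS UUID or label taken from 'esxcfg-volume -l'.
    """
    flags = {
        "list": "-l",              # show volumes detected as snapshots/replicas
        "mount": "-m",             # mount and keep the signature (not persistent)
        "mount-persistent": "-M",  # mount and keep the signature across reboots
        "resignature": "-r",       # assign a new signature and mount
    }
    if action not in flags:
        raise ValueError("unknown action: %s" % action)
    if action == "list":
        return ["esxcfg-volume", "-l"]
    if not volume:
        raise ValueError("a VMFS UUID or label is required for '%s'" % action)
    return ["esxcfg-volume", flags[action], volume]


# 'replica_ds01' is a made-up datastore label.
print(" ".join(esxcfg_volume_cmd("list")))
print(" ".join(esxcfg_volume_cmd("mount-persistent", "replica_ds01")))
```

The point of keeping the flags in one table is that -m, -M and -r are easy to mix up under exam pressure, and only one of them changes the signature.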

LUN Masking and PSA Commands

LUN masking is usually performed on the storage array (remember that in the VCP exam?): NetApp uses Initiator Groups and EMC Clariion uses Storage Groups. You can also perform masking on the ESX/ESXi host using esxcli corestorage claimrule. Read page 57 of the vSphere Command-Line Interface Installation and Scripting Guide, which is listed in the key learning materials section above (14). VMware KB article 1009449 also describes masking a LUN from ESX and ESXi 4.x using the MASK_PATH plug-in.

This is a long topic and a complicated process, especially if you plan to perform this on multiple ESX hosts. I’m not sure if this needs to be performed in the VCAP-DCA exam, but I would strongly recommend doing this a few times in your home lab (if you have one).

Two things you should do first:

1) Check what multipath plugins are currently installed on your ESX/ESXi host with: esxcfg-mpath -G

You should see NMP and MASK_PATH

2) Check what the next available rule ID is, using: esxcli corestorage claimrule list

The claim rules for MASK_PATH start at 101, so on my host, where 101 is the only one in use, the next rule will be 102.

3) Add the mask rule with:

esxcli corestorage claimrule add --rule <number> -t location -A <hba_adapter> -C <channel> -T <target> -L <LUN> -P MASK_PATH

Example: esxcli corestorage claimrule add --rule 102 -t location -A vmhba35 -C 0 -T 0 -L 10 -P MASK_PATH

To get the location of the path you want to mask, I use esxcfg-scsidevs --vmfs to find the NAA device and then esxcfg-mpath -L to get the paths. Choose the path you want to mask.

List the datastores and get the NAA device of the datastore:

esxcfg-scsidevs --vmfs

List the paths for the datastore and that will give you the location:

esxcfg-mpath -L | grep <naa_device>
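The location in step 3 is just the runtime name of the path split into adapter, channel, target and LUN. Here is a rough Python sketch of that translation, assuming runtime names of the form vmhba35:C0:T0:L10. The function names and the list of rule IDs in use are illustrative only, so take the real IDs from your own claimrule list output.

```python
import re


def parse_runtime_name(runtime_name):
    """Split a runtime name such as 'vmhba35:C0:T0:L10' into its parts."""
    match = re.match(r"^(vmhba\d+):C(\d+):T(\d+):L(\d+)$", runtime_name)
    if match is None:
        raise ValueError("not a runtime name: %r" % runtime_name)
    adapter, channel, target, lun = match.groups()
    return adapter, int(channel), int(target), int(lun)


def next_free_rule(used_ids, first_rule=101):
    """Return the lowest rule ID >= first_rule that is not already in use."""
    used = set(used_ids)
    rule = first_rule
    while rule in used:
        rule += 1
    return rule


def mask_rule_cmd(rule_id, runtime_name):
    """Build the 'claimrule add' command that masks one path by location."""
    adapter, channel, target, lun = parse_runtime_name(runtime_name)
    return ("esxcli corestorage claimrule add --rule %d -t location "
            "-A %s -C %d -T %d -L %d -P MASK_PATH"
            % (rule_id, adapter, channel, target, lun))


# Illustrative rule IDs only; read the real ones from 'claimrule list'.
rule_id = next_free_rule([0, 1, 2, 3, 4, 101, 65535])
print(mask_rule_cmd(rule_id, "vmhba35:C0:T0:L10"))
```

With the sample values this prints the same command as the example in step 3, using rule 102.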

4) Verify that the rule has been added successfully:

Note: At this point you’ll see the new rule listed with class ‘file’ only.

esxcli corestorage claimrule list

5) Load the claimrules, and list them to see that the new rule is in runtime.

esxcli corestorage claimrule load

esxcli corestorage claimrule list

6) Next we need to unclaim the device from the PSA plug-in that currently owns it (NMP) and re-run the claim rules, so that its paths are claimed by MASK_PATH instead.

esxcli corestorage claiming reclaim -d <naa.ID>

7) Verify that our device is not used by the host, and the LUN is not active.

esxcfg-mpath -L | grep <naa.ID>

esxcfg-scsidevs --vmfs (the device should no longer be listed).
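If you are masking on several hosts, the check in step 7 is easy to script. The sketch below does in Python what the grep does: it looks for the NAA ID in captured esxcfg-mpath -L output and treats no match as masked. The sample output lines and NAA IDs are made up and simplified, and the function names are my own.

```python
def paths_for_device(mpath_output, naa_id):
    """Return the lines of 'esxcfg-mpath -L' output that mention naa_id.

    This mirrors: esxcfg-mpath -L | grep <naa.ID>
    """
    return [line for line in mpath_output.splitlines() if naa_id in line]


def is_masked(mpath_output, naa_id):
    """True when no path to the device is listed any more."""
    return not paths_for_device(mpath_output, naa_id)


# Made-up, simplified sample lines; real output has more columns.
MASKED_DEVICE = "naa.6000eb3a2b3c4d5e0000000000000010"
OTHER_DEVICE = "naa.6000eb3a2b3c4d5e0000000000000011"

before = "\n".join([
    "vmhba35:C0:T0:L10 state:active %s vmhba35 0 0 10 NMP active" % MASKED_DEVICE,
    "vmhba35:C0:T0:L11 state:active %s vmhba35 0 0 11 NMP active" % OTHER_DEVICE,
])
after = "vmhba35:C0:T0:L11 state:active %s vmhba35 0 0 11 NMP active" % OTHER_DEVICE

print(is_masked(before, MASKED_DEVICE))
print(is_masked(after, MASKED_DEVICE))
```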

I/O Workloads

This is the final subject in the Implement and Manage Storage section, but measuring I/O workloads is an important part of administering a vSphere environment, not something to learn just for the VCAP-DCA exam. A great place to start is vscsiStats, a tool that is available on ESX and ESXi (prior to ESXi 4.1 it had to be downloaded to the host). vscsiStats measures I/O at the virtual SCSI disk (VMDK) level, so it works whether the underlying storage is NFS, Fibre Channel, or iSCSI. ESXTOP is another obvious choice for measuring performance, in particular I/O workloads, but it is more limited because it only provides latency for Fibre Channel and iSCSI (not NFS). See the community document listed under my key materials above (17) for further info on vscsiStats.

Note: Section 6.4 of the VCAP-DCA blueprint also lists ESXTOP and vscsiStats as skills required for troubleshooting storage performance and connectivity. I feel these are best introduced here and then revisited when you start studying for 6.4.

Using vscsiStats

vscsiStats will measure:

  • I/O length (size)
  • Seek distance (helps you understand disk seek time)
  • Outstanding I/Os (I/O queues)
  • Latency (reported in microseconds) – the most useful statistic
  • Interarrival (the time between successive read/write I/O requests arriving at the virtual disk, in microseconds)

Using vscsiStats is really easy, and involves the following steps:

1) List the available virtual machines with vscsiStats -l

vscsiStats -l

Note: Find the virtual machine you want to collect statistics for and make a note of (or copy) its worldGroupID number.

2) Next, start vscsiStats and specify the worldGroupID:

vscsiStats -s -w <worldGroupID>

3) Use vscsiStats -p to specify the metric and print the statistics:

vscsiStats -p latency (type vscsiStats on its own to see the help and other options)
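Here is the whole collection run written out as an ordered command list, as a small Python sketch. I have added a final vscsiStats -x step to stop the collection, which is not in the steps above, so check it against the tool's help output. The function name and the worldGroupID 118739 are made up for the example.

```python
# Histogram names accepted by 'vscsiStats -p'.
HISTOGRAMS = ("all", "ioLength", "seekDistance", "outstandingIOs",
              "latency", "interarrival")


def vscsistats_session(world_group_id, histogram="latency"):
    """Return the commands for one collection run, in the order to run them."""
    if histogram not in HISTOGRAMS:
        raise ValueError("unknown histogram: %s" % histogram)
    wgid = str(int(world_group_id))  # rejects anything that is not a number
    return [
        "vscsiStats -l",                                # find the worldGroupID
        "vscsiStats -s -w %s" % wgid,                   # start collecting
        "vscsiStats -p %s -w %s" % (histogram, wgid),   # print the histogram
        "vscsiStats -x -w %s" % wgid,                   # stop collecting (my addition)
    ]


# 118739 is a made-up worldGroupID.
for command in vscsistats_session(118739):
    print(command)
```

Leave some time between the start and print steps so the VM generates enough I/O to make the histogram meaningful.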

Using ESXTOP for Storage Performance Analysis

ESXTOP is a tool based on the ‘top’ command in Linux, but is specifically geared towards ESX/ESXi performance metrics. It’s all over the VCAP-DCA blueprint so you must be comfortable using esxtop for the exam or you might be in trouble. The best place to start is, well… running esxtop! Seriously it can be a bit daunting at first, but try and use it in a lab environment and understand what the stats are telling you.

Here are the top things you should know about esxtop:

  1. Interactive mode (default)
  2. Custom configuration – saving your configuration to a file (W to save, esxtop -c <filename> to load)
  3. Batch mode (esxtop -b > filename.csv)
  4. Identifying I/O latency

esxtop Configuration Files

Once you are familiar with esxtop, you need to know how to save your configuration using W. You might want to do this because you’ve changed the order of some columns or added some additional fields. The default filename is .esxtop41rc (just press ENTER to accept the default), and this file is loaded automatically each time esxtop starts. To load a configuration saved under a different name, start esxtop with -c <filename>; note that pressing c in interactive mode switches to the CPU view rather than loading a file.

Interactive mode (default)

When you run esxtop, it starts in interactive mode by default. Press h (for help) and you’ll see the help screen. I often refer to this to remind myself how to switch from CPU (the default view) to virtual machine disk stats by pressing v.

Batch Mode (exporting to CSV)

The first thing you need to do is configure esxtop in interactive mode to display the fields and columns you want to export. Once you’ve done this, save your configuration following the steps above, then use the -b option for batch mode and redirect the output to a CSV file. We will also specify a delay (-d) in seconds between the samples taken and a number of iterations (-n), otherwise esxtop will run until you press CTRL+C. See the following:

esxtop -b -d 5 -n 20 > filename.csv (this will sample every 5 seconds, 20 times)
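Once you have the CSV you can open it in Perfmon or a spreadsheet, or pull out a few columns with a script. The Python sketch below averages the latency counters for one device. Treat the column names as an assumption: I am assuming perfmon-style headers ending in strings such as "Average Driver MilliSec/Command", so check them against the header row of your own file. The sample data and host name are made up.

```python
import csv
import io

# ASSUMED counter names for DAVG, KAVG, GAVG and QAVG; verify in your CSV.
COUNTERS = {
    "DAVG": "Average Driver MilliSec/Command",
    "KAVG": "Average Kernel MilliSec/Command",
    "GAVG": "Average Guest MilliSec/Command",
    "QAVG": "Average Queue MilliSec/Command",
}


def average_latency(csv_file, device):
    """Average each latency counter found for 'device' over all samples."""
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = {}
    for index, name in enumerate(header):
        for short, counter in COUNTERS.items():
            if device in name and name.endswith(counter):
                columns[short] = index
    totals = dict.fromkeys(columns, 0.0)
    samples = 0
    for row in reader:
        samples += 1
        for short, index in columns.items():
            totals[short] += float(row[index])
    if samples == 0:
        return {}
    return {short: totals[short] / samples for short in columns}


# Made-up two-sample file with only two of the counters present.
SAMPLE = r'''"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Physical Disk(vmhba1:C0:T0:L0)\Average Driver MilliSec/Command","\\esx01\Physical Disk(vmhba1:C0:T0:L0)\Average Kernel MilliSec/Command"
"07/26/2011 10:00:00","10.0","1.0"
"07/26/2011 10:00:05","20.0","3.0"
'''

print(average_latency(io.StringIO(SAMPLE), "vmhba1:C0:T0:L0"))
```

On a real export you would pass open("filename.csv", newline="") instead of the in-memory sample.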

Identify I/O Latency (Fibre channel & iSCSI only)

This is one of my favorite uses of esxtop, and it really shows how powerful it is. Run esxtop in interactive mode, press u to switch to the ‘disk device’ view, and press f to change the fields so you can see the following metrics (turn off F and G).

  • DAVG/cmd – Device average latency (should be under 25 ms, preferably 15-20 ms)
  • KAVG/cmd – Kernel average latency (should be less than 2 ms, ideally zero)
  • GAVG/cmd – Guest average latency, the sum of DAVG and KAVG (should be less than 25 ms)
  • QAVG/cmd – Queue average latency (should be zero)
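To make those guideline values concrete, here is a small Python sketch that flags any counter at or over its limit. The limits are simply the numbers from the list above, treated as rules of thumb rather than hard VMware thresholds. The function name and the sample readings are my own.

```python
# Guideline limits in milliseconds, taken from the list above.
LIMITS_MS = {"DAVG": 25.0, "KAVG": 2.0, "GAVG": 25.0}


def flag_latency(sample):
    """Return the names of counters that break the guideline values.

    sample: dict of counter name -> latency in ms, e.g. {"DAVG": 12.3}.
    Missing counters are treated as zero.
    """
    flagged = [name for name, limit in LIMITS_MS.items()
               if sample.get(name, 0.0) >= limit]
    # QAVG should be zero, so any queueing at all is flagged.
    if sample.get("QAVG", 0.0) > 0.0:
        flagged.append("QAVG")
    return sorted(flagged)


# Made-up readings for two devices.
healthy = {"DAVG": 8.2, "KAVG": 0.1, "GAVG": 8.3, "QAVG": 0.0}
slow_array = {"DAVG": 30.0, "KAVG": 0.5, "GAVG": 30.5, "QAVG": 0.0}

print(flag_latency(healthy))
print(flag_latency(slow_array))
```

A high DAVG with a low KAVG, as in the second sample, points at the array or fabric rather than the host.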

The VMware communities document Interpreting esxtop Statistics has more information on this under section 4.2, as does Duncan Epping’s esxtop page. Simon Greaves has recently posted on esxtop. Finally, read VMware KB article 1008205 which summarises the steps to monitor I/O latency with esxtop.

This is part 2 of the VCAP-DCA Study Guide – Storage, go back to part 1 for:

  • VMware DirectPath I/O
  • NPIV
  • Storage Best Practices
  • Raw Device Mapping