SAN Storage For Disaster Recovery Solutions
Posted on 31.Jan 2010 by Ray Heffer in SAN Storage, Virtualisation, VMware
It’s Monday morning and you arrive late at the office thanks to the trains being delayed yet again. As you grab your morning coffee, several hundred users have already logged in and started launching their email client, web applications, and a myriad of documents and spreadsheets. So far this sounds like any other morning, but what I didn’t mention is that just 30 minutes before you arrived, water from a pipe in the ceiling started to leak into the rack containing your SAN’s disk array.
What a nightmare. Not only has the water managed to get into both SAN controllers, but it has caused the trip switch for that rack to shut off. But wait… not a single user has called to say they can’t access their applications or data. Thanks to storage mirroring between two SAN arrays in separate racks, the business has continued to operate and all of the servers are now communicating with your secondary array. Seamless.
As you walk back to your desk with your morning coffee, your phone receives the first SMS message. Here it is, an alert from your monitoring system to say that the primary storage array is offline: “09:24 PRI-SAN01 offline, critical.”
At this point, it would certainly be pertinent to discuss best practice for data centre design, environment monitoring, and DR procedures. To achieve a solid DR solution for your infrastructure you must have the basics in place first. This means your DR strategy must be reviewed regularly, and business continuity planning (BCP) must be under way across all areas of the business. Without a solid continuity plan, your DR might not serve the actual needs of the business. The focus of this article is disaster recovery for your SAN rather than business continuity, but BCP must never be ignored. Let’s rewind to the implementation of a highly available SAN architecture; it’s far more interesting!
In 2005 I started to look at how storage mirroring can protect your data in this type of situation, and also provide ‘zero downtime’ maintenance windows for your SAN. Over the past few years, storage vendors have been adding mirroring, thin provisioning, snapshots, and asynchronous replication to remote sites to their entry-level SAN solutions, not just their large enterprise offerings. Don’t be fooled into thinking that a highly available SAN architecture is limited to those with massive budgets. There are other solutions, such as SANmelody or SANsymphony by DataCore, that allow you to present your existing disk arrays or SAN to storage servers. This is far more cost effective than upgrading your entire SAN hardware, and you can even increase performance by using the storage servers’ RAM as write cache.
DataCore SAN software is what I have been working with, in conjunction with EMC and HP SAN storage, over the past few years. The main reason is that we can present storage from different SAN vendors and create pooled storage that can then be partitioned into virtual volumes (or LUNs) for your application servers. On top of that we gain mirroring, thin provisioning, snapshots, and other features that our HP and EMC arrays didn’t have without an expensive upgrade. DataCore are releasing SANsymphony-V in 2010, which I’ve had the pleasure of using in a technology preview recently. DataCore were talking about storage virtualisation back in 1999, so I’d certainly recommend you speak to them about what they can offer.
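To make the pooling and thin provisioning ideas above a little more concrete, here is a toy Python model. It assumes a pool built from two hypothetical backing arrays (`hp_array` and `emc_array`, 4TB each) and hypothetical volume names; it is not how DataCore or any other vendor implements pooling. It only illustrates that volumes can be promised more logical space than the pool physically holds, with physical space consumed only as data is written.

```python
class ThinPool:
    """Toy thin-provisioned storage pool (illustrative, not a vendor API)."""

    def __init__(self, backing_arrays_gb):
        # Physical capacity is pooled from every backing array, whatever the vendor.
        self.physical_gb = sum(backing_arrays_gb.values())
        self.allocated_gb = 0
        self.volumes = {}  # volume name -> {"size": logical GB, "used": allocated GB}

    def create_volume(self, name, logical_gb):
        # Thin provisioning: no physical space is reserved when the volume is created.
        self.volumes[name] = {"size": logical_gb, "used": 0}

    def write_new_data(self, name, gb):
        # Physical space is taken from the pool only when new data is written.
        # (Overwrites of existing data are not modelled here.)
        vol = self.volumes[name]
        if vol["used"] + gb > vol["size"]:
            raise ValueError("write exceeds the volume's logical size")
        if self.allocated_gb + gb > self.physical_gb:
            raise RuntimeError("pool has run out of physical capacity")
        vol["used"] += gb
        self.allocated_gb += gb

    def subscribed_gb(self):
        # Total logical capacity promised to servers; may exceed physical_gb.
        return sum(v["size"] for v in self.volumes.values())


# Hypothetical example: two 4TB arrays pooled into 8TB of physical capacity.
pool = ThinPool({"hp_array": 4000, "emc_array": 4000})
pool.create_volume("vm_lun", 6000)
pool.create_volume("sql_lun", 4000)
pool.write_new_data("vm_lun", 1500)
print(pool.subscribed_gb(), pool.allocated_gb, pool.physical_gb)  # 10000 1500 8000
```

Here 10TB of logical volumes sit on 8TB of physical disk, and only the 1.5TB actually written is allocated. That is the appeal of thin provisioning, and also why pool utilisation needs monitoring: the pool can run out of space before any single volume looks full.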
Let’s familiarise ourselves with some key storage technologies:
Synchronous storage mirroring – When data is written to the primary array it is also written to the secondary array before the write is acknowledged. This requires a high-speed link between both arrays, such as fibre channel or iSCSI. It provides high availability for your SAN, but can double the storage cost in some situations.
Asynchronous mirroring – SAN replication to a DR site or remote office. Data is replicated to the remote site in the background, using queuing, buffering and scheduling. Typically used over WAN connections.
Snapshots – The ability to take a ‘point-in-time’ snapshot of your data. Very useful in a DR scenario, and for testing.
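The difference between synchronous and asynchronous mirroring comes down to when a write is acknowledged and when the second copy is updated. The Python sketch below models that with in-memory "arrays"; the array names other than PRI-SAN01 are hypothetical, and real products add write coalescing, consistency groups and error handling that are left out here.

```python
from collections import deque


class Array:
    """Stand-in for one disk array: block number -> data."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class SyncMirror:
    """Synchronous mirroring: the write is acknowledged only after both arrays hold it."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, block, data):
        self.primary.blocks[block] = data
        self.secondary.blocks[block] = data  # completes before the acknowledgement
        return "ack"


class AsyncMirror:
    """Asynchronous mirroring: acknowledged once written locally, then queued."""

    def __init__(self, primary, remote):
        self.primary = primary
        self.remote = remote
        self.queue = deque()  # buffered writes waiting for the WAN link

    def write(self, block, data):
        self.primary.blocks[block] = data
        self.queue.append((block, data))
        return "ack"

    def replicate(self, max_writes):
        # Send up to max_writes queued writes in order, e.g. on a schedule
        # or as link bandwidth allows. Returns how many were sent.
        sent = 0
        while self.queue and sent < max_writes:
            block, data = self.queue.popleft()
            self.remote.blocks[block] = data
            sent += 1
        return sent


# Synchronous: both arrays are identical as soon as the write is acknowledged.
pri, sec = Array("PRI-SAN01"), Array("SEC-SAN01")
SyncMirror(pri, sec).write(0, b"invoice")

# Asynchronous: the DR copy lags until the queue is drained.
local, dr = Array("PRI-SAN02"), Array("DR-SAN01")
am = AsyncMirror(local, dr)
am.write(0, b"a")
am.write(1, b"b")
lag_before = len(am.queue)  # 2 writes not yet at the DR site
am.replicate(max_writes=1)
```

The synchronous version pays for its zero data loss with a round trip to the second array on every write, which is why it needs a fast local link; the asynchronous version keeps write latency low but accepts that whatever is still queued would be lost if the primary site failed.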
SAN Building Blocks for Disaster Recovery & High Availability
To set the scene I’ll use a typical IT infrastructure that you would find in most SME organisations. They have already virtualised at least 50% of the server infrastructure, and have a midrange fibre channel SAN from a well-known vendor. SAN capacity is up to 8TB, containing a mix of virtual machine, database, and file store LUNs. The majority of servers run Microsoft Windows Server 2008, with some Linux servers for key network services.

The diagram shown here (1.0) is certainly simplified, but represents the core components of most SME infrastructures.
Using this example you’ll see that virtualisation is already in place with VMware High Availability, and further high availability has been implemented with a Microsoft SQL Server database cluster. There is enough capacity to support a single host failure using VMware High Availability, but there are still some physical application servers yet to be virtualised. Given this is a typical SME infrastructure, let’s also imagine that the SAN has dual controllers and is connected to a fibre channel fabric consisting of two core fibre channel switches (A and B). This is a very good situation to be in: most of the servers are virtualised, and the SQL databases and the file server storage for our shared drives are all on the SAN.
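"Enough capacity to support a single host failure" is essentially an N+1 calculation. The Python function below is a rough, RAM-only version of that check under assumed host and VM sizes; it is not VMware HA's admission control algorithm, which works differently (slot sizes or reserved percentages) and also accounts for CPU and memory overhead.

```python
def survives_host_failures(host_ram_gb, vm_ram_gb, failures=1):
    """Rough N+1 check: can the remaining hosts hold every VM's RAM after the
    `failures` largest hosts are lost? Ignores CPU, overhead and VMware HA's
    own admission-control calculations."""
    if failures < 0:
        raise ValueError("failures must be >= 0")
    keep = max(0, len(host_ram_gb) - failures)
    remaining = sorted(host_ram_gb)[:keep]  # drop the largest hosts first
    return sum(remaining) >= sum(vm_ram_gb)


hosts = [64, 64, 64]  # hypothetical: three ESX hosts with 64GB RAM each
vms = [8] * 15        # hypothetical: fifteen VMs with 8GB each (120GB total)
print(survives_host_failures(hosts, vms))  # True: 128GB remains for 120GB of VMs
```

Removing the largest hosts first gives the pessimistic answer, which is the one you want when planning for failure.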
Implementing on-site HA (High Availability) using synchronous mirroring, even to another building with a fibre link between the two, gives this environment an excellent level of resilience. However, synchronous mirrors do have some pitfalls, mainly cost: the solution is split into two, so you need twice the amount of storage. One rack will contain a SAN with an 8TB array, and the other rack will contain another SAN with an 8TB array, with mirroring between the two. You will then need to decide on the level of disk redundancy within each array; since the data is already mirrored between separate arrays, you could use a basic RAID0 stripe. I personally prefer to stick with RAID5, even though the arrays are mirrored.
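The cost trade-off between RAID0 and RAID5 inside a mirrored pair is simple arithmetic. The Python sketch below assumes a single RAID group per array, hypothetical 1TB disks and decimal units, and ignores hot spares and formatting overhead; a real 8TB array would usually be split into several RAID groups, each with its own parity disk.

```python
def disks_per_array(usable_gb, disk_gb, raid):
    """Disks needed in ONE array for `usable_gb` of usable space.

    Assumes a single RAID group with no hot spares or formatting overhead.
    """
    data_disks = -(-usable_gb // disk_gb)  # integer ceiling division
    if raid == "raid0":
        return data_disks       # striping only: no redundancy inside the array
    if raid == "raid5":
        return data_disks + 1   # one disk's worth of capacity goes to parity
    raise ValueError(f"unsupported RAID level: {raid}")


def mirrored_raw_gb(usable_gb, disk_gb, raid):
    """Raw disk capacity across BOTH arrays of a synchronous mirror."""
    return 2 * disks_per_array(usable_gb, disk_gb, raid) * disk_gb


# 8TB usable in each rack, built from hypothetical 1TB (1000GB) disks.
for level in ("raid0", "raid5"):
    print(level, mirrored_raw_gb(8000, 1000, level))
# raid0 16000
# raid5 18000
```

The extra parity disk in each array is what lets a single disk failure be handled locally, without having to rely on the other array while the failed disk is replaced.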
An asynchronous mirror is where true disaster recovery comes into play. By selecting key SAN LUNs (or data volumes) to be replicated to a remote site, you can specify which databases, virtual machines or file stores are part of the replication. This does introduce an extra layer of complexity, which you don’t get with synchronous mirrors. First of all, you need a suitable location for the destination SAN, unless you consider a co-location service with an ISP. Depending on how much the replicated data changes, the link between the sites could be very busy, so bandwidth is a consideration. That being said, a 20Mbps private circuit between two sites around 40 miles apart should be in a fairly realistic price bracket. If you are using a co-location provider, they should be able to provide this for you. As a rough estimate, I would say £10,000 to £20,000 per annum for a 20Mbps link in the UK.
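Whether a given link is big enough depends on how much data changes per day. The Python sketch below turns link speed into a daily change budget and a transfer time. It assumes "20Mb" means 20 megabits per second, uses decimal units, and applies an `efficiency` factor of 0.8 for protocol overhead and other traffic; that factor is a guess, not a measured figure.

```python
def max_daily_change_gb(link_mbps, efficiency=0.8):
    """Largest daily change rate (decimal GB) a link can keep up with.

    `efficiency` is an assumed allowance for protocol overhead and other
    traffic sharing the circuit; 0.8 is a guess, not a measured figure.
    """
    bytes_per_day = link_mbps * 1_000_000 * efficiency * 86_400 / 8
    return bytes_per_day / 1_000_000_000


def hours_to_replicate(changed_gb, link_mbps, efficiency=0.8):
    """Hours needed to send `changed_gb` of changed data over the link."""
    bits = changed_gb * 1_000_000_000 * 8
    return bits / (link_mbps * 1_000_000 * efficiency) / 3600


print(round(max_daily_change_gb(20), 1))     # 172.8
print(round(hours_to_replicate(50, 20), 2))  # 6.94
```

If the replicated LUNs change by more than the daily budget, the asynchronous queue grows over time and the DR copy falls further and further behind. Note that the initial full copy of an 8TB SAN would take days at this speed, which is one reason the first synchronisation is often done with both SANs on the same site.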
Adding further complexity to the asynchronous mirroring solution is what to do with the destination data in the event of a disaster (or DR test). When a SAN LUN is first presented to an application server, whether a VMware, Windows or Linux host, the host needs to write a disk signature to the LUN. With asynchronous mirroring, the destination LUN at the DR site will have exactly the same signature. In this case you must make sure the disk isn’t resignatured by the application servers at the remote site. VMware servers (ESX and vSphere) have an advanced option to disable resignaturing, whereas Windows servers shouldn’t cause an issue unless they are part of a cluster.
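The signature issue can be illustrated with a small Python model. This is not the logic that VMware ESX or Windows actually use; the `Lun` type, the LUN names and the signature values are hypothetical. It shows the principle: a replicated LUN carries the same signature as its source, so it can keep that signature only as long as the host does not also see the original.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lun:
    lun_id: str     # identifier from the array (differs between primary and DR copy)
    signature: str  # on-disk signature written by the host, copied by replication


def mount_plan(visible_luns, replica):
    """Illustrative decision for a DR-site host presented with a replicated LUN.

    If no other visible LUN carries the same signature, the replica can be
    mounted keeping its existing signature. If a duplicate is visible, the host
    cannot tell the copies apart without resignaturing one of them.
    """
    duplicates = [
        lun for lun in visible_luns
        if lun.signature == replica.signature and lun.lun_id != replica.lun_id
    ]
    return "conflict" if duplicates else "keep-signature"


original = Lun("PRI-SAN01/LUN5", "sig-7f3a")
replica = Lun("DR-SAN01/LUN5", "sig-7f3a")  # same signature after replication
print(mount_plan([replica], replica))            # keep-signature
print(mount_plan([original, replica], replica))  # conflict
```

In practice this means zoning and LUN masking at the DR site should expose only the replicated (or snapshot) LUNs to the DR hosts.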
When testing your DR site with the replicated data, it is recommended that snapshots are used to take a ‘point-in-time’ copy of the destination volume. The snapshot volume is then presented to the application servers at the DR site, while replication of live data continues. Using asynchronous mirroring and snapshots together lets you carry out DR tests without impacting the live environment, so in most cases they can be done during normal business hours.
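To illustrate why a snapshot lets replication continue during a DR test, here is a minimal copy-on-write sketch in Python. Copy-on-write is one common way snapshots are implemented and is assumed here for illustration; your SAN may use a different method. The snapshot is writable so that test servers can write to it without touching the replicated volume. The class names and data are hypothetical.

```python
_ABSENT = object()  # marks a block that did not exist when the snapshot was taken


class Volume:
    """Replicated destination volume that supports copy-on-write snapshots."""

    def __init__(self):
        self.blocks = {}
        self._snapshots = []

    def snapshot(self):
        snap = Snapshot(self)
        self._snapshots.append(snap)
        return snap

    def write(self, block, data):
        # Copy-on-write: before replication overwrites a block, preserve its
        # old contents in every snapshot that has not saved it yet.
        for snap in self._snapshots:
            if block not in snap.saved:
                snap.saved[block] = self.blocks.get(block, _ABSENT)
        self.blocks[block] = data


class Snapshot:
    """Point-in-time view presented to DR test servers."""

    def __init__(self, volume):
        self.volume = volume
        self.saved = {}   # original blocks preserved by copy-on-write
        self.writes = {}  # test writes; never reach the live replica

    def write(self, block, data):
        self.writes[block] = data

    def read(self, block):
        if block in self.writes:
            return self.writes[block]
        value = self.saved.get(block, self.volume.blocks.get(block, _ABSENT))
        return None if value is _ABSENT else value


dr_volume = Volume()
dr_volume.write(0, "orders@09:00")
test_view = dr_volume.snapshot()        # start of the DR test
dr_volume.write(0, "orders@09:30")      # replication carries on
test_view.write(1, "test transaction")  # DR test writes go to the snapshot only
print(test_view.read(0), dr_volume.blocks[0])  # orders@09:00 orders@09:30
```

Only blocks that change after the snapshot is taken are copied, so the snapshot needs far less space than a full copy of the volume, and deleting it at the end of the test leaves the replica untouched.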
Summary
Storage replication and snapshot technology certainly provide key ingredients for your DR solution, but there are still important factors to consider. Do you want high availability, replication to another site, or both? Does your existing SAN support these technologies, or should you consider an upgrade? Obviously your budget is going to be a major factor, and I’m not here to lecture you on ‘what would it cost if you actually had a disaster’; you can make that decision!
If you decide to adopt mirroring and snapshot technologies as part of your DR solution and you are already running a virtual infrastructure, then you are well on your way to an excellent DR solution. There are some technical complexities you need to be aware of, but if you have good knowledge in these areas they are only minor factors.


The scenario you described happened to my firm on Sunday 28 Feb. Although we had a spare SAN and some standalone servers, we were still in the process of preparing it as a disaster recovery solution. We therefore had to copy data from our backup server to the spare SAN, and we survived with minimal disruption on Monday. As with all these things, we are working to speed up recovery and are considering asynchronous mirroring. Do you have any suggestions for a software solution to achieve this, including snapshots? Currently we have SQL databases and Linux filesystems, together with a GroupWise email system that backs up to a ‘GWAVA Reload server’.
Hi Ian,
If you are looking for a software SAN solution that you can ‘bolt on’ to your existing storage, then I’d recommend looking at either DataCore or LeftHand Networks (now HP). Later this year, DataCore should be releasing SANsymphony-V, which combines their previous two products, SANmelody and SANsymphony.
The LeftHand solution from HP worth looking at is the P4000 VSA (Virtual SAN Appliance). This product is fairly new to me, but given the choice between the two I am tempted to opt for the P4000. That is just based on my past experience of DataCore, but at the end of the day it might just come down to cost and scalability.
Check out their websites here:
DataCore
HP LeftHand P4000 VSA
Ray
Ray
I am still investigating a solution and have found Novell’s PlateSpin.
Any thoughts?
Many thanks
Ian
Hello Ian,
I have been using PlateSpin (now owned by Novell) for many years, not as a DR solution but to migrate physical servers to virtual (P2V). PlateSpin Protect would be the best product for DR replication, but I think it can be quite expensive. Unless it has changed, it is licensed per workload (i.e. per server that you want to replicate or protect).
It’s a perfectly viable option for DR, and well suited to virtual machines. In terms of technology it works in the same way as a P2V migration, except that it leaves the secondary DR copy switched off, ready for a DR scenario.
Certainly worth a further look if you don’t have SAN replication.
Ray
Hi,
We have geo-clustering in place and I want to perform a DR test.
Please advise, what will be involved?
Do I have to change the LUN settings, etc.?
Appreciate your help.
Thanks
Dear Mr. Ian,
We are a reseller in Oman, and one of our clients is looking for the solution below. Could you please provide price and availability:
A SAN solution of 20TB, upgradable to 200TB within 3 years, for disaster recovery.
Best regards,
S Sukumar
Atheer Technical Services LLC
Gsm-00968-99266525
UPDATE: DataCore didn’t release SANsymphony-V in 2010, and I have no idea when it is due. It’s been nearly two years since I had the technology preview, but when I spoke to them at VMworld 2010 in Copenhagen last year they said they still plan to release it, though much later than originally anticipated.