VMware vSphere Metro Storage Cluster (vMSC) is not a product but a certified configuration in which a vSphere cluster spans geographical locations. The sites could be spread across a campus, a metropolitan area or a larger area, up to 200 km apart.
vMSC was introduced with vSphere 5.0.
It relies on a stretched storage solution such as NetApp MetroCluster and a stretched layer 2 VLAN. The storage must be treated as a single storage solution that spans both sites. The storage is synchronously replicated between the sites so that both sites are always in sync and there is zero data loss in the event of a failure. The storage solution must allow the datastores/LUNs to be accessed from either location.
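To make the synchronous replication idea concrete, here is a minimal toy model in Python. It is only a sketch: the class name, site names and in-memory dictionaries are assumptions made for illustration and have nothing to do with how MetroCluster is actually implemented. The point it shows is that a write is only acknowledged once both sites hold it, so either site can serve the data afterwards.

```python
class StretchedDatastore:
    """Toy model of a synchronously replicated datastore spanning two sites."""

    def __init__(self):
        # One copy of the data per site; the site names are illustrative only.
        self.copies = {"site_a": {}, "site_b": {}}

    def write(self, block, data):
        # Commit the block at every site before acknowledging the write.
        for copy in self.copies.values():
            copy[block] = data
        return "ack"

    def read(self, site, block):
        # Either site can serve the read from its own copy.
        return self.copies[site][block]


ds = StretchedDatastore()
ds.write(7, b"payload")
# Once the write is acknowledged, both sites hold identical data.
in_sync = ds.read("site_a", 7) == ds.read("site_b", 7)
```

Because the acknowledgement waits for both copies, losing one site after the acknowledgement does not lose the write.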
This brings the functionality of a local VMware vSphere cluster to hosts spread across two locations so that VMware HA, DRS and vMotion can be performed across the sites as if all the hosts were local. But is this a better solution than SRM? They are different solutions aimed at resolving different problems. vMSC is targeted at disaster avoidance whereas SRM is targeted at disaster recovery.
vMSC achieves disaster avoidance by allowing you to move workloads off failing components without outages.
SRM achieves disaster recovery by automating recovery plans to bring workloads back online in a controlled manner following a disaster.
SRM can be used in a planned migration to move workloads from one site to another, for example when maintenance is required at the primary site; however, an outage is always required to move the workloads with SRM. vMSC can use VMware HA to restart workloads at the secondary site when the other site fails, but there is little control over the order in which the failed workloads restart.
This is a good document and case study based on NetApp MetroCluster using iSCSI: http://www.vmware.com/files/pdf/techpaper/vSPHR-CS-MTRO-STOR-CLSTR-USLET-102-HI-RES.pdf
It states that a single vCenter is required.
“In a stretched-cluster environment, only a single vCenter Server instance is used. This is different from a traditional VMware Site Recovery Manager configuration, in which a dual vCenter Server configuration is required.”
It also recommends using DRS “should” affinity rules to control which VMs run out of which site. Have a look at the single host failure scenario and note that HA may start the VMs from the failed host on hosts in the other site, but DRS will then vMotion them back. It is best to keep the VMs running in the correct datacentre because NetApp MetroCluster uses a uniform configuration, in which each LUN is only accessed from one site. When a VM is restarted in a different site from the one where its datastore is being accessed, all of its I/O has to cross the inter-site links, which increases latency.
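The effect of a “should” rule after an HA restart can be sketched in a few lines of Python. This is a plain model, not the vSphere API: the `VM` class, the site names and the `misplaced` helper are all hypothetical. It simply lists the VMs that are running away from their preferred site, which are the ones you would expect DRS to vMotion back.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    preferred_site: str  # site named in the "should" affinity rule
    current_site: str    # site of the host the VM is running on now


def misplaced(vms):
    """Return the names of VMs running outside their preferred site."""
    return [vm.name for vm in vms if vm.current_site != vm.preferred_site]


vms = [
    VM("app01", "site_a", "site_a"),
    VM("db01", "site_a", "site_b"),  # restarted by HA at the other site
    VM("web01", "site_b", "site_b"),
]
to_move_back = misplaced(vms)
```

Here only `db01` is in the wrong site, so it is the only VM whose I/O is crossing the inter-site links until it is moved back.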
Also note that if the storage fails in one site, this document states that with NetApp MetroCluster initiating a takeover to the other site is a manual process. VMware HA will only attempt to restart VMs for 30 minutes, so if the Storage Administrator takes longer than that to initiate the takeover, the VMs will need to be restarted manually.
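The timing consequence can be written down as a simple check. This is only an illustration: the 30 minute figure is the one quoted above, and the function name and parameters are made up for the example.

```python
from datetime import timedelta

# HA restart window as quoted in the text above.
HA_RETRY_WINDOW = timedelta(minutes=30)


def needs_manual_restart(takeover_duration, window=HA_RETRY_WINDOW):
    """True if the storage takeover finishes after HA has stopped retrying."""
    return takeover_duration > window


slow_takeover = needs_manual_restart(timedelta(minutes=45))
fast_takeover = needs_manual_restart(timedelta(minutes=10))
```

A takeover that takes 45 minutes leaves the VMs down until someone restarts them by hand, whereas one completed in 10 minutes still falls inside the HA retry window.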
You can give each VM a restart priority of High, Medium or Low, but this is not as comprehensive as the control SRM gives you over the restart order.
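To show how coarse three priority levels are, here is a small Python sketch that groups VMs by restart priority. The VM names and the `restart_batches` helper are hypothetical, and the alphabetical sort inside each group is only there to make the output repeatable; it does not represent any ordering that HA guarantees within a priority level.

```python
PRIORITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


def restart_batches(vm_priorities):
    """Group VM names by restart priority, highest priority first."""
    batches = {level: [] for level in PRIORITY_ORDER}
    for name, level in vm_priorities.items():
        batches[level].append(name)
    # Sorting within a batch is for stable output only, not an HA guarantee.
    return [(level, sorted(batches[level]))
            for level in PRIORITY_ORDER if batches[level]]


batches = restart_batches({
    "db01": "High",
    "app01": "Medium",
    "web01": "Medium",
    "test01": "Low",
})
```

In this example `app01` and `web01` land in the same Medium group, so there is no way to say that one must be up before the other, which is exactly the kind of dependency an SRM recovery plan lets you express.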
This VMware Knowledge Base article (http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2031038) mentions a MetroCluster TieBreaker that runs on an OnCommand Unified Manager host at a third site, which monitors the MetroCluster availability and automatically issues the failover commands when there is a site failure.
This Technical White Paper (http://www.vmware.com/files/pdf/techpaper/Stretched_Clusters_and_VMware_vCenter_Site_Recovery_Manage_USLTR_Regalix.pdf) states that SRM is incompatible with stretched clusters.