I was five minutes late to this session, so I missed just a little information. Without making you read to the end… this session was possibly the best one I attended. Two presenters: Sheridan Kooyers, HP EVA Firmware Architect, and Joseph Algieri, HP Master Solution Architect.
Warning: Extremely technical post! What follows is almost a stream of consciousness and might not be useful to anybody. It may only be useful as personal notes. Still, just in case…
EVA Continuous Access Considerations
- Exchange based routing must be disabled
- WAN Accelerators are not supported (yet) – they can actually decrease performance. However, they have started testing units from Riverbed and 3Com. (3Com isn’t surprising since HP bought them)
- IP Acceleration must be disabled unless you are at a supported XCS version (sorry, I didn’t get the version — I’ll update when the presentations become available next week)
Data Currency or RPO (Recovery Point Objective)
This is a measure of all committed IOs that were applied to the primary storage device at the time of a disaster but not to the secondary. Synchronous replication guarantees data currency of the remote copy as long as the replication link remains operational. This is RPO=0.
However, problems can occur if a disaster strikes during synchronization. This is commonly known as a “Rolling Disaster”, which can result in destruction of data not only at the source EVA but also at the target EVA. (It is possible to prevent changes to the source data if replication stops by setting failsafe to “ENABLE” on the EVA. Most people will not enable this, or will disable it the first time they experience it.)
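To make the RPO definition concrete for myself, here is a minimal sketch (mine, not from the presentation). It treats each committed IO as a number and counts the ones the primary has that the secondary does not. The function name and the sample lists are made up for illustration.

```python
def unreplicated_writes(primary_log, secondary_log):
    """Return committed writes present on the primary but not yet on the secondary."""
    replicated = set(secondary_log)
    return [w for w in primary_log if w not in replicated]


# Synchronous: a write is acknowledged only once both sides have it.
sync_primary = [1, 2, 3, 4]
sync_secondary = [1, 2, 3, 4]

# Asynchronous: the secondary trails by whatever is still in flight.
async_primary = [1, 2, 3, 4]
async_secondary = [1, 2]

print(len(unreplicated_writes(sync_primary, sync_secondary)))  # 0, i.e. RPO=0
print(unreplicated_writes(async_primary, async_secondary))     # [3, 4]
```

The second result is the data you would lose if the primary site disappeared at that instant.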
Basic Asynchronous Replication
This is the original version of asynchronous replication available to EVAs. The news here is that late this year or early next year HP will make available a code upgrade that will give the ability to choose between basic and enhanced asynchronous replication because HP found out that there are actually use cases for both.
Enhanced Asynchronous Replication
This is available on XCS v6.x and later. It tends to be better than basic asynchronous replication because it provides consistent server write IO performance independent of link speed and link latency. However, link sizing is still important to assure the solution’s RPO can be achieved.
- no host throttling even if WHL overflows (write history log)
- host IO will be throttled during normalization on pre-xcs v6.200 releases
- note – basic asynchronous replication is supported between EVAs running xcs v6.2x and vcs v3.x and v4.x
- crash-consistent – IOs are synch’d over in the same order they are written by the application to the source EVA. Any disruption of the synchronization results in a copy that is only crash-consistent, so for example, your Oracle database may have to roll back incomplete transactions when it brings up the database on the target volume.
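The point about link sizing can be put into rough arithmetic. This is a back-of-the-envelope sketch under my own assumptions (steady write rate, steady link throughput, no compression); the function names and numbers are hypothetical and nothing here comes from HP sizing guidance.

```python
def whl_fill_seconds(whl_size_gb, write_rate_mb_s, link_rate_mb_s):
    """Seconds until the write history log fills, or None if the link keeps up.

    The log grows by the difference between what hosts write and what the
    link can ship to the target (both in MB/s).
    """
    backlog_rate = write_rate_mb_s - link_rate_mb_s
    if backlog_rate <= 0:
        return None
    return whl_size_gb * 1024 / backlog_rate


def lag_seconds(backlog_mb, link_rate_mb_s):
    """Rough RPO exposure: time the link needs to ship the current backlog."""
    return backlog_mb / link_rate_mb_s


print(whl_fill_seconds(100, 60, 40))  # 5120.0 seconds until a 100 GB log overflows
print(whl_fill_seconds(100, 30, 40))  # None, the link keeps up
print(lag_seconds(2000, 40))          # 50.0 seconds behind with 2000 MB queued
```

Host IO is not throttled in enhanced async mode, so an undersized link shows up as a growing backlog (a worse RPO) rather than as slow writes.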
Not all supported configurations are recommended in all situations.
These best practices quite often have resulted in the best customer experience with the product, but they are NOT REQUIREMENTS. HP just wants you to understand them and why they are recommended. There are other configurations that don’t follow best practices, but they may work best for your circumstances.
ROLLING DISASTER PROTECTION
These occur when recovery from an initial failure has not completed before a second catastrophic failure occurs. The initial failure results in the replication software needing to perform an IO inconsistent resynchronization of the source and target volumes (replication link fails or WHL, Journal, PIT or Snapshot overflows)
- during this resynchronize the destination volumes do not contain IO consistent data
- the volumes are logically current and consistent only after the resynchronize completes
What is a rolling disaster?
- it’s not particular to CA solutions
- it can happen with any replication solution and must be planned for if the solution is to be considered complete (for DB replication it will require a re-seeding of the target database)
- if during the resynch a failure in the primary data center occurs, the data on the target volumes in the target DC is not usable
- to protect from this, before resynchronize is allowed to occur, a BC of the target volumes must be created (the target volumes are IO consistent at the point of the first failure so the BC will be IO consistent and usable if necessary)
- IF during the resynchronize the primary DC suffers a catastrophic failure, the BC made of the target volumes becomes the new usable data (data is IO consistent, but it is NOT current)
See the Command View settings for “Suspend on link down” and “Suspend on log full”, which can be used to prevent a rolling disaster from occurring. Do the BC, snap, whatever on the target, and then resume the CA synchronization.
- we must ensure there is enough available space in the remote storage array to create a BC of the target volumes before the resynchronization is allowed to occur
- BC can be used to make a copy of the target volumes (EVA mirrorclone, snapclone, snapshot in certain versions)
(NORMALIZATION = OUT OF ORDER RESYNCH)
- Fully allocated snapshots or snapclones are recommended.
- Best practice for adding a new member to a DR group is to add it to the group BEFORE the new volume is used by the application
- If the new member already has data in it when it is added, it will have to be normalized and the target volumes will be IO inconsistent until the normalization completes. (if this is the case, you must protect from a rolling disaster before adding the new volume)
- During this normalization the target members of the DR group will be IO inconsistent and unusable in the event a CA fail-over is required (similar to a rolling disaster)
- by adding the new member to the DR group before it is used by the application, a normalization is not required and the DR group will continue to be IO consistent and usable even in the event a CA fail-over is needed. (The new member should be a new vdisk that has never been presented and written to by a host)
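The ordering behind rolling disaster protection (copy the target first, then let the resync or normalization run) is easier to see in a toy model. This is my own sketch, not anything HP showed. The class, method names and string states are all hypothetical, and no real Command View or SSSU calls are involved.

```python
class TargetSite:
    """Toy model of a CA target: tracks consistency and a protective copy."""

    def __init__(self):
        self.io_consistent = True      # consistent at the point of the first failure
        self.protective_copy = None    # BC (snapshot/snapclone/mirrorclone) of the target
        self.data = "state-at-first-failure"

    def create_business_copy(self):
        if not self.io_consistent:
            raise RuntimeError("target is mid-resync; a copy taken now is unusable")
        self.protective_copy = self.data

    def start_resync(self):
        self.io_consistent = False     # out-of-order copy in progress
        self.data = "partially-resynced"

    def finish_resync(self):
        self.io_consistent = True
        self.data = "current"

    def usable_data_after_primary_loss(self):
        """What can be recovered if the primary site is lost right now."""
        if self.io_consistent:
            return self.data
        return self.protective_copy    # None means nothing usable survives


protected = TargetSite()
protected.create_business_copy()   # step 1: copy while replication is suspended
protected.start_resync()           # step 2: resume replication
print(protected.usable_data_after_primary_loss())    # state-at-first-failure

unprotected = TargetSite()
unprotected.start_resync()
print(unprotected.usable_data_after_primary_loss())  # None
```

The protected case recovers data that is IO consistent but not current, which matches the trade-off described above.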
EXPANDING THE WHL
- expanding the WHL when in enhanced asynchronous mode
- the DR group must be placed into synchronous mode and the WHL must completely drain before the WHL can be reduced or expanded
- if the WHL expansion is planned, ensure both arrays have enough space available to support the new planned size
- HINT: since the solution will be running in sync mode for this operation, it’s a good time to execute a planned fail-over for testing purposes.
- expanding WHL while in Enhanced Async mode
- while in synchronous mode host IO rates will drop precipitously
- there is a reason the solution has been configured for enhanced asynchronous mode
- most likely host IO performance over a small link with high latency will be poor so replicating in synchronous mode will be very painful
- during WHL drain in synchronous mode we allow one new io into the log for every two that are merged out
- once the WHL has drained the WHL log size can be changed (increase or decrease)
- consider suspending the application while waiting for the WHL to drain
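The "one new IO in for every two merged out" rule implies the drain takes roughly twice as many merges as the starting backlog. Here is a small simulation of that rule as I understood it. It counts log entries only, and it assumes (my assumption) that no new IO is admitted once the log is empty.

```python
def drain_whl(entries):
    """Count merges and admitted host IOs needed to empty a log of `entries`.

    Applies the stated rule: one new host IO is admitted into the log for
    every two entries merged out to the target.
    """
    merged = admitted = 0
    while entries > 0:
        entries -= 1
        merged += 1
        # After every second merge, let one new host IO into the log
        # (assumption: admission stops once the log has emptied).
        if merged % 2 == 0 and entries > 0:
            entries += 1
            admitted += 1
    return merged, admitted


print(drain_whl(4))     # (6, 2)
print(drain_whl(1000))  # (1998, 998)
```

While the log drains, the host only gets about one IO through for every two the link merges, which is why suspending the application during the drain is worth considering.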
BUSINESS COPY – Point in Time (PIT) copy capability
- 3 primary options: snapshot, snapclone, mirrorclone
- controlled from Command View, RSM or SSSU
- space efficient or fully-allocated.
- space efficient can fail if there’s not enough space in the diskgroup.
- fully allocated never fail because they grab as much space as needed up front.
- mirrorclone – full copy of the source vdisk, which maintains a relationship with the source and is always kept in sync. This can then be fractured and mounted by another server. Can be resynchronized later. Synchronization is done only on the 1MB chunks that have been modified.
- snapshots affect performance, but snapshots can be taken against a mirrorclone, which then doesn’t affect app performance.
Typical uses for PIT copies include
- create online backups
- keep apps online while backing up data
Requirements to consider are:
- capacity efficiency of backups
- number of online backups
- frequency of creating backups
- should backup data be stored on a lower tier of disks?
- is full volume “instant restore” feature desired?
- if array runs out of space is “over commit” acceptable?
- performance considerations
A snapshot provides a PIT copy of the source LUN at the moment the snapshot is created. The source LUN can be either a regular source vdisk, or the mirrorclone of a source vdisk.
There are two types of snapshot:
- 1. space efficient (aka vsnap or demand allocated)
- snapshot dynamically grows as it needs space
- this type of snapshot can fail (due to over-committed disk space) if the disk group runs out of space.
- not recommended for critical application backups
- a snapshot can be immediately presented to hosts
- snapshots DO NOT protect against a source vdisk fail. They only protect against accidental file deletion, virus attack or other corruption.
- CURRENTLY this even breaks mirrorclones… but that will change in a future release.
- snapclone is a FULL COPY of a source vdisk at PIT. these can survive source vdisk failures.
- can be created in a different disk group
- completely standalone.
- not space efficient
- can be any vraid type, not dependent on source vdisk raid type
- an empty container is a vdisk that has no user data but has all metadata structures initialized and space allocated by the EVA
- primary purpose is to create a snapshot or snapclone very quickly
- created by storage admin in advance.
- cannot be presented to a host
- an existent vdisk can be converted into an empty container
- empty containers can help eliminate the “first write penalty”
Why use empty containers?
- A snapshot or snapclone created using an empty container can never go over-commit due to running out of space
- Snapclone/snapshot creation happens much faster because the metadata for the new snap is already initialized
- Some apps are very time sensitive. Fast snapshot/snapclone creation is improved when using an empty container.
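The over-commit difference between a demand-allocated snapshot and one built on a pre-allocated container can be sketched like this. It is a toy model of my own, with sizes in GB and hypothetical class and function names. It says nothing about how the EVA actually accounts for space.

```python
class DiskGroup:
    """Toy disk group with a fixed amount of free space (in GB)."""

    def __init__(self, free_gb):
        self.free_gb = free_gb

    def allocate(self, gb):
        if gb > self.free_gb:
            raise RuntimeError("disk group out of space (over-commit)")
        self.free_gb -= gb


def demand_allocated_snapshot(group, changed_gb_per_step):
    """Grow a space-efficient snapshot step by step; may fail part-way."""
    used = 0
    for gb in changed_gb_per_step:
        group.allocate(gb)   # space is taken only when source data changes
        used += gb
    return used


def container_backed_snapshot(group, source_size_gb):
    """Reserve the full size up front, as a pre-allocated container would."""
    group.allocate(source_size_gb)   # fails now, or never
    return source_size_gb


group = DiskGroup(100)
try:
    demand_allocated_snapshot(group, [40, 40, 40])
except RuntimeError as err:
    print("snapshot failed later:", err)

print(container_backed_snapshot(DiskGroup(100), 80))  # 80, reserved up front
```

The demand-allocated snapshot looks fine at creation time and only fails on the third burst of changes, which is the reason it is not recommended for critical backups.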
MIRRORCLONE – Synchronous Local Replication
- a synchronous mirror copy of a source vdisk that can be used to form a PIT copy or stand-alone vdisk
- can be in a different disk group and any vraid type
- it’s a full copy and is kept in sync with the source vdisk as soon as normalization completes.
- then it can be fractured to establish a PiT copy by ending replication of writes to the mirrorclone.
- can be presented for read/write access just like a snapshot or snapclone
- deltas are tracked in a bitmap (for writes to both the source vdisk and the mirrorclone) – sync after the fracture is very efficient, only the chunks identified in this bitmap are re-synch’d from the source to the target.
- can be detached, which breaks the mirrorclone relation and converts the mirrorclone to a standalone vdisk (can use this to mimic snapclone behavior)
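The bitmap-based delta resync described above can be illustrated with a tiny model. This is my own sketch, with a Python set standing in for the bitmap and short strings standing in for 1MB chunks. The class and method names are hypothetical.

```python
CHUNK = 1024 * 1024  # 1 MB tracking granularity, as mentioned in the session


class FracturedMirror:
    """Toy model: source and mirrorclone as lists of chunks plus a dirty set."""

    def __init__(self, chunks):
        self.source = list(chunks)
        self.mirror = list(chunks)   # in sync at the moment of the fracture
        self.dirty = set()           # indexes of chunks changed on either side

    def write_source(self, offset, value):
        idx = offset // CHUNK
        self.source[idx] = value
        self.dirty.add(idx)

    def write_mirror(self, offset, value):
        idx = offset // CHUNK
        self.mirror[idx] = value
        self.dirty.add(idx)

    def resync(self):
        """Copy only dirty chunks from source to mirror; return how many."""
        copied = len(self.dirty)
        for idx in self.dirty:
            self.mirror[idx] = self.source[idx]
        self.dirty.clear()
        return copied


m = FracturedMirror(["a", "b", "c", "d"])
m.write_source(0, "A")               # host changes chunk 0 on the source
m.write_mirror(3 * CHUNK + 5, "x")   # backup server scribbles on chunk 3 of the clone
print(m.resync())                    # 2 chunks copied, not all 4
print(m.mirror == m.source)          # True
```

Writes to the mirrorclone are tracked too, so anything the second server changed is overwritten from the source on resync.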
Snapshots of Mirrorclone - Advantages
- can create cross disk group snapshots by putting mirrorclone into a different diskgroup
- protected from disk failures in source disk group
- snapshots of mirrorclones allocate space in the mirrorclone disk group
- snapshot read load is in the mirrorclone disk group
- rapid fire creation of backups in a separate disk group from the source vdisk. Superior to snapclone in this respect. Facilitates a very small recovery point objective (RPO). Reads to snapshots are against the fractured mirrorclone, not the source vdisk.
- if a source vdisk becomes corrupted it can instantly be recovered from an online backup via the instant restore feature
- provides the same effect as recovering from tape, but data is available immediately instead of waiting hours for a restore.
- works by routing source vdisk ios through the online backup.
- to handle new writes to the source vdisk, it uses Restore Before Write (RBW) tech that is similar to copy-before-write but in reverse
- instant restore also starts a background process that copies data from the online backup to the source vdisk
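Restore Before Write is easier to follow as code. The sketch below is my own toy model of the three behaviours described: reads routed through the backup, restore-before-write on new writes, and a background copy. The names and the chunk-as-string representation are hypothetical.

```python
class InstantRestore:
    """Toy model of instant restore with restore-before-write (RBW)."""

    def __init__(self, source, backup):
        self.source = source                   # list of chunks (possibly corrupted)
        self.backup = backup                   # online backup chunks
        self.restored = [False] * len(source)

    def _restore_chunk(self, idx):
        if not self.restored[idx]:
            self.source[idx] = self.backup[idx]
            self.restored[idx] = True

    def read(self, idx):
        # Reads of not-yet-restored chunks are served from the backup.
        return self.source[idx] if self.restored[idx] else self.backup[idx]

    def write(self, idx, value):
        # RBW: bring the chunk back from the backup first, then apply the write.
        # (A real write may cover only part of a chunk, which is why the
        # restore has to happen first; here a write replaces the whole chunk.)
        self._restore_chunk(idx)
        self.source[idx] = value

    def background_step(self):
        """Restore one more chunk; return False when nothing is left."""
        for idx, done in enumerate(self.restored):
            if not done:
                self._restore_chunk(idx)
                return True
        return False


r = InstantRestore(["bad", "bad", "bad"], ["g0", "g1", "g2"])
print(r.read(1))          # g1, good data is visible immediately
r.write(0, "new")
while r.background_step():
    pass
print(r.source)           # ['new', 'g1', 'g2']
```

The host sees good data from the first read, long before the background copy has finished.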
Online backups that can be used with Instant Restore include:
- Any snapshot of a source vdisk.
- Snapclone that is still normalizing
- Fractured mirrorclones.
ANY SNAPSHOT OF A MIRRORCLONE
– use cases
1) Can fracture a mirrorclone and take a snapshot on a regular interval, say once an hour
- if DB corruption occurs, go back through snapshots and find the last snapshot without the corruption.
- instant restore the last good snapshot
- DB can be quiesced before each mirrorclone resync to get a transactional consistent copy of the database.
2) Customer can change mind and restore from a different online backup without waiting for the first instant restore to finish. This feature can help a customer experiment with different online backups to figure out which one is best.
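The "walk back through the hourly snapshots" use case boils down to a newest-to-oldest search. Here is a minimal sketch, assuming you have some integrity check you can run against each snapshot. The function name, the timestamps and the check are all made up for illustration.

```python
def last_good_snapshot(snapshots, is_corrupt):
    """Return the newest snapshot that passes the check, or None.

    `snapshots` is ordered oldest to newest; `is_corrupt` is a caller-supplied
    check (for example, a database integrity test run against the snapshot).
    """
    for snap in reversed(snapshots):
        if not is_corrupt(snap):
            return snap
    return None


hourly = ["09:00", "10:00", "11:00", "12:00"]
corrupt = {"11:00", "12:00"}   # pretend the corruption crept in after 10:00
print(last_good_snapshot(hourly, lambda s: s in corrupt))   # 10:00
```

Being able to switch to a different online backup without waiting for the first instant restore to finish is what makes this kind of trial and error practical.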
MULTISNAP and MULTIMIRROR
- These are SSSU commands, used to create a write-order consistent set of snapshots/snapclones/fractured mirrorclones across a set of vdisks with a single command
- Used when host app data spans multiple vdisks on an eva
- Up to 28 “atomic” snaps/fractured clones of different source vdisks with XCS 9.5
- Allows a crash consistent copy of a DB to be created without having to quiesce the DB before fracturing the mirrorclones or creating the snapshots or snapclones
- MULTISNAP and MULTIMIRROR are the only way to get an IO consistent view of a group of Vdisks (up to 28) without first quiescing the DB
- Not available via CV or RSM, ONLY SSSU
- Up to 28 snapshots or snapclones can be created atomically with the MULTISNAP command
- Up to 28 mirrorclones can be created/fractured/resync’d atomically with the MULTIMIRROR command
- Create IO consistent snaps and fractured mirrorclones without having to quiesce the DB first
- To get a transactionally consistent copy you must quiesce the DB first or put it in online backup mode.
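Why an atomic snap across several vdisks matters is easiest to see with a log vdisk and a data vdisk. This is a toy illustration of my own, not SSSU syntax. It assumes an application that always writes the log record before the matching data block, and it compares one capture point for both vdisks against two capture points a moment apart.

```python
def snapshot_at(writes, vdisk, upto):
    """Contents of `vdisk` as seen by a snapshot taken after `upto` writes."""
    return [txn for (disk, txn) in writes[:upto] if disk == vdisk]


def write_order_consistent(log_snap, data_snap):
    """Every data block in the copy must have its log record in the copy."""
    return set(data_snap) <= set(log_snap)


# The application always writes the log record before the matching data block.
writes = [("log", 1), ("data", 1), ("log", 2), ("data", 2), ("log", 3), ("data", 3)]

# Atomic: both vdisks captured at the same point in the write stream.
atomic = (snapshot_at(writes, "log", 5), snapshot_at(writes, "data", 5))

# Sequential: the log vdisk is captured first, the data vdisk a little later.
sequential = (snapshot_at(writes, "log", 4), snapshot_at(writes, "data", 6))

print(write_order_consistent(*atomic))      # True
print(write_order_consistent(*sequential))  # False
```

In the sequential case the data copy contains transaction 3 but the log copy does not, and that is the kind of inconsistency a database cannot recover from without quiescing first.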
LATEST CA and BC EVA Functionality Changes
- this is found in the XCS v9.5x release for 4400/6400/8400
- CA and BC changes in XCS v9.5x
- allow CA failover during normalization
- autosuspend of DR group when marked for full copy – allows the administrator a window to create a mirrorclone or snapshot of the CA targets before the full copy proceeds so we can protect from a rolling disaster
- preferred port algorithm changes
- persistent bitmap so that CA executes a “fast copy” normalization once a DR Group’s initial full copy normalization has completed (used on WHL overflow, used if admin invalidates the WHL (CV force copy), if controllers are swapped…)
- support for LUN expand/shrink for DR group members – CA now will fully support DCM LUN grow and LUN shrink (not supported for large luns > 2TB)
- LUN Grow has been supported for a while, LUN shrink capability is new
- server OS must be able to work with LUNs that grow/shrink
- exchange based routing support
- increase total capacity of a DR group from 32 to 80TB
- the DR group size includes all members of the DR group and the business copies of those members (snaps, clones, mirrorclones)
- raid 6 vdisks
- perform a “delta resync” to destination when the source of a mirrorclone or snapshot of vdisk in a DR group is instant-restored to the source volume.
- perform a fast copy instead of a full copy.
- put the existing bitmap into non-volatile memory so fast resync continues after a controller failover
- user selectable asynch (basic/enhanced) in xcs v10.x
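Since the DR group capacity limit counts business copies as well as the members themselves, a quick sanity check is simple arithmetic. This sketch is mine and assumes every copy counts at the size you give it. How the array really accounts for space-efficient snapshots was not covered, so treat the numbers as illustrative only.

```python
DR_GROUP_LIMIT_TB = 80   # limit quoted in the session for XCS v9.5x (previously 32)


def dr_group_size_tb(members):
    """Total size counted against the limit: each member plus its business copies.

    `members` maps a vdisk name to (member_size_tb, [business_copy_sizes_tb]).
    """
    return sum(size + sum(copies) for size, copies in members.values())


def fits_in_dr_group(members, limit_tb=DR_GROUP_LIMIT_TB):
    return dr_group_size_tb(members) <= limit_tb


members = {
    "db_data": (20, [20, 20]),   # 20 TB vdisk with two full-size copies
    "db_log": (5, [5]),          # 5 TB vdisk with one copy
}
print(dr_group_size_tb(members))               # 70
print(fits_in_dr_group(members))               # True under the 80 TB limit
print(fits_in_dr_group(members, limit_tb=32))  # False under the old 32 TB limit
```

The vdisk names and sizes are invented; the point is only that a 25 TB database can count as 70 TB once its copies are included.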
LARGE LUN SUPPORT FUTURES
- Currently support for up to 32TB Vdisk
- With XCS 10.x, large lun expand/shrink/snapshot and thin-provisioned large LUNs
- Following the XCS v10.x release: large LUN mirrorclone, snapclone, instant restore, CA and online LUN/RAID migration.
ONLINE LUN MIGRATION
- Relocates vdisks to another disk group transparently without significant impact to host IO.
- Feature uses a mirrorclone to copy data from source to destination disk group
- After operation there is an active mirrorclone in the source disk group and a new vdisk in the destination disk group
- Target LUN raid mode can be different from the source
- Online RAID migration: same as online LUN migration except this just changes the RAID level of a vdisk without significant impact to host IOs
- As long as there is enough space available in the disk group, the user can change the raid level to another raid level supported by the target disk group
- Planned to be released with xcs v10.x, but no CA support, no thin provisioning support, no large LUN support, no BC support
- Space-efficient empty container
- Used to quickly create space-efficient snapshots, support for 3-phase creation, container has all metadata initialized, but does not allocate any space
And that’s all I have to say about that.