On The Road to Recovery
Calamity can strike a data center at any time. Virtualization and Infrastructure as a Service offer new ways to keep the business running when it does.
Earthquake. Fire. Hurricane. Blackout. Virus. Terror attack. Any of these natural or man-made events can cause the obliteration of business data. And if that isn't frightening enough, Sarbanes-Oxley, HIPAA and Securities and Exchange Commission requirements will really scare you. Face it: data needs to be kept alive and accounted for if a business is to survive and thrive in the testing conditions of modern-day business.
The challenge is to provide solutions that allow a business to continue to operate in the event of any number of calamities that can have a negative impact on a data center. The problem is not knowing which one it will be or how bad it will get—but each scenario involves the data center being wiped out or, if luck prevails, just being offline for a few days.
The two biggest challenges for disaster recovery in IT are the movement of data to the recovery site and the actual process of recovery. Now that virtualization technology is mainstream, and thanks to the emergence of new IT business models, such as Infrastructure as a Service (IaaS), some compelling new ways to approach the problem are becoming available.
Getting Warmer
The ideal scenario for most organizations is a hot site: a near replica of the entire production environment at another data center, ideally several hundred miles away and outside the reach of the same regional disaster. Unfortunately, hot sites are extremely expensive and can increase production costs by as much as 250 percent, a premium that drives most companies away from implementing a true hot site.
The majority of companies end up with a cold site—a physical location held in reserve with a promise from the supplier that there will be an adequate number of computers waiting when they arrive. A cold site is a form of insurance. Businesses share the costs with other companies and hope that not everyone has a disaster at the same time. This makes cold sites a much more affordable option.
Clearly, neither of these options—the hot site or cold site—offers sufficient protection for today’s data-dependent businesses. The best available option, however, may be a combination of the two.
Odds are that you've already used virtualization technology in the data center to consolidate old servers or perform testing. Using the same concepts, it's possible to create a warm site, a hybrid between a hot site and a cold site, at a secondary location. With traditional replication software such as Double-Take, one normally needs a one-to-one ratio between servers in the production data center and servers at the recovery site; virtualization changes that math.
Caught on Tape
Many companies have a 36-hour recovery time objective (RTO) because they have determined that’s how much time it will take to get the backup tapes, fly to the recovery site, pull everything off of tape, test the systems and be back online. That is, if it works.
Oftentimes the hardware at the recovery site isn't exactly the same as what you use in production. Tape-based recovery isn't perfect, either. A tape might contain a good copy of your data, but all of the information about the environment (IP addresses, registry configurations, patch levels) frequently isn't on it. Reconstructing that information takes a huge amount of time, even if you have great and up-to-date documentation.
It All Adds Up
With IaaS, instead of purchasing a stock of extra servers and a SAN, it's possible to rent 60 processor cores, 2 TB of storage and 64 GB of memory, and pay on a monthly or quarterly basis. Most IaaS vendors run VMware or a similar hypervisor platform that enables virtualization.
This hypervisor layer is the key: it puts a shim between the hardware and your environment, allowing that environment to scale, move around and be replicated independently of the hardware. It also is what makes an IaaS provider different from a traditional service provider or hosting center.
Suppose a company has approximately 100 physical servers in its environment. Forty of these are designated as Tier 1, mission-critical: if these servers failed, so would the business.
In the old model, the choice would be a hot, cold or warm site. Assuming a fully loaded cost of $6,000 per server, a true hot site would require approximately $240,000 worth of server equipment at the disaster recovery site. Add to this the cost of co-location at $4,000 a month (including 5 kW of power and cooling load per cabinet) for two cabinets. And that figure doesn't include switches, routers, operating systems, replication software, human resources, bandwidth or recovery testing.
Because the servers at the recovery site sit idle until a disaster strikes, they are a perfect opportunity for consolidation. Conservatively, consolidate virtual servers onto physical servers at a ratio of 10-to-1; this is disaster recovery, not production. That means four physical servers instead of 40, and roughly 90 percent less co-location space. If the co-location center doesn't bill in less than half-cabinet increments, this puts the cost at $1,000 per month. Now beef up the memory, adding 16 GB to each host, and keep in mind that the VMware licenses also cost money. Even with these additional costs, the savings will net more than $200,000 per year.
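To see how those numbers net out, here is a minimal back-of-the-envelope sketch. The server, co-location and consolidation figures are the ones quoted above; the roughly $800 assumed for each 16 GB memory upgrade is illustrative rather than a quoted price.

```python
# Rough cost sketch for the 40-server Tier 1 scenario described above.
# Server, co-location and consolidation figures come from the article;
# the ~$800 per 16 GB memory upgrade is an illustrative assumption.

SERVER_COST = 6_000        # fully loaded cost per physical server
TIER1_SERVERS = 40         # mission-critical servers to protect
CONSOLIDATION = 10         # conservative VMs-per-host ratio at the recovery site

# Hot site: one recovery server per production server, two co-lo cabinets.
hot_capex = TIER1_SERVERS * SERVER_COST     # $240,000 in server equipment
hot_colo_yearly = 4_000 * 12                # $48,000 per year

# Warm site: 40 VMs consolidated onto 4 hosts, billed at a half-cabinet minimum.
warm_hosts = TIER1_SERVERS // CONSOLIDATION    # 4 physical hosts
warm_capex = warm_hosts * (SERVER_COST + 800)  # servers plus the extra memory
warm_colo_yearly = 1_000 * 12                  # $12,000 per year

first_year_savings = (hot_capex + hot_colo_yearly) - (warm_capex + warm_colo_yearly)
print(f"first-year savings: ~${first_year_savings:,.0f}")  # roughly a quarter million dollars
```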
However, there are other costs involved, such as replication software, which can vary dramatically in price depending on features and vendor. With an agent-based approach, figure the cost per agent at $2,500. Even though there are now only four physical servers at the recovery site, there are still 40 virtual machines. That means 40 agents in production and 40 agents at the recovery site, for a total of 80 agents, amounting to $200,000 in software licenses.
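The licensing arithmetic is just as simple; a quick sketch using the $2,500-per-agent figure from the scenario above (actual per-agent pricing varies by vendor and feature set):

```python
# Agent-based replication licensing for the same 40-VM scenario.
AGENT_PRICE = 2_500      # per protected machine, the figure used above
PRODUCTION_VMS = 40      # one agent on each production machine
RECOVERY_VMS = 40        # plus one agent on each replica at the recovery site

total_agents = PRODUCTION_VMS + RECOVERY_VMS
print(f"{total_agents} agents x ${AGENT_PRICE:,} = ${total_agents * AGENT_PRICE:,}")  # $200,000
```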
Virtual Reality
SAN vendors and customers already know the secret here: boot all of the servers from the SAN and use SAN replication software to transport data to the recovery site. However, there are two problems with this method. Depending on the SAN vendor, that replication software might require an additional license, and it might mean purchasing one or more additional SANs for the recovery site. Not only is this a greater expense, but the recovery picture gets complicated. With both the SAN-based and the agent-based approaches, recovering the actual data is possible, but the configuration information that is so important is lost. Although better than tape, it is still not a perfect system.
Here is where virtualization comes to the rescue once again. Whether using SAN-to-SAN replication or an agent-based approach, virtualize the production servers. This makes it possible to replicate at the VMware level instead of within Windows®.
Take the existing scenario of 40 servers, and say these are spread across four very robust physical servers. If a software replication product that runs on ESX is employed, only four licenses are needed at the production site and a similar number at the remote site. That means even more significant cost savings, and it improves recovery fidelity as well: all of the patches, configuration data and permissions are replicated, because fully bootable virtual machines are being sent rather than just SQL data or an Exchange store.
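A short sketch of that difference in license counts, reusing the same scenario (per-host pricing varies by product, so the sketch counts licenses rather than dollars):

```python
# License counts: per-guest agents versus per-host (ESX-level) replication
# for the 40-VM, 4-host scenario described above.
VMS = 40                   # protected virtual machines
HOSTS_PER_SITE = 4         # robust physical hosts at each site

guest_licenses = VMS * 2             # an agent in production plus one at the recovery site
host_licenses = HOSTS_PER_SITE * 2   # one license per ESX host at each site

print(f"per-guest agents: {guest_licenses} licenses")
print(f"per-host (ESX):   {host_licenses} licenses "
      f"({guest_licenses // host_licenses}x fewer)")
```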
Even though companies are virtualizing more production servers than ever, a company still may not be ready to virtualize certain servers. A new class of replication agents is coming that will enable the user to take a traditional, non-virtualized server and convert it into a virtual machine during the replication process. This way, the source production server stays the same, but users gain all of the efficiencies of using virtual machines instead of physical servers at the recovery site.
In the new model, recovery and testing work the same way. Since there are fully replicated, bootable virtual servers at the recovery site, one simply needs to access them remotely and power them up.
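As one illustration of how lightweight that power-up step can be, the sketch below assumes the recovery site exposes the standard vSphere API and uses the open-source pyVmomi client to boot every replicated machine kept in a hypothetical "DR-Tier1" folder. The host name, credentials and folder name are placeholders, and a replication or IaaS vendor may well supply its own tooling instead.

```python
# Minimal sketch: power on the replicated VMs at the recovery site via the
# vSphere API (pyVmomi). The host, credentials and the "DR-Tier1" folder
# name are hypothetical placeholders for this example.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

context = ssl._create_unverified_context()  # lab-style TLS handling; use real certificates in practice
si = SmartConnect(host="recovery-vcenter.example.com",
                  user="dr-operator", pwd="********", sslContext=context)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    for vm in view.view:
        # Only touch the replicated Tier 1 machines kept in the DR folder.
        if vm.parent and vm.parent.name == "DR-Tier1" \
                and vm.runtime.powerState != "poweredOn":
            print(f"Powering on {vm.name}")
            vm.PowerOnVM_Task()
finally:
    Disconnect(si)
```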
In most cases, leave virtual machines in a heavily consolidated mode if they're just being tested. During testing, users have the opportunity to make sure they can view tables, access files or even mount a mail store and open a replicated e-mail inbox.
During an actual or simulated recovery, it might be necessary to spread virtual servers around on enough physical servers to provide the performance the production environment normally requires.
This is where IaaS can be a huge help. An IaaS provider already has many racks of servers and network capacity available on demand. All that's required day to day is a small foothold in that environment that can receive replicated data; expensive resources such as processing power and memory aren't necessary until a disaster is declared. Since the IaaS vendor already has the capacity on hand, scaling up can generally be completed in a few hours or less. In a declaration or test, the provider will expand the workloads from one or two replication servers onto 20, 30 or 100 real physical servers, whatever is appropriate to equal the resources that were there in production.
Bonus Features
Virtualization software features like VMware's DRS, which uses vMotion to migrate running virtual machines, will even allow users to move already booted (but slow) servers from the consolidated hardware onto new hardware resources as they become available, without any downtime.
As a bonus, push development, staging or other tertiary applications out to run at the DR site on the DR equipment. That way, IaaS can actually be considered a production investment, which makes the CFO happy.
An even bigger challenge is to recover people and business processes. Suppose a prolonged power outage in one facility forces a company to shift data to computers in another state. What about its employees? Instead of moving people into a workplace recovery center, with IaaS all an employee needs is remote Internet access from a home office until the issue is resolved.
Welcome to the brave new world of disaster recovery. These new approaches go a long way toward making business continuity simpler, more affordable and more reliable than ever.