Backup and restore of an N-tier application such as SharePoint can be a scary task for admins new to the product. This is compounded in large global companies with distributed infrastructure. Often the focus is on the tools rather than on the overall process, policy, operational costs and end-user expectations.
From a business perspective, SharePoint still isn’t mission critical (linked to revenue), so funding to support a full backup and recovery solution is still beyond the reach of most. When I say solution I’m talking about staffing, tools, process, policy and data center footprint, as opposed to simple SQL backups and a 1-2 week recovery time.
So, to begin designing a solution, what questions should you ask? Here are a few to get you started:
- What is the revenue loss per hour?
- What is the risk to the brand?
- What is the productivity loss cost?
- What is the impact to customers? Service providers?
- What’s the Service Level Agreement (SLA)? What are the recovery time objective (RTO) and recovery point objective (RPO)?
- What am I backing up? How much is there? How often does it change?
- What must I plan for if I’m backing up lists and libraries? Sites and site collections? Servers and farms?
- Do I have customizations? What are their specific requirements?
- Do I have other software on my servers, such as a document management client?
- Can we back up centrally? Will our network handle it?
- How do I test my ability to restore? What’s the % of data loss?
- What operational windows must I plan for? Work around?
- Can I slipstream the installation (SharePoint, service packs, customizations)?
- What scenarios must I support? Restoring a farm, a server, a site collection, a site, a library or a list?
- How do I test the rebuild steps for a farm? Server? Site collection and site? List and library?
I’m sure more questions come to mind but I hope you get the point.
The service level agreement should be your starting point for requirements and form the basis of your design and testing. If used correctly, it will give your design decisions an element of traceability back to the business requirements, help you justify your design come purchasing time, and help you manage end-user expectations.
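To make the revenue and productivity questions above concrete, here is a minimal Python sketch of the arithmetic. Everything in it is an assumption for illustration: the field names, the simple "revenue plus productivity" model and the sample figures are mine, not a standard formula, so plug in numbers from your own business.

```python
from dataclasses import dataclass


@dataclass
class OutageCostInputs:
    """Hypothetical inputs; every figure must come from your own business."""
    revenue_loss_per_hour: float       # direct revenue lost per hour of outage
    affected_users: int                # people who cannot work during the outage
    loaded_cost_per_user_hour: float   # salary plus overhead per user per hour
    productivity_loss_fraction: float  # share of work that depends on SharePoint (0..1)


def outage_cost(inputs: OutageCostInputs, outage_hours: float) -> float:
    """Estimate outage cost as revenue loss plus productivity loss."""
    revenue = inputs.revenue_loss_per_hour * outage_hours
    productivity = (inputs.affected_users
                    * inputs.loaded_cost_per_user_hour
                    * inputs.productivity_loss_fraction
                    * outage_hours)
    return revenue + productivity


if __name__ == "__main__":
    # Made-up sample values: no direct revenue impact, 2000 users.
    example = OutageCostInputs(
        revenue_loss_per_hour=0.0,
        affected_users=2000,
        loaded_cost_per_user_hour=50.0,
        productivity_loss_fraction=0.1,
    )
    # Compare candidate rebuild windows (4 hours, 10 hours, a working week).
    for hours in (4, 10, 40):
        print(f"{hours:>3} h outage: {outage_cost(example, hours):,.0f}")
```

Comparing the cost of a 4-hour outage with that of a week-long one is a quick way to show what a tighter SLA is worth when you ask for funding.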
When backing up SharePoint you are backing up several things:
- Server binaries
- SharePoint SQL databases
- Search Index
- Windows SharePoint Services sites
- Personal Sites
- Custom Web Parts
- Third-party add-ons
Using the SharePoint 2007 backup tool is a good starting point for small companies (100GB, 200GB database, one farm?). It will back up almost everything except IIS, custom web parts, etc. For large companies (multiple farms, 200GB+ databases), there are a few options. One is backing up the content databases in SQL Server directly. This is a coarse-grained approach and there are risks in taking this route, but it’s probably the fastest! Another, if you want a fine-grained approach, is a third-party product such as those offered by www.comvault.com, www.quest.com and www.avepoint.com.
Note that not all products will restore in place. Some, such as Microsoft’s DPM, require a separate farm (call it a restore farm) to restore to. I’ve been told this is a SharePoint 2007 limitation and have yet to get the details. For these tools, keeping your production and restore farms in sync using change control is critical, especially if you have customizations.
The aforementioned products would be my first choice for large global organizations. Though it’s costly to add more backup tools in a large enterprise, I prefer to minimize risk and rebuild time. If speed is your goal, a nightly backup of the databases to a standby farm is probably the best approach, but it is costly, and you should use recycle bin settings and versioning as well to complement it. If farm (and data center) recovery is the priority, a secondary farm with SAN replication is probably your best approach.
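For the built-in SharePoint 2007 tool, the command-line entry point is stsadm. Here is a small Python sketch that only assembles the command lines for a farm-level backup and a site collection backup, so they can be reviewed or handed to a scheduler. The helper names, the share path and the URL are hypothetical, and I’m assuming stsadm.exe is on the PATH (on a default install it lives in the "12" hive BIN folder).

```python
from typing import List


def farm_backup_command(backup_dir: str, method: str = "full",
                        stsadm: str = "stsadm.exe") -> List[str]:
    """Build the argument list for a farm-level (catastrophic) backup."""
    if method not in ("full", "differential"):
        raise ValueError("method must be 'full' or 'differential'")
    return [stsadm, "-o", "backup",
            "-directory", backup_dir,
            "-backupmethod", method]


def site_collection_backup_command(url: str, filename: str,
                                   stsadm: str = "stsadm.exe") -> List[str]:
    """Build the argument list for a single site collection backup."""
    return [stsadm, "-o", "backup",
            "-url", url,
            "-filename", filename,
            "-overwrite"]


if __name__ == "__main__":
    # Hypothetical share and URL, for illustration only.
    print(" ".join(farm_backup_command(r"\\backupserver\sharepoint")))
    print(" ".join(site_collection_backup_command(
        "http://portal/sites/finance", r"D:\backups\finance.bak")))
```

To actually run one of these on a farm server you could pass the list to `subprocess.run(cmd, check=True)`. I’ve left that out so the sketch is safe to try anywhere.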
So where are the risk areas? The main one is managing to the SLA (a rebuild window of 4 hours? 10 hours? a week? how much data loss is expected?), specifically when the service will be re-established after a failure. Your architecture has an impact on this, as do your backup/restore systems. For example, where do you cap site collection size? Are you backing up (or restoring) over a WAN? Can your backup software restore the search database and index? How fast are backup and restore? All these elements impact recovery time and the cost involved in meeting your SLA.
Another example: if you’re only backing up SQL Server, your backup is limited to the content and SSP databases. You won’t back up the config database (you have to rebuild the farm and attach the databases manually) or the search database and index (a restored search database and index won’t be synchronized), which means longer rebuild times. You will also have to think about how to recover customizations and custom web parts. For example, custom Web Parts should be deployed using WSP files so that they can be redeployed consistently after a rebuild.
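The WAN and speed questions can be turned into a rough back-of-the-envelope check. The Python sketch below estimates how long it takes to move a given amount of data at a sustained throughput and whether that, plus the manual rebuild work, fits inside the rebuild window. The function names and all the numbers are hypothetical, and real restores rarely sustain their peak throughput, so treat the result as optimistic.

```python
def transfer_hours(size_gb: float, throughput_mb_per_s: float) -> float:
    """Hours needed to move size_gb at a sustained throughput (1 GB = 1024 MB)."""
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb * 1024 / throughput_mb_per_s / 3600


def meets_rto(size_gb: float, throughput_mb_per_s: float,
              rebuild_hours: float, rto_hours: float) -> bool:
    """True if data restore plus manual rebuild work fits inside the RTO."""
    total = transfer_hours(size_gb, throughput_mb_per_s) + rebuild_hours
    return total <= rto_hours


if __name__ == "__main__":
    size_gb = 200        # content database size from the example above
    rebuild_hours = 2.0  # assumed time to rebuild servers and reattach databases
    rto_hours = 4.0      # assumed rebuild window from the SLA
    # Assumed sustained throughputs: local/SAN restore vs. restore over a WAN.
    for label, mb_per_s in (("local", 100.0), ("WAN", 5.0)):
        hours = transfer_hours(size_gb, mb_per_s)
        ok = meets_rto(size_gb, mb_per_s, rebuild_hours, rto_hours)
        print(f"{label}: {hours:.1f} h to restore data, meets RTO: {ok}")
```

The same check also gives you a way to reason about site collection caps: the smaller the largest unit you have to restore, the easier it is to stay inside the window.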
Aside from backup there are other options to consider:
- Architect SharePoint based on the SLA: group high-value data separately from low-value data and tailor your backups accordingly
- Separation of concerns: separate farms for Portal, Search and Collaboration, or at least isolate using site collections and applications
- Having a standby farm that you could failover to while the production system is being rebuilt
- SAN attached backup and restore solution for speed purposes – get as close to the data as possible, eliminate network constraints
- SAN replication technology that would enable you to point the standby farm to the replicated data and fire up a farm very quickly
If you’re using SharePoint’s backup and restore tool, the following are two links that will help you get started:
Whatever route you take, thorough testing of recovery must be performed on a regular basis to make sure your policy, process and tools are effective. Additionally, testing must use thorough “test cases” that verify data recovery and accuracy, and should be run twice a year and after major updates or changes.
Tests should include:
- Farm recovery (Soup to nuts…)
- Individual server recovery (WFE, Index, Search, SQL)
- Site collection(s)
- Sites and sub-sites
- Web Parts
- List/library items such as documents and pictures
- Configuration such as workflows, metadata, content types and security
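For the document and picture test cases, one way to verify accuracy rather than just "it restored" is to compare file hashes. The Python sketch below assumes you have exported a library to a folder before the backup and again after the restore (how you export is up to you), then reports what is missing, changed or unexpected. The function names are hypothetical, and it only checks file contents, not metadata, versions, workflows or permissions, which need their own test cases.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_hashes(root: Path) -> Dict[str, str]:
    """Map each file's path relative to root to its SHA-256 digest."""
    hashes = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            hashes[path.relative_to(root).as_posix()] = digest
    return hashes


def compare_exports(baseline: Path, restored: Path) -> Dict[str, List[str]]:
    """Report files missing from, changed in, or unexpected in the restored copy."""
    before = file_hashes(baseline)
    after = file_hashes(restored)
    common = set(before) & set(after)
    return {
        "missing": sorted(set(before) - set(after)),
        "changed": sorted(name for name in common if before[name] != after[name]),
        "unexpected": sorted(set(after) - set(before)),
    }


if __name__ == "__main__":
    # Hypothetical export folders, for illustration only.
    report = compare_exports(Path("export_before"), Path("export_after"))
    for kind, names in report.items():
        print(f"{kind}: {len(names)} file(s)")
```

An empty "missing" and "changed" list is the pass condition for this particular test case. Keeping the report with the test records gives you evidence of the % of data loss for each run.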
Also, remember you get what you pay for.