Friday, July 13, 2012

A backup strategy for MS SQL Server

In this post I want to talk about the basics of a backup plan for SQL Server databases.The question is often, "What do I need to keep and for how long when I backup my SQL Server databases?"

The long answer is it depends on how large the database is, what type of database, how large are the maintenance windows, etc. However, for most databases you will want to minimize your risk for losing data. The bottom line is that you need to know how the different types of backups in SQL Server work. I'll try to explain this.

First some general thoughts. You should always backup to a disk that is different from the one the thing (database, log file, etc) that you are backing up. A backup to tape or another backup of the file system is advisable as well. This way in the event of a fire, natural disaster, etc you will have at least something to work with. Please note that you cannot backup the actual database file in the system, you have to use SQL Server 

The most important thing you can do is a periodic full backup of your database. It is best to do your system databases nightly since they are very quick. This will give you your jobs, etc. You should do your user databases during a time of low usage because this can take a while. Typically you would do a full backup once a week.  If that is the only thing you do you will lose a week of data if your database becomes corrupted, accidentally deleted, hardware failure, etc. In most cases this is not good enough, so we need to reduce the window of risk.

To reduce the window of risk, we could do a full backup everyday, but typically there is not a maintenance window that allows the time or resources to do this. So, this means we need a faster way to backup. This can be done by doing a differential backup in SQL Server. It is typical to do a differential backup each day of the week. This means that each time you do a differential backup you will be backing all the changes to the database since the last FULL backup. This also means that if you are doing your full backs weekly that the day before your full backup your differential backup will potentially be quite large because each day after the full backup the differential backup keeps getting larger and larger each day. The size grows because each differential includes the previous differentials data as well.

With the weekly full backup and the daily differential backs running we still have at most a day of data we could lose. We also have the problem that our log file itself will grow indefinitely (assuming full or bulk-logged recovery mode for the database is selected). Once solution is to periodically backup and truncate the log file manually after a full backup. This is not an optimal solution if you have many databases. The highly recommended method for solving that problem AND decreasing your window of risk is to do hourly log file backups. This action will save only the changes since the last log file backup which is one hour of changes in this case. The log file backup is extremely fast in most cases. The more often you run it the faster it is (for the most part). So at the end of the day you will have 24 log file backs and your log file will be truncated as well (pending a bunch of conditions that are outside the scope of this entry).

Now that we have a weekly full backup, a daily differential backup, and an hourly log backup our window of risk has been reduced to 1 hour max. This is great, but we still don't know what we need to keep and for how long in order to make a restore of our database. The answer to this pretty simple if you remember the following:

  1. FULL BACKUPS: You need to keep a full backup until you have made another full backup AND verified it.
  2. DIFFERENTIAL BACKUPS: You only need to keep the latest differential backup and it can be deleted when you do a full backup. You don't need differential backup files if you have log backups from the time of the full backup to the time the differential backup was last done. This means that you don't really need to ever do a differential backup if you keep all your log backups between each full backup. This actually recommended by some DBAs because it really simplifies the backup strategy.
  3. LOG FILE BACKUPS: You need to keep log files for the time period that you don't have a differential or full backup. For instance, if you have a full backup and your last backup, you only need the log backups that occurred since the last differential or full backup. This would mean that you don't need to at least 24 hours of log backups if you do a differential backup every 24 hours.
So, as you can see there is no definite answer, but it is easy enough to figure out given your backup strategy.

One thing that was not obvious the first time I scheduled backups is that it is probably best to do a differential backup EVERYDAY including the day you do the full backup or keep more than 24 hours of log backups. Why? Consider, you do a differential backup on everyday except Sunday because that is when you do you full backup. Full backups take longer and thus I usually schedule them earlier in the day than I do for the differential. This difference in time creates a period where we are not protected and unable to restore using our log backups. The additional risk is that let's say your full backup on Sunday starts at 10am and your next differential backup is on Monday at 6pm. There are more than 24 hours between the two backups, but we are only keeping 24 hours of log backups. This means the log files between 10am and 6pm will be deleted. Since we are missing those log backups , it makes the rest of the log backups not very useful either. That means that between Sunday 10am and Monday 6pm we are not longer getting the benefit of the log backups and instead have opened our window of risk back to 1 day and on a day that we are doing extra maintenance most likely. This is not a good scenario. The simplest solution seems to do the differential backup on Sunday as well (as the full backup we did that morning). The backup will be fast and small and we will be safe. The other option would be to keep 32 hours of log backups to cover that period. The downside is that we will have to keep an extra 8 hours of log files throughout the week or make a special change just for that day of the week, and that is just messy.

I hope this helps. If you are looking for a backup implementation that is both quick and easy to use, I recommend reading my other entry on Implementing a backup strategy in MS SQL Server.

No comments: