{"id":139,"date":"2015-12-21T00:00:00","date_gmt":"2013-01-01T00:00:00","guid":{"rendered":"https:\/\/www.sqlhammer.com\/blog\/performance-tuning-backup-restore-operations"},"modified":"2026-03-18T21:52:18","modified_gmt":"2026-03-18T21:52:18","slug":"performance-tuning-backup-restore-operations","status":"publish","type":"post","link":"https:\/\/sqlhammer.com\/index.php\/2015\/12\/21\/performance-tuning-backup-restore-operations\/","title":{"rendered":"Performance tuning backup and restore operations &#8211; SQL Hammer"},"content":{"rendered":"<p>The standard backup strategy seems to always include backups running at midnight, hoping that they complete before anyone starts using the applications. There is value in performing maintenance on off-hours. However, many applications do not have off-hours or\u00a0our databases grow in size and they stop fitting in static maintenance windows, or even into a whole day.<\/p>\n<p>Here you will learn some performance tuning strategies to keep your backup and restores under control.<\/p>\n<h2>Tuning priorities<\/h2>\n<p>It is important to remind and stress that tuning your backup is a secondary priority. The primary priority is tuning your restore. When it comes time to restore a database, every second that it processes is impacting your recovery time objective (RTO). You RTO likely impacts your company\u2019s revenue, whether directly or indirectly.<\/p>\n<p>By tuning your restores, you will establish strategies which also are effective with your backups. Once your restores are tuned to meet your RTO and RPO (recovery point objective), then you should look back at the backups and evaluate how they mingle with your system resources and work loads. At this point you can concern yourself with you backup\u2019s duration and resource usage.<\/p>\n<h2>Evaluate what is happening now<\/h2>\n<p>Every <a href=\"https:\/\/en.wikipedia.org\/wiki\/SMART_criteria\">SMART goal<\/a> must be specific and measurable. 
In order to tune your backup and restore processes, you will first need to understand the existing configuration and articulate your pain point(s).<\/p>\n<ul>\n<li>Are your daily FULL backups taking nearly 24 hours?<\/li>\n<li>Do you experience application response times increasing during backup operations?<\/li>\n<li>Does your restore not meet your RTO?<\/li>\n<li>Do you overwhelm your network, memory, or storage subsystem?<\/li>\n<\/ul>\n<h3>Infrastructure<\/h3>\n<p>Infrastructure is usually the area which is least likely to be changed. Compared to changing BACKUP command options, increasing the number of IOPS you achieve seems like an epic undertaking. Even though I will be focusing on non-infrastructure tuning techniques in this post, it is important to understand your existing setup and highlight the underperforming areas.<\/p>\n<p>At a previous job, I had a particular system owner complain to me that he would receive application timeouts every time I performed a backup with the BUFFERCOUNT option set above 32 (<em>more on buffer counts later<\/em>). The conflict brewed from the fact that these multi-terabyte databases took over 24 hours to complete a FULL backup. I tried to open up the throttle and, when I did, it caused application issues. The reason was that the servers each had a single gigabit NIC, and 90%+ of the bandwidth was being used almost all day and night. When I increased the backup throughput, it pushed more data over the network, the NIC would saturate, and the applications\u2019 very low timeout settings would be exceeded.<\/p>\n<p>There were two problems to solve in my\u00a0story. One problem was that my recovery strategy could not meet any reasonable RTO. The other problem was that we were experiencing network constraints. To solve the first, I needed to back up and restore much faster, which included increased throughput. 
To solve the second, we needed either to reduce the chattiness of the application or slap some more, or faster, NICs into the servers to support the bandwidth requirements.<\/p>\n<p>In addition to network constraints, disk IO can be a severe bottleneck. When considering your storage limitations, understand your throughput and IOPS, but also think about potential hot spotting. If you are using shared or centralized storage for your backup destination, you might encounter slowdowns because of concurrent backup operations. Maybe you have a dedicated backup SAN and all servers back up to it. Then you stick to the common strategy of performing all backups off-hours, say midnight to 6 AM. In that case, you are going to stress the SAN to its limits for one quarter of the day and then let it rest for the other three quarters. By spreading your backups throughout the day (assuming that\u00a0it does not negatively impact your applications) you could eliminate a storage bottleneck. Fewer concurrent backups would be able to fully utilize the SAN\u2019s throughput at any given moment, rather than sharing it with hundreds or thousands of concurrent processes.<\/p>\n<p>Finally, do not forget about your memory. To back up or restore a database you have to load data pages into memory. We will talk more about memory below and how the internal buffer pool comes into play\u00a0and can cause operating system paging or\u00a0<em>out of memory<\/em> conditions.<\/p>\n<h3>Commands<\/h3>\n<p>To understand your current process, you will need to inspect the backup commands, and the restore commands if you pre-script them. These same factors that you inspect will be the ones that you tune for performance gains.<\/p>\n<p>We care about\u2026<\/p>\n<h4>Number of backup devices<\/h4>\n<p>The backup devices are the files that you specify in the command. 
This count is controlled entirely by the <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/ms186865.aspx\">BACKUP<\/a>\u00a0command; no factor external to the command influences it.<\/p>\n<pre><code class=\"language-\">BACKUP DATABASE BackMeUp TO\nDISK = 'C:\\MyFile_1.bak',\nDISK = 'C:\\MyFile_2.bak'<\/code><\/pre>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/performance-tuning-backup-restore-operations\/backup-files-1\/\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/11\/backup-files-1.png\" alt=\"\" \/><\/a><\/p>\n<p>In this command, there are two\u00a0backup devices. In theory, this will\u00a0make your backup run twice as fast because you\u2019ve increased your thread count from one to two. I say, in theory, because the entire backup process is a stream. A bottleneck at any point in the process will slow everything down. For example, this command should write to the disk faster because of the two threads running the writes; however, the number of threads performing reads depends upon the number of database devices (file groups).<\/p>\n<h4>Number of database devices<\/h4>\n<p>The database device count is equal to the number of logical drives that the database files reside on. With multiple file groups, you could have any number (<a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/ms143432.aspx\">32,767 actually, but who would do that?<\/a>) of database devices. By increasing the number of database devices, along with the number of logical devices in the BACKUP command, you come closer to a fourfold performance gain.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/performance-tuning-backup-restore-operations\/backup-files-2\/\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/11\/backup-files-2.png\" alt=\"\" \/><\/a><\/p>\n<p>While the log file is a third file,\u00a0you need to add\u00a0data file groups because the log file does not get backed up at the same time as the data files. 
Rather, it is backed up at the end, after the data files have been backed up.<\/p>\n<h4>COMPRESSION\u00a0setting<\/h4>\n<p>The <a href=\"https:\/\/msdn.microsoft.com\/en-us\/library\/bb964719.aspx\">COMPRESSION<\/a> option of the BACKUP command was not available in standard edition until SQL Server 2008 R2. If you are using 2008 R2 or above, or are using enterprise edition, I always recommend using it.<\/p>\n<pre><code class=\"language-\">BACKUP DATABASE BackMeUp TO\nDISK = 'C:\\MyFile_1.bak'\nWITH COMPRESSION<\/code><\/pre>\n<p>At first glance, compressing your backup might seem like a space conservation feature. Well, it is. However, the write performance gain is equally, or more, important.<\/p>\n<p>By compressing your backups, you increase your CPU usage a bit in exchange for transferring fewer bytes over the network, writing fewer bytes to disk when backing up, and then reading \/ transferring fewer bytes back to the server when restoring. Like all things, you should test and verify that you can afford the CPU hit, but I have never been in a situation where the performance gain was not worth the CPU cost.<\/p>\n<p>The exact performance gain will vary based on how compressible the data is.\u00a0My experience has been that you can expect a 20-50% improvement with non-encrypted data.<\/p>\n<p>If you want to go the distance\u2026<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/11\/Go-the-Distance-disney-300x168.jpg\" alt=\"\" \/><\/p>\n<p>There are some systems which could benefit from following <a href=\"http:\/\/www.brentozar.com\/archive\/2010\/02\/how-to-really-compress-your-sql-server-backups\/\">StackOverflow\u2019s lead<\/a>\u00a0and achieve even smaller, faster backups.<\/p>\n<h4>BUFFERCOUNT and MAXTRANSFERSIZE\u00a0settings<\/h4>\n<p>Here is where things get fun. 
The BUFFERCOUNT and MAXTRANSFERSIZE options of the BACKUP command are used to regulate the amount of memory being used in the backup or restore process. By default, these operations use very little memory and that limitation is often a bottleneck. It is also, however, a protection. It prevents out of memory exceptions, which are easy to cause when manipulating these options.<\/p>\n<p>In this section I will show you how to see what your current commands are using and then,\u00a0in a later section, I will discuss how to find the right configuration for your database.<\/p>\n<pre><code class=\"language-\">DBCC TRACEON (3604);\nDBCC TRACEON (3213);\n\nBACKUP DATABASE BackMeUp TO DISK = 'C:\\MyFile_1.bak';<\/code><\/pre>\n<p>Above, I began my script with a couple of trace flags. The first, trace flag 3604, is used to print the output of a trace to the client; in this case, to the Messages tab in the SSMS query window. The second is trace flag 3213, which will output information regarding backup \/ restore throughput and configuration.<\/p>\n<h5>The output<\/h5>\n<ul>\n<li>YELLOW: The\u00a0<em>memory limit<\/em>\u00a0is the\u00a0amount of internal buffer pool memory\u00a0that is available for this backup \/ restore process. When you use more than this amount of memory, you are going to encounter out of memory exceptions.<\/li>\n<li>PURPLE: The\u00a0<em>buffercount<\/em> is the number of buffers that your process will be using. By default, this number is a calculated variable. The equation will not be covered in this post.<\/li>\n<li>BLUE: The\u00a0<em>maxtransfersize<\/em> is the maximum amount of data which will be processed per buffer.<\/li>\n<li>ORANGE: The\u00a0<em>total buffer space<\/em> is calculated as (<em>buffercount<\/em> * <em>maxtransfersize<\/em>). 
You do not want this value to exceed the\u00a0<em>memory limit<\/em>; for example, BUFFERCOUNT = 1024 with MAXTRANSFERSIZE = 2097152 yields 1024 * 2 MB = 2 GB of buffer space.<\/li>\n<li>GREEN: Elapsed time and average throughput.<\/li>\n<\/ul>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/backup-output-1\/\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/12\/backup-output-1.png\" alt=\"\" \/><\/a><\/p>\n<h5>Errors you could see during an out of memory condition<\/h5>\n<p>Msg 3013, Level 16, State 1, Line 5<\/p>\n<p>RESTORE DATABASE is terminating abnormally.<\/p>\n<p>Msg 701, Level 17, State 123, Line 5<\/p>\n<p>There is insufficient system memory in resource pool \u2018internal\u2019 to run this query.<\/p>\n<p>or\u2026<\/p>\n<p>Msg 3013, Level 16, State 1, Line 5<\/p>\n<p>BACKUP DATABASE is terminating abnormally.<\/p>\n<p>Msg 802, Level 17, State 2, Line 5<\/p>\n<p>Error: 802, Severity: 17, State: 2. (Params:). The error is printed in terse mode because there was error during formatting. Tracing, ETW, notifications etc are skipped.<\/p>\n<h2>Tuning your commands<\/h2>\n<p>We have covered the important elements of your backup \/ restore process. We have also covered how to inspect your system and understand what is happening. Now you need to decide what changes to implement.\u00a0As you make changes, it is important to test them to verify that they are not causing performance degradation in your applications or out of memory conditions.<\/p>\n<p>I will demonstrate various alterations to the BACKUP command\u00a0using a database\u00a0that you have seen a couple of times already,\u00a0<em>BackMeUp<\/em>. This database is approximately 4 GB and only has one file group. 
Our baseline (<em>see the backup command output above<\/em>) is approximately 45 seconds elapsed time, with about 80 MB\/sec.<\/p>\n<h3>Number of backup devices<\/h3>\n<pre><code class=\"language-\">BACKUP DATABASE BackMeUp TO\nDISK = 'C:\\Backups\\MyFile_1.bak',\nDISK = 'C:\\Backups\\MyFile_2.bak'<\/code><\/pre>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/performance-tuning-backup-restore-operations\/backup-output-2\/\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/11\/backup-output-2.png\" alt=\"\" \/><\/a><\/p>\n<p>By adding a second logical device, the elapsed time was reduced from 45 sec to 30 sec and throughput increased from 80 MB\/sec to 120 MB\/sec. This is approximately a 33% reduction in backup time.<\/p>\n<p>I would also like to point out that the buffer count increased to 12, giving us nearly a 70% increase in memory usage. This demonstrates the variable nature of the default buffer count setting and shows that the 33% improvement is only partially attributable\u00a0to the parallel writes.<\/p>\n<h3>Number of database devices<\/h3>\n<p>To analyze the performance increase from having more database devices, I have added a new file group and divided the data between the groups.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/performance-tuning-backup-restore-operations\/backup-files-3\/\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/11\/backup-files-3.png\" alt=\"\" \/><\/a><\/p>\n<pre><code class=\"language-\">BACKUP DATABASE BackMeUp TO\nDISK = 'C:\\Backups\\MyFile_1.bak'<\/code><\/pre>\n<p>NOTE: We are\u00a0evaluating these alterations individually. 
That is why only one logical device is used in this command.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/performance-tuning-backup-restore-operations\/backup-output-3\/\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/11\/backup-output-3.png\" alt=\"\" \/><\/a><\/p>\n<p>Similar to adding a new logical device, adding a new database device reduced the elapsed time to 33 sec and increased the throughput to 110 MB\/sec. This works out to be approximately a 27% performance improvement.<\/p>\n<h3>Compression<\/h3>\n<pre><code class=\"language-\">BACKUP DATABASE BackMeUp TO\nDISK = 'C:\\Backups\\MyFile_1.bak'\nWITH COMPRESSION<\/code><\/pre>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/performance-tuning-backup-restore-operations\/backup-output-4\/\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/11\/backup-output-4.png\" alt=\"\" \/><\/a><\/p>\n<p>It seems that there is a trend here. About 29 sec elapsed time with 124 MB\/sec, resulting in a 36% improvement. So far, compression is the best-performing tactic that we have covered. Let\u2019s not forget that this option is dependent upon how compressible the data in the database is.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/performance-tuning-backup-restore-operations\/ymmv\/\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/11\/YMMV.png\" alt=\"\" \/><\/a><\/p>\n<h3>BUFFERCOUNT and MAXTRANSFERSIZE<\/h3>\n<p>When\u00a0adjusting the buffer count and maximum transfer size, there will be a lot of trial and error. I recommend experimenting by backing up TO DISK = \u2018NUL\u2019. This is a backup device that does not actually write any data. 
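<\/p>\n<p>For example, a trial run against the NUL device might look like this (the BUFFERCOUNT and MAXTRANSFERSIZE values here are illustrative starting points, not recommendations):<\/p>\n<pre><code class=\"language-\">BACKUP DATABASE BackMeUp TO\nDISK = 'NUL'\nWITH BUFFERCOUNT = 512, MAXTRANSFERSIZE = 1048576;<\/code><\/pre>\n<p>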
It will give you misleading elapsed time and throughput readings because there is no IO subsystem to slow it down, but you can use it to find the sweet spot where your memory usage is not too high and your throughput is highest.<\/p>\n<p>With my demonstration configuration, this command produced the memory\u00a0sweet spot, resulting in a 52% performance improvement.<\/p>\n<pre><code class=\"language-\">BACKUP DATABASE BackMeUp TO\nDISK = 'C:\\Backups\\MyFile_1.bak'\nWITH BUFFERCOUNT = 1024, MAXTRANSFERSIZE = 2097152;<\/code><\/pre>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/performance-tuning-backup-restore-operations\/backup-output-5\/\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/11\/backup-output-5.png\" alt=\"\" \/><\/a><\/p>\n<h3>Using them all together<\/h3>\n<p>Finally, I will bring all of the tactics together into one operation for\u00a0a 66% performance improvement!<\/p>\n<p>NOTE: I\u00a0was forced to reduce the buffer count because adding a second file destination increased my total memory usage and caused an out of memory exception.<\/p>\n<pre><code class=\"language-\">BACKUP DATABASE BackMeUp TO\nDISK = 'C:\\Backups\\MyFile_1.bak',\nDISK = 'C:\\Backups\\MyFile_2.bak'\nWITH COMPRESSION,\nBUFFERCOUNT = 575, MAXTRANSFERSIZE = 2097152<\/code><\/pre>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/performance-tuning-backup-restore-operations\/backup-output-6\/\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/11\/backup-output-6.png\" alt=\"\" \/><\/a><\/p>\n<p>I would like to point out the orange block\u00a0above. The new file group I made has the same row counts in the tables as the original file group. However, the data size is skewed. If the new file group I made for the demo were truly half of the total data pages, the results would have turned out even better. I chose not to fix this because it demonstrates a real-world scenario. 
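<\/p>\n<p>Remember that the restore is the primary tuning target, and the same options carry over. A sketch of a matching restore (the file paths are illustrative; COMPRESSION is not specified because a compressed backup is detected automatically on restore):<\/p>\n<pre><code class=\"language-\">RESTORE DATABASE BackMeUp FROM\nDISK = 'C:\\Backups\\MyFile_1.bak',\nDISK = 'C:\\Backups\\MyFile_2.bak'\nWITH BUFFERCOUNT = 575, MAXTRANSFERSIZE = 2097152, REPLACE;<\/code><\/pre>\n<p>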
Your file groups are not likely to be balanced but you should keep that in mind when tuning your backup and restore processes so that you understand why you are getting better or worse\u00a0results than you expect.<\/p>\n<h4>References<\/h4>\n<p>[1] <a href=\"http:\/\/sql-articles.com\/articles\/dba\/how-much-memory-is-needed-taken-for-my-database-backup\/\">How much memory is needed \/ taken for my database backup?<\/a><\/p>\n<p>[2] <a href=\"http:\/\/www.sqlskills.com\/blogs\/kimberly\/backup-resources-where-oh-where-can-they-be\/\">A collection of great backup \/ restore references<\/a><\/p>\n<p>[3] <a href=\"http:\/\/download.microsoft.com\/download\/d\/9\/4\/d948f981-926e-40fa-a026-5bfcf076d9b9\/Technical%20Case%20Study-Backup%20VLDB%20Over%20Network_Final.docx\">Very large database backup performance case study (Word Doc)<\/a><\/p>\n<p>[4] <a href=\"http:\/\/blogs.msdn.com\/b\/psssql\/archive\/2008\/02\/06\/how-it-works-how-does-sql-server-backup-and-restore-select-transfer-sizes.aspx\">How does SQL Server Backup and Restore select transfer sizes<\/a><\/p>\n<p>[5] <a href=\"https:\/\/technet.microsoft.com\/en-us\/magazine\/2009.07.sqlbackup.aspx\">Understanding SQL Server Backups<\/a><\/p>\n<p>[6] <a href=\"http:\/\/www.brentozar.com\/archive\/2010\/02\/how-to-really-compress-your-sql-server-backups\/\">Brent Ozar \u2013 Taking compression to the next level<\/a><\/p>\n<hr \/>\n<p><img decoding=\"async\" src=\"\/wp-content\/uploads\/hammer-derik-172x140.png\" alt=\"\" \/><strong>Written by Derik Hammer of <a href=\"http:\/\/www.sqlhammer.com\/\">SQL Hammer<\/a><\/strong><\/p>\n<p>Derik is a data professional focusing on Microsoft SQL Server. His passion focuses around <a href=\"http:\/\/www.sqlhammer.com\/blog\/category\/high-availability\/\">high-availability<\/a>, <a href=\"http:\/\/www.sqlhammer.com\/blog\/category\/general\/disaster-recovery\/\">disaster recovery<\/a>, continuous integration, and automated maintenance. 
His experience has spanned long-term database administration, consulting, and\u00a0entrepreneurial ventures.<\/p>\n<p>Derik gives the SQL community credit for plugging the gaps in his knowledge when he was a junior DBA and, now that his skills have matured, he started <a href=\"http:\/\/www.sqlhammer.com\/\">SQLHammer.com<\/a> as one small way to give back and continue the cycle of shared learning.<\/p>\n<p>Derik is the owner and lead author of SQL Hammer, a <a href=\"http:\/\/www.sqlhammer.com\/\">Microsoft SQL Server resource<\/a>.<\/p>\n<p>For more information, visit <a href=\"http:\/\/www.sqlhammer.com\/\">http:\/\/www.sqlhammer.com<\/a>. Follow Derik on <a href=\"http:\/\/twitter.com\/SQLHammer\">Twitter<\/a> for <strong>SQL tips and chat<\/strong>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The standard backup strategy seems to always include backups running at midnight, hoping that they complete before anyone starts using the applications. There is value in performing maintenance on off-hours. 
However, many applications do not have off-hours or\u00a0our databases grow in size and they stop fitting in static maintenance windows, or even into a whole [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":578,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-139","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/posts\/139","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/comments?post=139"}],"version-history":[{"count":1,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/posts\/139\/revisions"}],"predecessor-version":[{"id":401,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/posts\/139\/revisions\/401"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/media\/578"}],"wp:attachment":[{"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/media?parent=139"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/categories?post=139"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/tags?post=139"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}