{"id":101,"date":"2015-02-04T00:00:00","date_gmt":"2013-01-01T00:00:00","guid":{"rendered":"https:\/\/www.sqlhammer.com\/blog\/optimizing-for-unknown-and-nulls"},"modified":"2026-03-18T21:51:34","modified_gmt":"2026-03-18T21:51:34","slug":"optimizing-for-unknown-and-nulls","status":"publish","type":"post","link":"https:\/\/sqlhammer.com\/index.php\/2015\/02\/04\/optimizing-for-unknown-and-nulls\/","title":{"rendered":"Optimizing for unknown and NULLs"},"content":{"rendered":"<p>The other day I had an interesting conversation about something that sounds counter-intuitive. If NULL means unknown or missing, then does OPTIMIZE FOR UNKNOWN mean to optimize for NULL?<\/p>\n<p><strong>Let\u2019s take a step back for a moment and put this into context\u2026<\/strong><\/p>\n<p>I have found that stored procedures written for reporting purposes have a tendency to generate poor execution plans. This is because most reports include filters which, depending upon the input parameters, will produce result sets that differ greatly.\u00a0Physical operators such as the Nested Loop operator perform horribly with large data-sets. A 2 second execution could easily become 15 minutes or more by having a poor execution plan. Interesting enough, the reserve is not as true. When a Hash Match operator is selected for a small set, the procedure will not be optimal but the relative affect is typically less severe. For this reason, I have found myself, on occasion, disabling parameter sniffing with the <a href=\"http:\/\/blogs.msdn.com\/b\/conor_cunningham_msft\/archive\/2009\/01\/16\/optimize-for-unknown-how-could-that-possibly-make-any-sense.aspx\">OPTIMIZE FOR<\/a> query hint.<\/p>\n<p>> &#8220;<code><br \/>\n> <query_hint > ::=<br \/>\n> {\u00a0<br \/>\n>   ...<br \/>\n>   | OPTIMIZE FOR ( @variable_name { UNKNOWN | = literal_constant } [ , ...n ] )<br \/>\n>   | OPTIMIZE FOR UNKNOWN<br \/>\n> <\/code>&#8220;<\/p>\n<p>One method is to OPTIMIZE FOR a variable being set to a particular literal. If you have an isActive BIT column on a table and 90% of the records have isActive = 1, then you could OPTIMIZE FOR (@isActive = 1). When @isActive = 0, the plan will not be ideal but it is likely going to be more acceptable then pulling a poor cached plan.<\/p>\n<h3><strong>OPTIMIZE FOR @variable_name\u00a0demo<\/strong><\/h3>\n<p>Here I\u2019ve taken the Sales.Customer table in the <a href=\"http:\/\/msftdbprodsamples.codeplex.com\/releases\/view\/93587\">AdventureWorks2012<\/a>\u00a0and added a few million records with TerritoryID = 1. Then we can look at the\u00a0execution plans.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Optimize-for-unknown.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Optimize-for-unknown.jpg\" alt=\"\" \/><\/a><\/p>\n<p>The [IX_Customer_TerritoryID] histogram shows the large difference in row counts between TerritoryID = 1 and TerritoryID = 2.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Optimize-for-unknown-dbcc.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Optimize-for-unknown-dbcc.jpg\" alt=\"\" \/><\/a><\/p>\n<p>The first execution plan chooses a Merge Join and estimates about 900,000 records.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Optimize-for-unknown-query1.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Optimize-for-unknown-query1.jpg\" alt=\"\" \/><\/a><\/p>\n<p>The second plan chooses to use the Hash Match operator because of its estimate of over 7 million records. Both queries return the same results.<\/p>\n<p><img decoding=\"async\" src=\"\/wp-content\/uploads\/Optimize-for-unknown-query2.jpg\" alt=\"\" \/><\/p>\n<h3><strong>OPTIMIZE FOR UNKNOWN demo<\/strong><\/h3>\n<p>The other method is to OPTIMIZE FOR UNKNOWN. Finally, we\u2019ve caught up to the topic of this post. For this demo we will filter on PersonID\u00a0because it has NULL values.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/02\/Optimize-for-unknown-dbcc2.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/02\/Optimize-for-unknown-dbcc2.jpg\" alt=\"\" \/><\/a><\/p>\n<p>Remember that our question is whether OPTIMIZE FOR UNKNOWN means that it will\u00a0optimize for NULL, since NULL means unknown or missing. If it does, we would expect our results to match what we saw in the first demo, because the number of NULL rows is much larger than the number of rows where PersonID = 869.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/02\/Optimize-for-unknown-query5.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/02\/Optimize-for-unknown-query5.jpg\" alt=\"\" \/><\/a><\/p>\n<p>In this first execution plan we see that the optimizer chose a Nested Loop operator because there were only 1766 rows estimated.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/02\/Optimize-for-unknown-query3.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/02\/Optimize-for-unknown-query3.jpg\" alt=\"\" \/><\/a><\/p>\n<p>In the second query, we see that the estimation did not change. The reason for this is also the answer to our question. The OPTIMIZE FOR UNKNOWN query hint is referring to a different unknown than is implied with a NULL. It is optimizing for an unknown parameter value. It is not optimizing for records where the column value is unknown (NULL). In order to optimize for the unknown, the optimizer is basing the estimation off of the column density as it is recorded in the statistics. This is calculated by multiplying the total row count by the density vector, 7225260 * 5.230126E-05, and you get the estimated rows in the result set.<\/p>\n<p><img decoding=\"async\" src=\"\/wp-content\/uploads\/Optimize-for-unknown-query4.jpg\" alt=\"\" \/><\/p>\n<hr \/>\n<p><img decoding=\"async\" src=\"\/wp-content\/uploads\/hammer-derik-172x140.png\" alt=\"\" \/><strong>Written by Derik Hammer of <a href=\"http:\/\/www.sqlhammer.com\/\">SQL Hammer<\/a><\/strong><\/p>\n<p>Derik is a data professional focusing on Microsoft SQL Server. His passion focuses around <a href=\"http:\/\/www.sqlhammer.com\/blog\/category\/high-availability\/\">high-availability<\/a>, <a href=\"http:\/\/www.sqlhammer.com\/blog\/category\/general\/disaster-recovery\/\">disaster recovery<\/a>, continuous integration, and automated maintenance. his experience has spanned long-term database administration, consulting, and\u00a0entrepreneurial ventures.<\/p>\n<p>Derik gives the SQL community credit for plugging the gaps in his knowledge when he was a junior DBA and, now that his skills have matured, started <a href=\"http:\/\/www.sqlhammer.com\/\">SQLHammer.com<\/a> as one small way to give back and continue the cycle of shared learning.<\/p>\n<p>Derik is the owner and lead author of SQL Hammer, a <a href=\"http:\/\/www.sqlhammer.com\/\">Microsoft SQL Server resource<\/a>.<\/p>\n<p>For more information, visit <a href=\"http:\/\/www.sqlhammer.com\/\">http:\/\/www.sqlhammer.com<\/a>. Follow Derik on <a href=\"http:\/\/twitter.com\/SQLHammer\">Twitter<\/a> for <strong>SQL tips and chat<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The other day I had an interesting conversation about something that sounds counter-intuitive. If NULL means unknown or missing, then does OPTIMIZE FOR UNKNOWN mean to optimize for NULL? Let\u2019s take a step back for a moment and put this into context\u2026 I have found that stored procedures written for reporting purposes have a tendency [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":572,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-101","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/posts\/101","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/comments?post=101"}],"version-history":[{"count":1,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/posts\/101\/revisions"}],"predecessor-version":[{"id":363,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/posts\/101\/revisions\/363"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/media\/572"}],"wp:attachment":[{"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/media?parent=101"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/categories?post=101"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/tags?post=101"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}