{"id":98,"date":"2013-01-01T00:00:00","date_gmt":"2013-01-01T00:00:00","guid":{"rendered":"https:\/\/www.sqlhammer.com\/blog\/sql-server-index-column-order\/?shared=email&amp;msg=fail"},"modified":"2013-01-01T00:00:00","modified_gmt":"2013-01-01T00:00:00","slug":"sharedemailmsgfail","status":"publish","type":"post","link":"https:\/\/sqlhammer.com\/index.php\/2013\/01\/01\/sharedemailmsgfail\/","title":{"rendered":"SQL Server Index Column Order"},"content":{"rendered":"<p>At times I am\u00a0in a position to advise development teams on their practices. What I have learned from this experience is that there is always a demand for me to reduce what I tell them to\u00a0<em>developer terms<\/em>. Before the developers reading this go grab their torches and pitch forks, I am not speaking negatively of developers in any way. I am acknowledging that the DBA\u00a0<em>it depends<\/em> does not work well for how developers do their jobs.<\/p>\n<p>When I begin to describe the half a dozen different scenarios which impact how SQL Server behaves, I can see our developers\u2019 eyes glaze over. Sometimes\u00a0I am asked to simplify my explanation into a single rule which they can follow.\u00a0Having rules like these implemented as development standards is wonderful. They do not replace performance monitoring and tuning, however.<\/p>\n<p>The ordering of index columns is one case where it is difficult to nail down a bullet-proof standard. There is common advise that you should always order the columns from most selective to least selective. This advise is pretty good\u00a0but it is really inadequate in my opinion because there are queries which would not perform\u00a0optimally when this advise is followed.<\/p>\n<h2>How column ordering affects your query<\/h2>\n<p>When the query optimizer is determining what index to use, it looks at the statistics that were available at the time. In the statistic, the histogram is only stored for the first column in the index key. This gives the best determination of column density or selectivity. Density is\u00a01\/(number of distinct values). For the other key columns, only the density of the left-based subsets are kept. In other words, the density is unknown for any column in the index unless all of the columns to the left of it are included in the predicates. Not filtering on the first column prevents the optimizer from knowing the density and may not chose to use that index, possibly resulting in a\u00a0table or index scan. The behavior is also different depending upon whether equality or inequality predicates are being used.<\/p>\n<h3>Demo setup<\/h3>\n<p>To demonstrate these behaviors I will be using the <a href=\"http:\/\/msftdbprodsamples.codeplex.com\/releases\/view\/93587\">AdventureWorks<\/a> database from Microsoft. I\u2019ve chosen to use the [Sales].[SalesOrderDetail] table based on its pre-loaded data but I have dropped all indexes from it and created a single, new, index.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Sales-SalesOrderDetail.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Sales-SalesOrderDetail.jpg\" alt=\"\" \/><\/a><\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/IX_SalesOrderDetail_rowguid_SalesOrderDetailID_SalesOrderId.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/IX_SalesOrderDetail_rowguid_SalesOrderDetailID_SalesOrderId.jpg\" alt=\"\" \/><\/a><\/p>\n<p>This index has the columns ordered with the most selective on the left and the least selective on the right, as seen by the reduced count of unique values found for\u00a0<em>SalesOrderID<\/em>.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Sales-SalesOrderDetail-selectivity.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Sales-SalesOrderDetail-selectivity.jpg\" alt=\"\" \/><\/a><\/p>\n<p>To view the density vector for our index we will\u00a0use <a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/ms174384.aspx\">DBCC SHOW_STATISTICS.<\/a>\u00a0\u00a0Depending on your version of SQL Server, you may also use the DMF\u00a0<a href=\"http:\/\/msdn.microsoft.com\/en-us\/library\/jj553546.aspx\">sys.dm_db_stats_properties()<\/a>, explained <a href=\"http:\/\/www.sqlskills.com\/blogs\/erin\/new-statistics-dmf-in-sql-server-2008r2-sp2\/\">here<\/a>.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Density-Vector.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Density-Vector.jpg\" alt=\"\" \/><\/a><\/p>\n<h3>Equality predicates<\/h3>\n<p>Equality predicates are the easiest to understand because\u00a0the index will\u00a0likely be selected for use in the plan if you are filtering on one of the left-based subsets. You can see above, in the density vector, the three different combinations of columns that I will be able to get good performance out of.<\/p>\n<h4>Single column filtering<\/h4>\n<p>If I query the <em>rowguid<\/em> column, I can see that an efficient index seek is performed.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-1.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-1.jpg\" alt=\"\" \/><\/a><\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-plan-1.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-plan-1.jpg\" alt=\"\" \/><\/a><\/p>\n<p>If I query on another single column within the index, an index scan is performed instead. The index is still used but it is much less efficient because the entire index must be read rather than taking advantage of the B-tree structure to traverse down the index levels.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-2.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-2.jpg\" alt=\"\" \/><\/a><\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-plan-2.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-plan-2.jpg\" alt=\"\" \/><\/a><\/p>\n<h4>Multiple column filtering<\/h4>\n<p>When filtering on more than one column, an index seek will be chosen if an entire left-based subset is included in the filter. In this example, I query all three columns in the index which produces the desired index seek operation.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-3.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-3.jpg\" alt=\"\" \/><\/a><\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-plan-3.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-plan-3.jpg\" alt=\"\" \/><\/a><\/p>\n<p>When I remove the <em>SalesOrderDetailID\u00a0<\/em> from the filter, though, the operation that occurs is a seek with a predicate. First it will seek out all rows based on the\u00a0<em>rowguid<\/em> column and then it will scan that entire result-set for the\u00a0<em>SalesOrderID<\/em> value. The seek is unable to penetrate further than the\u00a0<em>rowguid<\/em> because there is no density vector for\u00a0<em>rowguid, SalesOrderID<\/em>.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-4.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-4.jpg\" alt=\"\" \/><\/a><\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-plan-4.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-col-order-plan-4.jpg\" alt=\"\" \/><\/a><\/p>\n<h3>Inequality predicates<\/h3>\n<p>There is less of a chance of using index seeks with inequality predicates. There are two reasons for this.<\/p>\n<ul>\n<li>The selectivity of the\u00a0result-set is more important than the selectivity of the column.<\/li>\n<\/ul>\n<ul>\n<li>When a column partakes in an inequality predicate, all columns to the right of it in the index key can no longer be used with a seek.<\/li>\n<\/ul>\n<p>To address #1; when\u00a0using the\u00a0below greater than operator, the\u00a0<em>SalesOrderID<\/em> predicate results in only 301 rows while the\u00a0<em>SalesOrderDetailID<\/em> results in 46317 rows.\u00a0<em>SalesOrderDetailID<\/em> was determined to be the most selective of the two columns but the inequality operator causes the cost of the query to be lower, if\u00a0<em>SalesOrderID<\/em> is treated as the most selective and is put on the left of the index column list.<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Sales-SalesOrderDetail-selectivity-2.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Sales-SalesOrderDetail-selectivity-2.jpg\" alt=\"\" \/><\/a><\/p>\n<p>To address #2; I have queried with a mixture of equality and inequality predicates. The\u00a0<em>rowguid<\/em> column is the left-most column and is using equality which is why it is part of the index seek. Next I am using the greater than operator on the\u00a0<em>SalesOrderDetailID<\/em> column. This also is a seek predicate because SQL Server will seek to the minimum value of 75000 and then return all rows after that. Finally, the\u00a0<em>SalesOrderID<\/em> column is using an equality operator but this does not really matter. Since the\u00a0<em>SalesOrderDetailID<\/em> used an inequality operator, all columns to the right of it cannot take advantage of the seek. All rows returned in the result-set based on the first two columns will be scanned to find\u00a0<em>SalesOrderID = 43659.<\/em><\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Sales-SalesOrderDetail-selectivity-4.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Sales-SalesOrderDetail-selectivity-4.jpg\" alt=\"\" \/><\/a><\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Sales-SalesOrderDetail-selectivity-3.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/Sales-SalesOrderDetail-selectivity-3.jpg\" alt=\"\" \/><\/a><\/p>\n<h3>Are my indexes duplicate?<\/h3>\n<p>Now that we understand how the seek and scan operations are chosen by the optimizer, the concept of having a duplicate index is skewed a bit. When looking at these\u00a0indexes, it is easy to believe that they are duplicates. The only difference is that the columns are ordered differently. Every update to one of those columns will now need to update two indexes. It is possible for both to enter the buffer pool and take up twice as much space in memory. But what about queries that use inequality predicates?<\/p>\n<p><a href=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-dups.jpg\"><img decoding=\"async\" src=\"http:\/\/www.sqlhammer.com\/blog\/wp-content\/uploads\/2015\/01\/index-dups.jpg\" alt=\"\" \/><\/a><\/p>\n<p>If you have two separate queries; one which uses equality for\u00a0<em>SalesOrderDetailID<\/em> and inequality for\u00a0<em>SalesOrderID<\/em>, and another that uses\u00a0equality for\u00a0<em>SalesOrderID<\/em> and inequality for\u00a0<em>SalesOrderDetailID,\u00a0<\/em>then having both of these indexes can prevent an index scan for each query. Keep that in mind when searching your servers for duplicate indexes, the column order matters and there are cases where the write cost of having two is preferable to the read cost of only having one.<\/p>\n<h3>Credits<\/h3>\n<ul>\n<li><a href=\"http:\/\/sqlinthewild.co.za\/index.php\/2008\/11\/04\/what-are-statistics-and-why-do-we-need-them\/\">Gail Shaw<\/a>, I learned much of what I know about statistics from presentations and posts of hers.<\/li>\n<\/ul>\n<ul>\n<li>Erin Stellato on <a href=\"http:\/\/www.sqlskills.com\/blogs\/erin\/new-statistics-dmf-in-sql-server-2008r2-sp2\/\">New Statistics DMF in SQL Server 2008 R2 SP2<\/a>.<\/li>\n<\/ul>\n<ul>\n<li>Toad World Wiki \u2013\u00a0<a href=\"http:\/\/www.toadworld.com\/platforms\/sql-server\/w\/wiki\/9635.index-selectivity-and-column-order.aspx\">Index Selectivity and Column Order<\/a>.<\/li>\n<\/ul>\n<hr \/>\n<p><img decoding=\"async\" src=\"\/wp-content\/uploads\/hammer-derik-172x140.png\" alt=\"\" \/><strong>Written by Derik Hammer of <a href=\"http:\/\/www.sqlhammer.com\/\">SQL Hammer<\/a><\/strong><\/p>\n<p>Derik is a data professional focusing on Microsoft SQL Server. His passion focuses around <a href=\"http:\/\/www.sqlhammer.com\/blog\/category\/high-availability\/\">high-availability<\/a>, <a href=\"http:\/\/www.sqlhammer.com\/blog\/category\/general\/disaster-recovery\/\">disaster recovery<\/a>, continuous integration, and automated maintenance. his experience has spanned long-term database administration, consulting, and\u00a0entrepreneurial ventures.<\/p>\n<p>Derik gives the SQL community credit for plugging the gaps in his knowledge when he was a junior DBA and, now that his skills have matured, started <a href=\"http:\/\/www.sqlhammer.com\/\">SQLHammer.com<\/a> as one small way to give back and continue the cycle of shared learning.<\/p>\n<p>Derik is the owner and lead author of SQL Hammer, a <a href=\"http:\/\/www.sqlhammer.com\/\">Microsoft SQL Server resource<\/a>.<\/p>\n<p>For more information, visit <a href=\"http:\/\/www.sqlhammer.com\/\">http:\/\/www.sqlhammer.com<\/a>. Follow Derik on <a href=\"http:\/\/twitter.com\/SQLHammer\">Twitter<\/a> for <strong>SQL tips and chat<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>At times I am\u00a0in a position to advise development teams on their practices. What I have learned from this experience is that there is always a demand for me to reduce what I tell them to\u00a0developer terms. Before the developers reading this go grab their torches and pitch forks, I am not speaking negatively of [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":570,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-98","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/posts\/98","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/comments?post=98"}],"version-history":[{"count":0,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/posts\/98\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/media\/570"}],"wp:attachment":[{"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/media?parent=98"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/categories?post=98"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sqlhammer.com\/index.php\/wp-json\/wp\/v2\/tags?post=98"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}