Basics of Statistics in SQL Server 2005, Page 2
Updating Statistics
Just as the data in your database isn't static, statistics aren't either. As you add, delete, and update rows, SQL Server needs to adjust the statistics so that they still provide good guidance for the query optimizer. When you run a query, SQL Server checks statistics used in that query's plan to see whether they're outdated. It uses these heuristics to determine if a statistic is outdated:
- If the rowcount in the table has gone from zero to any nonzero value, statistics are outdated.
- If the number of rows in the table was under 500, and at least 500 rows have been modified (added/deleted/edited), statistics are outdated.
- If the number of rows in the table was over 500, and at least 500 plus 20% of the rowcount rows have been modified (added/deleted/edited), statistics are outdated.
Normally statistics are updated as soon as it's discovered that they
should be, which can hold up the execution of a query. You can set the
AUTO_UPDATE_STATISTICS_ASYNC database property to have the
update performed asynchronously instead, which may result in less-accurate
query plans but faster start of execution.
Sometimes, you may find that the heuristics aren't good enough for
telling when statistics should be updated. For example, if you're adding a
lot of rows at the end of the range of a column (think about adding
autonumbers, or timestamps), the histogram for that column will become
outdated rapidly. You can force an immediate update from T-SQL by
executing the UPDATE STATISTICS statement:
UPDATE STATISTICS table | view
[
{
{index | statistics_name}
| ({index |statistics_name} [,...n])
}
]
[WITH
[
[FULLSCAN]
|SAMPLE number {PERCENT | ROWS}]
|RESAMPLE
|<update_stats_stream_option>[,...n]
]
[[,] [ALL | COLUMNS | INDEX]
[[,] NORECOMPUTE]
] ;
<update_stats_stream_option> ::=
[STATS_STREAM = stats_stream]
[ROWCOUNT = numeric_constant]
[PAGECOUNT = numeric contant]
As you can see, you can update all of the statistics for a table or view, or pick a particular statistic to update by index name or statistic name. You can also specify some update options:
FULLSCANforces a complete read of every row of data.SAMPLElets you specify how much of the data SQL Server should sample in building statistics. By default, it will figure out its own sampling coverage.RESAMPLEuses the same sampling rate as the last time the statistic was gathered.ALL, COLUMNS,orINDEXspecify whether the update should be to all statistics, column statistics, or index statistics. The default is all statistics.NORECOMPUTEturns off automatic statistics gathering for this table or statistic.
The Undocumented Options
If you check Books Online, you'll find the
update_stats_stream_option documented as "This syntax is for
internal use only and is not supported. Microsoft reserves the right to
change this syntax at any time."
The STATS_STREAM option, so far as I know, remains
undocumented. But we now know what the other two options do, thanks to the
new blogging spirit of openness and a
blog posting from the SQL Server Query Optimization Team. It turns out
that ROWCOUNT and PAGECOUNT can be used to trick
the query optimizer by letting you fake the internal metadata about a
particular table - specifically the count of rows and count of pages that
the table occupies.
If you're going to experiment with this, please do not do so in a production database!
For example, you can vastly increase the apparent rowcount and pagecount of the Sales.SalesOrderDetail table by issuing this command:
UPDATE STATISTICS Sales.SalesOrderDetail
WITH ROWCOUNT=5000000, PAGECOUNT = 5000000
Why would you want to do this? SQL Server's query optimizer does things
differently on small tables and on large tables. During development, you
likely don't have a whole lot of data in your tables. Rather than spend
time generating huge amounts of sample data, you can use the undocumented
options of UPDATE STATISTICS to see how the query optimizer
will treat your tables after they've been in production for a while. This
can be a quick and dirty method to uncover potential performance issues
before they arise, and perhaps point out the need for additional indexes
or other schema work. I'd recommend confirming any such changes by
retesting with realistic amounts of sample data, though.
Using Statistics Wisely
In most cases, SQL Server 2005 is so well tuned in this area that you
only need to remember a single best practice: leave the statistics set to
automatically update and forget about them. If you're having a problem
with particular queries, there are a few things you can do to make
statistics more accurate. These include running an update with
FULLSCAN, creating multicolumn statistics if you have
multiple-column WHERE clauses that aren't covered by existing
statistics, and updating statistics more frequently if you're filling a
table with ascending data. Armed with the ability to evaluate the quality
of statistics, you should be able to decide whether you need to undertake
any of these steps in your own databases.
About the Author
Mike Gunderloy is the author of over 20 books and numerous articles on development topics, and the Senior Technology Partner for Adaptive Strategy, a Washington State consulting firm. When he's not writing code, Mike putters in the garden on his farm in eastern Washington state.
