Thursday, 13 February 2014

SQL Server Statistics

The people who are reading this blog should have a basic knowledge of 
1. Indexes in sql server
2. Index internals like representation, contents, storage and usage.
3. Clustered, Non-clustered, and covering indexes.
4. SQL Server Architecture (Query processing part)

Lets begin to observe the importance of statistics..

Indexes are not enough to gain the best performance from the queries submitted to the database engine.
In addition to indexes, additional, updated and accurate statistics should also be used to gain best performance.

The purpose of optimizer is not only finding a best plan, its purpose is to find a best plan fast. The optimizer uses the statistics and finds the best execution plan suited for the query as fast as possible.

Selectivity - Is the measure of the number of rows which satisfies the condition in the where clause.

In a where clause if the
1- condition is selective enough (i.e, very less rows are present which satisfies that condition), the index is      effective and the process chosen is Index Scan.
2- condition is not selective enough (more number of rows are present which satisfies that condition ), the index is not fully used and the process chosen will an Index Scan or a Table Scan.
3- If we cannot tell the number of rows that satisfies the condition, then the statistics come to play.

My last blog post was on index scan , index seek and table scan.

And in that blog, i explained about a simple case where they can take place. But in many cases, we cannot determine the selectivity that easily.

There we took a query as example,
SELECT emp_name FROM EMP
WHERE emp_id=2210

The selectivity(the number of rows that satisfies the condition in a particular query) can be easily determined from the above query.

Think of a query like below,
SELECT emp_name FROM EMP
WHERE emp_id like "%21%"

In this case, we cannot determine the selectivity that easily. Statistics are used here. And this is the situation for most of the real time queries.

- How can we know the number of rows without performing a table scan??
Even this issue will be taken care by the statistics.

Statistics
1. Statistics are the information kept by the sql server. 
2. They are accessed through the sys.indexes and sys.stats catalog views.
3. Aids the optimizer in estimating how many rows a query will return.
4. Automatically created for indexes to help and assess the usefulness of an index.
5. Can even be created on an un-indexed columns.
6. Can also be created on columns that are not the high-order elements of an index.

SQL Server Version         Number of indexes per table            Number of stats per table
         2005                                         249                                                  2000
         2008                                         999                                                 10000
         2008 R2                                    999                                                 30000


   

No comments:

Post a Comment

GROUPBY VS PARTITIONBY