Sql Server - All about Statistics



STATISTICS

Statistics holds information about distribution of data in tables & indexes.
Statistics helps the optimiser in creating efficient plan.

EXAMPLES:

 SET  “Auto Create Statistics ” & “Auto Update Statistics” in database properties   =   ‘FALSE’

CREATE TABLE TEST (Col1  INT ,Col2 VARCHAR(20),Col3 VARCHAR(20))

;WITH CTE (Col1,Col2,Col3)
AS(
SELECT Col1 = 1,Col2 = 'unique column' + CAST(1 AS VARCHAR(20)),col3 = 'nonunique column'
UNION ALL
SELECT Col1 = Col1 + 1,Col2 = 'unique column' + CAST(Col1 + 1 AS VARCHAR(20)),col3 = 'nonunique column'
FROM CTE
WHERE Col1 < 30000
)
INSERT INTO TEST
SELECT * FROM CTE OPTION (MAXRECURSION 30000)

sp_helpstats  - Returns statistics information about columns and indexes on the specified table.


sp_helpstats 'TEST', 'ALL'
GO



You will notice below message:
This object does not have any statistics or indexes.

Try the below query:

SELECT * FROM TEST WHERE Col1 = 1500
You will notice “Warnings: Columns with no statistics” in the below image. Because statistics are not created on any of the columns or index.

Now, SET  “Auto Create Statistics ” & “Auto Update Statistics” in database properties   =   ‘TRUE’ 

When AUTO_CREATE_STATISTICS is ON, the query optimizer creates statistics on individual predicate columns in the query, with statistics name starting with _WA as shown in below image.  

--Execute below query & right click statistics under table in object explorer and check,
SELECT * FROM TEST WHERE Col2 like '%unique column%'   





CREATE CLUSTERED INDEX CI_IDX ON TEST (Col1)

You will notice in the below image, after creating index, statistic is automatically created with index name. Because “Auto Create Statistics ” & “Auto Update Statistics” in database properties are SET to ‘TRUE’

Statistics can also be manually created by right clicking Table - > Statistics - > New Statistics .. or  by using below statement

CREATE STATISTICS statistics_name
ON { table_or_indexed_view_name } ( column [ ,...n ] )
    [ WHERE <filter_predicate> ]
    [ WITH
        [ [ FULLSCAN
          | SAMPLE number { PERCENT | ROWS }
          | STATS_STREAM = stats_stream ] [ , ] ]
        [ NORECOMPUTE ]
    ] ;

While creating statistics,Options FULLSCAN & SAMPLE are used to decide whether statistics has to be created by scanning all rows / by scanning mentioned sampling % of rows. 

If NORECOMPUTE option is specified, the query optimizer will complete any in-progress statistics updates and disable future AUTO_STATISTICS_UPDATE. 

How to check out dated statistics & When to update statistics:

Click “Include actual execution plan” in the query window, and then try  below query
     
SELECT * FROM TEST

Check  “Actual Number of Rows” and “Estimated Number of Rows” in the below image.
Insert 10k records

;WITH CTE (Col1,Col2,Col3)
AS(
SELECT Col1 = 30001,Col2 = 'unique column' + CAST(1 AS VARCHAR(20)),col3 = 'nonunique column'
UNION ALL
SELECT Col1 = Col1 + 1,Col2 = 'unique column' + CAST(Col1 + 1 AS VARCHAR(20)),col3 = 'nonunique column'
FROM CTE
WHERE Col1 < 40000
)
INSERT INTO TEST
SELECT * FROM CTE OPTION (MAXRECURSION 15000)


Click “Include actual execution plan” in the query window, and then again try  below query

SELECT * FROM TEST

Check  “Actual Number of Rows” and “Estimated Number of Rows” in the below image.
We can find statistics are out-dated by looking into execution plan
·         checking   “Actual Number of Rows” and “Estimated Number of Rows”, if the difference between the two is large,then it shows that statistics are out-dated and needs to be updated.
If “Auto Update Statistics” option in database properties is SET to ‘TRUE’. Statistics are periodically updated by SqlServer


Difference between Auto Update Statistics & Auto Update Statistics Asynchronously
Auto Update Statistics
Auto Update Statistics Asynchronously
Used to automatically update the statistics
Used to automatically update the statistics
SQL Server would ensure that the statistics are up-to date, every time a query is executed.
If the statistics are not updated,SQL Server would use the previously updated statistics for the execution of query and then updates statistics.

To be noted:

If the database is considered to have frequent DML operations,setting “Auto Update Statistics” to ‘True’,may degrade the performance by several statistics updates.
With STATS_DATE  function, date of the most recent update for statistics on a table or indexed view can be found.

Syntax:
STATS_DATE ( object_id , stats_id )

Various ways to have a look at statistics information:
1.       DBCC SHOW_STATISTICS displays current query optimization statistics for a table or indexed view.
 
2.  By right clicking Table - > Statistics - > properties
The optimizer uses the all_density values in statistics to determine whether to conduct a table scan or to use an index to access the data. all_density value ranges from 0.1 to 1. 

0.1 indicates  - less duplicate values in a column (index has high selectivity) 
1 indicates     - more duplicate values in a column (index has less selectivity) 

3.  SP_AUTOSTATS displays\changes the automatic statistics update option, AUTO_UPDATE_STATISTICS, for an index, a statistics object, a table, or an indexed view


Few system stored procedures related to statistics:

sp_statistics
sp_statistics_100
sp_statistics_rowset
sp_statistics_rowset2

You can also update statistics on columns, by checking option "update statistics  for these columns" under statistics properties in object explorer as shown in below image.


See Also:

2 comments:

Anonymous said...

Nice info about statistics.

Anonymous said...

Good one, and thanks for the statistics as well - very nice. Another great page here also explaining selectivity:

http://www.programmerinterview.com/index.php/database-sql/selectivity-in-sql-databases/