Sometimes when you plan to change your index design is good to know the columns of your tables. But not just a data type and max size.
When you picking new index key from available candidates you need an overview of real measured properties for your table columns.
Some of the main values are:
Selectivity of the columns:
This is very important value. Because when you are planning index for your queries you will focus on columns in WHERE clause. Some ( or all) of them you will use for a index key.
But you need to decide in which order columns will be used in that key. And that's the case. You should focus on selectivity of the column. Selectivity mean how many rows column return if you ask only for one value. Selectivity 1 means you will get one row from the column. If you have selectivity lover than 1 it means you will receive more rows from column when you ask just for one value. You want to have most selective column first in the index key because it reduce the amount of entries though server should go in order to find requested rows.
So, if you will have three columns in where clause and for example they will have selectivity c1 (0.89) ,c2(1.00) , c3(0.45), you might go for index like this:
Real max used length of every column :
But index selectivity isn't only one viable property. You also need to ensure that your columns in the index key are short as possible. You should be also aware of current used max size of your columns. Maybe you will have most selective column uniqueidentifier which is much more wider than other integer column. If you will blindly use it, you can miss an opportunity to use different column which is smaller and better suited.
Max used length is also good in deciding between similar columns. Let's say you have c1 varchar(80) with selectivity 0.877 and c2 varchar(100) with selectivity (0.932), which you will use? Real used size could help to determine proper winner. Remember, smaller is better but you could look not only on data type max value you can employ the real usage of your column.
Max available column length:
This one will help you find an overestimated columns in your database design. Sometimes designers oversize the columns because they expect bigger values than in reality came from application. You can compare this value with the Real max used length to see if data type is utilized as expected. If i have a column with good selectivity in compare to other candidates and have a too big data type size which is not really used. You could decrease it's size and employ it as a first index key column.
If you see those values important like me, you can use my procedure to check the columns in your table:
It reports just a few (but valuable) columns:
Column name
Selectivity of column
Column type
MaxLength: defined by the column definition (e.g. varchar(30) = 30)
MaxSizeInBytes: real used size in Bytes
DistinctValues: Amount of distinct rows
NumberOfRows: Amount of rows with values (not counting empty cells - NULLs)
Column_id: Column ID registered in table
Table name
Some columns with big data types are excluded from the report because they cannot be considered for index key column.
NOTE: Full Script here: USP_Selectivity.sql
If you will use it, you get a nice picture of your columns in the table. It could looks like this:
Ok, I hope this procedure will help you. Now I must go because my girlfriend is getting really angry on me . She didn't appreciated my work at the evening. If you will have any questions, comments or you will find any bugs don't hesitate and write a comment or send an email.