Btrees have a root node, intermediate nodes, and leaf nodes. Nodes are interchangeably called pages.
SQL Server default behaviour allows each data and index page to be filled 100% full.
This strategy save space and makes reads faster because all the data is contiguous.
Page splits are an expensive operation that injects free space into a table as new data is inserted. Page splits move about half of the rows of the current data or index page into a whole new page so that there will be enough room for the new row.
SQL Server allows you to minimize the performance hit of page splits by specifying a fill factor for the clustered index of your table.
Fill factor is a gap placed on each data page when a clustered index is created.
The gap is a percentile value describing how much data should be on the data page. Thus, a value of 100% means that the data page will be 100% full of data, while a value of 80% means that 20% of the data page will be free for new inserts.
N=p/100(n-1)+1
We can even ensure free space in the intermediate and index pages of a table by using the pad index option of the CREATE INDEX statement or via the SQL Enterprise Manager GUI
Fill factor is only established when the index is created. SQL Server doesn’t do anything to maintain the fill factor. Therefore,
Solution
Good idea to schedule rebuilds on indexes on a regular basis where fill factors need to be maintained.
Clustered index
A clustered index defines the actual physical order of the rows written to disk.
- New rows may be inserted throughout the table and added to partially used pages.
- A clustered index on a long key takes no extra space for the leaf level (composite keys).
- A query using a clustered index is much more likely to take advantage of read-ahead.
Clustered indexes are good for queries
- GROUP BY that use all or the first few columns of the clustered index key.
- ORDER BY that use all or the first few columns of the clustered index key.
- WHERE clause conditions comparing to the first or the first few columns of the clustered index key and retrieving many rows.
It is important to have a clustered index on every table. They help to manage database space more efficiently and enable the use of fill factor
Tables without a clustered index, called a heap, have numerous problems
Example
If you have two important queries on the same table using different keys in the ORDER BY clause, you have to choose which one is more critical as your clustered index. You can create a clustered index to optimize one report and a non-clustered index for another. However, it is likely that the query answered by the non-clustered index will not see as much performance boost as that answered by the clustered index. In such conflict situations you have to consider carefully which query is more important to optimize first.
Either based on long columns or composite keys comprised of many columns
Non-clustered indexes
A non clustered index serve as quick look-ups to the actual physical records, much like the index of a book or magazine
All new rows are added to the bottom of the table (the last data page of the table).
A non-clustered index on a long key may be quite large, because it takes a lot of space to store keys of the leaf level of the index (composite keys).
A non-clustered index it usually turns out that requested rows are spread around (Bcoz it’s disordered) different pages and it may take additional I/O operation.
Most non-clustered indexes have keys narrower than the underlying table.
Select count(*) from table_name
More vs. Fewer Indexes
Realize that more indexes are not always better. Every new index on a table may improve some of the SELECT statements, but may also slow down other INSERT, UPDATE, and DELETE transactions on the same table.
This is because SQL Server has to update every index key for each affected row in the transaction. Simply put, the more indexes, the faster the SELECT statements. The more indexes, the greater the tax on INSERT, UPDATE, and DELETE transactions.
Decision support applications and reporting databases typically require more indexes per table in order to satisfy the most frequently run queries.
They usually put up with slower modification operations because those are limited to during non-peak hour batch jobs.
Short vs. Long Index Keys
“one column in the index” or “two or more columns in the index”?
Indexes with short keys, especially those with a single column, are generally more efficient than those with long keys.
It is a matter of simple I/O, because more keys fit on an 8K page thus reducing the total number of I/O’s needed to answer a request.
Very often you have a choice of adding an extra column to the key that doesn’t make a big difference in index selectivity.
Covering Indexes
A non-clustered index that includes (or covers) all columns used in a specific query is called a covering index.
The values of the covering index are stored on their own leaf nodes.
When SQL Server requests data using the covering index, the query can be resolved simply by scanning the index without having to scan the actual table and its data pages.
This can result in dramatically fewer I/Os for the query, especially on large tables.
SELECT fname, lname
FROM employee
WHERE lname LIKE ‘%ttlieb%’
AND job_lvl < 200
If a covering index benefits only a single query
It’s takes a tax on every INSERT, UPDATE, or DELETE transaction on the same table, then it may not be a good idea to create it.
Making covering index keys too wide reduces their benefits as well, because an index on a wide key takes more space and may not be much more effective than scanning data pages of the table.
Index Selection
Using the right indexes is the most important aspect of optimization.
- Evaluate filter conditions on each table
- Satisfy join conditions
- Find column values without going to the actual data pages (covering index)
The optimizer may find an index based on the SARG (search arguments) criteria that significantly narrow down the number of table rows qualifying for the result set.
Composite indexes
Composite indexes are composed of several columns of a table.
Composite indexes are used from leftmost column to right
Depending on your WHERE clause conditions, SQL Server may use all or fewer columns of the index, or not use the index at all, as shown below:
| WHERE CLAUSE CONDITIONS | KEY COLUMNS THAT MAY BE USED |
| WHERE a = @aAND b = @bAND c = @cAND d = @d | a, b, c, d |
| WHERE a = @aAND b = @bAND c = @c | a, b, c |
| WHERE a = @aAND b = @bAND d = @d | a, b |
| WHERE a = @aAND c = @cAND d = @d | a |
| WHERE b = @bAND c = @cAND d = @d | Index cannot be used because the WHERE clause does not start with the leftmost column. |
| WHERE b = @bAND a = @a | Again, the index cannot be used, despite both columns being indexed, because the WHERE clause does not analyze the leftmost column first. |
The key point to remember is that you should know the order of columns appearing within a composite index. Once you know the order of the columns, you should always structure your WHERE clause to analyze columns starting with the leftmost column in the composite index and work towards the right.
Join Order
The more tables you join, the more possible join orders SQL Server may come up with the number of possible joins order.
| NUMBER OF JOINED TABLES | NUMBER OF POSSIBLE JOIN ORDERS |
| 1 | 1 |
| 2 | 2 |
| 3 | 6There are six orders for three tables (shown as table 1, 2, and 3): 123, 132, 213, 231, 312, 321. |
| 4 | 24There are 24 ways to join four tables (shown as 1, 2, 3, and 4): 1234, 1243, 1324, 1342, 1423, 1432, 2134, 2143, 2314, 2341, 2413, 2431, 3124, 3142, 3214, 3241, 3412, 3421, 4123, 4132, 4213, 4231, 4312, and 4321. |
| 5 | 120 |
| 6 | 720 |
| 7 | 5,040 |
| 8 | 40,320 |
| 9 | 362,880 |
| 10 | 3,628,800 |
| 11 | 39,916,800 |
| 12 | 479,001,600 |
| 13 | 6,227,020,800 |
| 14 | 87,178,291,200 |
| 15 | 1,307,674,368,000 |
| 16 | 20,922,789,888,000Even if you could evaluate a million different join orders per second, it would still take 242 days to find the best plan for a 16-way join! |
And it is still a recommended best practice that you do not join more than five tables, even with the best hardware and latest editions of the software running in your environment.
We do not exceed 16 tables for join the tables.
Transitive Properties
Query performance is based on the old algebra concept called Transitive Properties.
In this concept, we were taught that if A = B and B = C, then it is also true to say A = C.
SQL Server does not know this. Therefore, you can get a big performance boost in multi-table queries or mutli-join queries by including the extra step of checking for A = C.
SELECT c.customer_idFROM customer_order c,product p ,shipping_registry sWHERE c.product_id = p.product_idAND c.product_id = s.product_id |
The performance on this query was only average. However, adding one more option for comparison gave the query optimizer the choices it needed to rotate a long-running query into a couple seconds query.
SELECT c.customer_idFROM customer_order c,product p,shipping_registry sWHERE c.product_id = p.product_idAND c.product_id = s.product_idAND p.product_id = s.product_id |
Queries that Include OR Statements
As a workaround, you should evaluate the query as a UNION with each SELECT in the UNION using its own highly selective SARGs. This alternative approach will enable you to ensure that at least some portion of the query performs index seeks rather than full table scans.
Queries that Use LIKE and Wildcards
Wildcares in a LIKE clause can be dangerous. There are two simple rules to remember when evaluating how useful a LIKE statement is to the performance of a query:
• LIKE can benefit from indexes if the pattern starts with a character string, such as WHERE lname LIKE ‘w%’
• LIKE cannot use an index if the pattern starts with a wildcard, such as WHERE lname LIKE ‘%alton’
Thus, a query searching for records with a lname value of ‘Walton’ would access the table’s index using the first WHERE clause, but would not access the index using the second WHERE clause.
Queries Using Functions and Calculations in the WHERE Clause
There are a certain number of knowable and convenient situations where an index will be unused, despite referencing it in the WHERE clause of your query, primarily the use of functions and calculations. Learning these simple rules of thumb will enable you to ensure your queries perform at their peak.
First, the use of calculations against a column in the WHERE clause of a query invalidates any index on that column. For example:
SELECT ord_num
FROM sales
WHERE qty * 12 > 10000
This query MUST do a full table scan to deliver the result set, even if there is an index on the sales.qty column.
SELECT ord_num
FROM sales
WHERE qty > 10000/12
The query above performs a seek on the index created on sales.qty!
Second, the use of functions against a column in the WHERE clause of a query invalidates any index on that column. For example:
SELECT ord_num
FROM sales
WHERE ISNULL(ord_date, ‘Jan 01,2001’) > ‘Jan 01, 2002 12:00:00 AM’
This query MUST do a full table scan to deliver the result set, even if there is an index on the sales.ord_date column.
SELECT ord_num
FROM sales
WHERE ord_date IS NOT NULL
AND ord_date > ‘Jan 01, 2002 12:00:00 AM’
The query above performs a seek on the index created on sales.ord_date!
INDEX TUNING WIZARD
The most valuable information we can provide for the wizard is a day of normal database activity.
The wizard then analyzes the workload and recommends an index configuration that will improve the performance of our database.
The wizard can even return recommendations including important advanced considerations like disk space constraints.

