Feeds:
Posts
Comments

Index

Btrees have a root node, intermediate nodes, and leaf nodes. Nodes are interchangeably called pages.

SQL Server default behaviour allows each data and index page to be filled 100% full.

This strategy save space and makes reads faster because all the data is contiguous.

Page splits are an expensive operation that injects free space into a table as new data is inserted. Page splits move about half of the rows of the current data or index page into a whole new page so that there will be enough room for the new row.

 SQL Server allows you to minimize the performance hit of page splits by specifying a fill factor for the clustered index of your table.

 Fill factor is a gap placed on each data page when a clustered index is created.

The gap is a percentile value describing how much data should be on the data page. Thus, a value of 100% means that the data page will be 100% full of data, while a value of 80% means that 20% of the data page will be free for new inserts.

N=p/100(n-1)+1

We can even ensure free space in the intermediate and index pages of a table by using the pad index option of the CREATE INDEX statement or via the SQL Enterprise Manager GUI

Fill factor is only established when the index is created. SQL Server doesn’t do anything to maintain the fill factor. Therefore,

Solution

Good idea to schedule rebuilds on indexes on a regular basis where fill factors need to be maintained.

 Clustered index

 A clustered index defines the actual physical order of the rows written to disk.

  • New rows may be inserted throughout the table and added to partially used pages.
  • A clustered index on a long key takes no extra space for the leaf level (composite keys).
  • A query using a clustered index is much more likely to take advantage of read-ahead.

Clustered indexes are good for queries

 

  • GROUP BY that use all or the first few columns of the clustered index key.
  • ORDER BY that use all or the first few columns of the clustered index key.
  • WHERE clause conditions comparing to the first or the first few columns of the clustered index key and retrieving many rows.

 It is important to have a clustered index on every table. They help to manage database space more efficiently and enable the use of fill factor

 Tables without a clustered index, called a heap, have numerous problems

 Example

If you have two important queries on the same table using different keys in the ORDER BY clause, you have to choose which one is more critical as your clustered index. You can create a clustered index to optimize one report and a non-clustered index for another. However, it is likely that the query answered by the non-clustered index will not see as much performance boost as that answered by the clustered index. In such conflict situations you have to consider carefully which query is more important to optimize first.

 Either based on long columns or composite keys comprised of many columns

 

Non-clustered indexes

A non clustered index serve as quick look-ups to the actual physical records, much like the index of a book or magazine

 All new rows are added to the bottom of the table (the last data page of the table).

 A non-clustered index on a long key may be quite large, because it takes a lot of space to store keys of the leaf level of the index (composite keys).

 A non-clustered index it usually turns out that requested rows are spread around (Bcoz it’s disordered) different pages and it may take additional I/O operation.

 Most non-clustered indexes have keys narrower than the underlying table.

 Select count(*) from table_name

 More vs. Fewer Indexes

 Realize that more indexes are not always better. Every new index on a table may improve some of the SELECT statements, but may also slow down other INSERT, UPDATE, and DELETE transactions on the same table.

This is because SQL Server has to update every index key for each affected row in the transaction. Simply put, the more indexes, the faster the SELECT statements. The more indexes, the greater the tax on INSERT, UPDATE, and DELETE transactions.

 Decision support applications and reporting databases typically require more indexes per table in order to satisfy the most frequently run queries.

 They usually put up with slower modification operations because those are limited to during non-peak hour batch jobs.

 Short vs. Long Index Keys

“one column in the index” or “two or more columns in the index”?

Indexes with short keys, especially those with a single column, are generally more efficient than those with long keys.

 It is a matter of simple I/O, because more keys fit on an 8K page thus reducing the total number of I/O’s needed to answer a request.

 Very often you have a choice of adding an extra column to the key that doesn’t make a big difference in index selectivity.

 Covering Indexes

 A non-clustered index that includes (or covers) all columns used in a specific query is called a covering index.

 The values of the covering index are stored on their own leaf nodes.

 When SQL Server requests data using the covering index, the query can be resolved simply by scanning the index without having to scan the actual table and its data pages.

 This can result in dramatically fewer I/Os for the query, especially on large tables.

 SELECT fname, lname

FROM employee

WHERE lname LIKE ‘%ttlieb%’

AND job_lvl < 200

 If a covering index benefits only a single query

 It’s takes a tax on every INSERT, UPDATE, or DELETE transaction on the same table, then it may not be a good idea to create it.

 Making covering index keys too wide reduces their benefits as well, because an index on a wide key takes more space and may not be much more effective than scanning data pages of the table.

 Index Selection

 Using the right indexes is the most important aspect of optimization.

  • Evaluate filter conditions on each table
  • Satisfy join conditions
  • Find column values without going to the actual data pages (covering index)

The optimizer may find an index based on the SARG (search arguments) criteria that significantly narrow down the number of table rows qualifying for the result set.

 Composite indexes

 Composite indexes are composed of several columns of a table.

 Composite indexes are used from leftmost column to right

 Depending on your WHERE clause conditions, SQL Server may use all or fewer columns of the index, or not use the index at all, as shown below:

 

WHERE CLAUSE CONDITIONS KEY COLUMNS THAT MAY BE USED
WHERE a = @aAND b = @bAND c = @cAND d = @d a, b, c, d
WHERE a = @aAND b = @bAND c = @c a, b, c
WHERE a = @aAND b = @bAND d = @d a, b
WHERE a = @aAND c = @cAND d = @d a
WHERE b = @bAND c = @cAND d = @d Index cannot be used because the WHERE clause does not start with the leftmost column.
WHERE b = @bAND a = @a Again, the index cannot be used, despite both columns being indexed, because the WHERE clause does not analyze the leftmost column first.

 The key point to remember is that you should know the order of columns appearing within a composite index. Once you know the order of the columns, you should always structure your WHERE clause to analyze columns starting with the leftmost column in the composite index and work towards the right.

Join Order

The more tables you join, the more possible join orders SQL Server may come up with the number of possible joins order.

 

NUMBER OF JOINED TABLES NUMBER OF POSSIBLE JOIN ORDERS
1 1
2 2
3 6There are six orders for three tables (shown as table 1, 2, and 3): 123, 132, 213, 231, 312, 321.
4 24There are 24 ways to join four tables (shown as 1, 2, 3, and 4): 1234, 1243, 1324, 1342, 1423, 1432, 2134, 2143, 2314, 2341, 2413, 2431, 3124, 3142, 3214, 3241, 3412, 3421, 4123, 4132, 4213, 4231, 4312, and 4321.
5 120
6 720
7 5,040
8 40,320
9 362,880
10 3,628,800
11 39,916,800
12 479,001,600
13 6,227,020,800
14 87,178,291,200
15 1,307,674,368,000
16 20,922,789,888,000Even if you could evaluate a million different join orders per second, it would still take 242 days to find the best plan for a 16-way join!

 

 

 

  And it is still a recommended best practice that you do not join more than five tables, even with the best hardware and latest editions of the software running in your environment.

 We do not exceed 16 tables for join the tables.

 Transitive Properties

 Query performance is based on the old algebra concept called Transitive Properties.

 In this concept, we were taught that if A = B and B = C, then it is also true to say A = C.

 SQL Server does not know this. Therefore, you can get a big performance boost in multi-table queries or mutli-join queries by including the extra step of checking for A = C.

 

SELECT c.customer_id

FROM   customer_order c,

       product p       ,

       shipping_registry s

WHERE  c.product_id = p.product_id

   AND c.product_id = s.product_id

 The performance on this query was only average. However, adding one more option for comparison gave the query optimizer the choices it needed to rotate a long-running query into a couple seconds query.

 

SELECT c.customer_id

FROM customer_order c,

product p,

shipping_registry s

WHERE c.product_id = p.product_id

AND c.product_id = s.product_id

AND p.product_id = s.product_id

 

Queries that Include OR Statements

As a workaround, you should evaluate the query as a UNION with each SELECT in the UNION using its own highly selective SARGs. This alternative approach will enable you to ensure that at least some portion of the query performs index seeks rather than full table scans.

 

Queries that Use LIKE and Wildcards

Wildcares in a LIKE clause can be dangerous. There are two simple rules to remember when evaluating how useful a LIKE statement is to the performance of a query:

• LIKE can benefit from indexes if the pattern starts with a character string, such as WHERE lname LIKE ‘w%’

• LIKE cannot use an index if the pattern starts with a wildcard, such as WHERE lname LIKE ‘%alton’

Thus, a query searching for records with a lname value of ‘Walton’ would access the table’s index using the first WHERE clause, but would not access the index using the second WHERE clause.

  Queries Using Functions and Calculations in the WHERE Clause

 There are a certain number of knowable and convenient situations where an index will be unused, despite referencing it in the WHERE clause of your query, primarily the use of functions and calculations. Learning these simple rules of thumb will enable you to ensure your queries perform at their peak.

 First, the use of calculations against a column in the WHERE clause of a query invalidates any index on that column. For example:

 SELECT ord_num

FROM sales

WHERE qty * 12 > 10000

 This query MUST do a full table scan to deliver the result set, even if there is an index on the sales.qty column.

 SELECT ord_num

FROM sales

WHERE qty > 10000/12

 

 The query above performs a seek on the index created on sales.qty!

Second, the use of functions against a column in the WHERE clause of a query invalidates any index on that column. For example:

 SELECT ord_num

FROM sales

WHERE ISNULL(ord_date, ‘Jan 01,2001’) > ‘Jan 01, 2002 12:00:00 AM’

 This query MUST do a full table scan to deliver the result set, even if there is an index on the sales.ord_date column.

 

SELECT ord_num

FROM sales

WHERE ord_date IS NOT NULL

AND ord_date > ‘Jan 01, 2002 12:00:00 AM’

 The query above performs a seek on the index created on sales.ord_date!

 INDEX TUNING WIZARD

 The most valuable information we can provide for the wizard is a day of normal database activity.

 The wizard then analyzes the workload and recommends an index configuration that will improve the performance of our database.

 The wizard can even return recommendations including important advanced considerations like disk space constraints.

From the point of view of database programming, a database LUW is an inseparable sequence of database operations that ends with a database commit. The database LUW is either fully executed by the database system or not at all. Once a database LUW has been successfully executed, the database will be in a consistent state. If an error occurs within a database LUW, all of the database changes since the beginning of the database LUW are reversed. This leaves the database in the state it was in before the transaction started.

 

BC400_03_image001

The database changes that occur within a database LUW are not actually written to the database until after the database commit. Until this happens, you can use a database rollback to reverse the changes.

Transaction Log Architecture

The transaction log in a database maps over one or more physical files. Theoretically, the log file is a string of log records. Physically, the sequence of log records is stored efficiently in the set of physical files that implement the transaction log.

The SQL Server Database Engine divides each physical log file internally into a number of virtual log files. Virtual log files have no fixed size, and there is no fixed number of virtual log files for a physical log file.

Physical file into a number of virtual log file.

The Database Engine chooses the size of the virtual log files dynamically while it is creating or extending log files. The Database Engine tries to maintain a small number of virtual files.

The size of the virtual files after a log file has been extended is the sum of the size of the existing log and the size of the new file increment. The size or number of virtual log files cannot be configured or set by administrators.

The only time virtual log files affect system performance is if the log files are defined by small size and growth_increment values.

 If these log files grow to a large size because of many small increments, they will have lots of virtual log files. This can slow down database startup and also log backup and restore operations.

 Recommendation….. 

We have to assign log files a size value close to the final size required, and also have a relatively large growth_increment value.

The transaction log is a wrap-around file. For example, consider a database with one physical log file divided into four virtual log files. When the database is created, the logical log file begins at the start of the physical log file. New log records are added at the end of the logical log and expand toward the end of the physical log.

Log truncation frees any virtual logs whose records all appear in front of the minimum recovery log sequence number (MinLSN). The MinLSN is the log sequence number of the oldest log record that is required for a successful database-wide rollback.

 ms179355_cfa19116-585f-466f-b632-9f502ae1a0d7(en-us,SQL_100)

This cycle repeats endlessly, as long as the end of the logical log never reaches the beginning of the logical log. If the old log records are truncated frequently enough to always leave sufficient room for all the new log records created through the next checkpoint, the log never fills.

If the end of the logical log does reach the start of the logical log, one of two things occurs:

If the FILEGROWTH setting is enabled for the log and space is available on the disk, the file is extended by the amount specified in growth_increment and the new log records are added to the extension.

If the FILEGROWTH setting is not enabled, or the disk that is holding the log file has less free space than the amount specified in growth_increment, an 9002 error is generated.

If the log contains multiple physical log files, the logical log will move through all the physical log files before it wraps back to the start of the first physical log file.

 

DELETE Statement

 

The DELETE statement is used to delete rows in a table.

 
Syntax
DELETE FROM < TARGET_TABLENAME >
WHERE COLUMN_NAME = VALUES

 

FROM an optional keyword that can be used between the DELETE keyword and the target table name. WHERE Specifies the conditions used to limit the number of rows that are deleted.

 The DELETE statement may fail if it violates a trigger or attempts to remove a row referenced by data in another table with a FOREIGN KEY constraint.

 If the DELETE removes multiple rows, and any one of the removed rows violates a trigger or constraint, the statement is cancelled, an error is returned, and no rows are removed.

 When a DELETE statement encounters an arithmetic error (overflow, divide by zero, or a domain error) occurring during expression evaluation, SQL Server handles these errors as if SET ARITHABORT is ON. The remainder of the batch is cancelled, and an error message is returned.

 DELETE physically removes rows one at a time and records each deleted row in the transaction log (Transaction log will examine my next post).

 DELETE TABLE makes the space occupied by the deleted rows available for the storage of new data.

 

DELETING ROWS FROM A HEAP

 Before going to discuss this topic we need to know what is heap.

 What is heap?

In SQL Server data is stored within tables. Data within a table is grouped together into allocation unites based on their column data types, what it means is one kind of data types are stored together in allocation unites. Data within this allocation unit is stored in pages. Each pages are of size 8KB. Group of 8 pages is stored together and they are referred as Extent. Pages within a table store the data rows with structure which helps to search/locate data faster. If the data of table is not logically sorted, in other word there is no order of data specified in table it is called as Heap Structure.

 Let us come to the point

 When rows are deleted from a heap the Database Engine may use row or page locking for the operation.

 As a result, the pages made empty by the delete operation remain allocated to the heap. When empty pages are not deallocated, the associated space cannot be reused by other objects in the database.

 Specify the TABLOCK hint in the DELETE statement. Using the TABLOCK hint causes the delete operation to take a shared lock on the table instead of a row or page lock. This allows the pages to be deallocated.

 Use TRUNCATE TABLE if all rows are to be deleted from the table.

 Create a clustered index on the heap before deleting the rows. You can drop the clustered index after the rows are deleted. This method is more time consuming than the previous methods and uses more temporary resources.

 

USING AN INSTEAD OF TRIGGER ON DELETE ACTIONS

 When an INSTEAD OF trigger is defined on DELETE actions against a table or view, the trigger executes instead of the DELETE statement. Earlier versions of SQL Server only support AFTER triggers on DELETE and other data modification statements. The FROM clause cannot be specified in a DELETE statement that references, either directly or indirectly, a view with an INSTEAD OF trigger defined on it.

 PERMISSIONS

DELETE permissions are required on the target table. SELECT permissions are also required if the statement contains a WHERE clause.

 Examples

Using DELETE on the current row of a cursor

The following example deletes a single row from the #temp table using a cursor named complex cursor. The delete operation affects only the single row currently fetched from the cursor

 Code

 CREATE TABLE #TEMP

(

    ID INT IDENTITY(1,1),

    CITY      VARCHAR(30)

)

 INSERT INTO #TEMP

VALUES  (‘INDIA’),   (‘ORANGE’),   (‘UK’),

         (‘USA’),           (‘AUS’),      (‘ROME’),

         (‘DENMARK’),       (‘GERMAN’)

 

DECLARE COMPLEX_CURSOR CURSOR FOR

          SELECT ID

         FROM #TEMP

             OPEN COMPLEX_CURSOR;

    FETCH FROM COMPLEX_CURSOR;

               DELETE FROM #TEMP

                       WHERE CURRENT OF COMPLEX_CURSOR;

     CLOSE COMPLEX_CURSOR;

    DEALLOCATE COMPLEX_CURSOR

   SELECT * FROM  #TEMP

 

DROP TABLE #TEMP

   

Using DELETE based on a sub query and using the Transact-SQL extension

DELETE FROM #TEMP

WHERE ID IN

    (SELECT ID

     FROM #TEMP

     WHERE ID > 11);

 Using DELETE with the TOP clause

 The following example deletes 2.5 percent of the rows (total rows)

 DELETE TOP (2.5) PERCENT

FROM #TEMP

Using DELETE with the OUTPUT clause

 The following example shows how to save the results of a DELETE statement into a table variable.

 DELETE #TEMP

    OUTPUT DELETED.*

Using OUTPUT with from_table_name in a DELETE statement

 The following example deletes rows in the #temp table. The OUTPUT clause returns columns from the table being deleted and inserted into variable table.

 DECLARE @MyTableVar table (

    ID int NOT NULL,

    CITY NVARCHAR(50)NOT NULL

    );

 

DELETE #temp

OUTPUT DELETED.ID,

       DELETED.CIty

    INTO @MyTableVar

FROM #temp

 SELECT *

FROM @MyTableVar

GO

 

Merge Statement

SQL SERVER 2008 launches the MERGE statement which will allow user to achieve insert, update and delete operations in a single statement.


Introduction of MERGE Statement

MERGE is a new feature that provides an efficient way to perform multiple DML operations. In earlier version we had to write separate statements to INSERT, UPDATE, DELETE data based on certain conditions, but now, using MERGE statement. One of the most main advantages of MERGE statement is all the data is read and processed only once.

How MERGE Statement works

The MERGE statement internally works as an individual insert, update and delete statement within a single Merge statement. You need to specify the SOURCE and the TARGET table or query which should be joined together. Within the MERGE statement you also need to specify the type of the data modification that needs to be performed when the records between the source and target are matched and what actions needs to be performed when they are not matched.

Below are the three different matched clauses in MERGE

· WHEN MATCHED THEN

o Rows that meet the criteria

· WHEN [TARGET] NOT MATCHED THEN

o Rows that do not match with another row in the target table

· WHEN SOURCE NOT MATCHED THEN

o Rows that do not match with another row in the source table

Notes

At least one of the three MATCHED clauses must be specified, but they can be specified in any order. Any insert, update, or delete actions specified on the target table by the MERGE statement are limited by any constraints defined on it, including any cascading referential integrity constraints.

If IGNORE_DUP_KEY is set to ON for any unique indexes on the target table, MERGE ignores this setting. The MERGE statement requires a semicolon (;). MERGE is a fully reserved keyword when the database compatibility level is set to 100.

Trigger Implementation

For every insert, update, or delete action specified in the MERGE statement, SQL Server fires any related AFTER triggers defined on the target table, but does not guarantee on which action to fire the triggers first or last.

If there is any INSTEAD OF UPDATE or INSTEAD OF DELETE triggers defined on target_table, the update or delete operations are not performed. Instead, the triggers fire and the inserted and deleted tables are populated accordingly.

If there is any INSTEAD OF INSERT triggers defined on target_table, the insert operation is not performed. Instead, the triggers fire and the inserted table is populated accordingly.

Example

In our example we consider our blog members detail and meets the below three conditions.

· Delete the records whose post more than 15000.

· Update post and add 1000 points to each for bonus.

· Insert the record does not exists.

CREATE TABLE Sqlteam_Memebers
(
MEMBER_ID    INTEGER PRIMARY KEY,
MEMBER_NAME VARCHAR(15)
)
GO

INSERT INTO Sqlteam_Memebers
VALUES(1,'RAJDAKSHA')
INSERT INTO Sqlteam_Memebers
VALUES(2,'PESO')
INSERT INTO Sqlteam_Memebers
VALUES(3,'KHTAN')
INSERT INTO Sqlteam_Memebers
VALUES(4,'WATERDUCK')
INSERT INTO Sqlteam_Memebers
VALUES(5,'RICKD')
INSERT INTO Sqlteam_Memebers
VALUES(6,'BKLR')
INSERT INTO Sqlteam_Memebers
VALUES(7,'MADHI')
GO 

SELECT * FROM SQLTEAM_MEMEBERS
SELECT * FROM MEMEBERSTOTALPOSTS
GO

CREATE TABLE MemebersTotalPosts
(
MEMBER_ID        INTEGER REFERENCES SQLTEAM_MEMEBERS,
TOTALPOSTS        INTEGER
)
GO

INSERT INTO MemebersTotalPosts
VALUES(1,330)
INSERT INTO MemebersTotalPosts
VALUES(2,25000)
INSERT INTO MemebersTotalPosts
VALUES(3,11682)
INSERT INTO MemebersTotalPosts
VALUES(4,568)
GO

 

 

MERGE MemebersTotalPosts AS MTP
USING
             (SELECT MEMBER_ID,
                     MEMBER_NAME
             FROM    Sqlteam_Memebers
             ) AS STM
ON MTP.MEMBER_ID = STM.MEMBER_ID
WHEN MATCHED
         AND MTP.TOTALPOSTS > 15000 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET              MTP.TOTALPOSTS = MTP.TOTALPOSTS + 1000
WHEN NOT MATCHED THEN
INSERT
       (
              MEMBER_ID,
              TOTALPOSTS
       )
       VALUES
       (
              STM.MEMBER_ID,
              100
       );