Recently we are assigned a task to improve the performance of a couple of SQL queries that run for 15 seconds and 30 seconds. At the beginning we were not so sure whether the performance can be improved anymore, given that there were about 80 million data in the database. But I have to admit that SQL Server is really powerful as eventually we are able to reduce the time down to 2 seconds and 6 seconds!
During the process, I have had a few interesting findings about SQL Server and learned a lot about troubleshooting SQL Server performance issues. Here I want to share a few tips on optimizing SQL statements I learned from the experience.
check Index first
We sometimes get ahead of ourselves and jump directly into analyzing SQL queries, but the answer to the performance issue may just lie on the indexing itself, so always check whether necessary indexes are presented. A good way to find out what indexes you need to is to use Dynamic Management Views to monitor system health and come up necessary indexes. With appropriate indexes in place, performance increase will be substantial.
Clustered vs. Non-clustered index
There are tons of articles about Clustered and Non-clustered index and their benefits. Knowing the difference between clustered and non-clustered index helps troubleshooting performance issue for sure. Here is a nice article explaining Clustered and Non-clustered indexes.
Here I would like to share a metaphor I like which helped me understand Clustered and Non-clustered index – Clustered index is like the page numbers of a book where the logic order of the pages is the same as the physical order of them in the book, e.g. page 1 is at the first and page 100 is the last page for a 100 pages book. On the other hand, Non-clustered index is like the Index page of a book where the logic order of the terms on the index page is not the same as their actual locations in the book. For instance, the word Apple is listed at the beginning of the index page but it may firstly appear on the last page of the book.
Analyze, trial and error
To really find out why a certain query is running slow, more information is needed.
VIEW Execution plan and set Statistics io
Execution plan can be quite helpful to figure out which parts of a complex query are taking time and how SQL server optimized the query. Though it does not provide a solution directly, it does provide some hints or directions on troubleshooting performance issues. SQL Management Studio or even Visual Studio now provide the ability to view execution plan. A good tool I find quite useful is SQL Sentry Plan Explorer which displays the execution plan diagram with much more information, for example, it displays the number of records processed above each line, which is very convenient.
Another quite handy tool, Statistics IO, displays statistics information for your query. Set Statistics IO ON and information such as the number of scans will be displayed.
Break nested queries
However, execution plan may not be very straightforward sometimes, especially for a complex query with many nested sub-queries. The large query will be optimized by SQL Server which makes the execution plan harder to understand. Instead of trying to figure out what SQL Server is doing, a faster way for me is to just break the large query into smaller ones and test the performance of each of them. It is possible that SQL Server is not picking the best route to run the query and the ‘optimized’ query is actually running much slower, which is exactly what happened in my case. My query should only takes 2 seconds to run whereas the ‘optimized’ query by SQL Server takes 15 seconds. Why? Optimizing query also takes time and SQL Server may just not find the best optimization in time.
If SQL Server is selecting a slower way to execute the query, there are a couple of workaround:
- Use OPTION (FORCE ORDER) at the end of the query to force the order of join to be the same as it is in the query statement.
- Break nested sub-queries into temporary tables. This way you can force SQL Server to materialize sub-queries.
Trial and Error
Another quick and easy approach is just try different ways of writing the same query. Swap IN with JOIN, change INNER JOIN to LEFT JOIN to EXIST, or change Non-Unique Non-clustered index to Clustered index. Some changes just take a few seconds to make and they may just work. Once it works, you can research a bit on why or may just choose to forget about it if not in the mood. I happen to know someone (me) who is always not in the mood 🙂