07 May 2017

a case of fixing SQL deadlocks without any code/schema changes

I work on high volume message processing app which is built on .NET, C#, MS SQL server and IBM WebSphere MQ. The application process messages to the tune of 200 messages a second. During LnP testing, there were deadlocks happening in few scenarios and we fixed all of them.

Recently there was a need to load data from the same MS SQL database tables, which are being updated from the HIGH VOLUME message flow 24X7. The new data load tool should run daily to pick the changes happened over last business day based on the message was last modified and send it another interfacing system.

The new tool developed to be a .NET Console app and scheduled to run everyday using Windows Task Scheduler.

The tool ran for some days and we were monitoring logs. Unfortunately, the deadlocks issues were back again at the forefront. When the tool ran, the deadlocks were seen only with new tool DA layer logic but not with the original application which updates the same tables. This is because its easy to ROLLBACK a SELECT only SQL TRANSACTION for SQL SERVER than to rolling back an Insert/Update/Delete SQL TRANSACTION. So the SELECT only queries used by the new tool were chosen as DEADLOCK VICTIM all the time.

Transaction (Process ID) was deadlocked on lock resources with another process and has been chosen as the deadlock victim

With the above introduction (actually a bit long), let's get start to justify the blog post title.

After analyzing the logs which the tool produced we learned below:-

  • Deadlocks were happening while loading the records which were being updated by the main process app at the same time.
  • Analyzing the SELECT only queries which are used to load data in batches, we found no easy ways to optimize the queries to prevent deadlocks from happening. The queries were already optimized. No low hanging fruits.
  • The tool was running to select changes which were happened over 24 hours basing its logic on the record last modified time.
  • After doing analysis of record's last modified date-time we found that:-
    1. Most the records were heavily updated in first hour they got created.
    2. Almost all the records were being modified only during the first 5-7 hours from their created time. This was an important clue to fix the dead locks issue. we were facing.
  • So instead of selecting the records modified in the last 24 hours, we selected new 24 hours date time range such that whose MAX date time is 8 hours less than current date time. In this new date range, there were only few records were getting updated. Post this change in selection date time range, the SELECT only queries were getting started to succeeded ALL THE TIME without being a victim of deadlocks.

Deadlock issues were resolved without any code or database schema changes. These Complex deadlocks issues were fixed by employing altogether a different approach to look at the problem.

so, THINK OUTSIDE THE BOX!!!

References

stackoverflow QNA on cause-of-a-process-being-a-deadlock-victim

MSDN article on Detecting and Ending Deadlocks