Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table schema primary key #527

Closed
m7yJpNXY opened this issue Aug 21, 2024 · 6 comments
Closed

Table schema primary key #527

m7yJpNXY opened this issue Aug 21, 2024 · 6 comments
Labels
pri1 released Issue has been released

Comments

@m7yJpNXY
Copy link

Some input from a DBA (not me) regarding the table schema (mssql) that could perhaps be of interest.

Recommendation based on table analysis

Based on an analysis of the cardinality of the table columns and index definitions for the table dbo.scheduled_tasks I can see the following from the DBA perspective:

  • The following columns are included as key columns in the PRIMARY KEY (clustered index) of the table:
    task_name
    task_instance
  • Cardinality analysis of these columns shows that:
    task_name - contains only two unique values
    task_instance - contains one unique value per row
  • As the order of key columns matters, where you want columns with the highest cardinality first, the order of the columns in the PRIMARY KEY definition for the table should be changed so that task_instance comes first.
  • If the order of the columns is changed, the database engine can find the correct row in the table faster and with fewer locks when updating specific rows, as happens in the two UPDATE queries that cause deadlocks. Fewer deadlocks and faster access to the right row lead to less risk of deadlocks.
@JonasNjopOlsson
Copy link

The DBA in question would be me so if anything seems unclear regarding the recommendation, please let me know. We are seeing a high frequency of deadlocks in the Microsoft SQL Server environment due to the above issue with the table and index definitions.

@kagkarlsson
Copy link
Owner

kagkarlsson commented Aug 21, 2024

Thank you for posting your observation. The SQL Server deadlock-issue has popped up before, and I think what is needed is this type of effort by actual users of the database. Surprising behavior to me, but not sure on where the actual deadlock occurs since I am more familiar with Postgres, where this is not a problem.

I agree with you that it would be slightly better to reverse the order of the fields in the primary key. Possibly significantly better for Sql Server (depending on how index-locking actually works there)?? Did you try creating an alternative index and see if that affected the number of deadlocks?

@JonasNjopOlsson
Copy link

@kagkarlsson: The problem we are seeing is when several concurrent SQL sessions are trying to update the same table with similar code. Examples below:

-- Session #1
--------------------------------------------------------------------------------------------------
-- Arguments: @P0 bit,@P1 nvarchar(4000),@P2 datetime2,@P3 bit,@P4 nvarchar(4000),@P5 nvarchar(4000),@P6 bigint
UPDATE scheduled_tasks
SET
    picked = @P0
    ,picked_by = @P1
    ,last_heartbeat = @P2
    ,version = version + 1
WHERE
    picked            = @P3
    AND task_name     = @P4
    AND task_instance = @P5
    AND version       = @P6;

-- Session #2
--------------------------------------------------------------------------------------------------
-- Arguments: @P0 bit,@P1 nvarchar(4000),@P2 datetime2,@P3 datetime2,@P4 datetime2,@P5 int,@P6 datetime2,@P7 nvarchar(4000),@P8 nvarchar(4000),@P9 bigint
UPDATE scheduled_tasks
SET
    picked = @P0
    ,picked_by = @P1
    ,last_heartbeat = @P2
    ,last_success = @P3
    ,last_failure = @P4
    ,consecutive_failures = @P5
    ,execution_time = @P6
    ,version = version + 1
WHERE
    task_name         = @P7
    AND task_instance = @P8
    AND version       = @P9;

Sessions which are running the two T-SQL statements above typically deadlock very frequently with each other in our SQL Server environment on the dbo.scheduled_tasks table.

I did not try changing the key column order in the index to alleviate the problem so far, since it is based on the PRIMARY KEY constraint of the table, but based on previous experience with similar issues it seems likely that the recommended approach would lead to much quicker access paths in the associated index to get to the right row for each UPDATE statement (fewer reads, shorter execution time, locks more likely to be taken on the specific row).

@JonasNjopOlsson
Copy link

@kagkarlsson @m7yJpNXY : We have now conducted internal tests using the same application code but with a changed table schema according to our suggestion above. We have not encountered a single deadlock in the database after the change was made and the jobs on the application side seem to be working better now. Our conclusion is that the recommendation works as expected and should be introduced into the main code base.

@kagkarlsson
Copy link
Owner

Nice! Thank you for investigating! You are welcome to submit a PR for the change (or I will get to this issue eventually)

@kagkarlsson
Copy link
Owner

🎉 This issue has been resolved in v15.1.1 (Release Notes)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pri1 released Issue has been released
Projects
None yet
Development

No branches or pull requests

3 participants