Performance improvements: Index on ClaimMutationID #362

Open
wants to merge 1 commit into base: develop

Conversation

@mngoe (Contributor) commented Jan 28, 2025

Querying MutationLog by ClientMutationID is one of the most common requests in the database.

On a fresh install this kind of request is not really challenging, but with hundreds of users and millions of mutations in the table without an index, the database really struggles.

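For context, a minimal sketch of the kind of model change this implies in Django, assuming the field is called client_mutation_id on the MutationLog model (model, field, and column definitions here are illustrative, not copied from this PR's diff):

```python
# Illustrative only: add an index on client_mutation_id so that looking up a
# MutationLog by ClientMutationID no longer scans the whole table.
# Model/field names are assumptions, not the exact ones from this PR.
from django.db import models


class MutationLog(models.Model):
    # db_index=True makes Django create an index on this column
    # in the next migration.
    client_mutation_id = models.CharField(
        max_length=255, blank=True, null=True, db_index=True
    )
    json_content = models.TextField()
    request_date_time = models.DateTimeField(auto_now_add=True)
```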
@delcroip (Member)

What would you think about this kind of approach:
https://www.perplexity.ai/search/best-django-type-for-uuid-in-m-dLREnfSOSlusSqx10PnDvw#1 (@sunilparajuli mentioned that the UUID stored as text was not so fast on MSSQL)

@delcroip (Member)

@mngoe I am surprised to see no migration, did you run makemigrations?
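For reference, if the index is added with db_index=True (or Meta.indexes), makemigrations would normally emit something along these lines; this is an illustrative sketch, not the migration from this branch:

```python
# Hypothetical auto-generated migration for the client_mutation_id index.
# The app label and dependency are placeholders.
from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ("core", "0001_initial"),  # placeholder dependency
    ]

    operations = [
        migrations.AlterField(
            model_name="mutationlog",
            name="client_mutation_id",
            field=models.CharField(
                blank=True, db_index=True, max_length=255, null=True
            ),
        ),
    ]
```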

@sunilparajuli (Member)

@delcroip @mngoe

The difference is very significant with large volumes of data. We hit this issue at the Social Security Fund Nepal, with a database of almost 1 TB: around ~300 GB of textual data and ~600 GB of attachments (attachments were configured to be stored in the database, which was a bad choice made previously), with the mutation log table contributing around ~40 GB.

Major problem faced:
A mutation log entry is generated almost every time a claim is created/updated, etc., and due to the very large table size, the ordering query (the recent mutations we see in the right panel) slowed down. When each thread (worker) starts to take longer, it also requires more RAM, since the work is being held open by MSSQL.

Diagnosis of the performance issue

  1. The UUID is not human-readable.

  2. It is not naturally sortable by creation time. Although the UUIDv1 format includes a timestamp, it encodes it in a little-endian layout, where the least significant portion of the time appears first. This makes it difficult to sort UUIDs by creation time (see the sketch after this list).

  3. For databases like MySQL and Oracle, which use a clustered primary key, both UUIDv1 and UUIDv4 will hurt insertion performance if used as the primary key, because the rows must be reordered to place the newly inserted row at the right position inside the clustered index.
    -> On the other hand, PostgreSQL uses a heap instead of a clustered primary key, so using a UUID as the primary key won't impact PostgreSQL's insertion performance (I would still say no to a UUID PK on a large dataset).
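To make point 2 concrete: with Python's uuid module you have to reassemble the timestamp from the time field to order UUIDv1 values by creation time, because the textual form starts with the least significant 32 bits (time_low). A small illustrative snippet:

```python
import uuid

# Generate a few UUIDv1 values. Each one embeds a 60-bit timestamp
# (100 ns intervals since 1582-10-15), but the string form starts with
# time_low, the least significant 32 bits of that timestamp.
ids = [uuid.uuid1() for _ in range(3)]

for u in ids:
    # u.time reassembles time_hi, time_mid and time_low into the full
    # timestamp, which is what you would need to sort on for creation order.
    print(u, u.time)

# Sorting the string form orders by time_low first, so it is not a reliable
# creation-time ordering (it breaks whenever time_low wraps around).
print(sorted(str(u) for u in ids))
```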

Analysis

[screenshot of the query analysis attached in the original comment]

Deep analysis

Why is an integer primary key faster?

  • Smaller storage size – an INT (4 bytes) or BIGINT (8 bytes) requires much less space than a UUID (16 bytes).
  • Better indexing – smaller keys result in smaller indexes, making lookups, joins, and updates faster.
  • Sequential inserts – an auto-incrementing INT (IDENTITY) ensures sequential inserts, reducing index fragmentation.
  • Efficient joins – foreign keys using INT are more efficient than those using UUID.

Why can a UUID be slower?

  • Random inserts – UUIDs are randomly generated, causing page splits and fragmentation in indexes.
  • Larger index size – more storage overhead leads to slower lookups.
  • Sorting issues – unlike IDENTITY, UUIDs are not sequential, making ordered queries slower.

Since we are not using any kind of distributed setup, using an auto-increment field as the PK is the better choice.
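To put the same argument in Django terms, here is a hedged sketch contrasting the two primary-key choices (model and field names are made up for the example, not taken from the openIMIS models):

```python
# Illustrative comparison of the two primary-key strategies discussed above.
import uuid

from django.db import models


class MutationLogIntPk(models.Model):
    # 8-byte, monotonically increasing PK: new rows always land at the end of
    # a clustered index (MSSQL/MySQL), so inserts cause no page splits.
    id = models.BigAutoField(primary_key=True)
    # The UUID the API needs can live in an ordinary indexed column instead.
    client_mutation_id = models.CharField(max_length=36, db_index=True)


class MutationLogUuidPk(models.Model):
    # 16-byte, randomly ordered PK: on engines with a clustered primary key,
    # inserts land at random positions and fragment the index as the table grows.
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
```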

Mitigation strategy (a sketch of the rename step follows below):

  • We had to create a new table for a fresh start (because deleting from the claim mutation log & core mutation log tables seemed to take an eternity, it could not be done in real time and the service needs to be up almost 24/7).
  • We named the new table "mutation table v2" (something like that) and removed data slowly from the original mutation log table (this took several repeated tasks).
  • We then renamed it back to the original table name (though it was not strictly needed, we still did it). We still need to fix this by using an integer PK.

@weilu, though it seems like a small area, this was the performance issue I was referring to yesterday.
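For illustration, the rename step of that swap can be scripted from Django with raw SQL; this is only a sketch under assumed table names, and sp_rename is MSSQL-specific:

```python
# Hypothetical sketch of the table swap: writes go to a freshly created, empty
# mutation-log table, and the tables are renamed once the old one is drained.
# Table names are placeholders, not the real openIMIS table names.
from django.db import connection


def swap_mutation_log_tables():
    with connection.cursor() as cursor:
        # Keep the old data around under a different name for later archiving.
        cursor.execute("EXEC sp_rename 'core_Mutation_Log', 'core_Mutation_Log_old';")
        # Promote the new, empty table to the original name.
        cursor.execute("EXEC sp_rename 'core_Mutation_Log_v2', 'core_Mutation_Log';")
```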

@sunilparajuli (Member)

> What would you think about this kind of approach https://www.perplexity.ai/search/best-django-type-for-uuid-in-m-dLREnfSOSlusSqx10PnDvw#1 (@sunilparajuli mentioned that the UUID stored as text was not so fast on MSSQL)

Thanks, I will check on this one. Also, whatever we end up doing, the data type should support indexing.

@delcroip (Member)

@sunilparajuli thanks a lot for that feedback, we had already reached the conclusion that MSSQL was not ideal but this adds up; why can't we clear/archive the mutation log with a running window of 7 days?

@sunilparajuli (Member)

> @sunilparajuli thanks a lot for that feedback, we had already reached the conclusion that MSSQL was not ideal but this adds up; why can't we clear/archive the mutation log with a running window of 7 days?

I just meant that it was taking a lot of time ("an eternity" as a figure of speech); clearing the data could take many hours, and the risk was maintaining uptime (the mutation log deletion process runs in parallel with mutation log insertions from active claim creation).

When we perform a DELETE operation in a database like MSSQL, the system needs to lock the rows, pages, or even the entire table to ensure that no other operations interfere with the deletion process. Even if you try to limit the locks to individual rows (using something like ROWLOCK), deleting a large amount of data can cause the locks to escalate to a table-level lock. This means the entire table gets locked, which can block other operations like reads or writes.

If the table is being used actively, for example in a 24/7 service where data is constantly being added or queried, this locking creates contention: multiple operations trying to access the same resource (in this case, the table) at the same time. Because of this, the deletion process becomes much slower, since it has to compete with other operations for access to the table. So deleting a large amount of data from an actively used table can take a very long time.
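One common way around this, which would also implement the 7-day running window suggested above, is to delete in small batches so that no single statement holds enough locks to escalate to a table lock. A hedged sketch, assuming the model is core.models.MutationLog with a request_date_time column (import path, field names and batch size are illustrative):

```python
# Illustrative batched cleanup: purge MutationLog rows older than 7 days in
# small chunks so MSSQL row/page locks never escalate to a table-level lock.
# Import path, field names and batch size are assumptions for the example.
from datetime import timedelta

from django.utils import timezone

from core.models import MutationLog  # assumed import path

BATCH_SIZE = 5000


def purge_old_mutation_logs():
    cutoff = timezone.now() - timedelta(days=7)
    while True:
        # Slice first so each DELETE statement touches a bounded number of rows.
        ids = list(
            MutationLog.objects.filter(request_date_time__lt=cutoff)
            .values_list("id", flat=True)[:BATCH_SIZE]
        )
        if not ids:
            break
        MutationLog.objects.filter(id__in=ids).delete()
```

Each iteration issues its own small delete, so concurrent readers and writers only ever wait on short, bounded locks instead of a long table-level lock.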

There could be other factors affecting the operation's efficiency, such as how many resources Maxime has allocated in the infrastructure.
