
Improve chunk delete performance #7484

Merged

Conversation

tangledbytes
Member

Explain the changes

This PR adds yet another index to the objectparts relation in the hope of improving performance.

I also tried doing the filtering on the NodeJS side but did not see much performance improvement (and I think performance might worsen if there are too many objects).
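For reference, a minimal sketch of what the added index might look like, assuming a compound expression index whose shape is inferred from the index name in the plan below (the authoritative DDL is in the PR diff):

```sql
-- Sketch only: expressions inferred from the index name
-- idx_btree_objectparts_chunk_1_deleted_1 and the query plan below.
CREATE INDEX idx_btree_objectparts_chunk_1_deleted_1
    ON objectparts ((data->>'chunk'), (data->'deleted'));
```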

NOW

nbcore=# EXPLAIN ANALYZE SELECT jsonb_build_object('chunk', data->'chunk') as data FROM objectparts WHERE ((data->>'chunk'='644c59f6549898000d99a03f' and data ? 'chunk') and (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb));
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_btree_objectparts_chunk_1_deleted_1 on objectparts  (cost=0.12..8.15 rows=1 width=32) (actual time=0.012..0.013 rows=0 loops=1)
   Index Cond: ((data ->> 'chunk'::text) = '644c59f6549898000d99a03f'::text)
 Planning Time: 0.336 ms
 Execution Time: 0.059 ms
(4 rows)

PREVIOUSLY

nbcore=# EXPLAIN ANALYZE SELECT jsonb_build_object('chunk', data->'chunk') as data FROM objectparts WHERE ((data->>'chunk'='644c59f6549898000d99a03f' and data ? 'chunk') and (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb));
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_btree_objectparts_chunk on objectparts  (cost=0.56..4.40 rows=1 width=32) (actual time=0.206..2099.544 rows=696211 loops=1)
   Index Cond: ((data ->> 'chunk'::text) = '644c59f6549898000d99a03f'::text)
   Filter: (((data -> 'deleted'::text) IS NULL) OR ((data -> 'deleted'::text) = 'null'::jsonb))
   Rows Removed by Filter: 52422
 Planning Time: 0.287 ms
 Execution Time: 2150.945 ms
(6 rows)

Issues: Fixed #xxx / Gap #xxx

Please see this issue for more details: https://bugzilla.redhat.com/show_bug.cgi?id=2225008

  • Doc added/updated
  • Tests added

@dannyzaken
Member

Hi @tangledbytes, sorry for the delay :)

Can you specify which md_stores functions should be affected by this? I see many functions where we query chunks using the $in operator. How do these queries perform?

@tangledbytes
Member Author

TBH, I don't remember at all why I even created this PR :).

I will get back to this as soon as I can.

@tangledbytes tangledbytes force-pushed the utkarsh/fix/delete-performance branch from 170cd1a to 8b79eb1 Compare January 16, 2024 12:11
@tangledbytes
Member Author

@dannyzaken I went back to look at the work I did here and tested it again (against the rebased version). The reason I added this index was to improve the performance of the queries that search for object parts with deleted = null. As you noted, these queries are pretty common; however, PG will use this index only for queries that don't also filter on another attribute with a smaller index attached to it.

I also compared the write performance of NooBaa with and without the changes and did not notice any substantial difference (it should be noted that I am only testing against a DB with ~5K object parts and roughly 20G of data uploaded).

cc: @guymguym

@tangledbytes tangledbytes force-pushed the utkarsh/fix/delete-performance branch from 8b79eb1 to 547343a Compare January 18, 2024 10:01
Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>
@tangledbytes tangledbytes force-pushed the utkarsh/fix/delete-performance branch from 547343a to 2235975 Compare January 29, 2024 05:37
@tangledbytes tangledbytes merged commit 074052f into noobaa:master Jan 29, 2024
10 checks passed
@guymguym
Member

This case seems to be the same data uploaded many times, causing many object parts to reference the same chunk. When many objects like these are deleted by the user, we are left with many deleted parts referencing the same chunk, and then all the queries that try to filter only non-deleted parts (in the read/delete object flows) will suffer from having to filter out a lot of those parts (in md_store these are find_parts_by_chunk_ids(), find_parts_unreferenced_chunk_ids(), find_parts_chunks_references(), load_parts_objects_for_chunks()).

Instead of adding a new index for this, we could modify the existing index and add a partial filter, but we still have one last query in db_cleaner that we use without the deleted filter - has_any_parts_for_chunk().
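A minimal sketch of that partial-index alternative, assuming it would replace the single-expression chunk index shown in the "PREVIOUSLY" plan (the index name here is hypothetical; the WHERE predicate mirrors the deleted filter used in the queries above):

```sql
-- Sketch of the partial-index alternative (not what this PR merged):
-- index only the non-deleted parts, so the deleted filter is baked
-- into the index itself instead of being applied per row.
CREATE INDEX idx_btree_objectparts_chunk_live
    ON objectparts ((data->>'chunk'))
    WHERE data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb;
```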

The question is whether we can change the db_cleaner to allow removal of the chunk even if deleted parts still exist - which I think is non-trivial, but maintaining an additional index on our largest table is also non-trivial.

@tangledbytes @dannyzaken @jackyalbo - what do you think about changing the db_cleaner check so that it only considers non-deleted parts here -

https://github.com/tangledbytes/noobaa-core/blob/a6c16bbbf0ba98859c21427db170a4aae140b139/src/server/bg_services/db_cleaner.js#L71-L73
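To make the proposal concrete, here is a hedged sketch of the query shape such a check might issue. The function name has_any_parts_for_chunk() is from md_store, but the SQL below is an assumed illustration of the proposed behavior, not the actual implementation:

```sql
-- Hypothetical shape of an existence check that ignores deleted parts;
-- the real check goes through md_store's has_any_parts_for_chunk().
SELECT 1
FROM objectparts
WHERE data->>'chunk' = '644c59f6549898000d99a03f'  -- chunk id under consideration
  AND (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb)
LIMIT 1;
```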
