
Improve chunk delete performance #7484

Merged

Conversation

tangledbytes
Member

Explain the changes

This PR adds yet another index to the objectparts relation in the hope of improving performance.

I also tried doing the filtering on the NodeJS side but did not see much performance improvement (and I think performance might worsen if there are too many objects).
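For reference, a minimal sketch of what the added index might look like, assuming a compound expression index whose shape is inferred from the index name in the plan below (the authoritative DDL is in the PR diff):

```sql
-- Sketch only: expressions inferred from the index name
-- idx_btree_objectparts_chunk_1_deleted_1 and the query plan below.
CREATE INDEX idx_btree_objectparts_chunk_1_deleted_1
    ON objectparts ((data->>'chunk'), (data->'deleted'));
```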

NOW

nbcore=# EXPLAIN ANALYZE SELECT jsonb_build_object('chunk', data->'chunk') as data FROM objectparts WHERE ((data->>'chunk'='644c59f6549898000d99a03f' and data ? 'chunk') and (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb));
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_btree_objectparts_chunk_1_deleted_1 on objectparts  (cost=0.12..8.15 rows=1 width=32) (actual time=0.012..0.013 rows=0 loops=1)
   Index Cond: ((data ->> 'chunk'::text) = '644c59f6549898000d99a03f'::text)
 Planning Time: 0.336 ms
 Execution Time: 0.059 ms
(4 rows)

PREVIOUSLY

nbcore=# EXPLAIN ANALYZE SELECT jsonb_build_object('chunk', data->'chunk') as data FROM objectparts WHERE ((data->>'chunk'='644c59f6549898000d99a03f' and data ? 'chunk') and (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb));
                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_btree_objectparts_chunk on objectparts  (cost=0.56..4.40 rows=1 width=32) (actual time=0.206..2099.544 rows=696211 loops=1)
   Index Cond: ((data ->> 'chunk'::text) = '644c59f6549898000d99a03f'::text)
   Filter: (((data -> 'deleted'::text) IS NULL) OR ((data -> 'deleted'::text) = 'null'::jsonb))
   Rows Removed by Filter: 52422
 Planning Time: 0.287 ms
 Execution Time: 2150.945 ms
(6 rows)

Issues: Fixed #xxx / Gap #xxx

Please see this issue for more details: https://bugzilla.redhat.com/show_bug.cgi?id=2225008

  • Doc added/updated
  • Tests added

@dannyzaken
Member

Hi @tangledbytes, sorry for the delay :)

Can you specify which md_stores functions should be affected by this? I see many functions where we query chunks using the $in operator. How do these queries perform?

@tangledbytes
Member Author

TBH, I don't remember at all why I even created this PR :).

I will get back to this as soon as I can.

@tangledbytes tangledbytes force-pushed the utkarsh/fix/delete-performance branch from 170cd1a to 8b79eb1 Compare January 16, 2024 12:11
@tangledbytes
Member Author

@dannyzaken I went back to look at the work I did here and tested it again (against the rebased version). The reason I added this index was to improve the performance of the queries that search for object parts with deleted = null. As you noted, these queries are pretty common; however, PG will use this index only for queries that don't also filter on another attribute with a smaller index attached to it.

I also compared the write performance of NooBaa with and without the changes and did not notice any substantial difference (it should be noted that I am only testing against a DB with ~5K object parts and roughly 20G of data uploaded).

cc: @guymguym

@tangledbytes tangledbytes force-pushed the utkarsh/fix/delete-performance branch from 8b79eb1 to 547343a Compare January 18, 2024 10:01
Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>
@tangledbytes tangledbytes force-pushed the utkarsh/fix/delete-performance branch from 547343a to 2235975 Compare January 29, 2024 05:37
@tangledbytes tangledbytes merged commit 074052f into noobaa:master Jan 29, 2024
10 checks passed
@guymguym
Member

This case seems to be the same data uploaded many times, causing many object parts to reference the same chunk. When many objects like these are deleted by the user, we are left with many deleted parts referencing the same chunk, and then all the queries that try to filter only non-deleted parts (in the read/delete object flows) will suffer from having to filter out a lot of those parts (in md_store these are find_parts_by_chunk_ids(), find_parts_unreferenced_chunk_ids(), find_parts_chunks_references(), load_parts_objects_for_chunks()).

Instead of adding a new index for this, we could modify the existing index and add a partial filter, but we still have one last query in db_cleaner that we use without the deleted filter - has_any_parts_for_chunk().
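A minimal sketch of that partial-index alternative, assuming it would replace the single-expression chunk index shown in the "PREVIOUSLY" plan (the index name here is hypothetical; the WHERE predicate mirrors the deleted filter used in the queries above):

```sql
-- Sketch of the partial-index alternative (not what this PR merged):
-- index only the non-deleted parts, so the deleted filter is baked
-- into the index itself instead of being applied per row.
CREATE INDEX idx_btree_objectparts_chunk_live
    ON objectparts ((data->>'chunk'))
    WHERE data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb;
```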

The question is whether we can change the db_cleaner to allow removal of the chunk even if deleted parts still exist - which I think is non-trivial, but maintaining an additional index on our largest table is also non-trivial.

@tangledbytes @dannyzaken @jackyalbo - what do you think about changing the db_cleaner check so that it only considers non-deleted parts here -

https://github.com/tangledbytes/noobaa-core/blob/a6c16bbbf0ba98859c21427db170a4aae140b139/src/server/bg_services/db_cleaner.js#L71-L73
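To make the proposal concrete, here is a hedged sketch of the query shape such a check might issue. The function name has_any_parts_for_chunk() is from md_store, but the SQL below is an assumed illustration of the proposed behavior, not the actual implementation:

```sql
-- Hypothetical shape of an existence check that ignores deleted parts;
-- the real check goes through md_store's has_any_parts_for_chunk().
SELECT 1
FROM objectparts
WHERE data->>'chunk' = '644c59f6549898000d99a03f'  -- chunk id under consideration
  AND (data->'deleted' IS NULL OR data->'deleted' = 'null'::jsonb)
LIMIT 1;
```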
