Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further optimise CategoryService #6931

Open
wants to merge 7 commits into
base: develop
Choose a base branch
from

Conversation

rickardvh
Copy link
Contributor

CategoryService could benefit further from the cached lookup by parent category ID. By saving the full categories instead of just IDs, we can get rid of many unnecessary database queries, especially when displaying search results. Furthermore, the cache getter in GetCategoryBreadCrumbAsync could be sped up significantly by using a HashSet instead of a List for keeping track of processed categories.

Best regards,

Rickard von Haugwitz
Majako majako.se

@skoshelev
Copy link
Contributor

Hi @rickardvh.
I like the changes you suggested, I don’t see how they reduce the number of queries to the database, but even without this, these changes are very good.
But now, looking at the code you simplified, I think how much we need to cache the same data set twice, only in different collections. I'm talking about the NopCatalogDefaults.CategoriesAllCacheKey and NopCatalogDefaults.ChildCategoryLookupCacheKey keys.
I'm not sure which operation will be more expensive than ToGroupedDictionary or SelectMany, but it seems to me that we need to leave only one set of data in the cache, the only question is which one.

@rickardvh
Copy link
Contributor Author

Hi @rickardvh. I like the changes you suggested, I don’t see how they reduce the number of queries to the database, but even without this, these changes are very good. But now, looking at the code you simplified, I think how much we need to cache the same data set twice, only in different collections. I'm talking about the NopCatalogDefaults.CategoriesAllCacheKey and NopCatalogDefaults.ChildCategoryLookupCacheKey keys. I'm not sure which operation will be more expensive than ToGroupedDictionary or SelectMany, but it seems to me that we need to leave only one set of data in the cache, the only question is which one.

@skoshelev I'll have to get back to you next week when I have more time to look at this, but about the two similar cached collections, I had the same thought. We could always SelectMany over the GroupedDictionary's values, that's very fast. Constructing the GroupedDictionary is fast, but involves populating lists and some minor sorting, so it's better to do that only once.

As for why this would reduce db calls, if allCategories was not given in GetCategoryBreadCrumbAsync, the loop would go fetch each parent from the cache/database. Since we already had all categories cached, it would be easier to just build the dictionary we needed using those. Speaking of that, though, is there any particular reason why allCategories should be supplied by the caller? Does it need to be different from the cached "all categories", or is it a leftover from earlier days when it wasn't cached and the caller might already have all categories in memory?

@skoshelev
Copy link
Contributor

Hi @rickardvh.
I also looked at the old implementation, and other than as a relic of the EF, I cannot justify this; most likely it made sense before, but not now.

As for double caching of the same data in different collections, here we probably need to analyze in what form this data is requested more often, and based on this, leave only one key in the database, and retrieve the second collection from it in memory.

@rickardvh
Copy link
Contributor Author

@skoshelev could you please enlighten me as to where and why the category list needs to be sorted "for tree representation"? Since we are already storing the categories as a lookup by parent ID (i.e. almost explicitly as a tree), and with these changes here with children presorted, that list could be very cheaply generated on the fly from the lookup. Right now we're creating a new lookup and passing it into SortCategoriesForTree, but I don't think the time we save on filtering in the database is usually worth it. Since the category tree is a tree, we can prune entire subtrees if a parent node should be excluded from the tree, so we don't need to check the filter conditions for all nodes like we do in the db query. Unless, of course, we want to include all categories, or the tree is flat. Either way, filtering is an $O(n)$ operation, so it's not worse than the .ToList we already have in GetAllCategoriesAsync.

@skoshelev
Copy link
Contributor

Hi @rickardvh. I apologize for the long wait for an answer. I've looked and tried to analyze the GetAllCategories calls and I don't see any need for the SortCategoriesForTree method anymore. It would also be nice to find all the places like this https://github.com/nopSolutions/nopCommerce/blob/develop/src/Presentation/Nop.Web/Factories/CatalogModelFactory.cs#L832 and replace them with direct calls to the new method

@rickardvh
Copy link
Contributor Author

Hi @rickardvh. I apologize for the long wait for an answer. I've looked and tried to analyze the GetAllCategories calls and I don't see any need for the SortCategoriesForTree method anymore. It would also be nice to find all the places like this https://github.com/nopSolutions/nopCommerce/blob/develop/src/Presentation/Nop.Web/Factories/CatalogModelFactory.cs#L832 and replace them with direct calls to the new method

No worries, I've been knocked out for a whole week by a bad flu, myself. Thank you, I had been looking at that line and others like it, and actually replaced them like you suggested in our own repo already the other week. I think I can migrate those changes to this branch one of the next few days.

Overall, the model factories (CatalogModelFactory and, in particular, ProductModelFactory) are huge (or is that tiny?) bottlenecks, but I am at a bit of a loss as to what to do about them. Some kind of clever caching is needed, I suppose, but I'm not sure exactly what or where for maximum effect while not using up all memory. If you have any suggestions, I'd be happy to have a look.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants