Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MC Product Statuses: Replace WP functions with WC functions #2257

Conversation

jorgemd24
Copy link
Contributor

@jorgemd24 jorgemd24 commented Feb 12, 2024

Changes proposed in this Pull Request:

Part of #2146 & #2250

In #2255, we've divided the background job for fetching product statuses into multiple tasks to distribute resource usage. Until now, we've been relying on WP_Post and bulk updates via raw queries to handle updating numerous products in a single request. While it sped things up, it could became less reliable as WC might change how product data is stored. For better future-proofing, it's more convenient to use WooCommerce objects to retrieve and update their data.

This PR replaces the bulk queries and the use of get_post with wc_get_products. I've also updated metadata using methods available in their class. The metadata is saved using the save_meta_data method, so this way, we bypass calling the woocommerce_update_product hook and avoid triggering the syncer hooks.

At first, I experimented with batches of 1,000 products. However, in some instances, it was hitting the maximum execution time of 30 seconds. So, I opted to decrease it to 500 products. I did these tests with a memory limit of 128MB.

Detailed test instructions:

  1. Create a bunch of products ( I did the test using more than 3k products)
  2. Reduce the PHP memory limit to 128MB
  3. Delete any existing mc_statuses so you can check that the mc_status is added. You can run a query like this one:
    DELETE FROM wp_postmeta WHERE meta_key = '_wc_gla_mc_status';
  4. Watch your server error log.
  5. Call this endpoint GET /gla/mc/product-statistics/refresh
  6. Go to WC -> Status -> Scheduled actions and see that update_merchant_product_statuses runs.
  7. When all the jobs are completed, you should see the total results in Product View - Overview section.
  8. Check that the product's metadata _wc_gla_mc_status is set.
  9. Check that the syncer hooks have not been called. (No pending jobs trying to sync the product)

Additional details:

  • The UI will be updated in a different PR to indicate that the stats are loading while the jobs are running.
  • With these changes, the methods ProductMetaQueryHelper::batch_update_values and ProductMetaQueryHelper::batch_insert_values are not longer used in our codebase, so they could be deleted however they are public, so not sure if it is better just to deprecate them for now.

Changelog entry

@jorgemd24 jorgemd24 self-assigned this Feb 12, 2024
@github-actions github-actions bot added changelog: add A new feature, function, or functionality was added. type: enhancement The issue is a request for an enhancement. labels Feb 12, 2024
@jorgemd24 jorgemd24 changed the title Add/replace wp posts functions with wc objects MC Product Statuses: Replace WP functions with WC functions Feb 12, 2024
Copy link

codecov bot commented Feb 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (2e26242) 59.0% compared to head (9f3556a) 54.0%.
Report is 5 commits behind head on feature/refactor-product-stats.

Additional details and impacted files

Impacted file tree graph

@@                         Coverage Diff                         @@
##             feature/refactor-product-stats   #2257      +/-   ##
===================================================================
- Coverage                              59.0%   54.0%    -5.0%     
===================================================================
  Files                                   457     293     -164     
  Lines                                 17702    3715   -13987     
  Branches                                  0     532     +532     
===================================================================
- Hits                                  10443    2006    -8437     
+ Misses                                 7259    1258    -6001     
- Partials                                  0     451     +451     
Flag Coverage Δ
js-unit-tests 54.0% <ø> (?)
php-unit-tests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

see 750 files with indirect coverage changes

@mikkamp
Copy link
Contributor

mikkamp commented Feb 13, 2024

I understand this is still a WIP, but just wanted to ask the question early. One of the side effects of switching to CRUD classes is that update hooks will be called if we set/update product meta. Are we going to put something in place to short circuit the syncer hooks (eventually will be notification hooks), to prevent them from being triggered?

Base automatically changed from add/refactor-merchant-product-statuses-job to feature/refactor-product-stats February 15, 2024 09:02
@jorgemd24 jorgemd24 requested a review from a team February 16, 2024 11:37
@jorgemd24 jorgemd24 marked this pull request as ready for review February 16, 2024 11:37
@jorgemd24
Copy link
Contributor Author

I understand this is still a WIP, but just wanted to ask the question early. One of the side effects of switching to CRUD classes is that update hooks will be called if we set/update product meta. Are we going to put something in place to short circuit the syncer hooks (eventually will be notification hooks), to prevent them from being triggered?

Thanks @mikkamp for the heads up! I was just in the middle of implementing something to short-circuit, but then I realized that when you use save_meta_data, it doesn't trigger the woocommerce_update_product hook. So, the syncer hooks aren't triggered either.

Copy link
Contributor

@mikkamp mikkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the changes here, it feels cleaner to be working with CRUD classes. I left some comments as we go through the code, I think we need to do better on how we handle the set of products we are working with to make them accessible.
I also left some comments below, but these aren't fully related to this PR so might be better handled in a followup PR.

Not directly related to this PR but I was getting the following errors in the Action Scheduler: action failed via Async Request: Call to a member function getId() on null
Cause was proxied requests returning invalid data, but my question is how is the API request supposed to handle the failure scenario? In my case it stayed with loading set to true, but had actually completed with a failure.

Again probably not related to this PR, but I was wondering about the function update_intermediate_product_statistics and how it handles variable products. For example if batch 1 has 1 variation (APPROVED) and batch 2 has 1 variation (DISSAPPROVED) for the same variable product. Both statuses will get a +1 count for the same variable product.

@@ -97,7 +97,7 @@ public function get_product_view_report( $next_page_token = null ): array {
continue;
}

$product_view_data['statuses'][] = [
$product_view_data['statuses'][ (int) $wc_product_id ] = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the typecast here necessary? As far as I can see the function get_wc_product_id is required to return an int. And if it is necessary then my next question would be why we aren't typecasting it for all usages of this variable.

Copy link
Contributor Author

@jorgemd24 jorgemd24 Feb 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's correct, there is not need for typecasting. Removed here: 7b9944a


ksort( $new_product_statuses );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can see we only update product meta with this. Why does that need to be done in a particular order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line was left from the previous code:

ksort( $new_product_statuses );

So I removed it here: 21e97e2

Comment on lines 671 to 673
'name' => $wc_product->get_name(),
'visibility' => $wc_product->get_meta( $visibility_meta_key ),
'parent_id' => $wc_product->get_parent_id(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unclear how we are expecting to use this product_data_lookup. The name and visibility are used when we handle product issues. However since we populated the statuses in batches this information isn't going to be pre-loaded when we loop through the product issues.

Would it be more useful to save an instance of the product objects in the class:

		$this->process_product_statuses( $statuses );
		$this->update_products_meta_with_mc_status();
		$this->update_intermediate_product_statistics();

So when we call these three functions they all work with the same product set? I can see the update meta function reloads them from cache but that's not faster then just keeping a previous copy around. Or if we don't want to save a copy in the class, can we have process_product_statuses call the other two functions internally so we don't need to call the three in a row? That way it can also pass the set of product objects by reference so we don't need to store it in the class instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same feeling, the main reason I kept the $product_data_lookup as it was is because I initially intended to populate the product issues using the ProductView Report query. My plan was to reuse most of the code. However, after further investigation of the product issues, I realized that the ProductView doesn't return the details from the description or the documentation. So, I don't believe I'll be able to use the same query for that purpose.

'issue' => $item_level_issue->getDescription(),
'action' => $item_level_issue->getDetail(),
'action_url' => $item_level_issue->getDocumentation(),

One idea to retrieve the product issues could be to store the Google IDs when we retrieve the product status. Then, the product issues section could continue to function similarly to how it's currently implemented.

Or if we don't want to save a copy in the class, can we have process_product_statuses call the other two functions internally so we don't need to call the three in a row? That way it can also pass the set of product objects by reference so we don't need to store it in the class instance.

Thanks for the suggestion! I've updated the code so that now those functions accept an array of products. While I was making these updates, I noticed that parent_id wasn't being fetched previously. So, I've adjusted the code to fetch all the parent IDs at once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One idea to retrieve the product issues could be to store the Google IDs when we retrieve the product status. Then, the product issues section could continue to function similarly to how it's currently implemented.

Even though we can't use the ProductView Report query, we can still use the productstatuses/list endpoint, which supports pagination. That will allow us to pull the products in batches without needing to know a list of product/Google ID's up front. We can just act on whatever products the MC presents us with, and if we can map to a WC ID then we store the list of issues.

Since product issues are stored in the DB locally (until they expire), and we have the product statistics job run in batches of products. We can also consider doing something like this for each batch of UpdateMerchantProductStatuses:

  • Query ProductView report to get a list of products
  • Map products to WC ID's
  • Add up intermediate statuses
  • Fetch product statuses for the same batch of products using productstatuses/custombatch
  • Store issues in local DB

We might still need something to pull in account issues / add local issues to the table. And possibly decrease batch sizes if it takes more resources to pull in statuses and issues for that batch.

I think something like that will take the most advantage of running a background process, otherwise we'd have to create two to loop through the same batch of MC products. Let me know what you think about that approach.

@jorgemd24
Copy link
Contributor Author

Thanks @mikkamp for your comments and suggestions.

Not directly related to this PR but I was getting the following errors in the Action Scheduler: action failed via Async Request: Call to a member function getId() on null
Cause was proxied requests returning invalid data,

Sorry, I've updated the proxy to avoid those errors. e15c368 .

but my question is how is the API request supposed to handle the failure scenario? In my case it stayed with loading set to true, but had actually completed with a failure.

One option could be to store the error in a transient let's say gla_mc_statuses_errors and then return this to the client. This way, the client can try to refresh the statuses, either automatically or by adding a small icon for manual retrying.

Again probably not related to this PR, but I was wondering about the function update_intermediate_product_statistics and how it handles variable products. For example if batch 1 has 1 variation (APPROVED) and batch 2 has 1 variation (DISSAPPROVED) for the same variable product. Both statuses will get a +1 count for the same variable product.

That's a tricky situation, but I think we can resolve it by storing the parent statuses in the intermediate transient. Then, we can compare if the new status has a higher priority than the one saved in the previous batch. WDYT?

@mikkamp, when you have a chance, can you do another round? Thanks!

@mikkamp
Copy link
Contributor

mikkamp commented Feb 20, 2024

One option could be to store the error in a transient let's say gla_mc_statuses_errors and then return this to the client. This way, the client can try to refresh the statuses, either automatically or by adding a small icon for manual retrying.

Either a new transient, or storing the error in the transient we are already saving. So next time it can detect that the transient contains an error and can return a correct response. For UI I think it makes sense to present the user with the error message and a small refresh icon.

That's a tricky situation, but I think we can resolve it by storing the parent statuses in the intermediate transient. Then, we can compare if the new status has a higher priority than the one saved in the previous batch. WDYT?

I agree, I was thinking about something similar, and just looping through the parent statuses at the end.

@@ -617,32 +617,28 @@ protected function refresh_presync_product_issues(): void {
* @see MerchantReport::get_product_view_report
*/
public function process_product_statuses( array $statuses ): void {
$product_repository = $this->container->get( ProductRepository::class );
$products = $product_repository->find_by_ids_as_associative_array( array_column( $statuses, 'product_id' ) );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that the mapping part is moved to the ProductRepository. I was wondering why we didn't already have that and it didn't seem correctly placed handling it in the MerchantStatuses class.

*
* @return WC_Product[] Array of WooCommerce product objects
*/
public function find_by_ids_as_associative_array( array $ids, array $args = [], int $limit = -1, int $offset = 0 ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we force this function to have a return type of array? The function already always returns an array. I just noticed you have the code $products + $parent_products so I wanted to confirm that there is no way that those variables are not arrays.

if ( ! $product ) {
do_action(
'woocommerce_gla_debug_message',
sprintf( 'Merchant Center product %s not found in this WooCommerce store.', $product_id ),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error message seems to implicate that we are outputting the MC ID, but at this time we have already mapped it to a WC ID (potentially with a filter and custom mapping). Should we reword it to something like:

Suggested change
sprintf( 'Merchant Center product %s not found in this WooCommerce store.', $product_id ),
sprintf( 'Merchant Center product with WooCommerce ID %d is not found in this store.', $product_id ),

Comment on lines 671 to 673
'name' => $wc_product->get_name(),
'visibility' => $wc_product->get_meta( $visibility_meta_key ),
'parent_id' => $wc_product->get_parent_id(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One idea to retrieve the product issues could be to store the Google IDs when we retrieve the product status. Then, the product issues section could continue to function similarly to how it's currently implemented.

Even though we can't use the ProductView Report query, we can still use the productstatuses/list endpoint, which supports pagination. That will allow us to pull the products in batches without needing to know a list of product/Google ID's up front. We can just act on whatever products the MC presents us with, and if we can map to a WC ID then we store the list of issues.

Since product issues are stored in the DB locally (until they expire), and we have the product statistics job run in batches of products. We can also consider doing something like this for each batch of UpdateMerchantProductStatuses:

  • Query ProductView report to get a list of products
  • Map products to WC ID's
  • Add up intermediate statuses
  • Fetch product statuses for the same batch of products using productstatuses/custombatch
  • Store issues in local DB

We might still need something to pull in account issues / add local issues to the table. And possibly decrease batch sizes if it takes more resources to pull in statuses and issues for that batch.

I think something like that will take the most advantage of running a background process, otherwise we'd have to create two to loop through the same batch of MC products. Let me know what you think about that approach.

Copy link
Contributor

@mikkamp mikkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, it's working as expected. I left a few minor comments, but it seems to have submitted them early. Since those are minor and the rest is reliant on follow up PR's I'll go ahead and approve this PR.

@jorgemd24 jorgemd24 merged commit 3e9c708 into feature/refactor-product-stats Feb 20, 2024
14 checks passed
@jorgemd24 jorgemd24 deleted the add/replace-wp-posts-functions-with-wc-objects branch February 20, 2024 15:39
@eason9487 eason9487 mentioned this pull request Mar 26, 2024
21 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: add A new feature, function, or functionality was added. type: enhancement The issue is a request for an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants