Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Get product stats using the Merchant Center report query #2229

Merged

Conversation

jorgemd24
Copy link
Contributor

@jorgemd24 jorgemd24 commented Jan 30, 2024

Changes proposed in this Pull Request:

Part of #2146

This PR explores a new method to gather product stats independently of the MC API ID. The past two approaches are:

  1. The first approach (GLA v1.0.0) involved using product status to list all the products in the merchant center. However, this required fetching all the products in the MC, with a maximum of 250 results per request. For larger customers, this meant making multiple requests and fetching products that might not even be synced through the GLA.

  2. The current approach involves using the MC API IDs created when submitting the products, stored in the metadata with the key _wc_gla_google_ids. Instead of fetching all the products, this approach fetches the products using the MC API ID, making it more efficient than the first approach. However, it may return multiple 404 errors if the product has been deleted in the MC and the GLA extension is not aware of it.

  3. The new approach: As we're migrating to the API approach, we can't continue using the _wc_gla_google_ids metadata since we won't be submitting the product. Instead, Google will create the product for us, so we won't know the MC API ID. However, we do know the offerID that the product will have, which by default (or at least as it is now) is the WC product ID. We'll use the offerIDs from the "syncable" products to query the Merchant Center Reports, which allows us to retrieve a lot more results (the default page size is 1k) in one request and be independent of the MC API ID.

Detailed test instructions:

It is a bit difficult to test the results as you will need products in MC with different statuses. I have been able to test the "Pending", "Not Synced", and "Disapproved" and for the "Expiring" I changed the code to simulate an expiring product.

  1. Go to the Product Feed - Overview
  2. Check the stats and compare the numbers with the "develop" branch.

Additional details:

  • The plan is to remove the gla_ prefix for the offerID for newly synced products to the MC.
  • If the WC product has _wc_gla_google_ids, Google will need to keep the ID and the offerID saved in the metadata (confirmation on how this will work is still needed).
  • I've introduced the new static function get_offer_id as a replacement for get_google_product_offer_id. Since we're not using the slug, we could simply remove it from the slug argument in get_google_product_offer_id, but I'm unsure if this change might affect other plugins using this function for other purposes. Maybe we could mark it as deprecated.
  • In the next PR, I'll include tests for this approach.
  • It is still pending to test the query with a thousand products to see the performance.

More context:

Changelog entry

@jorgemd24
Copy link
Contributor Author

jorgemd24 commented Feb 6, 2024

Hi @mikkamp thanks for your comments.

I made some adjustments to the PR. I realized that we can still use most of the existing functions to calculate and update the metadata for the products. Now what we are doing is converting the aggregated status to the MC status, so we can reuse the functions that were already used.

I've moved the Merchant Report Query request to the AS job. This way, it can run in the background if the transient is not set. We might want to discuss what's more convenient: scheduling the job when we make a request to mc/product-statistics and the transient is not set, or leaving an endpoint for the client to schedule the job when needed similar to what's done in SyncableProductsCountController.

Additionally, I think that while the job is running, the client could display something like "Loading Stats" or a similar message. Currently, it just shows N/A if the transient is not set.

I also bumped up my PHP memory limit to 128MB and set the limit to 1000 in the report query. It seemed to work well with more than 2k products, but I'm interested to hear about your results.

I have not added/updated the MCStatus tests as I would like to define the direction first.

Copy link
Contributor

@mikkamp mikkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactoring, I'm able to run it now without reaching out of memory issues.

I left some comments as I went through the code, but I think it's important to consider batching the call per page (1000 rows at a time). For now I think it's ok to stick with it using the WP_Post objects and bulk meta updates to prevent it from loading the full WC_Product objects (which consume more memory). But I personally think it's a bit of a maintenance burden to not work with WC_Product objects/CRUD classes. If anyone (or WC) decides to work with a different storage system for products our code will fail and we need to adjust to remain compatible. We knew that when we implemented this code, but we opted for speed over compatibility since we needed it to complete in one request.

Since we are moving it to async I think we'd be missing an opportunity if we don't take full advantage of batching the async process. Which means in the future we can:

  • Swap out the code to work with CRUD classes
  • Use a filter to control the batch size of a 1000 (a simple snippet can fix a site that is reaching it's memory limits)

The filter for the batch size is actually not a bad idea to add now already.

Comment on lines 87 to 94
do {
$response = $this->merchant_report->get_product_view_report( $next_page_token );

$this->merchant_statuses->process_product_report( $response->getResults() );

$next_page_token = $response->getNextPageToken();

} while ( $next_page_token );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this loop is fully taking advantage of it being an async process. Yes we moved the job to a background process, but this loop means it will still process all the results in a single scheduled task. So it's not distributing the resource load.

Since after the refactor we revert back to loading WP_Post objects and updating meta in bulk we don't run into resource issues, but in theory even those resource limits can be reached for extreme sites.

What do you think about removing this loop and instead calling $this->schedule with the next_page_token as an argument (and keep doing so until there are no more next_page_tokens? We might need to refactor $this->merchant_statuses->update_product_stats to support a partial update and then when there are no more next_page_tokens left we set some status that everything is completed and calculate the final totals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have the same feeling and I tried that approach too. The reason I hesitated to proceed was that when there are too many pending jobs, completing the full job would take even longer. It's not just one job that needs to be completed, but multiple. I'm open to exploring this in a different PR. Also, as we discussed, we could check the AS queue health to determine the best approach here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good plan. Can we log all those details in a followup issue, so we can keep track of all things we want to improve eventually?

src/MerchantCenter/MerchantStatuses.php Show resolved Hide resolved
src/MerchantCenter/MerchantStatuses.php Outdated Show resolved Hide resolved
src/MerchantCenter/ShoppingContentDateTrait.php Outdated Show resolved Hide resolved
src/MerchantCenter/ShoppingContentDateTrait.php Outdated Show resolved Hide resolved
src/MerchantCenter/MerchantStatuses.php Outdated Show resolved Hide resolved
src/API/Google/MerchantReport.php Outdated Show resolved Hide resolved
@jorgemd24
Copy link
Contributor Author

Thanks @mikkamp for your comments!

For now I think it's ok to stick with it using the WP_Post objects and bulk meta updates to prevent it from loading the full WC_Product objects (which consume more memory). But I personally think it's a bit of a maintenance burden to not work with WC_Product objects/CRUD classes. If anyone (or WC) decides to work with a different storage system for products our code will fail and we need to adjust to remain compatible.

Yes, that makes sense. Now that we have a clearer strategy for fetching the product stats, we can start looking into those improvements.

The filter for the batch size is actually not a bad idea to add now already.

I added a filter here: 1cf0178

Also, I addressed your suggestions and replied to your questions! Do you mind to have another look?

Thanks!

Copy link
Contributor

@mikkamp mikkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jorgemd24 for all the changes. It looks pretty much good to go, the only part I think we should still settle on is how the unsynced count is handled. The rest of the comments I left are just minor things I noticed.

<?php
declare( strict_types=1 );

namespace Automattic\WooCommerce\GoogleListingsAndAds\MerchantCenter;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this trait as well and make it part of Automattic\WooCommerce\GoogleListingsAndAds\API\Google, since the classes in MerchantCenter only have to deal with a DateTime object.

Comment on lines 122 to 124
* @param string $status The status of the product.
*
* @return array The MC status.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just make a small clarification that this is "The aggregated status of the product".
Also I think the return status was intended to be a string here.

* Get ProductView Query response.
*
* @param string|null $next_page_token The next page token.
* @return array Array of products along with their statuses.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what the best description is here, but it's also including the next_page token.

Comment on lines 87 to 89
$rows = $response->getResults();

foreach ( $rows as $row ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but we might as well shorten this to one line since $rows isn't referenced anywhere else.

return $product_view_data;
} catch ( GoogleException $e ) {
do_action( 'woocommerce_gla_mc_client_exception', $e, __METHOD__ );
throw new Exception( __( 'Unable to retrieve Product View Report.', 'google-listings-and-ads' ), $e->getCode() );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering, since this is going to be recorded as a failure in a Scheduled Action scheduler job, do you think it will be useful to include both $e->getMessage and $e->getCode? Otherwise they have to look up the matching timestamp to see more details in the log.

Suggested change
throw new Exception( __( 'Unable to retrieve Product View Report.', 'google-listings-and-ads' ), $e->getCode() );
throw new Exception( __( 'Unable to retrieve Product View Report: ', 'google-listings-and-ads' ) . $e->getMessage(), $e->getCode() );

Or format the error with sprintf if we want to allow for greater translation/RTL support.

Comment on lines 646 to 647
$this->update_product_mc_statuses();
$this->update_mc_statuses();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming of these two functions makes me have to look up what each one does. Since they are just internal functions would a more descriptive name help here?

Suggested change
$this->update_product_mc_statuses();
$this->update_mc_statuses();
$this->update_products_meta_with_mc_status();
$this->update_mc_status_statistics();

'value' => SyncStatus::SYNCED,
'key' => ProductMetaHandler::KEY_MC_STATUS,
'compare' => '=',
'value' => MCStatus::NOT_SYNCED,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we are getting the right amount of "unsynced" products here.

Firstly it seems that KEY_SYNC_STATUS is no longer used after this PR, should we remove it?

Secondly based on PR #1728 it seems we are expecting to fetch "syncable but unsynced" products. In update_product_mc_statuses we set the MC status to NOT_SYNCED for all products that aren't found in MC. But we only have the meta values for all products that previously had a MC status meta key.

What happens if we have existing products, never synced, and then we install GLA. At what point do they get the initial MC status set?

Previously that was handled by triggering the job UpdateAllProducts which would walk through all the "syncable" products and then mark them as "pending" for their initial status. But if we don't trigger that job then it seems like we'd get 0 for unsynced products after initial install as they don't get a MC status set until something in a product is updated and we trigger the syncer hooks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah looking closer at the details I see update_product_mc_statuses works with both a combination of:

  • $all_product_ids all products
  • $current_product_statuses all products which have a MC status previously set

So that means the MC status will be set for all products, but the meta ProductMetaHandler::KEY_VISIBILITY isn't going to be set for those products. Getting syncable products is expensive that's why we have a background job to get the count. Do we need to combine that somehow instead of querying here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmm yes, when I checked the code I saw that MCStatus::NOT_SYNCED was set when the product was not found in the MC but I didn't realize about ProductMetaHandler::KEY_VISIBILITY.

Getting syncable products is expensive that's why we have a background job to get the count. Do we need to combine that somehow instead of querying here?

Do you mean fetching the count of syncable products from OptionsInterface::SYNCABLE_PRODUCTS_COUNT and then performing a simple calculation like not_synced = OptionsInterface::SYNCABLE_PRODUCTS_COUNT -synced_products? However, I think this would need updating the OptionsInterface::SYNCABLE_PRODUCTS_COUNT more frequently, as it's currently set only during user onboarding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly it seems that KEY_SYNC_STATUS is no longer used after this PR, should we remove it?

I see that the key is used here:

$exclude_meta[] = $this->prefix_meta_key( ProductMetaHandler::KEY_SYNC_STATUS );

So if we plan to release this earlier, before the pulling method I guess it could affect when duplicating a product.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we plan to release this earlier, before the pulling method I guess it could affect when duplicating a product.

Makes sense, I guess we should leave it to remain backwards compatible. Probably good to leave a note somewhere why we are keeping it around.

Copy link
Contributor

@mikkamp mikkamp Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean fetching the count of syncable products from OptionsInterface::SYNCABLE_PRODUCTS_COUNT and then performing a simple calculation like not_synced = OptionsInterface::SYNCABLE_PRODUCTS_COUNT -synced_products? However, I think this would need updating the OptionsInterface::SYNCABLE_PRODUCTS_COUNT more frequently, as it's currently set only during user onboarding.

That's what I was thinking only on the first round, and then hoping that eventually we have meta updated so we can do a better calculation quicker. But I actually don't have a good solution there as it seems messy.

I'm currently wondering though, why do we make it difficult on ourselves to have this show "syncable but unsynced". Does the merchant even know that we are counting it like this? What do you think the effect would be if we would put just all unsynced products in there? I don't think syncable is a very clear state to the merchant as technically a product they would normally sync but is currently out of stock is temporarily classified as unsyncable until it comes back in stock. If we have it show just all unsynced products it would be much easier to explain what the count is there. At most we might need to clarify in the docs. We now have this in the docs:

Not synced. These products do not appear across Google or the Google Shopping tab as free listings. They are queued for submission or may be ineligible or excluded from the product feed.

That actually sounds like it's describing "all unsynced" anyways, so might as well match that.

So in simple terms that means querying for MCStatus::NOT_SYNCED and removing the ProductMetaHandler::KEY_VISIBILITY condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm currently wondering though, why do we make it difficult on ourselves to have this show "syncable but unsynced". Does the merchant even know that we are counting it like this? What do you think the effect would be if we would put just all unsynced products in there?

In my opinion, from the merchant's perspective and without checking the documentation, I would say that "Not synced" means all products that are not in the Merchant Center regardless of their visibility.

In the follow-up issue that I am creating, I will add the item to update the documentation.

@mikkamp
Copy link
Contributor

mikkamp commented Feb 8, 2024

Also forgot to mention, but as we discussed in slack, should we have this PR target a separate feature branch? Or is it easier to have it all in one and pull out the changes we want to release later?

@jorgemd24
Copy link
Contributor Author

Also forgot to mention, but as we discussed in slack, should we have this PR target a separate feature branch? Or is it easier to have it all in one and pull out the changes we want to release later?

I've been thinking about this because the current branch already contains the changes made in notifications, etc., since I merged feature/google-api-project ( fb619db) into it. So, I'm considering either reverting that commit and targeting the branch to a new one based off "develop," or creating a fresh branch and cherry-picking these commits. Future PRs related to this section could then reference the new branch. WDYT?

@mikkamp
Copy link
Contributor

mikkamp commented Feb 8, 2024

So, I'm considering either reverting that commit and targeting the branch to a new one based off "develop," or creating a fresh branch and cherry-picking these commits. Future PRs related to this section could then reference the new branch. WDYT?

Either one would work for me. Reverting the merge commit would keep all the history of this PR around, but either way that would still be retained in the PR comments. Could also keep it simple and close this PR and replace it with a fresh one.

@jorgemd24
Copy link
Contributor Author

@mikkamp, thanks for your suggestions and discussions. I added your suggestions and per our #2229 (comment), I removed the ProductMetaHandler::KEY_VISIBILITY from the non-syncable query.

As I mentioned in the other comment, I will create a follow-up issue that includes all the tasks we discussed during this PR.

Thanks!

Copy link
Contributor

@mikkamp mikkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the refactoring it looks good to go from my side. I just left one question about the offer_id vs. id field, but it still works regardless.

I'll let you figure out from here which PR is going to handle the tests and how you are going to merge it to a branch.

$wc_product_id = $this->product_helper->get_wc_product_id( $product_view->getId() );
$mc_product_status = $this->convert_aggregated_status_to_mc_status( $product_view->getAggregatedDestinationStatus() );

// Skip if the product id does not exist
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal, but technically we've only tried extracting the product ID from the string, we haven't actually checked if it exists.

This makes me wonder though, we are extracting from $product_view->getId(), why is both the offer_id and id included in the query MerchantProductViewReportQuery then? Not a big deal to leave it there, but was just curious. Seems like getOfferId() would return a much shorter string.

…ew-report-query"

This reverts commit fb619db, reversing
changes made to 50e108c.
@jorgemd24 jorgemd24 changed the base branch from feature/google-api-project to feature/refactor-product-stats February 8, 2024 18:13
@jorgemd24 jorgemd24 merged commit a29a04e into feature/refactor-product-stats Feb 8, 2024
@jorgemd24 jorgemd24 deleted the add/product-view-report-query branch February 8, 2024 18:20
@eason9487 eason9487 mentioned this pull request Mar 26, 2024
21 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: add A new feature, function, or functionality was added. type: enhancement The issue is a request for an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants