-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get product stats using the Merchant Center report query #2229
Get product stats using the Merchant Center report query #2229
Conversation
…ck-for-wpcom-app Add the missing functions required for the WPCOM OAuth flow.
Enable filtering of WC REST API responses based on the gla_syncable query parameter
Hi @mikkamp thanks for your comments. I made some adjustments to the PR. I realized that we can still use most of the existing functions to calculate and update the metadata for the products. Now what we are doing is converting the aggregated status to the MC status, so we can reuse the functions that were already used. I've moved the Merchant Report Query request to the AS job. This way, it can run in the background if the transient is not set. We might want to discuss what's more convenient: scheduling the job when we make a request to Additionally, I think that while the job is running, the client could display something like "Loading Stats" or a similar message. Currently, it just shows N/A if the transient is not set. I also bumped up my PHP memory limit to 128MB and set the limit to 1000 in the report query. It seemed to work well with more than 2k products, but I'm interested to hear about your results. I have not added/updated the MCStatus tests as I would like to define the direction first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the refactoring, I'm able to run it now without reaching out of memory issues.
I left some comments as I went through the code, but I think it's important to consider batching the call per page (1000 rows at a time). For now I think it's ok to stick with it using the WP_Post objects and bulk meta updates to prevent it from loading the full WC_Product objects (which consume more memory). But I personally think it's a bit of a maintenance burden to not work with WC_Product objects/CRUD classes. If anyone (or WC) decides to work with a different storage system for products our code will fail and we need to adjust to remain compatible. We knew that when we implemented this code, but we opted for speed over compatibility since we needed it to complete in one request.
Since we are moving it to async I think we'd be missing an opportunity if we don't take full advantage of batching the async process. Which means in the future we can:
- Swap out the code to work with CRUD classes
- Use a filter to control the batch size of a 1000 (a simple snippet can fix a site that is reaching it's memory limits)
The filter for the batch size is actually not a bad idea to add now already.
src/API/Site/Controllers/MerchantCenter/ProductStatisticsController.php
Outdated
Show resolved
Hide resolved
do { | ||
$response = $this->merchant_report->get_product_view_report( $next_page_token ); | ||
|
||
$this->merchant_statuses->process_product_report( $response->getResults() ); | ||
|
||
$next_page_token = $response->getNextPageToken(); | ||
|
||
} while ( $next_page_token ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this loop is fully taking advantage of it being an async process. Yes we moved the job to a background process, but this loop means it will still process all the results in a single scheduled task. So it's not distributing the resource load.
Since after the refactor we revert back to loading WP_Post objects and updating meta in bulk we don't run into resource issues, but in theory even those resource limits can be reached for extreme sites.
What do you think about removing this loop and instead calling $this->schedule
with the next_page_token
as an argument (and keep doing so until there are no more next_page_token
s? We might need to refactor $this->merchant_statuses->update_product_stats
to support a partial update and then when there are no more next_page_token
s left we set some status that everything is completed and calculate the final totals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have the same feeling and I tried that approach too. The reason I hesitated to proceed was that when there are too many pending jobs, completing the full job would take even longer. It's not just one job that needs to be completed, but multiple. I'm open to exploring this in a different PR. Also, as we discussed, we could check the AS queue health to determine the best approach here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good plan. Can we log all those details in a followup issue, so we can keep track of all things we want to improve eventually?
Thanks @mikkamp for your comments!
Yes, that makes sense. Now that we have a clearer strategy for fetching the product stats, we can start looking into those improvements.
I added a filter here: 1cf0178 Also, I addressed your suggestions and replied to your questions! Do you mind to have another look? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jorgemd24 for all the changes. It looks pretty much good to go, the only part I think we should still settle on is how the unsynced count is handled. The rest of the comments I left are just minor things I noticed.
<?php | ||
declare( strict_types=1 ); | ||
|
||
namespace Automattic\WooCommerce\GoogleListingsAndAds\MerchantCenter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this trait as well and make it part of Automattic\WooCommerce\GoogleListingsAndAds\API\Google
, since the classes in MerchantCenter only have to deal with a DateTime object.
src/API/Google/MerchantReport.php
Outdated
* @param string $status The status of the product. | ||
* | ||
* @return array The MC status. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just make a small clarification that this is "The aggregated status of the product".
Also I think the return status was intended to be a string here.
src/API/Google/MerchantReport.php
Outdated
* Get ProductView Query response. | ||
* | ||
* @param string|null $next_page_token The next page token. | ||
* @return array Array of products along with their statuses. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what the best description is here, but it's also including the next_page token.
src/API/Google/MerchantReport.php
Outdated
$rows = $response->getResults(); | ||
|
||
foreach ( $rows as $row ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but we might as well shorten this to one line since $rows isn't referenced anywhere else.
src/API/Google/MerchantReport.php
Outdated
return $product_view_data; | ||
} catch ( GoogleException $e ) { | ||
do_action( 'woocommerce_gla_mc_client_exception', $e, __METHOD__ ); | ||
throw new Exception( __( 'Unable to retrieve Product View Report.', 'google-listings-and-ads' ), $e->getCode() ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering, since this is going to be recorded as a failure in a Scheduled Action scheduler job, do you think it will be useful to include both $e->getMessage
and $e->getCode
? Otherwise they have to look up the matching timestamp to see more details in the log.
throw new Exception( __( 'Unable to retrieve Product View Report.', 'google-listings-and-ads' ), $e->getCode() ); | |
throw new Exception( __( 'Unable to retrieve Product View Report: ', 'google-listings-and-ads' ) . $e->getMessage(), $e->getCode() ); |
Or format the error with sprintf if we want to allow for greater translation/RTL support.
$this->update_product_mc_statuses(); | ||
$this->update_mc_statuses(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming of these two functions makes me have to look up what each one does. Since they are just internal functions would a more descriptive name help here?
$this->update_product_mc_statuses(); | |
$this->update_mc_statuses(); | |
$this->update_products_meta_with_mc_status(); | |
$this->update_mc_status_statistics(); |
'value' => SyncStatus::SYNCED, | ||
'key' => ProductMetaHandler::KEY_MC_STATUS, | ||
'compare' => '=', | ||
'value' => MCStatus::NOT_SYNCED, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we are getting the right amount of "unsynced" products here.
Firstly it seems that KEY_SYNC_STATUS
is no longer used after this PR, should we remove it?
Secondly based on PR #1728 it seems we are expecting to fetch "syncable but unsynced" products. In update_product_mc_statuses
we set the MC status to NOT_SYNCED
for all products that aren't found in MC. But we only have the meta values for all products that previously had a MC status meta key.
What happens if we have existing products, never synced, and then we install GLA. At what point do they get the initial MC status set?
Previously that was handled by triggering the job UpdateAllProducts which would walk through all the "syncable" products and then mark them as "pending" for their initial status. But if we don't trigger that job then it seems like we'd get 0 for unsynced products after initial install as they don't get a MC status set until something in a product is updated and we trigger the syncer hooks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah looking closer at the details I see update_product_mc_statuses
works with both a combination of:
$all_product_ids
all products$current_product_statuses
all products which have a MC status previously set
So that means the MC status will be set for all products, but the meta ProductMetaHandler::KEY_VISIBILITY
isn't going to be set for those products. Getting syncable products is expensive that's why we have a background job to get the count. Do we need to combine that somehow instead of querying here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mmmm yes, when I checked the code I saw that MCStatus::NOT_SYNCED
was set when the product was not found in the MC but I didn't realize about ProductMetaHandler::KEY_VISIBILITY
.
Getting syncable products is expensive that's why we have a background job to get the count. Do we need to combine that somehow instead of querying here?
Do you mean fetching the count of syncable products from OptionsInterface::SYNCABLE_PRODUCTS_COUNT
and then performing a simple calculation like not_synced = OptionsInterface::SYNCABLE_PRODUCTS_COUNT -synced_products
? However, I think this would need updating the OptionsInterface::SYNCABLE_PRODUCTS_COUNT
more frequently, as it's currently set only during user onboarding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firstly it seems that KEY_SYNC_STATUS is no longer used after this PR, should we remove it?
I see that the key is used here:
$exclude_meta[] = $this->prefix_meta_key( ProductMetaHandler::KEY_SYNC_STATUS ); |
So if we plan to release this earlier, before the pulling method I guess it could affect when duplicating a product.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if we plan to release this earlier, before the pulling method I guess it could affect when duplicating a product.
Makes sense, I guess we should leave it to remain backwards compatible. Probably good to leave a note somewhere why we are keeping it around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean fetching the count of syncable products from OptionsInterface::SYNCABLE_PRODUCTS_COUNT and then performing a simple calculation like not_synced = OptionsInterface::SYNCABLE_PRODUCTS_COUNT -synced_products? However, I think this would need updating the OptionsInterface::SYNCABLE_PRODUCTS_COUNT more frequently, as it's currently set only during user onboarding.
That's what I was thinking only on the first round, and then hoping that eventually we have meta updated so we can do a better calculation quicker. But I actually don't have a good solution there as it seems messy.
I'm currently wondering though, why do we make it difficult on ourselves to have this show "syncable but unsynced". Does the merchant even know that we are counting it like this? What do you think the effect would be if we would put just all unsynced products in there? I don't think syncable is a very clear state to the merchant as technically a product they would normally sync but is currently out of stock is temporarily classified as unsyncable until it comes back in stock. If we have it show just all unsynced products it would be much easier to explain what the count is there. At most we might need to clarify in the docs. We now have this in the docs:
Not synced. These products do not appear across Google or the Google Shopping tab as free listings. They are queued for submission or may be ineligible or excluded from the product feed.
That actually sounds like it's describing "all unsynced" anyways, so might as well match that.
So in simple terms that means querying for MCStatus::NOT_SYNCED
and removing the ProductMetaHandler::KEY_VISIBILITY
condition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm currently wondering though, why do we make it difficult on ourselves to have this show "syncable but unsynced". Does the merchant even know that we are counting it like this? What do you think the effect would be if we would put just all unsynced products in there?
In my opinion, from the merchant's perspective and without checking the documentation, I would say that "Not synced" means all products that are not in the Merchant Center regardless of their visibility.
In the follow-up issue that I am creating, I will add the item to update the documentation.
Also forgot to mention, but as we discussed in slack, should we have this PR target a separate feature branch? Or is it easier to have it all in one and pull out the changes we want to release later? |
I've been thinking about this because the current branch already contains the changes made in notifications, etc., since I merged |
Either one would work for me. Reverting the merge commit would keep all the history of this PR around, but either way that would still be retained in the PR comments. Could also keep it simple and close this PR and replace it with a fresh one. |
@mikkamp, thanks for your suggestions and discussions. I added your suggestions and per our #2229 (comment), I removed the As I mentioned in the other comment, I will create a follow-up issue that includes all the tasks we discussed during this PR. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the refactoring it looks good to go from my side. I just left one question about the offer_id vs. id field, but it still works regardless.
I'll let you figure out from here which PR is going to handle the tests and how you are going to merge it to a branch.
$wc_product_id = $this->product_helper->get_wc_product_id( $product_view->getId() ); | ||
$mc_product_status = $this->convert_aggregated_status_to_mc_status( $product_view->getAggregatedDestinationStatus() ); | ||
|
||
// Skip if the product id does not exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big deal, but technically we've only tried extracting the product ID from the string, we haven't actually checked if it exists.
This makes me wonder though, we are extracting from $product_view->getId()
, why is both the offer_id
and id
included in the query MerchantProductViewReportQuery then? Not a big deal to leave it there, but was just curious. Seems like getOfferId()
would return a much shorter string.
Changes proposed in this Pull Request:
Part of #2146
This PR explores a new method to gather product stats independently of the MC API ID. The past two approaches are:
The first approach (GLA v1.0.0) involved using product status to list all the products in the merchant center. However, this required fetching all the products in the MC, with a maximum of 250 results per request. For larger customers, this meant making multiple requests and fetching products that might not even be synced through the GLA.
The current approach involves using the MC API IDs created when submitting the products, stored in the metadata with the key
_wc_gla_google_ids
. Instead of fetching all the products, this approach fetches the products using the MC API ID, making it more efficient than the first approach. However, it may return multiple 404 errors if the product has been deleted in the MC and the GLA extension is not aware of it.The new approach: As we're migrating to the API approach, we can't continue using the
_wc_gla_google_ids
metadata since we won't be submitting the product. Instead, Google will create the product for us, so we won't know the MC API ID. However, we do know theofferID
that the product will have, which by default (or at least as it is now) is the WC product ID. We'll use theofferIDs
from the "syncable" products to query the Merchant Center Reports, which allows us to retrieve a lot more results (the default page size is 1k) in one request and be independent of the MC API ID.Detailed test instructions:
It is a bit difficult to test the results as you will need products in MC with different statuses. I have been able to test the "Pending", "Not Synced", and "Disapproved" and for the "Expiring" I changed the code to simulate an expiring product.
Additional details:
gla_
prefix for theofferID
for newly synced products to the MC._wc_gla_google_ids
, Google will need to keep theID
and theofferID
saved in the metadata (confirmation on how this will work is still needed).get_offer_id
as a replacement forget_google_product_offer_id.
Since we're not using the slug, we could simply remove it from the slug argument inget_google_product_offer_id
, but I'm unsure if this change might affect other plugins using this function for other purposes. Maybe we could mark it as deprecated.More context:
Changelog entry