
Commit

Merge pull request #343 from alphagov/Added-info-related-to-cleaned_page_location-changes

Added info related to cleaned_page_location changes
annecremin authored Feb 27, 2025
2 parents 3c31650 + fa724e7 commit 0810d8c
Showing 2 changed files with 13 additions and 4 deletions.
13 changes: 11 additions & 2 deletions source/analysis/govuk-ga4/understand-ga4/index.html.md.erb
@@ -1,7 +1,7 @@
---
title: Understand key GA4 dimensions and metrics
weight: 3
-last_reviewed_on: 2024-12-02
+last_reviewed_on: 2025-02-27
review_in: 6 months
---

@@ -18,13 +18,22 @@ The page URL is available in various forms in GA4:
| --- | --- | --- | --- | --- | --- |
| Page location | - | UNNEST (event_params) WHERE key = "page_location" | page_location | pageLocation | Full URL with the protocol, hostname, page path and query string |
| - | Full page URL | - | - | fullPageUrl | The hostname, page path, and query string for web pages visited - does not contain the protocol |
-| - | Page path | - | cleaned_page_location | pagePath | Page path (value after the domain/hostname), does not include query string |
+| - | Page path | - | cleaned_page_location | pagePath | Page path (value after the domain/hostname), does not include query string. The cleaned_page_location is not identical to the 'Page path' as it defaults to the canonical URL value when available |
| Page path + query string | Page path + query string | - | - | pagePathPlusQueryString | Hostname, page path and query string |
| Page path and screen class | Page path and screen class | - | - | unifiedPagePathScreen | The page path (web) or screen class (app) on which the event was logged |

Not all page dimensions were created equal: [Google announced](https://developers.google.com/analytics/devguides/reporting/data/v1/announcements/20221201-compatibility-changes) that dimensions that include the query string such as pagePathPlusQueryString are only compatible with a limited set of dimensions and metrics.
For this reason, we also collect the query string in a custom dimension named `query_string`.

+The 'Page location' and 'Full page URL' fields are the most complete page dimensions, showing all or nearly all that the end user will see in the address bar of their browser.
+However, due to this completeness, they may not always be the best page dimensions to use.
+The presence of query strings, UTM parameters, and other small errors in the URL can make it difficult to accurately understand the number of sessions that have occurred on a given page using these dimensions.
+
+In most cases, the 'Page path' will be the simplest and best page dimension to use, aggregating all views of a page, ignoring any differing parameters appended to the URL.
+In our [GOV.UK GA4 flattened dataset](/data-sources/ga/ga4-flat/), we have created a 'cleaned_page_location' field which is very similar to the 'Page path', although it defaults to the 'canonical_url' value on document types where the canonical URL is available as the canonical URL is cleaner.
+This means there will be some differences when comparing analysis using the 'Page path' and analysis using the 'cleaned_page_location'.
+This may particularly cause confusion when doing journey analysis, as the path seen in the next page's page_referrer may differ from the cleaned_page_location of the present page.
+
### Page referrer
The 'Page referrer' in GA4 is based on the document referrer, and tells you the page the user clicked a link on to get to the present page - the page that referred the user to the current page.
This is not necessarily the previous page the user opened or looked at.
4 changes: 2 additions & 2 deletions source/data-sources/ga/ga4-flat/index.html.md.erb
@@ -1,7 +1,7 @@
---
title: GOV.UK GA4 flattened table
weight: 3
-last_reviewed_on: 2025-02-19
+last_reviewed_on: 2025-02-27
review_in: 6 months
---

@@ -74,7 +74,7 @@ More information on the processing can be found in the [Policies and processes s
| unique_session_id | STRING | NULLABLE | The user_pseudo_id concatenated with the ga_session_id to create a unique session ID |
| page_title | STRING | NULLABLE | The [page title](https://docs.publishing.service.gov.uk/analytics/attribute_title.html) |
| page_location | STRING | NULLABLE | The full URL with the protocol, hostname, page path and query string |
-| cleaned_page_location | STRING | NULLABLE | The page path (the page_location without the protocol, hostname, and any query string |
+| cleaned_page_location | STRING | NULLABLE | The page path (the page_location without the protocol, hostname, and any query string). If a canonical_url is available for this page, the cleaned_page_location will default to the canonical_url value |
| ga_sessionid | INTEGER | NULLABLE | A session ID generated by GA4. This corresponds to the time the session started and is not necessarily unique |
| ga_session_number | INTEGER | NULLABLE | The number of sessions that a user has started up to the current session e.g. '5' for the user's fifth session |
| primary_publishing_organisation | STRING | NULLABLE | The organisation that published the content e.g. 'Home Office' |
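
The `cleaned_page_location` rule described in this change can be illustrated with a short Python sketch. This is not the code used to build the flattened table; the function names, the example URLs, and the assumption that `canonical_url` is used exactly as stored are all hypothetical and only mirror the wording of the field description.

```python
from typing import Optional
from urllib.parse import urlsplit


def page_path(page_location: str) -> str:
    """Return only the path of a URL: no protocol, hostname or query string."""
    return urlsplit(page_location).path


def cleaned_page_location(page_location: str,
                          canonical_url: Optional[str] = None) -> str:
    """Hypothetical re-creation of the documented rule.

    Assumption: when a canonical_url is available it is returned unchanged;
    otherwise the value falls back to the page path of page_location.
    """
    if canonical_url:
        return canonical_url
    return page_path(page_location)


# Made-up example values, chosen only to show the two branches.
no_canonical = cleaned_page_location(
    "https://www.gov.uk/browse/benefits?utm_source=email"
)
with_canonical = cleaned_page_location(
    "https://www.gov.uk/example/step-2?x=1", canonical_url="/example"
)
print(no_canonical)
print(with_canonical)
```

In the second call the result differs from the page path of the same URL, which is the kind of mismatch the changed text warns about for journey analysis that compares `page_referrer` with `cleaned_page_location`.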
