Merge pull request #309 from alphagov/Uodates-to-flattened-table-and-…
…data-quality-pages-plus-roadmap/changelog

Changes following taxonomy fields renaming
annecremin authored Jan 24, 2025
2 parents aa801b5 + 48f000a commit b089038
Showing 4 changed files with 39 additions and 14 deletions.
10 changes: 5 additions & 5 deletions source/data-sources/ga/ga4-flat/index.html.md.erb
@@ -1,7 +1,7 @@
---
title: GOV.UK GA4 flattened table
weight: 3
last_reviewed_on: 2025-01-17
last_reviewed_on: 2025-01-23
review_in: 6 months
---

@@ -86,17 +86,17 @@ More information on the processing can be found in the [Policies and processes s
| public_updated_at | STRING | NULLABLE | The date of the last public update (the 'public_updated_at' field within the Content API and content attribute of the `<meta name="govuk:public-updated-at">` tag) |
| updated_at | STRING | NULLABLE | The date of the last internal update (the 'updated_at' field within the Content API and content attribute of the `<meta name="govuk:updated-at">` tag) |
| first_published_at | STRING | NULLABLE | The original publish date (the 'first_published_at' field within the Content API and content attribute of the `<meta name="govuk:first-published-at">` tag) |
| taxonomy_all_ids_DEPRECATED | STRING | NULLABLE | The content attribute of the `<meta name="govuk:taxon-ids">` tag e.g. 'a7f3005b-a3cd-4060-a127-725accb54f2e,65a25d2c-a0e5-4283-921d-c928babfb6e4' |
| taxonomy_all_ids | STRING | NULLABLE | The content attribute of the `<meta name="govuk:taxon-ids">` tag e.g. 'a7f3005b-a3cd-4060-a127-725accb54f2e,65a25d2c-a0e5-4283-921d-c928babfb6e4'. Note: this field was blank between February and November 2023 when instead the `full_taxonomy_ids_DEPRECATED` field was populated with this information |
| status_code | INTEGER | NULLABLE | The HTTP response status code |
| withdrawn | STRING | NULLABLE | Default value of 'false', 'true' if `<meta name="govuk:withdrawn" content="withdrawn">` tag is present |
| document_type | STRING | NULLABLE | The content attribute of the `<meta name="govuk:format">` tag |
| history | STRING | NULLABLE | Default value of 'false', 'true' if `<meta name="govuk:content-has-history" content="true">` tag is present |
| taxonomy_main_id | STRING | NULLABLE | The content attribute of the `<meta name="govuk:taxon-id">` tag |
| taxonomy_all_DEPRECATED | STRING | NULLABLE | The content attribute of the `<meta name="govuk:taxon-slugs">` tag |
| taxonomy_all | STRING | NULLABLE | The content attribute of the `<meta name="govuk:taxon-slugs">` tag. Note: this field was blank between February and November 2023 when instead the `full_taxonomy_DEPRECATED` field was populated with this information |
| taxonomy_main | STRING | NULLABLE | The content attribute of the `<meta name="govuk:taxon-slug">` tag |
| taxonomy_level_1 | STRING | NULLABLE | The content attribute of the `<meta name="govuk:themes">` tag |
| full_taxonomy | STRING | NULLABLE | Deprecated - the taxonomy path parts, concatenated |
| full_taxonomy_ids | STRING | NULLABLE | Deprecated - the taxonomy ID path parts, concatenated |
| full_taxonomy_DEPRECATED | STRING | NULLABLE | Deprecated - the taxonomy path parts, concatenated. This field was [only populated between February and November 2023](/data-sources/ga/ga4/data-quality/#issues-with-taxonomy-information). Users seeking this information outside that period should use the `taxonomy_all` field |
| full_taxonomy_ids_DEPRECATED | STRING | NULLABLE | Deprecated - the taxonomy ID path parts, concatenated. This field was [only populated between February and November 2023](/data-sources/ga/ga4/data-quality/#issues-with-taxonomy-information). Users seeking this information outside that period should use the `taxonomy_all_ids` field |
| rendering_app | STRING | NULLABLE | The content attribute of the `<meta name="govuk:rendering-app">` tag |
| organisations | STRING | NULLABLE | Publishing organisation IDs - the content attribute of the `<meta name="govuk:analytics:organisations">` tag e.g. '\<D25>' |
| session_engaged | STRING | NULLABLE | |
15 changes: 13 additions & 2 deletions source/data-sources/ga/ga4/data-quality/index.html.md.erb
@@ -1,7 +1,7 @@
---
title: GOV.UK GA4 data quality
weight: 1
last_reviewed_on: 2024-11-28
last_reviewed_on: 2025-01-23
review_in: 6 months
---

@@ -69,7 +69,6 @@ This is because we were not technically able to limit form_complete events to fi

Further information on how and when form events are used can be found in our information on the [GOV.UK GA4 data structure](/analysis/govuk-ga4/understand-govuk-ga4/#form-events).


#### Truncated ecommerce (view_item_list) tracking on search results pages

Each ecommerce event (view_item_list) can have up to 200 items sent with it.
@@ -117,3 +116,15 @@ This is because to extract the date, which we record in the custom dimension, we
Previous work looking into timestamps associated with Whitehall Publisher CSVs identified that the Content API timestamps are in GMT, so for example an item timestamped as '2014-08-31 23:00:00' is actually displayed on the page as '1 September 2014' (published at midnight BST).

We have not yet investigated whether this has an impact on these GA4 dimensions.

#### Issues with taxonomy information

Taxonomy information is known to be of uneven quality as content editors are not always consistent in how they label or attach taxons to pages.

There is also a known bug with the `browse_topic` field, which can leave it blank when multiple topics have been assigned to a page.

The way in which taxonomy information has been collected into GA4 has also varied over time.
Due to GA4's character count limits on parameter values (previously set to 100 characters per parameter value), the `taxonomy_all` and `taxonomy_all_ids` fields were frequently heavily truncated when first implemented.
To work around this, in February 2023 we implemented a solution that split these fields across five sub-fields, which were then concatenated together in the `full_taxonomy` and `full_taxonomy_ids` fields; the `taxonomy_all` and `taxonomy_all_ids` fields were left empty.
Later in 2023, Google increased the character count limit on parameter values to 500, so in November we switched back to the `taxonomy_all` and `taxonomy_all_ids` fields.
We have renamed the 'full' taxonomy fields to `full_taxonomy_DEPRECATED` and `full_taxonomy_ids_DEPRECATED` as they have not been populated since November 2023, although they still contain the data for February to November 2023 should that be required.
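
For analyses that span the February to November 2023 period, one option is to read the current field first and fall back to the deprecated one when it is blank. The following is a minimal Python sketch of that idea, not part of our processing: the `current_taxonomy` helper is hypothetical, it assumes each row of the flattened table has already been loaded as a dictionary keyed by field name, and it assumes the two fields are close enough in format to be used interchangeably, which should be checked before relying on it.

```python
from typing import Optional


def current_taxonomy(row: dict, ids: bool = False) -> Optional[str]:
    """Return taxonomy information for a row, whichever field holds it.

    Hypothetical helper: prefers the current field and falls back to the
    deprecated 'full' field, which was populated Feb-Nov 2023 only.
    """
    current = "taxonomy_all_ids" if ids else "taxonomy_all"
    legacy = "full_taxonomy_ids_DEPRECATED" if ids else "full_taxonomy_DEPRECATED"
    for field in (current, legacy):
        value = row.get(field)
        if value:  # skips both None (NULL) and empty strings
            return value
    return None
```

Treating empty strings as missing is a deliberate choice here, since the documentation describes the fields as "blank" rather than specifying NULL.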
10 changes: 7 additions & 3 deletions source/processes/govuk-ga-roadmap/ga-changelog/index.html.md.erb
@@ -1,9 +1,8 @@
---
title: GOV.UK GA4 improvements changelog
weight: 2
last_reviewed_on: 2025-01-17
last_reviewed_on: 2025-01-23
review_in: 6 months
hide_in_navigation: true
---

# GOV.UK GA4 improvements changelog
@@ -24,4 +23,9 @@ The `traffic_type` parameter is available in the [GOV.UK GA4 flattened dataset](
### Resolved issues with some custom dimension fields in the GOV.UK GA4 flattened dataset
Issues with the query_string, ui_text, response, search_term, autocomplete_input, autocomplete_suggestions, and link_text fields in the GOV.UK GA4 flattened dataset have been resolved in the data processing going forwards.
The processing for these fields was only including string values, but we observed that some data was coming through as having an integer or double data type.
We have now edited the processing so that integer and double type data will be included in these fields in the flattened table.
We have now edited the processing so that integer and double type data will be included in these fields in the flattened table.
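
As a rough illustration of the change described above, the sketch below shows one way a parameter value could be read as a string regardless of its stored type. It is written in Python for readability and is an assumption, not our actual processing code: the `param_as_string` helper is hypothetical, and it assumes the value record has been loaded as a dictionary with the `string_value`, `int_value` and `double_value` keys used in the GA4 BigQuery export.

```python
from typing import Optional


def param_as_string(value: dict) -> Optional[str]:
    """Return the first populated typed value, converted to a string.

    Hypothetical helper: checks the string value first, then falls back
    to the integer and double values instead of discarding them.
    """
    for key in ("string_value", "int_value", "double_value"):
        typed_value = value.get(key)
        if typed_value is not None:  # keep 0 and empty strings as real values
            return str(typed_value)
    return None
```

Checking against `None` rather than truthiness means an integer value of 0 is kept rather than dropped.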

### Taxonomy dimensions renamed in the GOV.UK GA4 flattened dataset
Four taxonomy dimensions have been renamed in the GOV.UK GA4 flattened dataset to help users find the correct fields.
`taxonomy_all_DEPRECATED` has become `taxonomy_all` and `taxonomy_all_ids_DEPRECATED` has become `taxonomy_all_ids`, as these are the fields containing current taxonomy information.
`full_taxonomy` has been renamed `full_taxonomy_DEPRECATED` and `full_taxonomy_ids` has been renamed `full_taxonomy_ids_DEPRECATED` because these fields have not been populated since November 2023.
18 changes: 14 additions & 4 deletions source/processes/govuk-ga-roadmap/index.html.md.erb
@@ -1,17 +1,16 @@
---
title: GOV.UK GA4 improvements roadmap
weight: 2
last_reviewed_on: 2025-01-17
last_reviewed_on: 2025-01-23
review_in: 6 months
---

# GOV.UK GA4 improvements roadmap
This page details upcoming and recent changes to the [GOV.UK GA4 data collection](/data-sources/ga/ga4/) and [processing](/processes/ga4-data-processing/).

## What we're working on now
### Fixing issues with taxonomy dimensions in the GOV.UK GA4 flattened dataset
Renaming taxonomy dimensions in the GOV.UK GA4 flattened dataset to ensure the correct page taxonomy information is readily available.
See the [changelog](/processes/govuk-ga-roadmap/ga-changelog/) for previous releases.

## What we're working on now
### Updating calculated URL fields in GOV.UK GA4 datasets
Reassessing and updating calculated page location, page path, and/or URL fields in GOV.UK GA4 datasets following the [implementation of the canonical URL](/processes/govuk-ga-roadmap/#new-canonical-url-field).

@@ -21,6 +20,12 @@ Improving our backfilling processes, focussing on developing a process for backf
### Updating our Smokey test data filter
Updating a data filter set up in the GOV.UK GA4 property to more accurately label [Smokey test](https://docs.publishing.service.gov.uk/manual/testing.html#smokey) data.

### Enabling easier access to site search data
Creating simplified tables or views containing site search data in BigQuery.

### Creating useful tables joining GOV.UK GA4 and Knowledge Graph data
Creating summarised tables or views combining GOV.UK GA4 and [Knowledge Graph](/tools/govgraph/) data.

## Recently released
### New canonical URL field
A page's canonical URL is captured in a new field/custom dimension sent with all events.
@@ -36,6 +41,11 @@ We have now edited the processing so that integer and double type data will be i

Note that these fields have not yet been backfilled (an upcoming task), so historic flattened data will still only include the string values.

### Taxonomy dimensions renamed in the GOV.UK GA4 flattened dataset
Four taxonomy dimensions have been renamed in the GOV.UK GA4 flattened dataset to help users find the correct fields.
`taxonomy_all_DEPRECATED` has become `taxonomy_all` and `taxonomy_all_ids_DEPRECATED` has become `taxonomy_all_ids`, as these are the fields containing current taxonomy information.
`full_taxonomy` has been renamed `full_taxonomy_DEPRECATED` and `full_taxonomy_ids` has been renamed `full_taxonomy_ids_DEPRECATED` because these fields have not been populated since November 2023.

## Updates
As we’re still at an early stage, our plans may shift.
We’ll update this page when this happens and add more detail when we can.
