diff --git a/source/analysis/govuk-ga4/find-in-ga4/index.html.md.erb b/source/analysis/govuk-ga4/find-in-ga4/index.html.md.erb index 45ec7c9f..153aed23 100644 --- a/source/analysis/govuk-ga4/find-in-ga4/index.html.md.erb +++ b/source/analysis/govuk-ga4/find-in-ga4/index.html.md.erb @@ -1,7 +1,7 @@ --- title: Find things in the GOV.UK GA4 data weight: 4 -last_reviewed_on: 2024-05-01 +last_reviewed_on: 2024-05-21 review_in: 6 months --- @@ -14,6 +14,7 @@ There are a [variety of different things you can learn from the GA4 data](/analy This page provides some guidance on how to find commonly requested information in the GA4 data. More information on the data source itself can be found on the [GOV.UK GA4 data source page](/data-sources/ga/ga4/). +Information on best practice for accessing and using the GA4 data can be found in the ['Use the GA4 data' section](/analysis/govuk-ga4/use-ga4/). ## Page views diff --git a/source/analysis/govuk-ga4/ga4-data-information/index.html.md.erb b/source/analysis/govuk-ga4/ga4-data-information/index.html.md.erb index 906adc5a..c28866d5 100644 --- a/source/analysis/govuk-ga4/ga4-data-information/index.html.md.erb +++ b/source/analysis/govuk-ga4/ga4-data-information/index.html.md.erb @@ -1,14 +1,22 @@ --- title: What can I learn from the GOV.UK GA4 data? weight: 1 -last_reviewed_on: 2024-05-16 +last_reviewed_on: 2024-05-21 review_in: 6 months --- # What can I learn from the GOV.UK GA4 data? This page is a work in progress. -The [GOV.UK Google Analytics 4 data](/data-sources/ga/ga4/) contains a variety of information on users of GOV.UK, and how those users interacted with various pages on GOV.UK. +The [GOV.UK Google Analytics 4 (GA4) data](/data-sources/ga/ga4/) contains a variety of information on users of GOV.UK, and how those users interacted with various pages on GOV.UK. + +## Limitations of GA4 data collection + +GA4 data is only collected on GOV.UK when users consent to cookies that measure website use. 
+ +Data collection also relies on the [analytics JavaScript code](https://github.com/alphagov/govuk_publishing_components/blob/main/docs/analytics-ga4/analytics.md), and will only occur on browsers we are supporting. +Browsers such as Internet Explorer 11 are not supported (see the [RFC on removing support for legacy browsers](https://github.com/alphagov/govuk-rfcs/blob/rfc-168/rfc-171-remove-legacy-browser-js-support.md#loss-of-analytics-for-legacy-browsers)). + ## Information about how users interacted with pages on GOV.UK diff --git a/source/analysis/govuk-ga4/use-ga4/bigquery/index.html.md.erb b/source/analysis/govuk-ga4/use-ga4/bigquery/index.html.md.erb index 42ab4058..f5859078 100644 --- a/source/analysis/govuk-ga4/use-ga4/bigquery/index.html.md.erb +++ b/source/analysis/govuk-ga4/use-ga4/bigquery/index.html.md.erb @@ -1,7 +1,7 @@ --- title: BigQuery best practice weight: 3 -last_reviewed_on: 2024-04-23 +last_reviewed_on: 2024-05-22 review_in: 6 months --- @@ -12,16 +12,24 @@ GA4 data is available in BigQuery in the [raw](/data-sources/ga/ga4-bq/) or the The raw and flattened GA4 data in BigQuery is stored as a table of events - each row, or record, represents an event. A count of all records on a day will give you the total number of events that were recorded on that day. -A count of all records on a day filtered to only show records with the event name ‘page_view’ will give you the total number of page views recorded on that day. -Metrics such as users and sessions are not already available, but need to be calculated. +A count of all records on a day filtered to only show records with the event name 'page_view' will give you the total number of page views recorded on that day. +Metrics such as users and sessions need to be calculated. +Some example SQL to find common GA4 metrics can be found on the ['Find things in GA4' page](/analysis/govuk-ga4/find-in-ga4/#find-things-in-the-gov-uk-ga4-data). 
-There are quotas in place on querying in BigQuery to ensure that costs do not get too high. -More information on these can be found in the [quotas guidance](/gcp/BQ/#quotas). -Details of the specific quotas set on various projects can be found under each project on the [GCP page](/gcp/). +## Querying GA4 data in BigQuery + +If you are unfamiliar with BigQuery, it may help to review Google’s [documentation explaining the BigQuery user interface](https://cloud.google.com/bigquery/docs/bigquery-web-ui#open-ui). + +To query GA4 data in BigQuery you need permission to view the data and permission to run queries in the project you are running the query from. +If your query fails to run, check whether you are running the query from the right project, and check the error message to see if there was a role or permission error. + +You can save queries in BigQuery to return to later. -The GA4 data stored in BigQuery can also be used in Looker Studio. More information on this can be found on the [Looker Studio best practice page](/analysis/govuk-ga4/use-ga4/looker-studio/#use-the-bigquery-ga4-data-in-looker-studio). +There are quotas in place on querying in BigQuery to ensure that costs do not get too high. +More information on these can be found in the [quotas guidance](/tools/google-cloud-platform/bigquery/#quotas). +Details of the specific quotas set on various projects can be found under each project on the [GCP projects page](/tools/google-cloud-platform/gcp-projects/). -## Best practice +### Best practice Avoid selecting all (`SELECT *`) - there are very few circumstances where you actually need every column from the data source! If you would like to see what is in the dataset, you can PREVIEW a table in the BigQuery interface. 
@@ -32,4 +40,11 @@ If you are running queries in BigQuery or connecting to BigQuery data, make sure Using a wildcard in the place of the date at the end of the table queries all the flattened tables (the entire history of the data) and costs can rack up pretty quickly due to the amount of data we are collecting and storing. Either specify a date in the tables you are selecting data from or make sure to use a WHERE statement where you define the date (or dates) you want. -Note that using a LIMIT statement does not reduce the amount of data queried, just the amount of rows returned to you. `SELECT * FROM [table] LIMIT 20` and `SELECT * FROM [table]` cost the exact same amount. \ No newline at end of file +Note that using a LIMIT statement does not reduce the amount of data queried, just the amount of rows returned to you. `SELECT * FROM [table] LIMIT 20` and `SELECT * FROM [table]` cost the exact same amount. + +## Using GA4 data stored in BigQuery in other tools + +The GA4 data stored in BigQuery can also be queried into visualisation tools or other products built to use this data. + +Looker Studio connects very easily to data stored in BigQuery, and so is often used to display BigQuery data within GDS. +More guidance on this can be found on the [Looker Studio best practice page](/analysis/govuk-ga4/use-ga4/looker-studio/#use-the-bigquery-ga4-data-in-looker-studio). 
diff --git a/source/analysis/govuk-ga4/use-ga4/index.html.md.erb b/source/analysis/govuk-ga4/use-ga4/index.html.md.erb index 93ca33d6..9e905090 100644 --- a/source/analysis/govuk-ga4/use-ga4/index.html.md.erb +++ b/source/analysis/govuk-ga4/use-ga4/index.html.md.erb @@ -1,7 +1,7 @@ --- title: Use the GOV.UK GA4 data weight: 5 -last_reviewed_on: 2024-04-24 +last_reviewed_on: 2024-05-21 review_in: 6 months --- @@ -22,5 +22,5 @@ This section contains information and best practice on how the GOV.UK GA4 data c The GA4 data can also be acccessed via: -- the [Content Data app](data-sources/content-data-app/) +- the [Content Data app](/analysis/content-data/) - Data Services' custom data tools \ No newline at end of file diff --git a/source/analysis/govuk-ga4/use-ga4/looker-studio/index.html.md.erb b/source/analysis/govuk-ga4/use-ga4/looker-studio/index.html.md.erb index 5f83c046..ee059086 100644 --- a/source/analysis/govuk-ga4/use-ga4/looker-studio/index.html.md.erb +++ b/source/analysis/govuk-ga4/use-ga4/looker-studio/index.html.md.erb @@ -1,7 +1,7 @@ --- title: Looker Studio best practice weight: 2 -last_reviewed_on: 2024-05-16 +last_reviewed_on: 2024-05-21 review_in: 6 months --- @@ -66,6 +66,16 @@ If you cannot use the above shared connection we recommend you use a custom quer 6. Tick the checkbox to ‘enable date range parameters’ if needed 7. Selecting ‘Add’ + +An example of a SQL query that could be used in step 5 above is: + +```SQL +SELECT * +FROM `ga4-analytics-352613.flattened_dataset.flattened_daily_ga_data_*` +WHERE _TABLE_SUFFIX BETWEEN @DS_START_DATE AND @DS_END_DATE +``` + + ### Best practice Where possible, use the flattened data source. The flattened tables are much more efficient to query, and should be easier to use as well. 
diff --git a/source/data-sources/ga/ga4-bq/index.html.md.erb b/source/data-sources/ga/ga4-bq/index.html.md.erb index 1bd96b50..8f44386b 100644 --- a/source/data-sources/ga/ga4-bq/index.html.md.erb +++ b/source/data-sources/ga/ga4-bq/index.html.md.erb @@ -1,7 +1,7 @@ --- title: GOV.UK GA4 (BigQuery export) weight: 2 -last_reviewed_on: 2024-04-23 +last_reviewed_on: 2024-05-22 review_in: 6 months --- @@ -19,13 +19,18 @@ More information can be found in our [GA access policy](/processes/ga-access/#wh ### Location There are 3 GA4 datasets. These correspond to the integration, staging, and production or live GOV.UK websites. -All of these datasets are made up of sharded tables. This means that a new table is created each day with the suffix YYYYMMDD. - The GA4 data for the live GOV.UK site is located in BigQuery in the `ga4-analytics-352613.analytics_330577055` dataset. The GA4 data for the staging site is located in BigQuery in the `ga4-analytics-352613.analytics_330580593` dataset. The GA4 data for the integration site is located in BigQuery in the `ga4-analytics-352613.analytics_294475112` dataset. -These datasets are all within the [GA4 analytics project](/gcp/#ga4-analytics). +These datasets are all made up of sharded tables - a new table is created each day with the suffix YYYYMMDD. + +Our Google Analytics properties are set up to export GOV.UK data several times a day. +The data for the current day is temporarily stored in intraday tables. +At the end of the day, BigQuery automatically moves the data in the intraday tables to a date table (suffixed `YYYYMMDD`) and deletes the intraday tables in question. +New intraday tables are created and added to throughout the next day. + +The GA4 datasets are all stored within the [GA4 analytics project](/gcp/#ga4-analytics). For more information on the Google Cloud Platform projects, see our [GCP Project Documentation](/gcp/). 
## Schema diff --git a/source/tools/ga4-user-admin-tool/index.html.md.erb b/source/tools/ga4-user-admin-tool/index.html.md.erb index c2806486..3a1c0069 100644 --- a/source/tools/ga4-user-admin-tool/index.html.md.erb +++ b/source/tools/ga4-user-admin-tool/index.html.md.erb @@ -1,11 +1,11 @@ --- title: GA4 User Admin tool weight: 5 -last_reviewed_on: 2024-04-25 +last_reviewed_on: 2024-05-21 review_in: 6 months --- -# GA4 User Admin Tool +# GA4 User Admin tool The GA4 User Admin tool is used to add and delete users' access to www.gov.uk production Google Analytics data. The tool adds a user to [www.gov.uk GA4 production data](/data-sources/ga/ga4/) as well as providing read access to GA4 nested data, flattened data and www.gov.uk Search Console data stored in BigQuery. diff --git a/source/tools/google-cloud-platform/bigquery/index.html.md.erb b/source/tools/google-cloud-platform/bigquery/index.html.md.erb index e4f16763..44ae11ac 100644 --- a/source/tools/google-cloud-platform/bigquery/index.html.md.erb +++ b/source/tools/google-cloud-platform/bigquery/index.html.md.erb @@ -1,7 +1,7 @@ --- title: Use BigQuery weight: 3 -last_reviewed_on: 2024-04-25 +last_reviewed_on: 2024-05-22 review_in: 6 months --- @@ -26,16 +26,13 @@ For example, our GOV.UK Universal Analytics data is sent to the `govuk-bigquery- - `87773428` is the dataset name - `ga_sessions_intraday_YYYYMMDD` is the table name -Our Google Analytics data is stored in sharded tables. These table names end with the suffix `YYYYMMDD`, representing the date in year-month-day format. -### Intraday tables +There are different types of tables that can be used to store data in BigQuery. -We have set up our Google Analytics properties to export GOV.UK data several times a day. -This day is temporarily stored in intraday tables. - -At the end of the day, BigQuery automatically moves the data in the intraday table to a date table (suffixed `YYYYMMDD`) and deletes the intraday tables in question. 
-New intraday tables are created and added to throughout the next day. +Our [raw Google Analytics data](/data-sources/ga/ga4-bq/#location), for example, is stored in [sharded tables](https://cloud.google.com/bigquery/docs/partitioned-tables#dt_partition_shard). +These table names end with the suffix `YYYYMMDD`, representing the date in year-month-day format. +[Partitioned tables](https://cloud.google.com/bigquery/docs/partitioned-tables) are in use in other datasets. ## Quotas Several projects have quotas set up to limit the amount of data that can be queried. @@ -47,7 +44,7 @@ The aim is not to be a hindrance to the need to use the data we store so please Specific quotas can be found detailed under the project name on the [GCP page](https://docs.data-community.publishing.service.gov.uk/gcp/). -### How did I exceed my quota? +### How did I query more data than permitted by my quota? This is a very good question that we are still investigating. Our current thinking is that the dynamic concurrent query queue could be loaded with queries before the quota is breached. @@ -72,13 +69,25 @@ This does not appear to be the case when using an existing data connection. In t Where possible, use shared pre-existing data connections in Looker Studio. ## Roles -Google Cloud Platform permissions (IAM) can be a mysterious thing. -BigQuery Data Viewer at the Project level allows the person to see all the data held within a project, it can also be applied at a dataset level. -It does not provide the ability to query the data from that project, they would need to do so from a different project. +A role is a set of permissions. Users should only have the specific role or permissions they need to use the Google Cloud Platform. + +Contact the Data Engineering community on Slack to ask for a role, permission or service account. 
+ +Common roles we use include: + +- BigQuery Data Viewer +- BigQuery Job User +- BigQuery Read Session User + +BigQuery Data Viewer allows the user to access and view data. When granted at the project level, this means the user can see all the data held within a project. +This does not allow the user to query the data from within that project - a user granted only the Data Viewer role on a given project would need to query the data from a different project. +This role can also be granted at dataset level. -BigQuery Job User grants the user the ability to run queries from the project. The data could be held in a different project but the query cost it allocated to the querying project. +BigQuery Job User grants the user the ability to run queries from the project in question. +The data being queried could be held in a different project but the query cost is allocated to the querying project. +The BigQuery Read Session User role is required if the Storage Read API is used when querying. -BigQuery Read Session User is needed if the Storage Read API is used when querying. +More information on roles and IAM permissions can be found in the Google Cloud documentation.