From f25a79f07a83f1bd33a04420b1458d8ea1a243bc Mon Sep 17 00:00:00 2001
From: annecremin <anne.cremin@digital.cabinet-office.gov.uk>
Date: Wed, 17 Jul 2024 17:37:57 +0100
Subject: [PATCH 1/2] Merging path tool pages (very similar content) and
 bringing up to date some incorrect information

---
 .../analysis/reverse-path/index.html.md.erb   | 241 ------------------
 1 file changed, 241 deletions(-)
 delete mode 100644 source/analysis/reverse-path/index.html.md.erb

diff --git a/source/analysis/reverse-path/index.html.md.erb b/source/analysis/reverse-path/index.html.md.erb
deleted file mode 100644
index 9b69489a..00000000
--- a/source/analysis/reverse-path/index.html.md.erb
+++ /dev/null
@@ -1,241 +0,0 @@
----
-title: Use the reverse page path tool
-weight: 39.2
-last_reviewed_on: 2022-02-23
-review_in: 6 months
-hide_in_navigation: true
----
-
-# Use the reverse page path tool
-
-The reverse page path tool shows the pages a user visits before visiting a page of interest on GOV.UK.
-
-This tool has 4 outputs:
-
-- a CSV file with the count and proportion of user sessions visiting distinct, subsetted journeys
-- a CSV file with the count of user sessions visiting page paths at each step, regardless of the other pages in the subsetted journey
-- a Plotly visualisation of a Sankey diagram summarising the top 10 user journeys and all other journeys
-- a Google Sheet summarising the top 10 and all other journeys
-
-A subsetted journey is a part of a user's journey rather than the entire journey.
-
-A distinct journey is a unique journey that is not the same as any other journey that user has taken.
-
-To use the reverse page path tool, you must do the following.
-
-1. Download the reverse page path tool notebook.
-1. Open the tool notebook in Google Colab.
-1. Run the notebook.
-1. View the outputs.
-
-## Download the reverse page path tool
-
-Download the notebook from GitHub.
-
-1. Go to the [`govuk-user-journey-analysis-tools` GitHub repo](https://github.com/alphagov/govuk-user-journey-analysis-tools).
-1. Select __Code__ and then select __Download Zip__.
-1. Unzip the __govuk-user-journey-analysis-tools__ folder and go to the __notebooks__ folder.
-1. Save the __reverse-path-tool__ Jupyter notebook to your Google Drive account.
-
-## Opening the tool notebook in Google Colab
-
-1. Go to [Google Colab](https://colab.research.google.com/). You will see a window to open a notebook.
-1. Select the __Google Drive__ tab and open the __reverse-path-tool__ notebook.
-
-To open a Jupyter notebook in Google Colab from Google Drive for the first time, you must associate Jupyter notebooks with Google Colab.
-
-1. Right-click anywhere in Google Drive and select __More__.
-1. Select __Connect more apps__ and then __Google Colaboratory__.
-1. Accept any required permissions.
-
-Once you have associated Jupyter notebooks with Google Colab, you can open a Jupyter notebook in Google Colab from Google Drive.
-
-## Run the tool notebook
-
-To run the notebook, you must do the following.
-
-1. Authenticate your access.
-1. Set the query parameters.
-
-### Authenticate your access
-
-1. Hover your cursor over the cell that starts with the code `from datetime import datetime` to show the run icon. Select the run icon to start the authentication process.
-1. Select the authentication link in the text box and then select your Google account.
-1. Follow the on screen prompts, selecting __Allow__ when prompted, and copy the __Sign in code__ when this code appears.
-1. Go back to the text box in the notebook and paste the sign in code into the __Enter verification code__ field.
-1. Select __Enter__ to complete authentication.
-
-If you receive a warning message saying "The notebook was not authored by Google", select __Run Anyway__.
-
-### Set the query parameters
-
-You set the query parameters in the __Set query parameters__ cell.
-
-1. You must set the following mandatory query parameters:
-  - start and end dates for the analysis
-  - desired page path of interest
-  - the number of pages or events the user journeys will be subsetted by
-  - whether to use the first or last hit to the desired page in the session for the subsetted journey
-  - whether to include page and/or event hits
-  - device categories to include
-
-1. You can set the following optional query parameters on whether to:
-  - remove query strings from the page path of interest
-  - append event-associated page paths with an __[E]__,
-  - append event-associated page paths with the event category, event action, and/or event label suffixes
-  - flag journeys that include the entrance page
-  - flag journeys that include the exit page
-  - remove refreshes of the page of interest
-  - have search pages only show the search content type and search keywords
-
-Once you are happy with the query parameters, select the cell and first select __Runtime__ in the top menu, and then __Run after__.
-
-The notebook will run each cell after the __Set query parameters__ cell one at a time.
-
-## Viewing the outputs
-
-There are 4 outputs from running the notebook:
-
-- a raw CSV data file
-- a CSV data file summarising the most popular pages at each step
-- a Plotly visualisation of a Sankey diagram
-- a Google Sheets table of the top 10 reverse page path tool results
-
-After you select __Runtime__ and then __Run after__, the notebook should automatically scroll to the cell that starts with __Initialise a Google BigQuery client, and define the query parameters__.
-
-Manually scroll to this cell if the notebook does not automatically scroll to it.
-
-This cell first estimates the number of gigabytes read by the query and shows you the amount.
-
-If you are happy to run the query, enter "yes" into the user input box.
-
-If you leave the input box blank or enter something other than "yes", the query will not run.
-
-### The raw data file, summary, and Sankey diagram
-
-The query first generates the raw CSV data file and a CSV file of the most popular pages at each step, regardless of the subsetted journey.
-
-The query then downloads those files into the __Downloads_ folder on your local machine.
-
-If the query does not generate these CSV files, check the end of the URL search bar. If you see a download icon with a red cross, select the icon and change the option to __Always allow...__, and then select __Done__.
-
-Finally, running the query also creates the Sankey diagram, which is a visualisation of the reverse page path from the page of interest.
-
-To download the Sankey diagram, select the camera icon in the top right of the diagram, labelled __Download plot as a png__. This will download a PNG file to your __Downloads__ folder.
-
-### The top 10 reverse page path tool results
-
-The __Presenting results in Google sheets__ cells create a Google Sheet of the top 10 reverse page path journeys.
-
-Select and run the cell that starts __Compile a message, and flag to the user for a response; if not "yes", terminate execution__.
-
-You will see a box asking if you are happy to create the Google Sheet.
-
-If you are happy to create the Google Sheet, enter "yes" into the user input box. If you leave the input box blank or enter something other than "yes", the query will not run.
-
-The query will create the Google Sheet and provide a link to this spreadsheet under the __Create google sheet in Product and Technology Directorate/Data Services/Data Products/16 User Journey tools/Path tools: google sheet result tables__ cell.
-
-This Google Sheet follows the [reverse page path template](https://docs.google.com/spreadsheets/d/1E54VgFepSCxNfNKNtxp8eQXme7wGOAEauTqgzEuz3iM/edit?usp=drive_web&ouid=114104082491527752510).
-
-You have now viewed all the outputs from the reverse page path tool.
-
-Check the __Original SQL query__ cell for the original SQL for the reverse page path tool.
-
-## Assumptions and caveats
-
-This log contains a list of assumptions and caveats used in the forward page path tool analysis.
-
-### Definitions
-
-Assumptions are red-amber-green (RAG) rated according to the following definitions for quality and impact.
-
-| RAG rating   | Assumption quality | Assumption impact |
-|---|---|---|
-|Green|Reliable assumption, well understood and/or documented. Anything up to a validated and recent set of actual data.|Marginal assumptions that their changes have no or limited impact on the outputs.|
-|Amber|Some evidence to support the assumption. May vary from a source with poor methodology to a good source that’s a few years old.|Assumptions with a relevant, even if not critical, impact on the outputs.|
-|Red|Little evidence to support the assumption. May vary from an opinion to a limited data source with poor methodology|Core assumptions of the analysis is that the output would be extremely affected by their change.|
-
-These are Home Office Analytical Quality Assurance team definitions.
-
-### Tool only supports exact matches to `DESIRED_PAGE`
-
-* Quality: Green
-* Impact: Amber
-
-The query parameter for `DESIRED_PAGE` cannot evaluate the field using regular expression, and therefore the tool currently only supports exact matches. This is acceptable as users of these tools are used to providing exact matches for analyses.
-
-### Tool only uses the first and last visits to `DESIRED_PAGE`
-
-* Quality: Green
-* Impact: Amber
-
-Depending on user input, the subsetted journey considers the first or last visit to the `DESIRED_PAGE` as the goal location (that is, the first step).
-
-Therefore, the tool ignores any other visits to the `DESIRED_PAGE`. We decided to do this because the main aim of the tool is to understand which pages the user visits before the `DESIRED_PAGE`. Users also requested the ability to subset the journeys by the first hit.
-
-However, it is important that the user of the tool is aware of this assumption, as it will impact the subsetted journey output.
-
-### If `REMOVE_DESIRED_PAGE_REFRESHES` is `TRUE`, tool only uses the first visit in a series of sequential visits (page refreshes) to `DESIRED_PAGE` to determine which is the last visit
-
-* Quality: Green
-* Impact: Green
-
-Therefore, if `REMOVE_DESIRED_PAGE_REFRESHES` is `TRUE`, the tool will only use the first visit in a series of sequential visits to `DESIRED_PAGE` of hit type `PAGE`. Other earlier visits to `DESIRED_PAGE` will remain, as will any earlier desired page refreshes.
-
-### Tool always includes journeys shorter than the number of desired stages (`NUMBER_OF_STAGES`)
-
-* Quality: Green
-* Impact: Amber
-
-While journeys shorter than the number of desired stages are always included, journeys longer than the number of desired stages are not accurately represented.
-
-For example, if `NUMBER_OF_STAGES = 2`, and the journey consists of 3 stages, then the tool will not provide the user with the full journey (that is, the 3rd page path).
-
-Therefore, the tool will combine this journey with other journeys that consist of the same 2 page paths (`NUMBER_OF_STAGES = 2`), even if further page paths differ.
-
-### Tool assumes GOV.UK search page paths have the format `/search/{TYPE}?keywords={KEYWORDS}{...}`
-
-* Quality: Green
-* Impact: Green
-
-`{TYPE}` is the GOV.UK search content type. `{KEYWORDS}` are the search keywords, where each keyword is separated by +. {...} are any other parts of the search query that are not keyword-related (if they exist).
-
-### Tool assumes GOV.UK search page titles have the format `{KEYWORDS} - {TYPE} - GOV.UK`
-
-* Quality: Green
-* Impact: Green
-
-`{TYPE}` is the GOV.UK search content type, and `{KEYWORDS}` are the search keywords.
-
-### If `ENTRANCE_PAGE` is `FALSE`, each journey contains both instances where the entrance page is included, and is not included
-
-* Quality: Green
-* Impact: Red
-
-Therefore, if `ENTRANCE_PAGE` is `TRUE`, these 2 instances (where the entrance page is included against when the entrance page is not included) will be considered 2 separate journeys.
-
-This is a better representation of the journey, as a `TRUE` flag indicates that the journey had more page paths than `NUMBER_OF_STAGES`.
-
-The user must have a good understanding of what the `ENTRANCE_FLAG` represents, as this could change the output a lot.
-
-### If `EXIT_PAGE` is `FALSE`, each journey contains both instances where the exit page is included, and is not included
-
-Quality: Green
-Impact: Red
-
-If `EXIT_PAGE` is `TRUE`, these 2 instances, when the exit page is included compared to when the exit page is not included, will be considered 2 separate journeys.
-
-This is a better representation of the journey, as a `TRUE` flag indicates that the journey had more page paths than `NUMBER_OF_STAGES`.
-
-The user must understand what the `EXIT_FLAG` represents, as this could change the output a lot.
-
-### If user selects `DEVICE_ALL` in combination with either `DEVICE_DESKTOP`, `DEVICE_MOBILE`, and/or `DEVICE_TABLET`, then the analysis will use `DEVICE_ALL` and ignore all other arguments.
-
-* Quality: Green
-* Impact: Red
-
-By default, the tool will implement the `DEVICE_ALL` argument instead of `DEVICE_DESKTOP`, `DEVICE_MOBILE`, and `DEVICE_TABLET`.
-
-Therefore, if the user accidentally selects `DEVICE_ALL`, the query will ignore the desired arguments `DEVICE_DESKTOP`, `DEVICE_MOBILE`, and/or `DEVICE_TABLET` even if the user selects them as well. This will change the expected output.
-
-The output CSV file flags which device category(ies) were used, which should mitigate errors related to interpreting the data.

From d75b89c8ff5913ecc7b0d0635c79761366cf5546 Mon Sep 17 00:00:00 2001
From: annecremin <anne.cremin@digital.cabinet-office.gov.uk>
Date: Wed, 17 Jul 2024 17:39:38 +0100
Subject: [PATCH 2/2] Correcting old path tools information

---
 .../analysis/forward-path/index.html.md.erb   | 80 ++++++-------------
 1 file changed, 25 insertions(+), 55 deletions(-)

diff --git a/source/analysis/forward-path/index.html.md.erb b/source/analysis/forward-path/index.html.md.erb
index c2a4913d..7253513c 100644
--- a/source/analysis/forward-path/index.html.md.erb
+++ b/source/analysis/forward-path/index.html.md.erb
@@ -1,16 +1,17 @@
 ---
-title: Use the forward page path tool
+title: Use the forward and reverse page path tools
 weight: 39.1
-last_reviewed_on: 2022-02-23
+last_reviewed_on: 2024-07-18
 review_in: 6 months
-hide_in_navigation: true
 ---
 
-# Using the forward page path tool
+# Use the forward and reverse page path tools
 
-The forward page path tool shows the pages a user visits after visiting a page of interest on GOV.UK.
+The forward and reverse page path tools show the pages a user visits before (reverse) or after (forward) visiting a page of interest on GOV.UK.
 
-This tool has 4 outputs:
+These tools were developed within Data Services for use by analysts within GDS, and can be found in the 'Path tools' folder in the 'Performance and Data Analysts Community' shared Google Drive.
+
+The tools have 4 outputs:
 
 - a CSV file with the count and proportion of user sessions visiting distinct, subsetted journeys
 - a CSV file with the count of user sessions visiting page paths at each step, regardless of the other pages in the subsetted journey
@@ -21,51 +22,25 @@ A subsetted journey is a part of a user's journey rather than the entire journey
 
 A distinct journey is a unique journey that is not the same as any other journey that user has taken.
 
-To use the forward page path tool, you must do the following.
-
-1. Download the forward page path tool notebook.
-1. Open the tool notebook in Google Colab.
-1. Run the notebook.
-1. View the outputs.
-
-## Download the forward page path tool
-
-Download the notebook from GitHub.
-
-1. Go to the [`govuk-user-journey-analysis-tools` GitHub repo](https://github.com/alphagov/govuk-user-journey-analysis-tools).
-1. Select __Code__ and then select __Download Zip__.
-1. Unzip the __govuk-user-journey-analysis-tools__ folder and go to the __notebooks__ folder.
-1. Save the __forward-path-tool__ Jupyter notebook to your Google Drive account.
-
-## Opening the tool notebook in Google Colab
-
-1. Go to [Google Colab](https://colab.research.google.com/). You will see a window to open a notebook.
-1. Select the __Google Drive__ tab and open the __forward-path-tool__ notebook.
+To use the forward or reverse page path tools, you must do the following.
 
-To open a Jupyter notebook in Google Colab from Google Drive for the first time, you must associate Jupyter notebooks with Google Colab.
-
-1. Right-click anywhere in Google Drive and select __More__.
-1. Select __Connect more apps__ and then __Google Colaboratory__.
-1. Accept any required permissions.
-
-Once you have associated Jupyter notebooks with Google Colab, you can open a Jupyter notebook in Google Colab from Google Drive.
+1. Copy the page path tool notebook.
+2. Open your copy of the page path tool notebook in Google Colab.
+3. Run the notebook.
+4. View the outputs.
 
 ## Running the tool notebook
 
 To run the notebook, you must do the following.
 
 1. Authenticate your access.
-1. Set the query parameters.
+2. Set the query parameters.
 
 ### Authenticating your access
 
-1. Hover your cursor over the cell that starts with the code `from datetime import datetime` to show the run icon. Select the run icon to start the authentication process.
-1. Select the authentication link in the text box and then select your Google account.
-1. Follow the on screen prompts, selecting __Allow__ when prompted, and copy the __Sign in code__ when this code appears.
-1. Go back to the text box in the notebook and paste the sign in code into the __Enter verification code__ field.
-1. Select __Enter__ to complete authentication.
-
-If you receive a warning message saying "The notebook was not authored by Google", select __Run Anyway__.
+1. Run the cells in order. When you run cell 2 (`auth.authenticate_user()`), you will see a pop-up asking you to authenticate.
+2. Follow the on-screen prompts, selecting your account and then __Allow__ when prompted.
+3. When you have authenticated successfully, the cell shows a small green tick to indicate that it ran.
 
 ### Setting the query parameters
 
@@ -79,10 +54,9 @@ You set the query parameters in the __Set query parameters__ cell.
   - use the first or last hit to the desired page in the session for the subsetted journey
   - device categories to include
 
-1. You can set the following optional query parameters on whether to:
+2. You can set the following optional query parameters on whether to:
   - remove query strings from the page path of interest
   - append event-associated page paths with an __[E]__
-  - append event-associated page paths with the event category, event action, and/or event label suffixes
   - flag journeys that include the entrance page
   - flag journeys that include the exit page
   - remove refreshes of the page of interest
@@ -99,7 +73,7 @@ There are 4 outputs from running the notebook:
 - a raw CSV data file
 - a CSV data file summarising the most popular pages at each step
 - a Plotly visualisation of a Sankey diagram
-- a Google Sheets table of the top 10 forward page path tool results
+- a Google Sheets table of the top 10 page path tool results
 
 After you select __Runtime__ and then __Run after__, the notebook should automatically scroll to the cell that starts with __Initialise a Google BigQuery client, and define the query parameters__.
 
@@ -119,13 +93,13 @@ The query then downloads those files into the __Downloads_ folder on your local
 
 If the query does not generate these CSV files, check the end of the URL search bar. If you see a download icon with a red cross, select the icon and change the option to __Always allow...__, and then select __Done__.
 
-Finally, running the query also creates the Sankey diagram, which is a visualisation of the forward page path from the page of interest.
+Finally, running the query also creates the Sankey diagram, which is a visualisation of the forward or reverse page path from the page of interest.
 
 To download the Sankey diagram, select the camera icon in the top right of the diagram, labelled __Download plot as a png__. This will download a PNG file to your __Downloads__ folder.
 
-### The top 10 forward page path tool results
+### The top 10 page path tool results
 
-The __Presenting results in Google sheets__ cells create a Google Sheet of the top 10 forward page path journeys.
+The __Presenting results in Google sheets__ cells create a Google Sheet of the top 10 forward or reverse page path journeys.
 
 Select and run the cell that starts __Compile a message, and flag to the user for a response; if not "yes", terminate execution__.
 
@@ -135,18 +109,14 @@ If you are happy to create the Google Sheet, enter "yes" into the user input box
 
 The query will create the Google Sheet and provide a link to this spreadsheet under the __Create google sheet in Product and Technology Directorate/Data Services/Data Products/16 User Journey tools/Path tools: google sheet result tables__ cell.
 
-This Google Sheet follows the [forward page path template](https://docs.google.com/spreadsheets/d/1kISyKu2jVzINCxwPe8ydQM8cibgEX2a3WCPxkJM9W80/edit#gid=1115034830).
-
-You have now viewed all the outputs from the forward page path tool.
+## Original SQL query
 
-Check the __Original SQL query__ cell for the original SQL for the forward page path tool.
+Check the __Original SQL query__ cell at the bottom of each notebook for the original SQL for the forward and reverse page path tools.
 
 ## Assumptions and caveats
 
 This log contains a list of assumptions and caveats used in the forward page path tool analysis.
 
-### Definitions
-
 Assumptions are red-amber-green (RAG) rated according to the following definitions for quality and impact.
 
 | RAG rating   | Assumption quality | Assumption impact |
@@ -220,8 +190,8 @@ The user must have a good understanding of what the `ENTRANCE_FLAG` represents,
 
 ### If `EXIT_PAGE` is `FALSE`, each journey contains both instances where the exit page is included, and is not included
 
-Quality: Green
-Impact: Red
+* Quality: Green
+* Impact: Red
 
 If `EXIT_PAGE` is `TRUE`, these 2 instances, when the exit page is included compared to when the exit page is not included, will be considered 2 separate journeys.