
chore: merge branch
mhucka committed Jan 3, 2024
2 parents 1a4cffb + 62ee474 commit b4289c9
Showing 15 changed files with 299 additions and 171 deletions.
8 changes: 5 additions & 3 deletions .github/workflows/build-sphinx.yml
@@ -11,12 +11,14 @@ name: Build Sphinx docs & publish on GitHub Pages

on:
  push:
    branches:
      - main
  pull_request:
    branches: ['main']
    paths: ['docs/**']
  release:
    types: [published]

jobs:
  Workflow:
    name: Run Sphinx docs & publish GitHub Pages
    runs-on: ubuntu-22.04
    permissions:
      contents: write
21 changes: 0 additions & 21 deletions .github/workflows/test-action.yml

This file was deleted.

37 changes: 37 additions & 0 deletions .github/workflows/test-waystation.yml
@@ -0,0 +1,37 @@
# Workflow for testing Waystation.

name: Test Waystation
run-name: Archive GitHub Pages in the Wayback Machine

on:
  workflow_dispatch:
    inputs:
      dry_run:
        description: "Run without actually sending URLs"
        type: boolean
        default: true
      debug:
        default: true
        type: boolean
      save_errors:
        default: true
        type: boolean
      save_outlinks:
        default: true
        type: boolean
      save_screenshots:
        default: true
        type: boolean

jobs:
  Workflow:
    name: Run Waystation
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@develop
        with:
          debug: ${{github.event.inputs.debug || true}}
          dry_run: ${{github.event.inputs.dry_run || true}}
          save_errors: ${{github.event.inputs.save_errors || false}}
          save_outlinks: ${{github.event.inputs.save_outlinks || false}}
          save_screenshot: ${{github.event.inputs.save_screenshot || false}}
25 changes: 15 additions & 10 deletions .github/workflows/waystation.yml
@@ -1,23 +1,28 @@
# GitHub Actions workflow to run Waystation on this repository.
#
# Copyright 2022-2024 California Institute of Technology.
# License: Modified BSD 3-clause – see file "LICENSE" in the project website.
# Website: https://github.com/caltechlibrary/waystation
# ╭─────────────────── Notice ── Notice ── Notice ───────────────────╮
# │ This is a custom Waystation workflow file. It is different from │
# │ the sample workflow suggested for users. DO NOT COPY THIS FILE; │
# │ instead, use the sample workflow file "sample-workflow.yml" from │
# │ the repository at https://github.com/caltechlibrary/waystation/. │
# ╰─────────────────── Notice ── Notice ── Notice ───────────────────╯

name: "Waystation"
run-name: Archive GitHub Pages in IA
name: Archive GitHub Pages
run-name: Archive GitHub Pages in the Wayback Machine

on:
  release:
    types: [published]
  workflow_dispatch:
    inputs:
      dry_run:
        description: "Run without actually sending URLs"
        type: boolean

jobs:
  Workflow:
    name: Waystation
  run-waystation:
    name: Run Waystation
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          dry_run: ${{github.event.inputs.dry_run || false}}
          debug: true
12 changes: 12 additions & 0 deletions CHANGES.md
@@ -1,5 +1,17 @@
# Change log for Waystation

## Version 1.6 (2024-01-02)

Changes in this release:

* Fixed bug in issue #3. The default value of `dry_run` is now `false`, as it was supposed to be.
* Removed the `save_errors` parameter; it doesn't seem useful for Waystation and just adds complexity.
* Revised and improved the sample workflow.
* Added a file (`sample-workflow.yml`) for the sample workflow, to make copying easier.
* Updated file headers to be more streamlined and include a missing copyright statement.
* Updated the year in the `LICENSE` file.


## Version 1.5 (2022-11-22)

This release updates the version of the [Wayback](https://github.com/marketplace/actions/wayback-machine) action used by Waystation to the latest version (thanks to PR #1 from [Jamie Magee](https://github.com/JamieMagee)), and corrects the misspelling of the author's name (thanks to PR #2 from [Jamie Magee](https://github.com/JamieMagee)).
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -7,8 +7,8 @@ authors:
orcid: https://orcid.org/0000-0001-9105-5960
title: Waystation – Wayback site archiving automation
abstract: GitHub Action to archive a repo's GitHub Pages in the Wayback Machine
version: 1.5
date-released: 2022-11-22
version: 1.6.0
date-released: 2024-01-02
repository-code: "https://github.com/caltechlibrary/waystation"
license-url: "https://github.com/caltechlibrary/waystation/blob/main/LICENSE"
type: software
111 changes: 67 additions & 44 deletions README.md
@@ -24,104 +24,127 @@ Many projects use [GitHub Pages](https://docs.github.com/en/pages) for documenta

### How does Waystation work?

Waystation (<ins><b>Way</b></ins>back <ins><b>s</b></ins>i<ins><b>t</b></ins>e <ins><b>a</b></ins>rchiving automa<ins><b>tion</b></ins>) automates the task of sending your project's [GitHub Pages](https://docs.github.com/en/pages) URL to the [Wayback Machine](https://web.archive.org). It's intended to be triggered on software releases in your repository and uses the [Wayback Machine GitHub Action](https://github.com/marketplace/actions/wayback-machine) to send your repository's configured GitHub Pages URL to the Wayback Machine, thereby ensuring that the latest copy of your site is archived. You can change the trigger condition if needed.
Waystation (<em><ins><b>Way</b></ins>back <ins><b>s</b></ins>i<ins><b>t</b></ins>e <ins><b>a</b></ins>rchiving automa<ins><b>tion</b></ins></em>) sends your project's [GitHub Pages](https://docs.github.com/en/pages) URL to the [Wayback Machine](https://web.archive.org). It's intended to be triggered on software releases in your repository and uses the [Wayback Machine GitHub Action](https://github.com/marketplace/actions/wayback-machine) to send your repository's configured GitHub Pages URL to the Wayback Machine, thereby ensuring that the latest copy of your site is archived. You can change the trigger condition if needed.

### Why would you want to bother with this?

GitHub is incredibly popular today, but the content is not guaranteed to be permanent; moreover, GitHub has in the past [changed the URLs and policies surrounding GitHub Pages](https://ws-dl.blogspot.com/2022/03/2022-03-30-github-is-not-archive-github.html)—and may do so again in the future. The Wayback Machine is a free digital archive of the World Wide Web founded by the [Internet Archive](https://en.wikipedia.org/wiki/Internet_Archive). Web pages saved in the Wayback Machine continue to exist even after the original project repository changes or is removed from the web, and they can be [searched for, shared, and linked to normally](https://help.archive.org/help/using-the-wayback-machine/). You can also view [previous versions of a site](https://archive.org/web/) if they were archived.
GitHub is incredibly popular today, but the content is not guaranteed to be permanent; moreover, GitHub has in the past [changed the URLs and policies surrounding GitHub Pages](https://ws-dl.blogspot.com/2022/03/2022-03-30-github-is-not-archive-github.html)—and may do so again in the future. The Wayback Machine is a free digital archive of the World Wide Web founded by the [Internet Archive](https://en.wikipedia.org/wiki/Internet_Archive). Web pages saved in the Wayback Machine continue to exist even after the original project repository changes or is removed from the web, and the archived pages can be [searched for, shared, and linked to normally](https://help.archive.org/help/using-the-wayback-machine/). You can also view [previous versions of a site](https://archive.org/web/) if they were archived.


## Installation

This action is available from the [GitHub Marketplace](https://github.com/marketplace?type=&verification=&query=waystation). Once you find the page in the GitHub Marketplace, do the following:
To use Waystation, you need to create a GitHub Actions workflow file in your repository. Follow these simple steps.

### Add the workflow file to your repository

1. In the main branch of your repository, create a `.github/workflows` directory if this directory does not already exist.
2. In the `.github/workflows` directory, create a file named `archive-github-pages.yml`.
3. Copy and paste the following content into the file:
3. Copy and paste the [following content](https://raw.githubusercontent.com/caltechlibrary/waystation/main/sample-workflow.yml) into the file:
```yaml
# GitHub Actions workflow for Waystation version 1.5.0.
# Available as the file "sample-workflow.yml" from the software
# repository at https://github.com/caltechlibrary/waystation

name: Archive GitHub Pages
run-name: Archive GitHub Pages in the Wayback Machine

on:
  release:
    types: [published]
  workflow_dispatch:
    inputs:
      dry_run:
        description: "Run without actually sending URLs"
        type: boolean

jobs:
  Workflow:
  run-waystation:
    name: Run Waystation
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          dry_run: ${{github.event.inputs.dry_run || false}}
```
4. Save the file, add it to your git repository, and commit the changes.
5. (If you did the steps above outside of GitHub) Push your repository changes to GitHub.
Refer to the next section for more information.

### Test the workflow

Once you have created the workflow file and pushed it to GitHub, it's wise to do a dry run to test that things work as expected.

1. Go to the _Actions_ tab in your repository and click on the workflow named "Archive GitHub Pages" in the sidebar on the left<p align="center"><img src="https://github.com/caltechlibrary/waystation/raw/develop/docs/_static/media/github-run-workflow.png" alt="Screenshot of GitHub actions workflow list" width="90%"></p>
2. In the page shown by GitHub next, click the <kbd>Run workflow</kbd> button in the right-hand side of the blue strip<p align="center"><img src="https://github.com/caltechlibrary/waystation/raw/develop/docs/_static/media/github-run-workflow-button.png" alt="Screenshot of GitHub Actions workflow run button" width="75%"></p>
3. In the pull-down, click the checkbox for "Run without actually sending URLs"<p align="center"><img src="https://github.com/caltechlibrary/waystation/raw/develop/docs/_static/media/github-workflow-options-circled.png" alt="Screenshot of GitHub Actions workflow menu" width="40%"></p>
4. Click the green <kbd>Run workflow</kbd> button near the bottom
5. Refresh the web page, and a new line named after your workflow file will be shown<p align="center"><img src="https://github.com/caltechlibrary/waystation/raw/develop/docs/_static/media/github-workflow-running.png" alt="Screenshot of GitHub Actions running" width="90%"></p>
6. Click the title of that workflow to make GitHub show the progress and results of running Waystation

## Usage
The trigger condition that causes Waystation to run is determined by the `on` statement in your `archive-github-pages.yml` workflow file. The examples shown here use `on: release` to trigger when software is released, but you can use [other trigger events defined by GitHub](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows) if you wish.
Once installed, the sample workflow will run automatically the next time you publish a release on GitHub. The trigger condition that causes Waystation to run automatically is determined by the `on` statement in your `archive-github-pages.yml` workflow file. The examples shown here use `on: release` to trigger when a release is published, but you can use [other trigger events defined by GitHub](https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows) if you wish.
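Suppose a project wants to archive its site whenever documentation changes land on `main`, rather than on each release. It might replace the `on` block of the sample workflow above with something like the following. This is a hypothetical sketch, not part of Waystation. The branch name `main` and the `docs/**` path filter are assumptions about how a given repository is laid out. Archiving on every matching push may also send the URL to the Wayback Machine more often than necessary.

```yaml
# Hypothetical variant of archive-github-pages.yml (not from Waystation itself).
# Assumption: documentation sources live under docs/ on the main branch.
name: Archive GitHub Pages
run-name: Archive GitHub Pages in the Wayback Machine

on:
  push:
    branches: ['main']
    paths: ['docs/**']
  # Keep the manual trigger so the dry-run checkbox is still available.
  workflow_dispatch:
    inputs:
      dry_run:
        description: "Run without actually sending URLs"
        type: boolean

jobs:
  run-waystation:
    name: Run Waystation
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          dry_run: ${{github.event.inputs.dry_run || false}}
```

If the Pages site is deployed by a separate workflow, a `push` trigger may fire before the new pages are live. GitHub's `workflow_run` event, which starts one workflow after another completes, is one way to sequence the two.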

Several parameters control the behavior of Waystation; they are described below.
Several optional parameters control the behavior of Waystation; they are described below.


### `dry_run` (default: `false`)

Setting the parameter `dry_run` to `true` will cause the action to execute without sending the URL to the Wayback Machine. This is useful during testing, especially if you want to try different trigger conditions.
Setting the parameter `dry_run` to `true` will cause the action to execute without sending the URL to the Wayback Machine. This is mainly useful for testing, especially if you want to try different trigger conditions.

Here is an example workflow definition using `dry_run`:
The [sample workflow file](https://raw.githubusercontent.com/caltechlibrary/waystation/main/sample-workflow.yml) (shown [above](#add-the-workflow-file-to-your-repository)) includes a `dry_run` parameter checkbox when invoked manually. You can use that to set the value on an individual per-run basis. To change the default value (for example, when experimenting with different trigger conditions), you can do so by changing the `false` to `true` in the last line of the sample workflow. That is, change the last line from

```yaml
# .github/workflows/archive-github-pages.yml
on:
  release:
    types: [published]
jobs:
  Workflow:
    runs-on: ubuntu-latest
    steps:
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          dry_run: true
dry_run: ${{github.event.inputs.dry_run || false}}
```

to

```yaml
dry_run: ${{github.event.inputs.dry_run || true}}
```

### `debug` (default: `false`)

Setting the parameter `debug` to `true` will cause the action to print the values of the input variables
and the GitHub context. This is useful for debugging the workflow.
### `debug` (default: `false`)

Here is an example workflow definition using `debug`:
Passing the parameter `debug` with a value of `true` will cause Waystation to print the values of the input variables and the GitHub context at run time. This is useful for debugging the workflow. To set the `debug` parameter, add it as part of the `with:` block in the workflow file. For example:

```yaml
on:
  release:
    types: [published]
jobs:
  Workflow:
    runs-on: ubuntu-latest
    steps:
      ...
      - uses: caltechlibrary/waystation@main
        with:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          dry_run: true
          dry_run: ${{github.event.inputs.dry_run || false}}
          debug: true
      ...
```


### `save_errors` (default: `false`)

This corresponds to the parameter [`saveErrors`](https://github.com/JamieMagee/wayback#saveerrors) in the [Wayback Machine GitHub Action](https://github.com/marketplace/actions/wayback-machine). A value of `true` will make the action tell the Wayback Machine to save web pages that return an HTTP status code in the range 4xx or 5xx. The default is `false`.


### `save_outlinks` (default: `true`)

This corresponds to the parameter [`saveOutlinks`](https://github.com/JamieMagee/wayback#saveOutlinks) in the [Wayback Machine GitHub Action](https://github.com/marketplace/actions/wayback-machine). A value of `true` will make the action tell the Wayback Machine to archive external pages that are linked to from your GitHub Pages. The default in Waystation is `true` (unlike the default in the Wayback Machine GitHub Action) because the author finds this useful in producing a more complete archive of a GitHub Pages site.
This corresponds to the parameter [`saveOutlinks`](https://github.com/JamieMagee/wayback#saveOutlinks) in the [Wayback Machine GitHub Action](https://github.com/marketplace/actions/wayback-machine). A value of `true` will make the action tell the Wayback Machine to archive external pages that are linked to from your GitHub Pages. The default in Waystation is `true` because Waystation's author finds this useful in producing a more complete archive of a GitHub Pages site. To set the `save_outlinks` parameter, add it as part of the `with:` block in the workflow file. For example:

```yaml
...
- uses: caltechlibrary/waystation@main
  with:
    dry_run: ${{github.event.inputs.dry_run || false}}
    save_outlinks: true
...
```


### `save_screenshot` (default: `true`)

This corresponds to the parameter [`saveScreenshot`](https://github.com/JamieMagee/wayback#saveScreenshot) in the [Wayback Machine GitHub Action](https://github.com/marketplace/actions/wayback-machine). A value of `true` will make the action tell the Wayback Machine to save a screenshot of the page located at the GitHub Pages URL. The default in Waystation is `true` (unlike the default in the Wayback Machine GitHub Action) because the author finds this useful in producing a more complete archive of a GitHub Pages site.
This corresponds to the parameter [`saveScreenshot`](https://github.com/JamieMagee/wayback#saveScreenshot) in the [Wayback Machine GitHub Action](https://github.com/marketplace/actions/wayback-machine). A value of `true` will make the action tell the Wayback Machine to save a screenshot of the page located at the GitHub Pages URL. The default in Waystation is `true` because Waystation's author finds this useful in producing a more complete archive of a GitHub Pages site. To set the `save_screenshot` parameter, add it as part of the `with:` block in the workflow file. For example:

```yaml
...
- uses: caltechlibrary/waystation@main
  with:
    dry_run: ${{github.event.inputs.dry_run || false}}
    save_screenshot: true
...
```
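Both `save_outlinks` and `save_screenshot` default to `true` in Waystation. The examples above therefore restate the default, and opting out means passing `false`. The step below is a minimal sketch that turns both off, for instance to keep an experimental archive request small. Disabling the two together is purely illustrative; each parameter can be set independently.

```yaml
...
- uses: caltechlibrary/waystation@main
  with:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    dry_run: ${{github.event.inputs.dry_run || false}}
    # Waystation defaults both of these to true; false opts out.
    save_outlinks: false
    save_screenshot: false
...
```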


## Getting help
@@ -136,7 +159,7 @@ Your help and participation in enhancing Waystation is welcome! Please visit th

## License

Software produced by the Caltech Library is Copyright © 2022 California Institute of Technology. This software is freely distributed under a modified BSD 3-clause license. Please see the [LICENSE](LICENSE) file for more information.
Software produced by the Caltech Library is Copyright © 2022–2024 California Institute of Technology. This software is freely distributed under a modified BSD 3-clause license. Please see the [LICENSE](LICENSE) file for more information.


## Acknowledgments