Merge pull request #242 from cal-itp/update-readme
Merge pull request #242 from cal-itp/update-readme
Organized the README
atvaccaro authored Apr 12, 2023
2 parents ba967d9 + 5da1506 commit c6bc219
Showing 1 changed file (README.md) with 44 additions and 53 deletions.

GTFS data quality reports for California transit providers

#### Repository structure

This repository is set up in two pieces:

- `reports/` subfolder - generates underlying GTFS data for each report.
- `website/` subfolder - uses `generate.py` and `../templates/` to create the static reports website.

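Sketched as a directory tree (based on the paths mentioned above):

```text
.
├── reports/     # Makefile and Python scripts that generate the underlying GTFS report data
├── templates/   # report templates applied by website/generate.py
└── website/     # generate.py and the npm build for the static reports site
```
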
## To Get Started

### Set up Google Cloud credentials

Set up [google cloud authentication credentials](https://cloud.google.com/docs/authentication/getting-started).

Specifically, download the SDK/CLI at the above link, install it, create a new t

Note that with user account authentication, the environment variable `CALITP_SERVICE_KEY_PATH` should be unset.

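For user-account authentication, the setup can be sketched as follows (these are standard Google Cloud SDK commands; `CALITP_SERVICE_KEY_PATH` is the variable mentioned above):

```shell
# Log in with a user account (opens a browser window)
gcloud auth login

# Create application-default credentials, used by client libraries
gcloud auth application-default login

# With user-account authentication, the service-key variable should be unset
unset CALITP_SERVICE_KEY_PATH
```
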
### To Run Locally

#### with a Virtual Environment

1. `source .venv/bin/activate` to activate Python virtual environment
2. `pip install -r requirements.txt` to download Python dependencies
3. `npm install` to download npm dependencies

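Taken together, and assuming the virtual environment has not yet been created (the `python -m venv` step is not part of the list above):

```shell
python -m venv .venv               # one-time setup (assumed)
source .venv/bin/activate          # activate the Python virtual environment
pip install -r requirements.txt    # Python dependencies
npm install                        # npm dependencies
```
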
#### with Docker-compose

Note that the folder also contains a `docker-compose.yml`, so it is possible to run the build inside Docker by running these commands first.
In this case, Docker first needs to be [installed locally](https://docs.docker.com/get-docker/), with resources set as desired (e.g. enable 6 cores if you have an 8-core machine).
```shell
docker-compose run --rm --service-ports calitp_reports /bin/bash
```

If Google credentials are already configured on the host, the local credential files should already be mounted in the container, but it may still be necessary to run `gcloud auth application-default login` from within the container.

## Executing Report Generation

See [this screencast](https://www.loom.com/share/b45317053ff54b9fbb46b8159947c379) for a full walkthrough of building the reports.

### Generating the Reports Data

The following takes place within the reports subfolder (i.e. `cd reports`).

When looking for a clean start (i.e. start from scratch) run:

```shell
make clean
```

#### Fetch existing report data
Run the `gsutil rsync` command below to update all of the locally stored reports.
Note that `gtfs-data-test` can be replaced with `gtfs-data` for testing on production data:

```shell
gsutil -m rsync -r gs://gtfs-data-test/report_gtfs_schedule outputs
```

#### Generate the index file and create the outputs folder structure
`make generate_parameters` runs the `generate_ids.py` script, which generates:
1. `outputs/index_report.json` - a file that lists every agency name and `outputs/YYYY/MM` folder
2. `outputs/YYYY/MM` for every agency

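The resulting layout looks roughly like this (`AGENCY_NUM` stands in for each agency's folder):

```text
outputs/
├── index_report.json    # every agency name and its outputs/YYYY/MM folder
└── YYYY/
    └── MM/
        └── AGENCY_NUM/  # one folder per agency
```
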
#### Run the data generation
`make MONTH=02 YEAR=2023 all -j 15` runs the following command:
1. `python generate_reports_data.py -v --f outputs/YYYY/MM/AGENCY_NUM/1_file_info.json` generates JSON files in the `outputs/YYYY/MM/AGENCY_NUM/` directories

```shell
make generate_parameters
make MONTH=02 YEAR=2023 all -j 15
```
Where:
* the number after `MONTH=` is the desired numerical month (`02` in this case)
* the number after `YEAR=` is the desired numerical year (`2023` in this case)
* the number after `-j` is the number of parallel threads (`15` in this case)

The files in each `outputs/YYYY/MM/AGENCY_NUM/` directory are used to generate the static HTML (see below).

This will create data for one month within the reports/outputs folder. All report data for every month can be generated by running: ``python run_all_months.py``.

**NOTE** that the MONTH refers to the month of the folders that will be generated. This is different than the ``publish_date``, which is the first day of the next month for a given report. I.e. ``make MONTH=02 YEAR=2023 all -j 15`` will create ``outputs/2023/02/*`` folders, whereas the ``publish_date`` for the data in those folders is ``2023-03-01``.

**NOTE** that running too many threads (i.e. parallel queries, such as `30` or more) may not complete successfully if many other BigQuery queries are happening simultaneously: [BigQuery has a limit of 100 concurrent queries](https://cloud.google.com/bigquery/quotas). If this is the case, try rerunning with fewer threads (i.e. `make all -j 8`).

#### Validating the report creation

If there is a missing month, an individual month can be run with the following command:

```shell
python generate_reports_data.py -v --f outputs/YYYY/MM/AGENCY_NUM/1_file_info.json
```

#### Testing

Tests can be run locally from the ``tests`` directory by running ``python test_report_data.py``. These tests also run on commits via a GitHub Action.

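For example, from the repository root:

```shell
cd tests
python test_report_data.py
```
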
### Building the website

Once the report data has been generated, navigate to the website subfolder (i.e. `cd ../website`), install the npm dependencies if you haven't done so already, and build the website.

```shell
npm install
npm run build
```

These commands perform the following:

- Python script `website/generate.py` loads JSON from the `reports/outputs/YYYY/MM/ITPID/data` directory and applies it to template files in `/templates`
- HTML templates written with [Jinja](https://jinja.palletsprojects.com/en/3.0.x/)
- CSS written with [SCSS](https://sass-lang.com/documentation/syntax#scss) and [Tailwind](https://tailwindcss.com/docs) via [PostCSS](https://postcss.org/)
- JS behavior added with [Alpine.js](https://alpinejs.dev)
- Bundled with [Rollup](https://rollupjs.org/guide/en/)
- Build scripts via [NPM](https://www.npmjs.com/)

It is worth mentioning that `npm run build` will currently only execute if you have data from previous months. Run ``npm run dev`` for verbose output and to see which month is failing, which can help with troubleshooting.

Note that the error:
```shell
jinja2.exceptions.UndefinedError: 'feed_info' is undefined
```
is often due to a lack of generated reports. This can be remedied for prior months by rsyncing the reports from the upstream source (see [Fetch existing report data](#fetch-existing-report-data)) and ensuring every single ITPID has a corresponding generated report for the current month (see [Generating the Reports Data](#generating-the-reports-data)).

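A sketch of that remedy, using the commands shown earlier (run from the `reports/` folder; the `outputs/YYYY/MM/AGENCY_NUM` path is a placeholder for the missing report):

```shell
# re-fetch previously generated reports from the test bucket
gsutil -m rsync -r gs://gtfs-data-test/report_gtfs_schedule outputs

# regenerate the report for any agency/month that is still missing
python generate_reports_data.py -v --f outputs/YYYY/MM/AGENCY_NUM/1_file_info.json
```
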
#### Viewing the website

To check that everything is rendered appropriately, go into the `website/build` directory (i.e. `cd build`):

```shell
python -m http.server
```
and open up a web browser, and navigate to:
[localhost:8000](http://localhost:8000)

### Pushing Data to Google Cloud

#### Pushing to Development

The next step is to update the development bucket in Google Cloud with the new data.
In the case where data must be overwritten (please use caution!), a `-d` flag can be added to the command
PR to main. This site can be viewed at `https://development-build--cal-itp-repor
> you can produce empty commits with `git commit --allow-empty` and merge those
> into the main branch.
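
A sketch of what the development push likely looks like, assuming it simply mirrors the fetch `rsync` with source and destination reversed (the exact command may differ):

```shell
# from the reports/ folder: push local outputs to the development bucket
gsutil -m rsync -r outputs gs://gtfs-data-test/report_gtfs_schedule

# -d also deletes destination files not present locally -- use caution!
gsutil -m rsync -d -r outputs gs://gtfs-data-test/report_gtfs_schedule
```
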
#### Pushing to Production

Assuming that all the data is correct in development, you can sync the test data to production.

