Continuously matches realtime transit data in the VDV-454 structure against a GTFS Schedule dataset and generates GTFS Realtime (GTFS-RT) data.
Tip
If you're just looking for VBB's publicly deployed GTFS-RT feed:
Tip
Although gtfs-rt-feed can be used standalone, it is intended to be used in tandem with vdv-453-nats-adapter (which pulls the input VDV-454 data from a VDV-453/-454 API) and nats-consuming-gtfs-rt-server (which combines the DIFFERENTIAL-mode GTFS-RT data sent by gtfs-rt-feed into a single non-differential feed and serves it via HTTP).
For more details about the architecture gtfs-rt-feed has been designed for, refer to the VBB deployment's readme.
It uses the PostGIS GTFS importer to import the GTFS Schedule data into a new PostgreSQL database whenever it has changed.
This service reads VDV-454 IstFahrts (in JSON instead of XML) from a NATS message queue:
// To be more readable, this example only contains essential fields. In practice, there are more.
{
"LinienID": "M77",
"LinienText": "M77",
"FahrtID": {
"FahrtBezeichner": "9325_877_8_2_19_1_1806#BVG",
"Betriebstag": "2024-09-20",
},
"IstHalts": [
{
"HaltID": "900073281",
"Abfahrtszeit": "2024-09-20T12:41:00Z",
"IstAbfahrtPrognose": "2024-09-20T13:47:00+01:00", // 6 minutes delay
},
{
"HaltID": "900073236",
"Ankunftszeit": "2024-09-20T12:43:00Z",
"Abfahrtszeit": "2024-09-20T12:45:00Z",
"IstAnkunftPrognose": "2024-09-20T13:46:00+01:00", // 3 minutes delay
"IstAbfahrtPrognose": "2024-09-20T13:47:00+01:00", // 2 minutes delay
},
// Usually there are more IstHalts, but the IstFahrt may not be complete.
],
}
First, it is transformed into a GTFS-RT TripUpdate, so that subsequent steps only have to deal with GTFS-RT concepts.
// Again, this example has been shortened for readability.
{
"trip": {},
"stop_time_update": [
{
"stop_id": "900073281",
"departure": {
"time": 1726836420,
"delay": 360,
},
},
{
"stop_id": "900073236",
"arrival": {
"time": 1726836360,
"delay": 180,
},
"departure": {
"time": 1726836420,
"delay": 120,
},
},
],
// not part of the GTFS Realtime spec, we just use it for matching and/or debug-logging
[kRouteShortName]: "M77",
}
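The transformation step can be sketched roughly as follows. This is a minimal illustration, not gtfs-rt-feed's actual code: the helper names (toUnixSeconds, formatStopTimeEvent, istFahrtToTripUpdate) are hypothetical, and it assumes that `delay` is simply the predicted time minus the scheduled time, in seconds. The route short name and the `trip` descriptor are left out for brevity.

```javascript
// Illustrative sketch only; all function names here are hypothetical.

// Converts an ISO 8601 timestamp into UNIX seconds, as used by GTFS-RT.
const toUnixSeconds = (iso8601) => Math.round(Date.parse(iso8601) / 1000)

// Builds a GTFS-RT StopTimeEvent from a scheduled and a predicted time.
// Returns null if there is no prediction.
const formatStopTimeEvent = (scheduled, predicted) => {
  if (!predicted) return null
  const event = {time: toUnixSeconds(predicted)}
  // Assumption: delay = predicted time - scheduled time, in seconds.
  if (scheduled) event.delay = event.time - toUnixSeconds(scheduled)
  return event
}

const istFahrtToTripUpdate = (istFahrt) => ({
  trip: {},
  stop_time_update: istFahrt.IstHalts.map((istHalt) => {
    const stopTimeUpdate = {stop_id: istHalt.HaltID}
    const arrival = formatStopTimeEvent(istHalt.Ankunftszeit, istHalt.IstAnkunftPrognose)
    if (arrival) stopTimeUpdate.arrival = arrival
    const departure = formatStopTimeEvent(istHalt.Abfahrtszeit, istHalt.IstAbfahrtPrognose)
    if (departure) stopTimeUpdate.departure = departure
    return stopTimeUpdate
  }),
})

// The two IstHalts from the example above.
const exampleTripUpdate = istFahrtToTripUpdate({
  LinienText: 'M77',
  IstHalts: [{
    HaltID: '900073281',
    Abfahrtszeit: '2024-09-20T12:41:00Z',
    IstAbfahrtPrognose: '2024-09-20T13:47:00+01:00',
  }, {
    HaltID: '900073236',
    Ankunftszeit: '2024-09-20T12:43:00Z',
    Abfahrtszeit: '2024-09-20T12:45:00Z',
    IstAnkunftPrognose: '2024-09-20T13:46:00+01:00',
    IstAbfahrtPrognose: '2024-09-20T13:47:00+01:00',
  }],
})
console.log(JSON.stringify(exampleTripUpdate, null, 2))
```

Because the timestamps are parsed with their UTC offsets, predictions given in local time and schedule times given in UTC can be compared directly.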
Within the imported GTFS Schedule data, gtfs-rt-feed then tries to find trip "instances" that
- have the same route_short_name ("M77"),
- for at least two IstHalts, stop at (roughly) the same scheduled time (2024-09-20T12:41:00Z) at (roughly) the same stop (900073281).
If there is exactly one such GTFS Schedule trip "instance", we call it a match. If there is more than one trip "instance", we consider the match ambiguous and not specific enough, so we stop processing the IstFahrt.
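The matching rule can be illustrated with a small in-memory sketch. It is not how gtfs-rt-feed queries the database: findMatch, stopsRoughlyAt and the shape of the schedule "instances" are hypothetical, the 60-second tolerance is an arbitrary assumption, and "roughly the same stop" is simplified to an exact stop ID comparison.

```javascript
// Illustrative sketch only; names, data shapes and the tolerance are assumptions.
const TOLERANCE_SECONDS = 60

// Does this schedule trip "instance" stop at `stopId` at roughly `time` (UNIX seconds)?
const stopsRoughlyAt = (instance, stopId, time) => instance.stop_times.some((st) => (
  st.stop_id === stopId && Math.abs(st.time - time) <= TOLERANCE_SECONDS
))

// Returns the single matching instance, or null if there is none or more than one.
const findMatch = (routeShortName, scheduledStops, instances) => {
  const candidates = instances.filter((instance) => {
    if (instance.route_short_name !== routeShortName) return false
    const nrOfMatchingStops = scheduledStops
      .filter(({stop_id, time}) => stopsRoughlyAt(instance, stop_id, time))
      .length
    return nrOfMatchingStops >= 2
  })
  return candidates.length === 1 ? candidates[0] : null
}

// Scheduled times taken from the IstHalts: 12:41Z departure, 12:43Z arrival.
const scheduledStops = [
  {stop_id: '900073281', time: 1726836060},
  {stop_id: '900073236', time: 1726836180},
]
const instanceA = {
  trip_id: '1234567',
  route_short_name: 'M77',
  stop_times: [
    {stop_id: '900073281', time: 1726836060},
    {stop_id: '900073236', time: 1726836180},
  ],
}
const instanceB = {...instanceA, trip_id: '7654321'}

console.log(findMatch('M77', scheduledStops, [instanceA])) // instanceA
console.log(findMatch('M77', scheduledStops, [instanceA, instanceB])) // null (ambiguous)
```

Returning null for both "no candidate" and "several candidates" mirrors the rule above: only an unambiguous match is processed further.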
The GTFS Schedule trip "instance" is then formatted as a GTFS-RT TripUpdate (it contains no realtime data). Then the schedule TripUpdate and the matched realtime TripUpdate get merged into a single new TripUpdate.
// Again, this example has been shortened for readability.
{
"trip": {
"trip_id": "1234567",
"route_id": "17462_700",
},
"stop_time_update": [
{
"stop_id": "de:11000:900073281",
// Note that `arrival` has been filled in from schedule data.
"arrival": {
"time": 1726836060,
},
"departure": {
"time": 1726836420,
"delay": 360,
},
},
{
"stop_id": "de:11000:900073236",
"arrival": {
"time": 1726836360,
"delay": 180,
},
"departure": {
"time": 1726836420,
"delay": 120,
},
},
],
// not part of the GTFS Realtime spec, we just use it for matching and/or debug-logging
[kRouteShortName]: "M77",
}
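The merge step could look roughly like the sketch below. It is only an illustration under stated assumptions: mergeTripUpdates, mergeEvent and stopKey are hypothetical names, and it assumes that schedule stop IDs ("de:11000:900073281") and realtime stop IDs ("900073281") can be paired by comparing the last colon-separated segment. Realtime fields take precedence; anything missing is filled in from the schedule.

```javascript
// Illustrative sketch only; names and the stop ID pairing rule are assumptions.
const stopKey = (stopId) => stopId.split(':').pop()

// Realtime fields win, schedule fields fill the gaps.
const mergeEvent = (scheduled, realtime) => (
  scheduled || realtime ? {...scheduled, ...realtime} : undefined
)

const mergeTripUpdates = (schedule, realtime) => {
  const realtimeByStop = new Map(
    realtime.stop_time_update.map((stu) => [stopKey(stu.stop_id), stu]),
  )
  return {
    ...schedule, // keeps the schedule's trip descriptor (trip_id, route_id)
    stop_time_update: schedule.stop_time_update.map((scheduled) => {
      const rt = realtimeByStop.get(stopKey(scheduled.stop_id))
      if (!rt) return scheduled
      return {
        ...scheduled,
        arrival: mergeEvent(scheduled.arrival, rt.arrival),
        departure: mergeEvent(scheduled.departure, rt.departure),
      }
    }),
  }
}

const scheduleTripUpdate = {
  trip: {trip_id: '1234567', route_id: '17462_700'},
  stop_time_update: [{
    stop_id: 'de:11000:900073281',
    arrival: {time: 1726836060},
    departure: {time: 1726836060},
  }, {
    stop_id: 'de:11000:900073236',
    arrival: {time: 1726836180},
    departure: {time: 1726836300},
  }],
}
const realtimeTripUpdate = {
  trip: {},
  stop_time_update: [{
    stop_id: '900073281',
    // 12:47Z, 6 minutes after the scheduled 12:41Z
    departure: {time: 1726836420, delay: 360},
  }],
}
const merged = mergeTripUpdates(scheduleTripUpdate, realtimeTripUpdate)
console.log(JSON.stringify(merged, null, 2))
```

Iterating over the schedule's stops (rather than the realtime ones) keeps stops in the result for which the IstFahrt carried no data.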
This whole process, which we call matching, is done continuously for each VDV-454 IstFahrt received from NATS.
There is a Docker image available:
# Pull the Docker images …
docker pull ghcr.io/opendatavbb/gtfs-rt-feed
docker pull ghcr.io/mobidata-bw/postgis-gtfs-importer:v4 # needed for importing GTFS Schedule data
# … or install everything manually (you will need Node.js & npm).
git clone https://github.com/OpenDataVBB/gtfs-rt-feed.git gtfs-rt-feed
cd gtfs-rt-feed
npm install --omit dev
# install submodules' dependencies
git submodule update --checkout
cd postgis-gtfs-importer && npm install --omit dev
Important
Although gtfs-rt-feed is intended to be data-source-agnostic, just following the GTFS Schedule and GTFS-RT specs, it currently has some hard-coded assumptions specific to the VBB deployment it has been developed for. Please create an Issue if you want to use gtfs-rt-feed in another setting.
gtfs-rt-feed needs access to the following services to work:
- a NATS message queue with JetStream enabled
- a PostgreSQL database server, with the permission to dynamically create new databases (see postgis-gtfs-importer's readme)
- a Redis in-memory cache
gtfs-rt-feed uses pg to connect to PostgreSQL. For details about supported environment variables and their defaults, refer to pg's docs.
To make sure that the connection works, use psql from the same context (same permissions, same container if applicable, etc.).
gtfs-rt-feed uses nats to connect to NATS. You can use the following environment variables to configure access:
- $NATS_SERVERS – list of NATS servers (e.g. localhost:4222), separated by commas
- $NATS_USER & $NATS_PASSWORD – if you need authentication
- $NATS_CLIENT_NAME – the connection name
By default, gtfs-rt-feed will connect as gtfs-rt-$MAJOR_VERSION to localhost:4222 without authentication.
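As a rough illustration of how these variables and defaults fit together, the following sketch maps them onto the connection options accepted by the nats client (servers, user, pass, name). The function natsOptionsFromEnv and its majorVersion parameter are hypothetical and not part of gtfs-rt-feed; the actual code may read the environment differently.

```javascript
// Illustrative sketch only; natsOptionsFromEnv is a hypothetical helper.
const natsOptionsFromEnv = (env, majorVersion) => {
  const opts = {
    // comma-separated list, falling back to the documented default
    servers: env.NATS_SERVERS
      ? env.NATS_SERVERS.split(',').map((server) => server.trim())
      : ['localhost:4222'],
    name: env.NATS_CLIENT_NAME || `gtfs-rt-${majorVersion}`,
  }
  // only authenticate if a user has been configured
  if (env.NATS_USER) {
    opts.user = env.NATS_USER
    opts.pass = env.NATS_PASSWORD
  }
  return opts
}

console.log(natsOptionsFromEnv({}, 2))
console.log(natsOptionsFromEnv({
  NATS_SERVERS: 'nats-1:4222,nats-2:4222',
  NATS_USER: 'gtfs-rt-feed',
  NATS_PASSWORD: 'secret',
}, 2))
```

The resulting object could then be passed to the nats client's connect() function.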
We also need to create a NATS JetStream stream called AUS_ISTFAHRT_2 that gtfs-rt-feed will read (unmatched) VDV-454 AUS IstFahrt messages from. This can be done using the NATS CLI:
# Create the stream AUS_ISTFAHRT_2 (the last argument is the name of the stream).
# --defaults: omit this if you want to configure more details
# --subjects: collect all messages published to these subjects
# --ack: acknowledge publishes
# --retention/--discard: with limited storage, discard the oldest messages first
nats stream add \
  --defaults \
  --subjects='aus.istfahrt.>' \
  --ack \
  --retention=limits --discard=old \
  --description='VDV-454 AUS IstFahrt messages' \
  AUS_ISTFAHRT_2
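The subject filter aus.istfahrt.> uses NATS's wildcard syntax: * matches exactly one token, and > (only valid as the last token) matches one or more trailing tokens. The following sketch re-implements that rule to show which subjects the stream would collect. subjectMatches is a hypothetical helper for illustration, and the concrete subjects in the example are made up, since the exact subjects published by the adapter are not specified here.

```javascript
// Illustrative re-implementation of NATS subject wildcard matching.
const subjectMatches = (pattern, subject) => {
  const p = pattern.split('.')
  const s = subject.split('.')
  for (let i = 0; i < p.length; i++) {
    // `>` must be the last token and needs at least one token to match
    if (p[i] === '>') return i === p.length - 1 && s.length > i
    if (i >= s.length) return false
    if (p[i] !== '*' && p[i] !== s[i]) return false
  }
  return p.length === s.length
}

// The subjects below are made-up examples.
console.log(subjectMatches('aus.istfahrt.>', 'aus.istfahrt.foo')) // true
console.log(subjectMatches('aus.istfahrt.>', 'aus.istfahrt.foo.bar')) // true
console.log(subjectMatches('aus.istfahrt.>', 'aus.istfahrt')) // false
console.log(subjectMatches('aus.istfahrt.>', 'gtfsrt.foo')) // false
```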
On the AUS_ISTFAHRT_2 stream, we create a durable consumer called gtfs-rt-feed:
# Create the consumer gtfs-rt-feed on the stream AUS_ISTFAHRT_2.
# The last two arguments are the name of the stream and the name of the consumer.
# --defaults: omit this if you want to configure more details
# --pull: create a pull-based consumer (refer to the NATS JetStream docs)
# --ack=explicit: let gtfs-rt-feed explicitly acknowledge all received messages
# --deliver=new: let the newly created consumer start with the latest messages in AUS_ISTFAHRT_2 (not all)
# --max-pending=200: send gtfs-rt-feed at most 200 messages at once
# --max-deliver & --backoff*: when & how often to re-deliver a message that hasn't been acknowledged (usually because it couldn't be processed)
nats consumer add \
  --defaults \
  --pull \
  --ack=explicit \
  --deliver=new \
  --max-pending=200 \
  --max-deliver=3 \
  --backoff=linear \
  --backoff-steps=2 \
  --backoff-min=15s \
  --backoff-max=2m \
  --description 'OpenDataVBB/gtfs-rt-feed' \
  AUS_ISTFAHRT_2 \
  gtfs-rt-feed
Next, again using the NATS CLI, we'll create a stream called GTFS_RT_2 that the gtfs-rt-feed service will write (matched) GTFS-RT messages into:
# Create the stream GTFS_RT_2 (the last argument is the name of the stream).
# --defaults: omit this if you want to configure more details
# --subjects: collect all messages published to these subjects
# --ack: acknowledge publishes
# --retention/--discard: with limited storage, discard the oldest messages first
nats stream add \
  --defaults \
  --subjects='gtfsrt.>' \
  --ack \
  --retention=limits --discard=old \
  --description='GTFS-RT messages' \
  GTFS_RT_2
gtfs-rt-feed uses ioredis to connect to Redis. For details about supported environment variables and their defaults, refer to its docs.
Make sure your GTFS Schedule dataset is available via HTTP without authentication. Configure the URL using $GTFS_DOWNLOAD_URL. Optionally, you can configure the User-Agent being used for downloading by setting $GTFS_DOWNLOAD_USER_AGENT.
The GTFS import script will
- download the GTFS dataset;
- import it into a separate database called gtfs_$timestamp_$gtfs_hash (each revision gets its own database);
- keep track of the latest successfully imported database's name in a meta "bookkeeping" database ($PGDATABASE by default).
Refer to postgis-gtfs-importer's docs for details about why this is done and how it works.
Optionally, you can
- activate gtfstidy-ing before import using GTFSTIDY_BEFORE_IMPORT=true;
- postprocess the imported GTFS dataset using custom SQL scripts by putting them in $PWD/gtfs-postprocessing.d.
Refer to the import script for details about how to customize the GTFS Schedule import.
export GTFS_DOWNLOAD_URL='…'
# Run import using Docker …
./import.sh --docker
# … or run import using ./postgis-gtfs-importer
./import.sh
Once the import has finished, you must set $PGDATABASE to the name of the newly created database.
export PGDATABASE="$(psql -q --csv -t -c 'SELECT db_name FROM latest_import')"
Note
If you're running gtfs-rt-feed in a continuous (service-like) fashion, you'll want to run the GTFS Schedule import regularly, e.g. once per day. postgis-gtfs-importer won't import again if the dataset hasn't changed.
How to schedule the import, and how to modify $PGDATABASE for the gtfs-rt-feed process afterwards, depends heavily on your deployment strategy and preferences, so this repo doesn't contain any tool for that.
As an example, VBB's deployment uses a systemd timer to schedule the import, and a systemd service drop-in file to set $PGDATABASE.
# Run using Docker …
# (In production, use the container deployment tool of your choice.)
# note: pass through other environment variables using additional -e flags
docker run --rm -it \
  -e PGDATABASE \
  ghcr.io/opendatavbb/gtfs-rt-feed
# … or manually.
# (During development, pipe the logs through `./node_modules/.bin/pino-pretty`.)
node index.js
todo: $LOG_LEVEL
todo: $LOG_LEVEL_MATCHING
todo: $LOG_LEVEL_FORMATTING
todo: $LOG_LEVEL_STATION_WEIGHT
todo: $METRICS_SERVER_PORT
todo: $MATCHING_CONCURRENCY
todo: $MATCH_GTFS_RT_TO_GTFS_CACHING
todo: $MATCHING_CONSUMER_NAME
todo: $MATCHING_PUBLISH_UNMATCHED_TRIPUPDATES
todo: $PG_POOL_SIZE
The example docker-compose.yml starts up a complete set of containers (vbb-gtfs-rt-server and all of its dependencies: PostgreSQL & NATS).
Warning
The Docker Compose setup is only intended as a quick demo on how to run gtfs-rt-feed and its dependency services.
Be sure to set POSTGRES_PASSWORD, either via a .env file or an environment variable.
POSTGRES_PASSWORD=my_secret_password docker-compose up
gtfs-rt-feed writes pino-formatted log messages to stdout, so you can use pino-compatible tools to process them.
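Because pino writes one JSON object per line with a numeric level field (by default 10 = trace, 20 = debug, 30 = info, 40 = warn, 50 = error, 60 = fatal), the logs are also easy to post-process yourself. The following sketch keeps only entries at or above a given level. filterLogLines is a hypothetical helper, and the sample log lines are made up, not real gtfs-rt-feed output.

```javascript
// pino's default numeric log levels
const PINO_LEVELS = {trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60}

// Parses newline-delimited JSON log lines and keeps entries >= minLevel.
// Lines that are empty or not valid JSON are skipped.
const filterLogLines = (lines, minLevel = 'warn') => lines
  .filter((line) => line.trim() !== '')
  .flatMap((line) => {
    try {
      return [JSON.parse(line)]
    } catch {
      return []
    }
  })
  .filter((entry) => entry.level >= PINO_LEVELS[minLevel])

// Made-up sample lines, for illustration only.
const sampleLines = [
  '{"level":30,"time":1726836060000,"msg":"some info message"}',
  '{"level":50,"time":1726836061000,"msg":"some error message"}',
  'not JSON',
]
console.log(filterLogLines(sampleLines, 'warn'))
```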
gtfs-rt-feed exposes Prometheus-compatible metrics via HTTP. By default, the metrics server will listen on a random port. You can configure a permanent port using $METRICS_SERVER_PORT.
The following kinds of metrics will be exported:
- domain-specific metrics, e.g.
  - number of successful/failed/errored matchings
  - DB/cache query timings
- technical details about the Node.js process, e.g. the current state of garbage collection
Refer to the Grafana dashboard in VBB's deployment for an example of how to visualize gtfs-rt-feed's metrics.
This project is ISC-licensed.
Note that PostGIS GTFS importer, one of the service's dependencies, is EUPL-licensed.