Handling uncertainty in travel time calculations #11

Closed
Hussein-Mahfouz opened this issue Aug 25, 2023 · 6 comments

@Hussein-Mahfouz

How should we address travel time uncertainty when calculating travel time matrices? For background, see:

The time_window parameter (in combination with the percentile parameter) is ideal, but it can only be used with frequency-based gtfs feeds. From the vignette:

Please keep in mind that the time_window only affects the results when the GTFS feeds contain a frequencies.txt table.

Solution using time_window parameter in r5r

One solution is to write a function that converts stop_times to frequencies, and use it to edit the gtfs feeds so that they become frequency-based feeds.

See my comment ipeaGIT/gtfstools#69 (comment) for getting started on the function, and ipeaGIT/r5r#282 (comment) to understand how r5 handles the time_window argument when you are using a gtfs feed without a frequencies.txt file

Hacky manual solution

We can pass different departure times to the travel_time_matrix function (e.g. for 8:00am, use 7:55, 8:00, 8:05). This is a hacky way of recreating the time_window functionality, and it will definitely be a lot slower.
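A minimal sketch of the manual approach described above, in base R. The function route_once() is a hypothetical stand-in that returns toy numbers; in practice it would wrap a call such as r5r::travel_time_matrix() with the given departure_datetime. The OD ids and travel times are invented for illustration only.

```r
# Departure times around the 8:00 target: 7:55, 8:00, 8:05
target <- as.POSIXct("2023-08-25 08:00:00", tz = "UTC")
departures <- target + c(-5, 0, 5) * 60

# Hypothetical stand-in for the real routing call (toy numbers, not real results).
# In practice this would call r5r::travel_time_matrix(..., departure_datetime = dep).
route_once <- function(dep) {
  offset_min <- as.numeric(difftime(dep, target, units = "mins"))
  data.frame(
    from_id = c("A", "A"),
    to_id = c("B", "C"),
    travel_time = c(20, 35) + abs(offset_min),
    stringsAsFactors = FALSE
  )
}

# Route once per departure time and stack the results
runs <- do.call(rbind, lapply(seq_along(departures), function(i) {
  out <- route_once(departures[i])
  out$departure <- format(departures[i], "%H:%M")
  out
}))

# Summarise each OD pair across departure times (median here; any percentile works)
summary_tt <- aggregate(travel_time ~ from_id + to_id, data = runs, FUN = median)
print(summary_tt)
```

Each extra departure time adds one full routing run, which is why this approach scales poorly compared with a built-in time window.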

@Hussein-Mahfouz

Hussein-Mahfouz commented Aug 25, 2023

I am trying to create a frequencies.txt file so that the routing can use the time_window parameter.

I tried to use the get_route_frequency() function in tidytransit, but it depends on having a direction_id column in the trips.txt file. This is an optional column in the gtfs feed, and it is not present in BODS data.

I tried to create the column by grouping trips by route_id and service_id, expecting two trips in each group that I could label 0 / 1, but it turns out that some routes have more than 2 trips:

image

I tried to plot these trips to see how they are different. Here is a facet plot (by trip_id):

image

It looks like 2 are the same (they even have the same stop sequence rather than opposite ones, which seems wrong to me). The other 3 are all different.

Based on these results, I think I should treat each trip separately if I were to calculate frequencies from stop_times (and ignore the route level logic used in get_route_frequency() ). This is more in line with the gtfs frequencies.txt, which has the following columns: trip_id | start_time | end_time | headway_secs
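The grouping check described earlier in this comment can be sketched in base R as follows. The trips table is toy data invented for illustration (it is not the BODS feed); only the column names route_id, service_id and trip_id follow the GTFS trips.txt layout.

```r
# Toy trips table: route r1 has 2 trips, route r2 has 5 trips for the same service
trips <- data.frame(
  route_id = c("r1", "r1", "r2", "r2", "r2", "r2", "r2"),
  service_id = rep("wk", 7),
  trip_id = paste0("t", 1:7),
  stringsAsFactors = FALSE
)

# Count trips in each route_id + service_id group
trips_per_group <- aggregate(trip_id ~ route_id + service_id, data = trips, FUN = length)
names(trips_per_group)[3] <- "n_trips"

# Groups with more than 2 trips cannot be labelled with a simple 0 / 1 direction_id
more_than_two <- trips_per_group[trips_per_group$n_trips > 2, ]
print(more_than_two)
```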

@Robinlovelace

This comment was marked as resolved.

@Hussein-Mahfouz

This comment was marked as off-topic.

@Robinlovelace

This comment was marked as resolved.

@Hussein-Mahfouz

stop_times_to_frequencies() is a difficult function to implement.

  • A frequencies.txt file has trip_id | start_time | end_time | headway_secs.
  • In the DfT gtfs feeds (as with most other feeds), the trip_id is unique to one departure on a specific route; if 10 buses have the same route_id and direction_id, they will still have 10 different trip_ids. How do you group trips to get headway_secs?
  • I tried grouping trips by looking at their stop_sequence and creating a column that had the stop_ids in order.
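    library(dplyr)
    # One string per trip: its stop_ids in stop_sequence order
    trips_stop_sequence <- gtfs$stop_times %>%
      arrange(trip_id, stop_sequence) %>% group_by(trip_id) %>%
      mutate(stop_id_order = paste0(stop_id, collapse = '-')) %>%
      ungroup()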

    This column could then be used to group trips that run on exactly the same itinerary. We could then count the vehicles in each group and calculate a headway.
  • This solution doesn't account for the service_id parameter. Different service_ids reflect the same trip on different days, so a trip will be repeated multiple times in stop_times.txt. This means the vehicle count is inflated and our calculated headway_secs is inaccurate. How do we calculate headway while accounting for different services?
    • We could filter the feed to a specific date. We could then use the above logic normally. I don't like this solution as it reduces the data in a feed from weeks / months to a single day.
    • We could get the headway for each trip + service combination. We can join the service_id column to the stop_times file, and then group by service_id + stop_id_order (the column we created to identify unique trips)
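The second option above (headway per service_id + stop_id_order) could be sketched roughly as below. This is only an illustration on toy data, not a finished stop_times_to_frequencies(): the helper to_secs(), the toy tables, and the choice to use the mean gap between first departures as headway_secs are all assumptions made for the example.

```r
library(dplyr)

# Toy stop_times: t1-t3 share the pattern s1-s2, t4 runs the reverse pattern
stop_times <- data.frame(
  trip_id = rep(c("t1", "t2", "t3", "t4"), each = 2),
  stop_id = c("s1", "s2", "s1", "s2", "s1", "s2", "s2", "s1"),
  stop_sequence = rep(1:2, 4),
  departure_time = c("08:00:00", "08:05:00", "08:10:00", "08:15:00",
                     "08:20:00", "08:25:00", "08:00:00", "08:05:00"),
  stringsAsFactors = FALSE
)
# Toy trips: t3 belongs to a different service, so it must not count towards "wk"
trips <- data.frame(
  trip_id = c("t1", "t2", "t3", "t4"),
  service_id = c("wk", "wk", "sat", "wk"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: GTFS times may exceed 24:00:00, so parse them manually
to_secs <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# One row per trip: its stop pattern, first departure, and service_id
trip_patterns <- stop_times %>%
  arrange(trip_id, stop_sequence) %>%
  group_by(trip_id) %>%
  summarise(
    stop_id_order = paste0(stop_id, collapse = "-"),
    first_departure = to_secs(first(departure_time)),
    .groups = "drop"
  ) %>%
  left_join(trips, by = "trip_id")

# Headway per service_id + stop pattern: mean gap between consecutive departures
headways <- trip_patterns %>%
  group_by(service_id, stop_id_order) %>%
  arrange(first_departure, .by_group = TRUE) %>%
  summarise(
    n_trips = n(),
    start_time = min(first_departure),
    end_time = max(first_departure),
    headway_secs = if (n() > 1) mean(diff(first_departure)) else NA_real_,
    .groups = "drop"
  )
print(headways)
```

Groups with a single trip get NA for headway_secs, since a headway cannot be derived from one departure.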

@Hussein-Mahfouz

One important thing to note is that the time_window parameter in r5r DOES work with feeds that don't have a frequencies.txt file. Here are the results of using the expanded_travel_time_matrix function with a 30-minute time_window:

image

For the same departure time, the results are the same for each draw_number. However, if our time_window = 30, we have 30 different departure times for each OD pair, and each one has a different travel_time.

The percentiles argument also works, as shown here:

image

The reason the documentation says frequencies.txt is needed is that it allows changes in the start time to be simulated. That would lead to different draws for the same OD pair having different travel times. For our purposes this is not necessary.

This means that a stop_times_to_frequencies() function is not necessary for our purposes.
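To illustrate why a time window still produces a spread of travel times on a schedule-based feed, here is a small base R sketch on invented numbers. It does not call r5r: the bus schedule, the 15-minute ride time, and the table layout are assumptions for illustration, loosely modelled on one row per departure minute as in the expanded matrix output.

```r
# One OD pair, time_window = 30: one departure per minute from 08:00 to 08:29
dep_min <- 0:29

# Assumed fixed schedule: buses leave at 08:05, 08:15, 08:25, 08:35; ride takes 15 min
bus_min <- c(5, 15, 25, 35)
ride_min <- 15

# Waiting time for the next scheduled bus at each departure minute
wait <- vapply(dep_min, function(d) min(bus_min[bus_min >= d]) - d, numeric(1))

expanded <- data.frame(
  from_id = "A",
  to_id = "B",
  departure_time = sprintf("08:%02d:00", dep_min),
  total_time = wait + ride_min,
  stringsAsFactors = FALSE
)

# Percentiles of travel time across the 30 departure minutes
percentiles <- quantile(expanded$total_time, probs = c(0.25, 0.5, 0.75))
print(percentiles)
```

The variation here comes only from the departure minute relative to the fixed schedule, which matches the observation that repeated draws for the same departure time give the same result.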
