New function stop_times_to_frequencies() #69

Open
rafapereirabr opened this issue Feb 18, 2023 · 4 comments

@rafapereirabr
Member

rafapereirabr commented Feb 18, 2023

A function that creates a frequencies.txt file based on stop_times.txt. This would help address a common problem for r5r users; see this issue.

@Kaushal333

Sorry, but how do I create this function that builds a frequencies file from stop_times.txt? Is there a link to R code for it (if it has already been written)?
The hyperlink on "issue" takes me back to my own question page.

I could not find it on the gtfstools page.

@rafapereirabr
Member Author

rafapereirabr commented Feb 18, 2023 via email

@Hussein-Mahfouz

I followed ipeaGIT/r5r#321 here as I was also interested in the time_window argument of r5r::travel_time_matrix(). I found a get_route_frequency() function in the tidytransit package that looks useful.

The same logic could be used to get the headway of all routes in different time periods and combine the results into a frequencies.txt file.

I might try to write a stop_times_to_frequencies() function and will comment here if I do. In any case, I thought it would be useful to mention the tidytransit function here.
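The headway idea mentioned above boils down to simple arithmetic: the length of a time window divided by the number of departures that fall inside it. The following is a minimal sketch in base R, not the tidytransit implementation. The departure times are made up and `to_secs()` is a hypothetical helper introduced only for this illustration.

```r
# Hypothetical helper: convert "HH:MM:SS" strings to seconds since midnight
to_secs <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Made-up departures of one route from its first stop
departures <- to_secs(c("09:00:00", "09:30:00", "10:00:00",
                        "10:30:00", "11:00:00", "11:30:00"))

# Time window of interest: 09:00:00 (inclusive) to 12:00:00 (exclusive)
window_start <- to_secs("09:00:00")
window_end   <- to_secs("12:00:00")

# Count the departures that fall inside the window
n_departures <- sum(departures >= window_start & departures < window_end)

# Average headway = window length / number of departures
headway_secs <- (window_end - window_start) / n_departures
headway_secs
```

With six departures in a three-hour window, this gives an average headway of 1800 seconds. Repeating the calculation per route and per time window would yield the rows of a frequencies.txt table.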

@Hussein-Mahfouz

I have a draft stop_times_to_frequencies() function below. Some notes on the logic:

  • I assign each trip from stop_times to a time interval. These time intervals are a custom input, so you can get the headway of a trip for specified time intervals during the day.
  • The trip_id in a GTFS feed refers to a unique vehicle departure, so trips that follow the same itinerary from A to B (i.e. buses running on bus route x with a specific trip_headsign/direction) have different trip_ids.
  • To get headway_secs, we need to identify and group these trips somehow. For each trip, I create a column that holds the stop_ids in the order in which the trip visits them (below I call this column stop_id_order).
  • I also join the service_id to the stop_times. Different service_ids reflect the same trip on different days, so a trip will be repeated multiple times in stop_times.txt. If we don't group by service_id, we add together trips that run on different days, which would inflate the vehicle count and therefore underestimate headway_secs.
  • I then group by stop_id_order, service_id, start_time and end_time and count the departures in each group; this count is used to get the headway.
  • I add the frequencies file to the feed and filter the feed so that it only keeps the trip_ids that are in the frequencies file.
  • I don't know whether stop_times.txt should be edited (or removed), so I keep it as is.
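# Requires dplyr (>= 1.1.0 for join_by()) and hms; tidytransit is used for the final filtering step
library(dplyr)

stop_times_to_frequencies <- function(gtfs,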
                                      time_ranges = tibble(start_time = c("00:00:00", "09:00:00", "12:00:00", "19:00:00"),
                                                           end_time =   c("09:00:00", "12:00:00", "19:00:00", "23:59:00"))){
  # PURPOSE: convert a stop_times based feed to a frequency based feed
  # INPUT:
  # gtfs: the gtfs feed you want to edit
  # time_ranges: the day is split into multiple time slots. We calculate the frequency of trip in each of these slots
  #              format: a tibble with columns "start_time" and "end_time"
  # OUTPUT
  # a frequency based gtfs feed

  message(" ... converting time ranges to hms ... ")
  # ----- convert time ranges to hms
  time_ranges <- time_ranges %>%
    mutate(across(everything(), hms::as_hms))


  # --- Calculate the headway

  # 1. identify trips with the same itinerary. Trip IDs are unique for every departing vehicle,
  #    so we group vehicles that have the same stop sequence

  # Create a column to identify same trips (trips with the same stop sequence)
  message("... identifying trips with same stop sequence ...")

  trips_stop_sequence <- gtfs$stop_times %>%
    # sort so that the stop_ids are pasted in the order in which they are visited
    arrange(trip_id, stop_sequence) %>%
    group_by(trip_id) %>%
    mutate(stop_id_order = paste0(stop_id, collapse = '-')) %>%
    # keep only one row per unique trip (its first stop); the filter runs while still grouped by trip_id,
    # otherwise min() would be taken over the whole table instead of per trip
    # we use stop_sequence == min(stop_sequence) instead of == 0, as stop_sequence doesn't have to start from 0
    filter(stop_sequence == min(stop_sequence)) %>%
    ungroup() %>%
    # some arrival times are later than 24:00:00 - these cause errors when converting to time
    filter(as.character(arrival_time) <= "23:59:59") %>%
    mutate(arrival_time = hms::as_hms(arrival_time))

  # 2. Assign a time range to each trip based on its arrival_time at the first stop
  message("... assigning trips to time ranges ... ")

  trips_time_ranges <- trips_stop_sequence %>%
    inner_join(time_ranges,
               join_by(arrival_time >= start_time, arrival_time < end_time))

  # 3. Get the headway of each trip

  # add the service_id to each trip
  trips_time_ranges <- trips_time_ranges %>%
    left_join(gtfs$trips %>% select(trip_id, service_id), by = "trip_id")

  # calculate number of buses for each unique trip + time range + service_id combination
  message("... calculating headways ... ")

  trips_headways <- trips_time_ranges %>%
    group_by(stop_id_order, service_id, start_time, end_time) %>%
    summarise(vehicles = n(),
              # we don't need all of the trip IDs (all trips in the same group have the same itinerary)
              trip_id = first(trip_id),
              # drop the grouping explicitly to avoid the summarise() regrouping message
              .groups = "drop")

  # get the headway (time between buses = time period / no. of buses)
  trips_headways <- trips_headways %>%
    # as.numeric() on an hms value gives seconds since midnight
    mutate(headway_secs = round((as.numeric(end_time) - as.numeric(start_time)) / vehicles)) %>%
    # keep necessary columns only
    select(trip_id, start_time, end_time, headway_secs)


  # 4. edit the gtfs feed to produce a frequency-based feed
  message("... replacing stop_times with frequencies ... ")

  # # remove stop_times file
  # gtfs <- gtfs[names(gtfs) != "stop_times"]

  # add "frequencies" file to the gtfs feed
  gtfs$frequencies <- trips_headways

  # filter feed to only keep the trip ids in the new "frequencies" file
  gtfs <- gtfs %>%
    #gtfstools::filter_by_trip_id(trip_id =  .$frequencies$trip_id)
    tidytransit::filter_feed_by_trips(trip_ids = .$frequencies$trip_id)

  return(gtfs)

}
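To make the steps in the function easier to follow, here is a minimal, self-contained sketch that runs the same grouping and headway logic on a made-up stop_times table. It is an illustration under simplifying assumptions, not a replacement for the function: the data are invented, times are handled as seconds since midnight instead of hms, service_id is left out, and the column names `arrival_secs`, `start_secs` and `end_secs` exist only in this example. It assumes dplyr 1.1.0 or later for `join_by()`.

```r
library(dplyr)

# Made-up stop_times: trips t1-t3 share the itinerary A-B-C, t4 runs C-B-A
stop_times <- tibble(
  trip_id       = rep(c("t1", "t2", "t3", "t4"), each = 3),
  stop_id       = c(rep(c("A", "B", "C"), 3), "C", "B", "A"),
  stop_sequence = rep(1:3, 4),
  arrival_time  = c("08:00:00", "08:10:00", "08:20:00",
                    "08:30:00", "08:40:00", "08:50:00",
                    "09:15:00", "09:25:00", "09:35:00",
                    "08:05:00", "08:15:00", "08:25:00")
)

# Two time windows, in seconds since midnight: 00:00-09:00 and 09:00-12:00
time_ranges <- tibble(start_secs = c(0, 9 * 3600),
                      end_secs   = c(9 * 3600, 12 * 3600))

# One row per trip (its first stop), with the itinerary key stop_id_order
first_stops <- stop_times %>%
  arrange(trip_id, stop_sequence) %>%
  group_by(trip_id) %>%
  mutate(stop_id_order = paste0(stop_id, collapse = "-")) %>%
  filter(stop_sequence == min(stop_sequence)) %>%
  ungroup() %>%
  mutate(arrival_secs = as.numeric(substr(arrival_time, 1, 2)) * 3600 +
                        as.numeric(substr(arrival_time, 4, 5)) * 60 +
                        as.numeric(substr(arrival_time, 7, 8)))

# Assign each trip to a window, count vehicles per itinerary and window,
# and divide the window length by that count
headways <- first_stops %>%
  inner_join(time_ranges,
             join_by(arrival_secs >= start_secs, arrival_secs < end_secs)) %>%
  group_by(stop_id_order, start_secs, end_secs) %>%
  summarise(vehicles = n(), trip_id = first(trip_id), .groups = "drop") %>%
  mutate(headway_secs = round((end_secs - start_secs) / vehicles))

headways
```

In this toy data, t1 and t2 both start the A-B-C itinerary before 09:00, so that group has two vehicles in a nine-hour window, while t3 and t4 each form a group of their own. Note that the headway is an average over the whole window, so wide windows with few departures produce very long headways.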

The function works with GTFS feeds read in using both gtfstools and tidytransit. However, with gtfstools, the filter_feed_by_trips() function raised an error:

Error in vapply(x[[file]], class, character(1)) :
values must be length 1,
but FUN(X[[2]]) result is length 2

I haven't tried to debug this yet, as the filtering function in tidytransit was working.
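One possible explanation, which is not confirmed anywhere in this thread, is that the start_time and end_time columns of the new frequencies table are hms objects. An hms value has a class vector of length 2, and a check of the form `vapply(x, class, character(1))` fails on any such column. The sketch below reproduces that mechanism in isolation on a made-up table and shows conversion to character as a possible workaround. Whether this actually resolves the error with feeds read by gtfstools is an untested assumption.

```r
library(hms)

# A small made-up table shaped like the frequencies output, with hms time columns
frequencies <- data.frame(trip_id = "t1", headway_secs = 1800)
frequencies$start_time <- as_hms("09:00:00")
frequencies$end_time   <- as_hms("12:00:00")

# hms objects carry a class vector of length 2
class(frequencies$start_time)

# The same kind of check as in the error message fails on such columns
check <- tryCatch(vapply(frequencies, class, character(1)),
                  error = function(e) "error")

# Possible workaround: store the times as "HH:MM:SS" character strings
frequencies$start_time <- as.character(frequencies$start_time)
frequencies$end_time   <- as.character(frequencies$end_time)
classes_after <- vapply(frequencies, class, character(1))
classes_after
```

If this is indeed the cause, converting the two time columns before assigning the table to gtfs$frequencies might be enough. Character times are also the format in which frequencies.txt is written to disk.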
