
Reading out data in time interval #120

Open
EwoutH opened this issue Sep 2, 2024 · 16 comments
Labels
enhancement New feature or request

Comments

@EwoutH
Contributor

EwoutH commented Sep 2, 2024

Let's say I have a model that simulates n hours and I want to collect data every hour. For example, I would like to get something like the area_to_pandas output every hour (in simulation time), covering only the past hour.

Currently, if I call area_to_pandas(), it calculates everything from the start of the simulation up to the current state.

So I'm curious what the best way to do such a thing is. I already have mechanisms to call it every hour (or every n seconds); I just need a way to collect data for only the last hour rather than the full duration.

@EwoutH
Contributor Author

EwoutH commented Sep 3, 2024

Since detailed network animations can be created, this information is already saved somewhere, right?

@toruseo
Owner

toruseo commented Sep 3, 2024

@EwoutH thanks for the issues and PRs. I will give you detailed feedback after I get back to work.

Quick comments: the data is mostly saved in Vehicle.log_x and the other Vehicle.log_* lists; please trace through the code for the details. The data access you propose (during the simulation and for limited areas) should be possible, but it is quite specific, and at the moment I don't plan to implement it myself. Can you try it yourself? It could be implemented by modifying the current Analyzer.*_to_pandas() functions. If you do implement it, please do so by creating new functions in Analyzer to ensure backward compatibility.

@EwoutH
Contributor Author

EwoutH commented Sep 4, 2024

Thanks for getting back. Since most data is stored on the vehicles, it first needs to be translated to the links. Are there existing functions or data structures I can hook into? Then, from the links, we need to aggregate per area.

So I think:

  1. Translate vehicle data to link data. Use existing functions if possible.
  2. Create a function to "reset" link data.
  3. Request link data every hour, reset afterwards.
  4. Aggregate link data to area data. Use existing functions if possible.

Another approach could be to use the existing area_to_pandas, but apply it to a copy of the link data that is reset every hour.

How would you approach it? And are there specific functions I could use?
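
To make the structure I have in mind concrete, here is a rough sketch of the collect-and-reset loop. The helpers collect_link_data, reset_link_data and aggregate_to_areas are placeholders of mine, not existing UXsim functions, and I'm assuming the step-wise W.exec_simulation(duration_t=...) / W.check_simulation_ongoing() execution style with W and areas already defined:

# Rough sketch of the collect-and-reset loop (steps 1-4 above); the three
# helpers are hypothetical placeholders, not existing UXsim functions.
INTERVAL = 3600  # simulated seconds between read-outs


def collect_link_data(W):
    """Snapshot the per-link statistics accumulated since the last reset (placeholder)."""
    raise NotImplementedError


def reset_link_data(W):
    """Reset the per-link statistics so the next read-out covers only the new interval (placeholder)."""
    raise NotImplementedError


def aggregate_to_areas(link_snapshot, areas):
    """Aggregate a per-link snapshot to area level (placeholder)."""
    raise NotImplementedError


hourly_results = []
while W.check_simulation_ongoing():          # assumes step-wise execution
    W.exec_simulation(duration_t=INTERVAL)   # advance one hour of simulation time
    snapshot = collect_link_data(W)
    hourly_results.append(aggregate_to_areas(snapshot, areas))
    reset_link_data(W)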

@EwoutH
Contributor Author

EwoutH commented Sep 8, 2024

I'm going to do a deep dive into this tomorrow. The goal is to create a convenient area_to_pandas_in_timespan function that takes a start time and (optionally) an end time and reports statistics over that period.

Having #119 merged might help a lot with keeping that fast.

@toruseo
Owner

toruseo commented Sep 11, 2024

@EwoutH
thanks for your work!

As you might have noticed (and I forgot to mention), some link-level data is stored in the following variables.

        s.cum_arrival = []        # cumulative number of vehicles that have entered the link, one entry per time step
        s.cum_departure = []      # cumulative number of vehicles that have left the link, one entry per time step
        s.traveltime_actual = []  # actual link travel time, one entry per time step

In #123, it looks like you modified how these variables are computed. Unfortunately, that broke some important logic of the simulator, which must be the reason these tests fail. Is it possible to implement your functions without altering the internal logic? It would be preferable (and hopefully easy) if you could implement them by only adding new methods to the Analyzer class.

@toruseo toruseo added the enhancement New feature or request label Sep 11, 2024
@EwoutH
Contributor Author

EwoutH commented Sep 11, 2024

#123 was a rough draft that's nowhere close to ideal; it just (appears to) work well enough for my research.

Ideally, all the data generated by UXsim should be indexable over time. I see two approaches:

  • Attach timestamps to each data point. This gives the most resolution and allows customizable aggregation after a model run, but it is also likely to add the most overhead and memory cost.
  • Collect data in bins. Define some bin width (a number of seconds) beforehand and store the data in a dictionary (or similar structure) keyed by the bin start time (see the sketch below).
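
A minimal sketch of the bin idea (nothing here is existing UXsim code; the names and the record structure are purely illustrative):

from collections import defaultdict

bin_width = 300                  # seconds per bin, chosen beforehand
binned_data = defaultdict(list)  # bin start time -> observations in that bin

def record(t, value):
    """Store an observation made at simulation time t (seconds) in its bin."""
    bin_start = int(t // bin_width) * bin_width
    binned_data[bin_start].append(value)

record(1234, 56.7)  # e.g. a travel time observed at t = 1234 s lands in the 1200 s bin
averages = {t0: sum(v) / len(v) for t0, v in binned_data.items()}  # e.g. mean per bin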

If not all data needs to be collected, you could get away with some duplication of data at time intervals: just read out the variables the model uses and save them in separate variables used for data analysis.

#123 was kind of a hybrid solution, in which I stored some timestamps and then aggregated them over some bin width.

It would be preferable (and hopefully easy) if you could implement them by only adding new methods to the Analyzer class.

Unfortunately, it's impossible to do this without making some (small) changes to uxsim.py, since currently the time data just isn't there.


What might be interesting: in the Mesa library we're currently working on a similar challenge of how to collect data from agent-based models. We're working on a more elaborate solution in which certain variables are tracked for state changes. It might not be the best fit for UXsim, but there could be some ideas in there as well.

CC @quaquel

@toruseo
Owner

toruseo commented Sep 11, 2024

cum_arrival and the other lists are indexed by time step number. So if you want the value at time t seconds, you can get it with something like link.cum_arrival[int(t/W.DELTAT)], where W.DELTAT is the time step width in seconds.
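
For example (just a quick sketch, assuming link is a Link object and the simulation has already run past t_end), the number of vehicles arriving at the link between t_start and t_end is the difference of two cumulative values:

t_start, t_end = 3600, 7200  # seconds of simulation time
i_start, i_end = int(t_start / W.DELTAT), int(t_end / W.DELTAT)

# cumulative arrivals are per time step, so differencing two entries gives
# the number of arrivals within the interval
arrivals_in_interval = link.cum_arrival[i_end] - link.cum_arrival[i_start]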

Unfortunately, I am sorry to say that changing the internal logic for this specific purpose (as done in #123) is not acceptable, as it critically breaks backward compatibility; even putting that aside, we would still need to thoroughly review all the code.

@toruseo
Owner

toruseo commented Sep 11, 2024

Maybe I can add more comprehensive and user-friendly getter functions

@toruseo
Owner

toruseo commented Sep 11, 2024

In fact, there are getter-like functions already: Link.arrival_count(t), Link.departure_count(t)

But I believe directly accessing lists like cum_arrival with slicing, cum_arrival[int(t_start/W.DELTAT):int(t_end/W.DELTAT)], would be more convenient and "Pythonic" if you need a sequence of data.
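
For instance, a time-window read-out for one link could look roughly like this (only a sketch; the function name, the column names and the pandas part are not an existing Analyzer method):

import pandas as pd

def link_stats_in_timespan(W, link, t_start, t_end):
    """Sketch: per-time-step statistics of one link between t_start and t_end (seconds)."""
    i0, i1 = int(t_start / W.DELTAT), int(t_end / W.DELTAT)
    return pd.DataFrame({
        "t": [i * W.DELTAT for i in range(i0, i1)],
        "cum_arrival": link.cum_arrival[i0:i1],
        "cum_departure": link.cum_departure[i0:i1],
        "traveltime_actual": link.traveltime_actual[i0:i1],
    })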

@EwoutH
Contributor Author

EwoutH commented Sep 11, 2024

Ah thanks, I indeed didn't know all of this. That helps a lot (and I wish I had known it before starting the implementation in #123).

I will try to come up with an implementation using slicing, if I can find the time.

But having #119 and #121 merged already helps; it makes the PR diffs smaller, thanks!

@EwoutH
Contributor Author

EwoutH commented Sep 12, 2024

Thanks for all your work yesterday and today!

Would you like to implement this functionality as an example of how users can use the user_function?

@toruseo
Owner

toruseo commented Sep 12, 2024

If you can implement this time-interval-based method and the zone-based methods from #122 as new methods of Analyzer without modifying uxsim.py, that would be great, as other people could then use the functions easily.

Work built on user_function will be highly customized and may not be easy to reuse, but it could be useful to showcase what UXsim can do.

@EwoutH
Contributor Author

EwoutH commented Sep 13, 2024

What's the idea behind this check?

if s.flag_pandas_convert == 0:

Finding this took over an hour of debugging why my dataframes kept being empty...

@EwoutH
Contributor Author

EwoutH commented Sep 13, 2024

Right, it's probably there so that you don't calculate it more than once? But that's exactly what you need to do if you want to compute it multiple times during a run.
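
I suppose one can force a recompute by clearing the flag before each call, something like the following (assuming W.analyzer is the Analyzer instance; probably not the intended usage):

# force the Analyzer to recompute instead of returning the earlier result;
# clearing the internal flag like this is a workaround, not documented usage
W.analyzer.flag_pandas_convert = 0
df = W.analyzer.area_to_pandas()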

@EwoutH
Contributor Author

EwoutH commented Sep 14, 2024

I thought splitting by bin_width wouldn't be that complicated. Two hours later, I still don't have a working implementation.

@toruseo
Owner

toruseo commented Sep 16, 2024

Right, it's probably there so that you don't calculate it more than once? But that's exactly what you need to do if you want to compute it multiple times during a run.

That's true. The original design intention was that these Analyzer functions only run after the simulation has finished, so a single computation was sufficient.

I think vehicles_to_pandas is very heavy because the Vehicle.log_* lists are very large. If you want to compute it multiple times during the simulation, you will need to improve this function.
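
One possible direction (only an idea, not existing code): remember how many log entries have already been converted and only process the new tail on each call, e.g.

import pandas as pd

class IncrementalLogConverter:
    """Idea sketch: convert a growing log list to DataFrames incrementally,
    so repeated read-outs during the simulation do not re-process old entries."""

    def __init__(self):
        self.converted_upto = 0  # number of entries already converted
        self.chunks = []         # DataFrames produced so far

    def update(self, log):
        # `log` stands for one of the large Vehicle.log_* lists
        new_part = log[self.converted_upto:]
        self.converted_upto = len(log)
        if new_part:
            self.chunks.append(pd.DataFrame({"value": new_part}))
        return pd.concat(self.chunks, ignore_index=True) if self.chunks else pd.DataFrame()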
