
Reading out data in time interval #120

Open
EwoutH opened this issue Sep 2, 2024 · 16 comments
Labels
enhancement New feature or request

Comments

@EwoutH
Contributor

EwoutH commented Sep 2, 2024

Let's say I have a model that simulates n hours and I want to collect data every hour. For example, I would like to get something like the area_to_pandas output every hour (in simulation time), covering only the past hour.

Currently, if I call area_to_pandas(), it calculates everything from the start of the simulation up to the current state.

So I'm curious what the best way to do such a thing is. I already have mechanisms to call it every hour (or every n seconds); I just need a way to collect data for only the last hour rather than the full duration.

@EwoutH
Contributor Author

EwoutH commented Sep 3, 2024

Since detailed network animations can be created, this information is already saved somewhere, right?

@toruseo
Owner

toruseo commented Sep 3, 2024

@EwoutH thanks for the issues and PRs. I will give you detailed feedback after I get back to work.

Quick comments: the data is mostly saved in Vehicle.log_x and the other Vehicle.log_* lists; please trace through the code for the details. The data access you propose (during the simulation and for limited areas) should be possible, but it is quite specific, and at the moment I don't plan to implement it myself. Can you try it yourself? It could be implemented by modifying the current Analyzer.*_to_pandas() functions. If you do implement it, please do so by creating new functions in Analyzer to ensure backward compatibility.

@EwoutH
Contributor Author

EwoutH commented Sep 4, 2024

Thanks for getting back. Since most data is stored on the vehicles, it first needs to be translated to the links. Are there existing functions or data structures I can hook into? Then, from the links, we need to aggregate per area.

So I think:

  1. Translate vehicle data to link data. Use existing functions if possible.
  2. Create a function to "reset" link data.
  3. Request link data every hour, reset afterwards.
  4. Aggregate link data to area data. Use existing functions if possible.

Another approach could be to use the existing area_to_pandas, but apply it to a copy of the link data that is reset every hour.

How would you approach it? And are there specific functions I could use?
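
To make the structure I have in mind concrete, here is a rough sketch of the collect-and-reset loop. The helpers collect_link_data, reset_link_data and aggregate_to_areas are placeholders of mine, not existing UXsim functions, and I'm assuming the step-wise W.exec_simulation(duration_t=...) / W.check_simulation_ongoing() execution style with W and areas already defined:

# Rough sketch of the collect-and-reset loop (steps 1-4 above); the three
# helpers are hypothetical placeholders, not existing UXsim functions.
INTERVAL = 3600  # simulated seconds between read-outs


def collect_link_data(W):
    """Snapshot the per-link statistics accumulated since the last reset (placeholder)."""
    raise NotImplementedError


def reset_link_data(W):
    """Reset the per-link statistics so the next read-out covers only the new interval (placeholder)."""
    raise NotImplementedError


def aggregate_to_areas(link_snapshot, areas):
    """Aggregate a per-link snapshot to area level (placeholder)."""
    raise NotImplementedError


hourly_results = []
while W.check_simulation_ongoing():          # assumes step-wise execution
    W.exec_simulation(duration_t=INTERVAL)   # advance one hour of simulation time
    snapshot = collect_link_data(W)
    hourly_results.append(aggregate_to_areas(snapshot, areas))
    reset_link_data(W)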

@EwoutH
Contributor Author

EwoutH commented Sep 8, 2024

I'm going to do a deep dive into this tomorrow. The goal is to create a convenient area_to_pandas_in_timespan function that takes a start time and (optionally) an end time and reports statistics over that period.

Having #119 merged might help a lot with keeping that fast.

@toruseo
Owner

toruseo commented Sep 11, 2024

@EwoutH
thanks for your work!

As you might have noticed (and I forgot to mention), some link-level data is stored in the following variables.

        s.cum_arrival = []        # cumulative number of vehicles that have entered the link, one entry per time step
        s.cum_departure = []      # cumulative number of vehicles that have left the link, one entry per time step
        s.traveltime_actual = []  # actual link travel time, one entry per time step

In #123, it looks like you modified how these variables are computed. Unfortunately, that broke some important logic of the simulator, which must be the reason these tests fail. Is it possible to implement your functions without altering the internal logic? It would be preferable (and hopefully easy) if you could implement them by only adding new methods to the Analyzer class.

@toruseo toruseo added the enhancement New feature or request label Sep 11, 2024
@EwoutH
Contributor Author

EwoutH commented Sep 11, 2024

#123 was a rough draft that's nowhere close to ideal; it just (appears to) work well enough for my research.

Ideally, all the data generated by UXsim should be indexable over time. I see two approaches:

  • Attach timestamps to each data point. This gives the most resolution and allows customizable aggregation after a model run, but it is also likely to add the most overhead and memory cost.
  • Collect data in bins. Define some bin width (a number of seconds) beforehand and store the data in a dictionary (or similar structure) keyed by the bin start time (see the sketch below).
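
A minimal sketch of the bin idea (nothing here is existing UXsim code; the names and the record structure are purely illustrative):

from collections import defaultdict

bin_width = 300                  # seconds per bin, chosen beforehand
binned_data = defaultdict(list)  # bin start time -> observations in that bin

def record(t, value):
    """Store an observation made at simulation time t (seconds) in its bin."""
    bin_start = int(t // bin_width) * bin_width
    binned_data[bin_start].append(value)

record(1234, 56.7)  # e.g. a travel time observed at t = 1234 s lands in the 1200 s bin
averages = {t0: sum(v) / len(v) for t0, v in binned_data.items()}  # e.g. mean per bin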

If not all data needs to be collected, you could get away with some duplication of data at time intervals: just read out the variables the model uses and save them in separate variables used for data analysis.

#123 was kind of a hybrid solution, in which I stored some timestamps and then aggregated them over some bin width.

It would be preferable (and hopefully easy) if you could implement them by only adding new methods to the Analyzer class.

Unfortunately, it's impossible to do this without making some (small) changes to uxsim.py, since currently the time data just isn't there.


What might be interesting: in the Mesa library we're currently working on a similar challenge of how to collect data from agent-based models. We're working on a more elaborate solution in which certain variables are tracked for state changes. It might not be the best fit for UXsim, but there could be some ideas in there as well.

CC @quaquel

@toruseo
Owner

toruseo commented Sep 11, 2024

cum_arrival and the other lists are indexed by time step number. So if you want the value at time t seconds, you can get it with something like link.cum_arrival[int(t/W.DELTAT)], where W.DELTAT is the time step width in seconds.
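
For example (just a quick sketch, assuming link is a Link object and the simulation has already run past t_end), the number of vehicles arriving at the link between t_start and t_end is the difference of two cumulative values:

t_start, t_end = 3600, 7200  # seconds of simulation time
i_start, i_end = int(t_start / W.DELTAT), int(t_end / W.DELTAT)

# cumulative arrivals are per time step, so differencing two entries gives
# the number of arrivals within the interval
arrivals_in_interval = link.cum_arrival[i_end] - link.cum_arrival[i_start]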

Unfortunately, I am sorry to say that changing the internal logic for this specific purpose (as done in #123) is not acceptable, as it critically breaks backward compatibility; even putting that aside, we would still need to thoroughly review all the code.

@toruseo
Owner

toruseo commented Sep 11, 2024

Maybe I can add more comprehensive and user-friendly getter functions

@toruseo
Owner

toruseo commented Sep 11, 2024

In fact, there are getter-like functions already: Link.arrival_count(t), Link.departure_count(t)

But I believe directly accessing lists like cum_arrival with slicing, cum_arrival[int(t_start/W.DELTAT):int(t_end/W.DELTAT)], would be more convenient and "Pythonic" if you need a sequence of data.
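
For instance, a time-window read-out for one link could look roughly like this (only a sketch; the function name, the column names and the pandas part are not an existing Analyzer method):

import pandas as pd

def link_stats_in_timespan(W, link, t_start, t_end):
    """Sketch: per-time-step statistics of one link between t_start and t_end (seconds)."""
    i0, i1 = int(t_start / W.DELTAT), int(t_end / W.DELTAT)
    return pd.DataFrame({
        "t": [i * W.DELTAT for i in range(i0, i1)],
        "cum_arrival": link.cum_arrival[i0:i1],
        "cum_departure": link.cum_departure[i0:i1],
        "traveltime_actual": link.traveltime_actual[i0:i1],
    })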

@EwoutH
Contributor Author

EwoutH commented Sep 11, 2024

Ah thanks, I indeed didn't know all of this. That helps a lot (and I wish I had known it before starting the implementation in #123).

I will try to come up with an implementation using slicing, if I can find the time.

But having #119 and #121 merged already helps; it makes the PR diffs smaller, thanks!

@EwoutH
Contributor Author

EwoutH commented Sep 12, 2024

Thanks for all your work yesterday and today!

Would you like to implement this functionality as an example of how users can use the user_function?

@toruseo
Owner

toruseo commented Sep 12, 2024

If you can implement this time-interval-based method and the zone-based methods from #122 as new methods of Analyzer without modifying uxsim.py, that would be great, as other people could then use the functions easily.

Work built on user_function will be highly customized and may not be easy to reuse, but it could be useful to showcase what UXsim can do.

@EwoutH
Contributor Author

EwoutH commented Sep 13, 2024

What's the idea behind this check?

if s.flag_pandas_convert == 0:

Finding this took over an hour of debugging why my dataframes kept being empty...

@EwoutH
Contributor Author

EwoutH commented Sep 13, 2024

Right, it's probably there so that you don't calculate it more than once? But that's exactly what you need to do if you want to compute it multiple times during a run.
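
I suppose one can force a recompute by clearing the flag before each call, something like the following (assuming W.analyzer is the Analyzer instance; probably not the intended usage):

# force the Analyzer to recompute instead of returning the earlier result;
# clearing the internal flag like this is a workaround, not documented usage
W.analyzer.flag_pandas_convert = 0
df = W.analyzer.area_to_pandas()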

@EwoutH
Contributor Author

EwoutH commented Sep 14, 2024

I thought splitting by bin_width wouldn't be that complicated. Two hours later, I still don't have a working implementation.

@toruseo
Owner

toruseo commented Sep 16, 2024

Right, it's probably there so that you don't calculate it more than once? But that's exactly what you need to do if you want to compute it multiple times during a run.

That's true. The original design intention was that these Analyzer functions only run after the simulation has finished, so a single computation was sufficient.

I think vehicles_to_pandas is very heavy because the Vehicle.log_* lists are very large. If you want to compute it multiple times during the simulation, you will need to improve this function.
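
One possible direction (only an idea, not existing code): remember how many log entries have already been converted and only process the new tail on each call, e.g.

import pandas as pd

class IncrementalLogConverter:
    """Idea sketch: convert a growing log list to DataFrames incrementally,
    so repeated read-outs during the simulation do not re-process old entries."""

    def __init__(self):
        self.converted_upto = 0  # number of entries already converted
        self.chunks = []         # DataFrames produced so far

    def update(self, log):
        # `log` stands for one of the large Vehicle.log_* lists
        new_part = log[self.converted_upto:]
        self.converted_upto = len(log)
        if new_part:
            self.chunks.append(pd.DataFrame({"value": new_part}))
        return pd.concat(self.chunks, ignore_index=True) if self.chunks else pd.DataFrame()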
