Loading images takes too long #26
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington). What resolution are your images? You might find that 4000 images take up more memory than your computers have. An image cache is not a bad idea, but it would only be of use if the images fit in memory; otherwise they'd be swapped out to disk anyway. Furthermore, if you're using the same images repeatedly, your operating system is likely already caching the files to the best of its ability, and if it's still very slow, that's evidence that perhaps they don't all fit in RAM. Lyse does have a place you can put things to be kept from one run of your analysis to the next, but if you remove and re-add the analysis routine it is not kept. But, as a test, you could try caching your images in the analysis subprocess with something like this:
If that speeds things up then looking further into providing a persistent, cross-process cache like the dataframe might be useful. Otherwise perhaps not.
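The snippet Chris posted was not preserved in the archive. A minimal sketch of the kind of in-process cache he describes might look like this (the cache and the `load_image` callable are illustrative, not lyse API; in a real routine `load_image` would wrap something like `lyse.Run(path).get_image(...)`):

```python
# Module-level dict: persists between runs of this analysis routine for as
# long as the routine's subprocess stays alive in lyse.
image_cache = {}

def get_image_cached(filepath, load_image):
    """Return the image for `filepath`, reading it from disk only once.

    `load_image` stands in for the actual HDF5 read, e.g.
    lambda p: lyse.Run(p).get_image('top', 'absorption', 'OD').
    """
    if filepath not in image_cache:
        image_cache[filepath] = load_image(filepath)
    return image_cache[filepath]
```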
Original comment by Jan Werkmann (Bitbucket: PhyNerd, GitHub: PhyNerd). They are 512x512 pixels or less, so memory shouldn't be a problem. This minimal solution already came to mind, but it would keep images in memory even after they were deleted from the lyse filebox. The cache could grow substantially from one measurement to the next if it is never emptied. I will definitely give your solution a try though, and maybe add some logic to remove shots that are no longer in the current dataframe.
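The pruning logic Jan mentions could be a small helper along these lines (a sketch, assuming a plain dict cache keyed by shot filepath; in lyse the current filepaths would come from the dataframe, e.g. its 'filepath' column):

```python
def prune_cache(cache, current_filepaths):
    """Drop cached images for shots no longer loaded in the lyse filebox.

    `cache` is a dict keyed by shot filepath; `current_filepaths` is the
    set of filepaths still present in the dataframe.
    """
    for path in set(cache) - set(current_filepaths):
        del cache[path]
```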
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington). I think I'd broadly support a feature like that - or a way to cache datasets generally, regardless of what they are. They can't go in the dataframe easily, as the dataframe is serialised and deserialised repeatedly, but we could without too much difficulty make a cache that sends the images in a binary format - I've sent numpy arrays over zeroMQ sockets (which we use) before and that works well.
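The usual recipe for sending numpy arrays over zeroMQ is a two-frame multipart message: a small JSON header carrying dtype and shape, followed by the raw contiguous buffer. A sketch of just the packing step, independent of any lyse or zmq API:

```python
import json
import numpy as np

def pack_array(arr):
    """Serialize an array into (header, payload) frames for a multipart
    zeroMQ message: JSON metadata plus the raw contiguous bytes."""
    header = json.dumps({'dtype': str(arr.dtype),
                         'shape': list(arr.shape)}).encode()
    return header, np.ascontiguousarray(arr).tobytes()

def unpack_array(header, payload):
    """Inverse of pack_array: rebuild the array from the two frames."""
    meta = json.loads(header)
    return np.frombuffer(payload, dtype=meta['dtype']).reshape(meta['shape'])
```

With pyzmq, the two frames would then travel via `socket.send_multipart([header, payload])` and `socket.recv_multipart()` on the other side.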
Original comment by Jan Werkmann (Bitbucket: PhyNerd, GitHub: PhyNerd). That's why I said dataframe-like: I wanted something that behaves the way the dataframe currently does in the following ways:
It's not really an urgent thing, just something I wanted to throw out there for discussion. This has the potential to speed things up quite a bit when repeatedly using images or other data that's not in the dataframe.
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington). Yep, fair enough!
Original comment by Jan Werkmann (Bitbucket: PhyNerd, GitHub: PhyNerd). OK, so I've been playing around with this and here is what I came up with so far:
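The code Jan posted here was not preserved in the archive. Going by his description - nested dicts addressed by hierarchical keys, with save, get and delete operations - the storage might look roughly like this (class and method names are illustrative):

```python
class Storage:
    """Sketch of a nested-dict store addressed by hierarchical key paths,
    e.g. ['images', shot_filepath, orientation, label]."""

    def __init__(self):
        self._store = {}

    def save(self, keys, value):
        """Store `value` under the key path, creating levels as needed."""
        node = self._store
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value

    def get(self, keys, default=None):
        """Return the entry at the key path, or `default` if absent."""
        node = self._store
        for key in keys:
            if not isinstance(node, dict) or key not in node:
                return default
            node = node[key]
        return node

    def delete(self, keys):
        """Remove the entry at the key path, ignoring missing keys."""
        node = self._store
        for key in keys[:-1]:
            node = node.get(key, {})
        node.pop(keys[-1], None)
```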
This is a generic StorageServer that stores data in nested dicts. For a proof of concept I modified Run.get_images:
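The modified `Run.get_images` was also lost in archiving. A self-contained sketch of what a cache-aware version might do (the real lyse method reads frames from the shot's HDF5 file; here `read_frames` stands in for that read, and a module-level dict stands in for the storage server):

```python
# Module-level dict standing in for the shared storage server.
_frame_store = {}

def get_images_cached(shot_path, orientation, label, *names, read_frames):
    """Return the requested image frames for a shot, reading the HDF5
    file only on a cache miss.

    `read_frames` stands in for the real HDF5 read; it should return a
    dict mapping frame name -> image array.
    """
    key = (shot_path, orientation, label)
    if key not in _frame_store:
        _frame_store[key] = read_frames(shot_path, orientation, label)
    frames = _frame_store[key]
    return [frames[name] for name in names]
```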
and also implemented a get_images function for Multishot routines:
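This snippet was also not preserved. A sketch of a multishot-side helper under the same assumptions (the `runs` mapping of filepath to run objects mirrors the `seq.runs.items()` usage quoted below; the storage dict and function name are illustrative):

```python
def sequence_get_images(runs, orientation, label, image, storage):
    """Fetch one image per run for a whole sequence, consulting the shared
    storage first and falling back to a per-run load.

    `runs` maps shot filepath -> run object exposing
    get_image(orientation, label, image), as lyse's Sequence.runs does.
    """
    images = {}
    for path, run in runs.items():
        key = (path, orientation, label, image)
        if key not in storage:
            storage[key] = run.get_image(orientation, label, image)
        images[path] = storage[key]
    return images
```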
Singleshot routines are not really affected when it comes to speed, as they usually run just once. Using get_images instead of something like {run.get_image(...) for path, run in seq.runs.items()} gives a great speed increase (provided the storage has already been filled). So we still have to solve the problem of filling the storage upon loading a shot, in a way that keeps this feature optional for everyone who doesn't need it (so their memory doesn't suffer). Any ideas how this could be done? I'm also open to the idea of running a singleshot script that does nothing but add the images to the cross-routine storage, as this is a simple fix for our problem and doesn't bloat everyone's memory. But nonetheless we would need an API for the storage. Any thoughts or ideas for improvement?
Original comment by Jan Werkmann (Bitbucket: PhyNerd, GitHub: PhyNerd). Did a bit of playing around with the storage.
Original comment by Jan Werkmann (Bitbucket: PhyNerd, GitHub: PhyNerd). I created a branch for my cross-routine storage / cached images and runs over on my repo (here). The storage is an extension to lyse's server. It is made up of nested dicts, allowing hierarchical indexes, and stores anything that zmq_get will send. One can save, get and delete entries from the storage. I also added the option to automatically cache images (on first load) and multishot sequence runs (on first creation). This gives me a drastic speed increase in my multishot scripts. If a shot is removed from lyse, its cached entries are removed as well to reduce memory usage. We are already running lyse with these changes in our lab. Before caching, one of our scripts using images (with 2000 shots loaded, i.e. 4000 images) ran for 17 seconds; after the change it runs in under 1 second. The options for caching (on/off and timeout) are currently variables hardcoded in the init file. I'm not happy with having them hardcoded, but I'm also not sure where to put them; I would personally opt for putting them in labconfig. Any ideas for improvement, or on where to put the option to enable/disable caching? I'd also like to create a pull request for this sometime in the near future (but have too many open pull requests at the moment as is), so input is much appreciated.
Original report (archived issue) by Jan Werkmann (Bitbucket: PhyNerd, GitHub: PhyNerd).
Loading a lot of images (about 4000 or more, which is not uncommon in our lab) in multishot routines takes a great amount of time.
It would be nice if there were a second dataframe-like variable that stores the images, so that loading them becomes quick.