Ways of speeding up cycle time. Pipelining, readahead, removing redundant steps. #53
Comments
Original comment by Philip Starkey (Bitbucket: philipstarkey, GitHub: philipstarkey).

I suspect another slow point is the NI cards with multiple worker processes (which is most of them, I think). There is no particular reason why communication with each worker process needs to be serialised, other than that it's a bit more complicated to implement. To change this we would need to rewrite the mainloop in the tab base class (maybe taking advantage of some Python 3 coroutine features?). This will be particularly effective if we cache the HDF5 file.
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).

Ah, that's a good point. I think that by itself could be accommodated in the current framework by just having the coroutine yield a dict of jobs to do, one per worker, instead of just one job - then the mainloop can wait on them all simultaneously.

I'm currently a bit averse to async/await when threads suffice. I investigated it for the new zlock server and it was a) overkill in terms of complexity and b) not very performant.
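For illustration, here is a minimal sketch of that idea using plain threads rather than async/await: one job per worker is submitted to a thread pool, and the caller blocks until all of them have returned. The `Worker` class and its `run_job` method below are stand-ins, not the real BLACS tab/worker API.

```python
# Minimal sketch (not the actual BLACS mainloop): dispatch one job per worker
# process and wait on all of them at once, rather than serialising round-trips.
from concurrent.futures import ThreadPoolExecutor


class Worker:
    """Stand-in for a BLACS worker process connection."""
    def __init__(self, name):
        self.name = name

    def run_job(self, job, *args):
        # In BLACS this would be a blocking request to the worker process
        # (e.g. transition_to_buffered for one NI card). Simulated here.
        return f'{self.name}: {job} done'


def run_jobs_in_parallel(jobs):
    """jobs: dict mapping Worker -> (job_name, args). Blocks until all are done."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {
            worker: pool.submit(worker.run_job, job, *args)
            for worker, (job, args) in jobs.items()
        }
        # Collecting all results is the 'wait on them all simultaneously' part:
        return {worker: future.result() for worker, future in futures.items()}


if __name__ == '__main__':
    workers = [Worker('ni_card_0'), Worker('ni_card_1')]
    jobs = {w: ('transition_to_buffered', ('shotfile.h5',)) for w in workers}
    print(run_jobs_in_parallel(jobs))
```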
Original comment by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).

Progress on the h5py GIL front! It sounds like soon threading will be sufficient to do HDF5 I/O without blocking other threads. Good. Writing a server just to do HDF5 writes would be so far from what HDF5 is supposed to be that we might as well be using a traditional database at that point...

Edit: Just tested with development h5py from GitHub, and indeed threads can run during I/O! This is great. It will be in the next release, which might be early to mid 2020 judging by their past releases.
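As a rough illustration of what that enables (assuming an h5py build that releases the GIL around I/O), a read can be pushed onto a background thread while the main thread stays responsive. The file path and dataset name below are placeholders.

```python
# Sketch: do an HDF5 read in a background thread so the main (GUI/mainloop)
# thread is not blocked during I/O. Relies on h5py releasing the GIL during
# the read, as described above.
import threading
import h5py


def read_dataset(path, dataset_name, results):
    with h5py.File(path, 'r') as f:
        # While the GIL is released for the read, other Python threads keep running.
        results[dataset_name] = f[dataset_name][...]


results = {}
thread = threading.Thread(
    target=read_dataset, args=('shot.h5', 'data/traces', results)
)
thread.start()
# ... the main thread is free to service the GUI / other tabs here ...
thread.join()
```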
Original report (archived issue) by Chris Billington (Bitbucket: cbillington, GitHub: chrisjbillington).
There are a few ways we might speed up the overall rate at which shots are run. Some are pretty invasive, so it's not a minor change, but the basic idea is to split up `transition_to_buffered()` and `transition_to_manual()` into multiple steps, and a) only call the ones that are necessary, and b) call the ones that are not dependent on previous steps simultaneously.

So, for example, `transition_to_manual()` could be split into several smaller steps (`program_manual` could be skipped unless the queue is paused), and `transition_to_buffered()` could likewise be split into its component steps.

Running as many of these steps as possible simultaneously, and skipping unnecessary ones, could go some way toward speeding up the cycle time of BLACS. In the ideal case, devices that are retriggerable with the same data will not need any reconfiguration in between shots, and will contribute no overhead.
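As a sketch of what "split into steps, skip the unnecessary ones, and run independent ones simultaneously" might look like, here is a small dependency-driven step runner. The step names and decomposition below are hypothetical, not the actual BLACS transition steps.

```python
# Illustrative sketch only: a transition expressed as named steps with
# dependencies, so that independent steps run concurrently and unnecessary
# ones (e.g. reprogramming a device whose instructions haven't changed) can
# be skipped.
from concurrent.futures import ThreadPoolExecutor


def run_steps(steps, skip=()):
    """steps: list of (name, func, set_of_dependency_names).

    Runs each step once its dependencies are done, in parallel where possible,
    skipping any step named in `skip`."""
    done = set(skip)
    remaining = [s for s in steps if s[0] not in skip]
    with ThreadPoolExecutor() as pool:
        while remaining:
            ready = [s for s in remaining if s[2] <= done]
            if not ready:
                raise RuntimeError('circular or missing dependency among steps')
            futures = {name: pool.submit(func) for name, func, _ in ready}
            for name, future in futures.items():
                future.result()
                done.add(name)
            remaining = [s for s in remaining if s[0] not in done]


# Hypothetical decomposition of transition_to_buffered():
steps = [
    ('open_shot_file', lambda: None, set()),
    ('read_instructions', lambda: None, {'open_shot_file'}),
    ('program_hardware', lambda: None, {'read_instructions'}),
    ('start_acquisition', lambda: None, {'read_instructions'}),
]
# Skip reprogramming if the instructions are identical to the previous shot:
run_steps(steps, skip={'program_hardware'})
```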
Profiling will reveal what the actual overhead is. If, after fixing the above sources of overhead (if they are what's dominating), it turns out that opening and closing HDF5 files is the slow thing, then we could have some kind of intelligent "readahead": as soon as the shot arrives in BLACS, a single process reads in one hit the groups and datasets that previous shots showed a particular driver opening. The worker process would then see a proxy HDF5 file object which requires no zlock to open and which already has all the data available, only opening the actual shot file if the driver attempts to read a group that was not read in advance. This would consume more RAM, so it should of course be possible to disable.
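A minimal sketch of such a readahead cache might look like the following; `ReadaheadProxy` and `readahead` are hypothetical names, and a real implementation would need to mimic much more of the h5py `File`/`Group` API.

```python
# Sketch of the proposed 'readahead' idea, not an existing BLACS feature.
# Datasets a driver touched on previous shots are pre-read into memory; the
# proxy serves those, only opening the real HDF5 file (which would require
# the zlock) on a cache miss.
import h5py


class ReadaheadProxy:
    def __init__(self, path, cache):
        self.path = path
        self.cache = cache  # dict mapping dataset path -> in-memory array

    def __getitem__(self, name):
        if name in self.cache:
            return self.cache[name]
        # Cache miss: fall back to opening the actual shot file.
        with h5py.File(self.path, 'r') as f:
            value = f[name][...]
        self.cache[name] = value
        return value


def readahead(path, dataset_names):
    """Read all listed datasets in one hit, as soon as the shot arrives."""
    cache = {}
    with h5py.File(path, 'r') as f:
        for name in dataset_names:
            cache[name] = f[name][...]
    return ReadaheadProxy(path, cache)
```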
These are the sorts of optimisations we could do, but before any of them I would want to do profiling - marking particular functions and when they were called, and gathering some statistics to see where the bottlenecks are.
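For example, a lightweight way to mark functions and record when they were called is a timing decorator that accumulates per-function statistics (a sketch, not an existing BLACS utility).

```python
# Sketch: decorate the functions of interest, record when each was called and
# how long it took, then summarise to find the bottlenecks.
import functools
import time
from collections import defaultdict

timings = defaultdict(list)  # function name -> list of (start_time, duration)


def profiled(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            timings[func.__name__].append((start, time.monotonic() - start))
    return wrapper


def report():
    for name, calls in timings.items():
        total = sum(duration for _, duration in calls)
        print(f'{name}: {len(calls)} calls, {total:.3f} s total')
```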