-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a PPU component #71
base: master
Are you sure you want to change the base?
Conversation
…helpful information to have ; the .trains() method will still work and return the correct information
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM, as this is well thought including the tests. I don't spot anything obvious, my comments are minor, and partly for my interest.
elif device is not None: | ||
raise KeyError(f"ppu must be a SourceData or str, not {type(device).__name__}") | ||
|
||
# Then we list all PPU device in the run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the 'then' is a bit misleading, it sounds like this is logically done after the code above, but this is the default entry point if no device is given by argument (but if it is given, the function will have returned at this point).
It is veeeery minor, but maybe "# Default: we get all PPU devices in the run as checked against hardcoded device classes, respectively substrings"
|
||
Args: | ||
offset (int, optional): | ||
offset to add to the selected trains. Defaults to 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For interest: why would the user want to add an offset? I'm sure there are use cases, just asking
tests/test_components_ppu.py
Outdated
reduced_run = ppu.trains(split_sequence=True) | ||
assert isinstance(reduced_run, list) | ||
assert len(reduced_run) == 3 | ||
assert len(reduced_run[0].train_ids) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sure this is overly pedantic / redundant, but maybe:
assert isinstance(reduced_run[0], DataCollection)
in analogy to the split_sequence == False
case
block the XFEL beam when triggered. The rotor is contained within a UHV | ||
chamber. In terms of temporal structure, the beam pipe is blocked by an | ||
absorbing rotor for up to 9/10ths of a second or vice versa, | ||
synchronized to the facility clock/trigger. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very useful information for the docs, I only wonder (naive question) how the 9/10 blocking scheme, i.e. one train per second picked, fits to an overall run where 1 / 309 trains are selected - like run 3379, 50 for instance, shouln't it then be 30 trains with non-consecutive IDs?
split_sequence (bool, optional): Split data per PPU trigger sequence. Defaults to False. | ||
offset (int, optional): offset to apply to train IDs to be selected. Defaults to 0. | ||
|
||
Returns: | ||
Union[DataCollection, List[DataCollection]]: | ||
DataCollection(s) containing only trains triggered by the PPU | ||
""" | ||
data = data or self.data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to make this generic. Strictly speaking, the EXtra component is thus not a PPU component any more, but a "trainSelector" component. I wonder if there should be such class per se. I see that this addition to the trains()
method makes the PPU more flexible, but it also seems a bit awkward to initialize by PPU device information, in order to then feed it with something else, except for comparison. In other words, I am not sure if a generic class with specific methods would be better than a specific class with a generic method by argument 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean here. The line you highlighted here is only to make it convenient to use with KeyData
or SourceData
. Otherwise one would need to first filter the DataCollection
and then make source/keydata object. Here you can instantiate all your object at the start of your script and then filter trains for any object as required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ppu = PPU(run)
filtered_run = ppu.trains()
source = filtered_run['awesome_source']
vs
ppu = PPU(run)
source = run['awesome_source']
filtered_source = ppu.trains(source)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. My comment was not about that particular line in the first place, more about the way this method accepts other source data i.e. allowing to do train picking based on a pre-selected source. This may be more convenient indeed. The default filtering i.e. trains()
method without argument is using the PPU device source, if one is actually interested in another source as in your first code block, it is more "natural" to do it the 2nd way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still, on a much higher level, I thought whether it would be good have a very generic "trainPicker" or "trainSelector" component in EXtra, that is, a class that could take PPU or any other source for filtering, already at instantiation level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this comment is not affecting the PR, it is more an outlook to the future. (Maybe it does not make sense, because it lacks a real use case?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not, but do we have other filtering sources/methods at this moment?
train_ids = pd.Series(train_ids, index=sequences) | ||
return train_ids | ||
|
||
def trains( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you consider a different name for this method? Both in EXtra-data as well as across existing components, we use .trains()
to return an iterator over trains. This method however takes something that has train IDs and changes it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Do you have something in mind?
filter
?filter_trains
?train_selection
?select
?select_trains
?pick
?pick_trains
?take
?take_trains
?
I'd probably go for pick_trains
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
select
and select_trains
also have different meanings already (though in the long run, I would love to pass this component to DataCollection.select_trains
, i.e. have some kind of TrainSelector
protocol)
From your list, I would probably also go for pick_trains
, but it's unfortunate one way or the other that it's the "picking-thing" calling on the "thing-to-be-picked" 🤔
Maybe @takluyver can comment on what sounds most natural.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would love to pass this component to DataCollection.select_trains, i.e. have some kind of TrainSelector protocol
yes, me too!
But do we have an idea of what/how we want to select trains or how this protocol can look like? I can easily think on how to do that specifically for the PPU device, but I can't imagine other cases for now (maybe you have more use cases as part of the data reduction).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we invert the problem: Instead of having a method that selects trains on some object, have a property or method that returns the correspondig extra_data.by_id
slice? Then selection becomes: run.select_trains(ppu.train_sel)
(name obviously preliminary)
Whenever we decide on a universal TrainSelector
protocol, we can simply add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to something like ppu.train_sel
. For the split-sequence option, we could make PPU iterable, giving some sort of TrainSequence or TrainGroup object so you could do [run.select_trains(seq.train_sel) for seq in ppu]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also needs a changelog entry)
- components/scans.md | ||
- components/pulse-patterns.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer if all pages were listed in the index. In general I dislike having disembodied pages because otherwise navigating to them requires that someone knows the right link to click on the right page (e.g. I always get confused when trying to find the xwiz docs).
I have ~mirrored the device lookup to what is done in the XGM component, but realistically, there should not ever be more than 1 PPU per run.~~In the extreme case we could hard code device names for this component for HED (and potentially SPB, where a PPU is also installed, but never used).
There also a device with similar interface for the laser trigger only, so either: