Masked array for large detectors #146

Open
jo-moeller opened this issue Feb 16, 2021 · 9 comments

Comments

@jo-moeller

The calibration pipeline provides masks in which the different criteria for masking a pixel/storage cell are encoded as bits, like this:
[image]
Could the get_dask_array function (for example for AGIPD) be extended so that it returns the image data with bad pixels already masked (set to np.nan)? For example with an extra parameter mask_bit = [0,1,2,7,8], since not every experiment needs all of the mask criteria applied.
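As an illustration of what a parameter like mask_bit = [0,1,2,7,8] would mean in practice, here is a minimal sketch that turns a list of bit positions into a single integer bitmask and tests a mask value against it. The helper names bits_to_bitmask and is_masked are hypothetical and not part of EXtra-data.

```python
def bits_to_bitmask(bits):
    """Combine bit positions (0 = least significant) into one integer."""
    bitmask = 0
    for bit in bits:
        bitmask |= 1 << bit
    return bitmask


def is_masked(mask_value, bits):
    """Return True if at least one of the selected bits is set."""
    return (mask_value & bits_to_bitmask(bits)) != 0


selected = bits_to_bitmask([0, 1, 2, 7, 8])
print(selected)  # 391
```

A pixel whose mask value has only bit 3 set (value 8) would be left alone with this selection, while one with bit 7 set (value 128) would be masked.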

@takluyver
Member

Thanks, I think that's a reasonable request.

In the meantime, I think it should be possible to load the mask data and apply it something like this:

import dask.array as da
import numpy as np

img_data = agipd.get_dask_array('image.data')
mask_data = agipd.get_dask_array('image.mask')
mask = mask_data & 391  # 391 = bits 0, 1, 2, 7, 8 set
img_data = da.where(mask, np.nan, img_data)  # NaN wherever any selected bit is set

I haven't tried this, though. If you do try it, let me know how it goes, because the implementation in EXtra-data would probably look similar to that.
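For anyone who wants to check the logic without detector files, the same steps can be sketched on small synthetic NumPy arrays. This is only a stand-in: the array contents are made up, and NumPy is used instead of Dask (dask.array.where takes its arguments in the same order as numpy.where).

```python
import numpy as np

# Synthetic stand-ins for 'image.data' and 'image.mask' (frames, y, x).
img_data = np.arange(8, dtype=np.float32).reshape(2, 2, 2)
mask_data = np.array(
    [[[0, 1], [8, 128]],
     [[256, 0], [16, 3]]],
    dtype=np.uint32,
)

selected_bits = 391  # bits 0, 1, 2, 7, 8
# A pixel is bad if at least one of the selected bits is set.
bad = (mask_data & selected_bits) != 0

# where(condition, value_if_true, value_if_false)
masked = np.where(bad, np.nan, img_data)
print(np.isnan(masked).sum())  # 4
```

Pixels with mask values 8 and 16 (bits 3 and 4) keep their data here, because those bits are not part of the selection.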

@jo-moeller
Author

Thanks.
One thing that I didn't specify: A pixel is considered bad if at least one of the bits is True, not all of them.

I can confirm that this works, for example:

arr = agp.get_dask_array('image.data')
mask = agp.get_dask_array('image.mask')
arr.data[(mask.data > 0) & (mask.data<8)] = np.nan

if external_mask is not None:
    external_mask = external_mask.astype('bool')
    arr = arr.where(~external_mask[:,None,:,:])

arr = arr.unstack('train_pulse')

This obviously takes only the first three bits into account. In addition, an external mask is applied, which for example masks shadowed regions on the detector. I don't know whether this could be added as well or whether it goes too far.
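One detail about the range comparison in that snippet may be worth spelling out: (mask > 0) & (mask < 8) only matches mask values from 1 to 7, so a pixel with one of the first three bits set together with a higher bit is not caught. A small sketch with made-up mask values, comparing it to a bitwise test:

```python
import numpy as np

# Made-up mask values: 129 = bit 0 + bit 7, 256 = bit 8 only.
mask = np.array([0, 1, 7, 8, 129, 256], dtype=np.uint32)

range_test = (mask > 0) & (mask < 8)  # matches values 1..7 only
bit_test = (mask & 0b111) != 0        # any of bits 0, 1, 2 set

print(range_test.tolist())  # [False, True, True, False, False, False]
print(bit_test.tolist())    # [False, True, True, False, True, False]
```

The two tests agree as long as no higher bits are set in the mask data; they differ for values such as 129.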

@jo-moeller
Author

Any decisions/progress on this one?

@takluyver
Member

I think that applying the mask data from the file while loading the data is a feature we can add, but I'm afraid I haven't made a start on incorporating it yet.

Do you know if the example you shared before performs well? I'm trying to work out what Dask does with in-place modification (arr.data[...] = np.nan), and whether that's a good way to apply the mask or whether we should be doing something smarter.

I'm inclined to leave out applying an external mask, because it should be simple to do that outside EXtra-data after getting the data, and that avoids having to specify precisely what format(s) of masks it accepts (how many dimensions, 0 good or 0 bad, etc.). But that's open for discussion if there's some use case I'm overlooking.
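To illustrate why an external mask is easy to apply after loading, and where the format questions come in, here is a rough sketch in NumPy. The function name apply_external_mask, the zero_is_good flag and the assumed shapes (modules, frames, y, x) for data and (modules, y, x) for the mask are all assumptions for this example, not anything EXtra-data defines.

```python
import numpy as np


def apply_external_mask(data, external_mask, zero_is_good=True):
    """Set pixels flagged by a per-module mask to NaN in every frame.

    data:          (modules, frames, y, x) float array
    external_mask: (modules, y, x) array; which value means "bad"
                   depends on the zero_is_good convention
    """
    bad = external_mask.astype(bool)
    if not zero_is_good:
        bad = ~bad
    # Insert a length-1 frames axis so the mask broadcasts over all frames.
    return np.where(bad[:, None, :, :], np.nan, data)


data = np.ones((2, 3, 2, 2))
ext = np.zeros((2, 2, 2), dtype=np.uint8)
ext[0, 0, 0] = 1  # flag one pixel of module 0

out = apply_external_mask(data, ext)
print(int(np.isnan(out).sum()))  # 3 (one pixel in each of 3 frames)
```

The zero_is_good switch and the fixed number of dimensions are exactly the kind of conventions that would have to be pinned down if this lived inside the library.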

@takluyver
Member

I looked into the dask.array code, and it appears that arr[mask] = np.nan is roughly equivalent to arr = da.where(mask, np.nan, arr), which should be OK.
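The equivalence can be checked on a tiny example. This sketch uses NumPy rather than Dask, on the assumption that the same semantics carry over; note the argument order where(condition, value_if_true, value_if_false), so NaN is the second argument and the original array the third.

```python
import numpy as np

arr = np.array([1.0, 2.0, 3.0, 4.0])
mask = np.array([False, True, False, True])

# In-place boolean assignment on a copy...
in_place = arr.copy()
in_place[mask] = np.nan

# ...and the out-of-place version with where().
via_where = np.where(mask, np.nan, arr)

same = np.array_equal(in_place, via_where, equal_nan=True)
print(same)  # True
```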

@jo-moeller
Author

I didn't run into any obvious performance issues with the example code. But to be honest, I also didn't investigate it thoroughly. I think that da.where is a bit easier if there is a dimension missing.

Leaving out the "mask from file" is also OK; that would be more of a nice-to-have extra.

More important would be to have an easy way to select certain mask bits that should be masked in the array.

@jo-moeller
Author

Hey,
any news on this one?
Best, Johannes

@JamesWrigley
Member

This is on my todo-list, but I haven't quite found the time for it yet :\

@takluyver
Member

This came up again as #308. We're keeping this issue, but wanted to record from there:

The BadPixels enum will be added as part of the CorrectionData interface, and we should make sure there are easy ways to set individual bits and/or retrieve the data already in such a way.

I'd imagine this would look like an enum.IntFlag object, with .get_array() and .get_dask_array() accepting values from it like this:

agipd.get_array('image.data', mask_out=(
    BadPixels.OFFSET_OUT_OF_THRESHOLD | BadPixels.VALUE_IS_NAN
))

And probably a shortcut like BadPixels.ALL to mask out anything with a non-zero mask value.
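A minimal sketch of how such a flag enum could be combined and tested, using only the standard library. The member names OFFSET_OUT_OF_THRESHOLD and VALUE_IS_NAN come from the comment above; the bit positions, the extra member and the helper is_masked_out are placeholders for illustration and should not be read as the calibration pipeline's actual values or as an existing EXtra-data API.

```python
from enum import IntFlag


class BadPixels(IntFlag):
    # Bit positions are placeholders, NOT the real calibration values.
    OFFSET_OUT_OF_THRESHOLD = 1 << 0
    NOISE_OUT_OF_THRESHOLD = 1 << 1  # hypothetical extra member
    VALUE_IS_NAN = 1 << 2


def is_masked_out(mask_value, mask_out):
    """True if any flag selected in mask_out is set in mask_value."""
    return (mask_value & int(mask_out)) != 0


mask_out = BadPixels.OFFSET_OUT_OF_THRESHOLD | BadPixels.VALUE_IS_NAN
print(int(mask_out))  # 5
```

Because IntFlag members are integers, a combined value like mask_out can be used directly in the same bitwise & test as a plain integer bitmask.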
