Synergies with LiberTEM, UDFs vs. map_blocks #737
A simple example with `_process_dask_array`:

```python
import hyperspy.api as hs
from scipy.ndimage import gaussian_filter
from pyxem.utils.dask_tools import _process_dask_array

s = hs.load("test_data.hspy", lazy=True)  # 4-dimensional dataset
# Get a dask array as output, retaining the chunk sizes
data_gaussian = _process_dask_array(s.data, gaussian_filter, sigma=20)
```

It can easily be extended to a […]. One advantage is that the result plugs straight back into the lazy signal workflow:

```python
import pyxem as pxm

s_gaussian = pxm.signals.Diffraction2D(data_gaussian).as_lazy()
s_gaussian.unit = "2th_deg"
s_gaussian.set_ai()
s_ai = s_gaussian.get_azimuthal_integral1d(npt=100)  # Still a lazy signal
s_ai.save("test_azimuthal.hspy")
```
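For comparison, here is a minimal sketch of the same frame-wise filtering written directly against `dask.array.map_blocks` (the array shape, chunking and sigma value here are made up for illustration; the key assumption is that each chunk contains whole detector frames, i.e. the signal axes are not split across chunks):

```python
import dask.array as da
import numpy as np
from scipy.ndimage import gaussian_filter


def _filter_block(block, sigma):
    # Apply the 2D filter to every frame in a 4D block independently.
    out = np.empty_like(block, dtype=np.float64)
    for idx in np.ndindex(block.shape[:-2]):
        out[idx] = gaussian_filter(block[idx], sigma=sigma)
    return out


# Toy stand-in for a lazy 4D-STEM dataset; each chunk holds whole frames.
data = da.from_array(
    np.random.default_rng(0).random((4, 4, 64, 64)),
    chunks=(2, 2, 64, 64),
)
filtered = data.map_blocks(_filter_block, sigma=2, dtype=np.float64)
result = filtered.compute()
```

This is the "function per frame, then a loop over frames, then a wrapper" layering discussed below; `_process_dask_array` exists precisely to hide that boilerplate.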
There isn't really a short answer to these questions, but the last time Dieter and I spoke, the conclusion we reached was that there are a lot of unexploited synergies between the work being done with LiberTEM and with hyperspy and pyxem. I think in the short-to-medium term they (LiberTEM) were hoping to improve hyperspy's lazy signal handling. As I've recently discovered, @CSSFrancis also has pull requests open with hyperspy to a similar effect (upstreaming some of our dask processing code), and that seems to be the direction things are going. In terms of the use case @din14970 is describing: if the functionality were rewritten as a LiberTEM UDF, I would be very happy to merge it into pyxem. I have no objections to depending on LiberTEM (but similarly, there is little enthusiasm from either us or LiberTEM to "merge" the two packages into one), and it may well be faster than using dask for most users (although I've not personally run experiments to confirm this).
@magnunor Actually wasn't aware of
As far as I understood, LiberTEM is seeing themselves less and less as a 4D-STEM analysis package, as they are trying to focus on the speedy calculation aspect and trying to push the application specific stuff out to Hyperspy/Pyxem.
Yes, that's certainly my impression, and it seems like an excellent way of allocating the skills we have at present in the community(ies).
We have plans to try out LiberTEM's UDFs in kikuchipy as well (pyxem/kikuchipy#321), so I'm interested in seeing how it goes, @din14970! As Magnus mentions, the "interactivity" of dask array signals is really nice and practical. I'm thinking that using a combination of dask and LiberTEM might be good, in that LiberTEM can do some specialized computations, while dask does all the rest.
With regards to
Give it a shot! I'm very curious to see how it handles it. One thing to keep in mind when benchmarking very big datasets is to do the "full" processing, from the hard drive to the end result. It is sometimes possible to over-optimize one part of the processing while the real bottleneck is somewhere else.
I did a little work on this earlier, with […]
It's been a little while since I've looked at LiberTEM, but it does seem like they have a much better handle on dealing with large datasets than hyperspy does (partially because I don't think hyperspy has really had to deal with them until recently). For operating using multiple cores on a cluster, I've recently been playing around with dask-jobqueue, which might be of interest if you are trying to schedule jobs on a cluster through dask. I wouldn't say that hyperspy/hyperspy#2617 is going to completely fix the map function in hyperspy, but I think it is a good start. I'll try to benchmark its ability to run in parallel, hopefully later today.
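For reference, a minimal dask-jobqueue configuration sketch for a SLURM cluster (the partition name, core/memory figures and walltime are placeholders; substitute whatever your cluster actually provides):

```python
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Each SLURM job contributes one dask worker with these resources.
cluster = SLURMCluster(
    queue="normal",       # placeholder partition name
    cores=8,
    memory="32GB",
    walltime="01:00:00",
)
cluster.scale(jobs=4)     # ask the scheduler for 4 such jobs

client = Client(cluster)
# Any dask collection computed from here on (e.g. hyperspy lazy signals
# backed by dask arrays) is scheduled across the cluster workers.
```

The nice part of this design is that the same lazy workflow runs unchanged on a laptop (with a `LocalCluster`) or on the batch system; only the cluster object differs.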
On this topic, dask seems to be playing very well with RAPIDS and the UCX framework: https://medium.com/rapids-ai/high-performance-python-communication-with-ucx-py-221ac9623a6a Also, NEP 35 will make it very convenient to write code which works for cupy, numpy, dask/numpy or dask/cupy arrays. I have started experimenting with a branch of hyperspy which supports cupy arrays, and it works well without changing much of the code.
As @CSSFrancis said, this shouldn't be necessary if the hyperspy map function were improved.
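To make the NEP 35 point concrete, here is a small sketch (my own illustrative helper, not hyperspy code): with the `like=` keyword on array-creation functions (NumPy >= 1.20), one function body can return whichever array type it was given, so the same code serves NumPy and CuPy inputs without an explicit module switch:

```python
import numpy as np


def radial_grid(shape, like):
    """Distance-from-centre map matching the array type of ``like``."""
    # NEP 35: ``like=`` dispatches array creation to the module of
    # ``like``, so these return CuPy arrays when ``like`` is a CuPy array.
    y = np.arange(shape[0], like=like)[:, None]
    x = np.arange(shape[1], like=like)[None, :]
    cy, cx = (shape[0] - 1) / 2, (shape[1] - 1) / 2
    return np.sqrt((y - cy) ** 2 + (x - cx) ** 2)


frame = np.ones((5, 5))
r = radial_grid(frame.shape, like=frame)
```

The same `radial_grid` called with a `cupy` array as `like` would build the grid on the GPU, which is exactly the "write once, run on numpy or cupy" convenience mentioned above.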
Although this was never resolved, I am closing as old+discussion.
I'm posting this mainly as a point of discussion.
In my experience, indexing real and large datasets (whichever method is used) is still quite slow, even with what I implemented in #673. These things should not take more than a few minutes on a cluster if the functions could make use of GPUs, which is something I'd like to look into, e.g. with CuPy. More than that, however, I feel the functionality we write here to perform operations on datasets using `dask.array.map_blocks` requires a lot of boilerplate: first write a function to operate on a frame, then a function that loops over all the frames, then wrap that in another thing. After a discussion with Dieter Weber I recently started looking into LiberTEM's UDF API, which seems like a much more natural, granular and scalable approach to operating on large datasets. I'm in the process of experimenting with this to see if I can implement "fast" template matching as a LiberTEM UDF, hopefully also with CUDA support, so we could run it on our GPU cluster. I'm wondering if there are any longer-term ideas about using the LiberTEM API inside pyxem or porting the pyxem functionality to LiberTEM. Hence opening the issue here for discussion.
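To give a flavour of the contrast, here is a hedged sketch of what a simple reduction looks like as a LiberTEM UDF (an illustrative maximum-over-frames reduction, not the template-matching code under discussion; assumes LiberTEM is installed). Instead of the frame-function / loop / wrapper layering, a single class declares its result buffers, processes one frame at a time, and merges partial results from workers:

```python
import numpy as np
from libertem.udf import UDF


class MaxFrameUDF(UDF):
    def get_result_buffers(self):
        # One detector-shaped ("sig") buffer holding the running maximum.
        # Buffers start zeroed, which is fine for non-negative counts.
        return {"maximum": self.buffer(kind="sig", dtype=np.float32)}

    def process_frame(self, frame):
        # Called once per diffraction frame; LiberTEM handles the tiling.
        np.maximum(self.results.maximum, frame, out=self.results.maximum)

    def merge(self, dest, src):
        # Combine partial results computed on different partitions/workers.
        np.maximum(dest.maximum, src.maximum, out=dest.maximum)
```

This would then be run with something like `ctx = libertem.api.Context()` followed by `ctx.run_udf(dataset=ds, udf=MaxFrameUDF())`, with LiberTEM taking care of scheduling and, with a suitable backend, GPU execution.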