[Discussion] Improve the way we handle signals of different nature when they come in the same stream #3197

h-mayorquin · 2024-07-12T18:42:14Z

In neo, the concept of a stream indicates that the underlying data has the same:

dtype
shape
sampling_rate

And that makes a lot of sense in neo raw io because a stream can be thought of as a block for IO and processing while loading and accessing data. It achieves its purpose there.
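To make the grouping criterion concrete, here is a minimal sketch of how channels end up in the same stream when only dtype, shape and sampling rate are compared. The `Channel` record, the `group_into_streams` helper and the channel names are hypothetical and written for illustration; this is not neo's actual header layout or API.

```python
from collections import defaultdict
from typing import NamedTuple


class Channel(NamedTuple):
    # Hypothetical channel record, not neo's real header structure.
    name: str
    dtype: str
    sampling_rate: float
    n_samples: int
    units: str


def group_into_streams(channels):
    """Group channel names by (dtype, sampling_rate, n_samples)."""
    streams = defaultdict(list)
    for ch in channels:
        key = (ch.dtype, ch.sampling_rate, ch.n_samples)
        streams[key].append(ch.name)
    return dict(streams)


# Made-up example: a wideband and a filtered channel share every IO property.
channels = [
    Channel("WB01", "int16", 40000.0, 1_000_000, "uV"),
    Channel("SPKC01", "int16", 40000.0, 1_000_000, "uV"),
    Channel("FP01", "int16", 1000.0, 25_000, "uV"),
]
streams = group_into_streams(channels)
```

Under these assumptions the wideband and filtered channels land in one stream, because nothing in the grouping key describes the nature of the signal.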

Because it is readily available, we also use the concept of a stream in spikeinterface to load data. The reason is that sharing the same shape, sampling rate, and dtype is what a buffer of data requires to be loaded as a recording. The purposes match at that level.

Where the purposes don't match is in how recording extractor objects are used. Let's take a common use, sorting, and two cases to illustrate.

plexon: currently both wide band and filtered signals are in the same stream. See here: How to select only wideband channels in Plexon #3196
CED from Cambridge Electronic Design: for our test example in gin raw, LFP, mechanical and even laser recordings are stored in the same stream. See this issue on neo.

I hope that these two situations illustrate that loading a recording and then trying to run a sorter, or most analyses of the data, will not make sense in those cases. The data needs further separation so it can be fed to a typical spikeinterface pipeline.

This is a discussion issue that introduces context so we can make a decision:

Can this be improved on the spikeinterface side?
Should we improve the concept of stream or come up with a new concept that ensures that only electrophysiological data of the same nature is loaded when users access data through our extractors API?
Are those cases above rare enough that we can just document those and not do anything else?

This is also important for us in neuroconv because we usually write a whole stream as an ElectricalSeries, which does not work in these cases, so we need further refinements to achieve our purpose as curators of data.

Tagging @CodyCBakerPhD and @bendichter from the neuroconv side.

zm711 · 2024-07-15T12:01:34Z

I would argue for Plexon we should split the wideband and the filtered into different streams. Although strictly they could qualify as the same stream, for me it makes more sense to be different streams since it is the exact same data, but filtered vs unfiltered. CED I don't know at all.

By splitting plexon into two streams the user can easily select to use wideband and filter in spikeinterface/the sorter or choose to use the plexon filtered data. Curious what others think.

samuelgarcia · 2024-07-15T12:51:52Z

Good analysis of the situation.
Even the last spikeglx channel could be handled as a separate stream.

h-mayorquin · 2024-07-19T15:33:19Z

Agreed Solution

The agreed solution is to improve the concept of a stream, keep the name as it is, and fix it fundamentally in neo raw io.

Context

It is unclear what a "logical stream" is, but people have come to have expectations about it because it is exposed in our API. To make this more precise, let's build a provisional characterization of what a "logical stream" is.

A logical stream should:

Be a buffer stream in the IO sense: it should have dtype, sampling_frequency, and shape so it can be thought of as an IO block.
Make analytical sense for a pipeline such as spike sorting when loaded in spikeinterface or neo.

A logical stream should not:

Have channels that have different units.
Have channels that have different filtering.

We can add or remove points from this characterization as we move forward.
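The provisional characterization above can be sketched as a small check. This is only an illustration: the dictionary keys (`dtype`, `sampling_frequency`, `n_samples`, `units`, `filtering`) and the function name `logical_stream_problems` are assumptions made for this example, not fields or functions that neo or spikeinterface expose under these names.

```python
def logical_stream_problems(channels):
    """Return the reasons why `channels` do not form a logical stream.

    Each channel is a dict with hypothetical keys. An empty list means
    every criterion listed above is satisfied.
    """
    problems = []
    # IO-level criteria first, then the "same nature" criteria.
    for field in ("dtype", "sampling_frequency", "n_samples", "units", "filtering"):
        values = {ch[field] for ch in channels}
        if len(values) > 1:
            problems.append(f"mixed {field}: {sorted(values, key=str)}")
    return problems


# Made-up example: same IO properties, but different units.
candidate = [
    {"dtype": "int16", "sampling_frequency": 30000.0, "n_samples": 1000,
     "units": "uV", "filtering": "wideband"},
    {"dtype": "int16", "sampling_frequency": 30000.0, "n_samples": 1000,
     "units": "V", "filtering": "wideband"},
]
problems = logical_stream_problems(candidate)
```

Here the candidate passes as a buffer in the IO sense but fails as a logical stream because its channels have different units.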

How to Implement It

The current concept of a stream in neo raw io will be renamed (provisionally "buffer stream") and hidden from the user. In other words, it will become an internal implementation detail of neo. In practice, this probably means expanding the neo header struct with another field that characterizes whether the signal can be loaded as one buffer, and then sub-dividing that buffer into logical streams (or sub-streams) that are exposed to the users. Note that in some cases this will imply a small inefficiency, as more data than needed will be memmapped, but this can be mitigated and has a low cost overall. This is a lot of work that will happen at the neo level, so the rest of the details should probably be fleshed out there.
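As a rough sketch of the sub-division described above, the snippet below maps one buffer to several logical streams by recording which buffer columns belong to each. The grouping key (units plus filtering), the helper names and the channel dictionaries are all assumptions for illustration; the actual header fields are left to the neo-side design.

```python
from collections import defaultdict


def split_buffer_stream(buffer_channels):
    """Map each logical stream key to the column indices of its channels."""
    logical = defaultdict(list)
    for column, ch in enumerate(buffer_channels):
        # Hypothetical criterion: same units and same filtering.
        logical[(ch["units"], ch["filtering"])].append(column)
    return dict(logical)


def read_logical_stream(buffer, columns):
    """Select one logical stream from a (samples x channels) buffer."""
    return [[row[c] for c in columns] for row in buffer]


# Made-up buffer holding raw, LFP and laser channels side by side.
buffer_channels = [
    {"name": "raw-0", "units": "uV", "filtering": "none"},
    {"name": "lfp-0", "units": "uV", "filtering": "lowpass"},
    {"name": "laser", "units": "V", "filtering": "none"},
    {"name": "raw-1", "units": "uV", "filtering": "none"},
]
buffer = [[1, 2, 3, 4], [5, 6, 7, 8]]  # two samples, four channels

logical_streams = split_buffer_stream(buffer_channels)
raw = read_logical_stream(buffer, logical_streams[("uV", "none")])
```

The whole buffer is still read, and each logical stream is just a column selection on top of it, which is where the small inefficiency mentioned above comes from.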

Some Things This Does Not Cover

There are cases like intan where the current stream is not narrow enough, but the "logical streams" cannot be determined from the format alone. For example, the channels of different ports might come from different probes, and the criterion of "you have to be able to run an analysis pipeline on the stream" does not apply neatly. In that case, it will be the responsibility of the user to partition the stream appropriately. I don't see what else can be done there.

h-mayorquin · 2024-08-23T17:28:37Z

Here is another case of non-logical streams that came to us in neuroconv:
catalystneuro/neuroconv#1023 (comment)

zm711 · 2024-09-04T12:15:27Z

There are cases like intan where the current stream is not narrow enough, but the "logical streams" cannot be determined from the format alone. For example, the channels of different ports might come from different probes, and the criterion of "you have to be able to run an analysis pipeline on the stream" does not apply neatly.

I think in the intan port case we know they come from different ports. I think the right thing to do in this case is to only do one port at a time. Because it is one port per headstage which would mean one port per probe. So someone could easily sort their data separately and as far as I know it would be better to preprocess those ports separately. So in this case I think we would want to eventually hive that off in Intan since logically the ports are like doing separate experiments and so ought to be treated as separate streams.

h-mayorquin · 2024-09-04T19:54:54Z

@zm711
To add more context, I just worked with an experimental setup where the arrangement was the following (quoting):

You’ll see three tabs, as the rig can accommodate up to three Utah arrays: (1) the first array uses Ports A, B, and C (32 sites each), (2) the second uses Ports D, E, F (32 sites each), and (3) the third, if present, uses Ports G (32 sites) and H (64 sites).

For spikeinterface purposes, I would rather give users too much data (which is memmapped anyway) so they can slice, instead of too little, in which case they would need to open two recordings and then concatenate.
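A minimal sketch of the slicing side of that trade-off, assuming channel ids that start with the port letter (for example "A-000"); the id format, the `PORTS_PER_ARRAY` mapping and the helper name are assumptions for illustration only:

```python
# Port layout taken from the rig description quoted above.
PORTS_PER_ARRAY = {
    "array_1": ("A", "B", "C"),
    "array_2": ("D", "E", "F"),
    "array_3": ("G", "H"),
}


def channel_ids_for_array(all_channel_ids, array_name):
    """Keep only the channel ids whose port prefix belongs to one array."""
    ports = PORTS_PER_ARRAY[array_name]
    return [cid for cid in all_channel_ids if cid.split("-")[0] in ports]


# Hypothetical ids: 32 sites on ports A to G, 64 sites on port H.
sites_per_port = {port: 32 for port in "ABCDEFG"}
sites_per_port["H"] = 64
all_channel_ids = [
    f"{port}-{index:03d}"
    for port, n_sites in sites_per_port.items()
    for index in range(n_sites)
]

array_1_ids = channel_ids_for_array(all_channel_ids, "array_1")
array_3_ids = channel_ids_for_array(all_channel_ids, "array_3")
```

The resulting lists are what a user would hand to a channel-slicing call on the single loaded recording, so no concatenation across recordings is needed.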

Would you still maintain your view in the light of this case?

zm711 · 2024-09-04T20:10:10Z

This is a tough one. Let me think on this case. Utah arrays are their own special case. Often the electrodes 1) lack a rigid geometry and 2) are spaced far enough apart that they should be treated as small sets of electrodes rather than as one overall probe. I guess I don't care enough to fight if you feel strongly about keeping this together and having the user slice rather than have us slice and make the user concatenate/append. I guess I'm slightly in favor of your approach since I don't think we necessarily have a channel_append machinery that would make this easy, whereas we have a channel_slice machinery. So from a spikeinterface side it is much easier to slice in this situation.

Thanks for providing this example. I would be curious how the Utah setup actually works, but I do remember you asking me about ports in general ages ago and bringing up this setup as an example of multiport use. Intan does provide SPI cable splitters/adapters, which would allow users to merge multiple headstages into one port if they really wanted everything to go into one port only. I think that would be the better thing to do to fit with our schema, but end-users can do whatever they want. So your solution reduces the decisions that we make versus those the end-user makes.

I'm fine with keeping all amplifier channels coming out together regardless of port. I think writing this has pushed me more toward your camp.

h-mayorquin added the discussion and extractors labels Jul 12, 2024

zm711 mentioned this issue Jul 17, 2024

Improve RawIO Documentation NeuralEnsemble/python-neo#1508

Open

h-mayorquin mentioned this issue Aug 15, 2024

Fix-ate plexon signal streams NeuralEnsemble/python-neo#1524

Merged

h-mayorquin mentioned this issue Aug 23, 2024

[Feature]: Selecting channels from OpenEphys data to convert catalystneuro/neuroconv#1023

Open


h-mayorquin mentioned this issue Sep 3, 2024

neo.rawio : API enhance proposal buffer_id and stream_id NeuralEnsemble/python-neo#1543

Open
