[Bug]: Namespace collision with two or more SpikeGLX interfaces #1112
Hi, thanks for your detailed report. Indeed, we are lacking documentation on how to do this. Moreover, as I was trying to build an example for you, I found our current data model inadequate in certain parts to cover it. We will be working on a way to make this simpler in the future, but meanwhile I wanted to provide working code to solve your problem. First, though, here is code to write two SpikeGLX interfaces without name collisions and with adequate device and electrode group handling:

```python
from neuroconv.datainterfaces import SpikeGLXRecordingInterface, KiloSortSortingInterface
from neuroconv import ConverterPipe

file_path_spikeglx_1 = (
    ECEPHY_DATA_PATH / "spikeglx" / "long_nhp_stubbed" / "snippet_g0/snippet_g0_imec0" / "snippet_g0_t0.imec0.ap.bin"
)
es_key_1 = "SpikeGLX1"  # Electrical Series Metadata Key
interface_spikeglx_1 = SpikeGLXRecordingInterface(file_path=file_path_spikeglx_1, es_key=es_key_1)

metadata = interface_spikeglx_1.get_metadata()
probe_name = "ProbeA"
probe_metadata = {"name": probe_name, "description": "probe_description", "manufacturer": "IMEC"}
metadata["Ecephys"]["Device"] = [probe_metadata]

# This is a list of dicts with entries name, description and device.
# Update name to Probe_{shank_index} and device to probe_name.
electrode_group_metadata = metadata["Ecephys"]["ElectrodeGroup"]
electrode_group_names = [
    electrode_metadata["name"].replace("Imec", probe_name) for electrode_metadata in electrode_group_metadata
]
for entry, name in zip(electrode_group_metadata, electrode_group_names):
    entry.update(name=name, device=probe_name)
metadata["Ecephys"]["ElectrodeGroup"] = electrode_group_metadata

channel_group_names = interface_spikeglx_1.recording_extractor.get_property("group_name")
new_channel_group_names = [name.replace("Imec", probe_name) for name in channel_group_names]
interface_spikeglx_1.recording_extractor.set_property(key="group_name", values=new_channel_group_names)

# Note that the first interface needs to create the nwbfile
nwbfile = interface_spikeglx_1.create_nwbfile(metadata=metadata)

# Second probe
file_path_spikeglx_2 = ECEPHY_DATA_PATH / "spikeglx" / "NP2_no_sync" / "all_chan_g0_t0.exported.imec0.ap.bin"
es_key_2 = "SPIKEGLX2"  # Electrical Series Metadata Key
interface_spikeglx_2 = SpikeGLXRecordingInterface(file_path=file_path_spikeglx_2, es_key=es_key_2)

metadata = interface_spikeglx_2.get_metadata()
probe_name = "ProbeB"
probe_metadata = {"name": probe_name, "description": "probe_description", "manufacturer": "IMEC"}
metadata["Ecephys"]["Device"] = [probe_metadata]

electrode_group_metadata = metadata["Ecephys"]["ElectrodeGroup"]
electrode_group_names = [
    electrode_metadata["name"].replace("Imec", probe_name) for electrode_metadata in electrode_group_metadata
]
for entry, name in zip(electrode_group_metadata, electrode_group_names):
    entry.update(name=name, device=probe_name)
metadata["Ecephys"]["ElectrodeGroup"] = electrode_group_metadata

channel_group_names = interface_spikeglx_2.recording_extractor.get_property("group_name")
new_channel_group_names = [name.replace("Imec", probe_name) for name in channel_group_names]
interface_spikeglx_2.recording_extractor.set_property(key="group_name", values=new_channel_group_names)

# The second interface is added to the existing nwbfile
interface_spikeglx_2.add_to_nwbfile(nwbfile=nwbfile, metadata=metadata)

# When you are done, write to NWB
from pynwb import NWBHDF5IO

nwbfile_path = "nwbfile.nwb"
with NWBHDF5IO(nwbfile_path, mode="w") as io:
    io.write(nwbfile)
```

I think this can be put inside a loop the way you wanted. Let me know if it makes sense. Before adding the sorting part, I just wanted to clarify:

What do you mean by this? Just so I don't start writing code that does not relate to your intent.
Thanks for the detailed response! I shall try it shortly
With the standard caveat that I may misunderstand the structure of NWB: we currently have a tool that converts from our internal DataJoint system to NWB. It doesn't use Neuroconv; it's based directly on . In the current tool, when Units are added to the nwbfile, each Unit is cross-linked to an Electrode and therefore to an ElectrodeGroup, which is linked to a Device. In pseudocode, that looks something like this:

```python
for probe in inserted_probes:
    device = nwbfile.create_device()
    # Currently this is 1 group per probe, even for 4-shank probes, which may be inappropriate.
    electrode_group = nwbfile.create_electrode_group()
    # Create however many hundred electrodes a probe has
    electrodes = make_electrodes(nwbfile, electrode_group)
    for unit in set_of_units_recorded_by_this_probe:
        nwbfile.add_unit(unit_id, electrode=unit_central_electrode, electrode_group=electrode_group)
```

Thus, my assumption is that when I use Neuroconv to add, say, 2 SpikeGLXInterfaces (each of which corresponds to a unique Device) and 2 KilosortInterfaces, there is, or should be, some way to indicate that "Units from ".

As my original bug report indicates, I didn't get far enough into the process to see how, or if, Neuroconv works as I assume it does. Maybe that expectation is wrong, and the way our current tool works is not the correct, or not the only correct, way to do it. But that's what I meant :-)
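The per-probe loop above can be modelled without any NWB library at all, which makes the intended linkage explicit. The following is a minimal, self-contained sketch using plain dicts in place of pynwb objects; all names (probe ids, unit ids, the `build_session` helper) are illustrative, not part of any real API:

```python
# Library-free sketch of the device -> electrode group -> electrodes -> units
# linkage described above. Each unit references a row of a shared electrode
# table and the electrode group (and hence device) that row belongs to.
def build_session(units_by_probe, electrodes_per_probe=4):
    devices, electrode_groups, electrode_table, units = [], [], [], []
    for probe, probe_units in units_by_probe.items():
        devices.append({"name": probe})
        group = {"name": f"{probe}_group", "device": probe}
        electrode_groups.append(group)
        # Remember where this probe's electrodes start in the global table
        first_row = len(electrode_table)
        for local_index in range(electrodes_per_probe):
            electrode_table.append({"group": group["name"], "local_index": local_index})
        # Each unit points at the global row of its central electrode
        for unit_id, central_electrode in probe_units:
            units.append(
                {
                    "unit_id": unit_id,
                    "electrode_index": first_row + central_electrode,
                    "electrode_group": group["name"],
                }
            )
    return devices, electrode_groups, electrode_table, units
```

With two probes of 4 electrodes each, a unit on the second probe gets an electrode index offset past the first probe's rows, which is exactly the cross-linking the pseudocode relies on.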
Digging into the output with the help of your example, I can generate a complete NWB file for further debugging purposes.

However, it looks like adding KiloSort to the mix introduces a fresh collision. Recording is done by channel, which is a subset of the entire set of electrodes, so a 4-shank NPX2 probe has 5120 electrodes of which 384 can be recorded simultaneously. The NWB file gets the list of those 384 channels as its table:

```python
print("Electrode Channels")
print("Number of channels: ", len(nwbfile.electrodes))
print("Number of unique channel names: ", len(set(nwbfile.electrodes.channel_name)))
# Output:
# Electrode Channels
# Number of channels: 768
# Number of unique channel names: 384
```

However, since KiloSort doesn't have any concept of plural devices, it only tracks units by channel number, and that is what is picked up by the . I'm not sure what the best solution to this is for the project as a whole, but I guess in the short term I need to build a slightly modified CustomKiloSortSortingInterface to add an additional field to the . That's starting to get off topic from the issue I originally raised, but it might be relevant to the project as a whole, requiring a more general solution.
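The 768-vs-384 numbers above can be reproduced with a toy illustration: channel names alone collide across two probes, but pairing each name with its probe-specific group name restores uniqueness. The probe and channel names below are illustrative, not real SpikeGLX output:

```python
# Two 384-channel probes produce 768 electrode rows but only 384 distinct
# channel names; (group_name, channel_name) pairs are unique again.
channel_names = [f"AP{i}" for i in range(384)] * 2
group_names = ["ProbeA"] * 384 + ["ProbeB"] * 384

assert len(channel_names) == 768
assert len(set(channel_names)) == 384  # names alone collide across probes
qualified = set(zip(group_names, channel_names))
assert len(qualified) == 768  # qualified pairs are unique
```

This suggests that any unit-to-channel lookup has to key on the (group, channel) pair, not the bare channel number that KiloSort tracks.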
No, this is something that we have been pondering for a while. You are correct that the canonical way to make this linkage is by writing the electrodes in the units table. As you can see, our base function for adding unit tables has a way of passing this information (see the `unit_electrode_indices` argument), but we have not propagated it to the interface level because of some conceptual difficulties. Exposing this and adding documentation is on our todo list. Meanwhile, I will come back to you soon with a way of doing this, but it will be more manual work, like the example above. Another option you have is to write the sorting of each probe in a different units table; see this issue:

By the way, do you have a paper where you use this multiple-probe setup, or do you know references in the literature where it is used? It is always useful for us to be able to point to specific use cases when developing features.
Those are both from 2022, using the older NPX 1 probes. I don't know if we've published anything with multiple NPX2 probes yet, but that will be coming; I discovered this issue doing the prep work for exactly that. In addition, while I don't have access to all of our surgical records, I have data on at least 40 subjects with n>=2 Neuropixels probes implanted since we first started testing the first-generation prototypes.
Thank you. As promised, here is some code that handles the correct mapping between raw and sorted data for ecephys. To this end, it introduces a new converter class:

```python
from neuroconv.datainterfaces import SpikeGLXRecordingInterface, KiloSortSortingInterface
from neuroconv import ConverterPipe
from neuroconv.datainterfaces.ecephys.baserecordingextractorinterface import BaseRecordingExtractorInterface
from neuroconv.datainterfaces.ecephys.basesortingextractorinterface import BaseSortingExtractorInterface


class SortedRecordingConverter(ConverterPipe):
    def __init__(
        self,
        recording_interface: BaseRecordingExtractorInterface,
        sorting_interface: BaseSortingExtractorInterface,
        unit_ids_to_channel_ids: dict[str, list[str]],
    ):
        self.recording_interface = recording_interface
        self.sorting_interface = sorting_interface
        self.unit_ids_to_channel_ids = unit_ids_to_channel_ids

        self.channel_ids = self.recording_interface.recording_extractor.get_channel_ids()
        self.unit_ids = self.sorting_interface.sorting_extractor.get_unit_ids()

        data_interfaces = [recording_interface, sorting_interface]
        super().__init__(data_interfaces=data_interfaces)

    def add_to_nwbfile(self, nwbfile, metadata, conversion_options=None):
        conversion_options = conversion_options or dict()
        conversion_options_recording = conversion_options.get("recording", dict())

        self.recording_interface.add_to_nwbfile(
            nwbfile=nwbfile,
            metadata=metadata,
            **conversion_options_recording,
        )

        from neuroconv.tools.spikeinterface import add_sorting_to_nwbfile
        from neuroconv.tools.spikeinterface.spikeinterface import _get_electrode_table_indices_for_recording

        # This returns the indices in the electrode table that correspond to the channel ids of the recording
        electrode_table_indices = _get_electrode_table_indices_for_recording(
            recording=self.recording_interface.recording_extractor,
            nwbfile=nwbfile,
        )

        # Map the ids in the recording to the indices in the electrode table
        channel_id_to_electrode_table_index = {
            channel_id: electrode_table_indices[channel_index]
            for channel_index, channel_id in enumerate(self.channel_ids)
        }

        # Create a list of lists with the indices of the electrodes in the electrode table for each unit
        unit_electrode_indices = []
        for unit_id in self.unit_ids:
            unit_channel_ids = self.unit_ids_to_channel_ids[unit_id]
            unit_indices = [channel_id_to_electrode_table_index[channel_id] for channel_id in unit_channel_ids]
            unit_electrode_indices.append(unit_indices)

        # TODO: this should be the interface add_to_nwbfile method, but we have not exposed unit_electrode_indices yet
        add_sorting_to_nwbfile(
            nwbfile=nwbfile,
            sorting=self.sorting_interface.sorting_extractor,
            unit_electrode_indices=unit_electrode_indices,
        )

        return


file_path_spikeglx_1 = (
    ECEPHY_DATA_PATH / "spikeglx" / "long_nhp_stubbed" / "snippet_g0/snippet_g0_imec0" / "snippet_g0_t0.imec0.ap.bin"
)
es_key_1 = "SpikeGLX1"  # Electrical Series Metadata Key
interface_spikeglx_1 = SpikeGLXRecordingInterface(file_path=file_path_spikeglx_1, es_key=es_key_1)

metadata = interface_spikeglx_1.get_metadata()
probe_name = "ProbeA"
probe_metadata = {"name": probe_name, "description": "probe_description", "manufacturer": "IMEC"}
metadata["Ecephys"]["Device"] = [probe_metadata]

# This is a list of dicts with entries name, description and device.
# Update name to Probe_{shank_index} and device to probe_name.
electrode_group_metadata = metadata["Ecephys"]["ElectrodeGroup"]
electrode_group_names = [
    electrode_metadata["name"].replace("Imec", probe_name) for electrode_metadata in electrode_group_metadata
]
for entry, name in zip(electrode_group_metadata, electrode_group_names):
    entry.update(name=name, device=probe_name)
metadata["Ecephys"]["ElectrodeGroup"] = electrode_group_metadata

channel_group_names = interface_spikeglx_1.recording_extractor.get_property("group_name")
new_channel_group_names = [name.replace("Imec", probe_name) for name in channel_group_names]
interface_spikeglx_1.recording_extractor.set_property(key="group_name", values=new_channel_group_names)

# Interface for the corresponding sorting
folder_path = ECEPHY_DATA_PATH / "phy" / "phy_example_0"
sorting_interface_1 = KiloSortSortingInterface(folder_path=folder_path)

# Dummy mapping, you need to provide the actual mapping
recording_extractor = interface_spikeglx_1.recording_extractor
sorting_extractor = sorting_interface_1.sorting_extractor
number_of_units = sorting_extractor.get_num_units()
unit_ids = sorting_extractor.get_unit_ids()
channel_ids = recording_extractor.get_channel_ids()
unit_ids_to_channel_ids = {unit_ids[unit_index]: [channel_ids[unit_index]] for unit_index in range(number_of_units)}

sorted_recorded_interface_1 = SortedRecordingConverter(
    recording_interface=interface_spikeglx_1,
    sorting_interface=sorting_interface_1,
    unit_ids_to_channel_ids=unit_ids_to_channel_ids,
)

# Note that the first converter needs to create the nwbfile
nwbfile = sorted_recorded_interface_1.create_nwbfile(metadata=metadata)

# Second probe
file_path_spikeglx_2 = ECEPHY_DATA_PATH / "spikeglx" / "NP2_no_sync" / "all_chan_g0_t0.exported.imec0.ap.bin"
es_key_2 = "SPIKEGLX2"  # Electrical Series Metadata Key
interface_spikeglx_2 = SpikeGLXRecordingInterface(file_path=file_path_spikeglx_2, es_key=es_key_2)

metadata = interface_spikeglx_2.get_metadata()
probe_name = "ProbeB"
probe_metadata = {"name": probe_name, "description": "probe_description", "manufacturer": "IMEC"}
metadata["Ecephys"]["Device"] = [probe_metadata]

electrode_group_metadata = metadata["Ecephys"]["ElectrodeGroup"]
electrode_group_names = [
    electrode_metadata["name"].replace("Imec", probe_name) for electrode_metadata in electrode_group_metadata
]
for entry, name in zip(electrode_group_metadata, electrode_group_names):
    entry.update(name=name, device=probe_name)
metadata["Ecephys"]["ElectrodeGroup"] = electrode_group_metadata

channel_group_names = interface_spikeglx_2.recording_extractor.get_property("group_name")
new_channel_group_names = [name.replace("Imec", probe_name) for name in channel_group_names]
interface_spikeglx_2.recording_extractor.set_property(key="group_name", values=new_channel_group_names)

# Interface for the corresponding sorting
folder_path = ECEPHY_DATA_PATH / "phy" / "phy_example_0"
sorting_interface_2 = KiloSortSortingInterface(folder_path=folder_path)

# Dummy mapping, you need to provide the actual mapping
recording_extractor = interface_spikeglx_2.recording_extractor
sorting_extractor = sorting_interface_2.sorting_extractor
number_of_units = sorting_extractor.get_num_units()
unit_ids = sorting_extractor.get_unit_ids()
channel_ids = recording_extractor.get_channel_ids()
unit_ids_to_channel_ids = {unit_ids[unit_index]: [channel_ids[unit_index]] for unit_index in range(number_of_units)}

# Add the second recording and sorting to the nwbfile
sorted_recorded_interface_2 = SortedRecordingConverter(
    recording_interface=interface_spikeglx_2,
    sorting_interface=sorting_interface_2,
    unit_ids_to_channel_ids=unit_ids_to_channel_ids,
)
sorted_recorded_interface_2.add_to_nwbfile(nwbfile=nwbfile, metadata=metadata)

# When you are done, write to NWB
from pynwb import NWBHDF5IO

nwbfile_path = "nwbfile.nwb"
with NWBHDF5IO(nwbfile_path, mode="w") as io:
    io.write(nwbfile)
```

We will be integrating something like this into main, but first I need to fix a couple of things. Expose
The example that I am using for debugging uses our test data from here: https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/src/master/spikeglx. This includes a multi-shank 2.0 probe, where the electrode groups are a bit more complicated, precisely because I wanted to cover that edge case.

Do you think you could share one of those sessions with us? It would be good to have more testing data that includes this type of setup. Feel free to reach me at h . mayorquin @ gmail (without spaces, of course) to discuss this further if you could help us with it.
Your example code seems to work, with one exception, possibly deliberate (you mention it's a dummy). Your example assigns channels consecutively by unit id value, rather than by unit channel. The below assigns by "central" channel, and there probably exist ways, with which I have not yet experimented, to assign a group of electrodes. I think that line should read:

```python
unit_ids_to_channel_ids = {
    unit_ids[unit_index]: [channel_ids[sorting_extractor.get_unit_property(unit_index, "ch")]]
    for unit_index in range(number_of_units)
}
```

With that modification, I get units assigned to electrode IDs exceeding the range of the zeroth device, and as far as I have checked manually, matching the expected electrode.
Correct, the mapping I provided is for illustration purposes. There is no guarantee that the sorting extractor has a "ch" property, and there is also no guarantee that the units come in any particular order. This is something that the user should know and/or assign at writing time.
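Building the mapping explicitly, rather than relying on unit ordering, can be done with a small helper. The sketch below is hypothetical (the function name and error handling are not part of neuroconv); it assumes you have already extracted a per-unit central-channel index, as KiloSort's "ch" column provides:

```python
# Hypothetical helper: build the unit -> channel-ids mapping from an explicit
# per-unit central-channel index, so the result is independent of unit order.
def map_units_to_channels(unit_ids, channel_ids, central_channel_index):
    """central_channel_index maps each unit id to an integer channel index."""
    missing = [u for u in unit_ids if u not in central_channel_index]
    if missing:
        raise KeyError(f"No central channel recorded for units: {missing}")
    return {unit_id: [channel_ids[central_channel_index[unit_id]]] for unit_id in unit_ids}
```

A mapping built this way can then be passed as `unit_ids_to_channel_ids` to the converter above, and it fails loudly if any unit lacks a recorded central channel rather than silently assigning the wrong electrode.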
What happened?
A large proportion of the experimental data recorded within our lab uses two or more Neuropixels probes (recorded with SpikeGLX) simultaneously. Each probe recording is then clustered independently via KiloSort. I expect to then generate a single NWB file, containing these multiple sets of SpikeGLX data, correctly linked to the correct member of multiple sets of Kilosort data.
However, I have not yet found a way to handle this case via Neuroconv. Possibly I am missing something obvious; possibly it was not envisaged during development. Neither the documentation nor the CatalystNeuro examples contain anything similar.
Data from a typical recording session looks something like this (irrelevant files excluded for clarity)
I have tried the following approaches, with the following failure modes:
Attempt 1: a custom `NWBConverter`, following the documentation here (see code `attempt_1`).

This runs into several problems:
- A new converter class is needed for each number of probes (`ThreeProbeConverter`, `FourProbeConverter`, etc.).
- There is no way to indicate that `ks_1` is associated with `probe_1` and not with `probe_2`.
- `SpikeGLXRecordingInterface` does not name its component parts uniquely (despite having access to the unique serial number encoded in the metadata file used to extract the model type). Therefore, for example, the `Device` entity for each `SpikeGLXRecordingInterface` is identical as far as name, description, etc., as are the various `ElectrodeGroup`s. So when the metadata dictionary is repeatedly updated within `md = converter.get_metadata()`, the outcome lists only a single device, and only as many electrode groups as a single probe has.

I did not get as far as figuring out how to crosslink `ks_1` with `probe_1`, as the device issue proved terminal.

Attempt 2: a `ConverterPipe`, similar to the documentation here, overwriting the metadata as needed to provide unique device IDs (see code `attempt_2`).

This solves the issue of degenerate devices, and also avoids needing a new converter class for each number of probes. However, it still doesn't address crosslinking the KiloSort interface, and even before that it runs into an issue with `Ecephys.ElectricalSeriesAP` (see Traceback). Unlike `Ecephys.Device`, `Ecephys.ElectricalSeriesAP` is a single entry per file (at least, right now), and forcibly casting it to a list unsurprisingly causes a validation error.

I haven't found any other potential workarounds to attempt, yet.
Is what I'm trying to do possible via Neuroconv right now, and I've just missed the bleeding obvious? Given the complete absence of any case of multiple identical interfaces from the documentation, perhaps this is a sufficiently esoteric edge case that it has either never been considered, or been considered and discarded.
Steps to Reproduce
Traceback
Operating System
macOS
Python Executable
Python
Python Version
3.11
Package Versions
annotated-types==0.7.0
appdirs==1.4.4
appnope @ file:///home/conda/feedstock_root/build_artifacts/appnope_1707233003401/work
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
asciitree==0.3.3
asttokens @ file:///home/conda/feedstock_root/build_artifacts/asttokens_1698341106958/work
attrs==24.2.0
blinker==1.8.2
certifi==2024.8.30
cffi==1.17.1
click==8.1.7
comm @ file:///home/conda/feedstock_root/build_artifacts/comm_1710320294760/work
contourpy==1.3.0
cryptography==43.0.1
cycler==0.12.1
datajoint==0.14.3
debugpy @ file:///Users/runner/miniforge3/conda-bld/debugpy_1728594122099/work
decorator @ file:///home/conda/feedstock_root/build_artifacts/decorator_1641555617451/work
docstring_parser==0.16
exceptiongroup @ file:///home/conda/feedstock_root/build_artifacts/exceptiongroup_1720869315914/work
executing @ file:///home/conda/feedstock_root/build_artifacts/executing_1725214404607/work
Faker==30.3.0
fasteners==0.19
Flask==3.0.3
fonttools==4.54.1
h5py==3.12.1
hdmf==3.14.5
hdmf_zarr==0.9.0
importlib_metadata @ file:///home/conda/feedstock_root/build_artifacts/importlib-metadata_1726082825846/work
ipykernel @ file:///Users/runner/miniforge3/conda-bld/ipykernel_1719845458456/work
ipython @ file:///home/conda/feedstock_root/build_artifacts/ipython_1727944696411/work
itsdangerous==2.2.0
jedi @ file:///home/conda/feedstock_root/build_artifacts/jedi_1696326070614/work
Jinja2==3.1.4
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
jupyter_client @ file:///home/conda/feedstock_root/build_artifacts/jupyter_client_1726610684920/work
jupyter_core @ file:///home/conda/feedstock_root/build_artifacts/jupyter_core_1727163409502/work
kiwisolver==1.4.7
MarkupSafe==3.0.1
matplotlib==3.9.2
matplotlib-inline @ file:///home/conda/feedstock_root/build_artifacts/matplotlib-inline_1713250518406/work
minio==7.2.9
neo==0.13.3
nest_asyncio @ file:///home/conda/feedstock_root/build_artifacts/nest-asyncio_1705850609492/work
networkx==3.4
neuroconv==0.6.4
numcodecs==0.13.1
numpy==1.26.4
otumat==0.3.1
packaging @ file:///home/conda/feedstock_root/build_artifacts/packaging_1718189413536/work
pandas==2.2.3
parse==1.20.2
parso @ file:///home/conda/feedstock_root/build_artifacts/parso_1712320355065/work
pexpect @ file:///home/conda/feedstock_root/build_artifacts/pexpect_1706113125309/work
pickleshare @ file:///home/conda/feedstock_root/build_artifacts/pickleshare_1602536217715/work
pillow==10.4.0
platformdirs @ file:///home/conda/feedstock_root/build_artifacts/platformdirs_1726613481435/work
probeinterface==0.2.24
prompt_toolkit @ file:///home/conda/feedstock_root/build_artifacts/prompt-toolkit_1727341649933/work
psutil @ file:///Users/runner/miniforge3/conda-bld/psutil_1725737862086/work
ptyprocess @ file:///home/conda/feedstock_root/build_artifacts/ptyprocess_1609419310487/work/dist/ptyprocess-0.7.0-py2.py3-none-any.whl
pure_eval @ file:///home/conda/feedstock_root/build_artifacts/pure_eval_1721585709575/work
pycparser==2.22
pycryptodome==3.21.0
pydantic==2.9.2
pydantic_core==2.23.4
pydot==3.0.2
Pygments @ file:///home/conda/feedstock_root/build_artifacts/pygments_1714846767233/work
PyMySQL==1.1.1
pynwb==2.8.2
pyparsing==3.1.4
python-dateutil @ file:///home/conda/feedstock_root/build_artifacts/python-dateutil_1709299778482/work
pytz==2024.2
PyYAML==6.0.2
pyzmq @ file:///Users/runner/miniforge3/conda-bld/pyzmq_1725448984636/work
quantities==0.15.0
referencing==0.35.1
rpds-py==0.20.0
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
scipy==1.14.1
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
spikeinterface==0.101.2
stack-data @ file:///home/conda/feedstock_root/build_artifacts/stack_data_1669632077133/work
threadpoolctl==3.5.0
tornado @ file:///Users/runner/miniforge3/conda-bld/tornado_1724956123063/work
tqdm==4.66.5
traitlets @ file:///home/conda/feedstock_root/build_artifacts/traitlets_1713535121073/work
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1717802530399/work
tzdata==2024.2
urllib3==2.2.3
watchdog==5.0.3
wcwidth @ file:///home/conda/feedstock_root/build_artifacts/wcwidth_1704731205417/work
Werkzeug==3.0.4
zarr==2.17.2
zipp @ file:///home/conda/feedstock_root/build_artifacts/zipp_1726248574750/work