pandas groupby is too slow for experiment size datasets #23

Open
turbach opened this issue Jun 9, 2020 · 0 comments
turbach commented Jun 9, 2020

Problem

epf.py uses groupby operations over epoch_id and time, for instance in QC and in center_eeg.

The groupby operations are too slow for experiment-sized datasets and need to be replaced, probably with numpy operations.
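For reference when timing replacements, a groupby-based centering might look roughly like the sketch below. It is hypothetical, not the actual `epf.py` code, and assumes an `epochs` DataFrame with an `epoch_id` column, a `time` column, one float column per EEG stream, and a unique row index. The per-epoch Python loop is the pattern that scales badly as the number of epochs grows.

```python
import pandas as pd


def center_eeg_groupby(epochs, eeg_streams, start, stop):
    """Hypothetical groupby baseline: subtract each epoch's mean over [start, stop).

    Assumes columns epoch_id, time, and the eeg_streams, with a unique row index.
    """
    centered = epochs.copy()
    for _, epoch in epochs.groupby("epoch_id"):
        # rows of this epoch that fall inside the centering interval
        mask = (epoch["time"] >= start) & (epoch["time"] < stop)
        # per-stream interval mean for this epoch (a Series indexed by stream name)
        baseline = epoch.loc[mask, eeg_streams].mean()
        # subtract the baseline from every row of this epoch
        centered.loc[epoch.index, eeg_streams] = epoch[eeg_streams] - baseline
    return centered
```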

Solution

TBD. Centering operates only on floats, so only the numpy arrays are needed.

Maybe vectorize it, something like this pseudocode for center_eeg:

  • look up the rows of each epoch that fall in the centering interval (each comparison needs its own parentheses because & binds tighter than >= and <)
idxs = np.where((epochs.time >= start) & (epochs.time < stop))[0]
  • slice out the np array of shape (n_epochs * n_center_times, n_eeg_streams) for the centering interval
center_data = epochs[eeg_streams].to_numpy()[idxs]
  • unstack/reshape the 2D center_data (n_epochs * n_center_times, n_eeg_streams) to 3D (n_epochs, n_center_times, n_eeg_streams)
  • compute the epoch mean across times (axis 1), giving a 2D array of interval means (n_epochs, n_eeg_streams)
  • np.repeat/tile/broadcast the interval means of each epoch by the number of times per epoch, back to the original dimensions (n_epochs * n_times, n_eeg_streams)
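The steps above can be sketched as one function. This is a minimal sketch, not the project's implementation; the name `center_eeg_numpy` is hypothetical, and it assumes rows are sorted by `epoch_id` and then `time`, with every epoch having the same number of rows and the same time stamps.

```python
import numpy as np
import pandas as pd


def center_eeg_numpy(epochs, eeg_streams, start, stop):
    """Subtract each epoch's mean over [start, stop) from every row of that epoch.

    Hypothetical sketch: assumes rows sorted by epoch_id, then time, and the same
    time stamps in every epoch.
    """
    n_epochs = epochs["epoch_id"].nunique()
    n_times = len(epochs) // n_epochs
    data = epochs[eeg_streams].to_numpy(dtype=float)  # (n_epochs * n_times, n_eeg_streams)
    times = epochs["time"].to_numpy()

    # 1. row indices inside the centering interval (np.where returns them sorted)
    idxs = np.where((times >= start) & (times < stop))[0]
    # 2. slice out (n_epochs * n_center_times, n_eeg_streams)
    center_data = data[idxs]
    # 3. reshape to 3D (n_epochs, n_center_times, n_eeg_streams)
    n_center_times = len(idxs) // n_epochs
    center_data = center_data.reshape(n_epochs, n_center_times, len(eeg_streams))
    # 4. per-epoch interval means, shape (n_epochs, n_eeg_streams)
    means = center_data.mean(axis=1)
    # 5. repeat each epoch's means once per row of that epoch
    center_mns = np.repeat(means, n_times, axis=0)
    assert center_mns.shape == data.shape

    centered = epochs.copy()
    centered[eeg_streams] = data - center_mns
    return centered
```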

This gives a new 2D array (n_epochs * n_times, n_eeg_streams) in which every row of an epoch holds that epoch's centering-interval mean for each eeg_stream.

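# np.repeat (not np.tile) keeps each epoch's rows together, assuming rows are sorted by epoch_id, then time
center_mns = np.repeat(center_data.reshape(n_epochs, n_center_times, -1).mean(axis=1), n_times, axis=0)
assert center_mns.shape == epochs[eeg_streams].shape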
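Whether `np.tile` or `np.repeat` is the right call depends on row order. Assuming rows are sorted by `epoch_id` and then `time` (an assumption, not something stated above), `np.repeat` along axis 0 puts each epoch's means on that epoch's rows, while `np.tile` interleaves epochs. A small check with made-up numbers:

```python
import numpy as np

# per-epoch interval means for 2 epochs x 3 streams (made-up values)
means = np.array([[1.0, 2.0, 3.0],
                  [10.0, 20.0, 30.0]])
n_times = 2

# epoch label of each row in epoch-major order: [0, 0, 1, 1]
epoch_of_row = np.repeat(np.arange(len(means)), n_times)

# np.repeat along axis 0 keeps each epoch's rows together: e0, e0, e1, e1
by_repeat = np.repeat(means, n_times, axis=0)
# np.tile stacks the whole array n_times times: e0, e1, e0, e1
by_tile = np.tile(means, (n_times, 1))

# only the repeat version lines up with the epoch label of each row
repeat_matches = np.array_equal(by_repeat, means[epoch_of_row])
tile_matches = np.array_equal(by_tile, means[epoch_of_row])
```

If the rows were instead sorted by time and then epoch, the roles would swap and `np.tile` would be the right choice.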

Centering the epochs by the mean of the centering interval is then a one-line subtraction

epochs[eeg_streams] = epochs[eeg_streams] - center_mns
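A possible alternative skips building the repeated `center_mns` array entirely: reshape the data to 3D and let numpy broadcasting stretch the `(n_epochs, 1, n_eeg_streams)` means across each epoch's time axis. This is a sketch under the same assumptions (epoch-major rows, identical time stamps per epoch); `center_broadcast` and `center_mask` are hypothetical names.

```python
import numpy as np


def center_broadcast(data, n_epochs, center_mask):
    """Center epoch-major 2D data without building a repeated means array.

    data: (n_epochs * n_times, n_streams) float array, rows sorted by epoch then time.
    center_mask: boolean array of length n_times marking the centering interval
    within one epoch (assumes every epoch has the same time stamps).
    """
    n_rows, n_streams = data.shape
    n_times = n_rows // n_epochs
    # reshape returns a view when it can (e.g. for C-contiguous input)
    data3d = data.reshape(n_epochs, n_times, n_streams)
    # keepdims leaves a length-1 time axis: shape (n_epochs, 1, n_streams)
    means = data3d[:, center_mask, :].mean(axis=1, keepdims=True)
    # broadcasting subtracts each epoch's means from all n_times rows of that epoch
    return (data3d - means).reshape(n_rows, n_streams)
```

The design trade-off is memory: the repeat version allocates a second array the size of the data, while broadcasting only allocates the small means array plus the result.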

Run %%timeit to see whether this helps; if not, find something that does.
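`%%timeit` is an IPython/Jupyter cell magic; outside a notebook the standard-library `timeit` module does the same job. The harness below is a hypothetical sketch: it builds synthetic epoch-major data (the sizes and stream names are made up), checks that a groupby loop and a numpy version agree, and then times both. It reports whatever the local machine measures; no particular speedup is implied.

```python
import timeit

import numpy as np
import pandas as pd


def make_epochs(n_epochs, n_times, eeg_streams, seed=0):
    """Synthetic epoch-major test data with columns epoch_id, time, and the streams."""
    rng = np.random.default_rng(seed)
    times = np.arange(n_times) - n_times // 4
    df = pd.DataFrame({
        "epoch_id": np.repeat(np.arange(n_epochs), n_times),
        "time": np.tile(times, n_epochs),
    })
    for stream in eeg_streams:
        df[stream] = rng.standard_normal(n_epochs * n_times)
    return df


def center_groupby(epochs, eeg_streams, start, stop):
    # per-epoch Python loop, the pattern the issue wants to replace
    centered = epochs.copy()
    for _, epoch in epochs.groupby("epoch_id"):
        mask = (epoch["time"] >= start) & (epoch["time"] < stop)
        centered.loc[epoch.index, eeg_streams] = (
            epoch[eeg_streams] - epoch.loc[mask, eeg_streams].mean()
        )
    return centered


def center_numpy(epochs, eeg_streams, start, stop):
    # vectorized version: assumes identical time stamps in every epoch
    n_epochs = epochs["epoch_id"].nunique()
    data = epochs[eeg_streams].to_numpy(dtype=float).reshape(n_epochs, -1, len(eeg_streams))
    times = epochs["time"].to_numpy()[: data.shape[1]]  # one epoch's time stamps
    mask = (times >= start) & (times < stop)
    means = data[:, mask, :].mean(axis=1, keepdims=True)
    centered = epochs.copy()
    centered[eeg_streams] = (data - means).reshape(len(epochs), -1)
    return centered


streams = ["ch0", "ch1", "ch2"]  # hypothetical stream names
epochs = make_epochs(n_epochs=200, n_times=100, eeg_streams=streams)
args = (epochs, streams, -25, 0)

# both versions should agree before comparing speed
pd.testing.assert_frame_equal(center_groupby(*args), center_numpy(*args))

for fn in (center_groupby, center_numpy):
    seconds = min(timeit.repeat(lambda: fn(*args), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds:.4f} s")
```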
