
Compute_expected_transition_matrix #185

Open
Mattlab-hub opened this issue Jul 17, 2024 · 3 comments
Labels
❔ question Further information is requested

Comments

@Mattlab-hub

Hello, this issue is moderately related to issue #145. I was looking into how the expected transition matrix is calculated because I noticed that the correction causes the one state with the largest timecov, meandurs, and occurrences to end up negative for all participants in the controlled transition matrix. When plotting just the observed transition matrix, however, the state transition numbers look more like what those parameters would suggest, as though subtracting the expected matrix is overcorrecting. The first part of the code in the API:
# reshape if epochs (returns a view)
labels = labels.reshape(-1)
states = np.arange(-1, n_clusters)
# expected probabilities
T_expected = np.zeros(shape=(states.size, states.size))
for state_from in states:
    n_from = np.sum(labels == state_from)
    # no state_from in labels
    for state_to in states:
        n_to = np.sum(labels == state_to)
        if n_from != 0:
            if state_from != state_to:
                T_expected[state_from, state_to] = n_to
            else:
                T_expected[state_from, state_to] = n_to - 1
        else:
            T_expected[state_from, state_to] = 0
ends up giving me this expected transition matrix for 5 states:
38556 59466 55404 58947 45512 25558
38557 59465 55404 58947 45512 25558
38557 59466 55403 58947 45512 25558
38557 59466 55404 58946 45512 25558
38557 59466 55404 58947 45511 25558
38557 59466 55404 58947 45512 25557

Running the second part of the code in the API gives me this:
0 0.271127 0.252607 0.268761 0.207506
0.19432 0 0.279226 0.297082 0.229372
0.190422 0.293685 0 0.291122 0.224771
0.193813 0.298916 0.278497 0 0.228774
0.181552 0.280006 0.260879 0.277562 0

That final matrix is what I get when using the function seg.compute_expected_transition_matrix() as well. So, am I doing something wrong to be getting that first matrix with so many of the same numbers in each column, or is it supposed to be like that? I figured it was the number of occurrences of each transition type, so it looks wrong if that's the case.
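For reference, this is roughly the workflow behind those numbers (a minimal sketch; `seg` is my fitted segmentation object and the variable names are mine, not from the API):

import numpy as np

# observed transition matrix (the one whose values match what the other parameters suggest)
T_observed = seg.compute_transition_matrix()
# expected (chance-level) transition matrix; this is the 5 x 5 matrix shown above
T_expected = seg.compute_expected_transition_matrix()
# "controlled" matrix: this is where the dominant state ends up negative
T_controlled = T_observed - T_expected
print(np.round(T_controlled, 3))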

Best,
Matt

@vferat
Owner

vferat commented Sep 6, 2024

Hey @Mattlab-hub,

Sorry for the late reply.

Do you observe this "overcorrection" when the ignore_repetitions parameter is set to True, False, or both?
I have identified a possible issue and would like your input to confirm it.

Thanks for reporting the issue!

@Mattlab-hub
Author

Hi @vferat,
No problem. I realized that I did not fully understand the matrix multiplication going on in the code, so at least part of what I said is probably not important (the difference between the first and second matrices).

I did test the code again, this time running my data with ignore_repetitions=False for both the observed and expected functions. With it set to False for either or both functions, and aside from the diagonal not being zeroed out (which is to be expected), there still seems to be some kind of overcorrection when I subtract the expected matrix from the observed one. By that I mean: state C still has the highest time coverage, mean duration, and occurrences by quite a bit, and the observed matrix alone shows state C with the highest rate of being transitioned to from all other states (the C column, ignoring the diagonal now since it is highest when set to False), but when I subtract the expected matrix and plot the controlled one, the opposite is true and state C has the lowest probability of being transitioned to from all other states.
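Concretely, that run looks roughly like this (a sketch with my own variable names; `seg` is the segmentation object):

# same comparison as before, but keeping self-transitions in both matrices
T_observed = seg.compute_transition_matrix(ignore_repetitions=False)
T_expected = seg.compute_expected_transition_matrix(ignore_repetitions=False)
# observed minus expected: the C column flips from highest to lowest here
T_controlled = T_observed - T_expected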

Although there does seem to be an overcorrection with the expected matrix, it might also be highlighting different aspects of the data; for example, state C might occur the most, but if it is transitioned to indiscriminately from periods of low GFP, then that could be what I am seeing, in which case I think subtracting the expected matrix is doing the right thing. I am just suspicious because the difference is pretty big: my data really highlights C as the most present state, with the highest transition probability toward it in the observed matrix, yet it has the lowest statistical transition probability after the expected correction. I would think some of that should remain, with a high probability from at least one or two other states. This is task data, so I am expecting a sticky state like C, and I am doing this on group data where these results are consistent across participants as well. The resting-state data also highlights C, but to a slightly lesser extent than the task.

Keep me updated, as I am finishing up my dissertation's final analyses, so if I do need to change something, sooner rather than later would be helpful. You can email me at [email protected] if you want to give me small updates that are not big enough to warrant another post here yet. Let me know if you need more info as well.
Best,
Matt

@vferat
Owner

vferat commented Sep 10, 2024

The rationale behind the expected transition matrix is to account for chance. When one state occurs more frequently (e.g., state "C" in your example), there is a higher likelihood that other states (A, B, D) will transition to this dominant state simply by chance. If a particular state doesn't have a high probability of transitioning to this dominant state, it might indicate an underlying physiological phenomenon influencing this behavior.

This is why we subtract the expected transition matrix from the observed matrix—to isolate non-random effects.
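To make this concrete, here is a purely illustrative sketch (the coverages below are made up, not taken from your data):

# hypothetical time coverages: "C" dominates, as in your example
coverage = {"A": 0.15, "B": 0.20, "C": 0.45, "D": 0.20}

# under chance alone, the probability of moving from any state to a target state
# is proportional to how often that target occurs among the remaining states
for src in coverage:
    others = {k: v for k, v in coverage.items() if k != src}
    chance = {k: round(v / sum(others.values()), 2) for k, v in others.items()}
    print(src, "->", chance)

# every row assigns its largest chance-level probability to "C", so the observed
# transitions toward "C" must exceed these values for "C" to remain dominant
# after the expected matrix is subtracted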

Additionally, you can implement a statistical test to quantify this difference. For instance, by reshuffling the state labels (i.e., destroying the temporal structure while preserving the overall time coverages), you can generate a distribution of transition matrices that would occur purely by chance. By comparing the empirical transition matrix to these shuffled matrices, you can calculate p-values that tell you how likely the observed transitions are under random chance.

import numpy as np

from pycrostates.segmentation.transitions import _compute_transition_matrix

# `labels` (the microstate label array), `n_clusters`, `ignore_repetitions`,
# `stat` and `n_shuffle` are assumed to be defined beforehand.

# Compute the empirical transition matrix
empirical_matrix = _compute_transition_matrix(
    labels, n_clusters, ignore_repetitions=ignore_repetitions, stat=stat
)

# Generate shuffled transition matrices
transition_matrices = []
for _ in range(n_shuffle):
    shuffled_labels = labels.copy()
    np.random.shuffle(shuffled_labels)  # Shuffle the state labels to destroy temporal structure
    shuffled_matrix = _compute_transition_matrix(
        shuffled_labels, n_clusters, ignore_repetitions=ignore_repetitions, stat=stat
    )
    transition_matrices.append(shuffled_matrix)
transition_matrices = np.array(transition_matrices)

# Calculate p-values for each transition (i -> j)
p_values = np.zeros_like(empirical_matrix)
for i in range(n_clusters):
    for j in range(n_clusters):
        # Count how many shuffled matrices have a transition probability >= empirical matrix
        greater_or_equal = np.sum(transition_matrices[:, i, j] >= empirical_matrix[i, j])
        # Compute the p-value for the transition from state i to state j
        p_values[i, j] = greater_or_equal / n_shuffle
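
As a possible follow-up (just an example of how the resulting p-values could be used, not part of the pycrostates API):

# flag transitions whose empirical probability is unlikely under the shuffled (chance) distribution
alpha = 0.05  # example significance threshold
for i in range(n_clusters):
    for j in range(n_clusters):
        if i != j and p_values[i, j] < alpha:
            print(f"Transition {i} -> {j} occurs more often than expected by chance (p = {p_values[i, j]:.3f})")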

@vferat vferat added the ❔ question Further information is requested label Sep 11, 2024