The PCA is performed not on the entire experiments matrix but only on the subset of experiments that is of interest (i.e., where y is TRUE). So your eigenvalues are based on a subset of the experiments, yet your plot shows all experiments. Have you checked the paper by Dalal et al.? To be clear: it is still conceivable that there is a mistake in the current code. It is not the most used or most extensively tested part of the code base.
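To make the point about subset-based PCA concrete, here is a minimal NumPy sketch. It is not the Workbench's implementation: the function name `rotate_on_subset`, the synthetic data, and the choice to center on the mean of all rows without scaling are assumptions made purely for illustration. It fits the rotation on the cases of interest only, applies it to every row, and prints the eigenvalues of the subset covariance next to those of the full covariance so the difference is visible.

```python
import numpy as np


def rotate_on_subset(x, y):
    """Fit a PCA rotation on the rows where y is True and apply it to all rows.

    Hypothetical helper for illustration; not part of any library.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=bool)

    # Covariance of the cases of interest only (columns are variables).
    cov_subset = np.cov(x[y], rowvar=False)

    # eigh suits symmetric matrices; it returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov_subset)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    rotation = eigenvectors[:, order]

    # The rotation is applied to every row, not just the subset.
    # Assumption: center on the mean of all rows, no scaling.
    rotated = (x - x.mean(axis=0)) @ rotation
    return rotated, rotation, eigenvalues


# Synthetic data: 250 experiments, two uncertain factors.
rng = np.random.default_rng(0)
x = rng.random((250, 2))
y = x.sum(axis=1) > 1.2

rotated, rotation, subset_eigenvalues = rotate_on_subset(x, y)
full_eigenvalues = np.sort(np.linalg.eigvalsh(np.cov(x, rowvar=False)))[::-1]

print("eigenvalues, subset of interest:", subset_eigenvalues)
print("eigenvalues, all experiments:   ", full_eigenvalues)
```

In this sketch the rotated axes decorrelate the cases of interest, not the full cloud of points, so eigenvalues computed on all experiments are not the right yardstick for judging a plot of the rotated data.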
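I did a quick follow-up test.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from ema_workbench.analysis import prim
# 250 random experiments with two uncertain factors
x = pd.DataFrame(np.random.rand(250, 2), columns=['a', 'b'])
# cases of interest: the sum of both factors exceeds 1.2
y = np.sum(x, axis=1) > 1.2
# rotate the experiments using the PCA preprocessing step
rotated_experiments, rotation_matrix = prim.pca_preprocess(x, y)
figure, ax = plt.subplots(figsize=(8, 6))
ax.scatter(rotated_experiments['r_0'], rotated_experiments['r_1'], c=y, alpha=0.75, edgecolors='k', s=100)
ax.set_xlabel('r_0')
ax.set_ylabel('r_1')
plt.show()
This does produce the expected behavior.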
Hi! I have a question about the PCA-PRIM implementation of the Workbench. It seems that the pca_preprocess() function produces unexpected values, or perhaps orders them in an unexpected way. I will illustrate the issue using the sd_prim_PCA_flu.py example.
My main question is this: why do the first two principal components not separate the data very well? Perhaps I am plotting this wrong or misunderstanding the dataframes rotated_experiments and rotation_matrix. But when I plot the transformed x-values (i.e., the rotated_experiments) against r_0 and r_1, the points are not clearly distinguishable by the classification applied:
y = outcomes["deceased population region 1"][:, -1] > 1000000
I would expect them to be. I checked the eigendecomposition of the covariance matrix of the original data, and the first and second eigenvalues seem relatively large, i.e., the corresponding principal components should actually separate the data well. Does anyone have an explanation? You can run the code below to see what I mean.
P.S. Amazing work putting this Workbench together. Admirable!
Best regards,
Oscar Stenström