BUG snapml early stopping with `gpu=True` on leukemia #105

tanglef · 2022-05-17T17:46:05Z

The snapml solver run stops early on the leukemia dataset if the patience is not high enough (but this seems to be highly dependent on the hardware, to be confirmed).

A quick script to reproduce:

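```python
from snapml import LinearRegression
import numpy as np

from sklearn.datasets import fetch_openml
from sklearn.preprocessing import LabelBinarizer

# Load leukemia and turn the two class labels into a 0/1 regression target
X, y = fetch_openml("leukemia", return_X_y=True)
X = X.to_numpy()
y = LabelBinarizer().fit_transform(y)[:, 0].astype(X.dtype)

# Regularization: 1% of max_j |X_j^T y| (the usual lambda_max for the Lasso)
lmbd = 0.01 * np.max(np.abs(X.T @ y))
clf = LinearRegression(
    fit_intercept=False,
    regularizer=lmbd,
    penalty="l1",
    tol=0,
    dual=False,
    use_gpu=True,
    verbose=True,
    generate_training_history="full",
)

# Only a few iterations, so the per-iteration objective can be compared
clf.max_iter = 4
clf.fit(X, y)
print(np.where(clf.coef_.squeeze()))  # indices of the non-zero coefficients
print(clf.training_history_)
```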
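The choice `lmbd = 0.01 * max|X^T y|` follows the usual Lasso convention. A minimal NumPy sketch, assuming the objective is `0.5 * ||y - X w||^2 + lmbd * ||w||_1` (whether snapml uses exactly this scaling is an assumption): for any `lmbd >= ||X^T y||_inf`, the KKT conditions hold at `w = 0`, so 1% of that value gives a sparse but non-trivial solution. The helpers `lambda_max` and `zero_is_optimal` and the random data are hypothetical, for illustration only.

```python
import numpy as np


def lambda_max(X, y):
    # Smallest lmbd for which w = 0 minimizes 0.5*||y - Xw||^2 + lmbd*||w||_1
    # (assumed scaling, no 1/n factor)
    return np.max(np.abs(X.T @ y))


def zero_is_optimal(X, y, lmbd):
    # KKT condition at w = 0: ||X^T (y - X @ 0)||_inf <= lmbd
    return bool(lambda_max(X, y) <= lmbd)


# Hypothetical synthetic data, shaped like a p >> n problem such as leukemia
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
y = rng.standard_normal(20)

lmax = lambda_max(X, y)
print(zero_is_optimal(X, y, lmax))         # True: w = 0 is a solution
print(zero_is_optimal(X, y, 0.01 * lmax))  # False: some coefficients are non-zero
```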

So far, we have observed that with GeForce RTX 2080 SUPER and 2090 SUPER cards (and two different CUDA 11.x versions), the objective stays put, so in practice Benchopt stops the run before it converges.
However, with a Quadro T2000, the objective decreases at the last iteration, so Benchopt records a full curve showing the convergence.
Pinging @mathurinm, who provided the original test script and issue.
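Why a flat objective ends the run can be illustrated with a generic patience rule: stop once `patience` consecutive evaluations fail to improve on the best objective seen so far. This is a minimal sketch assuming Benchopt's stopping criterion behaves roughly like this; `run_with_patience` is a hypothetical helper, not Benchopt's API, and the objective curve is made up to mimic the RTX behaviour described above (flat, then a decrease only at the last iteration).

```python
def run_with_patience(objectives, patience, eps=0.0):
    """Return how many objective values are consumed before stopping.

    Hypothetical rule: stop as soon as `patience` consecutive evaluations
    do not improve the best objective seen so far by more than `eps`.
    """
    best = float("inf")
    n_no_progress = 0
    for i, obj in enumerate(objectives):
        if obj < best - eps:
            best = obj
            n_no_progress = 0
        else:
            n_no_progress += 1
            if n_no_progress >= patience:
                return i + 1  # stopped early, later values are never seen
    return len(objectives)


# Made-up curve: the objective only decreases at the last iteration
curve = [1.0, 1.0, 1.0, 1.0, 0.9]
print(run_with_patience(curve, patience=3))  # 4: stops before reaching 0.9
print(run_with_patience(curve, patience=4))  # 5: sees the final decrease
```

With a patience that is too low, the run ends on the plateau and the final decrease is never recorded, which matches the hardware-dependent behaviour only if the plateau length differs between GPUs.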


agramfort · 2022-05-17T18:00:22Z

Can you check that the computations are done in float64 on both hardware setups?


tanglef · 2022-05-17T18:14:30Z

> Can you check that the computations are done in float64 on both hardware setups?

Internally, everything is cast to float32, as these lines show when I debug the code (I'm going through the code this way because the source files are not openly available...).

tanglef · 2022-05-17T18:29:52Z

The labels are also converted to float32 later on.

agramfort · 2022-05-17T18:32:47Z

I am not surprised! GPUs are more efficient with float32. It would be interesting to add this to the paper!


tanglef · 2022-05-17T18:37:10Z

Yes, float32 takes less memory, so it is better suited to GPUs, though sometimes at the cost of precision.
But since the cast to float32 happens on both GPUs, it is not the source of the problem...
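To illustrate the precision trade-off mentioned above: in float32, a relative change smaller than about the machine epsilon (roughly 1.2e-7) is rounded away, so an objective can look flat in float32 while the same update is visible in float64. This is a generic NumPy sketch with a hypothetical helper `update_is_lost`; it says nothing about snapml's internals and, as noted above, does not by itself explain the difference between the two GPUs.

```python
import numpy as np


def update_is_lost(value, step, dtype):
    """Return True if adding `step` to `value` leaves it unchanged in `dtype`."""
    v = np.asarray(value, dtype=dtype)
    return bool(v + np.asarray(step, dtype=dtype) == v)


print(np.finfo(np.float32).eps)  # ~1.19e-07
print(np.finfo(np.float64).eps)  # ~2.22e-16

# A decrease of 1e-8 on an objective of order 1
print(update_is_lost(1.0, -1e-8, np.float32))  # True: rounded away
print(update_is_lost(1.0, -1e-8, np.float64))  # False: still visible
```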
