
Error on the fifth shard #3

Open
clemley opened this issue Dec 11, 2020 · 7 comments
clemley commented Dec 11, 2020

On the first request to the fifth shard there appears to be an index error: the run fails on that shard, while all the other shards complete properly. Is there a way to fix this?


huxi2 commented Jul 2, 2021

I found that the number of records in purchase2_train.npy generated by running init.sh was 249215, which differs from the count in the datasetfile.
I fixed this by modifying this line in prepare_data.py:
X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.1)

Hope that helps
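The competing numbers in this thread can be reconciled with a quick sanity check of the split sizes (a sketch; the total record count of 311519 is inferred from the nb_train/nb_test values quoted below, and the rounding rule assumed here is sklearn's when only test_size is given: the test split is rounded up and the train split gets the remainder):

```python
import math

# Total Purchase records implied by the thread: 249215 + 62304
n_total = 249215 + 62304  # 311519

def split_sizes(n, test_size):
    # train_test_split with only test_size set rounds the test
    # split up and assigns the remaining samples to the train split
    n_test = math.ceil(test_size * n)
    return n - n_test, n_test

print(split_sizes(n_total, 0.2))  # (249215, 62304)
print(split_sizes(n_total, 0.1))  # (280367, 31152)
```

So test_size=0.2 produces the 249215-record train file mentioned above, while test_size=0.1 produces 280367 train records, which is exactly the array size appearing in the IndexError reported later in this thread.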

@swagStar123-code

Even after changing test_size from 0.2 to 0.1 as proposed, the problem above persists.


KatieHYT commented Jun 8, 2023

Same here.
Even after changing test_size from 0.2 to 0.1, I still get the index error: IndexError: index 280367 is out of bounds for axis 0 with size 280367.

Any suggestions so far?

@nimeshagrawal

Any solution found regarding this issue?

@nimeshagrawal

The problem is in datasets/purchase/datasetfile: the train and test sample sizes are hard-coded there. prepare_data.py splits with test_size = 0.2, but "datasetfile" lists sample sizes corresponding to test_size = 0.1. So change the train and test sample sizes in "datasetfile" (replace with nb_train = 249215 and nb_test = 62304).
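Assuming datasetfile is a JSON file with nb_train/nb_test fields (an assumption based on how this thread describes it; keep whatever other fields the repo's file already contains), the edit would look something like:

```json
{
  "nb_train": 249215,
  "nb_test": 62304
}
```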

@scottshufe

Thanks for your solution. It solved my problem perfectly.

> The problem is there in the datasets/purchase/datasetfile. They have hard coded train and test sample size. The prepare_data.py splits according to test_size = 0.2, but "datasetfile" has sample sizes according to test_size = 0.1. Hence, change train & test sample size in "datasetfile". (Replace with nb_train = 249215 and nb_test = 62304)

@GM-git-dotcom

> The problem is there in the datasets/purchase/datasetfile. They have hard coded train and test sample size. The prepare_data.py splits according to test_size = 0.2, but "datasetfile" has sample sizes according to test_size = 0.1. Hence, change train & test sample size in "datasetfile". (Replace with nb_train = 249215 and nb_test = 62304)

This. And remember to re-run python prepare_data.py after making this change.
