Add usage of validation split to cleaning script #5

Open
Lena-Jurkschat opened this issue Dec 15, 2022 · 0 comments

Lena-Jurkschat commented Dec 15, 2022

The Python cleaning script `data-preparation/preprocessing/training/01a_catalogue_cleaning_and_filtering/clean.py` currently processes only the train split. It needs to iterate over all splits and apply the filters to each of them.
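A minimal sketch of the per-split loop, assuming the existing filtering logic can be expressed as a function that returns the row indices to keep. The names `clean_all_splits`, `compute_keep_indices` and `SupportsSelect` are hypothetical; the actual filters in `clean.py` are not shown in this issue. The key point is that `select` is called on each individual split, never on the container holding them.

```python
from typing import Callable, Dict, List, Mapping, Protocol


class SupportsSelect(Protocol):
    """Anything with a Dataset-like ``select(indices)`` method (assumption)."""

    def select(self, indices: List[int]) -> "SupportsSelect": ...


def clean_all_splits(
    splits: Mapping[str, SupportsSelect],
    compute_keep_indices: Callable[[SupportsSelect], List[int]],
) -> Dict[str, SupportsSelect]:
    """Apply the same index-based filter to every split.

    ``compute_keep_indices`` is a hypothetical stand-in for the filtering
    currently applied to the train split only; it returns the indices of
    the rows to keep in the split it is given.
    """
    cleaned: Dict[str, SupportsSelect] = {}
    for split_name, split_ds in splits.items():
        keep = compute_keep_indices(split_ds)
        # select() is called on a single split, not on the whole collection
        cleaned[split_name] = split_ds.select(keep)
    return cleaned
```

In Hugging Face `datasets`, `DatasetDict` is a `dict` subclass, so it can be passed as `splits` directly, and the result could be wrapped back with `DatasetDict(cleaned)` before calling `save_to_disk`.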

Hint: simply dropping the split index from `load_from_disk(dataset_path)['train']` (i.e., removing the square brackets) will not work, because you then get a `DatasetDict` object instead of a `Dataset`. As a consequence, `dataset.select()` cannot be called, since that method only exists on the `Dataset` type.
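One way to handle this, sketched under the assumption that the script should accept whatever `load_from_disk` returns: normalize the result into a plain `{split_name: dataset}` mapping, so that `select` is only ever called on individual `Dataset` objects. The helper name `as_split_mapping` and the `default_split` parameter are hypothetical. The check relies on `DatasetDict` being a `dict` subclass (and therefore a `Mapping`), while a single `Dataset` is not.

```python
from typing import Any, Dict, Mapping


def as_split_mapping(loaded: Any, default_split: str = "train") -> Dict[str, Any]:
    """Normalize a loaded dataset object into {split_name: dataset}.

    A DatasetDict (a dict subclass) is copied into a plain dict of its
    splits. A single Dataset (anything exposing ``select``) is wrapped
    under ``default_split``. Anything else is rejected.
    """
    if isinstance(loaded, Mapping):
        return dict(loaded)
    if hasattr(loaded, "select"):
        return {default_split: loaded}
    raise TypeError(f"Unsupported dataset object: {type(loaded).__name__}")
```

For example, `as_split_mapping(load_from_disk(dataset_path))` yields a mapping that can be iterated with `.items()`, calling `select` on each split's `Dataset` separately.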
