Chap 15, pg 513 : ModuleNotFoundError: No module named 'torchdata.datapipes' #199

Open
Emmanuel-Ibekwe opened this issue Dec 24, 2024 · 7 comments

Comments

@Emmanuel-Ibekwe

Emmanuel-Ibekwe commented Dec 24, 2024

```python
from torchtext.datasets import IMDB
train_dataset = IMDB(split='train')
test_dataset = IMDB(split='test')
```
I keep getting this error despite manually installing torchdata. When I tried installing the exact version of torchtext used in the chapter, version 0.10.0, pip didn't recognize it as a valid version.

I can't find any solution to it online.

@kostuyn

kostuyn commented Jan 4, 2025

@Emmanuel-Ibekwe I installed version 0.17.0 of the package and it works (in Colab):
!pip install portalocker --quiet
!pip install torchtext==0.17.0 --quiet

After installing, restart the runtime using the Runtime -> Restart runtime option in the Colab menu.

(The latest version of torchtext has a problem; see pytorch/text#2272.)
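
To confirm that the restarted runtime actually picked up the intended packages, a small check like the following can be run in a fresh cell. This is only a sketch and not part of the chapter notebook: `report_packages` is a hypothetical helper name, and it relies solely on the standard library's `importlib.metadata`.

```python
from importlib import metadata


def report_packages(names):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages mentioned in this thread.
packages = ["torch", "torchtext", "torchdata", "portalocker"]
for name, version in report_packages(packages).items():
    print(f"{name}: {version or 'not installed'}")
```

If torchtext still shows a version other than the one just installed, the runtime has most likely not been restarted yet.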

@rasbt
Owner

rasbt commented Jan 4, 2025

@Emmanuel-Ibekwe It looks like you are right, and the PyTorch maintainers removed torchtext 0.10.0 from PyPI for some reason. The ch15 notebook here on GitHub should be updated to work with newer versions of torchtext though, as @kostuyn mentioned. It would require installing portalocker as well, as described above. Let us know in case this still doesn't work.

@Emmanuel-Ibekwe
Author

Emmanuel-Ibekwe commented Jan 7, 2025

Thanks @rasbt and @kostuyn for the responses. I found out through ChatGPT (great tool) that the datasets package from the Hugging Face community includes the IMDb dataset, so I used it.
Using the datasets package, the training and validation accuracies I got at each epoch differed from the ones in the text. The model overfitted: at some point both accuracies reached 100% and stayed there. But the model performed terribly on the test dataset, with an accuracy of 68.5%.
Thanks one more time.

Edit: I built a custom Dataset class (from torch.utils.data) for the IMDb data to help with data loading.
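
For readers taking the same route, here is a minimal sketch of loading IMDb through the Hugging Face `datasets` library and wrapping it in a map-style dataset for a PyTorch `DataLoader`. It is not the code from the book or from the repository linked later in this thread. `IMDBPairs` and `load_imdb_pairs` are hypothetical names, and returning `(label, text)` tuples is an assumption made for illustration.

```python
try:
    from torch.utils.data import Dataset
except ImportError:  # fallback so the sketch can be tried without PyTorch
    Dataset = object


class IMDBPairs(Dataset):
    """Map-style dataset that yields (label, text) tuples."""

    def __init__(self, records):
        # `records` is any indexable sequence of {"text": str, "label": int}
        # rows, for example one split returned by load_dataset("imdb").
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        row = self.records[idx]
        return row["label"], row["text"]


def load_imdb_pairs():
    """Download IMDb via Hugging Face `datasets` (needs network access)."""
    from datasets import load_dataset  # pip install datasets

    raw = load_dataset("imdb")  # splits: "train", "test", "unsupervised"
    return IMDBPairs(raw["train"]), IMDBPairs(raw["test"])


# Tiny in-memory stand-in so the class can be tried without downloading.
toy = IMDBPairs([
    {"text": "great movie", "label": 1},
    {"text": "terrible plot", "label": 0},
])
print(len(toy), toy[0])
```

In the Hugging Face version the labels are integers (0 for negative, 1 for positive), so any label-encoding step written for a different label format would need to be adjusted accordingly.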

@rasbt
Owner

rasbt commented Jan 8, 2025

Thanks for the feedback. Yes, I think the dataset would nowadays be easier to get from the datasets library. The splits are different though, and I am surprised about the low test set accuracy. Both the training and validation accuracy were 100% though? This is an interesting case of overfitting where the validation accuracy seems almost too good to be true (and the test accuracy unexpectedly bad).
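
Since the splits differ from the chapter's, one thing that may be worth ruling out is how the validation set was carved out of the training data. The sketch below is a generic sanity check, not a diagnosis of the code in this thread: it shuffles before splitting, then reports the label balance of each part and how many texts appear in both. `shuffled_split` and `split_report` are hypothetical helper names, and the 80/20 ratio and seed are assumptions.

```python
import random
from collections import Counter


def shuffled_split(pairs, valid_fraction=0.2, seed=1):
    """Shuffle (label, text) pairs, then split into train and validation lists."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_valid = int(len(pairs) * valid_fraction)
    return pairs[n_valid:], pairs[:n_valid]


def split_report(train, valid):
    """Summarize label counts per split and how many texts occur in both."""
    overlap = {text for _, text in train} & {text for _, text in valid}
    return {
        "train_labels": Counter(label for label, _ in train),
        "valid_labels": Counter(label for label, _ in valid),
        "shared_texts": len(overlap),
    }


# Toy data sorted by label, to show why shuffling before splitting matters.
pairs = [(0, f"negative review {i}") for i in range(50)]
pairs += [(1, f"positive review {i}") for i in range(50)]

train, valid = shuffled_split(pairs)
print(split_report(train, valid))
```

A validation split that contains only one label, or a large `shared_texts` count, could make the validation accuracy misleading compared with the held-out test set.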

@Emmanuel-Ibekwe
Author

Emmanuel-Ibekwe commented Jan 15, 2025

Yes sir, both the training and validation accuracies reached 100% at some point during training and stayed there until the last epoch. Here's a link to the repo containing the code, in case you want to take a look: https://github.com/Emmanuel-Ibekwe/Machine-learning-by-S.-Raschka-notebooks

I had commented out the training code and saved the model.

Please, sir, if you are still interested in the code, try copying the URL manually, because clicking on the link just leads to a non-existent issue.

Also, sorry for the lack of comments and headings in the code (I had not cared much about them since it was basically for learning purposes). The training code is found at the very bottom of the file with the related code that builds up to it preceding it.

@rasbt
Owner

rasbt commented Jan 16, 2025

Thanks for sharing, but it seems the link doesn't work:

Image

@Emmanuel-Ibekwe
Author

Emmanuel-Ibekwe commented Jan 17, 2025

Good day sir. I finally figured out why it kept pointing to the wrong address: GitHub seems to have been embedding the wrong URL in the link. I've fixed that.

It works now.
https://github.com/Emmanuel-Ibekwe/Machine-learning-by-S.-Raschka-notebooks

The training code is towards the bottom of the file. Sorry once again for the lack of comments and headings.
