Updating "Classifying Names with a Character-Level RNN" #2954
base: main
Conversation
…ain and test sets as well as simplifying content
🔗 Helpful Links 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/2954
Note: Links to docs will display an error until the doc builds have completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Hi @mgs28! Thank you for your pull request and welcome to our community.
Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
@svekars - it looks like you are very active in this repo; any chance you could help me with this? Thanks!
…to mgs28-char-rnn-update
Added functionality to process training data in mini-batches to satisfy the original story. However, I had to use numpy + random to create batch indices from a given dataset. I also simplified training so it is a closer match to https://pytorch.org/tutorials/beginner/basics/optimization_tutorial.html
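As a hedged illustration of the mini-batch index idea mentioned in this comment (the function and variable names below are hypothetical, not the PR's actual code), numpy + random can produce shuffled batch indices like this:

```python
import random
import numpy as np

def make_batch_indices(n_items, batch_size):
    """Shuffle all dataset indices, then split them into mini-batches.
    Illustrative sketch only; names are not from the PR."""
    indices = list(range(n_items))
    random.shuffle(indices)
    # np.array_split tolerates a remainder, so the last batches may be smaller
    return np.array_split(indices, max(1, n_items // batch_size))

batches = make_batch_indices(10, 3)  # 3 batches covering all 10 indices
```

Each element of `batches` is an array of indices that can be handed to a dataset to pull out one mini-batch.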
@svekars - would you please help me with this tutorial update pull request, or point me to someone who could?
cc: @spro
…rs, adding device config for CI steps, cleaning up documentation
@@ -3,6 +3,7 @@
NLP From Scratch: Classifying Names with a Character-Level RNN
**************************************************************
**Author**: `Sean Robertson <https://github.com/spro>`_
**Updated**: `Matthew Schultz <https://github.com/mgs28>`_
We typically don't add an "Updated" line. Please remove this.
done!
@svekars - anything else I can do to make this better? thanks!
@svekars - hello, are there other items I should address here? I appreciate your help with this!
Thanks for the PR! At a high level, I have the following comments / concerns:

- I think it's a good idea to update the tutorial to utilize PyTorch's `Dataset` and `DataLoader` abstractions.
  - I'm not sold on the need for the `NameData` class. It adds what I consider unnecessary complexity / code. It's perfectly simple to do any conversions between names <-> tensors as simple standalone functions utilized by the dataset.
- I agree that proper train / test / validation splits should be done in the tutorial, so that's a nice addition.
- I'm okay with defining a custom `RNN` module for illustration purposes, although in practice we'd encourage the use of `nn.RNN`, and I'm not sure if this existed when the tutorial was originally written. One thing the officially provided module in `torch.nn` provides is better perf due to e.g. cuDNN-accelerated kernels. If we don't use this within the tutorial, I think it should at least be mentioned and recommended for the perf benefits. All that said, I have some comments on the `RNN` module defined in the tutorial:
  - It's a bit confusing to redefine it multiple times in the tutorial, adding stuff to it each time. I'd recommend a single definition.
  - The `learn()` API does not belong on `RNN`; I suggest leaving the training logic in a standalone `train()` function. This way, it's more PyTorch-idiomatic and easier for users to switch to some third-party training API (e.g. ignite, PyTorch Lightning, etc.).
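A minimal sketch of the standalone `train()` idea suggested here (hedged: the parameter names, loss choice, and the stand-in model below are illustrative, not the tutorial's actual code):

```python
import torch
import torch.nn as nn

def train(model, dataloader, n_epochs=1, lr=0.005):
    """Standalone training loop: the logic lives outside the module,
    so swapping in a third-party trainer later is straightforward."""
    criterion = nn.NLLLoss()  # pairs with a LogSoftmax model output
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    losses = []
    for _ in range(n_epochs):
        for inputs, labels in dataloader:
            optimizer.zero_grad()
            loss = criterion(model(inputs), labels)
            loss.backward()
            optimizer.step()
            losses.append(loss.item())
    return losses

# Illustrative smoke test with a stand-in model and a single batch
model = nn.Sequential(nn.Linear(4, 3), nn.LogSoftmax(dim=1))
data = [(torch.randn(8, 4), torch.randint(0, 3, (8,)))]
losses = train(model, data, n_epochs=2)
```

Keeping training out of the module is what makes it easy to later replace this loop with ignite or PyTorch Lightning without touching the model class.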
# ASCII).
#
# The first thing we need to define is our data items. In this case, we will create a class called NameData
# which will have an __init__ function to specify the input fields and some helper functions. Our first
# which will have an __init__ function to specify the input fields and some helper functions. Our first
# which will have an ``__init__`` function to specify the input fields and some helper functions. Our first |
#
# The first thing we need to define is our data items. In this case, we will create a class called NameData
# which will have an __init__ function to specify the input fields and some helper functions. Our first
# helper function will be __str__ to convert objects to strings for easy printing |
# helper function will be __str__ to convert objects to strings for easy printing
# helper function will be ``__str__`` to convert objects to strings for easy printing |
# ``all_categories`` (just a list of languages) and ``n_categories`` for
# later reference.
#########################
#Now we can use that class to create a singe piece of data. |
#Now we can use that class to create a singe piece of data.
#Now we can use that class to create a single piece of data. |
@@ -181,21 +255,22 @@ def lineToTensor(line):
#
# This RNN module implements a "vanilla RNN" and is just 3 linear layers
# which operate on an input and hidden state, with a ``LogSoftmax`` layer
# after the output.
# after the output.s |
# after the output.s
# after the output. |
…ng all RNN definition into one, moving RNN.learn() to separate train()
…ather than building it up
@jbschlosser - thank you for the lovely suggestions to improve. If possible, I'd like to split this into two things. First, the edits to my existing content.
Secondly, I would really like to use nn.RNN if possible. There are very few tutorials that mention it, and everyone seems to drive their RNN builds off this tutorial. However, to solve this task, I think I need a network with layers like [57, 128, 18], and it looks like the default Elman networks are stuck at [57, 18]. Is it best practice to inherit from nn.RNN and add my own fully connected output layer, or am I misunderstanding something? Thanks!
To make it simpler, I assume extending the nn.RNN class might look like this (which runs about 40% faster):

class MyRNN(nn.RNN):
Rather than inherit, we generally encourage composition. In this case, something like:
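A hedged sketch of that composition approach: wrap `nn.RNN` inside a custom `nn.Module` and add a linear output head, rather than subclassing `nn.RNN`. The sizes (57 input characters, 128 hidden units, 18 categories) follow the earlier discussion; the class and attribute names are illustrative, not the PR's actual code.

```python
import torch
import torch.nn as nn

class CharRNN(nn.Module):
    """Composition over inheritance: nn.RNN does the recurrence,
    a Linear layer maps the final hidden state to category scores."""
    def __init__(self, input_size=57, hidden_size=128, output_size=18):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size)
        self.h2o = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, line_tensor):
        # line_tensor: (seq_len, batch, input_size); use the final hidden state
        _, hidden = self.rnn(line_tensor)
        return self.softmax(self.h2o(hidden[0]))

model = CharRNN()
out = model(torch.zeros(5, 1, 57))  # one 5-character "name"
```

This keeps the cuDNN-accelerated `nn.RNN` internals intact while still giving the tutorial a custom module to explain.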
Thanks @jbschlosser! I used nn.RNN in composition and changed some of the surrounding text. That forced the addition of a few terms to the repo dictionary. I appreciate you teaching me something again, and hopefully the tutorial is better for it.
Looks pretty good, thanks for the updates!
The confusion matrix does look quite a bit worse than pre-changes though; can this be entirely attributed to the use of a test set? Maybe we need a bit more time for training to converge?
Co-authored-by: Joel Schlosser <[email protected]>
@jbschlosser - thanks! If I train with

all_losses = train(rnn, alldata, n_epoch=100, learning_rate=0.1, report_every=5)

and evaluate on alldata, then I get a bright diagonal line that looks pretty similar to the original. I imagine with some parameter tuning I could get closer.
It's a good point that we want to balance this some. That said, I think it'd be nice to be a little bit closer to the original (at least somewhat of a diagonal line in the confusion matrix). Hopefully we can strike a reasonable balance where we're beginning to see a diagonal line trend. No need to spend a ton of time parameter tuning though :) that's okay left as an exercise for the reader.
…hanged epochs, training rate, more of split to training data
No problem @jbschlosser - I tuned some of the parameters to get pretty close to a diagonal confusion matrix. It still gets a little confused between English and Scottish, as well as Korean and Chinese. However, there's a strong diagonal in the confusion matrix. Thanks!
Thanks for the updates! I'm good with it; will let @svekars make the final approval :)
Fixes #1166
Description
Updating Sean's excellent RNN classification tutorial, which is now 8 years old and missing some newer PyTorch functionality.
Checklist
cc @albanD