Add option to passthrough num_seqs in PostprocessingDataset #1677
Hah, I implemented the exact same thing about a month ago. Convergent evolution at work!
I don't think that this is a very good idea though, because it invites buggy behavior if your mapper function doesn't actually adhere to the guarantees you made by specifying iterator_preserves_sequences. We could catch this with an assert once the underlying dataset has run out of seqs (comparing actual vs. forwarded num_seqs), but if there is a bug, this assertion would trigger only at the very end of a subepoch, which would be quite annoying.
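For illustration, such an end-of-stream check could look roughly like the sketch below. This is not the code from this PR; `checked_stream` and `forwarded_num_seqs` are hypothetical names, and the sequences are stand-in values rather than real dataset entries.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def checked_stream(seqs: Iterable[T], forwarded_num_seqs: int) -> Iterator[T]:
    """Yield seqs unchanged and verify the count once the stream is exhausted.

    Hypothetical helper: `forwarded_num_seqs` stands for the num_seqs value
    passed through from the wrapped dataset.
    """
    actual_num_seqs = 0
    for seq in seqs:
        actual_num_seqs += 1
        yield seq
    # Only reached after the last sequence, i.e. at the very end of the subepoch.
    assert actual_num_seqs == forwarded_num_seqs, (
        f"mapper yielded {actual_num_seqs} seqs, "
        f"but num_seqs={forwarded_num_seqs} was forwarded"
    )
```

Because the assertion sits after the loop, a mapper that drops or duplicates sequences is only detected once the whole stream has been consumed, which is the drawback described above.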
If you simply want to do Laplace ordering, there is the LaplaceOrdering combinator in this very module that you can compose with your postprocessor function using Sequential (also in this module) or by doing another PostprocessingDataset wrap. It uses a small buffer to do the ordering instead. It's efficient enough as long as the buffer size doesn't grow too large.
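To make the buffer idea concrete, here is a minimal standalone sketch. It does not use or reproduce the module's LaplaceOrdering or Sequential; `laplace_buffered`, `compose`, and `buffer_size` are hypothetical names, and sequences are modelled as plain Python lists whose length is the sort key.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

Seq = Sequence[int]
StreamFn = Callable[[Iterable[Seq]], Iterator[Seq]]


def laplace_buffered(seqs: Iterable[Seq], buffer_size: int) -> Iterator[Seq]:
    """Sort each buffer of `buffer_size` seqs by length, alternating direction."""
    buffer: List[Seq] = []
    ascending = True
    for seq in seqs:
        buffer.append(seq)
        if len(buffer) == buffer_size:
            yield from sorted(buffer, key=len, reverse=not ascending)
            buffer.clear()
            ascending = not ascending
    # Flush the (possibly partial) last buffer so that no sequence is dropped.
    yield from sorted(buffer, key=len, reverse=not ascending)


def compose(*fns: StreamFn) -> StreamFn:
    """Chain stream functions left to right (a stand-in for a sequential combinator)."""

    def composed(seqs: Iterable[Seq]) -> Iterator[Seq]:
        stream: Iterator[Seq] = iter(seqs)
        for fn in fns:
            stream = fn(stream)
        return stream

    return composed


# Usage: a per-sequence mapper followed by buffered Laplace ordering.
append_marker: StreamFn = lambda stream: (list(s) + [1] for s in stream)
pipeline = compose(append_marker, lambda s: laplace_buffered(s, buffer_size=3))
```

Only `buffer_size` sequences are held in memory at a time, so the ordering is local to each buffer rather than global over the subepoch.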
Yes, we should definitely have such a check. I don't think it's too annoying. The user should anyway hopefully know whether the |
This comment was marked as resolved.
I have also changed the message to show the actual value of |
I'm open to adjusting the name of the variable.
My use case for this is to get Laplace Ordering on a Dataset, so map_seq_stream wouldn't drop any sequences.