Pointer-generator behavior with features #111
Reporting some funky behavior with the pointer-generator with features:

1. With --arch pointer_generator_lstm and features enabled (i.e., --features_col 3 or something other than the default 0), I get a report that doesn't list a features encoder in either place; this makes me think it has ignored features even though I have explicitly requested them.
2. With --arch pointer_generator_lstm --source_encoder lstm --features_encoder lstm (which should be the same thing), we get a crash. The reason for this should be clear from the code: the branch that begins at line 345 doesn't define predictions.
3. --arch pointer_generator_lstm --source_encoder lstm --features_encoder linear works, though after hill climbing for a while both losses go nan (e.g., on our Polish data).
4. Same story as (3) with --arch pointer_generator_lstm --source_encoder transformer --features_encoder linear.

I am assigning this to @bonham79; I think the fix will be quite small.
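For illustration, the crash in (2) presumably has roughly the following shape; this is a hypothetical reconstruction, not the actual yoyodyne source:

```python
# Hypothetical reconstruction of the bug described in (2); this is not the
# actual yoyodyne code. One branch assigns `predictions` and the other does
# not, so the second branch dies at the return.

def greedy_decode(encoder_out):
    return ["stand-in"]  # placeholder for the real decoding step


def greedy_decode_with_features(encoder_out, features_out):
    return ["stand-in"]  # placeholder for the features-aware decoding step


def decode(encoder_out, features_out=None):
    if features_out is None:
        predictions = greedy_decode(encoder_out)
    else:
        # This branch forgets to assign `predictions`...
        greedy_decode_with_features(encoder_out, features_out)
    return predictions  # ...so this raises UnboundLocalError
```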
I am testing the pointer generator for #80. We should raise an Exception saying that a specific features encoder needs to be set if a features column is requested, or else we should set a default features encoder for each architecture.
+1.
Yeah, got a different way to do that?
I think that is the current state of affairs: each model handles its own features and has a default behavior, but you can specify a different one and it might work. (I submit it should work for all of them.)
Just the suggestion in the original message.
I did not look closely, but it seems that this is only the default behavior due essentially to chance: most of our models encode features as extra symbols in the input sequence. For models that do not do this (e.g., the pointer generator), the default (AFAIK) is to ignore the features. It should probably raise an exception if a features column is requested instead.
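To make the contrast concrete, here is a toy sketch of the two behaviors described above; ModelSpec, supports_feature_symbols, and NoFeaturesEncoderError are invented names for illustration, not part of the yoyodyne API:

```python
from dataclasses import dataclass


class NoFeaturesEncoderError(Exception):
    """Features were requested but the model cannot encode them."""


@dataclass
class ModelSpec:
    supports_feature_symbols: bool


def encode_input(model: ModelSpec, source: list[str], features: list[str]) -> list[str]:
    if not features:
        return source
    if model.supports_feature_symbols:
        # Most models: features ride along as extra symbols appended to
        # the source sequence.
        return source + features
    # Pointer-generator-style models cannot treat features as source
    # symbols; silently dropping them is the confusing default at issue.
    raise NoFeaturesEncoderError(
        "a features column was requested but no features encoder is set; "
        "pass --features_encoder explicitly"
    )


# e.g. encode_input(ModelSpec(True), list("talar"), ["[V]", "[PRS]"])
# returns ['t', 'a', 'l', 'a', 'r', '[V]', '[PRS]']
```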
Actually, while I am here, I will remark that the nan losses in (3) and (4) sound concerning and are worth investigating as well...
Okay, so, to summarize (tell me if I've got it wrong). If a non-zero feature column is specified:
and so on?
Hmm, I think I got some wires crossed when updating features. For the moment, just adding the flag seems the best call, though in the long term I think I'm just going to make the default a concat operation. (That way the behavior is the same throughout when you pass the flags.)
Concat can’t be the default for the pointer generator or transducer, though; it doesn’t make sense. So we need a default features encoder for those two or, and I prefer this, an informative exception.
Sorry, what I meant was: concat is the default features encoder for the other models, and the pointer generator and transducer have a separate default.
Since it's not obvious to me what the defaults should be for the non-concatenating models, we should at least consider just throwing an exception instead. E.g., what should the default feature encoder be here?
It sounds like we need to test exhaustively to know for sure what the default is in every case. What I did observe was, after fixing the indentation error for the ...
IMO (1) is confusing behavior, and if the features are explicitly requested, we should either raise an Exception if we do not know what to do with them, or encode them by default (presumably with the same encoder as the requested architecture). While we are on this topic, IIRC ...
Yep. I think I'd prefer an exception.
OK.
I am lukewarm on this because the implementation sounds weirdly complex to get right, but I see the point.
More notes: I am struggling through this stuff a bit as I try to implement the transformer version of this, and my main bottleneck right now is figuring out how to modify the general controller code for getting the features. In the train script, we only consider features as separate for specific architectures. I believe this means no other architecture can ever make use of a separate features encoder, even if one is requested. Furthermore, it is pretty buried in the code that you need to hardcode architectures at all in order to get a separate features encoding for a new model you are implementing. Digging through the giant stack of errors thrown when trying to implement a model that uses separate features is basically no help, and before coming across this, encoder_cls seemed to just sort of magically not be getting set, haha. I think this is all solved if we implement a more elegant solution for inferring what to do with feature configurations from the command line; see the sketch below. I will note that even if that is solved, adjusting the code to pass features conditionally through the abstraction in our models is kind of a headache, but I don't know an easy solution to that given our current design pattern.
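One possible shape for that inference, sketched under invented names (BaseModel, default_features_encoder, resolve_features_encoder; none of this is the actual yoyodyne code): each model class declares its own default, and the controller merely looks it up.

```python
# A hypothetical registry-style alternative to hardcoding architecture
# names in the train script; every name here is invented, not taken from
# the actual yoyodyne codebase.

class BaseModel:
    # None means no separate features encoder: features are folded into
    # the source sequence (the concat default discussed above).
    default_features_encoder: str | None = None


class AttentiveLSTMModel(BaseModel):
    pass  # the concat default is fine here


class PointerGeneratorLSTMModel(BaseModel):
    # Cannot treat features as source symbols, so it declares its own
    # default separate encoder.
    default_features_encoder = "lstm"


def resolve_features_encoder(model_cls: type[BaseModel],
                             requested: str | None) -> str | None:
    """Picks the features encoder, letting an explicit flag win."""
    if requested is not None:  # the user's --features_encoder always wins
        return requested
    return model_cls.default_features_encoder
```

Under a scheme like this, a new architecture that needs a separate features encoder sets one class attribute instead of adding another hardcoded branch to the controller.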
Following so far.
Yes, I agree.
> I think this is all solved if we implement a more elegant solution to inferring what to do with feature configurations from the command line. [...]

I think this works well as an initial stopgap. A later PR will make it so that all architectures can use a separate features encoder.
Status update: I think that (1) is still true; (2-4) are no longer crashes, but maybe someone will see a nan.
Is this still open after #172? @michaelpginn @Adamits |
@kylebgorman Yes, #172 just makes the pg-transformer work with the predict.py script. I can take this issue over and try to fix the interface so things are more intuitive. I will try to write a proposal by Friday. I also do not see anything explaining why you were getting nans.
Let's both try to replicate. I was doing:
Was not able to replicate. (I got a checkpoint with accuracy .922.) You? |
I ran this:
and got ...
Yeah, so it sounds like (3-4) are solved.
I think this is all taken care of so I'm going to close. We can reopen if it arises again. |
I was thinking the needed fixes to the modules interface were related to this issue. Maybe I can open a new issue specifically about adding better defaults and raising errors. I probably will not get to that until tomorrow.
Are you thinking of, like, dealing with preconditions that models have? My thought on that was that models (not modules) should have a callback interface which is provided with the index/indexes and can either do nothing or run a test. In the pointer-generator case it should test that the source/features share symbols with the target (and this in turn needs to be independent of whether we are sharing embeddings or not). IDK what else it would do.
This sounds like a nice way of doing it. I think your general concept that these types of checks should be defined on the model (rather than any of our yoyodyne data modules, or train.py, or modules) is probably the best design pattern here. |
My thought exactly. The more I think about it, the set of checks is not open, nor need they be Turing-complete, so the interface might be like: each model registers zero or more of a fixed set of checks, and then we just run through that list and call them. In fact there may be only one, or a very small number of, such checks we need right now, in which case the YAGNI solution is to have each model define a zero-place boolean method.
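A minimal sketch of that registry idea, under invented names (checks, check_shared_symbols, run_checks) rather than anything from the actual yoyodyne code:

```python
from types import SimpleNamespace


class Model:
    def checks(self):
        """Each model registers zero or more precondition checks."""
        return []


class PointerGeneratorModel(Model):
    def checks(self):
        return [self.check_shared_symbols]

    @staticmethod
    def check_shared_symbols(index) -> bool:
        # The pointer mechanism copies source/features symbols into the
        # output, so they must also appear in the target vocabulary,
        # whether or not embeddings are shared.
        return set(index.source_vocab) <= set(index.target_vocab)


def run_checks(model: Model, index) -> None:
    # Walk the registered checks and fail loudly on the first violated
    # precondition (the "make it an error" option).
    for check in model.checks():
        if not check(index):
            raise ValueError(f"model precondition failed: {check.__name__}")


# e.g. run_checks(PointerGeneratorModel(),
#                 SimpleNamespace(source_vocab="abc", target_vocab="abcdef"))
```

The zero-place boolean variant mentioned above would collapse this to a single method per model that the trainer calls once.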
If the former strategy, why not make it an error?