Pointer-generator behavior with features #111
Reporting some funky behavior with the pointer-generator with features:

1. With --arch pointer_generator_lstm and features enabled (i.e., --features_col 3 or something other than the default 0), I get a report that doesn't list a features encoder in either place; this makes me think it has ignored features even though I have explicitly requested them.
2. With --arch pointer_generator_lstm --source_encoder lstm --features_encoder lstm (which should be the same thing), we get a crash. The reason for this should be clear from the code: the branch that begins at line 345 doesn't define predictions.
3. --arch pointer_generator_lstm --source_encoder lstm --features_encoder linear works, though after hill climbing for a while both losses go nan (e.g., on our Polish data).
4. Same story as (3) with --arch pointer_generator_lstm --source_encoder transformer --features_encoder linear.

I am assigning this to @bonham79; I think the fix will be quite small.
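For illustration, the crash in (2) presumably has roughly the following shape; this is a hypothetical reconstruction, not the actual yoyodyne source:

```python
# Hypothetical reconstruction of the bug described in (2); this is not the
# actual yoyodyne code. One branch assigns `predictions` and the other does
# not, so the second branch dies at the return.

def greedy_decode(encoder_out):
    return ["stand-in"]  # placeholder for the real decoding step


def greedy_decode_with_features(encoder_out, features_out):
    return ["stand-in"]  # placeholder for the features-aware decoding step


def decode(encoder_out, features_out=None):
    if features_out is None:
        predictions = greedy_decode(encoder_out)
    else:
        # This branch forgets to assign `predictions`...
        greedy_decode_with_features(encoder_out, features_out)
    return predictions  # ...so this raises UnboundLocalError
```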
I am testing the pointer generator for #80. We should raise an Exception saying that a specific features encoder needs to be set if a features column is requested, or else we should set a default features encoder for each architecture.
+1.
Yeah, got a different way to do that?
I think that is the current state of affairs: each model handles its own features and has a default behavior, but you can specify a different one and it might work. (I submit it should work for all of them.)
Just the suggestion in the original message.
I did not look closely, but it seems that this is only the default behavior due essentially to chance: most of our models encode features as extra symbols in the input sequence. For models that do not do this (e.g., the pointer generator), the default (AFAIK) is to ignore the features. It should probably raise an exception if a features column is requested instead.
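To make the contrast concrete, here is a toy sketch of the two behaviors described above; ModelSpec, supports_feature_symbols, and NoFeaturesEncoderError are invented names for illustration, not part of the yoyodyne API:

```python
from dataclasses import dataclass


class NoFeaturesEncoderError(Exception):
    """Features were requested but the model cannot encode them."""


@dataclass
class ModelSpec:
    supports_feature_symbols: bool


def encode_input(model: ModelSpec, source: list[str], features: list[str]) -> list[str]:
    if not features:
        return source
    if model.supports_feature_symbols:
        # Most models: features ride along as extra symbols appended to
        # the source sequence.
        return source + features
    # Pointer-generator-style models cannot treat features as source
    # symbols; silently dropping them is the confusing default at issue.
    raise NoFeaturesEncoderError(
        "a features column was requested but no features encoder is set; "
        "pass --features_encoder explicitly"
    )


# e.g. encode_input(ModelSpec(True), list("talar"), ["[V]", "[PRS]"])
# returns ['t', 'a', 'l', 'a', 'r', '[V]', '[PRS]']
```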
Actually, while I am here, I will remark that the nan losses in (3) and (4) sound concerning and are worth investigating as well...
Okay, so, to summarize (tell me if I've got it wrong). If a non-zero feature column is specified:
and so on?
Hmm, I think I got some wires crossed when updating features. For the moment, just adding the flag seems the best call, though in the long term I think I'm just going to make the default a concat operation. (That way the behavior is the same throughout when you pass the flags.)
Concat can’t be the default for the pointer generator or transducer, though; it doesn’t make sense. So we need a default features encoder for those two or, and I prefer this, an informative exception.
Sorry, what I meant was: concat is the default features encoder for the other models, and the pointer generator and transducer have a separate default.
Since it's not obvious to me what the defaults should be for the non-concatenating models, we should at least consider just throwing an exception instead. E.g., what should the default feature encoder be here?
It sounds like we need to test exhaustively to know for sure what the default is in every case. What I did observe was, after fixing the indentation error for the ...
IMO (1) is confusing behavior, and if the features are explicitly requested, we should either raise an Exception if we do not know what to do with them, or encode them by default (presumably with the same encoder as the requested architecture). While we are on this topic, IIRC ...
Yep. I think I'd prefer an exception.
OK.
I am lukewarm on this because the implementation sounds weirdly complex to get right, but I see the point.
More notes: I am struggling through this stuff a bit as I try to implement the transformer version of this, and my main bottleneck right now is figuring out how to modify the general controller code for getting the features. In the train script, we only consider features as separate for specific architectures. I believe this means no other architecture can ever make use of a separate features encoder, even if one is requested. Furthermore, it is pretty buried in the code that you need to hardcode architectures at all in order to get a separate features encoding for a new model you are implementing. Digging through the giant stack of errors thrown when trying to implement a model that uses separate features is basically no help, and before coming across this, encoder_cls seemed to just sort of magically not be getting set, haha. I think this is all solved if we implement a more elegant solution for inferring what to do with feature configurations from the command line; see the sketch below. I will note that even if that is solved, adjusting the code to pass features conditionally through the abstraction in our models is kind of a headache, but I don't know an easy solution to that given our current design pattern.
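One possible shape for that inference, sketched under invented names (BaseModel, default_features_encoder, resolve_features_encoder; none of this is the actual yoyodyne code): each model class declares its own default, and the controller merely looks it up.

```python
# A hypothetical registry-style alternative to hardcoding architecture
# names in the train script; every name here is invented, not taken from
# the actual yoyodyne codebase.

class BaseModel:
    # None means no separate features encoder: features are folded into
    # the source sequence (the concat default discussed above).
    default_features_encoder: str | None = None


class AttentiveLSTMModel(BaseModel):
    pass  # the concat default is fine here


class PointerGeneratorLSTMModel(BaseModel):
    # Cannot treat features as source symbols, so it declares its own
    # default separate encoder.
    default_features_encoder = "lstm"


def resolve_features_encoder(model_cls: type[BaseModel],
                             requested: str | None) -> str | None:
    """Picks the features encoder, letting an explicit flag win."""
    if requested is not None:  # the user's --features_encoder always wins
        return requested
    return model_cls.default_features_encoder
```

Under a scheme like this, a new architecture that needs a separate features encoder sets one class attribute instead of adding another hardcoded branch to the controller.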
Following so far.
Yes, I agree.
> I think this is all solved if we implement a more elegant solution to inferring what to do with feature configurations from the command line. [...]

I think this works well as an initial stopgap. A later PR will make it so that all architectures can use a separate features encoder.
Status update: I think that (1) is still true; (2-4) are no longer crashes, but maybe someone will see a nan.
Is this still open after #172? @michaelpginn @Adamits |
@kylebgorman Yes, #172 just makes the pg-transformer work with the predict.py script. I can take this issue over and try to fix the interface so things are more intuitive. I will try to write a proposal by Friday. I also do not see anything explaining why you were getting nans.
Let's both try to replicate. I was doing:
Was not able to replicate. (I got a checkpoint with accuracy .922.) You? |
I ran this:
and got ...
Yeah, so it sounds like (3-4) are solved.
I think this is all taken care of so I'm going to close. We can reopen if it arises again. |
I was thinking the needed fixes to the modules interface were related to this issue. Maybe I can open a new issue specifically about adding better defaults and raising errors. I probably will not get to that until tomorrow.
Are you thinking of, like, dealing with preconditions that models have? My thought on that was that models (not modules) should have a callback interface which is provided with the index/indexes and can either do nothing or run a test. In the pointer-generator case it should test that the source/features share symbols with the target (and this in turn needs to be independent of whether we are sharing embeddings or not). IDK what else it would do.
This sounds like a nice way of doing it. I think your general concept that these types of checks should be defined on the model (rather than any of our yoyodyne data modules, or train.py, or modules) is probably the best design pattern here. |
My thought exactly. The more I think about it, the set of checks is not open, nor need they be Turing-complete, so the interface might be like: each model registers zero or more of a fixed set of checks, and then we just run through that list and call them. In fact there may be only one, or a very small number of, such checks we need right now, in which case the YAGNI solution is to have each model define a zero-place boolean method.
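A minimal sketch of that registry idea, under invented names (checks, check_shared_symbols, run_checks) rather than anything from the actual yoyodyne code:

```python
from types import SimpleNamespace


class Model:
    def checks(self):
        """Each model registers zero or more precondition checks."""
        return []


class PointerGeneratorModel(Model):
    def checks(self):
        return [self.check_shared_symbols]

    @staticmethod
    def check_shared_symbols(index) -> bool:
        # The pointer mechanism copies source/features symbols into the
        # output, so they must also appear in the target vocabulary,
        # whether or not embeddings are shared.
        return set(index.source_vocab) <= set(index.target_vocab)


def run_checks(model: Model, index) -> None:
    # Walk the registered checks and fail loudly on the first violated
    # precondition (the "make it an error" option).
    for check in model.checks():
        if not check(index):
            raise ValueError(f"model precondition failed: {check.__name__}")


# e.g. run_checks(PointerGeneratorModel(),
#                 SimpleNamespace(source_vocab="abc", target_vocab="abcdef"))
```

The zero-place boolean variant mentioned above would collapse this to a single method per model that the trainer calls once.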
If the former strategy, why not make it an error?