
Some ideas to improve this project #3

Open · 0b01 opened this issue Jun 14, 2017 · 5 comments

0b01 commented Jun 14, 2017

I am a seq2seq beginner, so these are just my 2 cents. Correct me if I'm wrong.

  • Variable-length output
  • Add weights to the loss function, e.g. give the first few predicted points bigger weights.
  • Dropout, so it can better deal with noisy channels
  • ARIMA-like confidence intervals (kind of like logits from softmax in the discrete case)
  • Use the new tf.contrib.tensorflow attention decoder

That's all.

0b01 (author) commented Jun 14, 2017

  • Swap the loss function to NRMSE (normalized root-mean-square error)


0b01 (author) commented Jun 17, 2017

The error function should be the mean absolute percentage error (MAPE):

https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
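
For reference, a minimal NumPy sketch of MAPE (function and array names are illustrative; the `eps` guard is an addition, since these scaled series can cross zero):

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-8):
    """Mean absolute percentage error, in percent.

    `eps` guards against division by zero when the true
    signal crosses zero, which is common in centered series.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / (np.abs(y_true) + eps)))

# Example: a perfect prediction gives 0% error.
print(mape([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
```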

guillaume-chevalier (owner) commented
Hi @RickyHan,

I think your ideas are nice and worth trying! I really appreciate your suggestions. Here are my comments:

  • Variable-length output:
  • Add weights to the loss function, e.g. the first few predicted points have bigger weights.
    • Good idea! I have thought a lot about using an exponential decay on the loss function, but I never did it. (A sketch of such a weighting is given after this list.)
  • Dropout so it can better deal with noisy channels
    • This is not a priority, but it would be interesting. Dropout would have considerably slowed the training phase, and since this project was an interactive demo at a master class / workshop / conference, that would have been too slow. Now it would be O.K. (see the wrapper sketch after this list).
  • ARIMA-like confidence interval (kind of like logits from softmax in the discrete case)
    • I wonder how one could implement that for a seq2seq. Maybe by randomly perturbing the state and inputs at each decoding step, recording many randomized decoding passes, and then reading an empirical, roughly Gaussian distribution off each time step (sketched after this list)? Another way would be to use a Mixture Density Network (MDN) RNN, such as here: https://github.com/zhaoyu611/basketball_trajectory_prediction
  • Use the new tf.contrib.tensorflow attention decoder
    • I have already thought about using attention for predicting time series; however, I am tempted to think it would not help.
  • Swap the loss function to NRMSE
    • I never tried, nor thought of using, this as a loss function for optimization, because MSE is very standard and widely used. I am curious whether it would help. I recall that optimizing the absolute error optimizes for the median, while optimizing the squared error (as in MSE) optimizes for the mean; for RMSE I have no clue, but the loss surface might look less convex, which could be worse. At least, trying it empirically is simple with the current code. Normalizing, as in NRMSE, seems interesting too: the outputs are already scaled according to how the inputs were scaled by their mean and std, so they should not diverge too much from there. (An NRMSE sketch is given after this list.)
  • Replace enc_inp by expected_sparse_output: 1b17013#commitcomment-22530082
    • I tried to implement it, but it is not trivial: at test time we would rather feed the decoder its own predictions (with feedback) than spoil it with the true test values, so that it can recover at test time. There may be a simple way to code this; I would like feedback on that. One might take inspiration from this code, but right now I don't have the time for that: https://github.com/tensorflow/models/tree/master/tutorials/rnn/translate (a minimal feedback-decoding loop is sketched after this list).
  • The error function should be mean absolute percentage error
    • Thanks for the hint! Yes, we could use that as an error metric alongside the optimization objective, the loss.
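
On the loss-weighting item above: a minimal TensorFlow 1.x sketch of an exponentially decaying per-step weight. The tensor shapes, the 0.9 decay factor, and all names are illustrative assumptions, not taken from this repository's code.

```python
import tensorflow as tf  # TF 1.x graph-mode API, as this project uses

seq_len, batch_size, output_dim = 10, 32, 1  # illustrative sizes

preds = tf.placeholder(tf.float32, [seq_len, batch_size, output_dim])
targets = tf.placeholder(tf.float32, [seq_len, batch_size, output_dim])

# Exponential decay over decoding steps: the first predicted point
# weighs the most, later points progressively less.
decay = 0.9
step_weights = tf.constant([decay ** t for t in range(seq_len)],
                           dtype=tf.float32)
step_weights = tf.reshape(step_weights, [seq_len, 1, 1])  # broadcast over batch

weighted_mse = tf.reduce_mean(step_weights * tf.square(preds - targets))
```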
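
For the dropout item, a sketch using `tf.contrib.rnn.DropoutWrapper` (the cell size and keep probabilities are illustrative assumptions); feeding 1.0 at test time disables the dropout:

```python
import tensorflow as tf  # TF 1.x

cell = tf.contrib.rnn.GRUCell(num_units=64)  # hypothetical cell size

# Defaults to 1.0 (no dropout); feed e.g. 0.8 during training only.
keep_prob = tf.placeholder_with_default(1.0, shape=[])
cell = tf.contrib.rnn.DropoutWrapper(
    cell, input_keep_prob=keep_prob, output_keep_prob=keep_prob)
```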
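
For the confidence-interval item, the randomized-decoding idea could look like the following NumPy sketch; `predict_stochastic` is a hypothetical stand-in for one stochastic decoding pass (e.g. with dropout kept on, or noise injected into the state):

```python
import numpy as np

def predict_stochastic(x):
    # Hypothetical stand-in for one noisy decoding pass of the model.
    return np.sin(x) + np.random.normal(scale=0.1, size=x.shape)

x = np.linspace(0.0, 2.0 * np.pi, 30)
samples = np.stack([predict_stochastic(x) for _ in range(200)])

mean = samples.mean(axis=0)
lower, upper = np.percentile(samples, [2.5, 97.5], axis=0)  # ~95% band
```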
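
For the NRMSE item: since the square root is monotone increasing, RMSE shares MSE's minimizer (for a fixed normalizer), though the gradients differ. A sketch, normalizing by the target range (dividing by the mean or std is also seen; the epsilon guard is an illustrative addition):

```python
import tensorflow as tf  # TF 1.x

targets = tf.placeholder(tf.float32, [None])
preds = tf.placeholder(tf.float32, [None])

rmse = tf.sqrt(tf.reduce_mean(tf.square(preds - targets)))
# Normalize by the range of the targets; 1e-8 avoids division by zero.
nrmse = rmse / (tf.reduce_max(targets) - tf.reduce_min(targets) + 1e-8)
```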
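
Finally, for the feedback-decoding item, the test-time loop can be sketched at a high level; `decoder_step` is a hypothetical stand-in for one RNN cell call plus the output projection. At training time one could feed the true targets instead (teacher forcing):

```python
import numpy as np

def decoder_step(state, prev_output):
    # Hypothetical one-step decoder: returns (new_state, prediction).
    new_state = 0.9 * state + 0.1 * prev_output
    return new_state, new_state.copy()

state = np.zeros(1)  # the encoder's final state would go here
prev = np.zeros(1)   # a GO value, or the last encoder input
predictions = []
for _ in range(10):  # decode 10 steps, feeding predictions back in
    state, prev = decoder_step(state, prev)
    predictions.append(prev)
```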

I'll create a CONTRIBUTING.md file right now with instructions on how to contribute, in case anyone is interested.

nicolapiccinelli commented
Give a solution for each exercise
