An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling #12

flrngel commented Mar 10, 2018

https://arxiv.org/abs/1803.01271
This paper introduces the Temporal Convolutional Network (TCN).

Summary

Empirically shows that a generic convolutional model (the Temporal Convolutional Network, TCN) outperforms canonical recurrent networks on several sequence modeling tasks.

Abstract

  • Convolutional networks should be regarded as a natural starting point for sequence modeling tasks

1. Introduction

  • Recurrent models are the usual first approach to sequence modeling
  • But there is research showing that convolutional models can reach state-of-the-art results on sequence tasks
  • This paper presents a single, generic TCN architecture that is applied across all tasks
  • The TCN
    • is simpler and clearer than canonical recurrent networks
    • combines best practices of modern convolutional architectures
    • outperforms baseline recurrent architectures
    • retains a substantially longer effective memory/history

3. Temporal Convolutional Networks


  • The paper aims for a simple yet powerful architecture
  • Characteristics of the TCN:
    • there is no information leakage from the future to the past
    • the architecture can take a sequence of any length and map it to an output sequence of the same length
    • it uses residual blocks and dilated convolutions

3.1. Sequence Modeling


  • The goal of learning in the sequence modeling setting is to find a network f that minimizes some expected loss between the actual outputs and the predictions; see the formulation below.
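Following the paper's formulation in Section 3.1 (the symbols x_t, y_t, f, L are the paper's own):

```latex
% A sequence modeling network is any function f that maps an input sequence
% to an output sequence of the same length,
\hat{y}_0, \dots, \hat{y}_T = f(x_0, \dots, x_T),
% subject to the causality constraint that \hat{y}_t depends only on
% x_0, \dots, x_t and not on any future input x_{t+1}, \dots, x_T.
% Learning then searches for the f that minimizes the expected loss
\min_f \; \mathbb{E}\left[ L\big(y_0, \dots, y_T,\; f(x_0, \dots, x_T)\big) \right]
```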

3.2. Causal Convolutions

  • The TCN is built on two principles:
    1. the output has the same length as the input
    2. there can be no leakage from the future into the past
  • To achieve principle 2, the TCN uses causal convolutions
    • i.e., convolutions where the output at time t is convolved only with elements from time t and earlier in the previous layer
      • this is essentially the same idea as masked convolution (van den Oord et al., 2016)
    • TCN = 1D FCN (fully-convolutional network) + causal convolutions; a minimal sketch follows this list
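A minimal sketch of a causal convolution in PyTorch (my own illustration, not the authors' code); the module name `CausalConv1d` and the pad-then-convolve layout are assumptions, but left-padding by `(kernel_size - 1) * dilation` is a standard way to keep the output at time t from seeing future inputs.

```python
# Minimal sketch of a causal 1-D convolution, assuming input shape
# (batch, channels, time). Causality is obtained by padding only on the
# left of the time axis, so the output at time t depends only on inputs
# at times <= t, and the output length equals the input length.
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, dilation=1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                              dilation=dilation)

    def forward(self, x):
        # Pad only on the left along the time axis, then convolve.
        x = nn.functional.pad(x, (self.left_pad, 0))
        return self.conv(x)

# Principle 1: same output length. Principle 2: no leakage from the future.
x = torch.randn(8, 16, 100)                 # (batch, channels, time)
y = CausalConv1d(16, 32, kernel_size=3)(x)
assert y.shape == (8, 32, 100)
```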

3.3. Dilated Convolutions

  • A plain causal convolution can only look back over a history that is linear in the depth of the network, which is a problem for long memory retention
  • The paper employs dilated convolutions (van den Oord et al., 2016) to enable an exponentially large receptive field (Yu & Koltun, 2016)
  • The dilation factor grows exponentially with depth (d = 1, d = 2, d = 4, ...); the defining formula is given after this list
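For reference, the dilated convolution as defined in the paper, plus the receptive-field growth that motivates the exponential dilation schedule (the receptive-field formula assumes one dilated convolution of kernel size k per level; the paper's residual blocks actually use two per block).

```latex
% Dilated convolution: for a 1-D input x \in \mathbb{R}^n and a filter
% f : \{0, \dots, k-1\} \to \mathbb{R}, the dilated convolution with
% dilation factor d at position s is
F(s) = (x *_d f)(s) = \sum_{i=0}^{k-1} f(i) \cdot x_{s - d \cdot i}

% With the dilation doubling at each level (d = 1, 2, 4, \dots, 2^{L-1}),
% the receptive field grows exponentially with the number of levels L:
R = 1 + (k - 1) \sum_{i=0}^{L-1} 2^{i} = 1 + (k - 1)\,(2^{L} - 1)
```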

3.4. Residual Connections

  • Each layer of the TCN is a residual block: two dilated causal convolutions with weight normalization, ReLU, and dropout, plus an identity skip connection (or a 1x1 convolution on the skip path when channel counts differ); see Figure 1 (b) and (c) in the paper, and the sketch below
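A rough sketch of such a residual block, reusing the hypothetical `CausalConv1d` module from the earlier snippet; weight normalization is omitted for brevity, and input/output channel counts are kept equal so the identity skip connection works as written.

```python
# Sketch of a TCN residual block: two dilated causal convolutions with ReLU
# and dropout, plus an identity skip connection. The paper additionally
# applies weight normalization and uses a 1x1 convolution on the skip path
# when the channel counts differ (both omitted here).
import torch.nn as nn

class TemporalBlock(nn.Module):
    def __init__(self, channels, kernel_size, dilation, dropout=0.2):
        super().__init__()
        self.conv1 = CausalConv1d(channels, channels, kernel_size, dilation)
        self.conv2 = CausalConv1d(channels, channels, kernel_size, dilation)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        out = self.dropout(self.relu(self.conv1(x)))
        out = self.dropout(self.relu(self.conv2(out)))
        # Identity skip connection: add the block input to its output.
        return self.relu(x + out)
```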

3.5. Discussion

Advantage

  • Parallelism
  • Flexible receptive field size
  • Stable gradients compared to RNNs
    • the TCN largely avoids exploding/vanishing gradients
      • because its backpropagation path is different from the temporal direction of the sequence
  • Low memory requirement for training
  • Variable length inputs

Disadvantage

  • Data storage during evaluation
    • an RNN only needs to keep a fixed-size hidden state at evaluation time (less memory than during training), whereas a TCN must keep the raw sequence up to its effective history length
  • Potential parameter changes when transferring across domains
    • a TCN tuned on a domain that needs little memory may perform poorly when transferred to a domain that requires a long memory
      • because its receptive field may not be sufficiently large
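To make the "flexible receptive field size" advantage concrete, here is a sketch of stacking the hypothetical residual blocks above with exponentially increasing dilation; `build_tcn` and its default values are my own illustration, not the paper's hyperparameters.

```python
# Sketch of composing the TemporalBlock modules above into a full TCN.
# Dilation doubles at each level, so the history length is controlled by
# depth, kernel size, and dilation rather than by a recurrent state.
import torch.nn as nn

def build_tcn(channels, kernel_size=3, num_levels=6, dropout=0.2):
    layers = [TemporalBlock(channels, kernel_size,
                            dilation=2 ** i, dropout=dropout)
              for i in range(num_levels)]
    return nn.Sequential(*layers)

# With two convolutions per block, the receptive field covers
# 1 + 2 * (kernel_size - 1) * (2 ** num_levels - 1) timesteps.
```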

5. Experiments

  • The generic TCN outperforms canonical recurrent architectures (LSTM, GRU, vanilla RNN) across a broad range of sequence modeling benchmarks; see the results tables in the paper for the full numbers

