Empirical results showing that a generic convolutional model (the Temporal Convolutional Network; TCN) outperforms recurrent networks (RNNs) on several sequence modeling tasks.
Abstract
Convolutional networks should be regarded as a natural starting point for sequence modeling tasks
1. Introduction
Recurrent models are the conventional first approach to sequence modeling
However, there is research showing that convolutional models can reach state-of-the-art results on sequence tasks
This paper presents a single, general TCN architecture that is applied across all tasks
TCN is
simpler and clearer than canonical recurrent networks
combines elements of modern convolutional architectures (dilated convolutions, residual connections)
outperforms baseline recurrent architectures across a broad range of tasks
retains a longer effective memory/history than recurrent networks
3. Temporal Convolutional Networks
The paper aims for a simple yet powerful architecture
Characteristics of TCNs are
there is no information leakage from the future to the past
the architecture can take a sequence of any length and map it to an output sequence of the same length
uses residual layers and dilated convolutions
3.1. Sequence Modeling
The goal of learning in sequence modeling setting is to find a network f that minimizes some expected loss between the actual outputs and the predictions.
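Formally (Section 3.1 of the paper), a sequence modeling network is any function f : X^{T+1} → Y^{T+1} that obeys the causal constraint that the prediction ŷ_t depends only on x_0, ..., x_t. Training then seeks

$$
f^{*} = \arg\min_{f} \; \mathbb{E}_{(x,y)} \left[ L\left(y_0, \dots, y_T,\; f(x_0, \dots, x_T)\right) \right]
$$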
3.2. Causal Convolutions
The TCN is built on two principles
the output has the same length as the input
there is no leakage from the future into the past
To achieve the second principle, the TCN uses causal convolutions
convolutions where the output at time t is computed only from elements at time t and earlier in the previous layer
this is essentially the same idea as masked convolution (van den Oord et al., 2016)
TCN = 1D FCN + causal convolutions
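A minimal sketch of causal convolution via left-only padding (assuming PyTorch; `CausalConv1d` is an illustrative name, not from the paper's code):

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1D convolution whose output at time t sees only inputs at time <= t."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        # Pad (k-1)*d on both sides, then trim the right side in forward(),
        # so the effective padding is left-only (causal).
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):  # x: (batch, channels, time)
        out = self.conv(x)
        return out[:, :, :-self.pad] if self.pad > 0 else out

# Output length equals input length, and no future positions are used.
x = torch.randn(1, 3, 10)
y = CausalConv1d(3, 8, kernel_size=2)(x)
assert y.shape == (1, 8, 10)
```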
3.3. Dilated Convolutions
A simple causal convolution can only look back over a history linear in the depth of the network
The paper employs dilated convolutions (van den Oord et al., 2016) to enable an exponentially large receptive field (Yu & Koltun, 2016)
The dilation factor grows exponentially with depth (d = 1, 2, 4, ...); see the receptive-field sketch below
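A minimal sketch of why this gives exponential history, assuming one dilated conv per level (the paper's residual blocks contain two convolutions each, which doubles the (k-1)·Σd term):

```python
# Receptive field of a stack of dilated causal convolutions with
# kernel size k and dilations d_i = 2**i:
#   R = 1 + (k - 1) * sum(d_i) = 1 + (k - 1) * (2**n - 1)
def receptive_field(kernel_size: int, num_layers: int) -> int:
    return 1 + (kernel_size - 1) * sum(2 ** i for i in range(num_layers))

print(receptive_field(3, 8))  # k=3, 8 levels -> 511 time steps of history
```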
3.4. Residual Connections
Each residual block contains two layers of dilated causal convolution, each followed by weight normalization, ReLU, and dropout; a 1x1 convolution on the skip path matches input/output widths when needed (see Figure 1 (b) and (c))
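A sketch of such a residual block (assuming PyTorch; `TemporalBlock` is an illustrative name, not the authors' reference implementation):

```python
import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    """Two dilated causal conv layers, each with weight norm, ReLU and
    dropout, plus a residual connection (cf. Figure 1(b) of the paper)."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation, dropout=0.2):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv1 = nn.utils.weight_norm(nn.Conv1d(
            in_ch, out_ch, kernel_size, padding=self.pad, dilation=dilation))
        self.conv2 = nn.utils.weight_norm(nn.Conv1d(
            out_ch, out_ch, kernel_size, padding=self.pad, dilation=dilation))
        self.relu = nn.ReLU()
        self.drop = nn.Dropout(dropout)
        # 1x1 conv so the skip connection matches the output width.
        self.downsample = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else None

    def _chomp(self, x):
        # Trim the right-side padding to keep the convolution causal.
        return x[:, :, :-self.pad] if self.pad > 0 else x

    def forward(self, x):  # x: (batch, channels, time)
        out = self.drop(self.relu(self._chomp(self.conv1(x))))
        out = self.drop(self.relu(self._chomp(self.conv2(out))))
        res = x if self.downsample is None else self.downsample(x)
        return self.relu(out + res)
```

Stacking these blocks with dilation 2**i at level i yields the full TCN.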
3.5. Discussion
Advantages
Parallelism
Flexible receptive field size
Stable gradients compared to RNNs
a TCN avoids exploding/vanishing gradients
because its backpropagation path differs from the temporal direction of the sequence
Low memory requirement for training
Variable length inputs
Disadvantages
Data storage during evaluation
an RNN only needs to keep a hidden state at evaluation time (less memory than during training), while a TCN must keep the raw sequence up to its effective history length
Potential parameter change for a transfer of domain
a TCN may not transfer well between domains
e.g., from a domain that requires little memory to one that requires long memory
because its receptive field may not be sufficiently large
Paper: https://arxiv.org/abs/1803.01271 (Bai, Kolter & Koltun, 2018)
This paper introduces the Temporal Convolutional Network (TCN)