Language Modeling with Gated Convolutional Networks #16

flrngel commented Jun 10, 2018

https://arxiv.org/abs/1612.08083

Abstract

  • proposes a gating mechanism (the gated linear unit, GLU)
  • evaluates on WikiText-103 and the Google Billion Word benchmark
  • the proposed model is competitive with strong recurrent models on large-scale language tasks

1. Introduction

  • convolutional networks have a parallelization benefit
    • but cuDNN is not optimized for 1d convolutions yet
  • gated linear units mitigate the vanishing gradient problem
  • compared with PixelCNN (Oord et al., 2016), GLU works better than LSTM-style gating (GTU)

2. Approach

  • convolutional networks have no temporal dependencies between positions, unlike recurrent models, so computation can be parallelized over the sequence
  • recurrent models have unbounded context, but the paper's experiments show that infinite context is not necessary
  • GLU
    h_l(X) = (X * W + b) \otimes \sigma(X * V + c), i.e. the element-wise product of a linear path and a sigmoid gate
  • abstract model
    (figure: overall architecture; word embeddings feed a stack of gated convolutional blocks, followed by an adaptive softmax output)
  • the model uses an adaptive softmax, which assigns higher capacity to very frequent words and lower capacity to rare words (see the sketch after this list)
    • this results in faster computation and lower memory usage
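
A minimal sketch of the adaptive softmax idea using PyTorch's built-in `nn.AdaptiveLogSoftmaxWithLoss` as a stand-in (the paper has its own implementation; the hidden size, vocabulary size, and frequency cutoffs here are illustrative, not the paper's):

```python
# Minimal adaptive softmax sketch; all sizes are illustrative placeholders.
import torch
import torch.nn as nn

hidden_size, vocab_size = 512, 100_000
# Frequent words live in the full-capacity head; rare words fall into
# tail clusters with progressively smaller projections.
adaptive = nn.AdaptiveLogSoftmaxWithLoss(
    in_features=hidden_size,
    n_classes=vocab_size,
    cutoffs=[2_000, 10_000, 50_000],  # bucket boundaries by frequency rank
)

hidden = torch.randn(32, hidden_size)          # 32 positions of hidden states
targets = torch.randint(0, vocab_size, (32,))  # gold next-word ids
out = adaptive(hidden, targets)
print(out.loss)  # average negative log-likelihood
```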

3. Gating Mechanisms

  • the purpose of a gating mechanism is to control what information is propagated through the hierarchy of layers
  • compared with GTU (the LSTM-style gate tanh(X * W + b) \otimes \sigma(X * V + c)), the gradient of GTU gradually vanishes because of the downscaling factors tanh'(X) and \sigma'(X); GLU's linear path carries no downscaling factor
    • this linear path can be thought of as a multiplicative skip connection that helps gradients flow through the layers (a sketch contrasting the two gates follows this list)
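
A minimal PyTorch sketch contrasting the two gates in a causal 1-d convolution; the module name `GatedConv1d` and all sizes are illustrative, and the paper's residual blocks are omitted:

```python
import torch
import torch.nn as nn

class GatedConv1d(nn.Module):
    """One gated convolutional layer; glu=True gives GLU, glu=False gives GTU."""
    def __init__(self, channels, kernel_size, glu=True):
        super().__init__()
        # A single convolution produces both the candidate values and the gates.
        # Padding by k-1 and trimming the right end keeps the convolution
        # causal (no position sees future tokens).
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size,
                              padding=kernel_size - 1)
        self.glu = glu

    def forward(self, x):                        # x: (batch, channels, time)
        y = self.conv(x)[..., :x.size(-1)]       # trim back to input length
        a, b = y.chunk(2, dim=1)                 # candidate values, gates
        if self.glu:
            return a * torch.sigmoid(b)          # GLU: linear path, no downscaling
        return torch.tanh(a) * torch.sigmoid(b)  # GTU: tanh' downscales gradients

x = torch.randn(8, 64, 100)                      # batch 8, 64 channels, 100 steps
print(GatedConv1d(64, 3)(x).shape)               # torch.Size([8, 64, 100])
```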

4. Experimental Setup

4.2. Training

  • uses gradient clipping during training, which works well in practice (a minimal sketch follows)
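
A hedged sketch of one training step with norm-based clipping via `torch.nn.utils.clip_grad_norm_`; the model, data, optimizer settings, and threshold are placeholders, not the paper's exact setup:

```python
# One training step with gradient norm clipping; all values are illustrative.
import torch
import torch.nn as nn

model = nn.Linear(512, 512)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0, momentum=0.99)

x = torch.randn(32, 512)
target = torch.randn(32, 512)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
# Rescale the full gradient vector if its norm exceeds the threshold, so
# occasional large gradients do not destabilize training.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
optimizer.step()
```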

4.3. Hyper-parameters

  • layers are initialized with Kaiming initialization (a minimal sketch follows)
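
A minimal sketch of applying Kaiming initialization with `torch.nn.init.kaiming_normal_`; the layer sizes are illustrative:

```python
# Kaiming initialization over the conv layers of a small illustrative stack.
import torch.nn as nn

model = nn.Sequential(nn.Conv1d(64, 128, 3), nn.Conv1d(128, 128, 3))

def init_weights(module):
    if isinstance(module, nn.Conv1d):
        nn.init.kaiming_normal_(module.weight)  # variance scaled by fan-in
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model.apply(init_weights)
```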

5. Results

5.3. Non-linear Modeling

TODO
