Paper

  • Title: Resnet in Resnet: Generalizing Residual Architectures
  • Authors: Sasha Targ, Diogo Almeida, Kevin Lyman
  • Link: http://arxiv.org/abs/1603.08029
  • Tags: Neural Network, residual
  • Year: 2016

Summary

  • What

    • They describe an architecture that merges classical convolutional networks and residual networks.
    • The architecture can (theoretically) learn anything that a classical convolutional network or a residual network can learn, as it contains both of them.
    • The architecture can (theoretically) learn how many convolutional layers it should use per residual block (up to the total number of convolutional layers in the whole network).
  • How

    • Just like residual networks, they have "blocks". Each block contains convolutional layers.
    • Each block contains residual units and non-residual ("transient") units.
    • They have two "streams" of data in their network (simply the feature maps produced by each block):
      • Residual stream: The residual units write to this stream (i.e. it is their output).
      • Transient stream: The non-residual units write to this stream.
    • Both kinds of units receive both streams as input, but each writes only to its own stream as output.
    • Their architecture visualized: (figure: "Architecture")
    • Because of this architecture, their model can learn the number of layers per residual block (though BN and ReLU might cause problems here?): (figure: "Learning layer count")
    • The easiest way to implement this should be along the lines of the following (some of the visualized convolutions can be merged; a minimal code sketch is given at the end of this summary):
      • Input of size CxHxW (both streams concatenated, each stream has C/2 planes)
        • Concat
          • Residual unit: Apply C/2 convolutions to the C input planes, with shortcut addition afterwards.
          • Transient unit: Apply C/2 convolutions to the C input planes.
        • Apply BN
        • Apply ReLU
      • Output of size CxHxW.
    • The whole operation can also be implemented with just a single convolutional layer, but then one has to make sure that some weights stay at zero.
  • Results

    • They test on CIFAR-10 and CIFAR-100.
    • They search for optimal hyperparameters (learning rate, optimizer, L2 penalty, initialization method, type of shortcut connection in residual blocks) using a grid search.
    • Their model improves upon a wide ResNet and an equivalent non-residual CNN by a good margin (CIFAR-10: 0.5-1%, CIFAR-100: 1-2%).
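
A minimal code sketch of the generalized residual unit described in the "How" section above, assuming PyTorch (the paper does not prescribe a framework; the class and variable names here are made up for illustration): two streams of C/2 planes each are concatenated, each convolution sees all C input planes but writes only C/2 output planes, the residual half receives a shortcut addition, and BN + ReLU are applied to the concatenated result.

```python
# Sketch only: a generalized residual ("ResNet in ResNet") unit with a residual
# and a transient stream, following the recipe summarized above. The 3x3 kernel
# size and all names are assumptions for illustration, not taken from the paper.
import torch
import torch.nn as nn


class GeneralizedResidualUnit(nn.Module):
    """Operates on two streams of C/2 channels each (C channels in total)."""

    def __init__(self, channels: int):
        super().__init__()
        assert channels % 2 == 0, "channels must split evenly into two streams"
        half = channels // 2
        # Both convolutions read the full concatenated input (C planes),
        # but each writes only to its own stream (C/2 planes).
        self.conv_residual = nn.Conv2d(channels, half, kernel_size=3, padding=1)
        self.conv_transient = nn.Conv2d(channels, half, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, residual: torch.Tensor, transient: torch.Tensor):
        x = torch.cat([residual, transient], dim=1)  # C x H x W
        r = self.conv_residual(x) + residual         # shortcut addition (residual stream only)
        t = self.conv_transient(x)                   # no shortcut (transient stream)
        out = self.relu(self.bn(torch.cat([r, t], dim=1)))
        half = r.size(1)
        return out[:, :half], out[:, half:]          # split back into the two streams


# Example usage: two streams of 16 planes each (C = 32) on 32x32 feature maps.
if __name__ == "__main__":
    unit = GeneralizedResidualUnit(channels=32)
    r0 = torch.randn(8, 16, 32, 32)
    t0 = torch.randn(8, 16, 32, 32)
    r1, t1 = unit(r0, t0)
    print(r1.shape, t1.shape)  # both: torch.Size([8, 16, 32, 32])
```

Since both convolutions read the same concatenated input and their outputs are simply concatenated, they could also be merged into a single Conv2d mapping C to C channels, in the spirit of the single-convolution implementation mentioned above.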