A simple convolutional network for image classification can be found in `CNN_custom_dataset.py`. To try it on your own dataset, you should prepare your images in the following format:

    images_folder
    |-- class_01
    |   |-- 001.png
    |   |-- ...
    |-- class_02
    |   |-- 001.png
    |   |-- ...
    |-- ...

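
To make the expected layout concrete, here is a minimal sketch of how such a folder could be turned into (image path, class index) pairs. The helper name `list_samples` is hypothetical and is not part of the repository; the sketch also assumes that every class folder contains `.png` files and that class indices follow the alphabetical order of the folder names.

```python
from pathlib import Path


def list_samples(root):
    """Return (image_path, class_index) pairs and the sorted class names.

    Hypothetical helper: each sub-folder of `root` is treated as one class.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for idx, name in enumerate(class_names):
        # Only .png files are collected here; other extensions are ignored.
        for img in sorted((root / name).glob("*.png")):
            samples.append((img, idx))
    return samples, class_names
```

Calling `list_samples("images_folder")` on the layout above would yield one entry per image, labelled by the index of its class folder.
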
Its required argument is
- `--dataset`: path to the dataset,

while the optional arguments are
- `--epochs`: number of epochs,
- `--batch_size`: size of the training batch,
- `--lr`: learning rate.
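
As an illustration of this command-line interface, the flags could be declared with `argparse` roughly as follows. This is a sketch, not the script's actual code: the function name `build_parser` and all default values are assumptions, since the defaults are not stated here.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Train a CNN on a custom dataset")
    parser.add_argument("--dataset", required=True, help="path to the dataset")
    # The default values below are placeholders, not the script's real defaults.
    parser.add_argument("--epochs", type=int, default=10, help="number of epochs")
    parser.add_argument("--batch_size", type=int, default=32, help="size of the training batch")
    parser.add_argument("--lr", type=float, default=0.01, help="learning rate")
    return parser


# Equivalent to passing: --dataset images_folder --epochs 5
args = build_parser().parse_args(["--dataset", "images_folder", "--epochs", "5"])
print(args.dataset, args.epochs, args.batch_size, args.lr)
```
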
To define a neural network, the `nn.net.Net` object can be used. Its parameters are
- `layers`: a list of layers from `nn.layers`, for example `[Linear(2, 4), ReLU(), Linear(4, 2)]`,
- `loss`: a loss function from `nn.losses`, for example `CrossEntropyLoss` or `MeanSquareLoss`.

If you would like to train the model with data `X` and labels `y`, you should
- perform the forward pass, during which local gradients are calculated,
- calculate the loss,
- perform the backward pass, where global gradients with respect to the variables and layer parameters are calculated,
- update the weights.
In code, this looks like the following:

    out = net(X)
    loss = net.loss(out, y)
    net.backward()
    net.update_weights(lr)

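
To show what these four steps do numerically, the following standalone NumPy sketch trains a single linear map with a mean squared error. It deliberately does not use `nn.net.Net`; the variable names, the toy data, and the learning rate are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = X @ np.array([[1.0], [-2.0]]) + 0.5  # targets generated by a known linear map

W = np.zeros((2, 1))
b = np.zeros(1)
lr = 0.1
losses = []

for _ in range(200):
    out = X @ W + b                      # 1. forward pass
    loss = np.mean((out - y) ** 2)       # 2. loss
    d_out = 2.0 * (out - y) / out.size   # 3. backward pass: dLoss/d_out ...
    dW = X.T @ d_out                     #    ... propagated to the weights
    db = d_out.sum(axis=0)               #    ... and to the bias
    W -= lr * dW                         # 4. weight update
    b -= lr * db
    losses.append(loss)

print(losses[0], losses[-1])
```

The loss should shrink over the iterations while `W` and `b` approach the values used to generate `y`.
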
The currently implemented layers can be found in `nn.layers`. Each layer is a callable object, where calling performs the forward pass and calculates local gradients. The most important methods are:
- `.forward(X)`: performs the forward pass for `X`. Instead of calling `forward` directly, the layer object itself should be called, which also calculates and caches the local gradients.
- `.backward(dY)`: performs the backward pass, where `dY` is the gradient propagated backwards from the subsequent layer.
- `.local_grad(X)`: calculates the local gradient of the input.
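
The interplay of these three methods can be sketched as follows. The classes `SketchLayer` and `SketchReLU` are hypothetical stand-ins written for this example; they are not the classes from `nn.layers`, and the attribute name `grad` used for the cache is an assumption.

```python
import numpy as np


class SketchLayer:
    """Hypothetical base class; not the repository's actual implementation."""

    def __init__(self):
        self.grad = None

    def __call__(self, X):
        # Calling the layer caches the local gradient, then runs the forward pass.
        self.grad = self.local_grad(X)
        return self.forward(X)

    def forward(self, X):
        raise NotImplementedError

    def local_grad(self, X):
        raise NotImplementedError

    def backward(self, dY):
        raise NotImplementedError


class SketchReLU(SketchLayer):
    def forward(self, X):
        return np.maximum(X, 0)

    def local_grad(self, X):
        # Derivative of ReLU: 1 where the input is positive, 0 elsewhere.
        return (X > 0).astype(X.dtype)

    def backward(self, dY):
        # Chain rule: upstream gradient times the cached local gradient.
        return dY * self.grad


relu = SketchReLU()
out = relu(np.array([[-1.0, 2.0]]))
dX = relu.backward(np.ones((1, 2)))
print(out, dX)
```

Calling `forward` directly in this sketch would skip the caching step, which is why `backward` relies on the layer having been called as an object first.
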
The input to the layers should always be a `numpy.ndarray` of shape `(n_batch, ...)`. For the 2D layers for images, the input should have shape `(n_batch, n_channels, n_height, n_width)`.
A simple fully connected layer. Parameters:
- `in_dim`: integer, dimensions of the input.
- `out_dim`: integer, dimensions of the output.

Usage:
- input: `numpy.ndarray` of shape `(N, in_dim)`.
- output: `numpy.ndarray` of shape `(N, out_dim)`.
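
The shape contract of a fully connected layer can be checked with plain NumPy. The sketch below assumes the usual affine map `Y = X W + b` with a weight matrix of shape `(in_dim, out_dim)`; how the repository stores its weights is not stated here, so this layout is an assumption.

```python
import numpy as np

N, in_dim, out_dim = 8, 2, 4
rng = np.random.default_rng(0)

X = rng.normal(size=(N, in_dim))
W = rng.normal(size=(in_dim, out_dim))  # assumed weight layout
b = np.zeros(out_dim)

Y = X @ W + b  # affine map applied to every row of the batch
print(Y.shape)  # (8, 4)
```
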
2D convolutional layer. Parameters:
- `in_channels`: integer, number of channels in the input image.
- `out_channels`: integer, number of filters to be learned.
- `kernel_size`: integer or tuple, the size of the filter to be learned. Defaults to 3.
- `stride`: integer, stride of the convolution. Defaults to 1.
- `padding`: integer, number of zeros to be added to each edge of the images. Defaults to 0.

Usage:
- input: `numpy.ndarray` of shape `(N, C_in, H_in, W_in)`.
- output: `numpy.ndarray` of shape `(N, C_out, H_out, W_out)`.
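
The values of `H_out` and `W_out` are not spelled out above. Assuming the layer follows the standard convolution arithmetic, they can be estimated with the helper below; `conv2d_output_size` is a hypothetical function written for this example, not part of `nn.layers`.

```python
def conv2d_output_size(h_in, w_in, kernel_size=3, stride=1, padding=0):
    """Spatial output size of a 2D convolution under the standard formula."""
    if isinstance(kernel_size, int):
        kh, kw = kernel_size, kernel_size
    else:
        kh, kw = kernel_size
    # Padding is added on both sides, hence the factor of 2.
    h_out = (h_in + 2 * padding - kh) // stride + 1
    w_out = (w_in + 2 * padding - kw) // stride + 1
    return h_out, w_out


print(conv2d_output_size(32, 32))                       # (30, 30)
print(conv2d_output_size(32, 32, padding=1))            # (32, 32)
print(conv2d_output_size(32, 32, stride=2, padding=1))  # (16, 16)
```
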
2D max pooling layer. Parameters:
- `kernel_size`: integer or tuple, size of the pooling window. Defaults to 2.

Usage:
- input: `numpy.ndarray` of shape `(N, C, H, W)`.
- output: `numpy.ndarray` of shape `(N, C, H//KH, W//KW)` with kernel size `(KH, KW)`.
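
The output shape `(N, C, H//KH, W//KW)` corresponds to non-overlapping pooling windows. A minimal NumPy sketch of that operation is given below; the function `max_pool2d` is hypothetical and only illustrates the forward pass, under the assumption that rows and columns not filling a whole window are dropped.

```python
import numpy as np


def max_pool2d(X, kernel_size=2):
    """Non-overlapping 2D max pooling over an array of shape (N, C, H, W)."""
    if isinstance(kernel_size, int):
        kh, kw = kernel_size, kernel_size
    else:
        kh, kw = kernel_size
    N, C, H, W = X.shape
    h_out, w_out = H // kh, W // kw
    # Drop rows/columns that do not fill a complete window (assumption).
    X = X[:, :, :h_out * kh, :w_out * kw]
    # Split each spatial axis into (number of windows, window size).
    windows = X.reshape(N, C, h_out, kh, w_out, kw)
    return windows.max(axis=(3, 5))


X = np.arange(16, dtype=float).reshape(1, 1, 4, 4)
pooled = max_pool2d(X)
print(pooled.shape)  # (1, 1, 2, 2)
```
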
2D batch normalization layer. Parameters:
- `n_channels`: integer, number of channels.
- `epsilon`: epsilon parameter for BatchNorm, defaults to 1e-5.

Usage:
- input: `numpy.ndarray` of shape `(N, C, H, W)`.
- output: `numpy.ndarray` of shape `(N, C, H, W)`.
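
For reference, the training-mode forward pass of 2D batch normalization can be sketched in NumPy as below. The function `batch_norm2d` and the parameter names `gamma` and `beta` (the learned scale and shift) are assumptions for this example; running statistics for inference are omitted.

```python
import numpy as np


def batch_norm2d(X, gamma, beta, epsilon=1e-5):
    """Normalize each channel over the batch and spatial axes, then scale and shift."""
    mean = X.mean(axis=(0, 2, 3), keepdims=True)
    var = X.var(axis=(0, 2, 3), keepdims=True)
    X_hat = (X - mean) / np.sqrt(var + epsilon)
    # gamma and beta hold one value per channel.
    return gamma.reshape(1, -1, 1, 1) * X_hat + beta.reshape(1, -1, 1, 1)


rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(4, 3, 5, 5))
Y = batch_norm2d(X, gamma=np.ones(3), beta=np.zeros(3))
print(Y.shape)  # (4, 3, 5, 5)
```

With `gamma` set to ones and `beta` to zeros, every channel of the output has approximately zero mean and unit variance.
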
A simple layer which flattens the outputs of a 2D layer for images.

Usage:
- input: `numpy.ndarray` of shape `(N, C, H, W)`.
- output: `numpy.ndarray` of shape `(N, C*H*W)`.
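
In NumPy terms, this flattening amounts to a single reshape that keeps the batch axis and merges the rest. The snippet is a standalone illustration, not the layer's source code.

```python
import numpy as np

X = np.zeros((8, 3, 4, 4))        # (N, C, H, W)
flat = X.reshape(X.shape[0], -1)  # keep the batch axis, merge C, H and W
print(flat.shape)                 # (8, 48)
```
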
The implemented loss functions are located in `nn.losses`. Like layers, they are callable objects, taking predictions and targets as input.
Cross-entropy loss. Usage:
- input: `numpy.ndarray` of shape `(N, D)` containing the class scores for each element in the batch.
- output: `float`.
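
A common way to compute a cross-entropy loss from raw class scores is a softmax followed by the mean negative log-likelihood. The sketch below assumes that the targets are integer class indices of shape `(N,)`, which is not stated above; the function name `cross_entropy` is likewise hypothetical.

```python
import numpy as np


def cross_entropy(scores, targets):
    """Mean negative log-likelihood of the target classes under a softmax."""
    # Subtracting the row maximum keeps the exponentials numerically stable.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(targets)), targets]
    return float(-picked.mean())


scores = np.zeros((4, 3))            # uniform scores over 3 classes
targets = np.array([0, 1, 2, 0])     # assumed format: integer class indices
loss = cross_entropy(scores, targets)
print(loss)
```

For uniform scores over three classes, the result equals `log(3)`, since every class receives probability 1/3.
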
Mean square loss. Usage:
- input: `numpy.ndarray` of shape `(N, D)`.
- output: `numpy.ndarray` of shape `(N, D)`.
The activation layers for the network can be found in `nn.activations`. They are functions that apply the specified activation function elementwise to a `numpy.ndarray`. Currently, the following activation functions are implemented:
- ReLU
- Leaky ReLU
- Sigmoid
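
For reference, the three activations can be written in NumPy as shown below. These are standalone definitions rather than the code from `nn.activations`; in particular, the negative slope of 0.01 for Leaky ReLU is an assumed value.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def leaky_relu(x, slope=0.01):
    # slope is the gradient used for negative inputs (assumed value).
    return np.where(x > 0, x, slope * x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


x = np.array([-2.0, 0.0, 3.0])
print(relu(x))
print(leaky_relu(x))
print(sigmoid(x))
```
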