Start writing documentation
beam2d committed May 11, 2015
1 parent b249fb1 commit 4a99cbb
Showing 15 changed files with 1,294 additions and 37 deletions.
269 changes: 239 additions & 30 deletions chainer/function.py
@@ -5,21 +5,87 @@
from variable import Variable

class Function(object):
"""Function node.
Each function implementation inherits this class. The implementation should
provides {forward,backward}_{cpu,gpu} methods. By default, backward
propagates None, which indicates no error propagates back from this node.
Application of __call__ method inserts a copy of the Function instance to
the computational graph. Note: the copy is shallow, so be careful to
implement a function that shares some reference values between multiple
applications.
If the implementation holds parameters and corresponding gradients, these
should be kept as attributes and ``parameter_names`` and ``gradient_names``
attributes should indicate their names. Then, appropriate accessors are
defined. Otherwise, the implementation itself must provide these accessors.
"""Function of variable(s) to variable(s) that leaves footprint to the
output variables on application.
All function implementations defined in :mod:`functions` inherit this class.
The main feature of this class is keeping track of function applications as
a backward graph. When a function is applied to :class:`Variable` objects,
the function is copied, and its :meth:`forward` method is called on
:data:`~Variable.data` fields of input variables, and at the same time it chains
references from output variables to the function and from the function to
its inputs.
.. note::
Strictly speaking, when a function is applied to some variable, a special
:class:`Function` object called *splitter* is inserted between the
variable and the function. The splitter is used to manage multiple
function applications on the same variable, where gradients from
different backward paths are accumulated at the variable.
.. note::
:meth:`__call__` copies the function instance before the forward
computation and chaining. This enables us to reuse one function object
for multiple function applications, where the different calls must use
different references to the function object. Note that the copy is
shallow, so implementations of :class:`Function` must take care of any
member attributes shared across forward and backward computations.
.. admonition:: Example
Let ``x`` be an instance of :class:`Variable` and ``f`` an instance of
:class:`Function` that takes only one argument. Then the line
>>> y = f(x)
computes a new variable ``y`` and creates backward references. Actually,
the backward references are set up as in the following diagram::
x <--- (splitter) <--- x' <--- f' <--- y
where the prime "'" indicates a copy of the original object. If another
application of the function occurs as
>>> z = f(x)
then the splitter acts as a branching point, as in the following new diagram::
|--- x' <--- f' <--- y
x <--- (splitter) <-+
|--- x'' <--- f'' <--- z
Note that the splitter is inserted implicitly, and users do not need to
take any special care of it; just remember that such branching is
correctly managed by Chainer.
Every function implementation should provide :meth:`forward_cpu`,
:meth:`forward_gpu`, :meth:`backward_cpu` and :meth:`backward_gpu`.
Alternatively, one can provide :meth:`forward` and :meth:`backward` instead
of the separate methods. The backward methods have default implementations
that just return ``None``, which indicates that the function is
non-differentiable.
Function implementations are classified into two types: parameterized ones
and non-parameterized ones. A parameterized function holds parameter arrays
and corresponding gradient arrays. The implementation can keep these arrays
in any way, but it is recommended to keep them as attributes so that they
can easily migrate between CPU and GPU. A parameterized function must
provide accessors to these arrays, called :meth:`parameters` and
:meth:`gradients`.
Attributes:
inputs: A tuple or list of input variables.
outputs: A tuple or list of output variables.
parameter_names: A tuple or list of names of parameter attributes.
It is an empty tuple by default. This attribute is used by the
default implementation of the :meth:`parameters` property to gather
the collection of parameter arrays. Implementations of parameterized
functions should override this field as an attribute or a property,
or otherwise override the :meth:`parameters` property.
gradient_names: A tuple or list of names of gradient attributes. The
details are the same as for :data:`parameter_names`.
"""
parameter_names = ()
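The following is a minimal sketch of a parameterized function; it is not part of this commit. The class ``Scale`` and its attribute names are hypothetical, and it assumes ``Function`` is importable as ``chainer.Function``. It illustrates :data:`parameter_names`/:data:`gradient_names`, the CPU forward/backward pair, and the rule that both methods return tuples::

import numpy

from chainer import Function

class Scale(Function):
    """Hypothetical example: y = a * x with a single learnable scalar a."""

    parameter_names = ('a',)
    gradient_names = ('ga',)

    def __init__(self, initial_a=1.0):
        Function.__init__(self)
        self.a = numpy.array(initial_a, dtype=numpy.float32)
        self.ga = numpy.zeros_like(self.a)

    def forward_cpu(self, inputs):
        x, = inputs
        return self.a * x,             # trailing comma: always return a tuple

    def backward_cpu(self, inputs, grad_outputs):
        x, = inputs
        gy, = grad_outputs
        self.ga += (x * gy).sum()      # accumulate the parameter gradient
        return self.a * gy,            # gradient w.r.t. the input, as a tuple

The GPU counterparts (:meth:`forward_gpu`/:meth:`backward_gpu`) would be analogous, operating on :class:`pycuda.gpuarray.GPUArray` objects; how gradients are zeroed between iterations is left to the surrounding training loop.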
@@ -31,8 +97,29 @@ def __init__(self):
self.rank = None

def __call__(self, *inputs):
"""Execute function and chainer the input/output variables."""
"""Applies forward propagation on input variables with chaining backward
reference.
Basic behavior is also expressed in documentation of :class:`Function`
class. This function first copies itself to avoid conflict over multiple
invokations.
.. note::
If the :data:`~Variable.data` attributes of the input variables reside
on a GPU device, the appropriate device is selected before the
:meth:`forward` method is called, so in most cases the implementor does
not need to take care of device selection.
Args:
*inputs: Tuple of input :class:`Variable` objects. All input
variables must have the same volatile flag.
Returns:
One :class:`Variable` object or a tuple of multiple
:class:`Variable` objects.
"""
# First copy itself to avoid duplication within the graph.
self = copy.copy(self)
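As a hedged usage sketch, reusing the hypothetical ``Scale`` function from the earlier example and assuming ``Variable`` is importable as ``chainer.Variable``, the copy-on-call behavior means one function object can safely be applied several times::

import numpy

from chainer import Variable

f = Scale(2.0)
x = Variable(numpy.array([1.0, 2.0, 3.0], dtype=numpy.float32))
y1 = f(x)   # forward_cpu is chosen because x.data is a numpy.ndarray
y2 = f(x)   # safe: each call works on a shallow copy of f, and the implicit
            # splitter accumulates gradients from both paths at x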

@@ -81,10 +168,23 @@ def __call__(self, *inputs):
return ret

def forward(self, inputs):
"""Forward function.
"""Applies forward propagation to input arrays.
It delegates the procedure to :meth:`forward_cpu` or :meth:`forward_gpu`
by default. Which it selects is determined by the type of input arrays.
Implementation of :class:`Function` must implement either cpu/gpu
methods or this method.
Args:
inputs: Tuple of input array(s).
It delegates the procedure to forward_{cpu,gpu} by default. User must
either implement cpu/gpu methods or override this method.
Returns:
Tuple of output array(s).
.. warning::
Implementation of :class:`Function` must take care of that the
return value must be a tuple even if it returns only one array.
"""
if any(isinstance(x, cuda.GPUArray) for x in inputs):
@@ -93,30 +193,116 @@ def forward(self, inputs):
return self.forward_cpu(inputs)

def forward_cpu(self, inputs):
"""Forward function on CPU implemented by child class."""
"""Applies forward propagation to input arrays on CPU.
Args:
inputs: Tuple of :class:`~numpy.ndarray` object(s).
Returns:
Tuple of :class:`~numpy.ndarray` object(s).
.. warning::
Implementations of :class:`Function` must take care that the
return value is a tuple even if it returns only one array.
"""
raise NotImplementedError()

def forward_gpu(self, inputs):
"""Forward function on GPU implemented by child class."""
"""Applies forward propagation to input arrays on GPU.
Args:
inputs: Tuple of :class:`~pycuda.gpuarray.GPUArray` object(s).
Returns:
Tuple of :class:`~pycuda.gpuarray.GPUArray` object(s).
.. warning::
Implementations of :class:`Function` must take care that the
return value is a tuple even if it returns only one array.
"""
raise NotImplementedError()

def backward(self, inputs, grad_outputs):
"""Applies backprop to output gradient arrays.
By default, it delegates the procedure to :meth:`backward_cpu` or
:meth:`backward_gpu`, depending on the types of the input arrays and
the output gradient arrays. Implementations of :class:`Function` must
provide either the CPU/GPU methods or this method if the function is
intended to be backpropagated through.
Args:
inputs: Tuple of input arrays.
grad_outputs: Tuple of output gradient arrays.
Returns:
Tuple of input gradient arrays. Some or all of them can be ``None``
if the function is not differentiable with respect to the
corresponding inputs.
.. warning::
Implementations of :class:`Function` must take care that the
return value is a tuple even if it returns only one array.
"""
if any(isinstance(x, cuda.GPUArray) for x in inputs + grad_outputs):
return self.backward_gpu(inputs, grad_outputs)
else:
return self.backward_cpu(inputs, grad_outputs)

def backward_cpu(self, inputs, grad_outputs):
"""Default implementation of backward on CPU, which does nothing."""
"""Applies backprop to output gradient arrays on CPU.
Args:
inputs: Tuple of input :class:`~numpy.ndarray` object(s).
grad_outputs: Tuple of output gradient :class:`~numpy.ndarray`
object(s).
Returns:
Tuple of input gradient :class:`~numpy.ndarray` object(s). Some or
all of them can be ``None`` if the function is not differentiable
with respect to the corresponding inputs.
.. warning::
Implementations of :class:`Function` must take care that the
return value is a tuple even if it returns only one array.
"""
return tuple(None for _ in inputs)

def backward_gpu(self, inputs, grad_outputs):
"""Default implementation of backward on GPU, which does nothing."""
"""Applies backprop to output gradient arrays on GPU.
Args:
inputs: Tuple of input :class:`~pycuda.gpuarray.GPUArray` object(s).
grad_outputs: Tuple of output gradient
:class:`~pycuda.gpuarray.GPUArray` object(s).
Returns:
Tuple of input gradient :class:`~pycuda.gpuarray.GPUArray`
object(s). Some or all of them can be ``None`` if the function is
not differentiable with respect to the corresponding inputs.
.. warning::
Implementations of :class:`Function` must take care that the
return value is a tuple even if it returns only one array.
"""
return tuple(None for _ in inputs)
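To make these conventions concrete, here is a hedged sketch of a non-parameterized function with a non-differentiable input; it is not part of this commit, ``SelectRows`` is hypothetical, and it again assumes ``chainer.Function`` is importable. Its second input is an integer row index array, so backward returns ``None`` for it::

import numpy

from chainer import Function

class SelectRows(Function):
    """Hypothetical example: y = x[idx], where idx is an integer index array."""

    def forward_cpu(self, inputs):
        x, idx = inputs
        return x[idx],                   # one-element tuple

    def backward_cpu(self, inputs, grad_outputs):
        x, idx = inputs
        gy, = grad_outputs
        gx = numpy.zeros_like(x)
        numpy.add.at(gx, idx, gy)        # scatter-add the output gradient back
        return gx, None                  # None: not differentiable w.r.t. idx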

def unchain(self):
"""Purge in/out variables and remove this node from the graph."""
"""Purges in/out variables and removes this function from the backward
graph.
This method is called from the :meth:`Variable.unchain_backward` method.
"""
for y in self.outputs:
y_ref = y()
if y_ref is not None:
@@ -126,10 +312,18 @@ def unchain(self):
self.inputs = None
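A short usage sketch: :meth:`unchain` is not usually called directly; :meth:`Variable.unchain_backward` walks the graph behind a variable and calls it on every function it purges, e.g. to truncate history for truncated backprop through time::

# `loss` is assumed to be a Variable produced by some chain of function
# applications; everything behind it is detached from the graph.
loss.unchain_backward()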

def to_gpu(self, device=None):
"""Migrate to GPU and return self.
"""Migrates the function to GPU and returns self.
The default implementation moves all fields of type ``numpy.ndarray``
onto GPU.
The default implementation moves all fields of type
:class:`~numpy.ndarray` onto GPU.
Args:
device (int or :class:`pycuda.driver.Device` or ``None``): Device ID
of the GPU to which the function will be migrated. If this is
``None``, the current device is used.
Returns:
self.
"""
with cuda.using_device(device):
@@ -141,10 +335,13 @@ def to_cpu(self):
return self

def to_cpu(self):
"""Migrate to CPU and return self.
"""Migrates the function to CPU and returns self.
The default implementation moves all fields of type
:class:`pycuda.gpuarray.GPUArray` onto CPU.
The default implementation moves all fields of type ``cuda.GPUArray``
onto CPU.
Returns:
self.
"""
for k, v in self.__dict__.iteritems():
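A hedged migration sketch, reusing the hypothetical ``Scale`` function from the earlier example; its parameter ``a`` and gradient ``ga`` are plain :class:`~numpy.ndarray` fields, so the default implementations move them::

f = Scale(2.0)
f.to_gpu(0)    # a and ga become pycuda.gpuarray.GPUArray objects on device 0
# ... run forward/backward on GPU inputs ...
f.to_cpu()     # a and ga come back as numpy.ndarray objects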
@@ -154,6 +351,12 @@

@property
def parameters(self):
"""A tuple of parameter arrays.
The default implementation collects parameter arrays based on the
:data:`parameter_names` attribute.
"""
return tuple(getattr(self, name) for name in self.parameter_names)

@parameters.setter
@@ -163,6 +366,12 @@ def parameters(self, values):

@property
def gradients(self):
"""A tuple of gradient arrays.
The default implementation collects gradient arrays based on the
:data:`gradient_names` attribute.
"""
return tuple(getattr(self, name) for name in self.gradient_names)

@gradients.setter