Deep Resource-aware OpenCL Inference Networks
- Deploy computation graphs (such as trained deep neural network models) to mobile or desktop platforms that support OpenCL.
- Automated, resource-aware graph scheduling and parametrization
- Runs on Linux and Windows (should also work on macOS, but this is untested)
- Supports Nvidia, AMD, Intel and mobile (Mali) GPUs (OpenCL 1.1 required)
If you have questions, feedback, suggestions or if you want to contribute, feel free to contact me!
1. Define a computation graph in Python, test it and store it in the deepRacin format
2. Load the stored model in a C application
3. Initialize OpenCL or use an existing context and buffers
4. In a processing loop:
   - Feed data
   - Run the graph
For Step 1 in Python:
import deepracin as dr
# Create empty graph
graph = dr.create_graph()
# Fill graph
# Feed node - Will be fed with data for each graph application
feed_node = dr.feed_node(graph, shape=(224, 224, 3))
# Conv2d node, given numpy arrays conv_weights and conv_biases
conv = dr.Conv2d(feed_node, shape, stride, activation='relu', weights=conv_weights, biases=conv_biases)
# MaxPooling node
pool = dr.Pooling(conv, 'max', shape, stride)  # pooling type passed positionally; Python does not allow positional arguments after a keyword argument
# FullyConnected node, given numpy arrays fc_weights and fc_biases
fc = dr.Fully_Connected(pool, shape, activation='relu', weights=fc_weights, biases=fc_biases)
# Mark output node
# Save deepracin graph
# Graph testing in python:
# Setup and schedule everything
for img_data in img_paths:
    # Feed data
    # Apply graph - returns one numpy array for each node marked as output
    fc_output = dr.apply(graph)
For Steps 2, 3 and 4 in C with a new OpenCL environment:
// Load Graph
net = dR_NewGraph();
// Mark Output Node
// Initialize OpenCL
// Setup and schedule everything
// Get OpenCL buffers for outputs
for(int i = 0; i<numImages;i++)
{
    // Feed data
    // Apply graph
    // Get output data
    dR_downloadArray(net,"", outbuffers[0],0,out_size*sizeof(cl_float),data_out);
}
or with an existing OpenCL context and buffers:
// Load Graph
net = dR_NewGraph();
// Use existing OpenCL context
dR_setClEnvironment(net, clContext, clPlatformId, clCommandQueue, clDeviceId);
// Setup and schedule everything
for(int i = 0; i<numImages;i++)
{
    // Apply graph
}
Dependencies of the C library:
- OpenCL 1.1
- Glib 2.0
Dependencies of the Python interface:
- Numpy
- For the C part of the examples, libpng is required to load test images.
- For building, CMake 2.8 (3.4 on Windows) is required.
On Linux:
- Install glib > 2.6, OpenCL, libpng and zlib
- Check out the deepRacin git repository
- Navigate to the checkout folder
- Create a build directory and navigate into it
mkdir build
cd build
- Run CMake, choosing ON or OFF for each option (without the brackets). Python and NumPy are required to install the Python interface, and libpng is required to build the examples
- Install the library
sudo make install
On Windows (overview only; detailed instructions are not available at the moment):
- Download and compile glib > 2.6, libpng and zlib with Visual Studio and install OpenCL
- Check out the deepRacin git repository
- Use CMake to configure
- Set all missing paths to OpenCL, glib, zlib and libpng
- Adjust Install Prefix
- Generate Project
- Build INSTALL Target of the generated Visual Studio Project
- DataFeedNode
- DNN Nodes
- Conv2d (direct, Winograd (2x2, 3x3) and specialized 1x1 implementations)
- Pooling (currently Max, Avg)
- FullyConnected
- Activation functions (currently ReLU, Linear)
- Softmax
- Math Operations
- Add (with tensor or scalar)
- Sub (with tensor or scalar)
- Mul (with tensor or scalar)
- Div (with tensor or scalar)
- Pow (with tensor or scalar)
- Log
- Sqrt
- Exp
- Fill
- Transforms
- Concat
- Slice
- Image
- Normalization (per image to given mean and stddev)
- CropOrPad
- Upscaling
- RGBtoGray
- MaskDependentFilter (applies one of k image filters to each pixel, depending on integer mask)
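To illustrate the Winograd variant of Conv2d listed above: assuming "winograd(2x2, 3x3)" refers to the Winograd minimal filtering scheme F(2x2, 3x3) (2x2 output tiles, 3x3 filters), the following pure-Python sketch shows its 1D building block F(2, 3). It computes two outputs of a 3-tap filter with 4 multiplications instead of 6. The function names are hypothetical, and this is not deepRacin's OpenCL kernel, only a sketch of the idea.

```python
def direct_conv_f23(d, g):
    """Reference: two outputs of a 3-tap filter g slid over 4 inputs d.

    Uses the DNN convention (cross-correlation, no kernel flip).
    Costs 6 multiplications.
    """
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2, 3): same two outputs with 4 multiplications."""
    # Filter transform: depends only on the weights, so it can be
    # precomputed once per filter.
    u = [
        g[0],
        (g[0] + g[1] + g[2]) / 2,
        (g[0] - g[1] + g[2]) / 2,
        g[2],
    ]
    # Input transform: additions and subtractions only.
    v = [
        d[0] - d[2],
        d[1] + d[2],
        d[2] - d[1],
        d[1] - d[3],
    ]
    # Element-wise products: the only 4 multiplications per tile.
    m = [u[i] * v[i] for i in range(4)]
    # Output transform: additions and subtractions only.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]
```

Nesting this transform over rows and columns gives the 2D case, which needs 16 multiplications per 2x2 output tile instead of 36 for the direct method.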
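As a rough illustration of two of the image nodes listed above, the following pure-Python sketch shows one plausible reading of Normalization and MaskDependentFilter. The function names, the use of the population standard deviation, the epsilon guard and the clamped border handling are all assumptions made for this example, not deepRacin's actual API or behavior.

```python
import math


def normalize_image(pixels, target_mean=0.0, target_std=1.0, eps=1e-8):
    """Shift and scale one image (flat list of floats) to a given mean/stddev.

    Assumption: the population standard deviation (divide by n) is used,
    and eps guards against division by zero for constant images.
    """
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    scale = target_std / max(std, eps)
    return [(p - mean) * scale + target_mean for p in pixels]


def mask_dependent_filter(image, mask, kernels):
    """Apply one of k filters to each pixel, selected by an integer mask.

    image:   2D list of floats (height x width)
    mask:    2D list of ints in [0, k), same size as image
    kernels: list of k square 2D kernels with odd side length
    Assumption: coordinates outside the image are clamped to the border.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            kernel = kernels[mask[y][x]]  # per-pixel filter choice
            r = len(kernel) // 2
            acc = 0.0
            for ky in range(-r, r + 1):
                for kx in range(-r, r + 1):
                    yy = min(max(y + ky, 0), h - 1)
                    xx = min(max(x + kx, 0), w - 1)
                    acc += kernel[ky + r][kx + r] * image[yy][xx]
            out[y][x] = acc
    return out
```

For example, passing a 1x1 identity kernel and a 3x3 box kernel with a mask of zeros and ones leaves the pixels marked 0 unchanged and blurs only the pixels marked 1.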