Deep Resource-aware OpenCL Inference Networks
- Deploy computation graphs (such as trained deep neural network models) to mobile or desktop platforms that support OpenCL.
- Automated, resource-aware graph scheduling and parametrization
- Runs on Linux and Windows (macOS should also work, but is untested)
- Supports Nvidia, AMD, Intel and mobile (Mali) GPUs; OpenCL 1.1 is required
If you have questions, feedback, suggestions or if you want to contribute, feel free to contact me!
1. Define a computation graph in Python, test it and store it in the deepRacin format
2. Load the stored model in a C application
3. Initialize OpenCL or use an existing context and buffers
4. In a processing loop:
   - Feed data
   - Run the graph
For Step 1 in Python:
import deepracin as dr
# Create empty graph
graph = dr.create_graph()
# Fill graph
# Feed node - Will be fed with data for each graph application
feed_node = dr.feed_node(graph, shape=(224, 224, 3))
# Conv2d node, given numpy arrays conv_weights and conv_biases
conv = dr.Conv2d(feed_node, shape, stride, activation='relu', weights=conv_weights, biases=conv_biases)
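# MaxPooling node (the keyword names shape= and stride= are assumed here,
# because Python does not allow positional arguments after a keyword argument)
pool = dr.Pooling(conv, pooling_type='max', shape=shape, stride=stride)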
# FullyConnected node, given numpy arrays fc_weights and fc_biases
fc = dr.Fully_Connected(pool, shape, activation='relu', weights=fc_weights, biases=fc_biases)
# Mark output node
dr.mark_as_output(fc)
# Save deepracin graph
dr.save_graph(graph,model_path)
# Graph testing in python:
# Setup and schedule everything
dr.prepare(graph)
# images: iterable of numpy arrays shaped like the feed node (name assumed)
for img_data in images:
    # Feed data
    dr.feed_data(feed_node, img_data)
    # Apply graph - returns one numpy array for each node marked as output
    fc_output = dr.apply(graph)
For Steps 2, 3 and 4 in C with a new OpenCL environment:
// Load Graph
net = dR_NewGraph();
dR_loadGraph(net,model_path,&nodeslist,&numnodes,&feedlist,&numfeeds);
// Mark Output Node
dR_setAsOutput(net,nodeslist[numnodes-1]);
// Initialize OpenCL
dR_initCL(net);
// Setup and schedule everything
dR_prepare(net);
// Get OpenCL buffers for outputs
dR_getOutputBuffers(net,outbuffers);
for (int i = 0; i < numImages; i++)
{
    // Feed data
    dR_feedData(net, feedlist[0], (cl_float*)data[i], 0, buffersize*sizeof(cl_float));
    // Apply graph
    dR_apply(net);
    // Get output data
    dR_downloadArray(net, "", outbuffers[0], 0, out_size*sizeof(cl_float), data_out);
}
or with an existing OpenCL context and buffers:
// Load Graph
net = dR_NewGraph();
dR_loadGraph(net,model_path,&nodeslist,&numnodes,&feedlist,&numfeeds);
// Use existing OpenCL context
dR_setClEnvironment(net, clContext, clPlatformId, clCommandQueue, clDeviceId);
dR_setDataFeedNodeBuffer(net,feedlist[0],existingCLMemPointer1);
dR_setPreexistingOutputBuffer(net,nodeslist[numnodes-1],existingCLMemPointer2);
// Setup and schedule everything
dR_prepare(net);
for (int i = 0; i < numImages; i++)
{
    ...
    // Apply graph
    dR_apply(net);
    ...
}
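In the first C example, the last argument of dR_feedData is a byte count, buffersize*sizeof(cl_float). The source does not define buffersize; the sketch below assumes it is the number of float elements in the feed node, using the (224, 224, 3) shape from the Python example. The helper name feed_buffer_bytes is hypothetical and is not part of deepRacin.

```python
from math import prod

# cl_float is a 32-bit IEEE 754 float in OpenCL, so sizeof(cl_float) == 4.
CL_FLOAT_BYTES = 4


def feed_buffer_bytes(shape):
    """Return (element_count, byte_count) for a float buffer of this shape.

    Assumption: one cl_float per tensor element, with no padding.
    """
    elements = prod(shape)
    return elements, elements * CL_FLOAT_BYTES


# Shape taken from the feed node in the Python example.
elements, nbytes = feed_buffer_bytes((224, 224, 3))
print(elements, nbytes)  # prints "150528 602112"
```

Under this assumption, a 224x224 RGB input needs 150528 floats, or 602112 bytes, per call.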
Dependencies of the C library:
- OpenCL 1.1
- Glib 2.0
Dependencies of the Python interface:
- Numpy
Misc:
- For the C part of the examples, libpng is required to load test images.
- For building, CMake 2.8 (3.4 on Windows) is required.
On Linux:
- Install glib > 2.6, OpenCL, libpng and zlib
- Check out the deepRacin git repository
- Navigate to the checkout folder
- Create a build directory and change into it
mkdir build && cd build
- Run CMake, choosing ON or OFF for each option (without the angle brackets). Python and Numpy are required to install the Python interface, and libpng is required to build the examples
cmake .. -DINSTALL_PYTHON_INTERFACE=<ON|OFF> -DCOMPILE_EXAMPLES=<ON|OFF>
- Install the library
sudo make install
On Windows (overview only; detailed instructions are not available at the moment):
- Download and compile glib > 2.6, libpng and zlib with Visual Studio, and install OpenCL
- Check out the deepRacin git repository
- Use CMake to configure
  - Set all missing paths to OpenCL, glib, zlib and libpng
  - Adjust the install prefix
  - Generate the project
- Build the INSTALL target of the generated Visual Studio project
- DataFeedNode
- DNN Nodes
  - Conv2d (direct, winograd(2x2, 3x3) and specialized 1x1 implementations)
  - Pooling (currently Max, Avg)
  - FullyConnected
  - Activation functions (currently ReLU, Linear)
  - Softmax
- Math Operations
  - Add (with tensor or scalar)
  - Sub (with tensor or scalar)
  - Mul (with tensor or scalar)
  - Div (with tensor or scalar)
  - Pow (with tensor or scalar)
  - Log
  - Sqrt
  - Exp
  - Fill
- Transforms
  - Concat
  - Slice
- Image
  - Normalization (per image to given mean and stddev)
  - CropOrPad
  - Upscaling
  - RGBtoGray
  - MaskDependentFilter (applies one of k image filters to each pixel, depending on integer mask)
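The Normalization and MaskDependentFilter entries are described only briefly, so here is a minimal pure-Python sketch of one plausible reading of each. It is not deepRacin code: the function names are hypothetical, and the use of the population standard deviation, zero padding at the image border, and correlation (no kernel flip) are all assumptions.

```python
from statistics import fmean, pstdev


def normalize_image(pixels, target_mean=0.0, target_std=1.0, eps=1e-8):
    """Shift and scale one image (flat list) to a target mean and stddev.

    Assumption: statistics are computed per image over all values.
    """
    mean = fmean(pixels)
    std = pstdev(pixels, mu=mean)
    scale = target_std / max(std, eps)  # eps guards against constant images
    return [(p - mean) * scale + target_mean for p in pixels]


def mask_dependent_filter(image, mask, filters):
    """Filter each pixel with the kernel selected by mask[y][x].

    image and mask are 2D lists of equal size; filters is a list of k
    square, odd-sized kernels. Assumption: pixels outside the image are zero.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            kernel = filters[mask[y][x]]
            radius = len(kernel) // 2
            acc = 0.0
            for ky, row in enumerate(kernel):
                for kx, weight in enumerate(row):
                    sy, sx = y + ky - radius, x + kx - radius
                    if 0 <= sy < height and 0 <= sx < width:
                        acc += weight * image[sy][sx]
            out[y][x] = acc
    return out


# Normalization: the result has mean 0 and stddev 1 (up to rounding).
normalized = normalize_image([0.0, 64.0, 128.0, 255.0])

# Mask-dependent filtering with k = 2 kernels: identity and 3x3 box blur.
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box_blur = [[1.0 / 9.0] * 3 for _ in range(3)]
image = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
mask = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]  # blur only the top-left pixel
filtered = mask_dependent_filter(image, mask, [identity, box_blur])
```

In this example only the top-left pixel is blurred, so it picks up one ninth of the bright center value, while every other pixel passes through the identity kernel unchanged.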