Introduce support for generic elementwise binary operations #10

Merged
28 commits, merged on Feb 6, 2025
Commits
9bb8313
Introduce support for generic elementwise binary operations
iksnagreb Apr 12, 2024
f61a538
[Streamline] Fix FoldQuantWeights input order and shape annotations
iksnagreb Nov 13, 2023
8691d3f
Make quantized activation handlers data layout aware
iksnagreb Nov 20, 2023
24acc18
[Streamline] Fix AbsorbAddIntoMultiThreshold assumed input order
iksnagreb Nov 13, 2023
e632328
Fix clipping range issue in RoundAndClipThresholds transformation
iksnagreb Mar 13, 2024
8dd85f4
Rework RoundAndClipThresholds to avoid range and type promotion issues
iksnagreb Apr 6, 2024
8b7c2eb
[Thresholding] Make sure the output of python simulation is float32
iksnagreb Apr 17, 2024
f01d02f
[Tests] Rework test-cases for reworked RoundAndClipThresholds
iksnagreb Apr 6, 2024
023d950
[Streamline] Check validity of broadcasting Add into MultiThreshold
iksnagreb Apr 17, 2024
945db12
[Streamline] Fix backwards-propagating shapes in MoveAddPastMul
iksnagreb Apr 17, 2024
3f13673
[Elementwise] Add InferElementwiseBinaryOperation transformation
iksnagreb Apr 18, 2024
6a6616a
[Tests] Add simple integration test for ElementwiseBinaryOperation
iksnagreb Apr 18, 2024
fd1aedd
[Elementwise] Some cleanup / simplification of generated code
iksnagreb Apr 19, 2024
f010d18
[Streamline] Fix shape propagation of MoveLinearPastEltwiseAdd
iksnagreb Apr 19, 2024
7aaf739
[Tests] Add missing streamlining for testing ElementwiseBinaryOperation
iksnagreb Apr 19, 2024
5268ffe
[Elementwise] Implement bit-width minimization for all specializations
iksnagreb Apr 19, 2024
4769d8e
[Elementwise] Add support for floating-point operations
iksnagreb Apr 19, 2024
87fc002
[Elementwise] Implement get_exp_cycles for ElementwiseBinaryOperation
iksnagreb May 3, 2024
efb1cc9
[Elementwise] Add support for ElementwiseBinaryOperation to SetFolding
iksnagreb May 3, 2024
f34dcfc
[Elementwise] Remove FIFO depths attribute overloads
iksnagreb May 10, 2024
e361cb9
[Elementwise] Add ARRAY_PARTITION and BIND_STORAGE directives
iksnagreb May 17, 2024
653673b
[Streamline] Prevent FactorOutMulSignMagnitude from handling join-nodes
iksnagreb Aug 8, 2024
de97911
[Streamline] Delete initializer datatype annotation after MoveAddPastMul
iksnagreb Aug 8, 2024
dd68078
[Elementwise] Reintroduce FIFO depths attribute overloads
iksnagreb Aug 28, 2024
48be8a5
Merge remote-tracking branch 'xilinx/dev' into elementwise-binary
iksnagreb Jan 20, 2025
2501f58
[Thresholding] Remove second offset left in due to merge conflict
iksnagreb Jan 28, 2025
57625f6
[Deps] flatten is now part of finn-hlslib but we do not have ap_float
iksnagreb Jan 28, 2025
af99d03
Merge remote-tracking branch 'eki-project/dev' into elementwise-binary
iksnagreb Feb 6, 2025
3 changes: 3 additions & 0 deletions .isort.cfg
@@ -9,3 +9,6 @@ sections=FUTURE,STDLIB,TEST,THIRDPARTY,FIRSTPARTY,LOCALFOLDER
default_section=THIRDPARTY
multi_line_output=3
profile=black
ignore_comments=true
ignore_whitespace=true
honor_noqa=true
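
(The new isort options plausibly support the import-order-sensitive registration pattern below: honor_noqa=true in particular keeps isort from reordering the deliberately ordered, noqa-guarded imports in the custom-op __init__ modules. This is an interpretation, not stated in the PR.)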
5 changes: 5 additions & 0 deletions src/finn/custom_op/fpgadataflow/__init__.py
@@ -51,12 +51,17 @@ def register_custom_op(cls):
# Disable linting from here, as all imports will be flagged E402 and maybe F401


# Import the submodule containing specializations of ElementwiseBinaryOperation
# Note: This will automatically register all decorated classes into this domain
import finn.custom_op.fpgadataflow.elementwise_binary

# Import the submodule containing the Squeeze operation
# Note: This will automatically register all decorated classes into this domain
import finn.custom_op.fpgadataflow.squeeze

# Import the submodule containing the Unsqueeze operation
import finn.custom_op.fpgadataflow.unsqueeze

from finn.custom_op.fpgadataflow.addstreams import AddStreams
from finn.custom_op.fpgadataflow.channelwise_op import ChannelwiseOp
from finn.custom_op.fpgadataflow.concat import StreamingConcat
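
Note: the imports above register operators purely by side effect. A minimal sketch of the decorator pattern named in the hunk header, assuming the registry is a module-level custom_op dictionary as in other QONNX domains (the actual FINN implementation may differ):

custom_op = dict()  # registry consulted when resolving ops in this domain

def register_custom_op(cls):
    # Map the class name, which doubles as the ONNX op_type, to the class
    custom_op[cls.__name__] = cls
    # Return the class unchanged so the decorator is transparent
    return cls

# A decorated class in an imported submodule, e.g.
#   @register_custom_op
#   class ElementwiseAdd(ElementwiseBinaryOperation): ...
# then resolves under the finn.custom_op.fpgadataflow domain.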
809 changes: 809 additions & 0 deletions src/finn/custom_op/fpgadataflow/elementwise_binary.py

Large diffs are not rendered by default.
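
The new module itself is not rendered, but the InferElementwiseBinaryOperation transformation later in this diff reveals the node-attribute interface of the operator family. A hedged sketch reconstructed from the set_nodeattr calls below (attribute names come from the diff; defaults and structure are assumptions):

from finn.custom_op.fpgadataflow.hwcustomop import HWCustomOp

class ElementwiseBinaryOperation(HWCustomOp):
    # Sketch only, not the actual 809-line implementation
    def get_nodeattr_types(self):
        attrs = super().get_nodeattr_types()
        attrs.update({
            # FINN data types of both inputs and the output
            "lhs_dtype": ("s", True, ""),
            "rhs_dtype": ("s", True, ""),
            "out_dtype": ("s", True, ""),
            # Shapes of both inputs and the output
            "lhs_shape": ("ints", True, [1]),
            "rhs_shape": ("ints", True, [1]),
            "out_shape": ("ints", True, [1]),
            # Number of channels processed in parallel
            "PE": ("i", False, 1),
        })
        return attrs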

5 changes: 5 additions & 0 deletions src/finn/custom_op/fpgadataflow/hls/__init__.py
@@ -54,12 +54,17 @@ def register_custom_op(cls):
# flake8: noqa
# Disable linting from here, as all imports will be flagged E402 and maybe F401

# Import the submodule containing specializations of ElementwiseBinaryOperation
# Note: This will automatically register all decorated classes into this domain
import finn.custom_op.fpgadataflow.hls.elementwise_binary_hls

# Import the submodule containing the specialization of the Squeeze operation
# Note: This will automatically register all decorated classes into this domain
import finn.custom_op.fpgadataflow.hls.squeeze_hls

# Import the submodule containing the specialization of the Unsqueeze operation
import finn.custom_op.fpgadataflow.hls.unsqueeze_hls

from finn.custom_op.fpgadataflow.hls.addstreams_hls import AddStreams_hls
from finn.custom_op.fpgadataflow.hls.channelwise_op_hls import ChannelwiseOp_hls
from finn.custom_op.fpgadataflow.hls.checksum_hls import CheckSum_hls
766 changes: 766 additions & 0 deletions src/finn/custom_op/fpgadataflow/hls/elementwise_binary_hls.py

Large diffs are not rendered by default.

2 changes: 2 additions & 0 deletions src/finn/custom_op/fpgadataflow/templates.py
@@ -29,6 +29,7 @@

# template for single node execution
docompute_template = """
#define HLS_CONSTEXPR_ENABLE
#define AP_INT_MAX_W $AP_INT_MAX_W$
#include "cnpy.h"
#include "npy2apintstream.hpp"
@@ -108,6 +109,7 @@

# cpp file
ipgen_template = """
#define HLS_CONSTEXPR_ENABLE
#define AP_INT_MAX_W $AP_INT_MAX_W$

#include "bnn-library.h"
2 changes: 1 addition & 1 deletion src/finn/custom_op/fpgadataflow/thresholding.py
@@ -257,7 +257,7 @@ def execute_node(self, context, graph):
if act == DataType["BIPOLAR"]:
# binary to bipolar
y = 2 * y - 1
- context[node.output[0]] = y
+ context[node.output[0]] = y.astype(np.float32)

def calc_tmem(self):
"""Calculates and returns TMEM."""
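
A small illustration of why the explicit cast matters: numpy arithmetic silently yields float64 here, while FINN execution contexts expect float32 tensors (values are illustrative):

import numpy as np

y = np.array([[0.0, 1.0]])  # plain numpy literals default to float64
y = 2 * y - 1               # binary to bipolar, still float64
assert y.dtype == np.float64

context = {}
context["out"] = y.astype(np.float32)  # pin to float32 for the context
assert context["out"].dtype == np.float32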
132 changes: 131 additions & 1 deletion src/finn/transformation/fpgadataflow/convert_to_hw_layers.py
@@ -30,7 +30,7 @@
import numpy as np
import qonnx.core.data_layout as DataLayout
import warnings
- from onnx import TensorProto, helper
+ from onnx import NodeProto, TensorProto, helper
from qonnx.core.datatype import DataType
from qonnx.core.modelwrapper import ModelWrapper
from qonnx.custom_op.registry import getCustomOp
@@ -41,6 +41,9 @@
from qonnx.util.basic import get_by_name
from qonnx.util.onnx import nchw_to_nhwc

# Module containing specializations of elementwise binary operations
import finn.custom_op.fpgadataflow.elementwise_binary as elementwise_binary

# Base class for all FINN custom ops, here just used for type-hinting
from finn.custom_op.fpgadataflow.hwcustomop import HWCustomOp

@@ -1831,6 +1834,133 @@ def apply(self, model):
return (model, graph_modified)


# Lifts scalar to rank-1 tensor
def lift_to_rank1(name: str, model: ModelWrapper):
# Scalars have a shape of length zero
if len(model.get_tensor_shape(name)) == 0:
# Lift shape to rank-1 tensor with single element
model.set_tensor_shape(name, [1])
# Check whether this tensor has an initializer
if (tensor := model.get_initializer(name)) is not None:
# Set new initializer tensor of shape [1]
model.set_initializer(name, tensor.reshape(1))


# Converts supported elementwise binary operations to the corresponding
# FINN custom operations
class InferElementwiseBinaryOperation(Transformation):
# Filter function to filter out the last elementwise Mul operation,
# typically corresponding to output de-quantization, which should happen
# off-chip
@staticmethod
def reject_output_dequant(model: ModelWrapper, node: NodeProto):
# The operator must be a Mul and have no successor nodes
if node.op_type == "Mul" and not model.find_direct_successors(node):
# If the output is a floating-point tensor, reject this node
if model.get_tensor_datatype(node.output[0]) == "FLOAT32":
# Filter False rejects this node
return False
# Filter True accepts this node
return True

# Filter function to filter out any operation involving any floating-point
# tensor
@staticmethod
def reject_floats(model: ModelWrapper, node: NodeProto):
# Check for any input being floating-point
if any(model.get_tensor_datatype(x) == "FLOAT32" for x in node.input):
# Filter False rejects this node
return False
# Check for any output being floating-point
if any(model.get_tensor_datatype(x) == "FLOAT32" for x in node.output):
# Filter False rejects this node
return False
# Filter True accepts this node
return True

# Initializes the transformation method with an optional filter function
def __init__(self, _filter=None):
# Initialize the base class Transformation object
super().__init__()
# Register the filter function as attribute
self._filter = _filter if _filter is not None else lambda *_: True

# Applies the transform to a whole model graph
def apply(self, model: ModelWrapper): # noqa
# Get the model graph out of the model wrapper object
graph = model.graph
# Keep track of whether the graph has been modified
graph_modified = False
# Iterate all nodes in the graph keeping track of the index
for index, node in enumerate(graph.node):
# Skip transforming nodes rejected by the filter
if not self._filter(model, node):
continue
# If a custom operation with a corresponding name is implemented in
# the module, this operator is supported for conversion
if f"Elementwise{node.op_type}" in dir(elementwise_binary):
# Transplant this operator into our FINN domain
node.domain = "finn.custom_op.fpgadataflow"
# Adapt the op-type prefixing it with Elementwise
# TODO: Consider dropping the prefix?
node.op_type = f"Elementwise{node.op_type}"
# Now we can get the CustomOp wrapper instance providing easier
# attribute access
inst: HWCustomOp = getCustomOp(node)
# Set the backend attribute to mark this as an operation supported
# for implementation on an FPGA by FINN
inst.set_nodeattr("backend", "fpgadataflow")
# Need to "lift" potential scalar inputs to rank-1 tensors
lift_to_rank1(node.input[0], model)
lift_to_rank1(node.input[1], model)

# fmt: off
# Disable formatter. This is deliberately formatted to stay
# within 80 characters per line. Black, however, formats some
# lines going beyond this.

# Insert data type attributes from "context" into the CustomOp
# node
# TODO: Find a way to handle this via data type inference?
inst.set_nodeattr(
"lhs_dtype", str(model.get_tensor_datatype(node.input[0]))
)
inst.set_nodeattr(
"rhs_dtype", str(model.get_tensor_datatype(node.input[1]))
)
inst.set_nodeattr(
"out_dtype", str(model.get_tensor_datatype(node.output[0]))
)
# Insert shape attributes from "context" into the CustomOp node
# TODO: Find a way to handle this via shape inference?
inst.set_nodeattr(
"lhs_shape", model.get_tensor_shape(node.input[0])
)
inst.set_nodeattr(
"rhs_shape", model.get_tensor_shape(node.input[1])
)
inst.set_nodeattr(
"out_shape", model.get_tensor_shape(node.output[0])
)

# fmt: on

# Consider the graph to be modified, triggering exhaustive
# re-application of this transformation
graph_modified = True
# Exiting here triggers type and shape inference and cleanup
# after each transformed node. This helps QONNX to behave
# better / more consistently in certain cases...
break
# Re-do shape and data type annotations after potential changes to the
# model graph
model = model.transform(InferShapes())
model = model.transform(InferDataTypes())
# Return the transformed model and indicate whether the graph actually
# has been transformed
return model, graph_modified


# Converts the Squeeze operation to the corresponding FINN custom operation
class InferSqueeze(Transformation):
# Applies the transform to a whole model graph
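
A hedged usage sketch of the new transformation, combining the optional filter with the scalar lifting above (the model file name is illustrative):

from qonnx.core.modelwrapper import ModelWrapper

from finn.transformation.fpgadataflow.convert_to_hw_layers import (
    InferElementwiseBinaryOperation,
)

model = ModelWrapper("streamlined-model.onnx")  # illustrative name
# Convert all supported elementwise binary operations, but keep the
# final floating-point de-quantization Mul off-chip via the filter
model = model.transform(InferElementwiseBinaryOperation(
    InferElementwiseBinaryOperation.reject_output_dequant
))
# Side effect: any scalar operand of a converted node has been lifted
# to a rank-1 tensor of shape [1] by lift_to_rank1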
26 changes: 25 additions & 1 deletion src/finn/transformation/fpgadataflow/set_folding.py
@@ -27,12 +27,17 @@
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# Inspect information on Python objects like modules
import inspect
import numpy as np
import warnings
from qonnx.custom_op.registry import getCustomOp
from qonnx.transformation.base import Transformation
from qonnx.transformation.general import GiveUniqueNodeNames

# Import the elementwise binary operation module to extract names of all
# specializations (which require PE parallelism to be configured)
import finn.custom_op.fpgadataflow.hls.elementwise_binary_hls as elementwise_binary_hls
from finn.analysis.fpgadataflow.dataflow_performance import dataflow_performance
from finn.transformation.fpgadataflow.annotate_cycles import AnnotateCycles
from finn.util.fpgadataflow import is_hls_node, is_rtl_node
@@ -44,6 +49,15 @@ def divisors(num):
yield x


# Find the op-type names for all HLS specializations of elementwise binary
# operations
ELEMENTWISE_BINARY_OPS = [
op_type
for op_type, cls in inspect.getmembers(elementwise_binary_hls, inspect.isclass)
if issubclass(cls, elementwise_binary_hls.ElementwiseBinaryOperation_hls)
]


class SetFolding(Transformation):
"""Attempt to set parallelism attributes in all nodes to meet a specific
target expressed as cycles per frame target_cycles_per_frame. For each
@@ -106,6 +120,7 @@ def apply(self, model):
"GlobalAccPool_hls",
"Thresholding_hls",
"Thresholding_rtl",
*ELEMENTWISE_BINARY_OPS,
"Squeeze_hls",
"Unsqueeze_hls",
]
@@ -157,7 +172,16 @@
# increase PE until target met or reached max_pe
self.optimize_attribute_val(node_inst, max_pe, "PE")
elif op_type in pe_ops:
- max_pe = node_inst.get_nodeattr("NumChannels")
+ # Note: Keep the original behavior for all custom ops that define
+ # the NumChannels attribute
+ try:
+     max_pe = node_inst.get_nodeattr("NumChannels")
+ # Note: Some of the recent additions do not define the
+ # NumChannels attribute
+ except AttributeError:
+     # We can extract the channels from the normal, i.e., not
+     # folded, shape of the input in these cases
+     max_pe = node_inst.get_normal_input_shape()[-1]
self.optimize_attribute_val(node_inst, max_pe, "PE")
elif op_type == "LabelSelect_hls":
max_pe = node_inst.get_nodeattr("Labels")
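
The introspection above can be reproduced in isolation; since issubclass is reflexive, the resulting list plausibly also contains the ElementwiseBinaryOperation_hls base class itself (a sketch; the concrete specialization names depend on the module):

import inspect

import finn.custom_op.fpgadataflow.hls.elementwise_binary_hls as m

ops = [
    op_type
    for op_type, cls in inspect.getmembers(m, inspect.isclass)
    if issubclass(cls, m.ElementwiseBinaryOperation_hls)
]
# Expect names such as "ElementwiseAdd_hls" (the exact set depends on
# the specializations defined in the module)
print(ops)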
35 changes: 29 additions & 6 deletions src/finn/transformation/qonnx/fold_quant_weights.py
@@ -149,7 +149,8 @@ def apply(self, model):
mul_tensor = helper.make_tensor_value_info(
model.make_new_valueinfo_name(),
TensorProto.FLOAT,
- mul_shape,
+ mul_shape,  # Note: This shape is known exactly as
+ # it is an initializer with known shape
)
graph.value_info.append(mul_tensor)
model.set_initializer(mul_tensor.name, scale)
@@ -168,7 +169,9 @@ def apply(self, model):
act_mul_tensor = helper.make_tensor_value_info(
model.make_new_valueinfo_name(),
TensorProto.FLOAT,
- output_shape,
+ None,  # Note: Explicitly delete the shape
+ # annotation to be redone by the next shape
+ # inference
)
graph.value_info.append(act_mul_tensor)
successor.output[0] = act_mul_tensor.name
@@ -186,19 +189,37 @@
div_tensor = helper.make_tensor_value_info(
model.make_new_valueinfo_name(),
TensorProto.FLOAT,
- mul_shape,
+ None,  # Note: Explicitly delete the shape
+ # annotation to be redone by the next shape
+ # inference
)
graph.value_info.append(div_tensor)
model.set_initializer(div_tensor.name, scale)

- succ_input_name = successor.input[0]
+ # Detect which input of the add-like successor is
+ # fed by the quantizer node to select the other
+ # branch to insert the scale factor
+ if successor.input[0] == node_out:
+     succ_input_name = successor.input[1]
+ else:
+     succ_input_name = successor.input[0]

act_mul_tensor = helper.make_tensor_value_info(
model.make_new_valueinfo_name(),
TensorProto.FLOAT,
- output_shape,
+ None,  # Note: Explicitly delete the shape
+ # annotation to be redone by the next shape
+ # inference
)
graph.value_info.append(act_mul_tensor)
- successor.input[0] = act_mul_tensor.name
+ # Detect which input of the add-like successor is
+ # fed by the quantizer node to select the other
+ # branch to insert the scale factor
+ if successor.input[0] == node_out:
+     successor.input[1] = act_mul_tensor.name
+ else:
+     successor.input[0] = act_mul_tensor.name

div_node = helper.make_node(
"Div",
@@ -210,6 +231,8 @@ def apply(self, model):
# remove old node
graph.node.remove(n)
graph_modified = True
# Note: Running shape inference is necessary as shape
# annotations have been deleted above
model = model.transform(InferShapes())
return (model, graph_modified)
return (model, graph_modified)
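
The repeated input-order check added above amounts to a small helper: pick whichever input of the add-like successor is not fed by the quantizer output, so the scale factor lands on the correct branch (a minimal sketch of the pattern, not code from the PR):

def select_other_input(successor, node_out):
    # Return the successor input NOT driven by the quantizer output
    if successor.input[0] == node_out:
        return successor.input[1]
    return successor.input[0]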