Deferred Allocation #1704

Open
wants to merge 62 commits into
base: main
b10475f
Early changes to support reallocation for CPU_Heap storage
ThrudPrimrose Oct 23, 2024
fae0704
Minimal functioning realloc
ThrudPrimrose Oct 23, 2024
023c86c
Add first prototype of deferred allocation support
ThrudPrimrose Oct 24, 2024
4aca5ee
Add reading the size of array, add size input as a special in connector
ThrudPrimrose Oct 29, 2024
e1442f7
Refactor
ThrudPrimrose Oct 30, 2024
dcbf2a2
Do not rely on naming conventions but save the size array descriptor'…
ThrudPrimrose Oct 30, 2024
33b9702
Merge branch 'main' into deferred_allocation
ThrudPrimrose Oct 30, 2024
e516985
dace/sdfg/validation.py
ThrudPrimrose Nov 4, 2024
925f8c7
Improve validation
ThrudPrimrose Nov 5, 2024
93eae37
More validation cases
ThrudPrimrose Nov 5, 2024
5b55425
Add support for deferred allocation on GPU global arrays
ThrudPrimrose Nov 5, 2024
c783668
Non-transient support attempt 1
ThrudPrimrose Nov 5, 2024
dc81d69
Improvements in GPU_Global support
ThrudPrimrose Nov 6, 2024
400257d
Merge branch 'main' into deferred_allocation
ThrudPrimrose Nov 19, 2024
c14b91e
Add tests
ThrudPrimrose Nov 20, 2024
506d0aa
Change connector names
ThrudPrimrose Nov 20, 2024
b956142
Add more test cases and fix some bugs
ThrudPrimrose Nov 20, 2024
c4eef0c
Merge branch 'main' into deferred_allocation
ThrudPrimrose Dec 3, 2024
82cdfde
Bug fixes
ThrudPrimrose Dec 3, 2024
97bc728
More codegen fixes
ThrudPrimrose Dec 3, 2024
08cb50c
Split size and array storage
ThrudPrimrose Dec 3, 2024
ac90c86
Major fixes regarding name changes etc.
ThrudPrimrose Dec 3, 2024
fe3748e
Rm rogue print
ThrudPrimrose Dec 3, 2024
1164e8c
Rm array length checks for now
ThrudPrimrose Dec 3, 2024
c597e24
Naming fixes
ThrudPrimrose Dec 3, 2024
21dd0c3
Name update fixes
ThrudPrimrose Dec 3, 2024
e669f7c
Various bugfixes on the feature
ThrudPrimrose Dec 6, 2024
6ac34f6
Add various fixes on distinguishing size and normal arrays
ThrudPrimrose Dec 6, 2024
75e2739
Move size array name check to validation
ThrudPrimrose Dec 6, 2024
8c164a4
Fix type shadowing in GPU kernel size array unpacking
ThrudPrimrose Dec 6, 2024
e76f39d
Make tests consider size arrays (todo: maybe make arrays do not retur…
ThrudPrimrose Dec 6, 2024
15b00cc
Fix size array name check in validation
ThrudPrimrose Dec 6, 2024
ee8a708
Various fixes
ThrudPrimrose Dec 6, 2024
f195e3f
Fix validation case
ThrudPrimrose Dec 6, 2024
e915607
Improve filtering for size arrays
ThrudPrimrose Dec 6, 2024
9d646dc
Improve tests, improve deferred alloc check
ThrudPrimrose Dec 9, 2024
3854c82
Fix type check imports
ThrudPrimrose Dec 9, 2024
2408ad0
Improve validation and type checks and fix bugs
ThrudPrimrose Dec 10, 2024
62bc08c
Build on top of the GPU codegen hack
ThrudPrimrose Dec 11, 2024
f50382b
Improve proposal according to PR comments, improve support for more c…
ThrudPrimrose Dec 11, 2024
8c2f12d
Add tests, refactor, improve size calculation
ThrudPrimrose Dec 11, 2024
ede2704
Add array length checks to cutout test
ThrudPrimrose Dec 11, 2024
a6163c0
Refactor
ThrudPrimrose Dec 11, 2024
80f6b4a
Merge branch 'main' into deferred_allocation
ThrudPrimrose Dec 13, 2024
ae08459
Refactor and support CPU_Pinned
ThrudPrimrose Dec 13, 2024
bb04e1a
Refactor and fix GPU array index generation
ThrudPrimrose Dec 13, 2024
02a48e8
Fixes to size desc name checks
ThrudPrimrose Dec 13, 2024
da7ba8d
Fix to erroneous assertion
ThrudPrimrose Dec 13, 2024
460b75b
Test script refactor
ThrudPrimrose Dec 13, 2024
0794638
Merge branch 'main' into deferred_allocation
ThrudPrimrose Dec 16, 2024
e0472dc
merge fix
ThrudPrimrose Dec 16, 2024
92717e1
Allocate array fix
ThrudPrimrose Dec 16, 2024
592336b
Add forgotten defined var add
ThrudPrimrose Dec 16, 2024
16f6e88
Merge branch 'main' into deferred_allocation
ThrudPrimrose Dec 19, 2024
02937e3
Make size array alloc C99 std compliant instead of C++11
ThrudPrimrose Dec 19, 2024
b7e6125
Merge branch 'main' into deferred_allocation
ThrudPrimrose Dec 21, 2024
b9698fa
Allow reshaping by changing size desc shape
ThrudPrimrose Dec 21, 2024
9755810
Rm getters, move functionality to set shape only
ThrudPrimrose Dec 21, 2024
f379b3f
Merge branch 'main' into deferred_allocation
ThrudPrimrose Jan 15, 2025
2164daa
Deferred allocation does not support reallocation, more tests and imp…
ThrudPrimrose Jan 15, 2025
0db57c3
Improve validation to use paths between states to support deferred al…
ThrudPrimrose Jan 15, 2025
020a525
Minor fix for dfs sort on return blocks
ThrudPrimrose Jan 15, 2025
37 changes: 34 additions & 3 deletions dace/codegen/dispatcher.py
@@ -20,7 +20,7 @@
@registry.extensible_enum
class DefinedType(aenum.AutoNumberEnum):
    """ Data types for `DefinedMemlets`.

    :see: DefinedMemlets
    """
    Pointer = () # Pointer
@@ -159,6 +159,8 @@ class TargetDispatcher(object):
    _state_dispatchers: List[Tuple[Callable, target.TargetCodeGenerator]]
    _generic_state_dispatcher: Optional[target.TargetCodeGenerator]

    _generic_reallocate_dispatchers: Dict[dtypes.StorageType, target.TargetCodeGenerator]

    _declared_arrays: DefinedMemlets
    _defined_vars: DefinedMemlets

@@ -181,6 +183,7 @@ def __init__(self, framecode):
        self._node_dispatchers = []
        self._generic_node_dispatcher = None
        self._state_dispatchers = []
        self._generic_reallocate_dispatchers = {}
        self._generic_state_dispatcher = None

        self._declared_arrays = DefinedMemlets()
@@ -189,7 +192,7 @@ def __init__(self, framecode):
    @property
    def declared_arrays(self) -> DefinedMemlets:
        """ Returns a list of declared variables.

        This is used for variables that must have their declaration and
        allocation separate. It includes all such variables that have been
        declared by the dispatcher.
@@ -199,7 +202,7 @@ def declared_arrays(self) -> DefinedMemlets:
    @property
    def defined_vars(self) -> DefinedMemlets:
        """ Returns a list of defined variables.

        This includes all variables defined by the dispatcher.
        """
        return self._defined_vars
@@ -354,6 +357,15 @@ def register_copy_dispatcher(self, src_storage: dtypes.StorageType, dst_storage:

        self._copy_dispatchers[dispatcher].append((predicate, func))

    def register_reallocate_dispatcher(self, node_storage: dtypes.StorageType,
                                       func: target.TargetCodeGenerator,
                                       predicate: Optional[Callable] = None) -> None:

        if not isinstance(node_storage, dtypes.StorageType):
            raise TypeError(node_storage, dtypes.StorageType, isinstance(node_storage, dtypes.StorageType))
        dispatcher = node_storage
        self._generic_reallocate_dispatchers[dispatcher] = func
        return

    def get_state_dispatcher(self, sdfg: SDFG, state: SDFGState) -> target.TargetCodeGenerator:
        # Check if the state satisfies any predicates that delegate to a
        # specific code generator
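
The registration and lookup pattern in this hunk (one reallocation code generator per storage type) can be sketched in isolation. This is a minimal illustration, not the DaCe implementation: `StorageType` is a hypothetical stand-in for `dtypes.StorageType`, `ReallocRegistry` is an invented name, and the code generators are plain strings.

```python
from enum import Enum, auto
from typing import Dict, Optional


class StorageType(Enum):
    """Hypothetical stand-in for dtypes.StorageType."""
    CPU_Heap = auto()
    GPU_Global = auto()


class ReallocRegistry:
    """Toy registry holding one reallocation code generator per storage type."""

    def __init__(self) -> None:
        self._dispatchers: Dict[StorageType, object] = {}

    def register(self, storage: StorageType, codegen: object) -> None:
        # Reject keys that are not storage types, as the hunk above does.
        if not isinstance(storage, StorageType):
            raise TypeError(f"expected StorageType, got {type(storage).__name__}")
        # A later registration for the same storage type replaces the earlier one.
        self._dispatchers[storage] = codegen

    def lookup(self, storage: StorageType) -> Optional[object]:
        # Returns None when nothing is registered for this storage type.
        return self._dispatchers.get(storage)


registry = ReallocRegistry()
registry.register(StorageType.CPU_Heap, "cpu_codegen")
```

Because the mapping is keyed by storage type alone, the `predicate` argument accepted by `register_reallocate_dispatcher` plays no role in the lookup in this version of the PR.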
@@ -594,6 +606,14 @@ def get_copy_dispatcher(self, src_node: Union[nodes.CodeNode, nodes.AccessNode],

        return target

    def get_reallocate_dispatcher(self, node: Union[nodes.CodeNode, nodes.AccessNode],
                                  edge: MultiConnectorEdge[Memlet],
                                  sdfg: SDFG, state: SDFGState) -> Optional[target.TargetCodeGenerator]:
        node_storage = sdfg.arrays[node.data].storage
        # Return None (matching the Optional return type) if no dispatcher is registered for this storage
        target = self._generic_reallocate_dispatchers.get(node_storage)
        return target


    def dispatch_copy(self, src_node: nodes.Node, dst_node: nodes.Node, edge: MultiConnectorEdge[Memlet], sdfg: SDFG,
                      cfg: ControlFlowRegion, dfg: StateSubgraphView, state_id: int, function_stream: CodeIOStream,
                      output_stream: CodeIOStream) -> None:
@@ -609,6 +629,17 @@ def dispatch_copy(self, src_node: nodes.Node, dst_node: nodes.Node, edge: MultiC
        self._used_targets.add(target)
        target.copy_memory(sdfg, cfg, dfg, state_id, src_node, dst_node, edge, function_stream, output_stream)

    def dispatch_reallocate(self, src_node: nodes.Node, node: nodes.Node, edge: MultiConnectorEdge[Memlet], sdfg: SDFG,
                            cfg: ControlFlowRegion, dfg: StateSubgraphView, state_id: int, function_stream: CodeIOStream,
                            output_stream: CodeIOStream) -> None:
        state = cfg.state(state_id)
        target = self.get_reallocate_dispatcher(node, edge, sdfg, state)
        assert target is not None

        self._used_targets.add(target)
        target.reallocate(sdfg, cfg, dfg, state_id, src_node, node, edge, function_stream, output_stream)


    # Dispatches definition code for a memlet that is outgoing from a tasklet
    def dispatch_output_definition(self, src_node: nodes.Node, dst_node: nodes.Node, edge, sdfg: SDFG,
                                   cfg: ControlFlowRegion, dfg: StateSubgraphView, state_id: int,
75 changes: 70 additions & 5 deletions dace/codegen/targets/cpp.py
@@ -9,6 +9,7 @@
import itertools
import math
import numbers
import re
import sys
import warnings

@@ -231,6 +232,14 @@ def memlet_copy_to_absolute_strides(dispatcher: 'TargetDispatcher',
    elif memlet.data == dst_node.data:
        copy_shape, src_strides = reshape_strides(dst_subset, dst_strides, src_strides, copy_shape)

    def replace_dace_defer_dim(string, arrname):
        # Rewrites each deferred-dimension symbol into an access to the size array.
        # NOTE: the size array name is hard-coded as "A_size"; `arrname` is currently unused.
        pattern = r"__dace_defer_dim(\d+)"
        return re.sub(pattern, r"A_size[\1]", string)

    # TODO: do this better?
    dst_expr = replace_dace_defer_dim(dst_expr, dst_node.data) if dst_expr is not None else None
    src_expr = replace_dace_defer_dim(src_expr, src_node.data) if src_expr is not None else None

    return copy_shape, src_strides, dst_strides, src_expr, dst_expr
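
For illustration, the symbol rewriting done by `replace_dace_defer_dim` can be written so that the size array name is a parameter instead of a fixed string. This is a sketch under assumptions: `replace_defer_dims` and its `size_array` argument are hypothetical names, and how the size array name would be derived from the array is not decided here.

```python
import re

# Matches deferred-dimension symbols such as __dace_defer_dim0 and captures the index.
DEFER_DIM = re.compile(r"__dace_defer_dim(\d+)")


def replace_defer_dims(expr: str, size_array: str) -> str:
    """Replace each __dace_defer_dim<N> in expr with <size_array>[<N>]."""
    return DEFER_DIM.sub(lambda m: f"{size_array}[{m.group(1)}]", expr)


print(replace_defer_dims("i * __dace_defer_dim1 + j", "A_size"))  # i * A_size[1] + j
```

Using a callable for the replacement avoids embedding the array name in a replacement template, where backslashes or group references in the name would be interpreted by `re.sub`.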


@@ -539,7 +548,8 @@ def ndcopy_to_strided_copy(
    return None


def cpp_offset_expr(d: data.Data, subset_in: subsets.Subset, offset=None, packed_veclen=1, indices=None):
def cpp_offset_expr(d: data.Data, subset_in: subsets.Subset, offset=None,
                    packed_veclen=1, indices=None, deferred_size_names=None):
    """ Creates a C++ expression that can be added to a pointer in order
        to offset it to the beginning of the given subset and offset.

@@ -569,9 +579,62 @@ def cpp_offset_expr(d: data.Data, subset_in: subsets.Subset, offset=None, packed
    if packed_veclen > 1:
        index /= packed_veclen

    return sym2cpp(index)
    if deferred_size_names is not None:
        access_str_with_deferred_vars = sym2cpp(index)
        def replace_pattern(match):
            number = match.group(1)
            return deferred_size_names[int(number)]
        pattern = r'__dace_defer_dim(\d+)'
        access_str = re.sub(pattern, replace_pattern, access_str_with_deferred_vars)
        return access_str
    else:
        return sym2cpp(index)
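
The substitution step added to `cpp_offset_expr` looks up each deferred-dimension index in a per-dimension list of names. A standalone sketch follows; `substitute_size_names` is a hypothetical helper, and the `str()` coercion is an addition of this sketch, since `re.sub` requires the replacement callable to return a string while the list may also hold non-string shape entries.

```python
import re
from typing import Sequence


def substitute_size_names(expr: str, names: Sequence[object]) -> str:
    """Replace __dace_defer_dim<N> in expr with the N-th entry of names."""

    def lookup(match: "re.Match[str]") -> str:
        # Coerce to str so non-string entries (e.g. integer sizes) are accepted.
        return str(names[int(match.group(1))])

    return re.sub(r"__dace_defer_dim(\d+)", lookup, expr)


# Example: dimension 0 is deferred, dimension 1 has a fixed size of 64.
offset = substitute_size_names("__dace_defer_dim0 * i + j", ["__A_dim0_size", 64])
```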


def _get_deferred_size_names(desc, name):
    if (desc.storage not in dtypes.REALLOCATABLE_STORAGES and
            not desc.transient):
        return None
    def check_dace_defer(elements):
        for elem in elements:
            if "__dace_defer" in str(elem):
                return True
        return False
    deferred_size_names = None
    if check_dace_defer(desc.shape):
        if desc.storage in dtypes.REALLOCATABLE_STORAGES:
            deferred_size_names = []
            for i, elem in enumerate(desc.shape):
                if "__dace_defer" in str(elem):
                    deferred_size_names.append(f"__{name}_dim{i}_size" if desc.storage == dtypes.StorageType.GPU_Global else f"{desc.size_desc_name}[{i}]")
                else:
                    deferred_size_names.append(elem)
    return deferred_size_names if deferred_size_names is not None and len(deferred_size_names) > 0 else None
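
To make the naming scheme concrete, here is a simplified sketch of the per-dimension name selection. It assumes shapes are given as plain strings and reduces the storage check to a boolean flag; `deferred_size_names` and its parameters are hypothetical and omit the transient and reallocatable-storage checks of the function above.

```python
from typing import List, Optional, Sequence


def deferred_size_names(shape: Sequence[object], name: str, size_desc_name: str,
                        gpu_global: bool) -> Optional[List[str]]:
    """Return one size expression per dimension, or None if nothing is deferred."""
    if not any("__dace_defer" in str(dim) for dim in shape):
        return None
    names = []
    for i, dim in enumerate(shape):
        if "__dace_defer" in str(dim):
            # GPU global arrays use one scalar per dimension; others index the size array.
            names.append(f"__{name}_dim{i}_size" if gpu_global else f"{size_desc_name}[{i}]")
        else:
            names.append(str(dim))
    return names


cpu_names = deferred_size_names(["__dace_defer_dim0", "N"], "A", "A_size", gpu_global=False)
gpu_names = deferred_size_names(["__dace_defer_dim0", "N"], "A", "A_size", gpu_global=True)
```

With these inputs, the first dimension becomes `A_size[0]` for the non-GPU case and `__A_dim0_size` for the GPU case, while the fixed dimension `N` is passed through unchanged.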

def _get_realloc_dimensions(size_array_name: str, new_size_array_name: str, shape):
    # Only consider the dimensions whose original size contains "__dace_defer"
    mask_array = ["__dace_defer" in str(dim) for dim in shape]

    # A size may be an expression involving a "__dace_defer" symbol rather than the bare symbol.
    # The size array is updated with the symbol's value only; when computing the expression,
    # each __dace_defer_dim pattern is replaced with the corresponding access into the size array.
    size_assignment_strs = []
    new_size_strs = []
    old_size_strs = []
    for i, mask in enumerate(mask_array):
        if mask:
            new_size_str = sym2cpp(shape[i])
            pattern = r'__dace_defer_dim(\d+)'
            new_size_strs.append(re.sub(pattern, lambda m: f'{new_size_array_name}[{m.group(1)}]', new_size_str))
            old_size_strs.append(re.sub(pattern, lambda m: f"{size_array_name}[{m.group(1)}]", new_size_str))
            size_assignment_strs.append(
                f"{size_array_name}[{i}] = {new_size_array_name}[{i}];"
            )
        else:
            old_size_strs.append(sym2cpp(shape[i]))
            new_size_strs.append(sym2cpp(shape[i]))
    return size_assignment_strs, new_size_strs, old_size_strs
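
A worked example may help show what the three returned lists contain. The sketch below mirrors the logic above but uses `str()` as a stand-in for `sym2cpp` and plain strings for the shape, so it runs without DaCe; `realloc_dimensions` and the array names `A_size` and `A_new_size` are illustrative only.

```python
import re
from typing import List, Sequence, Tuple

DEFER_DIM = re.compile(r"__dace_defer_dim(\d+)")


def realloc_dimensions(size_array: str, new_size_array: str,
                       shape: Sequence[object]) -> Tuple[List[str], List[str], List[str]]:
    """Build size-array assignments plus new and old per-dimension size expressions."""
    assignments, new_sizes, old_sizes = [], [], []
    for i, dim in enumerate(shape):
        text = str(dim)  # stand-in for sym2cpp
        if "__dace_defer" in text:
            new_sizes.append(DEFER_DIM.sub(lambda m: f"{new_size_array}[{m.group(1)}]", text))
            old_sizes.append(DEFER_DIM.sub(lambda m: f"{size_array}[{m.group(1)}]", text))
            assignments.append(f"{size_array}[{i}] = {new_size_array}[{i}];")
        else:
            # Fixed dimensions are identical before and after reallocation.
            new_sizes.append(text)
            old_sizes.append(text)
    return assignments, new_sizes, old_sizes


assignments, new_sizes, old_sizes = realloc_dimensions(
    "A_size", "A_new_size", ["2 * __dace_defer_dim0", "N"])
```

For this shape, the only assignment is `A_size[0] = A_new_size[0];`, the new sizes are `2 * A_new_size[0]` and `N`, and the old sizes are `2 * A_size[0]` and `N`.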

def cpp_array_expr(sdfg,
                   memlet,
                   with_brackets=True,
@@ -586,8 +649,10 @@ def cpp_array_expr(sdfg,
    subset = memlet.subset if not use_other_subset else memlet.other_subset
    s = subset if relative_offset else subsets.Indices(offset)
    o = offset if relative_offset else None
    desc = (sdfg.arrays[memlet.data] if referenced_array is None else referenced_array)
    offset_cppstr = cpp_offset_expr(desc, s, o, packed_veclen, indices=indices)
    desc: data.Data = (sdfg.arrays[memlet.data] if referenced_array is None else referenced_array)
    desc_name = memlet.data
    deferred_size_names = _get_deferred_size_names(desc, desc_name)
    offset_cppstr = cpp_offset_expr(desc, s, o, packed_veclen, indices=indices, deferred_size_names=deferred_size_names)

    # NOTE: Are there any cases where a mix of '.' and '->' is needed when traversing nested structs?
    # TODO: Study this when changing Structures to be (optionally?) non-pointers.
@@ -763,7 +828,7 @@ def is_write_conflicted_with_reason(dfg, edge, datanode=None, sdfg_schedule=None
    Detects whether a write-conflict-resolving edge can be emitted without
    using atomics or critical sections, returning the node or SDFG that caused
    the decision.

    :return: None if the conflict is nonatomic, otherwise returns the scope entry
             node or SDFG that caused the decision to be made.
    """