-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deferred Allocation #1704
base: main
Are you sure you want to change the base?
Deferred Allocation #1704
Conversation
I would not use IN_size (i.e., a throughput connector), maybe use just |
About the connector, we can also read the size, then this will be "OUT_size" I had planned. Would still suggest both connectors to be named size? |
yes, exactly. Since those connectors are endpoints, and not flowing through the access node (as they would in a scope node). |
I was updating the implementation according to a discussion we had with Torsten and Lex. I also changed the implementation to use "size" for both in and out connectors. Now the problem is that this SDFG does not validate because the access node has duplicate connectors. I think it is better to make the size connectors distinct rather than changing the validation rules. Or should I update the validation procedures with access nodes being an exception? |
…omplex shapes, add tests
I want to share the design document. With the proposal, we support dynamic allocation and reallocation of GPU_Global and CPU_Heap arrays. On the CPU_Heap array, reallocation is performed through a call to realloc; on GPU_Global storage, it is implemented through a sequence of malloc, copy, free as CUDA does not support realloc. Reallocation is only allowed on the host-side code and can be triggered when the scope is None. (Reallocation inside a map or nested SDFGs is, for example, impossible, as realloc / malloc are not usually thread-safe, and it would not have good performance if it had a thread-safe implementation.) A deferred array is generated only upon the request of the user. This can be done by including the symbol __dace_defer in any of the expressions in the shape of the array when it is added. It is invalid to reshape an array that is deferred. If the expression has multiple appearances of the __dace_defer symbol, it is assumed to be the same symbol. The array size is tracked by a unique size_array connected to a deferred array. It is stored in the _arrays dictionary of the SDFG, just like other arrays—the constructor of dace.data.Array sets is_size_arary to true for size arrays and is_deferred_array to true for deferred arrays. The members are set only for arrays, not other DaCe data types. The size_array of an array is tracked through the size_desc_name variable of an array. The size array is created by appending the _size suffix to the end of the array’s name. The size array is always one-dimensional. Its length matches the number of dimensions (length of the shape) of the deferred array. No size array is created if the array is not deferred. The reallocation is triggered by writing to the special _write_size in the connector of an access node. As part of reallocation, the user provides only the size of the __dace_defer, and the new value of the __dace_defer symbol is used to compute the new to the size_array. If the new shape (of dimension 1) written by the user is 5, then for the A above, the latest value of the __dace_defer will be written as 5, and the size will be A_size[1] = 5. But the dimension of A is [2N][45], any call to the existing functions from the cpp module still calculates the dimensions using the shape member of the array. To support __dace_defer symbol and deferred allocation, some functions accept a variable called deferred_size_names or automatically generate these parameters by detecting __dace_defer symbols in the shape. The offset expressions are matched against the __dace_defer_dim(d+) pattern and switched with accesses to the size_array on the host to process. On GPU kernels, the pattern matching is slightly different. The length of the write to the _write_size in the connector always needs to match the size of the array's shape. The values on the dimensions that do now have _dace_defer in their expressions are ignored. The size of the array can be read from the _read_size out connector of an access node and used in maps. Even though the size arrays are allocated on the stack, they require a call to cudamemcpy to transfer the size array to GPU to be transferred as pointers. To mitigate the issue (and have a more performant implementation than needing one more memcpy before every kernel), the size array is unpacked into integers. The name mangling pattern is as follows: the current value of the DimensionId’th _dace_defer symbol from the array A is read from A_size[i] and mangled as ___dim_size for the example, name mangled from A_size[1] would be __A_dim1_size , the integer unpacked sizes of deferred symbols are passed as integers to the kernel. The function _get_deferred_size_names method of the cpp module handles the generation of mangled names for array and offset accesses, and the CUDA codegen handles unpacking and instantiating mangled variable (integer) names. My concerns: |
The bold text is not copied over I have google doc for the same purpose: https://docs.google.com/document/d/1fBinC5d0gpBnYD9C4M3e0zyxVkGD94LZFzzTCydM2sQ/edit?usp=sharing |
…location in if-else branches
This is possibly the start of a long PR that will evolve, and we should discuss it at every step.
The main idea is as follows (so far in the prototype):
Idea 1:
When an array is created, let's say "A", then also a size array is created "A_size". The Size array is always one-dimensional and contiguously stored. When there is an A_size -> A (IN connector needs to be _size), we trigger a reallocation operation and not a copy operation. I am not sure if we should have an A_size array, I would like to discuss this.
I have a small SDFG that shows what it looks like.
Generated code looks like follows:
And SDFG:
Next Step:
Not 100% of this array will always be used; for example, when calling realloc, only the "__dace_defer" dimensions will be read from the array. And the dimensions that are passed using symbols will be read from the respective symbols (this can't be changed throughout the SDFG). (I am currently working on this step)