PArray coherence issue #149

mebenstein · 2023-04-16T04:05:46Z

When a reduce operation is performed on a PArray, the data seen by follow-up operations is inconsistent or None.
The following example without a reduction works fine:

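# Imports assumed from the Parla tutorials; they were not part of the original snippet.
import numpy as np
from parla import Parla
from parla.cuda import gpu
from parla.tasks import spawn, TaskSpace
from parla.parray import asarray

def main():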
    t = TaskSpace("tasks")
    a = TaskSpace("acc")

    n  = 4
    arr = asarray(np.zeros((n,256,256)))

    for i in range(n):
        @spawn(t[i], output=[arr[i]],placement=gpu)
        def task_a():
            arr[i] = 1

    for i in range(n):
        @spawn(t[n+i],dependencies=[t[:n]], input=[arr[i]],placement=gpu)
        def task_b():
            print(arr[i].mean())
    
if __name__ == "__main__":
    with Parla():
        main()

The outputs are 1.0, 1.0, 1.0, 1.0.

Adding a reduce operation leads to wrong values, often 0.5, 0.0, 0.0, 0.0:

def main():
    t = TaskSpace("tasks")
    a = TaskSpace("acc")

    n  = 4
    arr = asarray(np.zeros((n,256,256)))

    for i in range(n):
        @spawn(t[i], output=[arr[i]],placement=gpu)
        def task_a():
            arr[i] = 1

    @spawn(a[0],dependencies=[t[:n]], input=[arr],placement=gpu)
    def acc():
        print(arr.mean())

    for i in range(n):
        @spawn(t[n+i],dependencies=[a[0]], input=[arr[i]],placement=gpu)
        def task_b():
            print(arr[i].mean())
    
if __name__ == "__main__":
    with Parla():
        main()

When the acc task binds the parameter via inout instead, the values are sometimes None and cause runtime exceptions. This only happens on GPU, not on CPU.
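For reference, the values the tasks are expected to print can be checked with plain NumPy, without Parla or a GPU. This is only a sketch of the arithmetic: the `partial` array is a hypothetical illustration of how a mean of 0.5 can arise when two of the four slices are still zero, and is not a diagnosis of what PArray actually does.

```python
import numpy as np

n = 4
arr = np.zeros((n, 256, 256))

# Equivalent of the task_a loop: every slice is fully written.
for i in range(n):
    arr[i] = 1

# What acc() and each task_b should print when the data is coherent.
whole_mean = float(arr.mean())
slice_means = [float(arr[i].mean()) for i in range(n)]

# Illustration only: the mean when just two of the four slices hold ones.
partial = np.zeros((n, 256, 256))
partial[:2] = 1
partial_mean = float(partial.mean())

print(whole_mean, slice_means, partial_mean)
```

Comparing the Parla output against these numbers makes it easy to tell a stale read (0.0 or a fractional mean) from a missing buffer (None).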


yinengy · 2023-04-16T04:21:17Z

Thanks for reporting this issue. Fine-grained slicing in this branch has a known bug that has been fixed in experiment-parla but has not been synced back to this repo yet. I will make a PR to bring the patch back.

yinengy · 2023-04-16T10:14:55Z

I have created #150, which fixes coherence bugs in PArray.

What's more, your second example doesn't work because it violates a PArray restriction: multiple overlapping subarrays cannot be moved around the system without a writeback. This also covers the same subarray on different devices (e.g. task_a might create arr[0] on GPU 0, but task_b will read arr[0] on GPU 1). Lifting this restriction is a TODO that will be supported in the next Parla release, but not yet.

So to make your example work in the current version, you need a writeback task that writes task_a's changes back to the CPU before task_b begins. It should look like this:

def main():
    t = TaskSpace("tasks")
    a = TaskSpace("acc")

    n  = 4
    arr = asarray(np.zeros((n,256,256)))

    for i in range(n):
        @spawn(t[i], output=[arr[i]],placement=gpu)
        def task_a():
            arr[i] = 1

    @spawn(a[0],dependencies=[t[:n]], inout=[arr],placement=gpu)  # here, inout is required to trigger writeback
    def writeback():
        pass # do nothing

    for i in range(n):
        @spawn(t[n+i],dependencies=[a[0]], input=[arr[i]],placement=gpu)
        def task_b():
            print(arr[i].mean())
    
if __name__ == "__main__":
    with Parla():
        main()
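To make the term "overlapping subarrays" concrete, here is a minimal sketch in plain Python that checks whether two first-axis indices select a common row. The helpers `rows_touched` and `overlaps` are hypothetical names invented for this illustration; this is not how PArray detects overlaps internally, only a way to see why `arr` and `arr[i]` count as overlapping while `arr[0]` and `arr[1]` do not.

```python
def rows_touched(index, n):
    """Return the set of first-axis rows that `index` selects out of n rows."""
    selected = range(n)[index]  # raises IndexError for an out-of-range int
    if isinstance(selected, int):
        return {selected}
    return set(selected)


def overlaps(a, b, n):
    """True when two first-axis indices select at least one common row."""
    return bool(rows_touched(a, n) & rows_touched(b, n))


n = 4
whole = slice(None)  # stands in for passing `arr` itself

print(overlaps(0, 1, n))      # two distinct slices
print(overlaps(whole, 2, n))  # the whole array versus one slice
print(overlaps(3, 3, n))      # the same slice requested twice
```

In the examples above, the acc and writeback tasks take the whole array while task_a and task_b take single slices, so every slice overlaps with the array used by the middle task.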

yinengy · 2023-04-16T21:22:03Z

The changes have been merged into the main branch, and the tutorial has been updated accordingly. I will leave the issue open as an enhancement request to remove the requirement for a writeback task.

mebenstein · 2023-04-18T05:34:32Z

The merge does not resolve the issue, even with a writeback. Some objects are still None and the output is essentially the same.
Also, I think someone might have left some debug messages in the code because the program now prints the following:

write: NA::subarray::[0]
read: NA::subarray::[2]
write: NA::subarray::[3]
read: NA::subarray::[0]
write: NA::subarray::[1]
read: NA::subarray::[3]
read: NA::subarray::[1]
write: NA::subarray::[2]
write: NA
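The debug lines above can be tallied per subarray to see which slices were read and written. The sketch below assumes the line format is exactly what these nine lines show (`read` or `write`, then `NA`, optionally followed by `::subarray::[k]`); the meaning of `NA` is not stated in the thread, and the `tally` helper is a hypothetical name.

```python
import re
from collections import defaultdict

LOG = """\
write: NA::subarray::[0]
read: NA::subarray::[2]
write: NA::subarray::[3]
read: NA::subarray::[0]
write: NA::subarray::[1]
read: NA::subarray::[3]
read: NA::subarray::[1]
write: NA::subarray::[2]
write: NA
"""

# Format inferred from the log above, not from any Parla documentation.
LINE = re.compile(r"^(read|write): NA(?:::subarray::\[(\d+)\])?$")


def tally(log):
    """Count read and write events per subarray index ('whole' if no index)."""
    counts = defaultdict(lambda: {"read": 0, "write": 0})
    for line in log.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip anything that does not look like a debug line
        op, idx = match.groups()
        key = "whole" if idx is None else int(idx)
        counts[key][op] += 1
    return dict(counts)


counts = tally(LOG)
for key in sorted(counts, key=str):
    print(key, counts[key])
```

For this log, each of the four subarrays shows one write and one read, and the final line is a single write on the whole array.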

yinengy · 2023-04-18T05:39:54Z

My bad, I have removed the debug output; please pull the changes.

I have reproduced the bug, and it looks like it is still present in the current version of Parla but not in the new Parla. I will look into it.

nicelhc13 assigned yinengy Apr 16, 2023

yinengy added the bug label Apr 16, 2023

yinengy linked a pull request Apr 16, 2023 that will close this issue

import new parray code from parla-experiment main #150

Merged

yinengy removed a link to a pull request Apr 16, 2023

import new parray code from parla-experiment main #150

Merged

yinengy added the enhancement label and removed the bug label Apr 16, 2023

yinengy added the bug label Apr 18, 2023
