PArray coherence issue #149

Open
mebenstein opened this issue Apr 16, 2023 · 5 comments
Labels: bug (Something isn't working), enhancement (New feature or request)

Comments

mebenstein commented

When performing a reduction over a PArray, the data seen by follow-up tasks is inconsistent or None.
The following example, without a reduction, works fine:

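```python
import numpy as np

# Import paths assume the Parla.py layout at the time of this issue.
from parla import Parla
from parla.cuda import gpu
from parla.parray import asarray
from parla.tasks import spawn, TaskSpace


def main():
    t = TaskSpace("tasks")
    a = TaskSpace("acc")

    n = 4
    arr = asarray(np.zeros((n, 256, 256)))

    # Each task_a writes one slice arr[i] on a GPU.
    for i in range(n):
        @spawn(t[i], output=[arr[i]], placement=gpu)
        def task_a():
            arr[i] = 1

    # Each task_b reads its slice once all task_a tasks have finished.
    for i in range(n):
        @spawn(t[n + i], dependencies=[t[:n]], input=[arr[i]], placement=gpu)
        def task_b():
            print(arr[i].mean())


if __name__ == "__main__":
    with Parla():
        main()
```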

The outputs are 1.0, 1.0, 1.0, 1.0.

Adding a reduction step leads to wrong values, often 0.5, 0.0, 0.0, 0.0:

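```python
import numpy as np

# Import paths assume the Parla.py layout at the time of this issue.
from parla import Parla
from parla.cuda import gpu
from parla.parray import asarray
from parla.tasks import spawn, TaskSpace


def main():
    t = TaskSpace("tasks")
    a = TaskSpace("acc")

    n = 4
    arr = asarray(np.zeros((n, 256, 256)))

    for i in range(n):
        @spawn(t[i], output=[arr[i]], placement=gpu)
        def task_a():
            arr[i] = 1

    # Reduction over the whole array once every slice has been written.
    @spawn(a[0], dependencies=[t[:n]], input=[arr], placement=gpu)
    def acc():
        print(arr.mean())

    for i in range(n):
        @spawn(t[n + i], dependencies=[a[0]], input=[arr[i]], placement=gpu)
        def task_b():
            print(arr[i].mean())


if __name__ == "__main__":
    with Parla():
        main()
```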

When the acc task binds arr via inout instead of input, the values are sometimes None and raise runtime exceptions. This happens only on GPU, not on CPU.

yinengy (Contributor) commented Apr 16, 2023

Thanks for reporting this issue. Fine-grained slicing in this branch has a known bug that has been fixed in experiment-parla but has not been synced back to this repo yet. I will make a PR to bring the patch back.

yinengy added the bug (Something isn't working) label Apr 16, 2023
yinengy linked a pull request Apr 16, 2023 that will close this issue
yinengy (Contributor) commented Apr 16, 2023

I created #150, which fixes the coherence bugs in PArray.

What's more, your second example doesn't work because it violates a PArray restriction: multiple overlapping subarrays cannot be moved around the system without a writeback. This also covers the same subarray on different devices (e.g., task_a might create arr[0] on GPU 0, but task_b reads arr[0] onto GPU 1). Support for this is planned for the next Parla release but is not available yet.
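The restriction above can be sketched as a simple overlap check. This is a hypothetical model for illustration only, not Parla's implementation: it assumes a subarray is identified by a half-open range of rows along the first axis, that device names such as "gpu0" are plain strings, and that `check_move`, `row`, and `whole` are made-up helper names.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class Region:
    """Hypothetical: rows [start, stop) along the first axis of arr."""
    start: int
    stop: int

    def overlaps(self, other: "Region") -> bool:
        return self.start < other.stop and other.start < self.stop


def row(i: int) -> Region:
    return Region(i, i + 1)  # models arr[i]


def whole(n: int) -> Region:
    return Region(0, n)  # models arr


def check_move(live: Set[Tuple[Region, str]], region: Region, device: str) -> Optional[str]:
    """Return why moving `region` to `device` breaks the restriction, or None.

    `live` holds (region, device) copies created by earlier tasks that have
    not been written back to the CPU yet.
    """
    for other, other_dev in live:
        if other == region and other_dev == device:
            continue  # the same copy already lives on this device
        if other.overlaps(region):
            return f"{region} on {device} overlaps {other} on {other_dev}"
    return None


# task_a leaves row i on an (assumed) GPU, with no writeback in between.
live = {(row(i), f"gpu{i % 2}") for i in range(4)}

print(check_move(live, whole(4), "gpu0"))  # acc reading all of arr: conflict
print(check_move(live, row(0), "gpu1"))    # arr[0] read on another GPU: conflict
print(check_move(live, row(0), "gpu0"))    # same copy, same device: None
```

Both the whole-array read in acc and a cross-device read of the same slice are flagged, which matches the two cases described in the comment.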

So to make your example work in the current version, you need a writeback task that writes task_a's changes back to the CPU before task_b begins, for example:

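```python
import numpy as np

# Import paths assume the Parla.py layout at the time of this issue.
from parla import Parla
from parla.cuda import gpu
from parla.parray import asarray
from parla.tasks import spawn, TaskSpace


def main():
    t = TaskSpace("tasks")
    a = TaskSpace("acc")

    n = 4
    arr = asarray(np.zeros((n, 256, 256)))

    for i in range(n):
        @spawn(t[i], output=[arr[i]], placement=gpu)
        def task_a():
            arr[i] = 1

    # inout (not input) is required here to trigger the writeback.
    @spawn(a[0], dependencies=[t[:n]], inout=[arr], placement=gpu)
    def writeback():
        pass  # the task body does nothing; the binding does the work

    for i in range(n):
        @spawn(t[n + i], dependencies=[a[0]], input=[arr[i]], placement=gpu)
        def task_b():
            print(arr[i].mean())


if __name__ == "__main__":
    with Parla():
        main()
```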
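The role of the writeback task in the corrected example can be sketched with a second hypothetical model (not Parla's implementation) that records which device holds the up-to-date copy of each row. The writer placement `gpu{i % 2}` and reader placement `gpu{(i + 1) % 2}` are assumptions chosen to force cross-device reads, `find_conflicts` is a made-up name, and the model simplifies away the fact that the writeback task itself runs on a GPU.

```python
def find_conflicts(n: int, with_writeback: bool) -> list:
    """Replay the example's schedule and return the reads that would move a
    row between GPUs without passing through a CPU writeback."""
    # Hypothetical model: which device holds the up-to-date copy of each row.
    owner = {i: "cpu" for i in range(n)}

    # task_a: row i is written on an (assumed) GPU, which now owns it.
    for i in range(n):
        owner[i] = f"gpu{i % 2}"

    # Writeback task: binding the whole arr via inout brings every row's
    # latest data back to the CPU copy.
    if with_writeback:
        for i in range(n):
            owner[i] = "cpu"

    # task_b: row i is read on a different (assumed) GPU than its writer.
    conflicts = []
    for i in range(n):
        reader = f"gpu{(i + 1) % 2}"
        if owner[i] not in ("cpu", reader):
            conflicts.append((i, owner[i], reader))
    return conflicts


print(find_conflicts(4, with_writeback=False))  # every row conflicts
print(find_conflicts(4, with_writeback=True))   # []
```

Without the writeback every task_b read pulls a dirty row from another GPU; with it, all reads are served from the CPU copy.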

yinengy added the enhancement (New feature or request) label and removed the bug (Something isn't working) label Apr 16, 2023
yinengy (Contributor) commented Apr 16, 2023

The changes have been merged into the main branch, and the tutorial has been updated accordingly. I will leave this issue open as an enhancement request to remove the need for a writeback task.

mebenstein (Author) commented
The merge does not resolve the issue, even with a writeback: some objects are still None, and the output is essentially the same.
Also, I think some debug messages were left in the code, because the program now prints the following:

```
write: NA::subarray::[0]
read: NA::subarray::[2]
write: NA::subarray::[3]
read: NA::subarray::[0]
write: NA::subarray::[1]
read: NA::subarray::[3]
read: NA::subarray::[1]
write: NA::subarray::[2]
write: NA
```

yinengy (Contributor) commented Apr 18, 2023

My bad, I have removed the debug strings; please pull the changes.

I have reproduced the bug. It looks like it is still present in the current version of Parla, but not in the new Parla. I will look into it.

yinengy added the bug (Something isn't working) label Apr 18, 2023