Using Adams2019 autoscheduler for device offload cases #5239
shubhamp-ca started this conversation in Ideas
-
@abadams Any thoughts?
-
There is an effort underway for the GPU, but it targets GPU-only schedules, so offloading isn't really considered. In that work the search space, featurization, and model are all somewhat different from those for the CPU. I don't think there's anything useful to transfer over to a Hexagon autoscheduler that tries to do heterogeneous compute.
-
I am working on a change in apps/autoscheduler that enables emitting a schedule for the Hexagon offload case. The user marks the stages to be offloaded with the ".hexagon()" directive, and the autoscheduler then picks the appropriate vector width for each stage depending on whether it is offloaded or not. Note that the autoscheduler may also offload a stage the user did not explicitly mark, if that stage is computed within the loop nest of a stage the user did offload.
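To illustrate the kind of pipeline this targets, here is a minimal sketch (assumed Func/Var names and target features, not the patch itself) of a stage offloaded with .hexagon() and a producer that gets offloaded implicitly because it is computed inside that stage's loop nest:

```cpp
// Minimal sketch: blur_y is explicitly offloaded with .hexagon(); blur_x is
// implicitly offloaded because it is scheduled inside blur_y's loop nest.
#include "Halide.h"
using namespace Halide;

int main() {
    ImageParam input(UInt(16), 2, "input");
    Func blur_x("blur_x"), blur_y("blur_y");
    Var x("x"), y("y");

    blur_x(x, y) = (input(x, y) + input(x + 1, y) + input(x + 2, y)) / 3;
    blur_y(x, y) = (blur_x(x, y) + blur_x(x, y + 1) + blur_x(x, y + 2)) / 3;

    // User-directed offload: everything at and inside blur_y's outermost loop
    // runs on the Hexagon DSP.
    blur_y.compute_root().hexagon().vectorize(x, 64);

    // blur_x is computed inside blur_y's loop nest, so it is offloaded as
    // well, even though .hexagon() was never called on it.
    blur_x.compute_at(blur_y, y).vectorize(x, 64);

    // An HVX feature must be present in the target for offload to happen;
    // the exact feature name (HVX_128 vs. HVX) depends on the Halide version.
    Target target = get_host_target().with_feature(Target::HVX_128);
    blur_y.compile_to_file("blur_hvx", {input}, "blur_hvx", target);
    return 0;
}
```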
The change does not modify the featurization to represent device features separately, and it uses a single weights file for the cost model rather than potentially different weights for host and device. Autotuning performance is good with this change, but results are weaker when the autoscheduler is used in direct mode, since Hexagon-trained weights aren't available. I am curious whether there is already an effort underway for other device offload cases (GPU?), and how that is implemented, if at all. Any ideas for implementing device offload in the autoscheduler would be very helpful. If there is interest, I will be happy to open a PR for this change.
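For concreteness, the offload-aware vector-width choice mentioned above could look roughly like the following. This is an illustrative sketch only: the helper name and the offloaded_to_hexagon flag are hypothetical, not part of the actual change or of the Adams2019 autoscheduler API.

```cpp
// Hypothetical helper: choose a vector width for a stage depending on whether
// it is offloaded to Hexagon or stays on the host.
#include "Halide.h"

int pick_vector_width(const Halide::Target &target,
                      const Halide::Type &t,
                      bool offloaded_to_hexagon) {
    if (offloaded_to_hexagon) {
        // In 128-byte HVX mode, vectors hold e.g. 64 lanes of uint16.
        return 128 / t.bytes();
    }
    // Otherwise use the host's natural vector size, e.g. 16 lanes of uint16
    // on an AVX2 CPU.
    return target.natural_vector_size(t);
}
```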