Using Adams2019 autoscheduler for device offload cases #5239
shubhamp-ca started this conversation in Ideas
-
@abadams Any thoughts?
-
There is an effort underway for the GPU, but it targets GPU-only schedules, so offloading isn't really considered. In that work the search space, featurization, and model are all somewhat different from those for the CPU. I don't think there's anything useful to transfer over to a Hexagon autoscheduler that tries to do heterogeneous compute.
-
I am working on a change in apps/autoscheduler that enables emitting a schedule for the Hexagon offload case. The user marks the stages to be offloaded with the ".hexagon()" directive, and the autoscheduler then picks the appropriate vector width for each stage depending on whether it is offloaded or not. Note that the autoscheduler may also offload a stage the user did not explicitly mark, if that stage is computed within the loop nest of a stage the user did offload.
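To illustrate the kind of pipeline this targets, here is a minimal sketch (assumed Func/Var names and target features, not the patch itself) of a stage offloaded with .hexagon() and a producer that gets offloaded implicitly because it is computed inside that stage's loop nest:

```cpp
// Minimal sketch: blur_y is explicitly offloaded with .hexagon(); blur_x is
// implicitly offloaded because it is scheduled inside blur_y's loop nest.
#include "Halide.h"
using namespace Halide;

int main() {
    ImageParam input(UInt(16), 2, "input");
    Func blur_x("blur_x"), blur_y("blur_y");
    Var x("x"), y("y");

    blur_x(x, y) = (input(x, y) + input(x + 1, y) + input(x + 2, y)) / 3;
    blur_y(x, y) = (blur_x(x, y) + blur_x(x, y + 1) + blur_x(x, y + 2)) / 3;

    // User-directed offload: everything at and inside blur_y's outermost loop
    // runs on the Hexagon DSP.
    blur_y.compute_root().hexagon().vectorize(x, 64);

    // blur_x is computed inside blur_y's loop nest, so it is offloaded as
    // well, even though .hexagon() was never called on it.
    blur_x.compute_at(blur_y, y).vectorize(x, 64);

    // An HVX feature must be present in the target for offload to happen;
    // the exact feature name (HVX_128 vs. HVX) depends on the Halide version.
    Target target = get_host_target().with_feature(Target::HVX_128);
    blur_y.compile_to_file("blur_hvx", {input}, "blur_hvx", target);
    return 0;
}
```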
The change does not modify the featurization to represent device features separately, and it uses a single weights file for the cost model rather than potentially different weights for host and device. Autotuning performance is good with this change, but results are weaker when the autoscheduler is used in direct mode, since Hexagon-trained weights aren't available. I am curious whether there is already an effort underway for other device offload cases (GPU?), and how that is implemented, if at all. Any ideas for implementing device offload in the autoscheduler would be very helpful. If there is interest, I will be happy to open a PR for this change.
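For concreteness, the offload-aware vector-width choice mentioned above could look roughly like the following. This is an illustrative sketch only: the helper name and the offloaded_to_hexagon flag are hypothetical, not part of the actual change or of the Adams2019 autoscheduler API.

```cpp
// Hypothetical helper: choose a vector width for a stage depending on whether
// it is offloaded to Hexagon or stays on the host.
#include "Halide.h"

int pick_vector_width(const Halide::Target &target,
                      const Halide::Type &t,
                      bool offloaded_to_hexagon) {
    if (offloaded_to_hexagon) {
        // In 128-byte HVX mode, vectors hold e.g. 64 lanes of uint16.
        return 128 / t.bytes();
    }
    // Otherwise use the host's natural vector size, e.g. 16 lanes of uint16
    // on an AVX2 CPU.
    return target.natural_vector_size(t);
}
```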