
[DISCUSS] Introduce structured (sparse) data to TVM. #368

Closed
yzh119 opened this issue Apr 19, 2021 · 2 comments

Comments

@yzh119
Collaborator

yzh119 commented Apr 19, 2021

Note: this RFC is more of a research-idea brainstorm than a discussion of the current status of TIR development.

Halide and TVM decouple the compute and schedule descriptions so that we can focus solely on optimizing schedules to accelerate workloads (Halide auto-scheduler, Ansor). TACO and Taichi, by contrast, decouple the compute and data-format descriptions, and they can generate different kernels for different input data formats (usually described as a tree hierarchy of levels; see the TACO paper).

There was an RFC discussing whether to add TACO's format declaration and sparse tensor data structures to TVM, but it has been inactive for over a year. Without extra schedule-primitive support, the benefit of introducing sparse tensors into TVM is not clear: we can simply maintain several dense tensors, (indptr, indices, val) for the CSR format and (indptr, indices, val, block_size) for the BSR format, define common operators such as SpMM that take these dense tensors as inputs in TVM, and manually write schedules for these operators (e.g. FeatGraph).
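
For concreteness, here is a minimal NumPy sketch (reference code, not TVM) of SpMM written directly against the three CSR arrays; the same loop nest could be expressed in TVM with te.extern/ir_builder plus a hand-written schedule.

```python
import numpy as np

def csr_spmm(indptr, indices, val, dense):
    """SpMM: multiply a CSR matrix of shape [M, K] with a dense [K, N] matrix.

    Pure-NumPy reference, only to show that CSR is nothing more than the
    three dense arrays (indptr, indices, val).
    """
    M = indptr.shape[0] - 1
    out = np.zeros((M, dense.shape[1]), dtype=dense.dtype)
    for i in range(M):
        # Nonzeros of row i live in the half-open range [indptr[i], indptr[i+1]).
        for p in range(indptr[i], indptr[i + 1]):
            out[i, :] += val[p] * dense[indices[p], :]
    return out
```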

Recently, the TACO team published a paper (we refer to it as TACO-sche) on schedule-primitive support for structured data, with a (very limited) auto-scheduler on top of it. The paper is based on TACO. The key idea is that we cannot directly rewrite array-access information, because the transformations depend on the nonzero elements of the sparse matrix; instead, they keep a log of the transformation history (called the provenance graph) and recompute indices, derive loop bounds, and insert iteration guards by inference on the provenance graph (this brings some runtime overhead, but it is necessary for sparse data).

[Figure: provenance graph illustration from the TACO-sche paper]
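
To see why bounds must be re-derived at runtime, consider a hand-written Python sketch (not the paper's algorithm): if we split the position-space loop over a CSR row by a fixed factor, the inner loop's extent and guard depend on indptr, so they can only be recovered by reasoning from the recorded transformation back to the sparse format.

```python
def row_sum_split(indptr, val, i, factor=4):
    """Sum the nonzeros of CSR row i with the position loop split by `factor`.

    The outer extent and the inner guard both depend on indptr[i] and
    indptr[i+1], i.e. on the data, so they must be derived at runtime from
    the transformation history rather than rewritten statically.
    """
    start, end = indptr[i], indptr[i + 1]
    nnz = end - start
    total = 0.0
    for po in range((nnz + factor - 1) // factor):  # outer loop of the split
        for pi in range(factor):                    # inner loop of the split
            p = start + po * factor + pi
            if p < end:                             # runtime iteration guard
                total += val[p]
    return total
```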

The schedule primitives supported in TACO-sche are very similar to Halide/TVM's, except for two: pos and coord. pos binds an itervar to a sparse structure, while coord unbinds an itervar attached to a sparse structure and makes it a free itervar that iterates over the coordinate space. During codegen, extra statements (e.g. binary search for random access on the sparse axes of sparse tensors) are inserted to derive indices and bounds after transformation. The following figure depicts some rules for loop-bound propagation:

[Figure: loop-bound propagation rules from the TACO-sche paper]
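
To make pos/coord concrete, here is a hand-written Python sketch (not TACO output) of traversing one CSR row in the two spaces: position-space iteration walks the stored nonzeros directly, while coordinate-space iteration loops over all column coordinates and needs a binary search into indices plus an iteration guard, which is exactly the kind of statement the codegen inserts.

```python
import bisect

def row_iter_pos(indptr, indices, val, i):
    """Position space: iterate over the stored nonzeros of row i."""
    for p in range(indptr[i], indptr[i + 1]):
        yield indices[p], val[p]               # (column coordinate, value)

def row_iter_coord(indptr, indices, val, i, num_cols):
    """Coordinate space: iterate over every column coordinate of row i and
    binary-search the (sorted) column indices to find stored values."""
    lo, hi = indptr[i], indptr[i + 1]
    for j in range(num_cols):
        p = bisect.bisect_left(indices, j, lo, hi)
        if p < hi and indices[p] == j:         # guard: only stored nonzeros
            yield j, val[p]
```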

The experimental results reported in the TACO-sche paper are not very promising (both SpMM and SDDMM are slower than existing standard libraries). However, their scheduler is very limited (no use of scratchpad memory, no tensorization); moreover, the speed of sparse computation is highly data-dependent, and they did not exploit information about the input data.

I think the provenance graph is a good design, and we can expect much better performance if we exploit a larger schedule space (especially tensorization, though I'm not sure how much sparse algebra benefits from Tensor Cores; I need to run some experiments to verify this point). Note that all the TACO-sche compiler does is transform iteration graphs (analogous to the tensor IR in this project); it does not change the original storage format of each tensor.

A promising direction is to support a format declaration language in TVM (TIR) together with data-dependent auto-tuning and tensorization support. At the Relay level there is still room to optimize, because we can tune the storage format (block size of the block-sparse format, etc.); and since format transformation is costly (it should really be viewed as an operator, like view), we can conduct graph-level optimization to globally optimize an end-to-end workload involving sparse algebra.
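
As a very rough sketch of what a format declaration might look like (hypothetical names in the spirit of TACO's per-dimension level formats; none of this is an existing TVM API):

```python
# Hypothetical sketch, not an existing TVM interface: describe a tensor's
# storage as a sequence of per-dimension level formats, TACO-style.
from typing import NamedTuple, Tuple

class Level(NamedTuple):
    kind: str                  # "dense" or "compressed"

class Format(NamedTuple):
    levels: Tuple[Level, ...]

CSR = Format((Level("dense"), Level("compressed")))
# Block-sparse: compressed blocks, each holding a dense inner tile.
BSR = Format((Level("dense"), Level("compressed"),
              Level("dense"), Level("dense")))

# A Relay-level pass could then tune the Format choice (e.g. the BSR block
# size) and treat format conversions as explicit, costed operators.
```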

Potential applications are GNNs, transformers applied to sparse data structures (the patterns are usually fixed, and I suppose auto-tuning can help us find the best schedule), and physics simulation (see Taichi and AutoTaichi, which support autodiff for differentiable rendering).

This RFC also does not cover codegen for recursive neural networks (e.g. Cortex); though interesting, I don't yet know how to unify the workloads of recursive models and sparse-matrix operations.

@tqchen
Contributor

tqchen commented Apr 19, 2021

I think having sparse data structure support is great. It would be useful to discuss how to embed such iteration constraints and transformations as a special kind of Block in TensorIR, so we can leverage the same infrastructure we have so far (meta-schedule, imperative scheduling, and mix-and-match with dense schedules).

@junrushao
Member

Consolidated to #466
