[AMD] Improve ConvertToBufferOps with range analysis (#5563)
This PR adapts upstream's `IntegerRangeAnalysis` in order to infer non-negativity for various ops. Specifically, we extend/adapt upstream's inference to shaped types (specifically `tensor` here) in two ways:

1. Support `GetProgramIdOp`, `MakeRangeOp`, `SplatOp`, and `ExpandDimsOp`:
   * `GetProgramIdOp` receives the range `[0, 65536]` for now;
   * `MakeRangeOp` receives the range `[start, end]`;
   * `SplatOp` receives a range equal to its operand's (if the operand itself has an inferred range);
   * `ExpandDimsOp` likewise.
2. Support inference in the body of a `LoopLikeOpInterface` op with statically known bounds:
   * This is accomplished by essentially driving `SparseForwardDataFlowAnalysis` to behave like an abstract interpreter, i.e., each loop body (again: given statically known loop bounds) is visited/propagated `loopTripCount` times **for each argument lattice**;
   * for nested loops, each body is visited/propagated `prod_i(loopTripCount_i)` times, where `i` ranges over the enclosing loops;
   * **Note**: for loops with non-statically-known bounds **and loops with trip counts larger than `kDefaultMaxTripCount`** (1024), we fall back to the upstream behavior, i.e., loop arg lattices and body value lattices are short-circuited ("snapped") to `IntegerValueRange::getMaxRange`.

The net benefit is that `buffer_load` is now inferred in additional cases; see the `forOpWithHints`, `condBranch`, and `select` tests in the checked-in lit test. Note that the checked-in test runs "downstream" of `canonicalize-pointers`, because `canonicalize-pointers` is a prerequisite for effectively using this new functionality: the range analysis operates on `tensor<...xi64>` or `tensor<...xi32>`. Note also that the checked-in lit test exercises both the range analysis (verifying that `__amdgpuconvertbufferops.output_range` attributes are preserved) and the buffer ops inference.
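To make the loop-body behavior concrete, here is a minimal Python sketch (not the actual MLIR C++ pass; all names and the transfer-function shape are hypothetical) of propagating an integer interval through a loop with statically known bounds. Instead of immediately widening loop-carried values to the maximum range, the body's transfer function is applied once per trip, joining lattices as it goes, with a fallback to the max range when the trip count is unknown or exceeds the cap:

```python
# Illustrative model of trip-count-bounded range propagation.
# Names below (join, infer_loop_range, body_transfer) are hypothetical.

KMAX_TRIP_COUNT = 1024  # stand-in for kDefaultMaxTripCount
MAX_RANGE = (-(2**63), 2**63 - 1)  # stand-in for IntegerValueRange::getMaxRange


def join(a, b):
    """Lattice join: the smallest interval containing both intervals."""
    return (min(a[0], b[0]), max(a[1], b[1]))


def infer_loop_range(lb, ub, step, init_range, body_transfer):
    """Propagate a range through a loop over [lb, ub) with stride `step`.

    The body is "visited" once per iteration (bounded abstract
    interpretation); if the trip count is too large or ill-formed, we
    snap to the conservative max range, mirroring the fallback above.
    """
    if step <= 0:
        return MAX_RANGE
    trip_count = max(0, (ub - lb + step - 1) // step)
    if trip_count > KMAX_TRIP_COUNT:
        return MAX_RANGE  # conservative upstream behavior
    r = init_range
    for _ in range(trip_count):
        r = join(r, body_transfer(r))
    return r


# Example: accumulator starts in [0, 0]; each iteration adds a value
# known to lie in [1, 4]. Over 8 iterations the inferred range stays
# non-negative, which is the property buffer-op conversion needs.
acc = infer_loop_range(0, 8, 1, (0, 0), lambda r: (r[0] + 1, r[1] + 4))
# acc == (0, 32): lower bound never drops below zero.
```

The key design point mirrored here is that precision is bought with a bounded number of body visits: the join never loses the non-negative lower bound, whereas the classic fixpoint-with-widening approach would discard it for loop-carried values.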