Optimize ops.triangular_solve() for 1x1 matrices #564

Merged (2 commits) on Oct 5, 2021

fritzo (Member) commented Oct 5, 2021

Addresses #559

This adds a scalar special case for the torch implementation of ops.triangular_solve. This op is frequently used in Gaussian funsor variable elimination, and the scalar case is very common.
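A minimal sketch of what such a scalar special case might look like; it is not the code merged in this PR. It assumes the `(b, A)` argument order used in the profiling tables, and it uses `torch.linalg.solve_triangular` (available in recent PyTorch releases) for the general case. The standalone function name and the handling of the `upper` and `transpose` keywords are illustrative assumptions.

```python
import torch


def triangular_solve(b, A, upper=False, transpose=False):
    """Solve A x = b (or A^T x = b if transpose=True) for triangular A.

    Assumed shapes: b is (..., n, k) and A is (..., n, n).
    Illustrative sketch only, not funsor's actual implementation.
    """
    if A.size(-1) == 1:
        # A 1x1 triangular matrix is just a scalar and equals its own
        # transpose, so the solve reduces to a broadcasted division.
        return b / A
    if transpose:
        # The transpose of a lower-triangular matrix is upper-triangular,
        # so the `upper` flag flips along with the matrix.
        A = A.transpose(-1, -2)
        upper = not upper
    return torch.linalg.solve_triangular(A, b, upper=upper)
```

With shapes such as `b.shape == (702, 1, 705)` and `A.shape == (702, 1, 1)`, the first branch is a single elementwise division that broadcasts `A` across the last dimension of `b`, so no triangular solver is invoked at all.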

Profiling results

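I've tested on both a profiling jig (big speedup: total triangular_solve time in the tables below drops from 1.36 to 0.036, about 38x) and the full pyro-cov model (minor speedup: 11.14 to 11.75 iters/sec, about 5%). The jig is invoked via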

FUNSOR_PROFILE=100 python -m tests.infer.autoguide.test_gaussian -s 700 -n 10 --cuda --no-jit

The full model is invoked with the jit enabled.
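For readers who want to collect similar per-shape statistics, here is a rough sketch of a shape-keyed profiler using only the standard library. It is an assumption about how such a jig could work, not funsor's profiling code: the names `profile_by_shape`, `format_report`, `CALL_COUNTS`, `CALL_TIMES` and `Shaped` are hypothetical, and the way `FUNSOR_PROFILE` is actually consumed is not modelled here.

```python
import time
from collections import Counter, defaultdict
from functools import wraps

# Hypothetical global accumulators, keyed by the tuple of argument shapes.
CALL_COUNTS = Counter()
CALL_TIMES = defaultdict(float)


def profile_by_shape(fn):
    """Record call counts and wall-clock time per tuple of argument shapes."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        # Arguments without a .shape attribute are recorded as ().
        key = tuple(tuple(getattr(a, "shape", ())) for a in args)
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            # Note: CUDA ops run asynchronously, so accurate GPU timings
            # would need a device synchronization before reading the clock.
            CALL_COUNTS[key] += 1
            CALL_TIMES[key] += time.perf_counter() - start
    return wrapper


def format_report(stats, column, title, fmt):
    """Render one table: a header, a total row, then rows sorted descending."""
    lines = [f"{column:>10} {title}"]
    lines.append(f"{format(sum(stats.values()), fmt):>10} total")
    for key, value in sorted(stats.items(), key=lambda kv: -kv[1]):
        lines.append(f"{format(value, fmt):>10} {key}")
    return "\n".join(lines)


if __name__ == "__main__":
    class Shaped:
        """Hypothetical stand-in for a tensor; only .shape is used."""
        def __init__(self, *shape):
            self.shape = shape

    @profile_by_shape
    def triangular_solve(b, A):
        return b  # placeholder body; only the bookkeeping matters here

    for _ in range(3):
        triangular_solve(Shaped(1, 1), Shaped(1, 1))
    triangular_solve(Shaped(703, 1), Shaped(703, 703))

    title = "triangular_solve(b.shape, A.shape)"
    print(format_report(CALL_COUNTS, "count", title, "d"))
    print(format_report(CALL_TIMES, "time", title, ".6f"))
```

Keying on shapes rather than on call sites makes it easy to see which input sizes dominate the cost, which is how the tables in this PR single out the 1x1 case.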

Before: full model speed = 11.14 iters/sec

     count triangular_solve(b.shape, A.shape)
       682 total
       341 ((1, 1), (1, 1))
        88 ((701, 702, 1, 1), (701, 702, 1, 1))
        88 ((702, 1, 1), (702, 1, 1))
        44 ((50, 701, 1, 1), (50, 701, 1, 1))
        44 ((703, 1), (703, 703))
        22 ((701, 702, 1, 2), (701, 702, 1, 1))
        11 ((50, 701, 1, 2), (50, 701, 1, 1))
        11 ((702, 1, 2), (702, 1, 1))
        11 ((702, 1, 705), (702, 1, 1))
        11 ((703, 3), (703, 703))
        11 ((1, 2), (1, 1))
      time triangular_solve(b.shape, A.shape)
  1.359707 total
  0.889114 ((702, 1, 705), (702, 1, 1))
  0.213148 ((701, 702, 1, 1), (701, 702, 1, 1))
  0.125360 ((702, 1, 1), (702, 1, 1))
  0.050415 ((1, 1), (1, 1))
  0.048711 ((701, 702, 1, 2), (701, 702, 1, 1))
  0.013856 ((50, 701, 1, 1), (50, 701, 1, 1))
  0.009279 ((703, 1), (703, 703))
  0.003506 ((50, 701, 1, 2), (50, 701, 1, 1))
  0.002740 ((703, 3), (703, 703))
  0.001800 ((702, 1, 2), (702, 1, 1))
  0.001777 ((1, 2), (1, 1))

After: full model speed = 11.75 iters/sec

     count triangular_solve(b.shape, A.shape)
       682 total
       341 ((1, 1), (1, 1))
        88 ((701, 702, 1, 1), (701, 702, 1, 1))
        88 ((702, 1, 1), (702, 1, 1))
        44 ((50, 701, 1, 1), (50, 701, 1, 1))
        44 ((703, 1), (703, 703))
        22 ((701, 702, 1, 2), (701, 702, 1, 1))
        11 ((50, 701, 1, 2), (50, 701, 1, 1))
        11 ((702, 1, 2), (702, 1, 1))
        11 ((702, 1, 705), (702, 1, 1))
        11 ((703, 3), (703, 703))
        11 ((1, 2), (1, 1))
      time triangular_solve(b.shape, A.shape)
  0.035639 total
  0.012077 ((1, 1), (1, 1))
  0.009859 ((703, 1), (703, 703))
  0.003245 ((702, 1, 1), (702, 1, 1))
  0.003106 ((701, 702, 1, 1), (701, 702, 1, 1))
  0.003078 ((703, 3), (703, 703))
  0.001585 ((50, 701, 1, 1), (50, 701, 1, 1))
  0.000910 ((701, 702, 1, 2), (701, 702, 1, 1))
  0.000463 ((702, 1, 705), (702, 1, 1))
  0.000459 ((50, 701, 1, 2), (50, 701, 1, 1))
  0.000438 ((702, 1, 2), (702, 1, 1))
  0.000419 ((1, 2), (1, 1))
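
To see where the savings come from, the two timing tables can be split by whether `A` is 1x1. The short script below does that arithmetic on the numbers transcribed from the tables above; the helper names are illustrative.

```python
# Timings transcribed from the "Before" and "After" tables,
# keyed by (b.shape, A.shape).
BEFORE = {
    ((702, 1, 705), (702, 1, 1)): 0.889114,
    ((701, 702, 1, 1), (701, 702, 1, 1)): 0.213148,
    ((702, 1, 1), (702, 1, 1)): 0.125360,
    ((1, 1), (1, 1)): 0.050415,
    ((701, 702, 1, 2), (701, 702, 1, 1)): 0.048711,
    ((50, 701, 1, 1), (50, 701, 1, 1)): 0.013856,
    ((703, 1), (703, 703)): 0.009279,
    ((50, 701, 1, 2), (50, 701, 1, 1)): 0.003506,
    ((703, 3), (703, 703)): 0.002740,
    ((702, 1, 2), (702, 1, 1)): 0.001800,
    ((1, 2), (1, 1)): 0.001777,
}
AFTER = {
    ((1, 1), (1, 1)): 0.012077,
    ((703, 1), (703, 703)): 0.009859,
    ((702, 1, 1), (702, 1, 1)): 0.003245,
    ((701, 702, 1, 1), (701, 702, 1, 1)): 0.003106,
    ((703, 3), (703, 703)): 0.003078,
    ((50, 701, 1, 1), (50, 701, 1, 1)): 0.001585,
    ((701, 702, 1, 2), (701, 702, 1, 1)): 0.000910,
    ((702, 1, 705), (702, 1, 1)): 0.000463,
    ((50, 701, 1, 2), (50, 701, 1, 1)): 0.000459,
    ((702, 1, 2), (702, 1, 1)): 0.000438,
    ((1, 2), (1, 1)): 0.000419,
}


def is_scalar(key):
    """True when the trailing two dimensions of A are 1x1."""
    _b_shape, a_shape = key
    return a_shape[-2:] == (1, 1)


def split_totals(times):
    """Return (total time for 1x1 A, total time for all other A)."""
    scalar = sum(t for k, t in times.items() if is_scalar(k))
    other = sum(t for k, t in times.items() if not is_scalar(k))
    return scalar, other


scalar_before, other_before = split_totals(BEFORE)
scalar_after, other_after = split_totals(AFTER)
print(f"1x1 A:   {scalar_before:.6f} -> {scalar_after:.6f} "
      f"({scalar_before / scalar_after:.1f}x)")
print(f"other A: {other_before:.6f} -> {other_after:.6f}")
```

The 1x1 cases account for nearly all of the time before the change and speed up by roughly 59x in aggregate, while the two `(703, 703)` cases, which do not take the scalar path, are essentially unchanged.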

Tested

  • added a unit test

fritzo requested a review from eb8680 October 5, 2021 16:59
fritzo added the easy label Oct 5, 2021
eb8680 merged commit eb6c077 into master Oct 5, 2021
eb8680 deleted the optimize-triangular-solve branch October 5, 2021 17:52