
Start using DifferentiationInterface.jl #415

Closed
avik-pal opened this issue Apr 24, 2024 · 24 comments · Fixed by #468

Comments

@avik-pal
Member

Seeing Optimization.jl attempt to use DI, it seems like we have most of the features in place. We might be able to drop a lot of unnecessary code by making that shift!

@avik-pal
Member Author

@gdalle I am nearly done integrating DI into NonlinearSolve. The only features I don't know how to use are specifying

  1. A sparse Jacobian as the direct prototype. SparseDiffTools has JacPrototypeSparsityDetection(; jac_prototype).
  2. A precomputed matrix coloring, i.e. PrecomputedJacobianColorvec(; jac_prototype, f.colorvec, partition_by_rows = ADTypes.mode(ad) isa ADTypes.ReverseMode).

@avik-pal
Member Author

Currently the code looks like this:

@inline function __sparsity_detection_alg(f::NonlinearFunction, ad::AutoSparse)
    if f.sparsity === nothing
        if f.jac_prototype === nothing
            # No sparsity information at all: detect it symbolically if the
            # Symbolics extension is loaded, otherwise approximate it.
            is_extension_loaded(Val(:Symbolics)) && return SymbolicsSparsityDetection()
            return ApproximateJacobianSparsity()
        else
            jac_prototype = f.jac_prototype
        end
    elseif f.sparsity isa AbstractSparsityDetection
        # The user supplied a detection algorithm directly; use the prototype
        # only if one was given alongside it.
        f.jac_prototype === nothing && return f.sparsity
        jac_prototype = f.jac_prototype
    elseif f.sparsity isa AbstractMatrix
        jac_prototype = f.sparsity
    elseif f.jac_prototype isa AbstractMatrix
        jac_prototype = f.jac_prototype
    else
        error("`sparsity::typeof($(typeof(f.sparsity)))` & \
               `jac_prototype::typeof($(typeof(f.jac_prototype)))` is not supported. \
               Use `sparsity::AbstractMatrix` or `sparsity::AbstractSparsityDetection` or \
               set to `nothing`. `jac_prototype` can be set to `nothing` or an \
               `AbstractMatrix`.")
    end

    if SciMLBase.has_colorvec(f)
        # Reuse the user-provided color vector; reverse mode colors by rows.
        return PrecomputedJacobianColorvec(; jac_prototype,
            f.colorvec,
            partition_by_rows = ADTypes.mode(ad) isa ADTypes.ReverseMode)
    else
        return JacPrototypeSparsityDetection(; jac_prototype)
    end
end

ApproximateJacobianSparsity would map to DenseSparsityDetector, and I will clean up the above logic a bit, but we do allow users to pass in a prototype matrix and a colorvec.
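
For reference, something like this is what I have in mind for that mapping (a rough sketch: DenseSparsityDetector is DI's detector, but the atol value and the example function are illustrative):

using DifferentiationInterface, ADTypes
import ForwardDiff

# Approximate detection: compute one dense Jacobian and threshold its entries.
detector = DenseSparsityDetector(AutoForwardDiff(); atol = 1e-8)

f(x) = [x[1]^2, x[2] * x[3]]
pattern = ADTypes.jacobian_sparsity(f, rand(3), detector)  # Bool sparsity pattern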

@gdalle
Collaborator

gdalle commented Sep 26, 2024

For passing a known sparsity pattern, you can use ADTypes.KnownJacobianSparsityDetector instead of a detector from Symbolics.jl or SparseConnectivityTracer.jl. Just put it inside AutoSparse and it should work fine.
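
For instance (a sketch; jac_prototype stands in for the user-provided pattern, and the greedy coloring is just a default choice):

using ADTypes, SparseMatrixColorings, SparseArrays
import ForwardDiff

jac_prototype = sparse([1 0 1; 0 1 0])  # stand-in for the user-provided pattern

backend = AutoSparse(
    AutoForwardDiff();  # any dense backend works here
    sparsity_detector = KnownJacobianSparsityDetector(jac_prototype),
    coloring_algorithm = GreedyColoringAlgorithm(),
)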

For passing the color vector I don't have a good answer right now, because this logic is handled by SparseMatrixColorings.jl and the coloring result types there are a bit more complex than just a vector of colors (this is especially true for Hessian coloring, which didn't use to be a thing in SparseDiffTools.jl). Let me think about it a little.

Also, ApproximateJacobianSparsity doesn't exactly map to DI.DenseSparsityDetector but it might be a good drop-in replacement.

@avik-pal
Member Author

avik-pal commented Sep 26, 2024

Thanks. I am fine with some manual combination of SparseMatrixColorings and DI if that is easy. This functionality is not exposed to the user so we have some leeway here.

@gdalle
Collaborator

gdalle commented Sep 26, 2024

The main issue I see is that this code mixes sparsity detection and coloring, like SparseDiffTools does, whereas the AutoSparse interface separates those two. Any idea how to get around that?

@avik-pal
Member Author

The sparsity_detection_alg can be rewritten to return an AutoSparse type. Essentially (a rough sketch follows this list):

  1. ad is a dense type:
    1. if sparsity is provided -> AutoSparse(ad; ...) and some other cases with known sparsity patterns and such
    2. if no sparsity info is provided -> use dense ad
  2. ad is already a sparse type:
    1. ideally there is no sparsity information
    2. if there is in the problem definition, try resolving it (the details here are not super important, but I will document them at some point)
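
A rough sketch of that dispatch (hedged: the helper name and exact branches are illustrative, not the final NonlinearSolve code):

using ADTypes, SparseMatrixColorings
using SciMLBase: NonlinearFunction

function __select_ad(f::NonlinearFunction, ad::ADTypes.AbstractADType)
    # Case 2: already sparse, trust the user's setup.
    ad isa AutoSparse && return ad
    # Case 1.2: no sparsity info, stay dense.
    f.sparsity === nothing && f.jac_prototype === nothing && return ad
    # Case 1.1: wrap the dense backend with the known pattern.
    prototype = f.sparsity isa AbstractMatrix ? f.sparsity : f.jac_prototype
    return AutoSparse(ad;
        sparsity_detector = KnownJacobianSparsityDetector(prototype),
        coloring_algorithm = GreedyColoringAlgorithm())
end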

@avik-pal
Member Author

The only part that doesn't fit in here is that we may already have the colorvec, so in that case we need a way to hand it directly to the AD.

@gdalle
Collaborator

gdalle commented Sep 26, 2024

Maybe I could add a KnownColorsColoringAlgorithm to SparseMatrixColorings, mirroring the KnownJacobianSparsityDetector?

@avik-pal
Member Author

I think that would do it

@ChrisRackauckas
Member

Approximate colorings can be used to give an accelerated Jacobian. Is there a way to do approximate colorings?

Also, using colorings on a dense Jacobian can be useful when the matrix is not sparse enough and sparse storage would slow down the LU. Can you sparse-fill a dense matrix?

@gdalle
Collaborator

gdalle commented Sep 27, 2024

At the moment there is no approximate coloring but I could add it to DI.

As for filling matrices with the results of sparse AD, SparseMatrixColorings should work with any mutable AbstractMatrix. However, it is only optimized for SparseMatrixCSC right now; more structured matrix types could be added in package extensions.
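
To illustrate the dense-fill question concretely (a minimal sketch; the pattern, sizes, and values are made up):

using SparseMatrixColorings, SparseArrays

S = sparse([1 0 1; 0 1 0; 1 0 1])  # sparsity pattern of the Jacobian
problem = ColoringProblem(; structure = :nonsymmetric, partition = :column)
result = coloring(S, problem, GreedyColoringAlgorithm())

B = rand(3, ncolors(result))  # compressed columns, as returned by sparse AD
A = zeros(3, 3)               # dense target matrix
decompress!(A, B, result)     # "sparse fill" of the dense matrix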

@ChrisRackauckas
Member

I think the main case for it is using a chosen color vector with a dense matrix. The reason, again, is when sparse LU is slow on the matrix (sparse matrices really need to be "sufficiently large" for sparse factorization to make sense) but coloring still makes sense to cut down the cost of the Jacobian.

@avik-pal avik-pal pinned this issue Sep 27, 2024
@avik-pal avik-pal linked a pull request Sep 27, 2024 that will close this issue
@avik-pal
Member Author

Is it possible to query DI to get a Jacobian structure (basically the result of DI.jacobian without computing the actual Jacobian)? It is needed to ensure that we can create a concrete cache on init.

Note that similar doesn't work because jacobian might give an SMatrix.

@amontoison

amontoison commented Sep 27, 2024

Currently, we have four variants of coloring in SparseMatrixColorings.jl.
For some of them, we can easily use a user-provided coloring (vector of integers) and perform the decompression without loss of performance.

  • Row coloring (optimal)
  • Column coloring (optimal)
  • Star coloring (possible but slow without the structure StarSet)
  • Acyclic coloring (not possible; the structure TreeSet is needed)

StarSet and TreeSet are structures generated during the coloring.

Star and acyclic coloring are generally used on Hessians, while row and column coloring do not exploit symmetry and are mostly used for Jacobians.
I think just supporting colorvec as input for an expected row or column coloring is the best alternative.
You can use it for both Jacobians and Hessians if you don't want to perform a new coloring.
You will keep all the features from SparseDiffTools.jl.

@gdalle
Collaborator

gdalle commented Sep 27, 2024

> Is it possible to query DI to get a Jacobian structure (basically the result of DI.jacobian without computing the actual Jacobian)? It is needed to ensure that we can create a concrete cache on init.

Not at the moment, no. Are we talking about sparse AD or general AD?
If you're creating a cache, that means you're gonna run jacobian many times, so I think a good strategy is just to actually run it once to get a cache?

> Note that similar doesn't work because jacobian might give an SMatrix.

But then what kind of cache do you want to get in this case? similar will give you an MMatrix, which seems like the only reasonable choice?

@avik-pal
Member Author

> Not at the moment, no. Are we talking about sparse AD or general AD?
> If you're creating a cache, that means you're gonna run jacobian many times, so I think a good strategy is just to actually run it once to get a cache?

Sorry, I meant the cache for NonlinearSolve. I am currently running it once to get the structure, as you mention.

By structure I mean: if the returned J is mutable then DI.jacobian! should work on it, and if it is immutable then I shouldn't be able to call DI.jacobian! at all.

@gdalle
Collaborator

gdalle commented Sep 27, 2024

Then I would recommend you just copy the output of DI.jacobian. But keep in mind that I make no guarantee about the exact return type of the Jacobian matrix, because the backends don't either. So even if you give SVectors as input, with one backend you may get an SMatrix, with another an MMatrix, and with a third one a plain Matrix. The only thing that is guaranteed is the correctness of the values inside.
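
Concretely, the copy-once strategy looks roughly like this (a sketch against the current DI API; f and x stand for the solver's residual function and initial guess):

using DifferentiationInterface
import ForwardDiff

f(x) = x .^ 2  # stand-in for the nonlinear residual
x = rand(4)
backend = AutoForwardDiff()

prep = prepare_jacobian(f, backend, x)    # reusable preparation object
J = copy(jacobian(f, prep, backend, x))   # concrete cache, detached from DI internals

# Later solver iterations: overwrite in place only when the cache is mutable.
ismutable(J) && jacobian!(f, J, prep, backend, x)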

@avik-pal
Member Author

Right, this is what I will do (and am already doing on master). It seems like a fine solution, since users who care that much about the performance of one extra Jacobian call should be using the caching interface anyway.

@gdalle
Collaborator

gdalle commented Sep 27, 2024

@avik-pal does this do what you want in terms of fixing the vector of colors? And is it explained clearly enough?
gdalle/SparseMatrixColorings.jl#127

@avik-pal
Member Author

I think so. Do you need to make the conversion to a sparse array? If you have a BandedMatrix, it seems like a bad idea to make it sparse, because then the jacobian function will presumably return a sparse array instead of a banded matrix, and the linear solve will hit a regression.

@gdalle
Collaborator

gdalle commented Sep 27, 2024

At the moment yes, there will be a conversion to SparseMatrixCSC because custom matrix types (like banded) are not yet supported by SparseMatrixColorings.jl. However, with ConstantColoringAlgorithm, this conversion to SparseMatrixCSC will only happen once (at creation).
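
For reference, a sketch of how it slots into AutoSparse (jac_prototype and colorvec stand for the user-provided data; use :row instead of :column for reverse mode, mirroring partition_by_rows above):

using ADTypes, SparseMatrixColorings, SparseArrays
import ForwardDiff

jac_prototype = sparse([1 0 1; 0 1 0])  # user-provided pattern
colorvec = [1, 1, 2]                    # user-provided column colors

backend = AutoSparse(AutoForwardDiff();
    sparsity_detector = KnownJacobianSparsityDetector(jac_prototype),
    coloring_algorithm = ConstantColoringAlgorithm{:column}(jac_prototype, colorvec))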

What matters to me right now is getting the necessary functionality up and running, then we can see what the benchmarks tell us about performance regressions. But a fair warning that may not be obvious to everyone: when switching to the DI ecosystem, there will be performance regressions in some cases, and performance improvements in others. A good example is the Brusselator AD benchmark, where sparsity detection is 100x faster, coloring 10x faster, and differentiation 2x slower (although this has probably improved with the latest DI and SMC releases, I'll have to run it again).
To adapt to your use cases and milk the last drops of performance, first we need to get DI into NonlinearSolve, and then we'll identify the bottlenecks together. Most functionalities in DI have never been benchmarked on realistic use cases, so I'll definitely need your help for that once the switch passes correctness tests.

@avik-pal
Member Author

Let me make a branch using DI for sparse jacobians, but I will hold off on merging that for now.

@gdalle
Collaborator

gdalle commented Sep 28, 2024

FYI the new version of SparseMatrixColorings that includes ConstantColoringAlgorithm is out

@gdalle
Collaborator

gdalle commented Oct 1, 2024

I re-ran the Brusselator sparse AD benchmark with the latest versions of everything, and DI is now on par with SparseDiffTools: https://docs.sciml.ai/SciMLBenchmarksOutput/dev/AutomaticDifferentiationSparse/BrusselatorSparseAD/

@avik-pal avik-pal linked a pull request Oct 2, 2024 that will close this issue
@avik-pal avik-pal unpinned this issue Oct 3, 2024