Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sign in adjoint derivative calculation #35

Open
spenrich opened this issue May 18, 2020 · 3 comments
Open

Sign in adjoint derivative calculation #35

spenrich opened this issue May 18, 2020 · 3 comments

Comments

@spenrich
Copy link

I've been following the paper "Differentiating Through a Cone Program" and the code side-by-side, and I'm having trouble figuring out if there is a sign error in the adjoint derivative code or if I've misunderstood something.

dw = -(x @ dx + y @ dy + s @ ds)
dz = np.concatenate(
[dx, D_proj_dual_cone.rmatvec(dy + ds) - ds, np.array([dw])])
if np.allclose(dz, 0):
r = np.zeros(dz.shape)
elif mode == "dense":
r = _diffcp._solve_adjoint_derivative_dense(M, MT, dz)
else:
r = _diffcp.lsqr(MT, dz).solution
values = pi_z[cols] * r[rows + n] - pi_z[n + rows] * r[cols]
dA = sparse.csc_matrix((values, (rows, cols)), shape=A.shape)
db = pi_z[n:n + m] * r[-1] - pi_z[-1] * r[n:n + m]
dc = pi_z[:n] * r[-1] - pi_z[-1] * r[:n]
return dA, db, dc

It seems like, when compared to the paper, the code solves M.T @ r = dz for r, whereas the paper solves M.T @ g = -dz for g. So r = -g. But then the equations used in the code to compute (dA, db, dc) seem to match those in the paper, when they should all differ by a negative sign.

Similarly, for the forward-mode derivative, you solve M @ dz = dQ @ pi_z for dz, use the same equations as in the paper despite the sign difference, but you multiply (dx, dy, dz) by -1 before returning, so this is fine.

Is this a sign error in the adjoint derivative, or did I get something wrong?

@spenrich spenrich changed the title Possible sign error in adjoint derivative calculation Sign in adjoint derivative calculation May 18, 2020
@sbarratt
Copy link
Collaborator

Good question!

The minus sign makes its way to this line:

values = pi_z[cols] * r[rows + n] - pi_z[n + rows] * r[cols]

Because otherwise it would be

values = -pi_z[cols] * r[rows + n] + pi_z[n + rows] * r[cols] 

Sorry that it's a bit confusing. We should probably just move it up to the solve line. This minus sign was certainly a pain for us when we were initially debugging.

@spenrich
Copy link
Author

I've included some example code (test_diffcp_vjp.txt), where I've just copied in solve_and_derivative_internal and edited the returned DT function to also output r and pi_z. I then run this on the cone program from the readme, explicitly compute -dQ_12.T + dQ_21 (from the paper) with dQ = np.outer(g, pi_z) (where g = -r), and compare this to the value of dA given by solve_and_derivative_internal. For a fixed random seed, I get:

-dQ_12.T + dQ_21 =
[[-2.8563  0.7865 -1.6522  0.0864  1.2953]
 [-0.6479  0.1784 -0.3748  0.0196  0.2938]
 [-1.5848  0.4364 -0.9167  0.0479  0.7187]
 [-0.      0.     -0.      0.      0.    ]
 [ 0.     -0.      0.     -0.     -0.    ]
 [ 1.5824 -0.4357  0.9153 -0.0479 -0.7176]
 [-0.      0.     -0.      0.      0.    ]
 [ 0.     -0.      0.     -0.     -0.    ]
 [ 0.     -0.      0.     -0.     -0.    ]
 [ 0.     -0.      0.     -0.     -0.    ]
 [ 0.     -0.      0.     -0.     -0.    ]]

dA from solve_and_derivative_internal =
[[ 2.8563 -0.7865  1.6522 -0.0864 -1.2953]
 [ 0.6479 -0.1784  0.3748 -0.0196 -0.2938]
 [ 1.5848 -0.4364  0.9167 -0.0479 -0.7187]
 [ 0.     -0.      0.     -0.     -0.    ]
 [-0.      0.     -0.      0.      0.    ]
 [-1.5824  0.4357 -0.9153  0.0479  0.7176]
 [ 0.     -0.      0.     -0.     -0.    ]
 [-0.      0.     -0.      0.      0.    ]
 [-0.      0.     -0.      0.      0.    ]
 [-0.      0.     -0.      0.      0.    ]
 [-0.      0.     -0.      0.      0.    ]]

There seems to be a difference in sign. Am I computing dQ correctly here?

@sbarratt sbarratt reopened this May 20, 2020
@sbarratt
Copy link
Collaborator

It looks like the paper is wrong but the code is right! We're in the process of updating the equations for dA, db, and dc. They should be

image

Thanks so much for finding this and contacting us about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants