
JOSS review #21

Open
gdalle opened this issue Nov 10, 2024 · 4 comments

gdalle commented Nov 10, 2024

Hi, and congrats on your JOSS submission! I'm one of your reviewers and I'll be writing my remarks below as the review progresses. You can find my checklist here.

Paper

  • The paper is longer than 1000 words, which is the recommended limit. However, I don't think shortening it would be helpful, because it does contain useful mathematical background.
  • There are missing and invalid DOIs among the references.
  • L15: The right citation for the Julia language is usually the SIAM Review paper, not the arXiv preprint you cited.
  • L30-32: How do the following ingredients come into play inside your package? This is not at all explained in the paper.
    • symbolic differentiation
    • optimization
    • numerically stable computation
  • L30-32: Why do you use symbolic differentiation (with Symbolics.jl) and not algorithmic differentiation (e.g. with Zygote.jl or Enzyme.jl)?
  • L32: The citation you picked for "numerically stable computation" does not seem related to Julia but to R? In fact, I would assume that this third ingredient is widely available in other languages as well.
  • L33: What kind of benchmarks allow you to state that "ExpFamilyPCA.jl delivers speed"?
  • L54: It would be useful to define the exponential family in general (as well as its natural and mean parameter) before diving into the details of the link function.
  • L61: The link called "appendix" points to a documentation page, which may become out of date if you change the structure of your docs. The same goes for other external links in the paper.
  • L81: Is it standard to introduce regularization around a $\mu_0$, or did you suggest it yourselves? How do you pick $\mu_0$ in practice?
  • L88: Which figure do you recreate? Can you give more details on what these "belief profiles" represent?
  • L88: Is the figure recreation reproducible?
  • L91: "PCA struggles even with 10 basis" is missing the word "components".
  • L104: It would help to clarify the types of the objects that fit!, compress and decompress work with.

gdalle commented Nov 10, 2024

Code

  • You should probably run CI on the latest Julia version (called 1 in the GitHub action you use) in addition to 1.10.
  • You may want to turn your example from the index into a README test to make sure it doesn't get outdated.
  • It is useful to display what percentage of your code is covered by the test suite. At the moment, the Codecov part of the test workflow is failing because you did not specify a token. See this tutorial to set it up properly.
  • Did you profile the code to see where it spends most of its time and avoid classic performance pitfalls (type instability for example)? If you care about optimal speed, you can take a look at the Julia performance tips for numerous ways to accelerate your code.
  • Are there any benchmarks of your package against competitors?
  • For optimization, you seem to be using the default algorithm of Optim.jl, which is the zero-order Nelder-Mead algorithm. Is there any reason not to pick a first- or second-order method? With automatic differentiation you can get at least gradients basically for free.
  • The documentation on EPCA objectives mentions 7 ways to construct the EPCA object, but the src/constructors folder only shows 4. Any reason why those four get special treatment?
  • It seems like CompressedBeliefMDPs.jl functionality is not useful for the majority of potential users, so maybe it could be a package extension instead of a hard dependency?
  • Different results for two consecutive calls to fit! on the same data #22
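The CI version-matrix suggestion above would be a small change to the workflow file. A sketch, assuming the standard julia-actions setup:

```yaml
# .github/workflows/CI.yml (fragment)
jobs:
  test:
    strategy:
      matrix:
        version:
          - '1.10'   # the version currently tested
          - '1'      # resolves to the latest stable Julia release
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
        with:
          version: ${{ matrix.version }}
      - uses: julia-actions/julia-buildpkg@v1
      - uses: julia-actions/julia-runtest@v1
```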


gdalle commented Nov 10, 2024

Documentation

  • You may want to turn your example from the home page (as well as any other bits of code in the docs) into Documenter doctests to make sure they don't get outdated.
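For reference, a doctest is just a code block fenced with `jldoctest` instead of `julia`; Documenter executes it during the docs build and fails if the printed output drifts. A generic sketch, not code from the package:

````markdown
```jldoctest
julia> x = [1, 2, 3];

julia> sum(x)
6
```
````

Calling `Documenter.doctest(ExpFamilyPCA)` from the test suite additionally catches drift outside of docs builds.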

Math

Bregman divergences

$f(\mu) = \nabla_\mu F(\mu)$ is the convex conjugate (defined later) of $F$

I'm not sure I understand. Isn't this just the gradient?

Similarly, we also have $\theta = f(\mu)$

How do $f$ and $F$ relate to the exponential family defined by $h$ and $G$? This is not yet specified.

The last line is equivalent to $B_F(x \Vert \mu)$ up to a constant

Why is that the case?
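(My guess at the intended derivation, assuming the conjugacy identity $F(\mu) = \langle \theta, \mu \rangle - G(\theta)$ with $\theta = f(\mu)$:

$$
\begin{aligned}
B_F(x \Vert \mu) &= F(x) - F(\mu) - \langle f(\mu),\, x - \mu \rangle \\
&= F(x) - \bigl( \langle \theta, \mu \rangle - G(\theta) \bigr) - \langle \theta,\, x - \mu \rangle \\
&= F(x) + G(\theta) - \langle \theta, x \rangle,
\end{aligned}
$$

so $-\log p(x \mid \theta) = G(\theta) - \langle \theta, x \rangle - \log h(x) = B_F(x \Vert \mu) - F(x) - \log h(x)$, which agrees with $B_F(x \Vert \mu)$ up to terms constant in $\mu$. If that is the intended argument, spelling it out in the docs would help.)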

EPCA objectives

Recall from the introduction that the regularized EPCA objective aims to minimize the following expression:

The hyperparameter $\mu$ was called $\mu_0$ in other places.

Given that $g$ is strictly increasing (as $G$, the conjugate of $F$, is strictly convex and differentiable), we can compute $g$ numerically.

You probably mean $g^{-1}$?

In summary, the EPCA objective function and the decompression function $g$ can be derived from various components.

It would be nice to add a table summing up how specific versions like BernoulliEPCA are defined (using one of the combinations 1 to 7).

Constructors

Gamma

A_upper: The upper bound for the matrix A, default is -eps().

Why is the default upper bound negative? Wouldn't typemax make more sense as a catch-all upper bound?

API documentation

  • It would be good to use doctests to prevent docstring examples from getting out of sync with the code.
  • The package DocStringExtensions.jl can also be helpful here.

metaprogramming::Bool: Enables metaprogramming for symbolic calculus conversions. Default is true.

This is not very clear to an average user, even to one familiar with autodiff.


FlyingWorkshop commented Nov 18, 2024

Thank you again for all the incredibly detailed feedback @gdalle! Working on addressing the feedback as I go.

  • L15: Verify the citation for the Julia language; it should be the SIAM Review paper, not the arXiv preprint.
  • L30-32: Explain how the following ingredients are integrated into your package: Added a footnote of explanation in the paper
    • Symbolic differentiation
    • Optimization
    • Numerically stable computation
  • L30-32: Justify the use of symbolic differentiation (with Symbolics.jl) instead of algorithmic differentiation (e.g., Zygote.jl or Enzyme.jl). Also addressed in the new footnote
  • L32: Review the citation for "numerically stable computation"; it appears to reference R instead of Julia and might be more broadly applicable to other languages. Also addressed in the new footnote
  • L33: Provide benchmarks to substantiate the claim that "ExpFamilyPCA.jl delivers speed."
  • L54: Define the exponential family in general, including its natural and mean parameters, before discussing the link function.
  • L61: Update the "appendix" link to avoid pointing to documentation that may become outdated; ensure other external links are similarly reviewed.
  • L81: Clarify if introducing regularization around a specific $\mu_0$ is standard practice or your own contribution. Explain how $\mu_0$ is chosen in practice.
  • L88: Provide more details about the figure recreation, including:
    • Which figure is being recreated
    • What the "belief profiles" represent
    • Whether the figure recreation is reproducible
  • L91: Add the missing word "components" to the sentence "PCA struggles even with 10 basis."
  • L104: Clarify the object types that fit!, compress, and decompress work with.

@FlyingWorkshop

Answering these questions as I have time. Will continue to update this post as I answer more questions.

  • You should probably run CI on the latest Julia version (called 1 in the GitHub action you use) in addition to 1.10.
  • You may want to turn your example from the index into a README test to make sure it doesn't get outdated.
  • It is useful to display what percentage of your code is covered by the test suite. At the moment, the Codecov part of the test workflow is failing because you did not specify a token. See this tutorial to set it up properly.

Agree to all the above, working on these.

  • Did you profile the code to see where it spends most of its time and avoid classic performance pitfalls (type instability for example)? If you care about optimal speed, you can take a look at the Julia performance tips for numerous ways to accelerate your code.
  • Are there any benchmarks of your package against competitors?

No. Besides traditional PCA, there are no other EPCA implementations in Julia, so I'm not sure benchmarking would make sense for most of the distributions. That said, I did do some testing against MultivariateStats.jl's implementation of traditional PCA, which is faster than our implementation of Gaussian EPCA. I haven't looked at their source code, but I suspect it's because they use the closed-form solution for PCA, whereas we use the same general iterative optimization procedure for Gaussian EPCA that we use for all EPCA objectives. I suspect it would be very hard (and not really the package's focus) to implement a faster version of PCA than MultivariateStats.jl's.
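A comparison along these lines can be reproduced with BenchmarkTools. A sketch: the `GaussianEPCA` constructor and `fit!` signature are assumed from the package docs and may not match exactly, and the data layout conventions differ between the two packages:

```julia
using BenchmarkTools
using MultivariateStats
using ExpFamilyPCA  # assumed API: GaussianEPCA(indim, outdim) and fit!(model, X)

X = randn(500, 50)   # 500 observations × 50 features
k = 5                # target dimension

# Closed-form PCA (MultivariateStats expects features × observations)
@btime MultivariateStats.fit(PCA, $(Matrix(X')); maxoutdim = $k)

# Iterative Gaussian EPCA (assumed to take observations × features)
model = GaussianEPCA(size(X, 2), k)
@btime ExpFamilyPCA.fit!($model, $X)
```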

  • For optimization, you seem to be using the default algorithm of Optim.jl, which is the zero-order Nelder-Mead algorithm. Is there any reason not to pick a first- or second-order method? With automatic differentiation you can get at least gradients basically for free.

I did some crude benchmarking using BenchmarkTools while designing the optimization pipeline. Surprisingly, I found that higher-order methods performed roughly the same as, or much slower than, Nelder-Mead.
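The zero- vs. first-order comparison is easy to rerun; with Optim.jl the only change is the method argument and an `autodiff` keyword. A sketch with a toy smooth objective standing in for an EPCA subproblem:

```julia
using Optim

# Toy smooth objective standing in for an EPCA subproblem
rosenbrock(x) = (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
x0 = zeros(2)

# Zero-order method (Optim.jl's default)
res_nm = optimize(rosenbrock, x0, NelderMead())

# First-order method with forward-mode autodiff supplying gradients
res_lbfgs = optimize(rosenbrock, x0, LBFGS(); autodiff = :forward)

println(Optim.iterations(res_nm), " vs ", Optim.iterations(res_lbfgs))
```

Wrapping each `optimize` call in `@btime` gives the wall-clock comparison.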

  • The documentation on EPCA objectives mentions 7 ways to construct the EPCA object, but the src/constructors folder only shows 4. Any reason why those four get special treatment?

There are in fact many ways to derive the EPCA objectives, more than the 7 I listed in the documentation. The 4 I ended up picking were the ones that I believed would be most useful and efficient in practice. EPCA1 through EPCA4 each represent an 'archetypal' specification of the EPCA objective. For example, EPCA1 specifies the objective using $F$ and $g$. We can derive $g$ from $G$, so we could equally well specify the objective from $F$ and $G$, but that specification is marginally slower than the former because it requires symbolic differentiation; thus $F$ and $g$ is a better 'archetypal' struct than $F$ and $G$. Similar arguments apply to the other numbered EPCA structs. In short, each numbered EPCA struct is a way to specify the EPCA objective that is quick in practice, and the other methods housed in each file are just more involved ways to specify the same archetype.

Let me know if this explanation makes sense. Happy to clarify.
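To make the $F$/$G$ trade-off concrete, here is roughly what deriving $g = G'$ symbolically looks like with Symbolics.jl, using the Poisson case $G(\theta) = e^\theta$. This is a sketch of the general technique; the package's actual internals may differ:

```julia
using Symbolics

@variables θ

# Poisson log-partition function and its symbolic derivative g = G′
G = exp(θ)
g_sym = Symbolics.expand_derivatives(Symbolics.Differential(θ)(G))

# Compile the symbolic expression into a plain Julia function
g = Symbolics.build_function(g_sym, θ; expression = Val(false))

g(0.0)  # exp(0) = 1.0
```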

  • It seems like CompressedBeliefMDPs.jl functionality is not useful for the majority of potential users, so maybe it could be a package extension instead of a hard dependency?

Agree, looking into this.
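For reference, the package-extension mechanism (Julia ≥ 1.9) amounts to moving the dependency under `[weakdeps]` and adding a module under `ext/`; the extension name below is hypothetical:

```toml
# Project.toml (fragment)
[weakdeps]
CompressedBeliefMDPs = "<UUID of CompressedBeliefMDPs>"

[extensions]
# Loads ext/CompressedBeliefMDPsExt.jl only when CompressedBeliefMDPs is loaded
CompressedBeliefMDPsExt = "CompressedBeliefMDPs"
```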

Will try to add a test for this. Also saw your previous comment on changing similar to copy. That was a great catch and your analysis was very helpful; it's highly appreciated. Thank you.
