Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Describe how to include parameter sizes & shapes #423

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

namurphy
Copy link
Contributor

@namurphy namurphy commented Aug 12, 2022

This pull request describes how to include size & shape information for a parameter in the type line of the Parameters section of docstrings. I adapted the example from numpy.outer. This is following a discussion in #419.

I decided to place the size & shape information at the beginning of the type information because:

  • It is currently used this way in NumPy, at least occasionally.
  • Aligning this information at the beginning makes it easier to see the relationships between the different parameters when skimming through the list. This also has the benefit that the parameter name and size information are right next to each other, which makes it easier to grasp the relationships between the sizes & shapes of the different parameters and return values.

Some of the alternatives I considered were:

  • Include sizes & shapes at the end of the type description rather than at the beginning. I didn't choose this because the size & shapes were less aligned this way, which made it harder to grasp the relationships between the different parameters.
  • Allow the size & shape info to be included either at the beginning or the end of the type information. I didn't do this because having one way to do it will lead to greater standardization, which will lead to greater consistency.
  • Have the size & shape be specified in the parameter description. I didn't do this because the size & shape info would be further separated from the parameter name which would make it harder to grasp the relationships between different parameters.
  • Not include size & shape info be included in the standard. I don't prefer this because it's pretty common to have to do this, and in my experience this info ends up being included fairly inconsistently, even in a single project, and especially across multiple projects. Having a specification for this would also make it easier to write linters and autoformatters for docstrings in the numpydoc style.

I am open to alternative possibilities. It's pretty common to need to include this information in docstrings. It's more important that it is included in the standard than the specific details of how it is included.

Thanks!

@namurphy namurphy changed the title [DOC] Describe how to include parameter sizes & shapes DOC: Describe how to include parameter sizes & shapes Aug 13, 2022
@OriolAbril
Copy link

OriolAbril commented Aug 13, 2022

I also think this style of including shape info when relevant is what numpydoc should recommend. Adding two extra notes for completeness:

Extra pro: It would follow matplotlib's docstring guide (that extends numpydoc) and is followed also (at least) throughout the linalg modules in both numpy and scipy. Reference: https://matplotlib.org/stable/devel/documenting_mpl.html#parameter-type-descriptions, relevant quote:

Use array-like for homogeneous numeric sequences, which could typically be a numpy.array. Dimensionality may be specified using 2D, 3D, n-dimensional. If you need to have variables denoting the sizes of the dimensions, use capital letters in brackets ((M, N) array-like). When referring to them in the text they are easier read and no special formatting is needed.

Extra con: it would make other modules like the polinomyal or some of the random distributions not follow numpydoc anymore because they use the 2-D array_like, of shape (N, N) approach.

@larsoner
Copy link
Collaborator

@rossbar do you want to look since you were involved in the discussions in #419 ?

@rossbar
Copy link
Contributor

rossbar commented Aug 16, 2022

The proposal seems sensible to me, but I'd be hesitant to actually adopt any new suggestions/rules without first having some sense of how much churn the proposed rule would cause in e.g. the numpy & scipy docstrings themselves. The fact that the rule is already included in matplotlib's docstring style guide is indeed a strong point in favor though!

Another angle I think should be considered is how well this will mesh with the work in NumPy's type annotations re: how array shapes are specified. That's not to say that they have to be (or even should be) the same, but we should at least have an idea of where that's going so we don't end up with two completely divergent ways of specifying shapes for type annotations vs. parameter type descriptions in docstrings.

IMO, a great way to push this forward would be to try to systematically evaluate the current state of parameter type descriptions in the docstrings. @OriolAbril notes that this pattern seems to already be followed by the numpy/scipy linalg packages - if there were a listing that showed what set of changes would be necessary to make at least numpy/scipy compliant (preferably a script that could be run on the docs of any library), this would go a long way in supporting the idea that the rule should be enshrined as a style/standard.

Then again, I tend to be conservative when it comes to changing/introducing new standards :)

@OriolAbril
Copy link

I still have many modules and pages to go over, but here are my observations so far:

  • https://numpy.org/doc/stable/reference/routines.linalg.html and https://docs.scipy.org/doc/scipy/reference/linalg.html follow a superset of the proposal and a subset of matplotlib's advise. When the shape is relevant, it is indicated with upercase letters, also using ... to indicate operations that operate over an arbitrary number of batch dimensions.
    • It does not use the 1D, 3D, n-dimensional in the types description, this information is in the type description or notes section, and in general 1-D style is used instead (example norm)

      Arguably, adding that information too would be benefitial, especially in cases where it is in the notes section like kron where there are no restrictions on the shape but both a and b must have the same number of dimensions.

    • It extends matplotlib's guidance when a subset of multiple shapes are allowed, using {(M,), (M, K)} in lstsq for example. But this is not unique nor very extended, (M,) or (M, N) is also used and seems more extended within scipy, most of the solve_ functions use it

    • Some docstrings also use operations in the shape part of the type description: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.solve_banded.html. In general however, an alias is used, defined somewhere in the description (example qr)

    • Not very consistent with the "no special formatting" last sentence in matplotlib though, both no formatting and monospace formatting are used when referring to shapes in the parameter descriptions. Also uses NxN in some cases

Need to leave now, will continue at some other time.

@jarrodmillman jarrodmillman added this to the 1.5.0 milestone Aug 23, 2022
@jarrodmillman jarrodmillman modified the milestones: 1.5.0, 1.6.0 Oct 6, 2022
@jarrodmillman jarrodmillman modified the milestones: 1.6.0, 1.7.0 Jun 28, 2023
@lucascolley
Copy link
Contributor

lucascolley commented Feb 11, 2024

We just had an interesting case in scipy/scipy#20072 where there was confusion about the fact that the style of

"""
a : (M, N) ndarray
"""

implies that a must be 2-dimensional. Admittedly/embarrassingly, it also took me a second look before I noticed it and remembered that it indicates the shape of the array.

I wonder if there is any way to be clearer about this, since it probably trips up quite a few new users. Would adding the word 'shape' anywhere be possible (and not overkill / worse overall)?

@stefanv
Copy link
Contributor

stefanv commented Feb 12, 2024

Would (M, N) array of floats have made it clearer?

@lucascolley
Copy link
Contributor

Would (M, N) array of floats have made it clearer?

I think so, although the confusion seemed to stem from not understanding that (M, N) refers to the shape at all.

@lucascolley
Copy link
Contributor

I think I prefer how it is done here for example

image

@namurphy
Copy link
Contributor Author

Thank you for bringing this up!

My main goal is that the style guide addresses this situation in a consistent and understandable way, so I'm totally open to including shape in the type description.

Since I originally made this PR, we ended up adding guidelines on type specifications in PlasmaPy's documentation guide (permalink). A lot of it came from matplotlib, plus a few other projects. Looking back at this again, I'm thinking that it'd be helpful for the guidelines to address uncommon edge cases too (like when there are multiple distinct shapes an array could have, or when different types have different shapes).

Thank you again!

@jarrodmillman jarrodmillman removed this from the 1.7.0 milestone Mar 6, 2024
@ricardoV94
Copy link

ricardoV94 commented Mar 7, 2024

Would (M, N) array of floats have made it clearer?

I think so, although the confusion seemed to stem from not understanding that (M, N) refers to the shape at all.

For context, that was not in fact the source of confusion. The confusion was that it was not mentioned that batch dimensions are not supported, and the function silently returning nonsense for >2D cases.

The syntax like (..., M, N) usually clarifies core vs batch cases, but the ommission of ellipsis to specify batch dimensions are not allowed/undefined behavior is imo too subtle/insufficient. I think this should always be stated explicitly if not checked by the function, given the widespread expectation that numpy-like code will handle batch dimensions correctly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants