Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code style changes #3

Open
LiamPattinson opened this issue Sep 26, 2022 · 2 comments
Open

Code style changes #3

LiamPattinson opened this issue Sep 26, 2022 · 2 comments

Comments

@LiamPattinson
Copy link
Contributor

Code style is inherently subjective, so let me know if any of these proposals are objectionable to you!

Function comments -> docstrings

A lot of functions have descriptions written as:

def func(args):
    # description

Converting to docstrings means that help(func) will return useful info to the user, and they can be used for automated documentation generation:

def func(args):
    """description"""

Replace debugging bits with exception handling

Instead of calling pdb.set_trace() when things go wrong, we should just raise an exception with a helpful message. This will aid with automated testing, and it will be more helpful for users that aren't so familiar with the code, as most users won't know how to fix the internal state of the code if there's a mistake in their inputs.

(Also, since Python 3.7, you can call breakpoint() instead of pdb.set_trace(), with no need to import pdb)

Remove commented code

There are some places where code has been left commented out without a description for why the code was left in, e.g.:

VMEC2GK/src/utils.py

Lines 387 to 390 in 60ed3da

#theta_arr1 = np.interp(np.linspace(b_arr[extremum.item()+1], b_arr[-1], int((len1-1)/2)), b_arr[extremum.item():], arr[extremum.item():])
theta_arr1 = np.interp(b_arr[extremum.item()+1:][::spacing], b_arr[extremum.item():], arr[extremum.item():])
#theta_arr2 = np.interp(np.linspace(b_arr[0], b_arr[extremum.item()-1], int((len1-1)/2)), b_arr[:extremum.item()][::-1], arr[:extremum.item()][::-1])
theta_arr2 = np.interp(b_arr[0:extremum.item()][::spacing], b_arr[:extremum.item()+1][::-1], arr[:extremum.item()+1][::-1])

For a new developer coming to the code, it isn't clear if the commented out lines are simply an older version of the code that isn't needed anymore, or if the intention is for users to toggle which lines are commented in order to change behaviour. If the former, it would be best to write a test that confirms the function is behaving as intended, and then to delete the unnecessary code. If the latter, the function should take an optional boolean argument to toggle between the versions.

Formatting

I like to use black to automatically format my code. By default, it restricts line length to 88 characters, and it uses a consistent whitespace scheme to ensure the code is readable and looks professional. It can sometimes result in strange looking code if a one-liner gets too long or if nesting too deeply, but this is usually a good sign that a refactor is necessary anyway! I like to use the default 88 characters, as it means I can comfortably fit three files side-by-side in my editor, and it keeps things easily readable, but it's common for people to choose 120 characters instead. You can configure black for an individual project once it's set up as a pip installable package (Issue #2).

Linting

I also like to use flake8 to catch code style errors that black isn't able to correct (usually issues with comments and docstrings), and to warn of potential errors. For example, if you run the following:

$ pip install flake8
$ flake8 --extend-ignore=E,W src # ignore E,W means we ignore style warnings, only seeing possible code errors

there are a lot of issues such as libraries that are imported but never used, variables that are created but not used afterwards, and a few variables that are used before assignment. It doesn't look like any of these are actual errors, but it's worth cleaning these issues up anyway, as they make it harder to spot real bugs when they do appear.

f-strings

There are some places where we can replace string formatting with f-strings:

fn = "{0}/output_files/GX_nc_files/gx_out_postri_surf_{1}_nperiod_{2}_nt{3}.nc".format(parnt_dir_nam, int(surf_idx), nperiod, ntheta2)

This could be rewritten:

fn = f"{parnt_dir_nam}/output_files/GX_nc_files/gx_out_postri_surf_{int(surf_idx)}_nperiod_{n_period}_nt{ntheta2}.nc"

f-strings are usually a bit faster than regular string formatting, but they're mostly just syntactic sugar. They achieve the same thing in fewer characters, and they're a bit easier to read, as you don't need to keep track of which argument in the format list corresponds to which set of curly braces.

Aside: we can improve it further using Pathlib Paths and spreading it out over a few lines:

filename = (
    Path(parnt_dir_nam)
    / "output_files" 
    / "GX_nc_files"
    / f"gx_out_postri_surf_{int(surf_idx)}_nperiod_{n_period}_nt{ntheta2}.nc"
)

With this, we can do things like test whether the files exists using filename.exists() and throw an exception if it doesn't.

Type hints

We can add type hints to our function calls to make it clearer which types we expect the user to provide. They're mostly just a form a documentation, making it easier for other developers to look at a function and understand what it does at a glance, though some static analysis tools and libraries such as Pydantic do use them to enforce type correctness. For example:

# before
def half_full_combine(arrh, arrf):
    ... function contents ...

# after
def half_full_combine(arrh: np.ndarray, arrf: np.ndarray) -> np.ndarray:
    ... function contents ...
@LiamPattinson
Copy link
Contributor Author

Some further things I could work on...

Simplification of control flow

There are some areas of the code where both parts of an if/else block are very similar, and with some refactoring we could merge the two. This would cut down on repeated code, reducing the likelihood of bugs creeping in and overall leading to a more compact project. For example:

VMEC2GK/src/utils.py

Lines 33 to 44 in bfc8024

if char == 'e': #even array
for i in range(colms):
for j in range(N):
for k in range(rows):
angle = xm[k]*theta[j]
arr_ifftd[i, j] = arr_ifftd[i, j] + np.cos(angle)*arr[i][k]
else: #odd array
for i in range(colms):
for j in range(N):
for k in range(rows):
angle = xm[k]*theta[j]
arr_ifftd[i, j] = arr_ifftd[i, j] + np.sin(angle)*arr[i][k]

The only difference is the use of cos vs sin, so we could refactor as:

    f = np.cos if char == "e" else np.sin
    for i in range(colms):
        for j in range(N):
            for k in range(rows):
                angle = xm[k]*theta[j]
                arr_ifftd[i, j] = arr_ifftd[i, j] + f(angle)*arr[i][k] 

We can also eliminate the slow triple-nested for-loop with the following...

Further leverage NumPy for faster, more compact code

A lot of for loops could be replaced with some NumPy magic. For the section above:

f = np.cos if char == "e" else np.sin
angles = np.outer(xm, theta)
arr_ifftd = np.matmul(arr, f(angles))
# or we could avoid creating angles altogether and do it as a one-liner,
# but then I think we'd lose readability

The main challenge with this sort of refactor is verifying that we aren't changing the results, so I think it would be best to get some unit tests up and running before I start messing with things too much, and that'll have to wait until after a Python packaging system is up and running.

(For this bit of code in particular, I think scipy.fftpack.dct and scipy.fftpack.dst might be worth looking at, but it'd take me a while to work out exactly how to map those functions onto this)

@LiamPattinson
Copy link
Contributor Author

Convert reused scripting code to new functions

e.g. input file reader in eikcoefs_final.py and profile_plotter.py

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant