
Include macOS arm64 compatibility in manage.sh #929

Open
wants to merge 9 commits into master

Conversation

jwaiton
Collaborator

@jwaiton jwaiton commented Feb 19, 2025

Currently, when initialising IC with manage.sh on modern Macs (M1 onwards), the function install_conda() fails because the installation is architecture-specific.

This PR adds functionality that selects the Miniconda installer to download based on whether the architecture is x86_64 or arm64. This requires updating Miniconda from 4.9.2 to 4.12.0, but I found no compatibility issues with this change.

It should be noted that, in principle, this could be extended to allow Miniconda installation on Linux-based arm64 machines as well, but this isn't implemented at the moment.
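
For context, the change boils down to picking an installer that matches the CPU architecture. A minimal sketch of the idea (the exact filename pattern, version string and variable names used in manage.sh are assumptions here, not the PR's final code):

    # Sketch only: Miniconda 4.12.0 ships separate installers per OS and CPU,
    # e.g. Miniconda3-py39_4.12.0-MacOSX-x86_64.sh vs Miniconda3-py39_4.12.0-MacOSX-arm64.sh.
    ARCH=$(uname -m)   # "x86_64" on Intel Macs, "arm64" on Apple Silicon
    INSTALLER="Miniconda3-py39_4.12.0-MacOSX-${ARCH}.sh"
    curl -LO "https://repo.anaconda.com/miniconda/${INSTALLER}"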

manage.sh Outdated
Comment on lines 60 to 74
case "$(uname -m)" in

x86_64)
export ARCH=x86_64
;;

arm64)
export ARCH=arm64
;;

*)
echo "Installation only supported on x86_64 and arm architecture"
exit 1
;;
esac
Collaborator

Two things:

  • you don't need to export the variable
  • it looks like you could simply do ARCH=$(uname -m) and then check that ARCH is one of the allowed values (see the sketch below). We probably won't find any other architecture, but it doesn't hurt.
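
For instance, something along these lines (just a sketch of the suggestion, not the exact wording to use):

    ARCH=$(uname -m)
    if [ "${ARCH}" != x86_64 ] && [ "${ARCH}" != arm64 ]; then
        echo "Installation only supported on x86_64 and arm64 architectures"
        exit 1
    fi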

Collaborator Author

You're right, I was just matching the formatting from the same function. I guess it could all be rewritten in this manner?

This would then also open it up to allow for arm64 architecture on Linux (where uname -m reports aarch64, for which Miniconda has an installer).

Collaborator

That other function needs to give a different value to CONDA_OS for Macs. But it might be worth keeping the current format to allow for easy extension. I would shorten it, though, so it looks like the case statement at the end of the file.

This would then also open it up to allow for arm64 architecture on linux

Of course, if it comes for free...
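
For illustration, a compact form of the case statement (whether this matches the one at the end of the file is an assumption), which also shows where Linux's aarch64 would slot in:

    case "$(uname -m)" in
        x86_64)  ARCH=x86_64  ;;
        arm64)   ARCH=arm64   ;;   # Apple Silicon Macs
        aarch64) ARCH=aarch64 ;;   # arm64 Linux, if/when supported
        *) echo "Installation only supported on x86_64/arm64/aarch64"; exit 1 ;;
    esac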

@@ -0,0 +1,31 @@
# This workflow will install IC on an ARM-based system.

name: Arm compatibility test
Collaborator

Did you try to extend the existing workflow (test_suite.yml) to include macos-latest? Perhaps the tests fail; in that case we should probably do something about it.

Collaborator Author

@jwaiton jwaiton Feb 19, 2025

The tests do fail, but only because the workflow doesn't recognise the existing conda environment, tries to reinstall it, and fails when it finds that it already exists. You can see me trying it here. I could be wrong about the exact reasoning, but this is what I understood.

Collaborator

I believe the problem is that conda is not set up when work_in_python_version is run for the second time¹, as you can see from line 6:

manage.sh: line 77: conda: command not found

so it tries to install it again and then it finds that the miniconda folder already exists.

Footnotes

  1. The first time is during installation; the second time is when executing the test suite.
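
In other words, the shell needs conda's activation hook sourced before conda can be called again. A typical fix looks roughly like this (the install prefix below is a hypothetical default, not necessarily the one manage.sh uses):

    # Make the `conda` command available in a fresh shell before calling it again.
    source "$HOME/miniconda/etc/profile.d/conda.sh"
    conda activate base   # or the IC environment, once it exists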

Collaborator Author

Indeed, but trying to set up conda came with a new error, as seen here. This is (as far as I understand) due to the userspace directories being configured slightly differently between Linux and macOS. I'll try reconfiguring this to work now.

Collaborator

Actually, I've never noticed that. I don't see where this variable is set. I certainly don't have it set, either before or after setting up conda... I guess those instructions are outdated, or designed exclusively for GHA.

Collaborator

I'll quickly test whether sourcing it like so fixes it for Macs.

If it does, then it's just a matter of setting CONDA in an OS-dependent manner.
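
A sketch of what that OS-dependent setting could look like (the prefixes below are placeholders; the real locations depend on the runner image and on manage.sh):

    # Hypothetical prefixes, for illustration only.
    case "$(uname)" in
        Darwin) CONDA="$HOME/miniconda"      ;;
        Linux)  CONDA="/usr/share/miniconda" ;;
    esac
    source "${CONDA}/etc/profile.d/conda.sh"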

I'm unsure how to fix these errors; they all appear to be quite similar, though.

They are not trivial errors. I've seen this on Macs, but I haven't had the time or the energy to deal with it. I believe the root of the problem is that the interpolation is different between Mac and Linux (or perhaps between arm and x86). Then, the rest of the errors are consequences of that. Can you add aarch64 to check whether we see the same failures?

Collaborator Author

@jwaiton jwaiton Feb 20, 2025

Testing this currently in this branch; the GHA run for Linux on ARM can be found here.

EDIT: Same errors by the looks of it:

FAILED invisible_cities/cities/beersheba_test.py::test_beersheba_exact_result_with_satkill
FAILED invisible_cities/cities/beersheba_test.py::test_beersheba_exact_result[DeconvolutionMode.joint]
FAILED invisible_cities/cities/beersheba_test.py::test_beersheba_exact_result[DeconvolutionMode.separate]
FAILED invisible_cities/cities/isaura_test.py::test_isaura_exact - AssertionE...
FAILED invisible_cities/database/download_test.py::test_tables_exist[DEMOPPDB]
FAILED invisible_cities/reco/deconv_functions_test.py::test_interpolate_signal
FAILED invisible_cities/reco/deconv_functions_test.py::test_deconvolution_input_exact_result
FAILED invisible_cities/reco/deconv_functions_test.py::test_deconvolve - asse...
FAILED invisible_cities/reco/deconv_functions_test.py::test_richardson_lucy
====== 9 failed, 1269 passed, 15 skipped, 5 xfailed in 527.33s (0:08:47) =======

It must be a difference in some package with respect to architecture. I can look into it at some point; should I open an issue? In the meantime, until I can figure out the issue, should we put the PR on hold? The functionality doesn't matter too much if IC can't then be used for certain tasks.

Collaborator Author

To continue on from this, I'm fairly confident about the cause after looking at this error (specifically the array output):

        ref_interpolation = np.array([0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
                                      0.   , 0.   , 0.   , 0.   , 0.17 , 0.183, 0.188, 0.195, 0.202,
                                      0.201, 0.202, 0.188, 0.181, 0.168, 0.   , 0.   , 0.308, 0.328,
                                      0.344, 0.354, 0.362, 0.365, 0.357, 0.347, 0.326, 0.308, 0.   ,
                                      0.   , 0.5  , 0.531, 0.569, 0.585, 0.593, 0.592, 0.58 , 0.566,
                                      0.543, 0.514, 0.   , 0.   , 0.693, 0.751, 0.786, 0.825, 0.833,
                                      0.827, 0.816, 0.79 , 0.757, 0.703, 0.   , 0.   , 0.818, 0.886,
                                      0.924, 0.957, 0.965, 0.973, 0.963, 0.925, 0.882, 0.818, 0.   ,
                                      0.   , 0.82 , 0.884, 0.93 , 0.958, 0.969, 0.965, 0.954, 0.928,
                                      0.883, 0.818, 0.   , 0.   , 0.698, 0.752, 0.793, 0.819, 0.832,
                                      0.836, 0.826, 0.794, 0.752, 0.699, 0.   , 0.   , 0.509, 0.535,
                                      0.57 , 0.582, 0.6  , 0.592, 0.585, 0.568, 0.536, 0.51 , 0.   ,
                                      0.   , 0.311, 0.328, 0.347, 0.36 , 0.359, 0.361, 0.356, 0.343,
                                      0.328, 0.312, 0.0  , 0.0  , 0.169, 0.182, 0.196, 0.196, 0.201,
                                      0.197, 0.194, 0.192, 0.182, 0.169, 0.   , 0.   , 0.   , 0.   ,
                                      0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ])
    
        g = multivariate_normal((0.5, 0.5), (0.05, 0.5))
    
        points          = np.meshgrid(np.linspace(0, 1, 6), np.linspace(0, 1, 6)) # Coordinates where g is known
        points          = (points[0].flatten(), points[1].flatten())
        values          = g.pdf(list(zip(points[0], points[1]))) # Value of g at the known coordinates.
        n_interpolation = 12 # How many points to interpolate
        grid            = shift_to_bin_centers(np.linspace(-0.05, 1.05, n_interpolation + 1))# Grid for interpolation
    
        out_interpolation = interpolate_signal(values, points,
                                               [[-0.05, 1.05], [-0.05, 1.05]],
                                               [grid, grid],
                                               InterpolationMethod.cubic)
        inter_charge      = out_interpolation[0].flatten()
        inter_position    = out_interpolation[1]
    
>       assert np.allclose(ref_interpolation, np.around(inter_charge, decimals=3))
E       assert False
E        +  where False = <function allclose at 0xffa204933dc0>(array([0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,\n       0.   , 0.   , 0.   , 0.   , 0.17 , 0.183,...0.192, 0.182, 0.169, 0.   , 0.   , 0.   , 0.   ,\n       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ]), array([0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,\n       0.   , 0.   , 0.   , 0.   , 0.174, 0.185,...0.189, 0.18 , 0.168, 0.   , 0.   , 0.   , 0.   ,\n       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ]))
E        +    where <function allclose at 0xffa204933dc0> = np.allclose
E        +    and   array([0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,\n       0.   , 0.   , 0.   , 0.   , 0.174, 0.185,...0.189, 0.18 , 0.168, 0.   , 0.   , 0.   , 0.   ,\n       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ]) = <function around at 0xffa204925700>(array([0.        , 0.        , 0.        , 0.        , 0.        ,\n       0.        , 0.        , 0.        , 0.      ...    0.        , 0.        , 0.        , 0.        , 0.        ,\n       0.        , 0.        , 0.        , 0.        ]), decimals=3)
E        +      where <function around at 0xffa204925700> = np.around

It seems to be due to something similar to what's discussed here: numpy operations differ between architectures enough that the discrepancies can compound and produce non-negligible differences.

Not sure whether to refactor the code to try to avoid these issues in the interpolation functions, make the tests more lenient, or something else. Do you have a preference?
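
For reference, "making the tests more lenient" would amount to passing explicit tolerances to the comparison. A small self-contained illustration (the tolerance value is a placeholder, not a recommendation):

    import numpy as np

    # Values mimicking the failure above: the arm result differs from the
    # x86_64 reference in the third decimal place.
    reference = np.array([0.170, 0.183, 0.188])
    computed  = np.array([0.174, 0.185, 0.189])

    # With np.allclose's default tolerances (rtol=1e-5, atol=1e-8) this fails:
    print(np.allclose(reference, np.around(computed, decimals=3)))   # False
    # A looser explicit tolerance accepts the architecture-level spread:
    print(np.allclose(reference, computed, atol=5e-3))               # True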

Collaborator

If it is a matter of not using enough precision, I would go for ensuring we use float64 in those functions.
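
For example, promoting the inputs at the entry of those functions (the helper name and arrays here are illustrative, not IC's actual code):

    import numpy as np

    def as_float64(*arrays):
        """Illustrative helper: promote inputs to float64 so intermediate
        precision does not depend on platform-default dtypes."""
        return tuple(np.asarray(a, dtype=np.float64) for a in arrays)

    values  = np.array([1.0, 2.0, 3.0], dtype=np.float32)
    weights = np.array([0.1, 0.2, 0.3], dtype=np.float32)
    values, weights = as_float64(values, weights)
    print(values.dtype, weights.dtype)   # float64 float64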

Collaborator Author

In this test, interpolate_signal already seems to use np.float64 for the output of scipy.interpolate.griddata. I can't see where else the imprecision could come from in this specific case.

I'll hopefully have an ARM setup to test on more explicitly soon, in my spare time (creating ARM VMs on x86 isn't working for me, annoyingly).
