Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyImath Bindings fail to build when using Clang on Windows (Release branch, 3.17, issue still present on Main) #324

Open
ZTKlein opened this issue May 31, 2023 · 10 comments

Comments

@ZTKlein
Copy link
Contributor

ZTKlein commented May 31, 2023

OS: Windows 10 22H2, 19045.2965
Tools:

  • bundled with Visual Studio 2022 Community Edition
    • CMake 3.26.0-msvc3
    • clang-cl 15.0.1
    • MSVC cl version 19.36.32532 for x64
  • Visual Studio Code (text editor)
    • CMake Tools

Libraries:

  • Boost 1.82.0 (compiled locally using cmake)
  • Python 3.10
  • Numpy 1.24.3

cmake configuration:

cmake --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:STRING=C:/lib/imath \
-DCMAKE_VERBOSE_MAKEFILE:BOOL=TRUE \
-DBUILD_TESTING:BOOL=TRUE \
-DBoost_ROOT:STRING=C:/lib/boost \
-DBoost_NO_BOOST_CMAKE:BOOL=FALSE \
-DBoost_USE_STATIC_LIBS:BOOL=FALSE \
-DBoost_USE_MULTITHREADED:BOOL=TRUE \
-DBoost_USE_STATIC_RUNTIME:BOOL=FALSE \
-DPYTHON:BOOL=TRUE \
-DCMAKE_EXPORT_COMPILE_COMMANDS:BOOL=TRUE \
-DCMAKE_C_COMPILER=clang-cl \
-DCMAKE_CXX_COMPILER=clang-cl \
-SD:/Code/lib/Imathfork \
-Bd:/Code/lib/Imathfork/build \
-G "Ninja Multi-Config"

I've tested using both the Ninja generator and the Visual Studio generator, as well as specifying cxx standards 11 and 17.

Using clang (or clang-cl) to compile the release branch (commit a4f9d5c) will error out on the python bindings. I've truncated the log to one of the relevant sections. Apologies if it's messy, it's from VS Code's output window.

[build] FAILED: src/python/PyImath/CMakeFiles/PyImath_Python3_10.dir/Release/PyImathAutovectorize.cpp.obj 
[build] "C:\PROGRA~1\Microsoft Visual Studio\2022\Community\VC\Tools\Llvm\x64\bin\clang-cl.exe"   -TP -DBOOST_CONTAINER_DYN_LINK -DBOOST_CONTAINER_NO_LIB -DBOOST_PYTHON_DYN_LINK -DBOOST_PYTHON_NO_LIB -DIMATH_DLL -DPYIMATH_BUILD -DPyImath_Python3_10_EXPORTS -DCMAKE_INTDIR=\"Release\" -ID:\Code\lib\Imath\build\src\python\PyImath -ID:\Code\lib\Imath\src\python\PyImath -ID:\Code\lib\Imath\src\Imath -ID:\Code\lib\Imath\build\config -imsvcC:\lib\boost\include\boost-1_82 -imsvcC:\Python310\include /clang:-march=native /MD /O2 /Ob2 /DNDEBUG /EHsc -std:c++17 /showIncludes /Fosrc\python\PyImath\CMakeFiles\PyImath_Python3_10.dir\Release\PyImathAutovectorize.cpp.obj /Fdsrc\python\PyImath\CMakeFiles\PyImath_Python3_10.dir\Release\ -c -- D:\Code\lib\Imath\src\python\PyImath\PyImathAutovectorize.cpp
[build] In file included from D:\Code\lib\Imath\src\python\PyImath\PyImathAutovectorize.cpp:8:
[build] In file included from D:\Code\lib\Imath\src\python\PyImath/PyImathAutovectorize.h:34:
[build] In file included from D:\Code\lib\Imath\src\python\PyImath/PyImathFixedArray.h:17:
[build] D:\Code\lib\Imath\src\python\PyImath/PyImathUtil.h(43,20): error: attribute 'dllexport' cannot be applied to a deleted function
[build]     PYIMATH_EXPORT PyAcquireLock(const PyAcquireLock& other) = delete;
[build]                    ^
[build] D:\Code\lib\Imath\src\python\PyImath/PyImathUtil.h(44,36): error: attribute 'dllexport' cannot be applied to a deleted function
[build]     PYIMATH_EXPORT PyAcquireLock & operator = (PyAcquireLock& other) = delete;
[build]                                    ^
[build] D:\Code\lib\Imath\src\python\PyImath/PyImathUtil.h(45,20): error: attribute 'dllexport' cannot be applied to a deleted function
[build]     PYIMATH_EXPORT PyAcquireLock(PyAcquireLock&& other) = delete;
[build]                    ^
[build] D:\Code\lib\Imath\src\python\PyImath/PyImathUtil.h(46,36): error: attribute 'dllexport' cannot be applied to a deleted function
[build]     PYIMATH_EXPORT PyAcquireLock & operator = (PyAcquireLock&& other) = delete;
[build]                                    ^
[build] D:\Code\lib\Imath\src\python\PyImath/PyImathUtil.h(67,20): error: attribute 'dllexport' cannot be applied to a deleted function
[build]     PYIMATH_EXPORT PyReleaseLock(const PyReleaseLock& other) = delete;
[build]                    ^
[build] D:\Code\lib\Imath\src\python\PyImath/PyImathUtil.h(68,36): error: attribute 'dllexport' cannot be applied to a deleted function
[build]     PYIMATH_EXPORT PyReleaseLock & operator = (PyReleaseLock& other) = delete;
[build]                                    ^
[build] D:\Code\lib\Imath\src\python\PyImath/PyImathUtil.h(69,20): error: attribute 'dllexport' cannot be applied to a deleted function
[build]     PYIMATH_EXPORT PyReleaseLock(PyReleaseLock&& other) = delete;
[build]                    ^
[build] D:\Code\lib\Imath\src\python\PyImath/PyImathUtil.h(70,36): error: attribute 'dllexport' cannot be applied to a deleted function
[build]     PYIMATH_EXPORT PyReleaseLock & operator = (PyReleaseLock&& other) = delete;
[build]                                    ^
[build] 8 errors generated.

The issue is tied to src/python/PyImath/PyImathUtil.h specifically, where PYIMATH_EXPORT is assigned to the deleted copy and move constructors/operators. Clang on Windows will provide a _MSC_VER definition, so this resolves to __declspec(dllexport) in PyImathExport.h. The issue is with the declarations of PyAcquireLock and PyReleaseLock.

This

class PyAcquireLock
{
  public:
    PYIMATH_EXPORT PyAcquireLock();
    PYIMATH_EXPORT ~PyAcquireLock();

    PYIMATH_EXPORT PyAcquireLock(const PyAcquireLock& other) = delete;
    PYIMATH_EXPORT PyAcquireLock & operator = (PyAcquireLock& other) = delete;
    PYIMATH_EXPORT PyAcquireLock(PyAcquireLock&& other) = delete;
    PYIMATH_EXPORT PyAcquireLock & operator = (PyAcquireLock&& other) = delete;
    
  private:
    PyGILState_STATE _gstate;
};

ends up compiling as

class PyAcquireLock
{
  public:
    __declspec(dllexport) PyAcquireLock();
    __declspec(dllexport) ~PyAcquireLock();

    __declspec(dllexport) PyAcquireLock(const PyAcquireLock& other) = delete;
    __declspec(dllexport) PyAcquireLock & operator = (PyAcquireLock& other) = delete;
    __declspec(dllexport) PyAcquireLock(PyAcquireLock&& other) = delete;
    __declspec(dllexport) PyAcquireLock & operator = (PyAcquireLock&& other) = delete;
    
  private:
    PyGILState_STATE _gstate;
};

Clang explicitly disallows dllexport to be assigned to deleted functions; while clang-cl in general attempts to mimic the behavior of MSVC, it would appear that it is rigid in this scenario, while MSVC will still compile. My understanding is that dllexports only apply to non-deleted functions, and as such it's not good practice to be declaring them on deleted functions regardless. Looking through the dumpbin on the generated library file from MSVC, it appears that MSVC's behavior is to not export the deleted functions (the constructors/destructors for PyAcquireLock and PyReleaseLock are still properly exported), implying it sees the declaration and ignores it for the deleted copy/move constructor/operator functions. Performing a diff on the dumpbins compiled using MSVC with and without the PYIMATH_EXPORT declarations on the deleted functions shows that output is equivalent.

Changing the code to

class PyAcquireLock
{
  public:
    PYIMATH_EXPORT PyAcquireLock();
    PYIMATH_EXPORT ~PyAcquireLock();

    PyAcquireLock(const PyAcquireLock& other) = delete;
    PyAcquireLock & operator = (PyAcquireLock& other) = delete;
    PyAcquireLock(PyAcquireLock&& other) = delete;
    PyAcquireLock & operator = (PyAcquireLock&& other) = delete;
    
  private:
    PyGILState_STATE _gstate;
};

works.

The next issue has to do with the exports declared in an anonymous namespace in PyImathStringTable.cpp -- these have internal linkage, and so Clang errors out because __declspec(dllexport) was used on functions without external linkage. I'm actually unsure as to how MSVC handles this case (I haven't checked a dump for these symbols), but I'm assuming that it's similar to the above scenario.

PyImathStringTable.cpp lines 91-94

namespace {
template class PYIMATH_EXPORT StringTableDetailT<std::string>;
template class PYIMATH_EXPORT StringTableDetailT<std::wstring>;
}

I'm currently working on a fork to fix compile errors when using Clang on Windows. I've managed to get the program to compile; I still need to run tests to verify that I didn't inadvertently break the python bindings. Before I propose the removal of PYIMATH_EXPORT being assigned to the deleted functions, I did want to ask if there's a reason for it to be there -- I'll admit I'm not very familiar with building libraries for sharing, and know little about how symbol visibility and exports are handled beyond what I've managed to look up in response to this error, or how __attribute__((visibility("default"))) behaves in environments where that is declared instead. If it wouldn't cause issues (and assuming it's not too minor for a PR), I'll submit a PR once I've got things working -- currently I'm fixing my PATH to include the required DLL's from boost for the python tests to run.

@ZTKlein
Copy link
Contributor Author

ZTKlein commented May 31, 2023

I'm having trouble getting the python binding tests to run; ctest is not properly setting up the environment to load all required dll's for the module. This is probably because I'm making use of python 3.10, and if I recall correctly python 3.8+ require some extra steps to load dll's that aren't in the same directory (it no longer checks PATH). That said, setting up the directories in a test script I wrote appears to properly load the module.

@kmilos
Copy link

kmilos commented Jun 1, 2023

See also #312 (comment) and further discussion. There is indeed something not quite right w/ the exports on Windows...

@ZTKlein
Copy link
Contributor Author

ZTKlein commented Jun 1, 2023

An update; I have managed to get the numpy test and python3 test to run (still no luck on PyImathTestC, I'll have to figure out what's up with that). PyImathTest_Python3 fails on M44

[ctest] Running testM44
[ctest] M44f
[ctest] ok
[ctest] M44d
[ctest] Traceback (most recent call last):
[ctest]   File "D:\Code\lib\Imathfork\src\python\PyImathTest\pyImathTest.in", line 10327, in <module>
[ctest]     test[1]()
[ctest]   File "D:\Code\lib\Imathfork\src\python\PyImathTest\pyImathTest.in", line 6323, in testM44
[ctest]     testM44x (M44d, V3d)
[ctest]   File "D:\Code\lib\Imathfork\src\python\PyImathTest\pyImathTest.in", line 6153, in testM44x
[ctest]     assert sInq.equalWithAbsError(s, sInq.baseTypeEpsilon())
[ctest] AssertionError

I'll probably go back through and add some prints to the test to figure out why that part of the test fails. Currently it appears that at that part of the test,

s = V3d(1, 2, 3)
sInq = V3d(1, 2, 3)
sInq.baseTypeEpsilon() = 2.220446049250313e-16

Currently in the process of figuring out why the assert is failing; I don't think it has to do with the base imath implementation, considering those tests pass.

@cary-ilm
Copy link
Member

Belated thanks for the investigation. The PYIMATH_EXPORT on both the deleted members and in the anonymous namespace are almost certainly oversights. If you have a PR you can submit for that, we'll happily accept it.

It looks like we need to re-examine the other exports in PyImath. We have a bunch of internal code that relies on this library, but we don't compile it with either clang or Windows, so I can't easily reproduce #312.

Regarding the failed assertion in testM44, could you also print out s-Sinq? I suspect it's close to baseTypeEpsilon but would like to see for sure.

@ZTKlein
Copy link
Contributor Author

ZTKlein commented Jun 30, 2023

Apologies for the large delay @cary-ilm; I've had a hectic couple of weeks. I'll go ahead and submit a PR, details on the changes will be in the request (not many lines were changed; I didn't spot any other exports that were clearly not needed. At the very least, no other lines caused errors).

Edit: PR #331 submitted. I closed #330 because I had to revise the commit to contain a signoff and use the correct e-mail.

Here's the values on the failed M44d test:

[ctest] M44d
[ctest] s: V3d(1, 2, 3)
[ctest] sInq: V3d(1, 2, 3)
[ctest] sInq.baseTypeEpsilon(): 2.220446049250313e-16
[ctest] s-sInq: V3d(0, -4.44089e-16, 0)
[ctest] s[1]: 2.0
[ctest] sInq[1]: 2.0000000000000004

It looks like the precision error is due to sInq[1] being the next represent-able double value after 2, which means the difference is going to be greater than the epsilon returned by the standard library's std::numeric_limits<double>::epsilon(). Is the intent of these tests to check that there is no precision error, or is it just to check that precision error is minimal? Interestingly, this doesn't seem to be an issue for the extracted shear, rotation, or translation vectors; ignoring the assert on s and sInq, all other asserts pass in PyImathTest_Python3.

I did manage to get PyImathTestC running -- it fails on Plane. I haven't had a chance to dig into that yet.

Should I open a separate issue for the failed tests?

@cary-ilm
Copy link
Member

cary-ilm commented Jul 3, 2023

Thanks for that analysis. I'm pretty sure the purpose of that test is to confirm that m.extractSHRT is performing the right calculation, not to test floating-point precision, although it's a little concerning that it fails on your machine and not on the many others it's been executed over the years. That file has a long history and has a variety of validation strategies, many of which are substantially looser, as in the use of the eps=10*FLT_EPSILON at top. I'm also not sure of the purpose behind the 2* in the comparison; the bounds on the shear and translation comparison are epsilon, but the scale and rotation are twice that, for no obvious reason. If you change the 2 to a 4, does the test pass? If so, I'm fine with that, can you just submit that as a PR? Thanks.

@ZTKlein
Copy link
Contributor Author

ZTKlein commented Jul 3, 2023

I can believe I'm an edge case on the tests; my environment has caused some strange behavior on other projects in the past. Turns out compiling with -march=native for a release build causes PyImathTestC to pass (that would be targeting Haswell feature set, I'm on an i7-4770k). The thought of digging into exactly which enabled target feature causes the difference is giving me a headache already; I'm inclined to just accept that a release build will pass tests and not worry about that further, since you've mentioned that the tests seem to pass fine on other machines. That said, I'd be willing to dig further in to see which section is failing on a build not compiled with target features. No such luck with PyImathTest itself, though. I'm going to double check what compilation flags are being enabled in my environment, though I'm fairly certain no unsafe fp optimizations are enabled.

The assert that fails is on line 6244 of pyImathTest.in, which I believe is the scale; it doesn't have a multiplier on the epsilon. I think I know which code you're referring to, though -- it looks like in the M33x tests there is a multiplier on the epsilon for the scale; perhaps a similar issue came up in the past and that extra wiggle room was added there, but the M44x test wasn't changed. Looking at the revision history it looks like both tests were added 12 years ago and this section hasn't been touched much since then; looks like I'm the lucky one where the difference became relevant, haha.

    # The M33x test
    b = m.extractSHRT(sInq, hInq, rInq, tInq)

    assert sInq.equalWithAbsError(s, 2 * sInq.baseTypeEpsilon())
    assert hInq.equalWithAbsError(h, hInq.baseTypeEpsilon())
    assert rInq.equalWithAbsError((-a, 0), 2 * rInq.baseTypeEpsilon())
    assert tInq.equalWithAbsError(t, tInq.baseTypeEpsilon())
    # The M44x test
    b = m.extractSHRT(sInq, hInq, rInq, tInq)

    assert sInq.equalWithAbsError(s, sInq.baseTypeEpsilon()) # line 6244, this is the assert that fails
    assert hInq.equalWithAbsError(h, hInq.baseTypeEpsilon())
    assert rInq.equalWithAbsError((0, 0, -a), 2 * rInq.baseTypeEpsilon())
    assert tInq.equalWithAbsError(t, tInq.baseTypeEpsilon())

Adding a multiplier to be 2*basetypeepsilon should work, since what I'm getting is the next representable value after 2 -- the difference should be exactly twice the base epsilon for double. I'll check in a bit and edit this comment afterwards.

edit: Yep. Adding a multiplier of two works fine. This has me thinking about how the floating point values are compared; I'll be posting a comment shortly on that. PR #333 submitted, one line change.

@ZTKlein
Copy link
Contributor Author

ZTKlein commented Jul 3, 2023

As a follow-up before I submit a PR to just change the test, a quick note:
Most of the tests seem to use baseTypeEpsilon() as the epsilon value, occasionally with a multiplier. Looking at where this is declared in the Imath files, these all return std::numeric_limits<T>::epsilon(). I'll admit I'm unfamiliar with best practicies regarding floating point comparisons in tests, but using the epsilon from <limits> doesn't seem very well-suited to comparisons for values above 1.0 if it's acceptable to have precision error. It returns the difference between 1.0 and the next representable value after 1.0. Each power of two left of the decimal is going to lead to a doubling of the difference between the integer values used in tests and the next representable double. To highlight what I mean, here's a quick c++ showcase:

#include <iostream>
#include <iomanip>
#include <limits>
#include <cmath>

using namespace std;

int main()
{
    double baseEp = std::numeric_limits<double>::epsilon();
    double deltaone = std::nextafter(1.0, 2.0) - 1.0;
    double deltatwo = std::nextafter(2.0, 3.0) - 2.0;
    double deltathree = std::nextafter(3.0, 4.0) - 3.0;
    double deltafour = std::nextafter(4.0, 5.0) - 4.0;
    double deltaseven = std::nextafter(7.0, 8.0) - 7.0;
    double deltaeight = std::nextafter(8.0, 9.0) - 8.0;
    cout << std::setprecision(17)
        << "numeric_limits<double>::epsilon(): " << baseEp << endl
        << "Delta after one: " << deltaone << endl
        << "Delta after two: " << deltatwo << endl
        << "Delta after three: " << deltathree << endl
        << "Delta after four: " << deltafour << endl
        << "Delta after seven: " << deltaseven << endl
        << "Delta after eight: " << deltaeight << endl
        << "Deltatwo / deltaone: " << deltatwo / deltaone << endl
        << "Deltafour / deltaone: " << deltafour / deltaone << endl
        << "Deltaeight / deltaone: " << deltaeight / deltaone << endl
        ;

    return 0;
}

The output of this is:

numeric_limits<double>::epsilon(): 2.2204460492503131e-16
Delta after one: 2.2204460492503131e-16
Delta after two: 4.4408920985006262e-16
Delta after three: 4.4408920985006262e-16
Delta after four: 8.8817841970012523e-16
Delta after seven: 8.8817841970012523e-16
Delta after eight: 1.7763568394002505e-15
Deltatwo / deltaone: 2
Deltafour / deltaone: 4
Deltaeight / deltaone: 8

This means that in the tests when checking values >= 2.0, unless there's a multiplier used in the test for the epsilon value, [object].equalWithAbsError([comparison], [object].baseTypeEpsilon()) is checking for exact floating point equivalence. In many tests this is fine -- what's being checked is equivalence on constructors or copies, or values checked are < 2.0. In many other cases, the operations don't cause precision errors in the first place. This isn't a big issue or anything, just something I figured I'd point out.

@ZTKlein
Copy link
Contributor Author

ZTKlein commented Jul 3, 2023

@cary-ilm I believe I understand why the bounds are raised for scale but not for the shear or the translation, looking at the matrix the vectors are extracted from and the values being compared. This is for M44x tests, but the same holds true for the M33x tests where you saw the difference in bounds.

    s = Vec(1, 2, 3)
    h = Vec(0.5, 1, 0.75)
    a = pi/4
    t = Vec(4, 5, 6)

    ...

    # this isn't in the code, but this is the value of m before extraction
    m = M44d((0.70710678118654757, -0.70710678118654757, 0, 0), 
        (2.1213203435596428, 0.70710678118654757, 0, 0),
        (3.7123106012293747, -0.5303300858899106, 3, 0),
        (4, 5, 6, 1))

S contains values >= 2.0 but < 4.0, so multiplying the baseTypeEpsilon value by 2 allows for bounds up to one representable value away. The values for shear are all less than 2.0, so this multiplier isn't needed. The values expected for translation are unaltered, and there shouldn't be precision error introduced, so it's fine to check for exact equivalence. I am mildly surprised to see that there's a multiplier for the rotation, since the comparison is on values less than one; I wouldn't expect there to be error greater than one value away.

@cary-ilm
Copy link
Member

cary-ilm commented Jul 4, 2023

Thank you for that analysis and the fix. I'm not all that familiar with the history of the various sections of that test, many people have contributed to it over the years, but it certainly looks like many of the comparisons aren't especially scientific or precise, both the ones that are looser than necessary as well as in the case of your failure, too tight. But it's certainly appropriate for the M33 and M44 tests to be similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants