-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Patches for clean Fedora RPM builds that pass unit tests #123
base: master
Are you sure you want to change the base?
Conversation
Having version.hpp throw an exception when built against a malformed version number (due to, eg, not having the git tag available in an automated package build from source bundle) results in _init_ failing during an otherwise apparently successful dlopen(), as well as the netgen-mesher executable crashing before hitting main(). This patch both forces an abort and prints a helpful message alerting the builder/user that something went wrong instead of handing over either a coredump or a weirdly malfunctioning library with no apparent cause. Hopefully this will avoid a repeat of automated Fedora builds shipping nonfunctional Netgen RPMs for several years!
Add explicit defaults where missing for clarity. Support passing in NETGEN_GIT_VERSION for use when building, eg, RPM packages using rpkg, which requires building from an untagged source bundle, that is, not directly from a git checkout. Add option to choose build against internal or external Pybind11; add cmake module to find and configure against system Pybind. re-add versioning of shared library files. Add section for GCC-specific options setting (right now, to turn off harmless warnings). Warn if pybind11_stubgen is not found. Adjust stubgen paths to work when build is not in-place in the source tree.
The name of the 'netgen' executable collides with an older UNIX pcb trace routing application also named 'netgen'. Fedora, for this reason, renames this mesher package to 'netgen-mesher' and the executable, likewise, to 'netgen-mesher'. I propose the same change here. Ironically though, the current python module is already named 'netgen-mesher', which complicates loading of the module as 'netgen', often resulting in it being unfindable. For this reason, I similarly reverse the usage, renaming the python module from 'netgen-mesher' to 'netgen'.
Adjust path setup of the python modules in __init__ such that it's possible to run pytests from a mock root during package build/test. Also add checks to pytest/CMakelists.txt to make sure pytest abd check are present instead of failing with an inscrutible traceback if they're not installed or usable.
Hardened toolchains (eg, mainline Fedora) no longer tolerate null derefs that were silently trapped/ignored in earlier versions. This eliminates a nullptr deref in archive.hpp that was failing several unit tests (and causing crashes) when trying to manipulate default-constructed archive objects.
A default-constructed (or just empty) ngarray will have 'ownmem' set despite not having an allocated data array. Destructor would then trigger an abort.
Don't delete memory that wasn't allocated. Similarly, we can't deref the first element of an array that was never allocated.
Several occurrences of int -> cgsize_t
Add DLL_HEADER to Surface, Primitive, and STLTopology in order to make typeinfo available to link stage (otherwise, link fails with GNU ld).
I'm uncertain about earlier/alternate versions, but the current version of mainline pybind11_stubgen requires a number of changes in the python binding cpp files to succeed: __repr__ implementation added for some basic types Added default descriptions via py::arg_v variant syntax for those cases where a __repr__ implementation made no sense (eg, for a default sonstructor of a complex type) Moved class declarations to precede first use in arguments/returns Added trailing default arguments, so that mandatory args did not follow optional args (without resorting to kwargs trickery).
Build all Netgen RPMs from cli via: rpkg local --spec package/fedora/netgen-mesher-bleed.spec from the toplevel source directory
if (ownmem) | ||
delete [] data; | ||
if (data) | ||
if (ownmem) | ||
delete [] data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change does not make any sense. When data is a nullptr, then the delete []
is a noop.
Dito below.
Maybe the actual problem was fixed with 5423242
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++ has somehow continued to make behavior of delete nullptr ambiguous, and gcc/glibc crash on this if they're built with object instance nullptr checking disabled. Unfortunately, that's now the default config on RedHat-style distros since F35.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not an object instance nullptr (*this), but a regular delete []
.
If this breaks on Fedora, they are doing something wrong.
https://isocpp.org/wiki/faq/freestore-mgmt#delete-handles-null
Do I need to check for null before delete p?
No!
vstring = vstring.substr(1,vstring.size()-1); | ||
auto dot = vstring.find('.'); | ||
mayor_ = std::stoi(vstring.substr(0,dot)); | ||
if(dot == size_t(-1)) vstring = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compare with std::string::npos, instead of type-casting.
auto dot = vstring.find('.'); | ||
mayor_ = std::stoi(vstring.substr(0,dot)); | ||
if(dot == size_t(-1)) vstring = ""; | ||
else vstring = vstring.substr(dot+1, vstring.size()-dot-1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
substr() returns up to the end when the second argument is omitted (i.e. uses the std::string::npos default argument).
@@ -0,0 +1,202 @@ | |||
# - Find python libraries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, no vendored and totally outdated cmake stuff.
Pybind11 supports cmake config mode for ages, and does not require any external helpers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a patch that has been making the rounds for years as part of distro-packaged netgen-- I agree, it's pretty heavy/ugly. I used it only because I knew it worked. I'll do something better.
The general problem with CMake is that it is mostly unaware that RedHat distributions exist at all, and there does not seem to be any release testing against RH distros. A lot of the trouble is RedHat's own fault, of course, and CMake authors have been very responsive to suggested fixes. But that does mean that a bleeding-edge CMake is needed to build things, so I have generally opted for workarounds if they're not too awful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #127
Do(&v[0], size); | ||
if(size > 0) // can't deref v[0] if it doesn't exist | ||
Do(&v[0], size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just v.data()
instead? Thats always valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was just keeping the pattern. I'll change it.
This needs a rebase and general cleanup taking comments into account. I'll have an updated patchset soon after a little more discussion (eg, I name my experimental packages with 'bleed' and I had meant to undo that before submitting but forgot). |
memcpy (p, line.col, line.maxsize * elsize); | ||
delete [] (char*)line.col; | ||
|
||
if (line.size > 0 && line.col != NULL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is incorrect.
Via SetSize2(row, 5); SetSize2(row, 0);
, line.size becomes 0, but that does not mean the backing memory should not be freed.
Though, the allocation/deallocation is considerably flawed. The memory pointed to by line.col must only be freed when it does not point into oneblock. Unfortunately, that is not easy to check, as there is no end pointer.
With github, this is typically noted by prefixing the PR title with |
WIP: added to both PRs, as both will need more revision/discussion. The biggest cause of latency on my side is looking into the nature of the nullptr checking changes in GCC and Glibc recently. In fact there has been some very deep churn there and I believe you're correct that something was very much not right in F35 (and may have been corrected already in F36/37). |
@xiphmont - any plans to finish this? |
funny timing.... |
} catch(...) { | ||
std::cerr << "Malformed NETGEN_VERSION (" << save <<"\n"; | ||
std::cerr << "Micompiled/mispackaged Netgen library\n"; | ||
abort(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO this goes into the totally wrong direction. netgen is (also) a library, and libraries should not abort if possible. VersionInfo is also e.g. used by netgens Archive class, i.e. on user provided data, i.e. the error message would be incorrect and misleading.
If you want to catch these errors:
- Make the VersionInfo(string) constructor constexpr
- add a
static_assert(VersionInfo(NETGEN_VERSION) > VersionInfo("0"));
Fedora has had a bad run past few years of shipping nonfunctional or only partially functional netgen builds despite carrying a number of downstream patches. This is an effort to get Fedora (my native development platform) support up to parity with Ubuntu/Win/Mac.
The issues addressed here are varied and wide-ranging; I'd not be surprised if my approach at various points isn't considered the best option. However, it's a complete minimal cut to get working packages that build and pass internal tests, and a sort of base-minimum I needed to continue on other pieces (like SALOME reconciliation).
One thing that's especially bugging me: Netgen has been at 6.2 for a comparatively long time now, and I'm not sure any of the 'tweak' releases have actually been cross-abi compatible. Is it perhaps time to beak to 6.3, or maybe even 7? The internal and external changes seem to warrant a step of that magnitude.