Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static-link Legion into compiled Regent programs #341

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

manopapad
Copy link
Contributor

@manopapad manopapad commented Jan 16, 2018

This is really a proof-of-concept. There's definitely a cleaner way to propagate linking information to the Regent compiler. Also, this won't pass the test suite as-is.

With these changes, circuit (with SAVEOBJ=1) and soleil-x (without a custom mapper) compile and run correctly, and without dynamic-linking liblegion_terra.so. I've tested this locally and on Titan.

I've tried to avoid dynamic-linking during compilation as well. This should only be necessary if you want to run the code on-the-fly instead of emitting an executable (which is why I do terralib.linklibrary in regentlib.start), or want to use functions from Legion during compilation (which is what happens in the test suite, and in -fdebug 1 mode, both of which use at least the timer code from Legion).

@lightsighter
Copy link
Contributor

Adding @streichler for review as well. I'm fine with this change as it seems like static linking libraries is necessary for some machines like Titan (although we've found we can cheat in some circumstances too).

@manopapad What prevents this from passing the test suite right now?

@elliottslaughter
Copy link
Contributor

Please don't merge this yet. There are two different things going on here, and I want to sort them out.

One issue is that Cray systems don't like dynamic libraries. In theory, I'm fine with static linking Regent applications. My main objection is that it doesn't actually fix @manopapad's issue; Cray doesn't like dynamic libraries but it does allow them, and I've been running extensively on Titan in this way.

The actual issue that is motivating this patch is a link error on Titan. The fix is the second part of this PR: dumping linker flags to a file in runtime.mk so that they can be read and used in Regent. Right now, the user remember to add these flags on their own when they call saveobj. This is obviously a poor user experience, and when you forget it leads to an especially bad error message (/usr/bin/ld: cannot find -llegion_terra). We've been fixing this in each Regent application one at a time, which clearly is the wrong answer.

For the moment, I think I can fix the second issue by teaching Regent's saveobj about the Cray flags that are needed to link correctly. Longer term, it would be nice to have a mechanism to keep these in sync, but I'd like feedback from @lightsighter and @streichler on how to accomplish this.

While tangentially related, it would also be nice to do some cleanup of the build process in general. E.g. if we could teach the Makefile to build a liblegion.so and librealm.so (which liblegion_terra.so can link to), that would give us parity between Make and CMake, and indirectly help lead us to being able to use Regent on CMake. And I suspect CMake might have an answer to these sorts of link flag issues, so it's worth thinking about this up front.

@lightsighter
Copy link
Contributor

For reference, for most Legion-only applications right now we actually already build liblegion.a and librealm.a (or liblegion.so and librealm.so) libraries and then link them agains the application code, so I think moving to a model where we have liblegion and librealm separate from liblegion_terra (maybe rename to libregent) should be just fine.

In terms of keeping things in-sync for linker flags, I at least would be alright adding some runtime support to provide a string of linker flags that were set when the runtime was compiled. This would stay consistent wherever a particular build of the runtime is used and would be updated appropriately whenever the runtime is rebuilt. I'm guessing this needs to be a Legion thing since it will need to know about both Legion and Realm libraries.

@manopapad
Copy link
Contributor Author

To summarize the discussion on this PR, here's a list of (orthogonal) improvements that have been suggested so far:

  1. Don't bundle liblegion and librealm in liblegion_terra/libregent, instead compile them as separate .so's: It sounds like nobody objects to this, and it looks easy to achieve.

  2. Don't dynamic-link to liblegion_terra.so during Regent compilation, unless absolutely necessary: The reason I suggested this is so we can support cross-compilation properly: if we never have to use liblegion_terra.so during Regent compilation, then it doesn't need to run on the compile node. If my understanding is correct, we can work around this on Titan because there exists some architecture that works for both the login and compute nodes. Of course we still have to link to liblegion_terra.so if the programmer wants to run the code on-the-fly. And we also need to link if we need to use Legion functionality during compilation, e.g. the timing code, which is used on -fdebug 1 and in the test suite. @elliottslaughter can comment on how easy it would be to remove such uses, or at least locate them in the Regent codebase, and do the loading of liblegion_terra.so on demand when those come up.

  3. Communicate link flags to the Regent compiler: Runtime.mk figures out what link flags to use on each system, and this information is also needed during Regent compilation. Right now the Regent compiler (or the programmer) has to duplicate runtime.mk's work to derive these flags. In the PR I just print the link flags out to a file, but @lightsighter is suggesting we instead bake them in to liblegion (presumably passing them in during compilation with -DLINK_FLAGS). I'd just like to note that this alternative conflicts with suggestion 2, as it would require always dynamic-linking with liblegion_terra.so.

  4. Support static-linking of Legion bindings into Regent applications: At this point I'm not blocked on this feature, since @elliottslaughter has helped me run with dynamic linking on Titan. The PR shows how you can change runtime.mk to support this, but it's up to you to decide if this is worth doing; maybe Titan is the only machine where dynamic linking is a problem, and we can provably work around Titan's limitations.

At this point I'm happy to reduce the scope of this PR to a subset of the above features, or close it and tackle each feature in a separate issue/PR. Awaiting instructions.

@lightsighter
Copy link
Contributor

  1. Yes, this seems like a completely natural thing to do and it should be fairly easy since most of our C++ only programs work like this right now.

  2. Having a stand-alone mode for the Regent compiler for more traditional "offline" compilation seems like it would be a nice feature for Regent. My guess is that it wouldn't be too hard to do. @elliottslaughter and @magnatelee will have to say for sure. At least when it comes to Titan, we definitely know how to build dynamic libraries and then link and run against them on the compute nodes. Really the only secret is that you have to copy the dynamic libraries into Lustre file system so that they are available on the compute nodes and then point the binaries at them when you start the job.

  3. Actually I was suggesting adding a call to the runtime interfaces that allows a complier to ask about Realm and Legion link flags so that they can be queried at runtime and therefore there is no need for an environment variable. However, as was noted, this would not work for option 2 if we do want to build a truly offline-only version of Regent.

  4. This tends to be an HPC only problem and to-date I've only seen it be required on Titan. Every other HPC machine we've used has used the same architecture on the login nodes as the compute nodes so cross-compilation hasn't been an issue. Unless we encounter more use cases I think the Titan cross-compilation program should be treated as a special case for now.

@elliottslaughter
Copy link
Contributor

Re: 3. I'd prefer that this be done via a constant or a define in legion_config.h. That way we can query it without needing to load a shared library. I.e. one of:

#define LEGION_LINK_FLAGS "..."

or

const char * const LEGION_LINK_FLAGS = "...";

We'll have to test which of these Terra actually supports.

@elliottslaughter
Copy link
Contributor

Re: 1. No, our C++ codes don't work like this right now. We create .as but not .sos. (That's why we went down this path of including legion.a and realm.a in liblegion_terra.so in the first place.) Anyway, it's not a huge amount of work and it would give us compatibility with CMake (which already is capable of doing this), so I'm in favor. My main request is that this not be done in a one-off way for Regent but in a way that works for our Makefile build system more generally. This would also get us closer to getting a working CMake build for Regent.

Re: 2. I'm fine with this and think the patch already basically solves this. We can rewrite the timing functions in Terra if necessary.

Re: 4. I think this gets easier if we factor out (3), so I'm also probably fine with this in practice.

Given the separate concerns, I'd be happier accepting one patch at a time. Part (2) can basically go now. The other parts are probably not that bad, but we want to make sure we do them right.

@elliottslaughter
Copy link
Contributor

I've been doing some refactoring of the code such that Regent now has support for CMake. With CMake, Regent already has support for (1). Item (3) would also be straightforward since CMake already generates a file called legion_defines.h. I'm not sure how much effort we want to put into the Makefile build system, but we can backport that if someone cares enough to do it.

@streichler
Copy link
Contributor

@elliottslaughter, what's the plan for this PR?

@elliottslaughter
Copy link
Contributor

I think pieces of this PR may still be useful, but we're probably not going to merge it as-is. Some of this would be easier if we were willing to declare CMake the official build system. As it is there will be additional hacks (beyond what is in this PR) needed to implement some of the proposed changes.

I've also thought about some of the points and think we should revisit some of the implementation details, but that can be discussed if we decide to move forward with this.

I think the bottom line is that it's not super high priority so no one's been giving it the time it needs to push all the way through.

@streichler
Copy link
Contributor

@elliottslaughter another poke on this - this PR appears to be bit-rotting. Do we want to clean it up, or close it and replace with a more general RFE, or just close it with prejudice?

@elliottslaughter
Copy link
Contributor

I guess I stand by what I said before. This PR really needs a champion to figure out what to keep and what to cut and what needs to be reworked. It's not high priority enough to have gotten much attention, but it's still valuable enough I think we wouldn't want to toss it completely. (Though, I suppose if it continues to bitrot we might eventually get to the point of tossing it and recreating it rather than slicing and rebasing or whatever.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants