Out of Free Events Error Handling #172
Comments
Thanks Neil;
Yes - we could. However, what we need to implement is the Cancelback
Protocol which does exactly as you say - it enables the reclaiming of event
memory that has been optimistically scheduled. See the attached IEEE TPDS
paper from 1997 by Das and Fujimoto :-).
The set of events you have to keep are those that were scheduled prior to
the current GVT but would not be executed until after GVT.
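
A rough sketch of that keep/reclaim split, for illustration only. The `sketch_event` struct and both helper functions are hypothetical and do not mirror the ROSS `tw_event` layout or any existing ROSS API. The handling of events scheduled exactly at GVT is also an assumption; the paper should be consulted for the precise rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event record; NOT the ROSS tw_event layout. */
typedef struct {
    double send_ts; /* virtual time at which the sender scheduled the event */
    double recv_ts; /* virtual time at which the receiver would execute it */
} sketch_event;

/* An event must be kept if it was scheduled before GVT but would not be
 * executed until after GVT: its sender can no longer roll back far enough
 * to regenerate it. */
static bool sketch_must_keep(const sketch_event *ev, double gvt)
{
    assert(ev != NULL);
    return ev->send_ts < gvt && ev->recv_ts > gvt;
}

/* Count events that are candidates for reclaiming.
 * Assumption: anything scheduled at or after GVT was scheduled
 * optimistically, so its sender could still roll back and resend it. */
static size_t sketch_count_reclaimable(const sketch_event *evs, size_t n,
                                       double gvt)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (evs[i].send_ts >= gvt) {
            count++;
        }
    }
    return count;
}
```

With GVT at 10.0, an event sent at 1.0 for time 12.0 must be kept, while events sent at 10.0 or 11.0 are counted as reclaimable.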
I believe the LLNL folks may have implemented a form of Cancelback in their
branched version of ROSS. At the least, it is a user-level event retraction
capability, which might be useful for implementing Cancelback. We can touch
base with them on the status of their implementation.
thanks again!!,
Chris
On Fri, Jan 3, 2020 at 3:37 PM Neil McGlohon wrote:
In its current state, when the simulation runs out of free event buffers,
ROSS throws an error suggesting increasing --extramem= and exits the
simulation, requiring that the user increase this parameter and restart.
Would it be possible to instead force a premature GVT update at this point
to do some stale event recollection to see if this resolves the issue and
then resume the simulation?
There should still probably be some stdout warning about what happened so
that the user can know why their simulation is taking a lot of time if this
forced GVT update happens really frequently. Maybe make this an opt-in
feature via a command line argument, so that it is only enabled by a user
who knows the risk of turning their optimistic simulation into something
potentially worse than conservative if --extramem isn't set appropriately.
But it might be better than killing a potentially 10 hour long running
simulation.
There will need to be a check to see if the time since the last GVT is 0
to prevent the endless loop of "Out of events, perform GVT to recollect,
still out of events, perform GVT to recollect..."
--
----------------------------------------------------------------------------------------------
Christopher D. Carothers
Director, Center for Computational Innovations
Professor, Department of Computer Science
Rensselaer Polytechnic Institute
110 8th Street
Troy, New York 12180-3590
e-mail: [email protected]
web page: www.cs.rpi.edu/~chrisc <http://www.cs.rpi.edu/%7Echrisc>
phone: (518) 276-2930
fax: (518) 276-4033
----------------------------------------------------------------------------------------------
We (here at LLNL) are looking at lazy rollback. But we would be very interested in a cancelback implementation if you wanted to tackle that @nmcglohon 😄
I could probably knock it out not too long after my next paper deadline. Assigning to myself.
In its current state, when the simulation runs out of free event buffers, ROSS throws an error suggesting increasing --extramem= and exits the simulation, requiring that the user increase this parameter and restart.
Would it be possible to instead force a premature GVT update at this point to do some stale event recollection to see if this resolves the issue and then resume the simulation?
There should still probably be some stdout warning about what happened so that the user can know why their simulation is taking a lot of time if this forced GVT update happens really frequently. Maybe make this an opt-in feature via a command line argument, so that it is only enabled by a user who knows the risk of turning their optimistic simulation into something potentially worse than conservative if --extramem isn't set appropriately. But it might be better than killing a potentially 10 hour long running simulation.
There will need to be a check to see if the time since the last GVT is 0 to prevent the endless loop of "Out of events, perform GVT to recollect, still out of events, perform GVT to recollect..."
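
The control flow proposed above could look roughly like the sketch below. This is not ROSS code: `sketch_pe`, `sketch_force_gvt`, and `sketch_ensure_free_event` are hypothetical names, the GVT step is a stand-in that simply moves a counter of stale events back to the free list, and "time since the last GVT is 0" is approximated by counting events processed since the last GVT, which is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-PE state; none of these names are ROSS identifiers. */
typedef struct {
    bool allow_forced_gvt; /* opt-in, e.g. set from a command line argument */
    int free_events;       /* buffers currently on the free list */
    int stale_events;      /* processed events that a GVT update would free */
    int events_since_gvt;  /* progress made since the last GVT (assumed proxy) */
    int forced_gvt_count;  /* how many times a GVT has been forced */
} sketch_pe;

/* Stand-in for a real GVT computation followed by fossil collection. */
static void sketch_force_gvt(sketch_pe *pe)
{
    pe->free_events += pe->stale_events;
    pe->stale_events = 0;
    pe->events_since_gvt = 0;
    pe->forced_gvt_count++;
}

/* Returns true if a free buffer is available (possibly after a forced GVT).
 * Returns false if the caller should fall back to the existing error path. */
static bool sketch_ensure_free_event(sketch_pe *pe)
{
    assert(pe != NULL);
    if (pe->free_events > 0) {
        return true;
    }
    if (!pe->allow_forced_gvt) {
        return false; /* feature not opted into: keep current behavior */
    }
    if (pe->events_since_gvt == 0) {
        /* Guard against the endless loop: nothing has happened since the
         * last GVT, so another GVT cannot reclaim anything new. */
        return false;
    }
    printf("warning: out of free event buffers, forcing GVT (forced %d so far)\n",
           pe->forced_gvt_count);
    sketch_force_gvt(pe);
    return pe->free_events > 0;
}
```

If the function returns false, the simulation would still exit with the existing --extramem suggestion, so the forced GVT only ever adds a recovery attempt in front of the current behavior.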