erpc:call behavior is different on local and remote nodes #8641
Comments
The documentation explicitly warns that no assumptions can be made about the process in which the function is called. There is no bug here. (We've been bitten by this ourselves, but that was our fault.)
Running the function in the context of the calling process is quite unexpected for RPC. If there is one expectation one should be able to have it's that the code will execute in a separate process than the caller. As it stands it is unfortunately dangerous to use |
Mnesia uses this, i.e. it doesn't need to check that the table is on a local node when reading the ets table for example.
It makes sense to want to fast-track if you know nothing can go wrong. But not otherwise. If in the executed function I open a file and have a crash after that, on the local node the file will remain open until the calling process exits. If I'm not aware of the subtleties of
The fast-track is there for a good reason, but it should be requested explicitly, and the default should be safe, not fast. The current default is surprising and error-prone.
Both parties are somewhat right:
Generally, I'm inclined to agree with @juhlig and apply the suggested change, together with documentation. However, it might be more complex in terms of compatibility, because it changes the existing contract (existing users expect to "be bitten by that").
I don't think there is a one-size-fits-all definition of "safe" here. The "obvious" approach of always using a fresh process to call the target function and then terminate is not safe. I've seen rpcs cause failures exactly due to that. It all depends on what the target function does. So the options would be to (1) add options to (e)rpc to provide additional guarantees about the target execution environment, or (2) establish those guarantees in your target function (or a wrapper for it). (1) is doable but takes time to become widely available; (2) you can do today.
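As an illustration of option (2), a wrapper can establish the execution environment itself. The sketch below is an assumption, not OTP code: the module `isolated_call` and the function `apply_isolated/3` are hypothetical names. It runs the target function in a fresh monitored process, so nothing the function does to its own process can leak into the caller. Whether a fresh, short-lived process is the right environment still depends on the target function, as noted above.

```erlang
-module(isolated_call).
-export([apply_isolated/3]).

%% Hypothetical helper (not part of OTP): run M:F(A...) in a fresh process
%% and hand the return value back to the caller.
apply_isolated(M, F, A) ->
    %% Unique tag so a normal result cannot be confused with an exit
    %% reason produced by the target function itself.
    Tag = make_ref(),
    {Pid, MRef} = spawn_monitor(fun() -> exit({Tag, apply(M, F, A)}) end),
    receive
        {'DOWN', MRef, process, Pid, {Tag, Result}} ->
            Result;
        {'DOWN', MRef, process, Pid, Reason} ->
            %% The target function raised or exited; surface it as an error.
            erlang:error({isolated_call_failed, Reason})
    end.
```

It could be used as `erpc:call(Node, isolated_call, apply_isolated, [M, F, A])`, assuming the module is loaded on the target node. Anything tied to the lifetime of the helper process (ets tables, links, open ports) goes away when it terminates, which is exactly the property that is wanted in some cases and harmful in others.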
There's no blocker for us, we just wanted to bring this up as it was surprising (even if it is documented). We can set a large enough integer timeout explicitly and avoid this issue.
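A minimal sketch of that workaround, assuming the behaviour described in this issue (the local shortcut is only taken when the timeout is `infinity`). The module name `erpc_spawned` and the macro `MAX_TIMEOUT` are made up for illustration; 4294967295 is the upper bound of the integer timeout range documented for erpc.

```erlang
-module(erpc_spawned).
-export([call/4]).

%% Largest integer timeout (in milliseconds) that erpc accepts.
-define(MAX_TIMEOUT, 4294967295).

%% Same as erpc:call/4, but with an explicit integer timeout instead of
%% the implicit 'infinity', so the call is not run in the calling process.
call(Node, M, F, A) ->
    erpc:call(Node, M, F, A, ?MAX_TIMEOUT).
```

For example, `erpc_spawned:call(node(), erlang, self, [])` should return a pid different from `self()`. Note that this relies on the "magic" timeout distinction discussed in this thread rather than on a documented guarantee.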
It is both unusual... and the default! Unfortunately.
I have no strong feelings about this matter, as @essen said, we can work around it one way or another. So, just a few IMOs and musings regarding the comments.
I know, however, I would not have taken the meaning of this paragraph as far as "... it may even be the calling process itself". The fact that you (and probably others) needed to be bitten to realize this shows that I'm not alone in this regard.
So Mnesia does make a very strong assumption about the process in which the function is called, right? This is pretty dangerous IMO, the circumstances in which the calling process is the process executing the function are undocumented.
I wouldn't say that there is a contract. Mnesia relies on a little-known internal fact here, which actually violates what the documentation urges, ie to not make any assumptions regarding the executing process.
It depends on what safeties you expect. A safety I would expect is that it is not possible that the function may accidentally corrupt or leak stuff into the calling process if the stars are right (ie, local node and
Indeed. Skimming the results of a quick grep, I found only 2 instances of |
The following note is present in the
This has also been the case for the legacy |
Ok, I stand corrected.
This doesn't mean that it is a good thing 😐
Another option which would not break any existing code (like the linked PR #8642 could) would be to provide a way to force the call to be always executed in a separate process, @mikpe already pointed this out. This could be done by way of a new |
Cleaner than messing with "magic" timeout values would be to add an |
Of course. But that would put two semantics on |
Specifically...
You don't have to change But these are minor details that could be ironed out during review. It is however clear that there is a way to introduce new behaviour in |
Ok, sounds good. I'll change the linked PR accordingly. I'll go the |
@rickard-green, what does "a server" mean here? Is this some legacy in the documentation from the old |
I'm not Rickard, but there is or used to be a |
Good enough 😁 j/k
There still is, this seems to be the server involved in the old |
#8642 has been updated according to recent suggestions.
Describe the bug
erpc:call usually runs the given function in a spawned process. However, if the given Node is the local node and if the given Timeout is infinity (implicit in erpc:call/2 and /4), an optimization is used that instead uses erlang:apply:
otp/lib/kernel/src/erpc.erl, lines 255 to 260 in d05de4c
otp/lib/kernel/src/erpc.erl, lines 1267 to 1270 in d05de4c
Using apply means that the given function is executed in the context of the process calling erpc:call, which can have a number of unintended consequences: trap_exit flag, linking/monitoring (or unlinking/demonitoring) other processes, etc. None of this can happen if the function executes in a separate process.
To Reproduce
Exemplified by creating a private ets table. The distinction between executing the function in the calling process (1> and 2>) vs a separate one (3> and 4>) is forced by the timeout.
As can be seen at 2>, the ets table created in the call at 1> still exists, and is also accessible by the calling process. Conversely at 4>, the ets table created in the call at 3> is gone.
Exemplifying the interruption of the function when the calling process crashes. A function sets a timer to kill itself after 1s, then uses erpc:call to execute a function which waits 2s before sending done back to the shell.
As can be seen at 3>, the done message never arrives when the calling process exits in the wait period of the function that was started at 2>. Conversely, at 5> we get the done message even though the calling process exited.
Expected behavior
erpc call should behave the same, no matter the node or timeout. Specifically, it should always execute the given function in a separate process.
Affected versions
OTP 27, but probably going back all the way to OTP 23 when erpc was introduced.
Additional context
The optimization seems pointless to me. When the function is executed at the local node, it already is fast, no need to try to shave off a few microseconds. Also, it only takes place when the given Timeout is infinity, which indicates that the caller is prepared to wait for however long it takes, not in a hurry.
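The shell sessions from the To Reproduce section are not preserved in this text. The following is a rough sketch of the ets case only, not the original reproduction; the module `erpc_ets_repro`, the function `table_state/1` and the table name `repro_tab` are hypothetical. It creates a private ets table through erpc:call on the local node and then reports, from the calling process, what became of the table.

```erlang
-module(erpc_ets_repro).
-export([table_state/1]).

%% Create a private ets table via erpc:call on the local node, then look
%% at the table from the calling process.
table_state(Timeout) ->
    Tid = erpc:call(node(), ets, new, [repro_tab, [private]], Timeout),
    case ets:info(Tid, owner) of
        undefined ->
            %% The table died together with the process that created it.
            gone;
        Owner when Owner =:= self() ->
            %% The function ran in the calling process; clean up.
            true = ets:delete(Tid),
            owned_by_caller;
        _Other ->
            owned_by_other
    end.
```

With an integer timeout such as `table_state(5000)`, the call runs in a separate process and the result should be `gone`. According to the report above, `table_state(infinity)` on an affected OTP version would instead find the table alive and owned by the caller.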