Prior PB operation timeout can trigger error log spam later [JIRA: RIAK-2111] #16
Comments
@slfritchie To me this would indicate that the timeout is not terminating the request FSM.
Just noting that this is likely to eventually cause someone the same problems as seen by @joedevivo on basho/riak_test#79. Namely, this unhandled message may be received during an operation being handled by
The timeout doesn't terminate the request FSM, no. It seems to me that you have to handle it anyway, though, since even if you're terminating it, there's a race there. PR#20 doesn't address Bryan's comment.
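To make the race above concrete, here is a minimal sketch in Python rather than Erlang, so it is a language-neutral illustration and not riak_api code. The `Server` class, its method names, and the `("result", ref, payload)` message shape are all hypothetical. The idea shown is only this: remember the references of requests that timed out, and drop late answers carrying those references quietly instead of letting them fall through to the default clause that logs an error.

```python
import itertools
import logging

log = logging.getLogger("pb_server_sketch")


class Server:
    """Toy stand-in for a connection process with one in-flight request.

    Hypothetical illustration only; not the riak_api_pb_server API.
    """

    def __init__(self):
        self._refs = itertools.count(1)
        self.current_ref = None   # ref of the request we are waiting on
        self.timed_out = set()    # refs of requests abandoned on timeout
        self.results = []         # payloads delivered to the client
        self.unrecognized = 0     # messages that hit the default clause

    def start_request(self):
        self.current_ref = next(self._refs)
        return self.current_ref

    def timeout(self):
        # Give up on the in-flight request, but remember its ref so that
        # answers arriving later can be recognized as stale.
        if self.current_ref is not None:
            self.timed_out.add(self.current_ref)
            self.current_ref = None

    def handle_info(self, msg):
        kind, ref, payload = msg
        if kind == "result" and ref == self.current_ref:
            self.results.append(payload)
            self.current_ref = None
        elif kind == "result" and ref in self.timed_out:
            # Late answer to a timed-out request: drop it quietly.
            # The ref stays in the set because several late messages
            # may arrive for one request (the set is unbounded here).
            log.debug("dropping late result for ref %r", ref)
        else:
            # Default clause: this is where the error log spam comes from.
            self.unrecognized += 1
            log.error("Unrecognized message %r", msg)
```

Usage: after `r1 = s.start_request()` and `s.timeout()`, a later `s.handle_info(("result", r1, "late"))` is dropped without incrementing `s.unrecognized`. This only silences the symptom; it does not stop the work still running on behalf of the timed-out request.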
This same exact issue is happening to me; sample message:
[error] <0.20657.797>@riak_api_pb_server:handle_info:170 Unrecognized message {pipe_result,#Ref<0.0.842.47868>,0,[<<"ranktracker_summary_1499">>,<<"2013-01-02_248152">>]}
Followed by hundreds of thousands of lines of:
[error] <0.534.0>@riak_pipe_vnode:new_worker:766 Pipe worker startup failed:fitting was gone before startup
Updating to 1.2.1 should help with the last one, but it's good to know there's another path there. Note that that's typically a symptom of sending too many MR jobs for your cluster to handle.
My whole cluster has been upgraded to 1.2.1 as of three weeks ago...
My bad; the patch that I was thinking of will land in 1.3.
On 1/7/13 5:13 PM, Evan Vigil-McClanahan wrote:
This can still occur: we have not addressed the root problem, which is incomplete termination of request FSMs (and the race conditions around it).
This is still occurring. The customer is using Riak 1.4.12 and Java client 1.4.4. Please see Zendesk ticket #11687 for more details. Example log output:
I'm not positive, but I may have just experienced this with Riak TS 1.3.1:
We are getting a bunch of these while load testing our cluster to tune our configuration (specifically the value for "timeseries_max_concurrent_queries").
If a prior PB operation times out, a late-arriving answer can trigger default clause handle_info() error log spam, e.g.
We've a customer who's had intra-cluster communication problems (including busy_dist_port warnings) where 40 or more of these messages would be logged per second.