Better handling of problematic sets #119

Merged
merged 1 commit into from
Mar 30, 2017

ScottMansfield
Contributor

Previously, an error such as an out-of-memory condition could leave L1 in an inconsistent state after a set operation. With this change, on any kind of error during a set operation, a delete is sent to L1 afterwards, and the operation succeeds even though L1 had an error.

This PR is the beginning of fixing #102

@ScottMansfield
Contributor Author

Did some more local testing on this in the meantime and got good results for the set operation in l1l2:

rend_cmd_set_errors_l1                          42046
rend_cmd_set_errors_l2                          0
rend_cmd_set_errors_oom_l1                      0
rend_cmd_set_errors_oom_l2                      0
rend_cmd_set_errors_oom                         0
rend_cmd_set_errors                             0
rend_cmd_set_l1_error_delete_errors_l1          0
rend_cmd_set_l1_error_delete_hits_l1            3
rend_cmd_set_l1_error_delete_l1                 42046
rend_cmd_set_l1_error_delete_misses_l1          42043
rend_cmd_set_l1                                 597973
rend_cmd_set_l2                                 598053
rend_cmd_set_replace_errors_l1                  0
rend_cmd_set_replace_l1_error_delete_errors_l1  0
rend_cmd_set_replace_l1_error_delete_hits_l1    0
rend_cmd_set_replace_l1_error_delete_l1         0
rend_cmd_set_replace_l1_error_delete_misses_l1  0
rend_cmd_set_replace_l1                         0
rend_cmd_set_replace_not_stored_l1              0
rend_cmd_set_replace_stored_l1                  0
rend_cmd_set_success_l1                         597961
rend_cmd_set_success_l2                         597973
rend_cmd_set_success                            597961
rend_cmd_set                                    598053

@ScottMansfield
Contributor Author

The batch handler looks good too:

rend_cmd_set_errors_l1                          0
rend_cmd_set_errors_l2                          0
rend_cmd_set_errors_oom_l1                      0
rend_cmd_set_errors_oom_l2                      0
rend_cmd_set_errors_oom                         0
rend_cmd_set_errors                             0
rend_cmd_set_l1_error_delete_errors_l1          0
rend_cmd_set_l1_error_delete_hits_l1            0
rend_cmd_set_l1_error_delete_l1                 0
rend_cmd_set_l1_error_delete_misses_l1          0
rend_cmd_set_l1                                 0
rend_cmd_set_l2                                 343217
rend_cmd_set_replace_errors_l1                  1597
rend_cmd_set_replace_l1_error_delete_errors_l1  0
rend_cmd_set_replace_l1_error_delete_hits_l1    0
rend_cmd_set_replace_l1_error_delete_l1         1597
rend_cmd_set_replace_l1_error_delete_misses_l1  1597
rend_cmd_set_replace_l1                         343186
rend_cmd_set_replace_not_stored_l1              341589
rend_cmd_set_replace_stored_l1                  0
rend_cmd_set_success_l1                         0
rend_cmd_set_success_l2                         343186
rend_cmd_set_success                            343186
rend_cmd_set                                    343217
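
One way to sanity-check counter dumps like the two above is to confirm that the outcome counters add up to their totals. The sketch below assumes each cleanup delete is counted as exactly one of hit, miss, or error, and each replace as exactly one of stored, not stored, or error; that relationship is an assumption, not something the PR states. The helper `balanced` is hypothetical and the numbers are copied from the dumps.

```go
package main

import "fmt"

// balanced reports whether the parts sum to the total.
func balanced(total int, parts ...int) bool {
	sum := 0
	for _, p := range parts {
		sum += p
	}
	return sum == total
}

func main() {
	// l1l2 handler: set_l1_error_delete_l1 vs. hits, misses, errors.
	fmt.Println(balanced(42046, 3, 42043, 0)) // true
	// Batch handler: set_replace_l1_error_delete_l1 vs. hits, misses, errors.
	fmt.Println(balanced(1597, 0, 1597, 0)) // true
	// Batch handler: set_replace_l1 vs. stored, not_stored, errors.
	fmt.Println(balanced(343186, 0, 341589, 1597)) // true
}
```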

@ScottMansfield ScottMansfield merged commit b275c03 into master Mar 30, 2017
Contributor

@vuzilla vuzilla left a comment

When will you make the same changes for Add?

if err == common.ErrKeyNotFound {
	metrics.IncCounter(MetricCmdSetL1ErrorDeleteMissesL1)
} else if err != nil {
	metrics.IncCounter(MetricCmdSetL1ErrorDeleteErrorsL1)
Contributor

Should this code branch return err?

Contributor Author

It depends on how optimistic we want to be. We could ignore any error here and wait for the next request to fail, or we could fail requests when, e.g., the connection has been severed (memcached has crashed). In the first case, we return success for an L2 success regardless of what happens in L1. In the latter case, we fail a request if L1 has catastrophically failed.

Contributor

So right now, we're returning success, but really don't know what's going to happen on the next request.

I'm not sure what types of errors we can expect from a Delete operation. If it's only a memcached crash/disconnect, I think returning an error here should be safe.

@@ -615,7 +636,33 @@ func (l *L1L2Orca) Get(req common.GetRequest) error {

	if err != nil {
		metrics.IncCounter(MetricCmdGetSetErrorsL1)
		return err

Contributor

The change in Get may not be necessary, since a Set failure has these possible consequences, all of which seem acceptable: (1) there is no record in memcached, (2) the new record is actually there, or (3) some other thread concurrently set the same key.

Contributor Author

It has a consequence because there may be inconsistency between L1 and L2.

Contributor

Not returning an error is the right thing.

However, the Delete operation may not be necessary. The Get operation never modifies L2, and presumably the L1 get failed because the key wasn't there. Inconsistency would only happen if some error caused the Set (after the Get) to report failure when it had in reality succeeded, while another concurrent modify operation had already completed. We don't handle the concurrent-modify condition anyway, so this only reacts to a rarer condition.

All this makes me wonder whether an Add operation should have been used instead of Set after the Get miss.

if err == common.ErrKeyNotFound {
	metrics.IncCounter(MetricsCmdSetReplaceL1ErrorDeleteMissesL1)
} else if err != nil {
	metrics.IncCounter(MetricsCmdSetReplaceL1ErrorDeleteErrorsL1)
Contributor

Should this code branch return err?

Contributor Author

same answer as above
