Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nexus handler error translation #1626

Merged
merged 3 commits into from
Sep 11, 2024
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
96 changes: 96 additions & 0 deletions internal/internal_nexus_task_handler.go
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,8 @@ import (
"go.temporal.io/api/common/v1"
nexuspb "go.temporal.io/api/nexus/v1"
"go.temporal.io/api/workflowservice/v1"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"

"go.temporal.io/sdk/converter"
"go.temporal.io/sdk/internal/common/metrics"
Expand Down Expand Up @@ -231,6 +233,8 @@ func (h *nexusTaskHandler) handleStartOperation(
},
}, nil, nil
}
// Default to expose details for now. We may make this configurable eventually.
err = convertServiceError(convertApplicationError(err), true)
var handlerErr *nexus.HandlerError
if errors.As(err, &handlerErr) {
return nil, nexusHandlerErrorToProto(handlerErr), nil
Expand Down Expand Up @@ -302,6 +306,8 @@ func (h *nexusTaskHandler) handleCancelOperation(ctx context.Context, nctx *Nexu
return nil, nil, ctx.Err()
}
if err != nil {
// Default to expose details for now. We may make this configurable eventually.
err = convertServiceError(convertApplicationError(err), true)
var handlerErr *nexus.HandlerError
if errors.As(err, &handlerErr) {
return nil, nexusHandlerErrorToProto(handlerErr), nil
Expand Down Expand Up @@ -416,3 +422,93 @@ func (p *payloadSerializer) Serialize(v any) (*nexus.Content, error) {
}

var emptyReaderNopCloser = io.NopCloser(bytes.NewReader([]byte{}))

// statusGetter represents Temporal serviceerrors which have a Status() method.
type statusGetter interface {
Status() *status.Status
}

// convertServiceError converts a serviceerror into a Nexus HandlerError if possible.
// If exposeDetails is true, the error message from the given error is exposed in the converted HandlerError, otherwise,
// a default message with minimal information is attached to the returned error.
// Roughly taken from https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto
// and
// https://github.com/grpc-ecosystem/grpc-gateway/blob/a7cf811e6ffabeaddcfb4ff65602c12671ff326e/runtime/errors.go#L56.
func convertServiceError(err error, exposeDetails bool) error {
Copy link
Member

@cretz cretz Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't seem like there's much value in splitting convertServiceError and convertApplicationError, might as well just have convertError. Also doesn't seem like exposeDetails parameter has any value since it is never set to false on this private function. Can wait for the future need to hide details before adding that functionality, it has no value currently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree there's not much value, for the split. Initially there was more logic in the application error conversion since I thought we may want to handle other TemporalErrors but I can merge.

I already know that we'll want to allow hiding details and had this logic in the server codebase (where I ported this status error conversion logic from) so I figured I'd keep it in here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restructured slightly.

var st *status.Status
var stGetter statusGetter
if !errors.As(err, &stGetter) {
// Not a serviceerror, passthrough.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the error is wrapping a service error?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally like only checking the outer one. I don't think we should treat errors the same just because they may have the error somewhere in their chain.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, switched to errors.As instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debating what to do here but I think most users will probably wrap just to provide more context but will expect to maintain the semantics.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but internal is retryable right? Feels like it should be an UnsuccessfulOperationError?

Yes, internal is retryable. What do you mean by "it"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"it" meaning the workflow error . If I have an operation handler that gets the result of a workflow and the workflow has failed wouldn't it make sense to not retry that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to make it non retryable but I couldn't use UnsuccessfulOperationError because that's only usable in Start and GetResult. It's going to be a bad request for lack of better outcome.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also decided to not use errors.As and only consider errors directly returned by client methods.
If users wrap the errors, they should decide what error type they want.
Let's get user feedback on this and verify it matches their expectation before we remove the experimental label.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I would really like to get feedback from users here if this is helpful and behaves how they expect.

return err
}

st = stGetter.Status()
errMessage := err.Error()

switch st.Code() {
case codes.AlreadyExists, codes.Canceled, codes.InvalidArgument, codes.FailedPrecondition, codes.OutOfRange:
if !exposeDetails {
errMessage = "bad request"
}
return nexus.HandlerErrorf(nexus.HandlerErrorTypeBadRequest, errMessage)
case codes.Aborted, codes.Unavailable:
if !exposeDetails {
errMessage = "service unavailable"
}
return nexus.HandlerErrorf(nexus.HandlerErrorTypeUnavailable, errMessage)
case codes.DataLoss, codes.Internal, codes.Unknown:
if !exposeDetails {
errMessage = "internal error"
}
return nexus.HandlerErrorf(nexus.HandlerErrorTypeInternal, errMessage)
case codes.Unauthenticated:
if !exposeDetails {
errMessage = "authentication failed"
}
return nexus.HandlerErrorf(nexus.HandlerErrorTypeUnauthenticated, errMessage)
case codes.PermissionDenied:
if !exposeDetails {
errMessage = "permission denied"
}
return nexus.HandlerErrorf(nexus.HandlerErrorTypeUnauthorized, errMessage)
case codes.NotFound:
if !exposeDetails {
errMessage = "not found"
}
return nexus.HandlerErrorf(nexus.HandlerErrorTypeNotFound, errMessage)
case codes.ResourceExhausted:
if !exposeDetails {
errMessage = "resource exhausted"
}
return nexus.HandlerErrorf(nexus.HandlerErrorTypeResourceExhausted, errMessage)
case codes.Unimplemented:
if !exposeDetails {
errMessage = "not implemented"
}
return nexus.HandlerErrorf(nexus.HandlerErrorTypeNotImplemented, errMessage)
case codes.DeadlineExceeded:
if !exposeDetails {
errMessage = "request timeout"
}
return nexus.HandlerErrorf(nexus.HandlerErrorTypeDownstreamTimeout, errMessage)
}

// Default to internal error. This should only happen for codes.OK, which is unexpected for serviceerrors.
if !exposeDetails {
return nexus.HandlerErrorf(nexus.HandlerErrorTypeInternal, "internal error")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dead code. Can you wait until you need it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather keep this because I know I will need it later when we add the ability to control the level of details exposed from handlers (and callers).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no issue with the dead code, but It should be tested

}
return err
}

// convertApplicationError converts a Temporal ApplicationError to a Nexus HandlerError, respecting the non_retryable
// flag.
func convertApplicationError(err error) error {
var appErr *ApplicationError
if errors.As(err, &appErr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally the Go SDK explicitly does does not check the error chain for ApplicationError

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same with the Server

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... so yeah, we should be consistent here too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed this.

if appErr.NonRetryable() {
return nexus.HandlerErrorf(nexus.HandlerErrorTypeBadRequest, appErr.Error())
}
return nexus.HandlerErrorf(nexus.HandlerErrorTypeInternal, appErr.Error())
}
return err
}
57 changes: 55 additions & 2 deletions test/nexus_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,7 @@ import (
historypb "go.temporal.io/api/history/v1"
nexuspb "go.temporal.io/api/nexus/v1"
"go.temporal.io/api/operatorservice/v1"
"go.temporal.io/api/serviceerror"

"go.temporal.io/sdk/client"
"go.temporal.io/sdk/interceptor"
Expand Down Expand Up @@ -184,6 +185,14 @@ var syncOp = temporalnexus.NewSyncOperation("sync-op", func(ctx context.Context,
}
case "handlererror":
return "", nexus.HandlerErrorf(nexus.HandlerErrorTypeBadRequest, s)
case "already-started":
return "", serviceerror.NewWorkflowExecutionAlreadyStarted("faking workflow already started", "dont-care", "dont-care")
case "retryable-application-error":
return "", temporal.NewApplicationError("fake app error for test", "FakeTestError")
case "non-retryable-application-error":
return "", temporal.NewApplicationErrorWithOptions("fake app error for test", "FakeTestError", temporal.ApplicationErrorOptions{
NonRetryable: true,
})
case "panic":
panic("panic")
}
Expand Down Expand Up @@ -213,9 +222,8 @@ func TestNexusSyncOperation(t *testing.T) {

w := worker.New(tc.client, tc.taskQueue, worker.Options{})
service := nexus.NewService("test")
require.NoError(t, service.Register(syncOp, workflowOp))
require.NoError(t, service.Register(syncOp))
w.RegisterNexusService(service)
w.RegisterWorkflow(waitForCancelWorkflow)
require.NoError(t, w.Start())
t.Cleanup(w.Stop)

Expand Down Expand Up @@ -263,6 +271,51 @@ func TestNexusSyncOperation(t *testing.T) {
}, time.Second*3, time.Millisecond*100)
})

t.Run("already-started", func(t *testing.T) {
_, err := nexus.ExecuteOperation(ctx, nc, syncOp, "already-started", nexus.ExecuteOperationOptions{})
var unexpectedResponseErr *nexus.UnexpectedResponseError
require.ErrorAs(t, err, &unexpectedResponseErr)
require.Equal(t, http.StatusBadRequest, unexpectedResponseErr.Response.StatusCode)
require.Contains(t, unexpectedResponseErr.Message, `"400 Bad Request": faking workflow already started`)

require.EventuallyWithT(t, func(t *assert.CollectT) {
tc.requireTimer(t, metrics.NexusTaskEndToEndLatency, service.Name, syncOp.Name())
tc.requireTimer(t, metrics.NexusTaskScheduleToStartLatency, service.Name, syncOp.Name())
tc.requireTimer(t, metrics.NexusTaskExecutionLatency, service.Name, syncOp.Name())
tc.requireCounter(t, metrics.NexusTaskExecutionFailedCounter, service.Name, syncOp.Name())
}, time.Second*3, time.Millisecond*100)
})

t.Run("retryable-application-error", func(t *testing.T) {
_, err := nexus.ExecuteOperation(ctx, nc, syncOp, "retryable-application-error", nexus.ExecuteOperationOptions{})
var unexpectedResponseErr *nexus.UnexpectedResponseError
require.ErrorAs(t, err, &unexpectedResponseErr)
require.Equal(t, http.StatusInternalServerError, unexpectedResponseErr.Response.StatusCode)
require.Contains(t, unexpectedResponseErr.Message, `"500 Internal Server Error": fake app error for test`)

require.EventuallyWithT(t, func(t *assert.CollectT) {
tc.requireTimer(t, metrics.NexusTaskEndToEndLatency, service.Name, syncOp.Name())
tc.requireTimer(t, metrics.NexusTaskScheduleToStartLatency, service.Name, syncOp.Name())
tc.requireTimer(t, metrics.NexusTaskExecutionLatency, service.Name, syncOp.Name())
tc.requireCounter(t, metrics.NexusTaskExecutionFailedCounter, service.Name, syncOp.Name())
}, time.Second*3, time.Millisecond*100)
})

t.Run("non-retryable-application-error", func(t *testing.T) {
_, err := nexus.ExecuteOperation(ctx, nc, syncOp, "non-retryable-application-error", nexus.ExecuteOperationOptions{})
var unexpectedResponseErr *nexus.UnexpectedResponseError
require.ErrorAs(t, err, &unexpectedResponseErr)
require.Equal(t, http.StatusBadRequest, unexpectedResponseErr.Response.StatusCode)
require.Contains(t, unexpectedResponseErr.Message, `"400 Bad Request": fake app error for test`)

require.EventuallyWithT(t, func(t *assert.CollectT) {
tc.requireTimer(t, metrics.NexusTaskEndToEndLatency, service.Name, syncOp.Name())
tc.requireTimer(t, metrics.NexusTaskScheduleToStartLatency, service.Name, syncOp.Name())
tc.requireTimer(t, metrics.NexusTaskExecutionLatency, service.Name, syncOp.Name())
tc.requireCounter(t, metrics.NexusTaskExecutionFailedCounter, service.Name, syncOp.Name())
}, time.Second*3, time.Millisecond*100)
})

t.Run("panic", func(t *testing.T) {
_, err := nexus.ExecuteOperation(ctx, nc, syncOp, "panic", nexus.ExecuteOperationOptions{})
var unexpectedResponseErr *nexus.UnexpectedResponseError
Expand Down
Loading