Terminator chaos #1788

Merged
merged 1 commit into from
Mar 12, 2024
41 changes: 41 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,44 @@
# Release 0.33.0

## What's New

* SDK Terminator stability improvements
* Minor feature updates and bug fixes

## SDK Terminator stability improvements

This release was focused on creating a chaos test for SDK terminators, running it and fixing any issues found.
The test repeatedly and randomly restarts the controller, routers and tunnelers then verifies that terminators
end up in the correct state.

The following tools were also used/added to aid in diagnosing and fixing issues:

* `ziti fabric validate router-sdk-terminators`
    * Compares the controller state with the router state
* `ziti fabric validate terminators`
    * Checks each selected terminator to ensure it's still valid on the router and/or SDK
* `ziti fabric inspect sdk-terminators`
    * Allows inspecting each router's terminator state
* `ziti fabric inspect router-messaging`
    * Allows inspecting what the controller has queued for router state sync and terminator validations
* `ziti edge validate service-hosting`
    * Shows how many terminators each identity that can host a service has

Several changes were made to the terminator code to ensure that terminators are properly created and cleaned up.
The routers now use an adaptive rate limiter to control how fast they send terminator-related requests to the
controller. For this to work properly, the rate limiting on the controller must be enabled, so it can report
back to the routers when it's got too much work.
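
The adaptive behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the OpenZiti implementation: the `adaptiveWindow` type, its methods, and the grow-by-one / halve-on-rejection policy are all assumptions chosen to show how a router could shrink its request window when the controller reports that it is too busy and widen it again as requests succeed.

```go
package main

import "fmt"

// adaptiveWindow is a hypothetical limiter (not taken from the ziti code base).
// It tracks how many terminator requests a router may have in flight at once.
type adaptiveWindow struct {
	minSize, maxSize, size int
}

func newAdaptiveWindow(minSize, maxSize int) *adaptiveWindow {
	return &adaptiveWindow{minSize: minSize, maxSize: maxSize, size: maxSize}
}

// onSuccess grows the window by one, up to maxSize.
func (w *adaptiveWindow) onSuccess() {
	if w.size < w.maxSize {
		w.size++
	}
}

// onRateLimited halves the window, never going below minSize.
func (w *adaptiveWindow) onRateLimited() {
	w.size /= 2
	if w.size < w.minSize {
		w.size = w.minSize
	}
}

func main() {
	w := newAdaptiveWindow(1, 16)
	w.onRateLimited() // the controller reported it was too busy
	fmt.Println(w.size)
	w.onSuccess() // a later request went through
	fmt.Println(w.size)
}
```

The point of the sketch is the feedback loop: the window can only adapt if the controller actually rejects work when overloaded, which is why controller-side rate limiting has to be enabled.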

## Component Updates and Bug Fixes

* github.com/openziti/edge-api: [v0.26.10 -> v0.26.12](https://github.com/openziti/edge-api/compare/v0.26.10...v0.26.12)
* github.com/openziti/ziti: [v0.32.2 -> v0.33.0](https://github.com/openziti/ziti/compare/v0.32.2...v0.33.0)
    * [Issue #1794](https://github.com/openziti/ziti/issues/1794) - Add SDK terminator chaos test and fix any bugs found as part of chaos testing
    * [Issue #1369](https://github.com/openziti/ziti/issues/1369) - Allow filtering by policy type when listing identities for service or services for identity
    * [Issue #1204](https://github.com/openziti/ziti/issues/1204) - ziti cli identity tags related flags misbehaving
    * [Issue #987](https://github.com/openziti/ziti/issues/987) - "ziti create config router edge" doesn't know about --tunnelerMode proxy
    * [Issue #652](https://github.com/openziti/ziti/issues/652) - Update CLI script M1 Support when github actions allows

# Release 0.32.2

## What's New
4 changes: 4 additions & 0 deletions common/ctrl_msg/messages.go
@@ -55,6 +55,10 @@ const (
	CreateCircuitRespCircuitId  = 11
	CreateCircuitRespAddress    = 12
	CreateCircuitRespTagsHeader = 13

	// HeaderResultErrorCode is the header key that carries a result's error code.
	HeaderResultErrorCode = 10

	// ResultErrorRateLimited marks a request that was rejected by rate limiting.
	ResultErrorRateLimited = 1
)

func NewCircuitSuccessMsg(sessionId, address string) *channel.Message {
18 changes: 18 additions & 0 deletions common/handler_common/common.go
@@ -3,6 +3,7 @@ package handler_common
import (
	"github.com/michaelquigley/pfxlog"
	"github.com/openziti/channel/v2"
	"github.com/openziti/ziti/common/ctrl_msg"
	"time"
)

@@ -39,3 +40,20 @@ func SendOpResult(request *channel.Message, ch channel.Channel, op string, messa
		log.WithError(err).Error("failed to send result")
	}
}

// SendServerBusy replies to request with a failure result flagged as rate limited.
func SendServerBusy(request *channel.Message, ch channel.Channel, op string) {
	log := pfxlog.ContextLogger(ch.Label()).WithField("operation", op)
	log.Errorf("%v error performing %v: (%s)", ch.LogicalName(), op, "server too busy")

	response := channel.NewResult(false, "server too busy")
	response.ReplyTo(request)
	response.Headers.PutUint32Header(ctrl_msg.HeaderResultErrorCode, ctrl_msg.ResultErrorRateLimited)
	if err := response.WithTimeout(5 * time.Second).SendAndWaitForWire(ch); err != nil {
		log.WithError(err).Error("failed to send result")
	}
}

// WasRateLimited reports whether msg carries the rate-limited error code header.
func WasRateLimited(msg *channel.Message) bool {
	val, found := msg.GetUint32Header(ctrl_msg.HeaderResultErrorCode)
	return found && val == ctrl_msg.ResultErrorRateLimited
}
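
The header round trip between `SendServerBusy` and `WasRateLimited` can be illustrated without the channel library. The sketch below is a simplification under stated assumptions: `message` is a hypothetical stand-in for `channel.Message` that stores headers in a plain map, and `newResult` and `newServerBusy` are illustrative helpers, not functions from the PR. Only the two constant values are taken from the diff.

```go
package main

import "fmt"

const (
	headerResultErrorCode  int32  = 10 // mirrors ctrl_msg.HeaderResultErrorCode
	resultErrorRateLimited uint32 = 1  // mirrors ctrl_msg.ResultErrorRateLimited
)

// message is a hypothetical stand-in for channel.Message.
// The real type stores headers differently; a map is enough for the idea.
type message struct {
	success bool
	text    string
	headers map[int32]uint32
}

func newResult(success bool, text string) *message {
	return &message{success: success, text: text, headers: map[int32]uint32{}}
}

// newServerBusy builds a failure result tagged with the rate-limited code,
// which is the part of SendServerBusy that the receiver later inspects.
func newServerBusy() *message {
	m := newResult(false, "server too busy")
	m.headers[headerResultErrorCode] = resultErrorRateLimited
	return m
}

// wasRateLimited follows the same check as WasRateLimited in the diff.
func wasRateLimited(m *message) bool {
	val, found := m.headers[headerResultErrorCode]
	return found && val == resultErrorRateLimited
}

func main() {
	fmt.Println(wasRateLimited(newServerBusy()))
	fmt.Println(wasRateLimited(newResult(false, "some other failure")))
}
```

Carrying the reason in a header rather than in the message text lets the receiver tell a busy server apart from other failures without parsing strings.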
41 changes: 41 additions & 0 deletions common/inspect/router_message_inspections.go
@@ -0,0 +1,41 @@
/*
Copyright NetFoundry Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package inspect

type RouterMessagingState struct {
	RouterUpdates         []*RouterUpdates         `json:"routerUpdates"`
	TerminatorValidations []*TerminatorValidations `json:"terminatorValidations"`
}

type RouterInfo struct {
	Id   string `json:"id"`
	Name string `json:"name"`
}

type RouterUpdates struct {
	Router         RouterInfo   `json:"router"`
	Version        uint32       `json:"version"`
	ChangedRouters []RouterInfo `json:"changedRouters"`
	SendInProgress bool         `json:"sendInProgress"`
}

type TerminatorValidations struct {
	Router          RouterInfo `json:"router"`
	Terminators     []string   `json:"terminators"`
	CheckInProgress bool       `json:"checkInProgress"`
	LastSend        string     `json:"lastSend"`
}
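
As a rough illustration of how these types might be consumed, the sketch below decodes a JSON document into `RouterMessagingState` and counts the terminators awaiting validation. The struct definitions are copied from the diff; the sample JSON, `parseState`, and `queuedTerminators` are invented for this example, and the exact shape the inspect command emits is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// The four types below are copied from the diff.
type RouterInfo struct {
	Id   string `json:"id"`
	Name string `json:"name"`
}

type RouterUpdates struct {
	Router         RouterInfo   `json:"router"`
	Version        uint32       `json:"version"`
	ChangedRouters []RouterInfo `json:"changedRouters"`
	SendInProgress bool         `json:"sendInProgress"`
}

type TerminatorValidations struct {
	Router          RouterInfo `json:"router"`
	Terminators     []string   `json:"terminators"`
	CheckInProgress bool       `json:"checkInProgress"`
	LastSend        string     `json:"lastSend"`
}

type RouterMessagingState struct {
	RouterUpdates         []*RouterUpdates         `json:"routerUpdates"`
	TerminatorValidations []*TerminatorValidations `json:"terminatorValidations"`
}

// sampleJSON is made up for illustration; it was not captured from a controller.
const sampleJSON = `{
  "routerUpdates": [
    {"router": {"id": "r1", "name": "router-1"}, "version": 3,
     "changedRouters": [{"id": "r2", "name": "router-2"}], "sendInProgress": false}
  ],
  "terminatorValidations": [
    {"router": {"id": "r1", "name": "router-1"}, "terminators": ["t1", "t2"],
     "checkInProgress": true, "lastSend": ""}
  ]
}`

// parseState (hypothetical helper) decodes JSON into a RouterMessagingState.
func parseState(data []byte) (*RouterMessagingState, error) {
	state := &RouterMessagingState{}
	if err := json.Unmarshal(data, state); err != nil {
		return nil, err
	}
	return state, nil
}

// queuedTerminators (hypothetical helper) counts terminators awaiting
// validation across all routers.
func queuedTerminators(state *RouterMessagingState) int {
	total := 0
	for _, v := range state.TerminatorValidations {
		total += len(v.Terminators)
	}
	return total
}

func main() {
	state, err := parseState([]byte(sampleJSON))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(state.RouterUpdates), queuedTerminators(state))
}
```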
39 changes: 39 additions & 0 deletions common/inspect/terminator_inspections.go
@@ -0,0 +1,39 @@
/*
Copyright NetFoundry Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package inspect

type SdkTerminatorInspectResult struct {
	Entries []*SdkTerminatorInspectDetail `json:"entries"`
	Errors  []string                      `json:"errors"`
}

type SdkTerminatorInspectDetail struct {
	Key             string `json:"key"`
	Id              string `json:"id"`
	State           string `json:"state"`
	Token           string `json:"token"`
	ListenerId      string `json:"listenerId"`
	Instance        string `json:"instance"`
	Cost            uint16 `json:"cost"`
	Precedence      string `json:"precedence"`
	AssignIds       bool   `json:"assignIds"`
	V2              bool   `json:"v2"`
	SupportsInspect bool   `json:"supportsInspect"`
	OperationActive bool   `json:"establishActive"`
	CreateTime      string `json:"createTime"`
	LastAttempt     string `json:"lastAttempt"`
}
2 changes: 1 addition & 1 deletion common/pb/cmd_pb/cmd.pb.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

69 changes: 44 additions & 25 deletions common/pb/ctrl_pb/ctrl.pb.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions common/pb/ctrl_pb/ctrl.proto
@@ -109,6 +109,7 @@ message Terminator {
  string id = 1;
  string binding = 2;
  string address = 3;
  uint64 marker = 4;
}

message ValidateTerminatorsRequest {
@@ -130,6 +131,7 @@ message RouterTerminatorState {
  bool valid = 1;
  TerminatorInvalidReason reason = 2;
  string detail = 3; // inspect info if valid
  uint64 marker = 4;
}

message ValidateTerminatorsV2Response {
2 changes: 1 addition & 1 deletion common/pb/edge_cmd_pb/edge_cmd.pb.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

33 changes: 19 additions & 14 deletions common/pb/edge_ctrl_pb/edge_ctrl.pb.go

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions common/pb/edge_ctrl_pb/edge_ctrl.proto
@@ -244,6 +244,7 @@ enum CreateTerminatorResult {
  FailedIdConflict = 1;
  FailedOther = 2;
  FailedBusy = 3;
  FailedInvalidSession = 4;
}

message CreateTerminatorV2Response {
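
To show why a dedicated `FailedInvalidSession` value can be useful, here is a hedged sketch of how a caller might branch on the `CreateTerminatorResult` values visible in this hunk. The numeric values come from the diff; the `classify` function and the actions it returns are purely hypothetical and do not describe what the router code in this PR does.

```go
package main

import "fmt"

// Values mirror the CreateTerminatorResult entries visible in the diff.
// Entries outside the displayed hunk are deliberately not reproduced here.
const (
	failedIdConflict     int32 = 1
	failedOther          int32 = 2
	failedBusy           int32 = 3
	failedInvalidSession int32 = 4
)

// classify maps a result code to an illustrative follow-up action.
// The policy is an assumption made for this example only.
func classify(result int32) string {
	switch result {
	case failedBusy:
		return "back off and retry"
	case failedIdConflict:
		return "retry with a new id"
	case failedInvalidSession:
		return "stop: session is no longer valid"
	case failedOther:
		return "log and retry"
	default:
		return "not handled in this sketch"
	}
}

func main() {
	for _, code := range []int32{failedBusy, failedInvalidSession} {
		fmt.Printf("%d: %s\n", code, classify(code))
	}
}
```

The design point is that a distinct code lets the caller separate a failure that retrying cannot fix from a transient one such as `FailedBusy`.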