Releases: temporalio/temporal
v1.25.0
Schema changes
Before upgrading your Temporal Cluster to v1.25.0, you must upgrade your core and visibility schemas to the following:
- Core:
- MySQL schema v1.14
- PostgreSQL schema v1.14
- Cassandra schema v1.11
- Visibility:
- Elasticsearch schema v7
- MySQL schema v1.6
- PostgreSQL schema v1.6
Please see our upgrade documentation for the necessary steps to upgrade your schemas.
Release Highlights
1. Nexus
Nexus RPC is an open-source service framework for arbitrary-length operations whose lifetime may extend beyond a traditional RPC. It is an underpinning connecting durable executions within and across namespaces, clusters and regions – with an API contract designed with multi-team collaboration in mind. A service can be exposed as a set of sync or async Nexus operations – the latter provides an operation identifier and a uniform interface to get the status of an operation or its result, receive a completion callback, or cancel the operation.
Temporal uses the Nexus RPC protocol to allow calling across namespace and cluster boundaries. The Go SDK Nexus proposal explains the user experience and shows sequence diagrams from an external perspective.
Read more on how to enable Nexus and how to operate it here.
2. Workflow Update (public preview)
Workflow Update enables a gRPC client of a Workflow Execution to issue requests to that Workflow Execution and receive a response. These requests are delivered to and processed by a client-specified Workflow Execution. Updates are differentiated from Queries in that the processing of an Update is allowed to modify the state of a Workflow Execution. Updates are different from Signals in that an Update returns a response.
Any gRPC client can invoke Updates via the `WorkflowService.UpdateWorkflowExecution` API. Additionally, past Update requests can be observed via the `WorkflowService.PollWorkflowExecutionUpdate` API. The wait stage option determines whether these APIs respond once the Update is accepted or once it is completed.
Note that an Update only becomes durable once it has been accepted; until then, it will not appear in the Workflow history. SDKs automatically retry Update requests to ensure they complete.
The execution and retention of Updates is configured via two optional dynamic configuration values:
- `history.maxTotalUpdates` controls the total number of Updates that a single Workflow Execution can support. The default is 2000.
- `history.maxInFlightUpdates` controls the number of Updates that can be "in-flight" (that is, concurrently executing, not yet completed) for a given Workflow Execution. The default is 10.
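These limits can be adjusted in the server's dynamic config file. A minimal sketch (the file path and values are illustrative, not recommendations):

```yaml
# e.g. config/dynamicconfig/production.yaml (path is deployment-specific)
history.maxTotalUpdates:
  - value: 5000
    constraints: {}
history.maxInFlightUpdates:
  - value: 20
    constraints: {}
```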
Since the 1.21 release, the feature has been heavily tested, and several bug fixes and performance optimizations have been made.
You can find more information at this link.
3. Host level MutableState cache
The `MutableState` cache now operates as a host-level cache by default. Previously, this cache was managed at the shard level, with each shard's cache holding 512 `MutableState` entries. The host-level cache, enabled by default (`history.enableHostHistoryCache = true`), is shared across all shards on a given host.
The size of the host-level cache is controlled by the `history.hostLevelCacheMaxSize` configuration, which defaults to 128,000 entries. This change may impact the memory usage of the history service; adjust `history.hostLevelCacheMaxSize` if needed.
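If the default footprint is too large for your hosts, the cache size can be lowered via dynamic config. A sketch (the value shown is illustrative):

```yaml
history.hostLevelCacheMaxSize:
  - value: 64000  # default is 128000 entries
```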
4. Visibility Enhancement
Enhanced the Temporal CLI to support query filtering for the `schedule list` command. The `--query` (`-q`) string option filters results using a specified list filter.
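For example (a sketch; the filter shown is illustrative and uses the standard list filter syntax):

```
temporal schedule list --query "TemporalSchedulePaused = false"
```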
5. Task Queue Statistics
Stats for Task Queue backlogs are now provided for use in worker scaling decisions.
Use the `DescribeTaskQueue` API in enhanced mode (with `report_stats=true`) to get the following information about the Task Queue:
- Approximate backlog count
- Approximate backlog age
- Approximate rate of adding tasks to the task queue
- Approximate rate of dispatching tasks from the task queue
Helpful links to get you started with Temporal
Temporal Docs
Server
Docker Compose
Helm Chart
Docker images for this release
Server (use the tag `1.25.0`)
Server With Auto Setup ([what is Auto-Setup?](https://docs.temporal.io/blog/auto-setup)) (use the tag `1.25.0`)
Admin-Tools (use the tag `1.25.0-tctl-1.18.1-cli-1.0.0`)
Full Changelog: v1.24.0...v1.25.0
v1.24.2
What's Changed
- Acquire workflow lock during backfill history by @yux0 in #6102
- Next retry delay should not be bounded by retry policy MaximumInterval by @gow in #6063
- Handle NextRetryDelay option in workflow failures by @gow in #5946
- Remove build id from sticky recordWorkflowTaskStarted by @ShahabT in #6096
- Scheduler Bugfix: getFutureActionTimes should ignore actions prior to update time, respect RemainingActions counter by @lina-temporal in #6122
Full Changelog: v1.24.1...v1.24.2
v1.24.1
Schema changes
If you are using SQL-based visibility, before upgrading your Temporal Cluster to v1.24.1, you must upgrade your visibility schemas to the following:
- MySQL schema v1.6
- PostgreSQL schema v1.6
Please see our upgrade documentation for the necessary steps to upgrade your schemas.
Release Highlights
This release contains a schema fix for SQL-based visibility introduced in 1.24.0. For the full set of new features, please check the v1.24.0 release notes.
Helpful links to get you started with Temporal
Temporal Docs
Server
Docker Compose
Helm Chart
Docker images for this release
Server (use the tag `1.24.1`)
Server With Auto Setup (what is Auto-Setup?) (use the tag `1.24.1`)
Admin-Tools (use the tag `1.24.1-tctl-1.18.1-cli-0.12.0`)
Full Changelog: v1.24.0...v1.24.1
v1.24.0
Caution
This release introduces a bug in SQL visibility. Please DO NOT use it if you are using SQL-based (PostgreSQL, MySQL, or SQLite) visibility. Elasticsearch-based visibility is not affected. Upgrade directly to v1.24.1.
Schema changes
Before upgrading your Temporal Cluster to v1.24.0, you must upgrade your core and visibility schemas to the following:
- Core:
- MySQL schema v1.12
- PostgreSQL schema v1.12
- Cassandra schema v1.10
- Visibility:
- Elasticsearch schema v7
- MySQL schema v1.5
- PostgreSQL schema v1.5
Please see our upgrade documentation for the necessary steps to upgrade your schemas.
Breaking changes
Standard Visibility
As planned, standard visibility is no longer supported in this version. Please upgrade to advanced visibility, and update the config keys used to set up visibility, before upgrading to this version. Refer to the v1.20.0 release notes for upgrade instructions, and check the v1.21.0 release notes for config key changes.
Note that you also need to update the plugin name for the main store: if you are using the `mysql` plugin name for the main store, change it to `mysql8`; similarly, if it's `postgres`, change it to `postgres12`.
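In the static server config, the plugin name appears under each SQL datastore. A sketch (datastore names and the surrounding structure depend on your deployment):

```yaml
persistence:
  datastores:
    default:
      sql:
        pluginName: "mysql8"   # previously "mysql"
```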
Deprecation Announcements
Worker Versioning APIs [Experimental]
The following changes were made to Worker Versioning APIs:
- Deprecated `UpdateWorkerBuildIdCompatibility` in favor of the new `UpdateWorkerVersioningRules` API.
- Deprecated `GetWorkerBuildIdCompatibility` in favor of the new `GetWorkerVersioningRules` API.
- Deprecated `GetWorkerTaskReachability` in favor of `DescribeTaskQueue` enhanced mode (`api_mode=ENHANCED`).
Together with the old APIs, the Version Set concept is also deprecated and replaced with “versioning rules” which are more powerful and flexible. More details can be found in https://github.com/temporalio/api/blob/master/temporal/api/taskqueue/v1/message.proto#L153.
To use these experimental APIs, you need to enable the following configs:
- `frontend.workerVersioningRuleAPIs`
- `frontend.workerVersioningWorkflowAPIs`
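A dynamic config sketch enabling both (the values are booleans):

```yaml
frontend.workerVersioningRuleAPIs:
  - value: true
frontend.workerVersioningWorkflowAPIs:
  - value: true
```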
Release Highlights
Namespace Burst Ratio
Removed the `frontend.namespaceBurst` config and added `frontend.namespaceBurstRatio`. Similarly, replaced `frontend.namespaceBurst.visibility` and `frontend.namespaceBurst.namespaceReplicationInducingAPIs` with `frontend.namespaceBurstRatio.visibility` and `frontend.namespaceBurstRatio.namespaceReplicationInducingAPIs`.
The old values specified the burst rate as a number of requests per second. The new values specify burst as a ratio of the respective RPS limit; this ratio is applied to the RPS limit calculated from the global and per-instance rate limits.
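For instance, a ratio of 2.0 allows bursts up to twice the calculated RPS limit. A dynamic config sketch (the ratio is illustrative):

```yaml
frontend.namespaceBurstRatio:
  - value: 2.0  # burst capacity = 2x the calculated namespace RPS limit
```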
Visibility: Parent workflow execution
We added two new system search attributes: `RootWorkflowId` and `RootRunId`. If you have previously created custom search attributes with one of these names, attempts to set them will start to fail. We suggest updating your workflows to not set those search attributes, deleting those search attributes, and then upgrading Temporal to this version. Alternatively, you can set the dynamic config `system.supressErrorSetSystemSearchAttribute: true`. When this dynamic config is set to `true`, your workflow will not fail when trying to set a value on a system search attribute; instead, your input for those system search attributes will be ignored.
OpenAPI HTTP API Documentation
When our HTTP API is enabled, OpenAPI v2 docs are served at `/api/v1/swagger.json` and v3 docs at `/api/v1/openapi.yaml`.
Shard Info Update Optimizations
Operators can now configure how often we update shard info (tracking how many tasks have been acked, etc.). Persisting shard data more frequently improves recovery speed.
This can be configured through the following dynamic config values:
- `history.shardUpdateMinTasksCompleted` - the minimum number of tasks that must be completed (across all queues) before the shard info can be updated
- `history.shardUpdateMinInterval` - the minimum amount of time between shard info updates, unless `shardUpdateMinTasksCompleted` tasks have been acked

Note that once `history.shardUpdateMinInterval` amount of time has passed, we'll update the shard info regardless of the number of tasks completed.
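A dynamic config sketch combining both knobs (values are illustrative):

```yaml
history.shardUpdateMinTasksCompleted:
  - value: 1000   # require at least 1000 completed tasks before persisting shard info
history.shardUpdateMinInterval:
  - value: 5m     # but always persist after 5 minutes regardless
```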
Interpolate MySQL query parameters by default (#5428)
We now interpolate parameters into queries client-side for MySQL main databases, but not for visibility.
When `interpolateParams` is false (the default), the driver prepares parameterized statements before executing them, meaning we need two round-trips to the database for each query. Setting `interpolateParams` to true makes the DB driver handle interpolation and send the query to the database just once, halving the number of round trips. This should improve the performance of all Temporal deployments using MySQL.
OpenTelemetry env variables
Added support for enabling OpenTelemetry tracing of gRPC requests via environment variables. See `develop/docs/tracing.md` for details.
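The variables follow standard OpenTelemetry SDK conventions; a sketch (the collector endpoint is illustrative for your environment):

```shell
# Standard OpenTelemetry environment variables (endpoint is illustrative)
export OTEL_TRACES_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:4317
```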
Matching task queue handover
Various improvements were made to task queue handover when adding/removing/restarting matching nodes. This should improve tail latency for task dispatch in those situations. To enable the improvements, operators should set the dynamic config `matching.alignMembershipChange` to a value like `10s` after fully deploying v1.24 to the entire cluster. This may become the default in future versions.
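A dynamic config sketch (`10s` follows the suggestion above):

```yaml
matching.alignMembershipChange:
  - value: 10s
```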
UTF-8 validation in protobuf messages
When we migrated Temporal from the deprecated gogoproto fork of Google’s protobuf library to the official version in v1.23, we disabled protobuf’s default utf-8 validation to ensure a smooth deployment, since gogoproto did not validate fields for utf-8, and turning on validation immediately would have broken applications that accidentally used invalid utf-8.
This was a temporary measure and we will eventually re-enable validation. As the first step, we’ve added tools to detect and warn about invalid utf-8 without breaking applications. There are two sets of dynamic config settings to use.
The `sample` settings are floating point values between 0.0 and 1.0 (default 0.0) and control what proportion of RPC requests, responses, or data read from persistence is validated for UTF-8 in strings. If invalid UTF-8 is found, warnings are logged and the counter metric `utf8_validation_errors` is incremented.
The `fail` settings (boolean, default false) control whether a validation error is turned into an RPC failure or a data corruption error.
- `system.validateUTF8.sample.rpcRequest`
- `system.validateUTF8.sample.rpcResponse`
- `system.validateUTF8.sample.persistence`
- `system.validateUTF8.fail.rpcRequest`
- `system.validateUTF8.fail.rpcResponse`
- `system.validateUTF8.fail.persistence`
If you think your application may be using invalid UTF-8, we suggest turning on the sample settings (without the fail settings) and running for a while. In a future version, validation and errors will be turned on by default (effectively sample set to 1.0 and fail set to true).
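A dynamic config sketch for this detection-only phase (the sampling rate is illustrative):

```yaml
# Sample 10% of traffic for UTF-8 validation, log-only (fail settings stay false)
system.validateUTF8.sample.rpcRequest:
  - value: 0.1
system.validateUTF8.sample.rpcResponse:
  - value: 0.1
system.validateUTF8.sample.persistence:
  - value: 0.1
```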
admin-tools docker image versioning
We separated the `admin-tools` docker image release process. The version tag now includes the versions of the `tctl` (deprecated but still supported CLI) and `temporal` (modern CLI) binaries. This image is released whenever a new version of either of these components is released. The current latest tag is `1.24.0-tctl-1.18.1-cli-0.12.0`.
Helpful links to get you started with Temporal
Temporal Docs
Server
Docker Compose
Helm Chart
Docker images for this release
Server (use the tag `1.24.0`)
Server With Auto Setup (what is Auto-Setup?) (use the tag `1.24.0`)
Admin-Tools (use the tag `1.24.0-tctl-1.18.1-cli-0.12.0`)
Full Changelog: v1.23.1...v1.24.0
v1.23.1
Release Highlights
- Dependencies version upgrade for addressing security vulnerabilities
- Bug fixes for Schedule and replication
All Changes
2024-04-30 - fad6bdc - Bump Server version to 1.23.1
2024-04-26 - 99b6e0c - Fix schedule workflow to CAN after signals (#5799)
2024-04-26 - 1bb03b7 - Update dependencies (#5789)
2024-04-26 - 9701ef0 - Recalculate schedule times from previous action on update (#5381)
2024-04-26 - dd4323a - Handle data corruption in history resend (#5398)
2024-04-26 - 9b1981c - Do schedule backfills incrementally (#5344)
2024-04-26 - a520df2 - Use proto encoding for scheduler workflow next time cache (#5277)
Helpful links to get you started with Temporal
Temporal Docs
Server
Docker Compose
Helm Chart
Docker images for this release (use the tag `1.23.1`)
Server
Server With Auto Setup (what is Auto-Setup?)
Admin-Tools
Full Changelog: v1.23.0...v1.23.1
v1.22.7
Release Highlights
This release mitigates a problem where invalid UTF-8 data could be supplied to the history service, causing a denial of service.
Helpful links to get you started with Temporal
Temporal Docs
Server
Docker Compose
Helm Chart
Docker images for this release (use the tag `1.22.7`)
Server
Server With Auto Setup (what is Auto-Setup?)
Admin-Tools
Full Changelog: v1.22.6...v1.22.7
v1.21.6
Release Highlights
This release mitigates a problem where invalid UTF-8 data could be supplied to the history service, causing a denial of service.
Helpful links to get you started with Temporal
Temporal Docs
Server
Docker Compose
Helm Chart
Docker images for this release (use the tag `1.21.6`)
Server
Server With Auto Setup (what is Auto-Setup?)
Admin-Tools
Full Changelog: v1.21.5...v1.21.6
v1.20.5
Release Highlights
This release mitigates a problem where invalid UTF-8 data could be supplied to the history service, causing a denial of service.
Helpful links to get you started with Temporal
Temporal Docs
Server
Docker Compose
Helm Chart
Docker images for this release (use the tag `1.20.5`)
Server
Server With Auto Setup (what is Auto-Setup?)
Admin-Tools
Full Changelog: v1.20.4...v1.20.5
v1.23.0
Release Highlights
Breaking Changes
github.com/gogo/protobuf has been replaced with google.golang.org/protobuf
We've fully replaced the use of gogo/protobuf with the official Google protobuf runtime. This has both developmental and operational impacts: prior to Server version v1.23.0, our protobuf code generator allowed invalid UTF-8 data to be stored as proto strings. This isn't allowed by the proto3 spec, so if you're running a custom-built Temporal server and think some tenant may store arbitrary binary data in our strings, you should set `-tags protolegacy` when compiling the server. If you use our `Makefile`, this is already done.
If you don't, and you see an error like `grpc: error unmarshalling request: string field contains invalid UTF-8`, you will need to enable this tag when building the server. If you're unsure, specify it anyway; there's no harm in doing so unless you relied on the protobuf compiler to ensure all strings were valid UTF-8.
Developers using our protobuf-generated code will notice that:
- `time.Time` in proto structs will now be [timestamppb.Timestamp](https://pkg.go.dev/google.golang.org/protobuf/types/known/timestamppb)
- `time.Duration` will now be [durationpb.Duration](https://pkg.go.dev/google.golang.org/protobuf/types/known/durationpb)
- V2-generated structs embed locks, so you cannot dereference them.
- Proto enums will, when formatted to JSON, now be in SCREAMING_SNAKE_CASE rather than PascalCase.
  - If trying to deserialize old JSON with PascalCase to proto, use [go.temporal.io/api/temporalproto](https://pkg.go.dev/go.temporal.io/api/temporalproto)
- google/protobuf objects, or objects embedding these protos, cannot be compared using `reflect.DeepEqual` or anything that uses it. This includes `testify` and `mock` equality testers!
  - If you need `reflect.DeepEqual` for any reason, you can use `go.temporal.io/api/temporalproto.DeepEqual` instead
  - If you need `testify` `require`/`assert`-compatible checkers, you can use the `go.temporal.io/server/common/testing/protorequire` and `go.temporal.io/server/common/testing/protoassert` packages
  - If you need matchers for gomock, we have a helper under `go.temporal.io/server/common/testing/protomock`
New System Search Attributes
We added two new system search attributes: `ParentWorkflowId` and `ParentRunId`. If you have previously created custom search attributes with one of these names, attempts to set them will start to fail. We suggest updating your workflows to not set those search attributes, deleting those search attributes, and then upgrading Temporal to this version.
Alternatively, you can set the dynamic config `system.supressErrorSetSystemSearchAttribute` to `true`. When this dynamic config is set, values for system search attributes are ignored instead of causing your workflow to fail. Please use this only as a temporary workaround, because it could hide real issues in user workflows.
Schema changes
Before upgrading your Temporal Cluster to v1.23.0, you must upgrade your core and visibility schemas to the following:
- Core:
- MySQL schema v1.11
- PostgreSQL schema v1.11
- Cassandra schema v1.9
- Visibility:
- Elasticsearch schema v6
- MySQL schema v1.4
- PostgreSQL schema v1.4
Please see our upgrade documentation for the necessary steps to upgrade your schemas.
Deprecation Announcements
- We've replaced all individual metrics describing Commands (e.g. `complete_workflow_command`, `continue_as_new_command`, etc.) with a single metric called `command`, which has a tag `commandType` describing the specific command type (see #4995).
- Standard visibility will be deprecated in the next release, v1.24.0, along with the old config key names; i.e., this is the last minor version to support them. Please refer to the v1.20.0 release notes for upgrade instructions, and to the v1.21.0 release notes for config key changes.
- In advanced visibility, the `LIKE` operator will no longer be supported in v1.24.0. It never did what it was meant to do, and only added confusing behavior when used with Elasticsearch.
Golang version
- Upgraded Go to 1.21
Batch workflow reset by Build ID
For situations where an operator wants to handle a bad deployment using workflow
reset, the batch reset operation can now reset to before the first workflow task
processed by a specific build id. This is based on reset points that are created
when build id changes between workflow tasks. Note that this also applies across
continue-as-new.
This operation is not currently supported by a released version of the CLI, but
you can use it through the gRPC API directly, e.g. using the Go SDK:
```go
// Assumes the usual imports: go.temporal.io/api/batch/v1, common/v1, enums/v1,
// workflowservice/v1, and a UUID helper; "client" is a Go SDK client.Client.
client.WorkflowService().StartBatchOperation(ctx, &workflowservice.StartBatchOperationRequest{
	JobId:     uuid.New(),
	Namespace: "my-namespace",
	// Select workflows that were touched by a specific build id:
	VisibilityQuery: `BuildIds = "unversioned:bad-build"`,
	Reason:          "reset bad build",
	Operation: &workflowservice.StartBatchOperationRequest_ResetOperation{
		ResetOperation: &batch.BatchOperationReset{
			Identity: "bad build resetter",
			Options: &commonpb.ResetOptions{
				Target: &commonpb.ResetOptions_BuildId{
					BuildId: "bad-build",
				},
				ResetReapplyType: enumspb.RESET_REAPPLY_TYPE_SIGNAL,
			},
		},
	},
})
```
History Task DLQ
We've added a DLQ to the history service to handle poison pills in transfer/timer queues and other history task queues, including the visibility and replication queues. See our operator's guide for more details.
If you want tasks experiencing unexpected errors to go to the DLQ after a certain number of failures, set the `history.TaskDLQUnexpectedErrorAttempts` dynamic config.
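A dynamic config sketch (the attempt count is illustrative):

```yaml
history.TaskDLQUnexpectedErrorAttempts:
  - value: 100  # send a task to the DLQ after 100 unexpected-error attempts
```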
Approximately FIFO Task Queueing
Once this feature is enabled, our task queues will be roughly FIFO.
It is disabled by default in 1.23 while we continue testing it, but we expect it to be enabled by default in 1.24. To enable it, set the `matching.backlogNegligibleAge` config to a short duration (e.g. `5s`) instead of its current default value (10 years).
We've added the following metrics as part of this effort:
- `poll_latency` - a per-task-queue histogram of the duration between a worker poll request and its response (with or without a task), calculated from the Matching server's perspective
- `task_dispatch_latency` - a histogram of schedule-to-start time from Matching's perspective, broken down by task queue and task source (backlog vs. history)
Global Persistence Rate Limiting
We've added the ability to specify a global (cluster-level) rate limiting value for the persistence layer. You can configure it by specifying the following dynamic config values:
- `frontend.persistenceGlobalMaxQPS`
- `history.persistenceGlobalMaxQPS`
- `matching.persistenceGlobalMaxQPS`
- `worker.persistenceGlobalMaxQPS`

You can also specify this at the per-namespace level using:
- `frontend.persistenceGlobalNamespaceMaxQPS`
- `history.persistenceGlobalNamespaceMaxQPS`
- `matching.persistenceGlobalNamespaceMaxQPS`
- `worker.persistenceGlobalNamespaceMaxQPS`
Please be aware that this functionality is experimental. This global rate limiting isn't workload-aware but shard-aware: we currently allocate this QPS to each pod based on the number of shards it owns rather than the demands of the workload, so pods with many low-workload shards will receive a higher allocation of this limit than pods with fewer but more active shards. If you plan to use this, set the QPS value with some headroom (around 25%) to account for this.
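A dynamic config sketch (the QPS values are illustrative; remember the ~25% headroom suggested above):

```yaml
history.persistenceGlobalMaxQPS:
  - value: 8000   # cluster-wide cap on history persistence QPS
frontend.persistenceGlobalMaxQPS:
  - value: 2000
```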
Renamed Deadlock-detector metrics
The metrics exported by the deadlock detector were renamed with a `dd_` prefix to avoid confusion with other lock latency metrics. Affected metrics: `dd_cluster_metadata_lock_latency`, `dd_cluster_metadata_callback_lock_latency`, `dd_shard_controller_lock_latency`, `dd_shard_lock_latency`, `dd_namespace_registry_lock_latency`.
Visibility Prefix Search
The Visibility API now supports prefix search using the keyword `STARTS_WITH`, e.g. `WorkflowType STARTS_WITH 'hello_world'`. Check the Visibility documentation for additional information on supported operators.
Helpful links to get you started with Temporal
Temporal Docs
Server
Docker Compose
Helm Chart
Docker images for this release (use the tag `1.23.0`)
Server
Server With Auto Setup (what is Auto-Setup?)
Admin-Tools
New Contributors
v1.22.6
Release Highlights
This release mitigates a rollback problem introduced into one of our v1.23.0 release candidates. This has no impact on OSS users using official releases.
All Changes
2024-02-29 - 2899920 - Bump Server version to 1.22.6
2024-02-29 - 1eba091 - Update Go SDK to handle SDKPriorityUpdateHandling flag (#5468)
Helpful links to get you started with Temporal
Temporal Docs
Server
Docker Compose
Helm Chart
Docker images for this release (use the tag `1.22.6`)
Server
Server With Auto Setup (what is Auto-Setup?)
Admin-Tools