tt: add command tt upgrade
#936
Conversation
Examples:
- Case 1: OK
- Case 2: More than one master in the same replicaset
- Case 3: LSN didn't update
- Case 4: There is a replicaset that does not have a master
- Case 5: A non-existent replicaset was specified
cli/upgrade/upgrade.go (outdated):

```go
conn, err := connector.Connect(connector.ConnectOpts{
    Network: "unix",
    Address: run.ConsoleSocket,
})
```
Does it mean that `tt upgrade` can be used only if all the instances of the given replicaset(s) are started on the same machine where the command is executed?
If so, I have two questions:
- I think that we should at least make this constraint clear in the documentation.
- Is it complicated to implement evaluation of the needed commands on remote instances using iproto (assuming that we have permission to evaluate arbitrary code)? See the net.box sketch below.
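For illustration only: evaluating commands on a remote instance over iproto is straightforward from Lua with net.box. This is a sketch of the idea, not the tt connector code; the URI and credentials are made up:

```lua
local net_box = require('net.box')

-- Connect to a remote instance over iproto (hypothetical URI and credentials).
local conn = net_box.connect('client:[email protected]:3301')

-- Evaluate arbitrary code remotely, e.g. the schema upgrade itself.
conn:eval('box.schema.upgrade() box.snapshot()')

conn:close()
```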
Subject to discussion. See my comment below.
NB: I filed #968 in case we implement the main logic here and remote instance support separately. We're free to solve both at once if we want/agree.
cli/upgrade/upgrade.go (outdated):

```go
err = WaitLSN(conn, masterUpgradeInfo, lsnTimeout)
if err != nil {
    printReplicasetStatus(rs.Alias, "error")
    return fmt.Errorf("[%s]: LSN wait timeout: error waiting LSN %d "+
        "in vclock component %d on %s: time quota %d seconds "+
        "exceeded", rs.Alias, masterUpgradeInfo.LSN,
        masterUpgradeInfo.IID, fullInstanceName, lsnTimeout)
}
```
@psergee Let's discuss it in a thread (to ease navigation over the discussion later).
⨯ [storage-001]: LSN wait timeout: error waiting LSN 2003085 in vclock component 1
This message about LSN seems a bit confusing to me. Maybe add something like "Schema upgrade replication timeout exceeded:" before "error waiting LSN..." to make it clearer what has just happened?
I agree that it may be convenient to have some reason why the LSN waiting is needed in the first place. However, 'schema upgrade replication timeout' seems a bit vague to me. How about the following variant?
⨯ [storage-001-replica]: can't ensure that upgrade operations performed on "storage-001-master" are replicated to "storage-001-replica" to perform snapshotting on it: error waiting LSN 2003085 in vclock component 1: <..last error from eval or actual LSN..>
I chose the last from the ones suggested.
Thanks!
@psergee Is it OK for you?
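For context on what this message reports: the condition being awaited is essentially a vclock comparison on the replica. A minimal Lua sketch of the idea (not the actual tt implementation; the names are illustrative):

```lua
local fiber = require('fiber')

-- Wait until this replica has applied everything the master wrote up to `lsn`
-- in vclock component `iid` (the master's instance id), or give up after `timeout` seconds.
local function wait_lsn(iid, lsn, timeout)
    local deadline = fiber.clock() + timeout
    while (box.info.vclock[iid] or 0) < lsn do
        if fiber.clock() > deadline then
            return false, string.format(
                'timed out waiting for LSN %d in vclock component %d', lsn, iid)
        end
        fiber.sleep(0.1)
    end
    return true
end
```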
```python
@pytest.mark.skipif(tarantool_major_version < 3,
                    reason="skip test with cluster config for Tarantool < 3")
```
There are no tests without the cluster configuration. However, I see nothing specific to tarantool 3.x in the implementation. What is the reason? Is it easier to write a test for tarantool 3.x only?
@psergee Should we add some test that works on tarantool 2.11?
Still in progress.
```python
@pytest.mark.skipif(tarantool_major_version < 3,
                    reason="skip test with cluster config for Tarantool < 3")
def test_upgrade_all_replicasets(tt_cmd, tmp_path):
```
All the test cases perform a no-op upgrade: the instances are already on the latest database schema version. As a result, an implementation that doesn't call `box.schema.upgrade()` at all would pass this test. A no-op `WaitLSN` would also pass it.
@psergee Should we add scenarios that perform a real upgrade from, say, a 2.11.1 schema? Or is it considered too complicated in the given testing infrastructure?
I see two variants of how to implement it:
- Save a pre-generated 2.11 snapshot in the repository and copy it into the appropriate instance directories before starting the cluster.
- Downgrade the database schema to 2.11.1 on the running instances before calling the `tt upgrade` command (see the sketch below).
Downgrading may need extra tricks before tarantool/tarantool#10150 is solved if a cluster configuration is in use.
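A minimal Lua sketch of the second variant, assuming a running Tarantool 3.x instance (which supports `box.schema.downgrade()`):

```lua
-- Roll the database schema back to 2.11.1 on a running instance,
-- so that a subsequent `tt upgrade` has real work to do.
box.schema.downgrade('2.11.1')
box.snapshot()

-- The current schema version can be inspected via the _schema system space.
print(box.space._schema:get('version'))
```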
Tests (downgrade -> 2.11 -> upgrade 3.x) added for single instance and vshard cluster.
We have a special subcommand `tt replicaset` for such stuff. I propose to add a `tt replicaset upgrade` command instead of `tt upgrade`.
@oleg-jukovec Is it a decision or a proposal to discuss? If the former, OK. If the latter, I would highlight my opinion on it.
I agree about the current naming. Currently, I don't see a point in introducing further confusion with individual commands/another approach, or in adding new sets of subcommands. This should be a centralized design decision that is outside the scope of this pull request.
Hm. My expectation is that the upgrade command is suitable for an arbitrary cluster, not only a local one. (AFAIU, …)
OK. I'm a bit worried about the need for compatibility aliases even for such new functionality. However, I understand the wish to separate the problems. It sounds like a decision, so I won't insist on my feelings here anymore (unless the discussion gains new points from others).
Regarding the …
The command is now called `tt replicaset upgrade`.

Changes: …

In the current version, there is already a (workaround-based) ability to perform an upgrade on a remote cluster (each replicaset separately). This uses a mechanism similar to … For example, the application is deployed on the server 10.0.10.123.

Config.yaml:

```yaml
credentials:
  users:
    client:
      password: 'secret'
      roles: [super]
    replicator:
      password: 'secret'
      roles: [replication]
    storage:
      password: 'secret'
      roles: [sharding]

iproto:
  advertise:
    peer:
      login: replicator
    sharding:
      login: storage

sharding:
  bucket_count: 3000

groups:
  storages:
    app:
      module: storage
    sharding:
      roles: [storage]
    replication:
      failover: manual
    replicasets:
      storage-001:
        leader: storage-001-a
        instances:
          storage-001-a:
            iproto:
              listen:
                - uri: 10.0.10.123:3301
          storage-001-b:
            iproto:
              listen:
                - uri: 10.0.10.123:3302
      storage-002:
        leader: storage-002-a
        instances:
          storage-002-a:
            iproto:
              listen:
                - uri: 10.0.10.123:3303
          storage-002-b:
            iproto:
              listen:
                - uri: 10.0.10.123:3304
  routers:
    app:
      module: router
    sharding:
      roles: [router]
    replicasets:
      router-001:
        instances:
          router-001-a:
            iproto:
              listen:
                - uri: 10.0.10.123:3305
```

Upgrade commands:

```bash
$ tt replicaset upgrade tcp://client:[email protected]:3301
• storage-001: ok
$ tt replicaset upgrade tcp://client:[email protected]:3303
• storage-002: ok
$ tt replicaset upgrade tcp://client:[email protected]:3305
• router-001: ok
```

However, this is likely not the ideal solution that we aim for, since this method requires a lot of manual work when upgrading a large cluster. We could stub out the TCP connection in this patch and support only upgrades on local instances. However, since the mechanism for connecting to remote instances already exists, we might want to implement the full upgrade functionality right away. I would like to hear your thoughts on this @oleg-jukovec @psergee.
```go
// This may be a single-instance application without Tarantool-3 config
// or instances.yml file.
if len(discoveryCtx.RunningCtx.Instances) == 1 {
    // Create a dummy replicaset
    var replicasetList []replicaset.Replicaset
    var dummyReplicaset replicaset.Replicaset
    var instance replicaset.Instance

    instance.InstanceCtx = discoveryCtx.RunningCtx.Instances[0]
    instance.Alias = running.GetAppInstanceName(instance.InstanceCtx)
    instance.InstanceCtxFound = true

    dummyReplicaset.Alias = instance.Alias
    dummyReplicaset.Instances = append(dummyReplicaset.Instances, instance)
    replicasetList = append(replicasetList, dummyReplicaset)

    return internalUpgrade(replicasetList)
}
```
I'm not sure if this functionality is really needed, but I thought about it when writing tests (see the last one in test_replication_upgrade.py). In short, this is a Tarantool 2.11 application with a single instance.
Part of tarantool#924

@TarantoolBot document
Title: `tt replicaset upgrade` upgrades database schema.

The `tt replicaset upgrade` command allows for an automated upgrade of each replicaset in a Tarantool cluster. The process is performed sequentially on the master instance and its replicas to ensure data consistency. Below are the steps involved:

For each replicaset:

- **On the Master Instance**:
  1. Run the following commands in sequence to upgrade the schema and take a snapshot:
     ```lua
     box.schema.upgrade()
     box.snapshot()
     ```
- **On Each Replica**:
  1. Wait for the replica to apply all transactions produced by the `box.schema.upgrade()` command executed on the master. This is done by monitoring the vector clocks (vclock) to ensure synchronization.
  2. Once the replica has caught up, run the following command to take a snapshot:
     ```lua
     box.snapshot()
     ```

> **Error Handling**: If any errors occur during the upgrade process, the operation will halt, and an error report will be generated.

---

- Timeout for Synchronization

  Replicas will wait for synchronization for a maximum of `Timeout` seconds. The default timeout is set to 5 seconds, but this can be adjusted manually using the `--timeout` option.

  **Example:**
  ```bash
  $ tt replicaset upgrade [<APP_NAME>] --timeout 10
  ```

- Selecting Replicasets for Upgrade

  You can specify which replicaset(s) to upgrade by using the `--replicaset` or `-r` option to target specific replicaset names.

  **Example:**
  ```bash
  $ tt replicaset upgrade [<APP_NAME> | <URI>] --replicaset <RS_NAME_1> -r <RS_NAME_2> ...
  ```

  This provides flexibility in upgrading only the desired parts of the cluster without affecting the entire system.
`tt upgrade` command steps, for each replicaset:
- On the master: run `box.schema.upgrade()` and `box.snapshot()`.
- On each replica: wait until all transactions produced by `box.schema.upgrade()` on the master are applied, by comparing the vector clocks (vclock), then run `box.snapshot()`.

The replica is waiting for synchronization for `Timeout` seconds. The default value for `Timeout` is 5 seconds, but you can specify it manually using the `--timeout` option. You can also specify which replicaset(s) to upgrade by using the `--replicaset` option.