Default value of false for disable_auto_sst and startup_timeout #195
Comments
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/154352380
The labels on this GitHub issue will be updated when the story is started.
Hello @ldangeard-orange, I am curious: what is the DB size in your environment?
Hello @GETandSELECT, my test DB is 15 GB, but we wish to increase it to 50 GB.
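To put those sizes in perspective against the 60-second default of cf_mysql.mysql.startup_timeout mentioned in the issue description, here is a back-of-the-envelope Python sketch. It assumes, hypothetically, that a full SST streams the whole data set at a constant effective throughput; the 100 MiB/s figure and the 1.5 safety margin are illustrative values, not measurements from this environment, and GB is treated as GiB for simplicity.

```python
"""Back-of-the-envelope check: does a full SST fit within startup_timeout?

Assumption (hypothetical): the SST copies the whole data set at a constant
effective throughput. Real SST speed depends on disk, network, compression
and donor load.
"""

MIB_PER_GIB = 1024


def estimated_sst_seconds(db_size_gib: float, throughput_mib_s: float) -> float:
    """Seconds needed to stream db_size_gib at a constant throughput in MiB/s."""
    if throughput_mib_s <= 0:
        raise ValueError("throughput must be positive")
    return db_size_gib * MIB_PER_GIB / throughput_mib_s


def sst_fits(db_size_gib: float, throughput_mib_s: float,
             startup_timeout_s: float = 60, margin: float = 1.5) -> bool:
    """True if the estimated SST, padded by a safety margin, fits in the timeout."""
    return estimated_sst_seconds(db_size_gib, throughput_mib_s) * margin <= startup_timeout_s


if __name__ == "__main__":
    throughput = 100  # hypothetical effective SST throughput, MiB/s
    for size in (15, 50):  # DB sizes mentioned in this thread
        secs = estimated_sst_seconds(size, throughput)
        print(f"{size} GiB -> ~{secs:.0f} s, fits in 60 s: {sst_fits(size, throughput)}")
```

Under those assumptions, 15 GiB needs roughly 154 s and 50 GiB roughly 512 s, both well beyond 60 s. Timing one real SST on the target hardware would be a better basis for choosing a timeout than this estimate.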
Hello, so it's not a good idea to have
@ldangeard-orange I'm a little confused. SST should only happen during the BOSH pre-start phase, which is not governed by
Am I understanding the problem, and explaining this correctly?
Hello @menicosia, there are several cases:
Hi Marco (@menicosia) and Caroline (@ctaymor), I'm jumping into this issue to clarify things. I'm working with Laurent (@ldangeard-orange) in the Orange FR database experts team. The people here have very strong expertise in production data services. As a contractor and BOSH expert in France, I'm helping them pivot towards authoring BOSH releases and recommendations that build on or encapsulate their expertise. Currently, we focus on MongoDB, Cassandra and MariaDB (with this
Here in this issue, the situation that Laurent describes is the following:
In our case, step 1 was triggered by nodes rejoining a cluster after a TPCC benchmark. Indeed, we had monit-stopped 2 out of 3 nodes before running a TPCC benchmark on the remaining node. And when the 2 other nodes rejoined the cluster with a
So, it's correct that stopping nodes with
Normally, Monit should not try to restart the daemon while the system is doing an SST. Maybe the SST script should write its PID into the PID file that Monit is tracking, so that Monit is happy with a live process. But this might require some PRs to be pushed to Galera so that the SST writes its PID to a specific PID file.
What Laurent says is that, until such changes happen, it would be safer to move back to
For a long-term solution, assuming that this log line is the result of a customized SST script, why not have this script write its own PID to the proper file so that Monit keeps being happy? This is just a guess.
Now that I have (hopefully) clarified the issue, I'll let you jump in and suggest whichever fix you find most relevant. In conclusion, I hope this will help in solving this issue, which is a concern for anyone relying on the default values proposed by this BOSH release.
Best,
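The suggestion above, that a long-running SST helper write its own PID into the file Monit is tracking, can be sketched as follows. Monit's `check process ... with pidfile ...` checks roughly amount to reading the PID from that file and verifying that the process exists. The helper names `write_pidfile` and `pidfile_process_alive` and the file path are hypothetical, and nothing here is an existing Galera or cf-mysql-release feature. The sketch writes the PID atomically (temporary file plus `os.replace`) so a reader never sees a half-written file, and approximates the liveness check with `os.kill(pid, 0)` on a POSIX system.

```python
import os
import tempfile
from typing import Optional


def write_pidfile(path: str, pid: Optional[int] = None) -> int:
    """Atomically record pid (default: this process) in path; return the pid written."""
    pid = os.getpid() if pid is None else pid
    directory = os.path.dirname(path) or "."
    # Create the temp file in the same directory so os.replace is a same-filesystem rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(f"{pid}\n")
        os.replace(tmp_path, path)  # readers see either the old file or the new one
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # do not leave a stray temp file behind
        raise
    return pid


def pidfile_process_alive(path: str) -> bool:
    """Rough equivalent of a pidfile-based process check: does the recorded PID exist?"""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or garbage pidfile
    if pid <= 0:
        return False  # 0 and negative values address process groups, not a process
    try:
        os.kill(pid, 0)  # signal 0 only performs error checking; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


if __name__ == "__main__":
    pidfile = os.path.join(tempfile.gettempdir(), "sst_helper.pid")  # hypothetical path
    write_pidfile(pidfile)
    print(pidfile_process_alive(pidfile))  # True while this process is running
```

Writing through a temporary file matters here because a monitor polling the PID file could otherwise read an empty or partially written file and conclude the process is gone.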
Hello,
With the new version develop 36.10 (dev), the default value of `disable_auto_sst` is `FALSE`. However, when you have a big database, the copy with XtraBackup (SST) needs more than 60 seconds (`cf_mysql.mysql.startup_timeout` is 60 by default), so Monit (mariadb_ctrl) tries to restart the database while the transfer is not finished. Many error messages:
So I think it's better to block SST, because you need to analyse why your instance is desynced.
If you want to keep the default value `FALSE` for `disable_auto_sst`, you need to:
- increase `startup_timeout`;
- monitor whether the instance is executing an SST, for example with `wsrep_cluster_conf_id`: http://galeracluster.com/documentation-webpages/monitoringthecluster.html
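The monitoring suggestion above can be sketched in Python as a few checks on Galera status variables. Besides `wsrep_cluster_conf_id` (described on the Galera monitoring page as the number of cluster membership changes, expected to be identical on every node of the Primary Component), the sketch also reads `wsrep_local_state_comment`, where a donor reports `Donor/Desynced` and a joiner reports states such as `Joining` or `Waiting on SST`. Fetching the status rows needs a MySQL client library and is left out; the function names and node names are hypothetical. `Donor/Desynced` is also reported when a node is desynced for other reasons, and a joiner may not accept client connections while it receives an SST, so treat these as hints rather than a definitive SST detector.

```python
from collections import Counter
from typing import Dict, List, Optional

# One node's status: variable name -> value, e.g. the rows returned by
# SHOW GLOBAL STATUS LIKE 'wsrep_%' (the query itself is not shown here).
Status = Dict[str, str]

JOINER_STATES = {"Joining", "Waiting on SST"}  # requesting/receiving a state transfer
DONOR_STATE = "Donor/Desynced"                 # sending a state transfer (or desynced)


def sst_role(status: Status) -> Optional[str]:
    """Return "donor", "joiner", or None based on wsrep_local_state_comment."""
    state = status.get("wsrep_local_state_comment", "")
    if state == DONOR_STATE:
        return "donor"
    if state in JOINER_STATES:
        return "joiner"
    return None


def conf_id_changed(previous: Status, current: Status) -> bool:
    """True if cluster membership changed between two polls of the same node."""
    return previous.get("wsrep_cluster_conf_id") != current.get("wsrep_cluster_conf_id")


def conf_id_outliers(cluster: Dict[str, Status]) -> List[str]:
    """Nodes whose wsrep_cluster_conf_id differs from the most common value."""
    ids = {node: s.get("wsrep_cluster_conf_id") for node, s in cluster.items()}
    if not ids:
        return []
    majority, _ = Counter(ids.values()).most_common(1)[0]
    return sorted(node for node, cid in ids.items() if cid != majority)


if __name__ == "__main__":
    # Illustrative values only; node names are hypothetical.
    cluster = {
        "mysql/0": {"wsrep_local_state_comment": "Synced", "wsrep_cluster_conf_id": "7"},
        "mysql/1": {"wsrep_local_state_comment": "Donor/Desynced", "wsrep_cluster_conf_id": "7"},
        "mysql/2": {"wsrep_local_state_comment": "Joining", "wsrep_cluster_conf_id": "5"},
    }
    for node, status in sorted(cluster.items()):
        print(node, sst_role(status))
    print("conf_id outliers:", conf_id_outliers(cluster))
```

Polling each node periodically and alerting when `conf_id_changed` fires, or when any node reports a donor or joiner role for longer than the configured `startup_timeout`, is one way to notice an SST that Monit might interrupt.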