Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some clarifications about data persistence [HZG-257] #1510

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

th0masb
Copy link

@th0masb th0masb commented Jan 30, 2025

Some clarifications that data persistence is there to augment in-memory partition backups and that planned cluster shutdowns must be done cluster-wide. A rolling restart is not the same as a cluster shutdown, it is a sequence of individual member restarts and so is more appropriate to categorize as a speed improvement instead of a resiliency one.

@th0masb th0masb requested a review from a team as a code owner January 30, 2025 14:28
@devOpsHazelcast
Copy link
Collaborator

devOpsHazelcast commented Jan 30, 2025

CLA assistant check
All committers have signed the CLA.

Copy link

netlify bot commented Jan 30, 2025

Deploy Preview for hardcore-allen-f5257d ready!

Name Link
🔨 Latest commit 3ecd200
🔍 Latest deploy log https://app.netlify.com/sites/hardcore-allen-f5257d/deploys/67a0f40248ff53000816318a
😎 Deploy Preview https://deploy-preview-1510--hardcore-allen-f5257d.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link
Contributor

@JamesHazelcast JamesHazelcast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working to improve our documentation on this topic @th0masb - a few comments from me on this round

docs/modules/storage/pages/persistence.adoc Outdated Show resolved Hide resolved
docs/modules/storage/pages/persistence.adoc Outdated Show resolved Hide resolved
docs/modules/storage/pages/persistence.adoc Show resolved Hide resolved
docs/modules/storage/pages/persistence.adoc Outdated Show resolved Hide resolved
- **Speeding up single member restarts**:

** **Planned**: During a rolling restart each cluster member is restarted one by one for scenarios such as installing an operating system patch or new hardware. xref:maintain-cluster:rolling-upgrades.adoc[Rolling upgrades] are an example of a rolling restart.
** **Unplanned**: A member may crash or terminate unexpectedly at any time, using persistence allows faster recovery.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also depends on configuration - if rebalancing is delayed, or the cluster is in PASSIVE state, it loads all data from disk without receiving any from other members

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that even if the member crashed when the cluster was in active state and rebalancing was not delayed the persisted data could still be used to speed up the member rejoining the cluster. Aren't merkle trees compared to reduce network traffic as much as possible?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's correct, I'm saying that in addition to this "faster recovery" case, there are conditions where it recovers entirely using its own data, not just using it as a speedup mechanism.


- **Speeding up single member restarts**:

** **Planned**: During a rolling restart each cluster member is restarted one by one for scenarios such as installing an operating system patch or new hardware. xref:maintain-cluster:rolling-upgrades.adoc[Rolling upgrades] are an example of a rolling restart.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of a cluster-wide shutdown, we do not want to create confusion between these topics imo

Copy link
Author

@th0masb th0masb Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it confusing to read the existing way, how is a rolling restart a cluster wide shutdown if only one member at any time is stopped? Is it mandatory to put the cluster into passive mode for a rolling restart? It seems like it would work fine if the cluster was still in active mode.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose there's arguments to be made both ways - I can see why you see it as a series of single member restarts, but I can also see that it does eventually result in the entire cluster restarting, so it could be seen as a cluster-wide shutdown (restart). I'm not too fussed, we can let docs team review this aspect and see if it makes sense to them 👍

@Rob-Hazelcast
Copy link
Contributor

@th0masb Looks like you're still debating the technical details here, so ignoring for now. Feel free to @ me when this is ready for editorial review, and please add the backport to all versions label if appropriate.

Copy link
Contributor

@JamesHazelcast JamesHazelcast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more comments from me, but mainly nits so approving in advance - thanks for raising this @th0masb and improving our docs 👍

docs/modules/storage/pages/persistence.adoc Outdated Show resolved Hide resolved
docs/modules/storage/pages/persistence.adoc Outdated Show resolved Hide resolved
docs/modules/storage/pages/persistence.adoc Outdated Show resolved Hide resolved
docs/modules/storage/pages/persistence.adoc Outdated Show resolved Hide resolved
@th0masb
Copy link
Author

th0masb commented Feb 3, 2025

@Rob-Hazelcast I think this is ready now, as for backporting I am unsure of the policy for the docs but since this is a clarification on existing functionality and the page content looks to be the same since 5.3 I would recommend backporting to 5.3.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants