Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Repair backups on ReadRepair #236

Closed

Conversation

kwapik
Copy link

@kwapik kwapik commented Dec 15, 2023

Before this change, ReadRepair mechanism was only restoring primary partitions and not backup ones.

Copy link

Quality Gate Passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@buraksezer buraksezer self-requested a review December 15, 2023 15:07
@buraksezer buraksezer self-assigned this Dec 15, 2023
@buraksezer
Copy link
Owner

Unfortunately, your approach is wrong. The readRepair method is only called by the primary owner, and running the read repair logic on the replica/backup partition is useless.

It seems that the actual problem relies on the lookupOnReplicas method. The readRepair method works on the versions slice. lookupOnReplicas method somehow fails to add some replicas to the versions slice.

Please take a look at this snippet. It doesn't create a version if the replica owner returns an error when the primary wants to read a key/value pair. I think the replica owner returns a "key-not-found" or similar error message, and then it fails to repair the replica.

func (dm *DMap) lookupOnReplicas(hkey uint64, key string) []*version {
	// Check backup.
	backups := dm.s.backup.PartitionOwnersByHKey(hkey)
	versions := make([]*version, 0, len(backups))
	for _, replica := range backups {
		host := replica
		cmd := protocol.NewGetEntry(dm.name, key).SetReplica().Command(dm.s.ctx)
		rc := dm.s.client.Get(host.String())
		err := rc.Process(dm.s.ctx, cmd)
		err = protocol.ConvertError(err)
		if err != nil {
			if dm.s.log.V(6).Ok() {
				dm.s.log.V(6).Printf("[DEBUG] Failed to call get on"+
					" a replica owner: %s: %v", host, err)
			}
			continue
		}

@buraksezer buraksezer changed the base branch from master to release/v0.5.0 December 20, 2023 07:09
@buraksezer buraksezer changed the base branch from release/v0.5.0 to master December 20, 2023 07:10
@buraksezer
Copy link
Owner

I'm actively working on this problem.

@buraksezer
Copy link
Owner

Closing this due to lack of interest.

@buraksezer buraksezer closed this May 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants