Skip to content
This repository has been archived by the owner on May 18, 2022. It is now read-only.

Basic protection against concurrency issues #260

Merged
merged 33 commits into from
Feb 25, 2021
Merged

Basic protection against concurrency issues #260

merged 33 commits into from
Feb 25, 2021

Conversation

KyleMaas
Copy link
Collaborator

Basic protection against concurrency issues. See #139. Uses a localStorage-based mutex which times out in case a tab dies on us. Adds a bunch of extra checks throughout the codebase to try to gracefully handle a locked SSB.

Also fixes:

  • Fix error where clicking Refresh when you didn't have any messages would error out.
  • Fix missing getProfileNameAsync in Private.

Fix error where clicking Refresh when you didn't have any messages would error out.

Fix missing getProfileNameAsync in Private.
@KyleMaas
Copy link
Collaborator Author

If you do make changes, there are a few important things about this:

  1. It is possible for a not-quite-fully-initialized SSB to be passed back to the caller. This pretty much always happens with the first getSSB() call that initializes the SSB object. This is why in most places I don't just check for whether the SSB object is returned, but also that the functionality it needs is included in it.
  2. For long-running processes (like Connections status polling) or anything with a callback, it is possible for the parent window to be closed between the time the callback was set up and when it runs. So the SSB object needs to be acquired anew (and error checking needs to happen) each time one of these runs.
  3. I know you wanted this done synchronously, but a lot of the places that need SSB could be vastly simplified with an asynchronous callback. As it is, there are a lot of setTimeout()s to just try again. I intentionally set this up so it was easy to convert over.
  4. The mutex system is localStorage-based. This is because it's very widely-supported across browsers, it uses the same browser API as the storage we're protecting in many cases, and per the localStorage spec browsers are required to implement mutexes in their implementations. That was tricky to get right, but there is a reason it delays startup by 2 seconds, and that's to check for pings from other tabs. Without that, it starts nearly immediately but does not properly protect against concurrent access.

@arj03
Copy link
Owner

arj03 commented Feb 23, 2021

Looking at this sync stuff it's actually not that bad I think.
Like if we convert:

        [ err, SSB ] = ssbSingleton.getSSB()
        if (!SSB || !SSB.net) {
          // This is async anyway - try again later.
          setTimeout(this.renderConnections, 3000)
          return
        }

into something like:

        SSB = ssbSingleton.getSSBOrTryAgain(this.renderConnections)
        if (!SSB) return

and:

function getSSBOrTryAgain(tryAgain) {
        [ err, SSB ] = ssbSingleton.getSSB()
        if (err || (!SSB || !SSB.db)) { // <- I guess db should be enough in all cases
          setTimeout(tryAgain, 500)
          return
        } else
          return SSB
}

That would make stuff a bit cleaner.

@arj03
Copy link
Owner

arj03 commented Feb 23, 2021

I still need to properly test this, but wanted to leave my comment here, not going to create any changes to this, so go ahead and make that change if it looks good. I'll do some proper testing tomorrow.

@KyleMaas
Copy link
Collaborator Author

KyleMaas commented Feb 24, 2021

I like where you're going with that. It was a little different than what I was thinking, but I think it would work quite well. How about something like (writing this freehand and haven't tested it):

function getSSBEventually(timeout, isRelevantCB, ssbCheckCB, resultCB) {
  // If the caller no longer needs a result, return right away before processing anything.
  if (isRelevantCB && !isRelevantCB()) return;

  [ err, SSB ] = ssbSingleton.getSSB()
  
  // Do this here so that if we time out and return, SSB is set to null if it doesn't pass.
  // That way a simple if(SSB) is all it takes on the client end.
  SSB = (!err && ssbCheckCB(SSB) ? SSB : null)

  if (!SSB) {
    if (timeout != 0) {
      // Try again.
      setTimeout(function() {
        getSSBEventually((timeout > 0 ? Math.max(0, timeout - 500) : timeout), isRelevantCB, checkCB, resultCB)
      }, 500)
      return
    } else if (!err) {
      // We timed out but don't have an error, so we should set one before the callback below runs.
      err = "Could not lock database"
    }
  }
  resultCB(err, SSB)
}

The idea being that you would call it like this:

function renderSomethingOptionalButHelpful(err, SSB) {
  if (SSB) {
    // We know SSB fits what we need.
  } else {
    // We probably timed out, which means there is likely another window holding SSB.
    // But this is an optional feature, so we don't really care enough to keep retrying.
  }
}

// Pass 5000 so it times out with an error after 5 seconds.
ssbSingleton.getSSBEventually(5000, () => { return self.componentStillLoaded },
  (SSB) => { return SSB.getProfile }, this.renderSomethingOptionalButHelpful)

Or this:

function postMessage(err, SSB) {
  if (SSB) {
    // We know SSB fits what we need.
  } else {
    // We timed out.  Tell the user why.
    alert("Couldn't post.  Reason: " + err)
  }
}

// Pass 3000 so it times out with an error after 3 seconds so we can complain to the user.
ssbSingleton.getSSBEventually(3000, () => { return self.componentStillLoaded },
  (SSB) => { return SSB.db }, this.postMessage)

Or this:

function renderConnections(err, SSB) {
  // Do stuff with an SSB we know works because with an infinite timeout we are never called unless SSB fits.
}

// Pass -1 for a timeout so it keeps trying until it acquires a valid SSB.
ssbSingleton.getSSBEventually(-1, () => { return self.componentStillLoaded },
  (SSB) => { return SSB.net && SSB.db }, this.renderConnections)

And by having an "is SSB suitable for my use" callback, you could also have your check callback do stuff like checking if the profile index was loaded, rather than the way we do now where I had to use setTimeout() in the case where the database was loaded but the profile index was unavailable.

By having an "is this still relevant" callback, we can make sure it stops retrying if the Vue component which needed it gets closed. That way if we can't acquire an SSB, we're not leaking memory by holding open references and Vue components which are destroyed can be properly garbage collected.

What would you think about something like that?

(Edited to shorten line lengths to make it a little easier to read.)
(Edited again because of a missing parameter on the retry.)
(Edited again because infinite retry couldn't have worked. I really should probably test this sometime and see if there's anything else I missed.)

@arj03
Copy link
Owner

arj03 commented Feb 24, 2021

Yeah that sort of makes sense. Just need to be careful not to convert everything to callbacks, which is what I liked about the getSSBOrTryAgain proposal and the way you did it currently.

@arj03
Copy link
Owner

arj03 commented Feb 24, 2021

Another thing to consider is that this feels like it should live in ssb-browser-core. But lets just make it work here first and then we can port it over.

@KyleMaas
Copy link
Collaborator Author

Another thing to consider is that this feels like it should live in ssb-browser-core. But lets just make it work here first and then we can port it over.

Yeah, I was kind of thinking this, too. Part of the reason I tried to separate ssb-singleton out into its own module which doesn't rely on anything in ssb-browser-demo.

@KyleMaas
Copy link
Collaborator Author

I'm going to be afk for a while, so I figured I'd commit this so you can work on it if you want. This latest commit should still work. My changes to Profile are what are triggering a bunch of race conditions between different initialization steps of ssb-browser-core.

@KyleMaas
Copy link
Collaborator Author

Point being I held back the non-working Profile changes so I can debug those. I'd rather have this stable enough you can work on if you want.

@KyleMaas
Copy link
Collaborator Author

Turns out it was less of a race condition and more of a scope problem. If the async function was called, its use of SSB would override the global SSB object in use by ssb-browser-core and would set it to null. We still need to fix the problem of ssb-browser-core using an SSB global, but this at least works around that in the meantime.

… add missing API that profile image saving used
@arj03
Copy link
Owner

arj03 commented Feb 25, 2021

I'll review the code now. Will do a little refactor as well

@arj03
Copy link
Owner

arj03 commented Feb 25, 2021

Been reading the diff. Looks good. I only had those 2 comments. I pushed up a minor refactor to make the api a bit easier to use. I'm one of those people that use right click open more than middle click so would be really nice to have that fixed as well. I have been testing this and it seems to be working quite well with the locking etc.

@arj03
Copy link
Owner

arj03 commented Feb 25, 2021

Also I should probably say, this is really nice and quite a tricky one to get right, so great work so far! :)

@KyleMaas
Copy link
Collaborator Author

Thanks! Concurrency's a fascinating subject and exceedingly easy to get wrong. I'm more used to doing multiprocessing/multithreading in environments where you can have shared memory (either in the same process or system-wide), actual mutexes and semaphores, filesystem locking, critical sections, PID files, communications sockets that all your target platforms support (unlike such things as Broadcast Channels), etc. And I'm more used to doing it in projects where it was designed for it early on - client/server protocols for communicating with a daemon which holds the locks, for example. Doing this kind of thing entirely on the client side is a bit of an adventure for me.

@@ -3,6 +3,7 @@ module.exports = function () {
require('./ssb-msg')
require('./ssb-msg-preview')
require('./onboarding-dialog')
require('./link-contextmenu')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing this file :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not wrong. :)

@KyleMaas
Copy link
Collaborator Author

I think I'm actually pretty happy with that.

@arj03
Copy link
Owner

arj03 commented Feb 25, 2021

Nice, will give it a test :-)

@arj03
Copy link
Owner

arj03 commented Feb 25, 2021

If I control click on a link it doesn't work. Is that correct? I tried opening a thread using that. It works if I right click and say open in new tab.

@KyleMaas
Copy link
Collaborator Author

Having used a mouse with a center button for years, I didn't know that was a thing. But I should be able to catch that.

@KyleMaas
Copy link
Collaborator Author

See how that works for 'ya.

@arj03
Copy link
Owner

arj03 commented Feb 25, 2021

That works. I think this is ready to go. I'll just remove the console.log. Do you have anything else you think we should do first?

@KyleMaas
Copy link
Collaborator Author

(Incidentally, the name of this pull request is no longer accurate. This goes way further than I was intending for the initial concurrency stuff.)

@KyleMaas
Copy link
Collaborator Author

Nope. I'm good. Go for it.

@arj03 arj03 merged commit 1146b85 into master Feb 25, 2021
@arj03
Copy link
Owner

arj03 commented Feb 25, 2021

Basic protection :-)

@arj03 arj03 deleted the concurrency branch February 25, 2021 14:34
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants