Use RWLock to prevent races in inferno-vc-server #36
either (throwError . CouldNotDecodeObject h) pure =<< liftIO (eitherDecode <$> BL.readFile fp)

-- | Fetch an object WITHOUT holding any locks. This is used by the cached client, which
-- is safe since the cache is read only.
@goodlyrottenapple is this correct?
@@ -387,18 +427,30 @@ getAllHeads = do
[]
(map takeFileName headsRaw)

fetchFunctionsForGroups :: (VCStoreLogM env m, VCStoreErrM err m, VCStoreEnvM env m, Ord g, FromJSON a, FromJSON g) => Set.Set g -> m [VCMeta a g VCObjectHash]
-- | Fetch all objects that are public or that belong to the given set of groups.
-- Note this is a potentially long operation so no locks are held while traversing the
I agree with this (at least for now). We are calling this function multiple times in Onping's selector, so a lock would increase the response time of each query.
I'm wondering whether we could use a different lock for each head. Currently, the lock covers the whole store. The idea is that scripts mostly form disjoint sets, each consisting of a head and all of its predecessors. Because the lock covers the whole store, writing to one set of scripts also blocks reads and writes of every other set. These sets should be safe to update concurrently because, by design, they are fundamentally different sets of scripts. If we can implement a lock per script set, it should help with performance.
Anyway, I think it looks good, although we'd probably need to monitor whether this affects performance and adjust accordingly afterwards. Since this PR has no migration, it should not be a problem to change the implementation in the future. At least for the selector used in Onping, it won't be an issue as there's no lock there.
This PR uses a read-write lock to protect concurrent file IO performed by inferno-vc-server. This should hopefully prevent any concurrency bugs caused by two people trying to save/update/delete a script at the same time. (#19)
The writeup below is a description of the current implementation, its issues, and some proposed solutions. This PR corresponds to Option 0.
Inferno Version Control Store
A simple version control for scripts.
Current Implementation
Data structure: uses the file system. Objects are stored in the root directory with their hash as filenames, the heads/ directory contains one file for each script history named by the hash of the head of that history, and the to_head/ directory maps every script to the head of the history it belongs to:
Store method:
The to_head mappings are used by the fetchHist(h) operation, which uses the mapping to find the head of the given object and then reads the object's history from the file in the heads/ directory.
Issues with current implementation:
The renameFile and appendFile are not in an atomic block. This means by the time an operation tries to append to the file, the next store operation could have already renamed the file to something else, meaning the first operation's predecessor would be lost from the history.
The update of to_head pointers of one operation store(o2, o1) can race with the successive operation store(o3, o2). If the latter overtakes the former, this will result in some objects in the history incorrectly pointing to o2 as their head instead of o3.
Crash safety: crashes, for example between renameFile and appendFile, will leave the store in an inconsistent state.
Option 0: slap a lock around all operations
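As a rough illustration of Option 0, here is a minimal sketch of a store-wide read-write lock built on STM. The names StoreLock, newStoreLock, withRead and withWrite are hypothetical and are not necessarily the lock type or API this PR uses. The sketch assumes that fetches are wrapped in withRead and that store/delete operations are wrapped in withWrite.

```haskell
import Control.Concurrent.STM
  (TVar, atomically, check, modifyTVar', newTVarIO, readTVar, writeTVar)
import Control.Exception (bracket_)

-- | Hypothetical store-wide lock: the number of active readers and
-- whether a writer currently holds the lock.
data StoreLock = StoreLock
  { readers :: TVar Int
  , writing :: TVar Bool
  }

newStoreLock :: IO StoreLock
newStoreLock = StoreLock <$> newTVarIO 0 <*> newTVarIO False

-- | Run a read-only operation (e.g. a fetch). Any number of readers may
-- run at once, but none may start while a writer is active.
withRead :: StoreLock -> IO a -> IO a
withRead l = bracket_ acquire release
  where
    acquire = atomically $ do
      w <- readTVar (writing l)
      check (not w)                      -- block until no writer is active
      modifyTVar' (readers l) (+ 1)
    release = atomically $ modifyTVar' (readers l) (subtract 1)

-- | Run a mutating operation (e.g. store or delete) with exclusive access.
withWrite :: StoreLock -> IO a -> IO a
withWrite l = bracket_ acquire release
  where
    acquire = atomically $ do
      w <- readTVar (writing l)
      n <- readTVar (readers l)
      check (not w && n == 0)            -- block until the store is idle
      writeTVar (writing l) True
    release = atomically $ writeTVar (writing l) False
```

With this shape, the renameFile/appendFile pair and the to_head updates of one store call would all run inside a single withWrite, so no other store can interleave with them. Note that this simple version can starve writers under a steady stream of readers, and it does nothing for crash safety.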
Option 1: fix the current implementation
Idea: all versions of a script share a unique histID. Instead of to_head, you have a meta field to_hist in the object file as this is stable.
Maintain a hist_to_head map in memory that maps each histID to the hash of its head, and use Control.Concurrent.Lock or an MVar to update this map atomically.
store(o, p) writes a new head file for o, but retries until it can successfully update hist_to_head.
fetchHist(h) looks up the histID from the meta file, finds the head from hist_to_head, and returns the history saved in the appropriate head file.
hist_to_head can be periodically saved to file and on startup the last snapshot can be loaded and updated if necessary (as it is easy to reconstruct it from all object metas).
Pros: small migration from current store, most operations don't need lock so is relatively efficient.
Cons: need to carefully check/prove concurrent correctness, needs migration
What happens if there is a crash and two concurrent new HEADs? Recovery can pick an arbitrary one, or discard both.
Use a checksum to detect crash. Remember crash can happen when snapshotting! (How does VPDB deal with this?)
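The in-memory hist_to_head map from Option 1 could look roughly like the sketch below, which uses an MVar and a compare-and-swap style update. The types Hash and HistID and the functions newHistToHead, advanceHead and currentHead are hypothetical stand-ins, and the sketch leaves out head files, snapshots and crash recovery entirely.

```haskell
import Control.Concurrent.MVar (MVar, modifyMVar, newMVar, readMVar)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for the real object hash and history ID types.
type Hash = String
type HistID = Int

-- | The in-memory hist_to_head map, guarded by an MVar.
type HistToHead = MVar (Map.Map HistID Hash)

newHistToHead :: IO HistToHead
newHistToHead = newMVar Map.empty

-- | Try to make @new@ the head of history @hid@. The update succeeds only
-- if the current head is still the expected predecessor (Nothing for a
-- brand-new history). Returns False if another store got there first, so
-- the caller can decide whether to retry or report a conflict.
advanceHead :: HistToHead -> HistID -> Maybe Hash -> Hash -> IO Bool
advanceHead ref hid expectedPred new =
  modifyMVar ref $ \m ->
    pure $
      if Map.lookup hid m == expectedPred
        then (Map.insert hid new m, True)
        else (m, False)

-- | Look up the current head of a history, as fetchHist(h) would after
-- reading the histID from the object's meta.
currentHead :: HistToHead -> HistID -> IO (Maybe Hash)
currentHead ref hid = Map.lookup hid <$> readMVar ref
```

Because the check and the insert happen inside one modifyMVar, two concurrent store(o2, o1) and store(o3, o1) calls cannot both succeed: the second one sees that the head is no longer o1 and gets False back.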
Option 2: use existing file-backed DB
hash, pred, isHead, object, histID
isHead of pred (atomically)
hash, so fetch is fast
histID, so fetchHist(h) looks up histID of h and fetches all rows with that histID.
Option 3: STM
Testing: are there libraries to test for such concurrency bugs?
For example, something that runs a random sequence of operations in a single-thread and in a concurrent setting and compares outputs.
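A hand-rolled version of that idea might look like the sketch below: apply a list of operations to a pure model in a single thread, apply the same operations from separate threads to a shared store, and compare the final states. Everything here (Store, Op, StoreNew, runSequential, runConcurrent, agrees) is a hypothetical toy model, not the real inferno-vc-server API. The comparison is only meaningful when the operations commute, which this sketch arranges by having every operation write a distinct key.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import qualified Data.Map.Strict as Map

-- | Toy model of the store: history ID to stored object.
type Store = Map.Map Int String

-- | Toy operation: store one object under a fresh history ID.
data Op = StoreNew Int String

applyOp :: Op -> Store -> Store
applyOp (StoreNew k v) = Map.insert k v

-- | Reference result: apply the operations one after another.
runSequential :: [Op] -> Store
runSequential = foldl (flip applyOp) Map.empty

-- | Apply each operation from its own thread against a shared store,
-- then wait for all threads to finish and read the final state.
runConcurrent :: [Op] -> IO Store
runConcurrent ops = do
  ref <- newMVar Map.empty
  dones <- mapM (spawn ref) ops
  mapM_ takeMVar dones
  readMVar ref
  where
    spawn ref op = do
      done <- newEmptyMVar
      _ <- forkIO (modifyMVar_ ref (pure . applyOp op) >> putMVar done ())
      pure done

-- | True when the concurrent run ends in the same state as the sequential one.
agrees :: [Op] -> IO Bool
agrees ops = (== runSequential ops) <$> runConcurrent ops
```

For operations that do not commute, such as successive stores on the same history, the concurrent result would instead have to be checked against some valid sequential ordering, which is a harder problem than this sketch addresses.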