Skip to content

Commit

Permalink
Fix VC history operation and add VC tests (#49)
Browse files Browse the repository at this point in the history
This PR fixes three bugs in the `fetchVCObjectHistory` operation:

The first is in the order of the history of a VC object. Currently,
given a script with 4 versions `s1 -> s2 -> s3 -> s4`,
`fetchVCObjectHistory` returns `[s3, s2, s1, s4]`. This is because we
store history in oldest-to-newest order in file, but the following line
adds `head_h` (i.e. the latest version `s4`) to the wrong end of the
history list:

https://github.com/plow-technologies/inferno/blob/73c59f13b4070323946cc370015e059eb043d2ab/inferno-vc/src/Inferno/VersionControl/Operations.hs#L339-L343
(the `foldM` below then reverses the order, but with `s4` still in the
wrong place)

Even after fixing the above, `fetchVCObjectHistory` incorrectly adds a
second source script in the case of a script that is cloned twice in
succession. Consider the following history:
```
mcJDUg CloneOf F6LRpH
4I47d_ CloneOf mcJDUg
cUpepL MarkedBreakingWithPred 4I47d_
```
In this case, `fetchVCObjectHistory cUpepL` returns `[F6LRpH, mcJDUg,
4I47d_, cUpepL]` (after the above fix) but it should return `[mcJDUg,
4I47d_, cUpepL]`. Interestingly, the above bug hides this bug in
production right now because because `history = [cUpepL, mcJDUg,
4I47d_]` (incorrect), so `metas = [4I47d_, mcJDUg, cUpepL]` and so when
it finds that `4I47d_,` is a clone it adds `original = mcJDUg` and
returns `nubBy [mcJDUg, 4I47d_, mcJDUg, cUpepL] = [4I47d_, mcJDUg,
cUpepL]`. But once I fix `history = [cUpepL, 4I47d_, mcJDUg]`, then
`metas = [mcJDUg, 4I47d_, cUpepL]` and `original = F6LRpH` and the code
returns `[F6LRpH, mcJDUg, 4I47d_, cUpepL]`. This is incorrect because we
only want to add on the original version of the first cloning operation
when going back in history.

While fixing this second bug, I also noticed that `nubBy` (a linear
operation) is unnecessary, as one can simply recurse through history and
stop when one sees the first cloning operation. I thus simplified the
fold function and the large case-split into a single recursive function
`getHistory` that walks through history newest-to-oldest, fetches the VC
objects, and stops when it sees a cloning operation.

**NOTE**: This PR requires a code change downstream. There is a third
issue with the current system: since the frontend requires the history
in newest-to-oldest order, but the current code returns the order `[s3,
s2, s1, s4]`, OnPing is reversing the order it gets from the VC to put
`s4` first. This PR fixes the VC to return history in newest-to-oldest
order, so OnPing should not reverse the order that it gets.

I also benchmarked a few different ways of doing this operation (see
below) and found that the fastest way would be to use a recursive
function instead of a fold, and return the history in newest-to-oldest
order.


### Test cases:

```
// iPFmvYX2GLvC9LYLxRW0HogFrhw188s7bwMguELJ0_c=
KyZIcL MarkedBreakingWithPred tHV1iC
wwWlxB CloneOf KyZIcL
_9eHVU MarkedBreakingWithPred wwWlxB
Vbke7C MarkedBreakingWithPred _9eHVU
iPFmvY MarkedBreakingWithPred Vbke7C
// Expected: [KyZIcL, wwWlxB, _9eHVU, Vbke7C, iPFmvY]

// 0yfJRn3k_14jJWPVMjFEBTHwgnYyBcgTJYU4ThJqHtM=
Rxq6af CloneOf F0j9ZH
26Brml MarkedBreakingWithPred Rxq6af
t5jCXh MarkedBreakingWithPred 26Brml
0yfJRn MarkedBreakingWithPred t5jCXh
// Expected: [F0j9ZH, Rxq6af, 26Brml, t5jCXh ,0yfJRn]

// cUpepLZIHur8wRKkx5Wc8IrvfLmNl0V_pnz9C2T9l54=
mcJDUg CloneOf F6LRpH
4I47d_ CloneOf mcJDUg
cUpepL MarkedBreakingWithPred 4I47d_
// Expected: [mcJDUg, 4I47d_, cUpepL]
```
I tested the new code on the above histories (`heads/` files) and it
returned the correct responses.

### Benchmarking different ways of recursing through history

```haskell
module Main where

import Data.Foldable (foldrM)
import System.Environment (getArgs)

_foldrM :: Int -> IO ()
_foldrM n = do
  let l1 = [1 .. n]
  let l2 = l1 ++ [n + 1]
  l3 <- foldrM process [] l2
  print $ sum $ reverse l3
  where
    process x acc = do
      if x > 5
        then do
          -- print x
          pure $ x : acc
        else pure acc

_rec :: Int -> IO ()
_rec n = do
  let l1 = [1 .. n]
  let l2 = n + 1 : reverse l1
  l3 <- processRec [] l2
  print $ sum $ reverse l3
  where
    processRec acc (x : xs) = do
      if x > 5
        then do
          -- print x
          processRec (x : acc) xs
        else pure acc
    processRec acc [] = pure acc

_rec2 :: Int -> IO ()
_rec2 n = do
  let l1 = [1 .. n]
  let l2 = n + 1 : reverse l1
  l3 <- processRec l2
  print $ sum l3
  where
    processRec (x : xs) = do
      if x > 5
        then do
          -- print x
          res <- processRec xs
          pure $ x : res
        else pure []
    processRec [] = pure []

{-
Run as: <exe> <method> <n>

I benchmarked with:
hyperfine -L n foldrM,rec,rec2 'cabal run inferno {n} 50000000'

And rec2 was 1.2x faster than the other 2 methods.
-}
main :: IO ()
main = do
  getArgs >>= \case
    [m, n'] -> do
      let n = read n'
      case m of
        "foldrM" -> _foldrM n
        "rec" -> _rec n
        "rec2" -> _rec2 n
        _ -> undefined
    _ -> error "expected args: method n"
```

---------

Co-authored-by: Rory Tyler Hayford <[email protected]>
  • Loading branch information
siddharth-krishna and ngua authored Jun 27, 2023
1 parent 73c59f1 commit 77bb919
Show file tree
Hide file tree
Showing 9 changed files with 324 additions and 86 deletions.
2 changes: 2 additions & 0 deletions flake.nix
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,8 @@
packages = {
# This takes forever to build
ghc.components.library.doHaddock = false;
# Broken
temporary.components.library.doHaddock = false;
};
packages.inferno-core = {
enableLibraryProfiling = profiling;
Expand Down
3 changes: 3 additions & 0 deletions inferno-core/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
# Revision History for inferno-core
*Note*: we use https://pvp.haskell.org/ (MAJOR.MAJOR.MINOR.PATCH)

## 0.3.1 -- 2023-06-26
* Update inferno-vc version

## 0.3.0 -- 2023-06-14
* Introduce Interpreter API to make Inferno parametric on types, values, and primitives

Expand Down
4 changes: 2 additions & 2 deletions inferno-core/inferno-core.cabal
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
cabal-version: 2.4
name: inferno-core
version: 0.3.0
version: 0.3.1
synopsis: A statically-typed functional scripting language
description: Parser, type inference, and interpreter for a statically-typed functional scripting language
category: DSL,Scripting
Expand Down Expand Up @@ -55,7 +55,7 @@ library
, cryptonite >= 0.30 && < 0.31
, exceptions >= 0.10.4 && < 0.11
, generic-lens >= 2.2.1 && < 2.3
, inferno-vc >= 0.2.0 && < 0.3
, inferno-vc >= 0.3.0 && < 0.4
, inferno-types >= 0.1.0 && < 0.2
, megaparsec >= 9.2.1 && < 9.3
, memory >= 0.18.0 && < 0.19
Expand Down
3 changes: 3 additions & 0 deletions inferno-lsp/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
# Revision History for inferno-lsp
*Note*: we use https://pvp.haskell.org/ (MAJOR.MAJOR.MINOR.PATCH)

## 0.1.5 -- 2023-06-26
* Update inferno-vc version

## 0.1.4 -- 2023-06-19
* Raise error if script evaluates to a function (and suggest adding input parameters instead)

Expand Down
4 changes: 2 additions & 2 deletions inferno-lsp/inferno-lsp.cabal
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
cabal-version: >=1.10
name: inferno-lsp
version: 0.1.4
version: 0.1.5
synopsis: LSP for Inferno
description: A language server protocol implementation for the Inferno language
category: IDE,DSL,Scripting
Expand Down Expand Up @@ -35,7 +35,7 @@ library
, exceptions >= 0.10.4 && < 0.11
, inferno-core >= 0.3.0 && < 0.4
, inferno-types >= 0.1.0 && < 0.2
, inferno-vc >= 0.2.0 && < 0.3
, inferno-vc >= 0.3.0 && < 0.4
, lsp >= 1.6.0 && < 1.7
, lsp-types >= 1.6.0 && < 1.7
, megaparsec >= 9.2.1 && < 9.3
Expand Down
3 changes: 3 additions & 0 deletions inferno-vc/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,9 @@
# Revision History for inferno-vc
*Note*: we use https://pvp.haskell.org/ (MAJOR.MAJOR.MINOR.PATCH)

## 0.3.0 -- 2023-06-26
* Fix the order returned by `fetchVCObjectHistory`. BREAKING CHANGE: history is now returned in newest to oldest order.

## 0.2.1 -- 2023-04-26
* Fixes an issue in `fetchVCObjectHistory` that occurred when deleting the source script immediately after cloning it

Expand Down
32 changes: 31 additions & 1 deletion inferno-vc/inferno-vc.cabal
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
cabal-version: >=1.10
name: inferno-vc
version: 0.2.1
version: 0.3.0
synopsis: Version control server for Inferno
description: A version control server for Inferno scripts
category: DSL,Scripting
Expand Down Expand Up @@ -76,6 +76,36 @@ library
, TupleSections
, RecordWildCards

test-suite inferno-vc-tests
type: exitcode-stdio-1.0
hs-source-dirs: test
main-is: Spec.hs
build-depends:
base >=4.7 && <5
, containers
, hspec
, http-client
, inferno-types
, inferno-vc
, QuickCheck
, servant-client
, servant-server
, servant-typed-error
, temporary
, time
default-language: Haskell2010
default-extensions:
DeriveDataTypeable
, DeriveFunctor
, DeriveGeneric
, FlexibleContexts
, FlexibleInstances
, LambdaCase
, OverloadedStrings
, TupleSections
, RecordWildCards
ghc-options: -Wall -Wunused-packages -Wincomplete-uni-patterns -Wincomplete-record-updates -threaded

-- An example executable definition, needs instantation of author/group types:
-- executable inferno-vc-server
-- main-is: Main.hs
Expand Down
134 changes: 53 additions & 81 deletions inferno-vc/src/Inferno/VersionControl/Operations.hs
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,6 @@ import qualified Data.ByteString.Char8 as Char8
import qualified Data.ByteString.Lazy as BL
import Data.Generics.Product (HasType, getTyped)
import Data.Generics.Sum (AsType (..))
import qualified Data.List as List
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.Text (pack)
Expand Down Expand Up @@ -340,87 +339,60 @@ fetchVCObjectHistory h = do
head_h <- fetchCurrentHead h
let head_fp = storePath </> "heads" </> show head_h
preds <- readVCObjectHashTxt head_fp
pure $ head_h : preds
-- When we fold the history, we check if they exist in two places
-- 1. in 'vc_store' for available scripts
-- 2. then in 'vc_store/removed' for scripts that have been deleted
-- If a script has been deleted, we track its hash.
-- Since objects can never be modified, only deleted, we don't need to hold a lock here
let f acc hsh = do
existsInRoot <- liftIO $ doesFileExist $ storePath </> show hsh
existsInRemoved <- liftIO $ doesFileExist $ storePath </> "removed" </> show hsh

if existsInRoot
then do
obj <- fmap (const hsh) <$> fetchVCObject hsh
pure ((obj : fst acc), snd acc)
else do
if existsInRemoved
then do
obj <- fmap (const hsh) <$> fetchRemovedVCObject hsh
pure ((obj : fst acc), (hsh : snd acc))
else -- This script no longer exists even in 'removed' directory. The directory might get cleaned up by accident or something.
-- There are two choices we can make,
-- 1. Return a `VCMeta VCObjectHash` with dummy data
-- 2. Ignore this meta.
-- Approach no. 2 is taken here by just returning the accumulator.
pure acc
(metas, removeds) <- foldM f ([], []) history
-- The rest of this function handles the case when this script history was obtained by
-- cloning, and adds the original script to the returned history, if it still exists:
case removeds of
[] ->
-- if it is a clone, we would like to prepend source of the cloned script as part of the history.
-- it is fine to only do this once since we only show the last source of the clone
-- i.e. original -> cloneof orignal = cloned -> cloneof cloned = cloned'
-- when viewing cloned' history, it will only show up to cloned.
case metas of
all'@(x : _) ->
case Inferno.VersionControl.Types.pred x of
-- Order: newest to oldest
pure $ head_h : reverse preds
-- We recruse through history newest to oldest, and return the history in the same order:
getHistory history
where
-- Recurse through history, newest to oldest, and stop when we find a clone
getHistory (hsh : history) = do
getObj hsh >>= \case
Nothing -> getHistory history
Just eObj -> do
-- Assuming either the entire history of a script is deleted, or none of it,
-- we only care about whether a script has been deleted when we look up the
-- source of a clone
let obj = either id id eObj
case Inferno.VersionControl.Types.pred obj of
CloneOf hsh' -> do
existsInRoot <- liftIO $ doesFileExist $ storePath </> show hsh'
existsInRemoved <- liftIO $ doesFileExist $ storePath </> "removed" </> show hsh'

if existsInRoot
then do
original <- fmap (const hsh') <$> fetchVCObject hsh'
-- 'nubBy' is needed for backward compatibility with current scripts. Clone scripts' head look like this,
--
-- x_0 (init)
-- x_1 (clone)
-- x_1_1
--
-- However, for new script (anything after this PR landed,https://github.com/plow-technologies/all/pull/9801),
-- clone scripts' head are stored like this,
--
-- x_1 (clone)
-- x_1_1
--
-- Note that it is missing the init object. When we fetch for histories, we look for pred of clone and add it to the history, but for existing scripts this means it adds init object twice
pure $ List.nubBy (\a b -> obj a == obj b) $ original : all'
else do
if existsInRemoved
then do
original <- fmap (const hsh') <$> fetchRemovedVCObject hsh'
pure $ List.nubBy (\a b -> obj a == obj b) $ original {Inferno.VersionControl.Types.pred = CloneOfRemoved hsh'} : all'
else pure all'
_ -> pure all'
_ -> pure metas
_ ->
pure $
fmap
( \meta -> case Inferno.VersionControl.Types.pred meta of
CloneOf hsh'
| List.elem hsh' removeds ->
-- The source of the clone script has been deleted, so we alter its 'pred' field as 'CloneOfRemoved' but
-- with the same hash. This way the upstream system (e.g. onping/frontend) can differentiate between
-- source that is still available and no longer available.
-- This does not change the way the script is persisted in the db, it is still stored as 'CloneOf'.
-- See 'CloneOfRemoved' for details.
meta {Inferno.VersionControl.Types.pred = CloneOfRemoved hsh'}
_ -> meta
)
metas
-- if it is a clone, we would like to prepend source of the cloned script as part of the history.
-- it is fine to only do this once since we only show the last source of the clone
-- i.e. original -> cloneof orignal = cloned -> cloneof cloned = cloned'
-- when viewing cloned' history, it will only show up to cloned.
getObj hsh' >>= \case
Just (Right ori) ->
pure [obj, ori]
Just (Left ori) ->
-- The source of the clone script has been deleted, so we alter its 'pred' field as 'CloneOfRemoved' but
-- with the same hash. This way the upstream system (e.g. onping/frontend) can differentiate between
-- source that is still available and no longer available.
-- This does not change the way the script is persisted in the db, it is still stored as 'CloneOf'.
-- See 'CloneOfRemoved' for details.
pure [obj {Inferno.VersionControl.Types.pred = CloneOfRemoved hsh'}, ori]
Nothing ->
-- This script no longer exists even in 'removed' directory. The directory might get cleaned up by accident or something.
-- There are two choices we can make,
-- 1. Return a `VCMeta VCObjectHash` with dummy data
-- 2. Ignore this meta.
-- Approach no. 2 is taken here
getHistory history >>= \res -> pure $ obj : res
_ -> getHistory history >>= \res -> pure $ obj : res
getHistory [] = pure []

getObj hsh = do
VCStorePath storePath <- asks getTyped
existsInRoot <- liftIO $ doesFileExist $ storePath </> show hsh
if existsInRoot
then do
obj <- fmap (const hsh) <$> fetchVCObject hsh
pure $ Just $ Right obj
else do
existsInRemoved <- liftIO $ doesFileExist $ storePath </> "removed" </> show hsh
if existsInRemoved
then do
obj <- fmap (const hsh) <$> fetchRemovedVCObject hsh
pure $ Just $ Left obj
else pure Nothing

getAllHeads :: (VCStoreLogM env m, VCStoreEnvM env m) => m [VCObjectHash]
getAllHeads = do
Expand Down
Loading

0 comments on commit 77bb919

Please sign in to comment.