Documented time bounds are incorrect for persistent use #104

treeowl · 2021-11-23T00:41:45Z

The usual assumption for pure data structures in Haskell is that the documented time bounds may be amortized, but that amortization must hold up under persistent use. That is not true for dlist. snoc and head are both documented as being O(1), and yet the following program runs far longer than I would ever want to wait. Swapping in Data.Sequence for DList lets this run quickly.

import qualified Data.DList as D
import Data.DList (DList)
import Data.Foldable

blob :: DList Int
blob = foldl' D.snoc mempty [1..10^7]

main :: IO ()
main = print $ sum [ D.head (blob `D.snoc` a) | a <- [1..10^6 :: Int]]

I think the best approach is likely to document that DLists are intended to be used in an affine/linear fashion, and that the bounds are only valid in that context.

The text was updated successfully, but these errors were encountered:

treeowl · 2021-11-23T00:45:27Z

There's a lot of relevant discussion in Reflection Without Remorse: Revealing a hidden sequence to speed up monadic reflection, though the paper goes in a different direction.

treeowl · 2021-11-23T00:48:11Z

Hrm... affine use is putting too strong a point on it. It's perfectly fine to concat . replicate n a DList. But they should generally only be "inspected" once.

Decided to recap the `[]` notoriety. "Monoidal construction" still needs a better desctiption, wanted to put `NFData` & `deepseq` mention - but I think if you accept this version - you would know much better how to redact it. "Exponential performance degradation" needs to be explained, for example through spl/dlist#104. It would be nice to mention that `fromList` of `Acc` is slower then `Seq`, because `Acc` expects to be used as main monoid-stage datatype, so takes `[]` content & optimizes it for monoidal construction.

Decided to recap the `[]` notoriety. "Monoidal construction" still needs a better desription - wanted to put `NFData` & `deepseq` mention - but I think if you accept this version - you would know much better how to redact it. "Exponential performance degradation" needs to be explained, for example through spl/dlist#104 & additionally when it reveals itself may be added. It would be nice to mention why `fromList` of `Acc` is slower then `Seq`, something along the lines: "because `Acc` expects to be used as main monoid-stage datatype, so takes `[]` content & optimizes structure `Acc` monoidal construction."

Anton-Latukha · 2021-12-17T16:51:53Z

Yep.

To unwrap example:

(also decreased magnitude one time, so it is faster/easier to play & get code responses)

import qualified Data.DList as D
import Data.DList (DList)
import qualified Data.Sequence as S
import Data.Sequence (Seq)
import Data.Foldable

-- DList
blobD :: DList Int
blobD = foldl1 (D.snoc) mempty [1..10^6]

mainD :: IO ()
mainD = print $ sum [ D.head (D.snoc blobD a) | a <- [1..10^5 :: Int]]

-- Seq
blobS :: Seq Int
blobS = foldl1 (S.|>) mempty [1..10^6]

mainS :: IO ()
mainS = print $ fmap sum [ S.take 1 (blobS S.|> a) | a <- [1..10^5 :: Int]]

The First would do not terminate in meaningful time, but the second terminates fast.

foldr would be the same.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Documented time bounds are incorrect for persistent use #104

Documented time bounds are incorrect for persistent use #104

treeowl commented Nov 23, 2021

treeowl commented Nov 23, 2021

treeowl commented Nov 23, 2021

Anton-Latukha commented Dec 17, 2021 •

edited

Loading

Documented time bounds are incorrect for persistent use #104

Documented time bounds are incorrect for persistent use #104

Comments

treeowl commented Nov 23, 2021

treeowl commented Nov 23, 2021

treeowl commented Nov 23, 2021

Anton-Latukha commented Dec 17, 2021 • edited Loading

Anton-Latukha commented Dec 17, 2021 •

edited

Loading