Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documented time bounds are incorrect for persistent use #104

Open
treeowl opened this issue Nov 23, 2021 · 3 comments
Open

Documented time bounds are incorrect for persistent use #104

treeowl opened this issue Nov 23, 2021 · 3 comments

Comments

@treeowl
Copy link

treeowl commented Nov 23, 2021

The usual assumption for pure data structures in Haskell is that the documented time bounds may be amortized, but that amortization must hold up under persistent use. That is not true for dlist. snoc and head are both documented as being O(1), and yet the following program runs far longer than I would ever want to wait. Swapping in Data.Sequence for DList lets this run quickly.

import qualified Data.DList as D
import Data.DList (DList)
import Data.Foldable

blob :: DList Int
blob = foldl' D.snoc mempty [1..10^7]

main :: IO ()
main = print $ sum [ D.head (blob `D.snoc` a) | a <- [1..10^6 :: Int]]

I think the best approach is likely to document that DLists are intended to be used in an affine/linear fashion, and that the bounds are only valid in that context.

@treeowl
Copy link
Author

treeowl commented Nov 23, 2021

There's a lot of relevant discussion in Reflection Without Remorse: Revealing a hidden sequence to speed up monadic reflection, though the paper goes in a different direction.

@treeowl
Copy link
Author

treeowl commented Nov 23, 2021

Hrm... affine use is putting too strong a point on it. It's perfectly fine to concat . replicate n a DList. But they should generally only be "inspected" once.

Anton-Latukha added a commit to Anton-Latukha/acc that referenced this issue Nov 23, 2021
Decided to recap the `[]` notoriety.

"Monoidal construction" still needs a better desctiption, wanted to put `NFData`
& `deepseq` mention - but I think if you accept this version - you would know
much better how to redact it.

"Exponential performance degradation" needs to be explained, for example through
spl/dlist#104.

It would be nice to mention that `fromList` of `Acc` is slower then `Seq`,
because `Acc` expects to be used as main monoid-stage datatype, so takes `[]`
content & optimizes it for monoidal construction.
Anton-Latukha added a commit to Anton-Latukha/acc that referenced this issue Nov 23, 2021
Decided to recap the `[]` notoriety.

"Monoidal construction" still needs a better desription - wanted to put `NFData`
& `deepseq` mention - but I think if you accept this version - you would know
much better how to redact it.

"Exponential performance degradation" needs to be explained, for example through
spl/dlist#104 & additionally when it reveals itself
may be added.

It would be nice to mention why `fromList` of `Acc` is slower then `Seq`,
something along the lines:
"because `Acc` expects to be used as main monoid-stage datatype, so takes `[]`
content & optimizes structure `Acc` monoidal construction."
@Anton-Latukha
Copy link

Anton-Latukha commented Dec 17, 2021

Yep.

To unwrap example:

(also decreased magnitude one time, so it is faster/easier to play & get code responses)

import qualified Data.DList as D
import Data.DList (DList)
import qualified Data.Sequence as S
import Data.Sequence (Seq)
import Data.Foldable

-- DList
blobD :: DList Int
blobD = foldl1 (D.snoc) mempty [1..10^6]

mainD :: IO ()
mainD = print $ sum [ D.head (D.snoc blobD a) | a <- [1..10^5 :: Int]]

-- Seq
blobS :: Seq Int
blobS = foldl1 (S.|>) mempty [1..10^6]

mainS :: IO ()
mainS = print $ fmap sum [ S.take 1 (blobS S.|> a) | a <- [1..10^5 :: Int]]

The First would do not terminate in meaningful time, but the second terminates fast.

foldr would be the same.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants