Make the Lazy basic implementation much faster... #2

Open
wants to merge 12 commits into master

Conversation

GordonBGood

@GordonBGood GordonBGood commented Jan 29, 2017

The current JavaScript Native "memoize" function works by returning a nested forcer function when the outer "memoize" function is called to initialize the representation of the `Lazy` type with the thunk argument; the evaluated data is later stored as a local variable within the outer "memoize" function's scope when the returned inner forcer function is called. This scheme of building a new nested inner function "on-the-fly" every time a new `Lazy` is initialized is very costly in execution time on many current mainstream browsers such as Chrome 55: http://jsperf.com/iifes-vs-nested-functions/4.

In order to speed this up for some browsers, the generic `Lazy` type is changed to an Elm Tagged Union so that the `force` function can call the Native JS "memoize" function on that type without needing to create any new functions. In addition, the code was tuned to use an Immediately Invoked Function Expression (IIFE) in order to produce somewhat better speed on some browsers, such as current Chrome.
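
As a rough illustration, the Tagged Union representation hinted at by the REPL output later in this thread might look something like the sketch below; the exact constructor names and the way the Native "memoize" hook fits in are assumptions, and a pure Elm `force` can only re-run the thunk since it cannot cache the result:

type Lazy a
    = Unevaluated (() -> a)  -- deferred thunk, not yet run
    | Evaluated a            -- cached result after the first force

lazy : (() -> a) -> Lazy a
lazy thunk =
    Unevaluated thunk

force : Lazy a -> a
force wrapped =
    case wrapped of
        Evaluated value ->
            value

        Unevaluated thunk ->
            -- in the PR, the Native "memoize" function caches the result
            -- here (and detects recursive evaluation); plain Elm can only
            -- re-run the thunk, which is why the caching lives in JS
            thunk ()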

As an added benefit, the Native JS "memoize" function can check for recursive evaluation of the thunk and throw an error early, rather than waiting for a stack overflow on the infinite loop, at very little cost in execution time.
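
For example, a thunk that (directly or indirectly) forces its own `Lazy` value would trigger the new check as soon as it is forced; this is a hypothetical illustration, not code from the PR:

import Lazy exposing (Lazy, lazy, force)

selfForcing : Lazy Int
selfForcing =
    let
        -- the thunk forces the very value it belongs to
        r = lazy (\() -> force r + 1)
    in
        r

-- force selfForcing
-- with the proposed check this raises an error right away instead of
-- recursing until the stack overflows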

This change affects the `Lazy a` generic type, the `lazy` constructor function, and the `force` function, as well as the Native JS "memoize" function; however, it makes no changes to the API of the Lazy library module.

The speedup can be verified by timing the enumeration of a simple lazy list of natural numbers, produced by the following code:

type List a = Node a (Lazy (List a))

nat32s : () -> List Int
nat32s () =
  let
    nat32si n =
      Node n << lazy <| \() -> nat32si (n + 1)
  in
    nat32si 0

nthnat32 : Int -> String
nthnat32 n =
  let
    nthnati i nts =
      case nts of
        Node hd tl ->
          if i <= 0 then
            toString hd
          else
            nthnati (i - 1) (force tl)
  in
    nthnati n (nat32s ())

When run with the (latest) proposed changes and compared against the original code, running nthnat32 999999 (one million iterations) is up to about two and a half times faster on Chrome 55 Stable, but shows little effect on Firefox 51 Stable or Microsoft Edge 38.14393.0.0. It is suspected that the problem with Chrome may be related to cache/heap/garbage collection, as the older version would get increasingly slower with extended repeated use of the Lazy type, whereas this new version tends to maintain the same performance. However, using Lazy on Chrome is still slower than on Firefox, which in turn is considerably slower than Edge, so there is still a range of roughly three times in performance depending on the browser used.
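
For reference, a minimal harness for running the benchmark might look like the following (a hypothetical example, not part of the PR; wall-clock timing can be read from the browser's developer tools):

import Html exposing (Html, text)

-- assumes nat32s/nthnat32 from the snippet above are in the same module
main : Html msg
main =
    text (nthnat32 999999)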

As well, the documentation is corrected for the erroneous `andThen` function example, which could not be compiled due to errors. A note is also added for potential implementers of a lazy `LazyList a` type: there is a very high execution-time cost to adding the extra `Lazy` wrapper and implementing `cons` and `append` as in that example, but an implementation of this kind is required to avoid "The Halting Problem" when the resulting `Lazy (LazyList a)` type is used to build infinite lazy lists.
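
For context, the kind of `Lazy`-wrapped lazy list the note refers to looks roughly like the sketch below; the names `LList`, `LazyList`, `cons`, and `append` are illustrative and not necessarily those used in the documentation:

import Lazy exposing (Lazy, lazy, force)

type LList a
    = Empty
    | Cons a (LazyList a)

-- the Lazy wrapper is what defers construction of each tail
type alias LazyList a =
    Lazy (LList a)

cons : a -> LazyList a -> LazyList a
cons hd tl =
    lazy <| \() -> Cons hd tl

append : LazyList a -> LazyList a -> LazyList a
append xs ys =
    lazy <| \() ->
        case force xs of
            Empty ->
                force ys

            Cons hd tl ->
                Cons hd (append tl ys)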

Finally, a Lazy version of fix is provided (extending the API) that can be used to create recursion when passed a function of the accepted form. It has an advantage over the usual implementation of fix for strict, strongly-typed languages such as Elm in that it uses lazy/force in its implementation and thus enjoys the benefits of the recursive-evaluation checks introduced with this PR. On the other hand, it cannot be used for ordinary recursion with an included recursion limit (and thus building stack) as an ordinary fix can be, as that will be detected automatically and an exception thrown. If that behaviour is needed, a conventional fix should be used, although code of that form can often be written so as not to require fix at all (conventional TCO'd recursive loops).

GordonBGood and others added 3 commits January 30, 2017 00:15
The current JavaScript Native "memoize" function works by returning a
nested forcer function when the outer "memoize" function is called to
initialize the representation of the `Lazy' type with the thunk argument
so that the evaluated data will later be stored as a local variable
within the outer "memoize" function's scope when the returned inner
forcer function is called.  This scheme of building a new nested inner
function "on-the-fly" every time a new `Lazy' is initialized is very
costly in execution time when run on many current mainstream browsers
such as Chrome 55:  http://jsperf.com/iifes-vs-nested-functions/4.

In order to speed this up, the generic `Lazy' type is changed to an Elm
Tagged Union so that the `force' function can call the Native JS
"memoize" function on that type without needing to create any new
functions.

As an added benefit, the Native JS "memoize" function can check for
recursive evaluation of the thunk and throw an error early rather than
waiting for a stack overflow on the infinite loop with very little cost
in execution time.

This change affects the `Lazy a' generic type, the `lazy' type
constructor function, and the `force' function as well as the Native JS
"memoize" function; howeverk it makes no changes to the API of the Lazy
library module.

This speed can be verified by timing the enumeration of the following
simple natural number lazy list sequence producing function as per the
following code:

type List a = Node a (Lazy (List a))

nat32s : () -> List Int
nat32s () =
  let
    nat32si n =
      Node n << lazy <| \() -> nat32si (n + 1)
  in
    nat32si 0

nthnat32 : Int -> String
nthnat32 n =
  let
    nthnati i nts =
      case nts of
        Node hd tl ->
          if i <= 0 then
            toString hd
          else
            nthnati (i - 1) (force tl)
  in
    nthnati n (nat32s ())

When run using the proposed changes compared to with the original code,
running `nthnat32 999999' (one million iterations) is faster by about
six times for Chrome 55 Stable, about four times for Firefox 51 Stable,
and about two times faster for Microsoft Edge 38.14393.0.0, with all
three browsers having about the same overall execution time after the
change (Chrome was formerly considerably slower than the others).

The version number has only been upped to 2.1.0 although this is a very
significant change as there are no changes to the API.
Make the Lazy basic implementation much faster...
Once the Lazy type is based on a Tagged Union, it is advantageous to add a `lazyFromValue` function to the API so that an immediately evaluated value that does not need to be deferred can be created directly. The result is that one extra level of function call is avoided when such a value is later `force`d.
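
A brief sketch of the intended use, assuming the `lazyFromValue : a -> Lazy a` signature implied by this commit (the value name is just an example):

import Lazy exposing (Lazy, force)

-- wraps an already-computed value with no deferred thunk, so forcing
-- it does not have to go through an extra level of function call
answer : Lazy Int
answer =
    Lazy.lazyFromValue 42

-- force answer == 42
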
Corrected example and simplified documentation to reflect a simple use case.
The previous example was incorrect as to the use of arguments for the `cons` function, the naming of an argument, and non-matching braces for the `append` function; the example code has also been tuned with an inner "wrapper" to reduce the number of arguments.

A note was also added to remind users that implementing lazy lists inside a `Lazy' wrapper isn't as efficient as a non-wrapped implementation, due to the many levels of force/thunk/lazy function calls and compositions required, but that an implementation such as this is required to avoid "The Halting Problem" when creating and using infinite lazy lists.
elm-package.json Outdated
@@ -1,5 +1,4 @@
{
"version": "2.0.0",

@mgold mgold Feb 1, 2017


Please revert this. Version bumps are part of the release process, not the pull request process.

Author


Sorry, didn't realize.

Reverted.

@mgold

mgold commented Feb 1, 2017

It's only faster because you're cheating.

> import Lazy
> x = Lazy.lazy (\_ -> 42)
Unevaluated <function> : Lazy.Lazy number
> y = Lazy.force x
42 : number
> x
Evaluated 42 : Lazy.Lazy number

Notice how the value of x is mutated as a side effect of forcing it. That's not allowed. The logic behind the current implementation is that you still hold on to a function after calling force, the only thing you can do with a function is call it, and the only change memoization makes is to return the cached value if one is available (which is safe because of purity, which this breaks).

Continuing in the same session:

> y
42 : number
> x
Evaluated 42 : Lazy.Lazy number
> y
42 : number
> x
42 : number
> x
Evaluated 42 : Lazy.Lazy number

Not just the value but the type of x fluctuates. This is broken.

I didn't realize where versions were set.
Calling JavaScript Native functions as necessary to support the JavaScript Closure Pattern used to hide mutation from Elm, while at the same time allowing the lazyFromValue call and early recursive evaluation detection (throwing an exception if detected).
@mgold

mgold commented Feb 1, 2017

I was going to ask where the native functions you call in 0b72fe5 are defined, but just running elm make finds a compiler error.

Using the JavaScript Closure pattern to hide internal mutation, while at the same time detecting recursive evaluation (failing with an exception if detected) and allowing for the use of the lazyFromValue function to avoid unnecessary deferred evaluation for simple values.

Using the "memoize" function from within an Immediately Invoked Function Expression (IIFE) helped with execution speed, in particular on Chrome V8 after many recursions, where it seems that some cache/storage overflows and what optimization there is fails.
@GordonBGood
Author

@mgold, putting it another way: the original code is slow because it is generating a function inside another function, which is remarkably slow on some browsers such as current Chrome V8 based browsers. I am trying to avoid that, but perhaps the method I tried breaks the rules.

For the first phase of your test, that is expected behavior: yes, the Native "memoize" function does mutate the `Lazy' Tagged Union from an Unevaluated thunk into an Evaluated value. It is still the same type, just a different tag, and I thought that might be allowed, but I can see now that it still appears as a mutation.

As to the second phase of your test, I can't reproduce it, and I can't see how the type of "x" could possibly change, as the only mutation through native JS code occurs when `Lazy.force' is called, and that is never called (again) in your second-phase testing.

You are right that the type should never change and I would never have submitted the PR if I could see that it did.

I must admit defeat in finding any way other than the JS Closure pattern to implement the mutation while hiding it from Elm, so use of `Lazy' in tight loops is going to be slow on some browsers until those browsers are optimized for this pattern. I did find that wrapping the Native JS "memoize" function in an Immediately Invoked Function Expression (IIFE) helped performance, particularly on Chrome, where without it long-running loops slowed down by a further factor of about two and a half. I have also retained the recursive evaluation detection in the code, so at least that is some improvement (it doesn't cost significant time compared to the creation of function objects).

So back to using the JS Closure pattern...

Running the same tests, as follows:
Phase I:

> x = Lazy.lazy <| \() -> 42
Lazy <function> : Lazy.Lazy number
> y = Lazy.force x
42 : number
> x
Lazy <function> : Lazy.Lazy number

Phase II in the same session:

> y
42 : number
> x
Lazy <function> : Lazy.Lazy number
> y
42 : number
> x
Lazy <function> : Lazy.Lazy number
> x
Lazy <function> : Lazy.Lazy number

Execution time is back to the same (slow) speed as the original code: acceptable on Edge, but not good on Firefox and poor on Chrome (although now no longer extremely poor for long running loops).

I have done a commit on my branch to the code as described above.

@mgold

mgold commented Feb 1, 2017

I know I saw the "x is a number now" bug more than once, but I can't reproduce it now either.

@GordonBGood
Author

@mgold, well, that code where Elm could see mutation is gone now anyway, so as far as I can tell, everything works as expected now.

I don't know that I can do much better than this and still get reasonable performance across all browsers: some improvements that help Edge somewhat hurt Firefox a little and hurt Chrome greatly, such that the ratio in speed between Edge and Chrome becomes over twenty times!

@GordonBGood
Author

But wait just a bit: I am thinking that this fairly major change may as well extend the API slightly further with lazy versions of the sharing fixed-point combinator and the (non-sharing) Y-combinator. These are very simple functions to write (although perhaps hard for some to understand ;) ) and are useful in implementing recursion, with `fix` sharing the result of the recursive value and `ycomb` not. They can be written as follows:

fix : (a -> a) -> a
fix f =
  let r = lazy <| \_ -> f <| force r in force r -- sharing

and

ycomb : (a -> a) -> a
ycomb f =
  let r = lazy <| \() -> f <| ycomb f in force r -- non-sharing

I think they belong in this library because I use the `Lazy'ified thunk rather than a simple thunk so that they can enjoy the benefits of the (new) recursive evaluation checking feature; they are also a handy tool that can be used to inject laziness into recursive deferred operations like, say, lazy lists.

Of course I have full documentation headers including examples of the use of each.

What do you think, do they belong here?

Make the inner forcer function as fast as possible when the value has already been evaluated by re-ordering the conditional checks.
@GordonBGood
Author

@mgold, regarding the addition of fix and ycomb, I have been working on that and forgot that, in order to code this way, the compiler would have to support type contexts and forall, which it does not have and may never have, as it is perhaps too complex for the intended users.

The problem is fixable by changing the type signatures to `(Lazy a -> a) -> a` and coding the functions accordingly; although that makes `fix` usable and fast, `ycomb` is very slow even when one gets it to work.

The `fix` function is the most useful, as it provides recursive values as mentioned in Evan's article (https://github.com/elm-lang/elm-compiler/blob/0.18.0/hints/bad-recursion.md), but in this case with the recursive evaluation check. I recommend that this function be included in the API, since there are times when there is no other way to use this pattern (or to write it oneself).
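
As a rough sketch of what the `(Lazy a -> a) -> a` form and one use of it might look like (an illustration only, with a hypothetical `Stream` type; not necessarily the code as committed):

import Lazy exposing (Lazy, lazy, force)

fix : (Lazy a -> a) -> a
fix f =
    let
        -- f receives the Lazy value unforced, so it can defer the
        -- recursive reference; forcing it while building the value
        -- would trip the recursive evaluation check
        r = lazy (\() -> f r)
    in
        force r

type Stream a = Next a (Lazy (Stream a))

-- an infinite, self-sharing stream of ones built with fix
ones : Stream Int
ones =
    fix (\rest -> Next 1 rest)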

For ycomb there are other ways to write code to achieve the same ends, and although I could provide a Z-combinator, that is also not that useful as one can accomplish this too by other means. Therefore, my recommendation is that ycomb not be included in the API.

If you agree, I will shortly do a commit adding fix to this API, fully tested and working will documentation and a good example.

@mgold

mgold commented Feb 2, 2017

...this is way too much detail and Evan is not going to ask you to open a new PR.

This function is a version that uses `Lazy' in its implementation so that it enjoys the recursive evaluation checks introduced with the previous changes.  It forms a recursion on a binding, which can also otherwise be done with a simple anonymous function but without the checks.

This version does not require a recursion limit condition internal to the function argument, as the deferred execution ensures that it cannot run away indefinitely.
@GordonBGood
Author

@mgold, the fix is in, and I think I'm done here unless there are problems or requested changes.

Standardize the formatting.
@mgold

mgold commented Feb 22, 2017

I can't merge elm-lang stuff, only elm-community stuff. Sorry to lead you on.

@GordonBGood
Author

@mgold, so wait for Evan?

@GordonBGood
Author

@mgold, this Pull Request turns out to be reasonably minor; the major Pull Request is elm-community/lazy-list. Perhaps you are confusing the two?

@mgold

mgold commented Feb 23, 2017

Yes, wait for Evan here. Your lazy-list PR is on my todo list.

A correction to the type was missed, such that the main file would not compile.