Make the Lazy basic implementation much faster... #2
base: master
Conversation
The current JavaScript Native "memoize" function works by returning a nested forcer function when the outer "memoize" function is called to initialize the representation of the `Lazy` type with the thunk argument, so that the evaluated data will later be stored as a local variable within the outer "memoize" function's scope when the returned inner forcer function is called. This scheme of building a new nested inner function "on-the-fly" every time a new `Lazy` is initialized is very costly in execution time on many current mainstream browsers such as Chrome 55: http://jsperf.com/iifes-vs-nested-functions/4. In order to speed this up, the generic `Lazy` type is changed to an Elm Tagged Union so that the `force` function can call the Native JS "memoize" function on that type without needing to create any new functions. As an added benefit, the Native JS "memoize" function can check for recursive evaluation of the thunk and throw an error early, rather than waiting for a stack overflow on the infinite loop, with very little cost in execution time. This change affects the `Lazy a` generic type, the `lazy` type constructor function, and the `force` function, as well as the Native JS "memoize" function; however, it makes no changes to the API of the Lazy library module.
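To illustrate the scheme described above, here is a rough plain-JavaScript sketch; the names `lazyTagged`/`forceTagged` and the tag strings are illustrative assumptions, not the actual Native source:

```javascript
// A Lazy value is a plain tagged record; no new function objects are built.
function lazyTagged(thunk) {
  return { ctor: "Unevaluated", thunk: thunk, value: undefined };
}

// A single shared `force` function mutates the tag in place the first time through.
function forceTagged(lz) {
  if (lz.ctor === "Unevaluated") {
    const thunk = lz.thunk;
    lz.ctor = "InProgress"; // marks the value as currently being evaluated
    lz.thunk = null;        // allow the thunk to be garbage collected
    lz.value = thunk();
    lz.ctor = "Evaluated";
  } else if (lz.ctor === "InProgress") {
    // the thunk forced its own Lazy value: fail early instead of overflowing the stack
    throw new Error("Lazy value depends on its own evaluation!");
  }
  return lz.value;
}
```

Because `forceTagged` is defined only once, initializing a `Lazy` allocates just a small record rather than a fresh closure, which is the source of the claimed speedup.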
This speed can be verified by timing the enumeration of the following simple natural number lazy list sequence producing function, as per the following code:

```elm
type List a
    = Node a (Lazy (List a))

nat32s : () -> List Int
nat32s () =
    let
        nat32si n =
            Node n << lazy <| \() -> nat32si (n + 1)
    in
        nat32si 0

nthnat32 : Int -> String
nthnat32 n =
    let
        nthnati n nts =
            case nts of
                Node _ tl ->
                    if n <= 0 then
                        case nts of
                            Node hd _ ->
                                toString hd
                    else
                        nthnati (n - 1) (force tl)
    in
        nthnati n (nat32s ())
```

When run using the proposed changes compared with the original code, running `nthnat32 999999` (one million iterations) is faster by about six times for Chrome 55 Stable, about four times for Firefox 51 Stable, and about two times for Microsoft Edge 38.14393.0.0, with all three browsers having about the same overall execution time after the change (Chrome was formerly considerably slower than the others). The version number has only been upped to 2.1.0, although this is a very significant change, as there are no changes to the API.
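For reference, a rough JavaScript analogue of the benchmark above (an assumed translation, using a minimal memoizing `lazy` helper to model the library's thunk caching):

```javascript
// Minimal memoizing lazy helper modeling the library's at-most-once evaluation.
function lazy(thunk) {
  let value;
  return function force() {
    if (thunk !== null) {
      value = thunk();
      thunk = null; // evaluate at most once
    }
    return value;
  };
}

// Node n (lazy (\() -> nat32si (n + 1))) as a head/tail record.
function nat32s() {
  function nat32si(n) {
    return { head: n, tail: lazy(() => nat32si(n + 1)) };
  }
  return nat32si(0);
}

// Walk n links of the lazy list, forcing each tail, and show the nth head.
function nthnat32(n) {
  let nts = nat32s();
  while (n > 0) {
    nts = nts.tail();
    n -= 1;
  }
  return String(nts.head);
}
```

Timing `nthnat32(999999)` in this form exercises one `lazy` allocation and one forcing per iteration, which is what makes per-`Lazy` closure creation dominate the cost.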
Make the Lazy basic implementation much faster...
Once the `Lazy` type is based on a Tagged Union, it is then advantageous to add a `lazyFromValue` function to the API so that, when an immediately evaluated value that does not need to be deferred is desired, one can create it with this function. The result is that one level of extra function call to `force` the evaluation is not required when this function is used.
Corrected example and simplified documentation to reflect a simple use case.
The previous example was incorrect in its use of arguments for the `cons` function, in the naming of an argument, and in non-matching braces for the `append` function; the example code has also been tuned with an inner "wrapper" to reduce the number of arguments. A note was also added to remind users that this required way of implementing lazy lists inside a `Lazy` wrapper isn't as efficient as non-wrapped, due to the many levels of function calls (including force/thunk/lazy function calls/compositions) required, but that an implementation such as this is required to avoid "The Halting Problem" for the creation and use of infinite lazy lists inside a `Lazy` wrapper.
elm-package.json
```diff
@@ -1,5 +1,4 @@
{
  "version": "2.0.0",
```
Please revert this. Version bumps are part of the release process, not the pull request process.
Sorry, didn't realize.
Reverted.
It's only faster because you're cheating.
Notice how the value of `x` changes. Continuing in the same session:
Not just the value but the type of `x` changes.
I didn't realize where versions were set.
Calling JavaScript Native functions as necessary to support the JavaScript Closure Pattern used to hide mutation from Elm, while at the same time allowing the lazyFromValue call and early recursive evaluation detection (throwing an exception if detected).
I was going to ask where the native functions you call in 0b72fe5 are defined, but just running
Using the JavaScript Closure pattern to hide internal mutation, while at the same time detecting recursive evaluation (failing with an exception if detected) and allowing for the use of the `lazyFromValue` function to avoid unnecessary deferred evaluation for simple values. Using the "memoize" function from within an Immediately Invoked Function Expression (IIFE) helped with execution speed, in particular on Chrome V8 after many recursions, where it seems that some cache/storage overflows and what optimization there is fails.
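A hedged sketch of the pattern this commit message describes (an assumed shape, not the actual Native source; `memoize` and `lazyFromValue` follow the names used in the discussion):

```javascript
// The IIFE creates the shared `raise` helper once, instead of building
// extra helper function objects on every `lazy` call.
const memoize = (function () {
  function raise() {
    // Forcing a Lazy from within its own thunk would loop forever;
    // fail early instead of waiting for a stack overflow.
    throw new Error("Lazy value depends on its own evaluation!");
  }
  return function memoize(thunk) {
    let value;
    return function force() {
      if (thunk !== null) {
        const t = thunk;
        thunk = raise;  // re-entry during evaluation calls raise()
        value = t();
        thunk = null;   // null marks "already evaluated" (the fast path)
      }
      return value;
    };
  };
})();

// For values that are already evaluated, skip the deferral machinery entirely.
function lazyFromValue(v) {
  return function force() {
    return v;
  };
}
```

Note that the already-evaluated case is tested first in `force`, so repeatedly forcing an evaluated `Lazy` does one comparison and a return.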
@mgold, putting it another way: the original code is slow because it generates a function inside another function, which is remarkably slow on some browsers such as current Chrome V8 based browsers. I am trying to avoid that, but perhaps the method I tried breaks the rules.

For the first phase of your test, that is expected behavior: yes, the Native "memoize" function does mutate the `Lazy` Tagged Union from the Unevaluated "thunk function" into the Evaluated "Type", but that is still the same Type, just a different Tag, and I thought that might be allowed; I can see now that it still appears as a mutation.

As to the second phase of your test, I can't repeat it, and I can't see how it could possibly change the type of "x", as the only mutation through native JS code occurs when `Lazy.force` is called, and that is never called (again) in your second-phase testing. You are right that the type should never change, and I would never have submitted the PR if I could see that it did.

I must admit defeat in finding any way other than the JS Closure pattern to implement the mutation while hiding it from Elm, so use of `Lazy` in tight loops is going to be slow on some browsers until those browsers are optimized for this pattern. I did find that wrapping the Native JS "memoize" function in an Immediately Invoked Function Expression (IIFE) helped performance, particularly on Chrome, where without it long-running loops slowed down by an additional factor of about two and a half. I have also retained the recursive evaluation detection in the code, so at least that is some improvement (it doesn't cost significant time compared to the creation of function objects). So back to using the JS Closure pattern... Running the same tests, as follows:
Phase II in the same session:
Execution time is back to the same (slow) speed as the original code: acceptable on Edge, but not good on Firefox and poor on Chrome (although now no longer extremely poor for long running loops). I have done a commit on my branch to the code as described above.
I know I saw the "x is a number now" bug more than once, but I can't reproduce it now either.
@mgold, well, that code where Elm could see mutation is gone now anyway, so as far as I can tell, everything works as expected now. I don't know that I can do much better than this and get reasonable performance across all browsers, as some improvements that help Edge somewhat affect Firefox slightly negatively and affect Chrome greatly, such that the ratio in speed between Edge and Chrome becomes over twenty times!
But wait just a bit: I am thinking that this fairly major change may as well include increasing the API slightly more to include lazy versions of the sharing fix point combinator and the (non-sharing) Y-combinator. These are very simple to write (although perhaps hard for some to understand ;) ) functions that are useful in implementing recursion, with
and
I think they belong in this library because I use the `Lazy`ified thunk rather than a simple thunk so that they can enjoy the benefits of the (new) recursive evaluation checking feature; they are also a handy tool that can be used to inject laziness into recursive deferred operations such as lazy lists. Of course, I have full documentation headers, including examples of the use of each. What do you think, do they belong here?
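For illustration, a JavaScript sketch of the sharing fix point being proposed (the actual PR is Elm; the `lazy`/`force` stand-in below and the shapes involved are assumptions):

```javascript
// Minimal Lazy stand-in with the recursive-evaluation check.
function lazy(thunk) {
  let state = "unevaluated";
  let value;
  return {
    force() {
      if (state === "evaluated") return value;
      if (state === "inProgress") {
        throw new Error("Lazy value depends on its own evaluation!");
      }
      state = "inProgress";
      value = thunk();
      state = "evaluated";
      return value;
    },
  };
}

// Sharing fix point: `f` is handed the (lazy) final result itself, so
// recursive references are deferred and guarded by the check above.
function fix(f) {
  const r = lazy(() => f(r));
  return r;
}

// Example: an infinite lazy list of ones that shares its own tail.
const ones = fix((self) => ({ head: 1, tail: self }));
```

Forcing `ones` yields `{ head: 1, tail: ones }` with the tail shared, while a function that immediately forces its own argument (e.g. `fix((self) => self.force())`) is detected and throws rather than looping forever.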
Make the inner forcer function as fast as possible when the value has already been evaluated by re-ordering the conditional checks.
@mgold, regarding the addition of The problem is fixable by changing the type signatures to: The `fix` function is the most useful, as it provides recursive values as mentioned in Evan's article: https://github.com/elm-lang/elm-compiler/blob/0.18.0/hints/bad-recursion.md, but in this case with the recursive evaluation check. I recommend that this function be included in the API, since there are times when there is no other way to use this pattern (or write it oneself). For If you agree, I will shortly do a commit adding
...this is way too much detail and Evan is not going to ask you to open a new PR.
This function is a version that uses `Lazy` in its implementation so that it enjoys the recursive evaluation checks introduced with the previous changes. It forms a recursion on a binding, which can also otherwise be done with a simple anonymous function, but without the checks. This version does not require a recursion limit condition internal to the function argument, as the deferred execution ensures that it cannot run away indefinitely.
@mgold, the
Standardize the formatting.
I can't merge elm-lang stuff, only elm-community stuff. Sorry to lead you on.
@mgold, so wait for Evan?
@mgold, this Pull Request turns out to be reasonably minor; the major Pull Request is elm-community/lazy-list. Perhaps you are confused between the two?
Yes, wait for Evan here. Your lazy-list PR is on my todo list.
Missed a correction to the type, without which the main file will not compile.
The current JavaScript Native "memoize" function works by returning a nested forcer function when the outer "memoize" function is called to initialize the representation of the `Lazy` type with the thunk argument, so that the evaluated data will later be stored as a local variable within the outer "memoize" function's scope when the returned inner forcer function is called. This scheme of building a new nested inner function "on-the-fly" every time a new `Lazy` is initialized is very costly in execution time on many current mainstream browsers such as Chrome 55: http://jsperf.com/iifes-vs-nested-functions/4. In order to speed this up for some browsers, the code was tuned to use an Immediately Invoked Function Expression (IIFE), producing somewhat better speed on some browsers, such as current Chrome. This change affects the Native JS "memoize" function; however, it makes no changes to the existing API of the Lazy module.

This speed can be verified by timing the enumeration of the simple natural number lazy list sequence producing function shown earlier. When run using the (latest) proposed changes compared with the original code, running `nthnat32 999999` (one million iterations) is faster by up to about two and a half times for Chrome 55 Stable, but there is not much of an effect on Firefox 51 Stable or Microsoft Edge 38.14393.0.0. It is suspected that the problem with Chrome may be related to cache/heap/garbage collection, as the older version would get increasingly slower with extended repeated use of the `Lazy` type; this new version tends to better maintain the same performance. However, use of `Lazy` on Chrome is still slower than on Firefox, which is considerably slower than Edge, so there is still a considerable range of performance of about three times depending on the browser used.

As well, documentation is corrected for the erroneous `andThen` function example, which could not be compiled due to errors, and a note is added for potential implementers of a `Lazy (LazyList a)` type that there is a very high execution time performance cost to adding the extra `Lazy` wrapper and implementing `cons` and `append` as in this example, which is required to avoid "The Halting Problem" when the resulting `Lazy (LazyList a)` type is used to build infinite lazy lists.

Finally, a `Lazy` version of `fix` is provided (extending the API) that can be used to create recursion when passed a function of the accepted form; this has an advantage over the usual implementation of `fix` for strict strongly-typed languages such as Elm in that it uses `lazy`/`force` in its implementation and thus enjoys the benefits of the recursive evaluation checks introduced with this PR. On the other hand, it cannot be used for ordinary recursion with an included recursion limit (and thus building stack) as an ordinary `fix` can be, as that will be automatically detected and an exception thrown if it is tried with this function. If one needs this behavior, a conventional `fix` should be used, although the code where that form of `fix` is used can often be written to not require `fix` at all (conventional TCO'd recursive function loops).