Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The Scan confusion #29

Open
dmitriz opened this issue May 17, 2017 · 12 comments
Open

The Scan confusion #29

dmitriz opened this issue May 17, 2017 · 12 comments

Comments

@dmitriz
Copy link
Contributor

dmitriz commented May 17, 2017

The scan function seems to be both common and useful,
but it also caused me some confusions like here and here that I'd like to clear if possible.

Here is my current understanding (appreciate any correction):

  • The scan function in flyd is impure, strictly speaking. The simplest possible example is
const s = flyd.stream()
const s1 = flyd.scan(`add`, 1, s)
s1() //=> 1
s(1)
const s2 = flyd.scan(`add`, 1, s)
s2() //=> 2

So the result depends on the time of calling the scan.

  • I would like to run the same example with Hareactive, but at the moment it is not quite clear to me what would be the best way. From the scan tests I see that a sinkStream is used to create a "ProducerStream", then scan is followed the .at() evaluation, and then by subscribe, whose role is not quite clear to me, in particular, whether it turns stream from pending into active, like other libraries do. Then events need to be published. I wonder if there were any more direct way similar to the above.

  • It can be seen as composition of truncating the stream events between two moments into array (forgetting the times) and subsequent reduce. The latter part is pure. The impurity comes from the implicit dependence on the first moment (the moment when the scan was called). It is still pure in the second moment, which is the current time.

  • The scan becomes pure when applied to the so-called "cold" (I find "pending" more descriptive) streams, the ones that had not received any value yet. This is how they do it in MostJS. Any of their method "activating" the stream, transforms it into Promise, after which methods like scan are not allowed. That way scan remains pure.

  • Applying scan only to the pending streams is the most common use case, as e.g. @JAForbes was suggesting in Allow optional seed in Stream.scan? MithrilJS/mithril.js#1822 . Such as the action stream is passed to scan before at the initialisation time, whereas the actions begin to arrive after. This fact is also confirmed by the absence of tests in the non-pending cases, for instance, note how in https://github.com/paldepind/flyd/blob/master/test/index.js#L426 the stream is always created empty.

  • The 2 scan methods here are pure, however, they differ from other libraries in which they return the more complex types of either Behavior<Stream<A>> or Behavior<Behavior<A>>.

The implementation (as always) varies among libraries:

  • flyd (and mithril/stream) allows scan on any stream and returns stream

  • MostJS scan regards all Streams as "cold", with the additional mosj-subject to use with active streams, however, the purity is lost in that case.

  • Xstream does not have scan, it seems to be replaced with the fold which "Combines events from the past throughout the entire execution of the input stream". That seems to solve the purity problem for them, but may not be as convenient to use.

  • KefirJS https://rpominov.github.io/kefir/#scan and BaconJS https://baconjs.github.io/api.html let their scan to transform stream into
    what they call "property", which I understand is the same as Behavior here.
    I am not familiar with details but they seem to talk about "active", so possibly they way is similar to MostJS.

  • The RxJS makes the seed argument optional. Which, however, presents problems if it is of different type than the accumulator. (They only demonstrate the simple addition case, where the types are equal.)
    The same is in KefirJS

Possible suggestions:

  • Change the name to something like pureScan to emphasise both the difference and the motivation for the additional complexity, and to avoid the confusion with the other scans.

  • I would like to have some safe and simple variant. Like stream and behavior in one thing, what Xstream calls the Memory Stream. So I know that both stream and behavior would conform to this memoryStream interface and I don't have to think which one is which. It may be derived from other more refined versions, but I would like to have it for the sake of convenience.

  • A new MemoryStream interface might be a great entry point to build adapters for other libraries as it would accept all streams, promises, behaviours, properties and even observables. So people can use their old code with the api they understand, which is great.

  • A new Pending interface could be combined with Streams, Behaviors, or MemoryStreams. It would allow the use the "unlifted" version of scan, while preserving the purity.

  • The scan function for the Pending interface could be called something like "pendingScan" to emphasise its use case. It would only apply to pending memory streams, in its pure form. Its impure brother can be called "impureScan" and would apply to any memory stream, but with "impure" in it's name, it is no more the library's responsibility :)

  • The reducer would gets called and the initialisation time (when the stream is not pending) and when the events are emitted.

Let me know what you think.

@paldepind
Copy link
Member

I'm not really a fan of MemoryStream. It doesn't do anything that a behavior can't. And having to make a distinction between a behavior and a stream is a feature 😉

@dmitriz
Copy link
Contributor Author

dmitriz commented May 24, 2017

That sounds like a long departure from flyd :)

I can see that you may consider it a less cleaner abstraction,
but I think it may help saving a few methods and reducing the code complexity.
I've thought more as a convenience type derived from the others.

But I am sure you have thought it well through, so I might be wrong.

@paldepind
Copy link
Member

Flyd is still there for people who want to use that 😄

A key feature in Hareactive is that both stream and behavior have a precise and simple semantics. All operations can be understood in terms of the semantics. I can't think of semantics for MemoryStream that doesn't basically make it a behavior.

@dmitriz
Copy link
Contributor Author

dmitriz commented Jun 1, 2017

I had some more thoughts and found some conceptual difficulties when seeing streams as lists of time-value pairs. When you consider a pure function f : a -> b, you have the ability to pass values of type a when running code or tests, and the referential transparency means that the value is uniquely determined by the argument.

Those features seem to be lacking for the streams when seen as complete lists of time-value pairs. You can't pass it as value because you don't know the future part of the list. Also you don't want your pure function to depend on the past events, which means the value of f is determined by only present and future. That is a stronger property than being determined by the entire list (including past).

On the other hand, these problems seem to disappear if you regard streams as channels that can produce values at moments in the future. You can pass such channel as value to a function. And the function can be seen as set of instructions what to do when a value appears in the channel. Then the purity can be stated in terms of those instructions applied to the same values. That is strictly stronger property than simply being determined by the complete list. For instance, two lists producing the same values at different moments have to be considered different, but the instructions can be the same, and you can still test for the same results.

Also you can't pass as value the entire behavior as function of time. You can only pass the channel that will keep updating its value from that moment. And you look for the stronger purity property that the resulting behavior at any moment is determined by the input values only up to that moment.

And the scan seems to become pure in that model, because you never need to touch the past. The scan instruction tells us to use the initial value provided, and whenever a new value appears in the channel, to run the reducer and update the resulting behavior. If you pass the same initial value and the channel produces the same sequence of values, then the scan returns the same value.

If that works, it might be a simpler model where the scan takes just a stream channel and returns a behavior channel.

@paldepind
Copy link
Member

I'm not sure I understand the problem nor the solution. A stream is only a list in the semantic sense. Considering a stream a list gives us a mental model for understanding streams and the operators on it.

On the other hand, these problems seem to disappear if you regard streams as channels that can produce values at moments in the future.

That is pretty much how a stream is implemented. But it not a precise semantic.

@dmitriz
Copy link
Contributor Author

dmitriz commented Jun 1, 2017

I'm not sure I understand the problem nor the solution. A stream is only a list in the semantic sense. Considering a stream a list gives us a mental model for understanding streams and the operators on it.

The problem is to understand the signature:

https://github.com/funkia/hareactive#scansa-bfn-a-a-b-b--b-startingvalue-b-stream-streama-behaviorstreamb

Here the last parameter is the value of type stream: stream: Stream<A>, but what does it mean exactly? As the list of values, it is not available ahead of time, so how can you pass it as argument?

You can only pass the channel, not the list, so maybe the channel should be the argument type?
You say, it is not precise, but I don't see why.

Another problem is the return value, for which you are getting Behavior<Stream<B>> . But with the channel model, you get back the new stream channel, which is simpler. You don't need to keep the time parameter anymore. Instead you get this simplified signature:

scanS<A, B>(fn: (a: A, b: B) => B, startingValue: B, stream: Stream<A>): Stream<B>

Here both streams are just the channels without any list semantics.
And it is very convenient for writing tests. Because you can call the scan, then send values into the argument channel and test for correct values in the returned channel.

Is there any problem with this approach?

@limemloh
Copy link
Member

limemloh commented Jun 1, 2017

On the other hand, these problems seem to disappear if you regard streams as channels that can produce values at moments in the future.

This sounds very much like a Stream in hareactive.

Those features seem to be lacking for the streams when seen as complete lists of time-value pairs.

Assuming that by complete list, you mean a finite list, The semantics of Stream does not require the list to be finite.

You can only pass the channel, not the list, so maybe the channel should be the argument type?

If I understand your channel correct, a definition would be a pair of a timestamp and a list of occurrences occurring after the timestamp.

(Time, [(Time, Value), ...])

With this definition, you should be able to make a pure scan taking a stream and returning a stream. It would maybe make the scan look more simple but at the cost of adding complexity to the stream semantics.

@paldepind
Copy link
Member

paldepind commented Jun 1, 2017

@dmitriz

You say, it is not precise, but I don't see why.

Because, what is a channel? A stream is clearly defined. It's a list of pairs with time and value. Such a thing can be understood precisely and mathematically. And from this semantics, we can explain all the other operators. For instance, the semantics of map is:

const map = (fn, stream) => stream.map(([time, value]) => ([time, fn(value)]);

What is the semantics of a channel? What is the semantics of map?

You don't need to keep the time parameter anymore. Instead you get this simplified signature:

I don't understand how that signature can be pure.

And it is very convenient for writing tests.

There is no problem testing scan. It is supported by our declarative testing. Here is how you can currently test scan.

@limemloh

With this definition, you should be able to make a pure scan taking a stream and returning a stream.

The semantic makes sense. But, I don't think you could make scan return a stream directly in that case either. Because, what value would you return as the start time in the pair in scan? You would want to return the current time, but you can't do that in a pure way. So instead you'd have to return the start time of the stream you receieved, but then to make the scan satisfy the semantics you'd have to remember the entire history of the stream.

@dmitriz
Copy link
Contributor Author

dmitriz commented Jun 1, 2017

@paldepind

Because, what is a channel? A stream is clearly defined. It's a list of pairs with time and value. Such a thing can be understood precisely and mathematically. And from this semantics, we can explain all the other operators. For instance, the semantics of map is:
const map = (fn, stream) => stream.map(([time, value]) => ([time, fn(value)]);
What is the semantics of a channel? What is the semantics of map?

I mean the channel mostly in the sense it is used in Go https://gobyexample.com/channels or Javascript https://github.com/ubolonton/js-csp, inspired by Go, without any advanced functionality.

Precisely a stream channel is a variable that can receive values at the future moments. The map applies the given function to any value that channel receives. Whenever the argument channel receives x, the return channel receives f(x) at the same moment.

It is similar to the list of pairs, except it is more realistic because it is closer to how you use the streams. The list model is simpler but not as realistic, since you never have that list in practice, only parts of it.

You don't need to keep the time parameter anymore. Instead you get this simplified signature:

I don't understand how that signature can be pure.

In the sense that the values of the return channel are given as pure functions
of the initial value and the sequence of values the input channel received.
Is it not pure?

There is no problem testing scan. It is supported by our declarative testing. Here is how you can currently test scan.

I am not entirely sure how the methods there are working precisely.

The semantic makes sense. But, I don't think you could make scan return a stream directly in that case either. Because, what value would you return as the start time in the pair in scan? You would want to return the current time, but you can't do that in a pure way. So instead you'd have to return the start time of the stream you receieved, but then to make the scan satisfy the semantics you'd have to remember the entire history of the stream.

With the channel model you don't need to return the time, only value at the right time.
That is the advantage over the list model I think.

@paldepind
Copy link
Member

paldepind commented Jun 1, 2017

@dmitriz

I think we are talking past each other. What you're describing as channels doesn't sound any different from what we call streams. You're just explaining it in natural language.

Precisely a stream channel is a variable that can receive values at the future moments. The map applies the given function to any value that channel receives. Whenever the argument channel receives x, the return channel receives f(x) at the same moment.

That is not precise in the way I mean. Our semantics for streams is precise because it explains streams by mapping them to a corresponding mathematical object. The semantics for streams uses lists, pairs and time (real numbers). All of these are mathematical objects. map can be described as a pure mathematical function on this mathematical object.

If what I mean doesn't make sense then you may want to take a look at this great article Semantic Design, the section in my blog post titled "Thinking like Conal" or maybe this talk by Conal Elliott.

@dmitriz
Copy link
Contributor Author

dmitriz commented Jun 4, 2017

@paldepind

I had to think how to define it mathematically.
Ironically, my experience as professional mathematician
talking to programmers was mostly in the opposite direction --
making statements "too precise" can run me into troubles
to come across as too dry or irrelevant or impractical. ;)

I surely love the simplicity of the classical FRP approach,
where all primitives are provided with simple mathematical models,
but as you know and even mention in your blog, that simple model
has some problems and limitations leaving room for the "leaks".

Another issue was, the idea of the channel (or signal) was not really mine,
so I was simply referring to it rather than trying to define from scratch.
And it is true that most definitions out there are neither functional nor pure.

So if I am to try to define it mathematically,
I would put a time-value list into the IO monad (which is what I called the "instruction") to avoid the impurity.
Then our channel is the IO operation which, when executed at the time moment T,
returns a time-value list [(time, value)] with time <= T.
And for different moments T1, T2, both results should match
when filtered with time <= min(T1, T2).
Of course, that assumes some knowledge of the current time,
which is another IO operation, let us call it IO Time.

Now let me try to define the scan for such channel.
First, we need to forget the past,
that is to build a new dependent channel
which does not hold any events before the current time.
Second, we can run the reduce on the new channel up to the given time as usual.
For the first part, we need to record the time by accessing the IO Time operation,
and that might present some problem indeed.
At this stage, I am not sure if that can be made a complete mathematical definition,
but wanted to state it here for the record.
Also it is quite possibly written in some more details somewhere...

However, I have to admit, that model would suffer from extra complexity that I don't particularly like myself. Which also makes me like more the classical FRP model, even though it presents some bigger mental shift away from the traditional RP (without the "F").

Let me try to explain what I mean here. The Conal's FRP model requires us to think at once about the whole lifetime of our time-changing value as function of time. That is very natural when you compare with how the moving objects are described in Physics but different from the RP way of handling the streams and observables. Here I'd find the analogy with Physics the easiest way to explain as most people learned it in school.

But with streams, we are getting some additional confusion. As @limemloh mentioned, the Stream is not required to be a finite list. However, I presume the list is still required to be locally finite, that is having finite intersection any finite time interval. On the other hand, finite lists may suffice to define all the same methods and would make our model simpler. In any case, it would be good to include the finiteness aspect in the definition to avoid this confusion.

Another confusion is the difference in scan signatures between the blog post (where it returns Behavior<B>) and the Hareactive, where the two scans returning Behavior<Stream<B>> and Behavior<Behavior<B>>.

As a matter of disclaimer, please don't take it as any critics in any way.
I love your blog and the recent additions and improvement in clarity in the Hareactive readme! 👍
This is just my own interpretation of things and some feedback for the future records.

  • In the scan implementation in the blog post, the time-value list is only truncated from above, not from below. That seems to represent only the scan applied before any event in the stream? Otherwise, we would want to skip the past of the stream, right? Of course, you can also define the scan directly returning the stream of all partial reduces, whose stepper would be the current definition.

  • The stateful scanS in Hareactive is taking values in Behavior<Stream<B>>. Is the behavior argument here marking the initial time when the scan is applied? That is, is it used to truncate the list from below, before the reduce is applied? And the reason is to treat the time of applying the scan in a pure way, without nailing down the time of application?

  • The same question for the other scan. The description does not seem to take into account the possible application of scan after some events from the stream?

The simplest way I can see it, you have two basic operations here -- truncation and reduce. You need two time moments T1 and T2 to truncate the stream sequence of the time-value pairs. And once truncated, you run the usual reduce for the list (which must be finite at this stage).

For each of the moments T1 and T2, you have two choices: arbitrary time or one of the times in our list. With both choices from the list, you get Stream<Stream<A>>, which is the list of the triples (t1, t2, value). Then each discrete t can be replaced by the continuous t via the stepper method. Those will give the 3 other return types: Behavior<Stream<A>>, Stream<Behavior<A>>, and Behavior<Behavior<A>>. That seems to cover all possibilities without any specific choice. It would be nice to relate these to the Hareactive scans.

Hope you find it useful. I will write more feedback on the other parts as this post is already getting long :)

@dmitriz dmitriz mentioned this issue Jun 4, 2017
@paldepind
Copy link
Member

I think your model for Channel is interesting. I actually don't think the thing is very far from what Stream is in Hareactive.

First, we need to forget the past, that is to build a new dependent channel which does not hold any events before the current time.

Yes. Exactly. We need to forget the past in scan. That is why scan returns a Bavior<Behavior<B>> in Hareactive. Semantically that is a function from two points in time. (t1: Time) => (t2: Time) => B. The first point in time t1 is the "cut-off" point. Everything before that is what we forget. So we don't need IO Time because we already have Behavior to represent something that depends on time.

Let me try to explain what I mean here. The Conal's FRP model requires us to think at once about the whole lifetime of our time-changing value as function of time. That is very natural when you compare with how the moving objects are described in Physics but different from the RP way of handling the streams and observables. Here I'd find the analogy with Physics the easiest way to explain as most people learned it in school.

At the semantic level, the whole lifetime is included. For instance, something like map does really affects the whole timeline of the behavior it's called on. But when we program with behaviors we don't have to think about the entire lifetime.

I think the analogy to physics is very good 😄. The way moving objects are described in physics is identical to what a behavior is in FRP. We can even implement something similar to what people learned in school in Hareactive:

const physics = go(function* () {
  const acceleration = Behavior.of(1); // constant behavior
  const velocity = yield sample(integrate(acceleration));
  const position = yield sample(integrate(velocity));
});

integrate integrates a behavior of numbers. Since it's stateful it returns a Behavior<Behavior<number>>. integrate uses the Euler method to compute the integrals. So it's not the most precise thing in the world. But it's decent and we might be able to replace it with better approximations later.

I presume the list is still required to be locally finite, that is having finite intersection any finite time interval.

I've never thought about that. But yes, that is true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants