Idea sketch: FRP and sequential inference #183
Comments
I just wanted to add some of my thinking (which you may already have mentioned but I didn't notice). I think of a probabilistic program as a stochastic process (if we confine ourselves to programs that sample a fixed number of times). Sequential Monte Carlo is a Markov process in which the paths are modified by a potential function given by the data. This is really close to a program being a stochastic process. I will re-read your thoughts more carefully.

Maybe this helps: a sequence of random variables, and a program as a composition of Markov kernels. Actually I need to generalise the definition of a Markov kernel, but you get the idea. Some more notes (Feynman-Kac measures), and some more notes: happy to be corrected here, but the maps are just translations: either you don't move, or you move by a Gaussian increment, and this is the familiar Metropolis-Hastings kernel with a Gaussian random walk. Apologies for missing out some steps and for not considering the case where the chain stays where it is.
Thanks for writing this up! I'll read through it more carefully soon and respond in more detail with my ideas/plans (I have been moving this week, so things are a bit chaotic). But basically, I have been thinking on very similar lines! A few things that may or may not be relevant:
So in my own attempts to write a pipes-based version of a similar idea, I ended up with the type `Pipe observation (Population SamplerIO latent) IO ()`. For example, I was trying to write a system which would receive a stream of observations. Is your suggestion to have a more functional and pure version of the same idea by using MSFs? Or to put it another way, can you say more about

```haskell
frpSir :: Monad m =>
  -- | resampler
  (forall x. Population m x -> Population m x) ->
  -- | population size
  Int ->
  -- | model
  MSF (Population m) a b ->
  MSF (Population m) a b
```

What does the input signal function represent? And what is the output one? Can it handle the use case described above?
Awesome :) I think we'll arrive at something cool in the end.
Yes, arrowized FRP was designed partly to deal with some classes of leaks. One reason I don't like
The libraries I usually work with are basically streaming libraries (
Yes, I think it's possible somehow. But I ended up not being able to use
You mean, one particle does two
That's a fascinating article, but I still haven't finished reading it. Definitely a lot that can be implemented.
Yes, certainly!
Probably not, Yampa knows no monads/effects.
So does that mean that
That's the part that I don't quite understand yet. How can you "extract" the randomness to the output?
The type signature I proposed doesn't make much sense it turns out; instead we can do the following, which works:

```haskell
runPopulationS :: forall m a b . Monad m =>
  -- | Number of particles
  Int ->
  -- | Resampler
  (forall x . Population m x -> Population m x)
  -> MSF (Population m) a b
  -> MSF m a [(b, Log Double)]
runPopulationS nParticles resampler msf = runPopulationCl' $ spawn nParticles $> msf
  where
    runPopulationCl' :: Monad m => Population m (MSF (Population m) a b) -> MSF m a [(b, Log Double)]
    runPopulationCl' msfs = MSF $ \a -> do
      bAndMSFs <- runPopulation $ flip unMSF a =<< msfs
      let (currentPopulation, continuations) = unzip $ (\((b, msf), weight) -> ((b, weight), (msf, weight))) <$> bAndMSFs
      -- FIXME This normalizes, which introduces bias, whatever that means
      return (currentPopulation, runPopulationCl' $ normalize $ resampler $ fromWeightedList $ return continuations)
```

What I'm doing here is something similar to what you were doing: resample at every step of the MSF and output the current particle population of the output values explicitly. It works really well for me and I have a nice interactive graphical example that I'll polish and upload these days. What I don't understand is whether it is ok to normalize at every step (see the FIXME).

Another work-in-progress question: I could simplify the type of the Bayes filter:

```haskell
-- | Condition on one output of a distribution.
--
-- p(x,y | theta) ~> p(x | y, theta)
bayesFilter :: (MonadInfer m, Eq sensor) =>
  MSF m input (sensor, latent) ->
  -- | external sensor, data source
  MSF m (input, sensor) latent
```

One gets an additional input, which is used to condition the sensor. I found that in practice, this doesn't work well with particles because hard equality comparison throws away too many particles. One usually uses something called an "importance function" I believe, and you also use a normal distribution in the SMC example instead of a hard equality check. Something like:

```haskell
class SoftEq a where
  -- | `similarity a1 a2 == 1` if they are exactly equal, and 0 if they are completely different.
  similarity :: a -> a -> Log Double

  -- | Scores the similarity of the two inputs
  (=~) :: MonadInfer m => a -> a -> m ()
  a1 =~ a2 = score $ similarity a1 a2
```

Does that make sense to you?
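For what it's worth, a minimal sketch of what an instance could look like, assuming a Gaussian kernel for `Double`s (the bandwidth `sigma` is an arbitrary choice here, not something fixed above):

```haskell
import Numeric.Log (Log (Exp))

-- A hypothetical instance: soft equality for Double via a Gaussian kernel.
-- sigma is an assumed bandwidth; smaller values penalize mismatches harder.
instance SoftEq Double where
  similarity x y = Exp (- (x - y) ^ (2 :: Int) / (2 * sigma * sigma))
    where
      sigma = 0.1
```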
See https://github.com/turion/dunai-bayes and https://github.com/turion/rhine/tree/dev_monad_bayes for work in progress.
This is really great, way cooler than my pipes attempt. I'm reading through your code and in general it makes sense, but here are some things that are currently confusing me:
Btw, I did try using MMorph in monad-bayes, but there was some problem I can't remember. I think one of the…
Yes, it does. Re normalization, are you just saying that the absolute values of the weights in the population become too small? That confuses me, because in
Yep, because a particle is in fact a distribution over continuations of the program, whose fate depends on the stochastic choices made at each step.
(I'll keep posting questions as I come up with them)
To make sure I'm understanding, wouldn't that be true in particular when…
Can we write…
By the way, the sort of application I have in mind here (and it seems like you're not that far off), is a GUI app where a user can produce observations live (e.g. by clicking points on the screen) and the system will update its posterior in real-time. One could even imagine changing the population size or resampling rate live.
Yet another question: is there an easy way using rhine to get the current time as a signal? (I read your paper on Rhine btw, and it seems very cool.) It would be cool to define a Gaussian process by just expressing the jump to the next state after a given time.

EDIT: I think this works:

```haskell
model4 :: forall cl m td. (Float ~ Time cl, MonadSample m) => ClSF m cl () Pos2
model4 = feedback (0,0) model4'

model4' = MSF $ \((), (x,y)) -> do
  time <- DunaiReader.asks sinceLast -- @(TimeInfo cl) @ReaderT
  x' <- normal x (float2Double (time*2))
  y' <- normal y (float2Double (time*2))
  let p = (x',y')
  return ((p,p), model4')
```
Yes, that's the right way. But you need to make sure that
No,
I had
Yes, do try, I'll try and help if it doesn't work! Btw. about naming:
Yes.
Makes sense what you're saying, yet still, if I remove the `normalize` it doesn't work. In fact, I don't understand this line here:
Why do we multiply with the previous mass? That way, we guarantee that the probability mass is exactly the same as before resampling. Maybe this is intended (to avoid "bias"?).
Ok, then it makes more sense to remove
Yes, I guess you're right.
I'm pretty sure not. The issue is that we want to keep joining the population effects "in the background" over all timesteps, but…
Yes, that's sort of what I wanted to do. Spawning or killing particles should be done on user input. The resampling rate is equal to the simulation speed, which is currently tied to the graphics FPS rate, but there is no a priori reason why graphics and simulation have to be synchronous.

(In fact I'm not so convinced that the user should produce observations. It's super cool to have user input as measurements (or some other input like a physical sensor or so), but for the model to make sense we have to have some kind of rule for how the latent value produces an observable output. In the current model, there is a fixed normal error, and we cannot expect the user to play a noisy sensor and produce roughly this noise distribution. We could put a prior on the noise, and then tell the user (as some kind of game) to try to click onto the latent position, and thus fit the noise that the user themself introduces by inaccurate clicking. Maybe this is a fun minigame. Or maybe it's too convoluted, what do you think?)
Yes, that's what I think. I'd try it like this:

```haskell
model4 :: forall cl m td. (Double ~ Time cl, MonadSample m) => ClSF m cl () Pos2
model4 = gaussianProcess &&& gaussianProcess

gaussianProcess :: forall cl m td. (Double ~ Time cl, MonadSample m) => ClSF m cl () Double
gaussianProcess = feedback 0 $ proc ((), x) -> do
  timeSinceLast <- sinceLastS -< ()
  x' <- arrMCl (uncurry normal) -< (x, timeSinceLast * 2)
  returnA -< (x', x')
```

Comments:

- Floats are stupid and I only use them because of the graphics backend.

Workflow

I think we've collected enough material so that we're sure there is something here that can work. How do we proceed now? I'd propose:
Unfortunately I don't have as much time as I'd like, I have to squeeze this project in between/after my work hours. But this is super exciting, and I'll continue whenever I can! Since you're already in it and digging, does it make sense if I keep developing the libraries while you work on the tutorial?
I've made an issue, and will try when free.
Also made an issue.
I think that's intentional. I also found this confusing, but I think the point is to keep the weights from exploding/vanishing. I'm pretty sure it's correct, but we should try to work out why you have to normalize to make things work. Once I understand the FRP code more, I can hopefully work that out.
I've never used the arrow notation before, but this is a pretty good advert for it :) And thanks,
True, but on the other hand, the nice thing about being in Bayesian land is we can specify models that somewhat accommodate this. For example, we could have a model which tries to guess whether the input is coming from a sine wave with noise, or from uncorrelated random samples (an OK description of user input). Or just do (non-parametric) polynomial regression and show the best fit. Reading sensor data live would be really nice too though. Anyway, that's all down the road...

Workflow
Agreed!
Are you planning to have rhine-bayes and dunai-bayes be separate hackage packages from rhine and dunai? Am assuming yes, but just checking.
Yep, agreed.
That's pretty much where I'm at too. One of the reasons I find this exciting is that it feels like Haskell has a unique advantage at tackling this problem (because of mature FRP libraries, arrow notation, and monad-bayes) in a really declarative way.
Yes, that sounds great because a) I like writing tutorials and b) it will help me understand all the moving pieces properly, which I don't quite yet. Btw, are we envisioning that there are changes needed to monad-bayes to make this work better? Perhaps generalization from Double, for example. But if there are other things, let me know. One interesting point to consider is that we could easily extend to more complex inference algorithms, e.g. doing a step of MH every so often (or even on the user's request) to rejuvenate the population. The non-reactive version of that is called RMSMC, and is basically a one-liner in monad-bayes.
OK, closing this issue as activity is happening in dunai-bayes and rhine-bayes (see above for links). tl;dr: combining reactive and probabilistic programming is a great idea, and works.
Recently @reubenharry asked whether it makes sense to combine Functional Reactive Programming (FRP), stochastic processes, and real-time inference. And yes, I believe this makes sense. I want to try and explain the correspondence between sequential inference and arrowized FRP with effects as embodied by `dunai`/`bearriver`/`essence-of-live-coding`/`rhine`. Comments are warmly welcome :) Hopefully, we'll end up with enough material to write a tutorial on FRP & real-time inference.

CC @dataopt
Arrowized FRP with effects
MSFs
The fundamental building block of effectful AFRP is the monadic stream function:
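Roughly, it is the following type (essentially as `dunai` defines it, modulo strictness annotations):

```haskell
data MSF m a b = MSF { unMSF :: a -> m (b, MSF m a b) }
```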
An `MSF` runs in steps: at each step, it consumes an `a`, performs a side effect in `m`, outputs a `b`, and continues.

This is implemented in `dunai`, `essence-of-live-coding` (as initial algebra though), `varying`, `netwire`, and probably other libraries I'm not aware of. It's not yet FRP because we have no notion of time, but let us for the sake of simplicity assume that there are discrete and evenly spaced time steps in which we will execute our program. (If that's not satisfying to you, look at `rhine` and see how we recover continuity, events, classic FRP, and so on.)

One can create data (by reading data from a file, a hardware sensor, a network socket, a random number generator, ...), and equally one can consume data (by writing to a file, creating a plot, ...) with an easy to define function `arrM :: Monad m => (a -> m b) -> MSF m a b`. `MSF`s form a category, that is, they can be composed. So we can add consumers and producers to our pipeline and finally arrive at a "main" `MSF` with no inputs and outputs that can be run; a small sketch follows below.

(Phew, this is a really terse primer, read more about all this in `dunai` or `essence-of-live-coding`.)
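A minimal sketch of such a pipeline (the names `produce` and `consume` are made up for illustration; `constM`, `arrM` and `reactimate` are dunai combinators):

```haskell
import Control.Arrow ((>>>))
import Data.MonadicStreamFunction (MSF, arrM, constM, reactimate)
import System.Random (randomRIO)

-- A producer: draws a fresh random number on every step.
produce :: MSF IO () Double
produce = constM (randomRIO (0, 1))

-- A consumer: prints whatever it receives.
consume :: MSF IO Double ()
consume = arrM print

-- Compose producer and consumer into a "main" MSF with no inputs
-- or outputs, and run it step by step.
main :: IO ()
main = reactimate (produce >>> consume)
```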
Stochastic processes
Definition: A stochastic process is an `MSF` in a `MonadSample`.

In other words, a stochastic process is an `MSF` where we are allowed to introduce some randomness at every step of the computation. For example, a Gaussian random walk (caution, I didn't typecheck or test); see the sketch below. (If you find the syntax `MSF $ \...` daunting, have a look at `dunai`, which cleans this up a lot.)

Note in particular that `MSF`s can consume a value drawn from a prior (in this case the standard deviation) on every step. Since `MSF`s form a category (they can be composed), you can easily build hierarchical models with prior processes, hyperprior processes and so on.
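The original snippet is not preserved here; a sketch of what it could look like (the name `gaussianRandomWalk` is made up, and the imports assume a recent dunai where the raw `MSF` constructor lives in the `InternalCore` module):

```haskell
import Control.Monad.Bayes.Class (MonadSample, normal)
import Data.MonadicStreamFunction.InternalCore (MSF (..))

-- A Gaussian random walk: at every step, read a standard deviation
-- from the input and jump from the current position by a normal draw.
gaussianRandomWalk :: MonadSample m => Double -> MSF m Double Double
gaussianRandomWalk position = MSF $ \stdDev -> do
  position' <- normal position stdDev
  return (position', gaussianRandomWalk position')
```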
Bayes filter
I've been carrying this idea around for a few years already, but since I couldn't find any write-up of it, I'll just dump it here so people can refer to it later:
Probability monads and monadic stream functions naturally give rise to Bayes filters. MSFs here play the role of the hidden Markov model.
It basically works like this:
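A sketch, using the simplified signature that comes up later in this thread; the implementation body is my assumption about what the original looked like (run the model, condition on the predicted sensor value equalling the observed one, return the latent estimate):

```haskell
import Control.Arrow (first, (>>>))
import Control.Monad.Bayes.Class (MonadInfer, condition)
import Data.MonadicStreamFunction (MSF, arrM)

-- Run the model on the input, condition on the predicted sensor reading
-- being equal to the externally observed one, and return the latent state.
bayesFilter :: (MonadInfer m, Eq sensor) =>
  MSF m input (sensor, latent) ->
  MSF m (input, sensor) latent
bayesFilter model =
  first model >>> arrM (\((predicted, latent), observed) -> do
    condition (predicted == observed)
    return latent)
```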
In other words: At every step, draw a sample from the model and the measured data, condition on the observation being equal, and return the current state estimation. This is pretty neat and in principle works for simple `MonadInfer`s! But you have to be careful with naive implementations like `Enumerator`, because you'll easily end up in an exponential explosion of possible paths.

Termination

With an arbitrary monad `m`, there is no telling whether an `MSF` will ever terminate. So let's add an effect to the stack, arriving at `MSFExcept a b m r`, an `MSF` over `ExceptT r m`: this is the type of `MSF`s that can, at every step, terminate and return a result `r`. This may sound unmotivated right now, but we'll need it when we discuss `Coroutine`s.

Coroutines

As implemented in https://hackage.haskell.org/package/monad-coroutine-0.9.2/docs/Control-Monad-Coroutine.html and used by `monad-bayes`, a `Coroutine` decides (based on an action in `m`) at every step whether it is done (returning an `r`) or whether it needs to perform a further computation in `s`. So it is sort of a generalization of an `MSFExcept`, where `s` means "consume an `a` and output a `b`".

Lemma

We need to understand one special case that is used in `monad-bayes`: `Sequential m a`, a `Coroutine` whose suspension functor is `Await ()`. Luckily, this `Await ()` functor is isomorphic to `InOut () ()`, i.e. we have an isomorphism between the two. So a `Sequential` is nearly the same as an `MSF`, except for two things:

- There is an `ExceptT r m`, so it can finish at any step.
- The way the monad wraps the `MSF`: in `Sequential`, the outer layer has the meaning of the part of the computation that has already been performed, while the "inner" part are the computation steps that are yet to come.

Remember the function `reactimate :: Monad m => MSF m () () -> m ()` to execute an `MSF`? It specialises to `reactimate :: Monad m => MSFExcept () () m r -> ExceptT r m ()`. The return type is isomorphic to `m (Either r ())`, which in our case is in principle the same as `m r`. The same such function exists for `Sequential`; it is called `finish :: Monad m => Sequential m a -> m a`.

Sequential importance (re-)sampling
To summarize so far, the two sides of the correspondence are:

- `MonadSample m => MSFExcept a b m r`
- `Sequential m r`
This brings us to this function:
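For reference, the function meant here is `sis` from `Control.Monad.Bayes.Sequential`; if I remember the signature correctly, it is roughly:

```haskell
-- Sequential importance sampling: apply a transformation (e.g. a resampler)
-- after each of the given number of steps. (Signature quoted from memory.)
sis :: Monad m
    => (forall x. m x -> m x) -- ^ transformation to apply after each time step
    -> Int                    -- ^ number of time steps
    -> Sequential m a
    -> m a
```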
I banged my head on this one for a while, because here is where the straightforward compatibility ends. `sis` does two things:

- It applies the given transformation after each of `n` time steps (the `Int` parameter). (Question: Why not for all timesteps? After all, the information of how many timesteps there are is encoded in `Sequential m a`!)
- It collapses everything into the `m` monad.

`sis` is the basis for all the sequential Monte Carlo methods. There is one simple strategy to combine FRP with this:

- Write the model as `MSF`s (or, if you want some more high-level description, use `rhine`).
- Translate it into a `Sequential m a`.
- Run `sis`, `sir`, `smcMultinomial`, ..., feeding it a sufficiently high number of time steps (again, why do we have to do this?).

This is all nice and dandy, and something similar is described for the `pipes` library in https://monad-bayes-site.netlify.app/smc, but it's not satisfying! We don't want to run the whole program once and get one result. An FRP program is an ongoing process that constantly consumes and produces data. What we actually want is to use the intermediate inference result all the time while we receive new data!

The conundrum
In terms of `monad-bayes`, the type signature should be something like a step-wise version of `sis`; note that it returns a `Sequential` again.

A more general, analogous `dunai` type signature would be the `frpSir` signature quoted in the comments. But it's surprisingly hard to implement this! Just following the types, one would like to use `hoist`, but that is wrong, I believe, because it applies the resampler only on the current step, and not the whole model until then.

I think part of the difficulty is that the resampler type is higher order in the effect, but I haven't been able to figure out how to make it first order yet. `Population` is a composition of `StateT` and `ListT`, and while the `StateT` effects are all first order, I don't know how to make `ListT` first order.

Where to go now

I think I'll try and write a little `rhine` backend that:

- translates `rhine` programs into `Sequential`, so one can at least batch-run them

I'll think a bit more about how particle resampling should work. If you have ideas, please feel free to let me know!