Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify worker, re-write sp_load_draws #132

Merged
merged 7 commits into from
Jul 22, 2024
Merged

Conversation

WardBrian
Copy link
Collaborator

This is based on #120 and is my set of suggestions/changes before a merge:

  1. Per @magland's request I updated the sp_load_draws script to use stanio. Per the rest of our discussion, I also made the relevant changes to have the numpy array of the draws be the 'primary' storage used.
  2. I took a pass at the worker, with the intended effect being to make it less stateful:
    a. There is no "mode" any more - since we use stanio in both the data and analysis, a lot of the loading code could be homogenized
    b. Instead, the run method accepts a settings object which currently contains two flags: showsPlots and producesData. This lets us treat the worker more like a job server that doesn't care as much about how it is called, and makes it easier to support future features (for example, someone definitely might want plots alongside their data)

@WardBrian
Copy link
Collaborator Author

Here's a script that shows off the better get function using the disease transmission example:

print(draws)
import numpy as np
import matplotlib.pyplot as plt

pred_cases = draws.get("pred_cases")

colors = ['red', 'blue', 'green', 'purple']

for i, chain in enumerate(pred_cases):
    for draw in chain:
        plt.plot(np.arange(0,14), draw, color=colors[i], alpha=0.03)

plt.show()

@WardBrian
Copy link
Collaborator Author

Another contribution: This PR now makes it so images are sent back to the user as they render, rather than all at the end of the script.

Test with this:

import numpy as np
import matplotlib.pyplot as plt
import time

print("before")

plt.plot(np.arange(0,10), np.random.randn(10))
plt.show()

print("after, sleeping")

time.sleep(10)
print("done")

Copy link
Collaborator

@jsoules jsoules left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me--I think you simplify several things. I also like the ability to display images on the fly.

gui/src/app/pyodide/pyodideWorker/pyodideWorker.ts Outdated Show resolved Hide resolved
@magland
Copy link
Collaborator

magland commented Jul 17, 2024

This looks great!
To make sure this works with data generation I incorporated your changes into #127 and made some minor adjustments, which we can discuss there if you are okay with that.

@WardBrian
Copy link
Collaborator Author

This gets a bit tricky with separating out/reviewing different PRs, but that seems fine. Am I correct that the primary changes are a few extra warnings and adding the loadsDraws setting, rather than relying on the definedness of spDraws? Both seem like good ideas to me.

A few other things I've found while playing around:

  1. It seems like pyodide does not actually store state between calls to pyodide.runCode, so there is no reason not to have the actual pyodide worker be a singleton. The benefit of doing this is that the main pyodide download/load only needs to happen once between the data and the analysis.
  2. Supporting termination of the running script is a useful feature, but would unfortunately require an entire re-initialize/re-download, due to how terminate works

@WardBrian
Copy link
Collaborator Author

Hi @magland - my latest commit on this branch adds a custom hook to track the pyodide worker. This is closer to something that would allow the global singleton (and does allow cancelling, though currently there is no UI to do so), but doesn't go all the way until the data version is also implemented.

At this point I hit a snag related to some of the complexity in the useStanSampler machinery, so I'm turning to a refactor of that now

@magland
Copy link
Collaborator

magland commented Jul 17, 2024

Hi @magland - my latest commit on this branch adds a custom hook to track the pyodide worker. This is closer to something that would allow the global singleton (and does allow cancelling, though currently there is no UI to do so), but doesn't go all the way until the data version is also implemented.

Sounds good. I'll see if I can incorporate that into the data branch.

Do you think we should merge #127 into this one, or keep them separate for now?

At this point I hit a snag related to some of the complexity in the useStanSampler machinery, so I'm turning to a refactor of that now

I also was having difficulty with StanSampler and was thinking of proposing a refactor. I wanted to explore caching of draws in the case of non-undefined seed. I think we should store the draws in the (ephemeral part of) project state and have the sampler use dispatch events to set the draws and other pieces of output data. Not sure if that needs to be done here though.

@WardBrian WardBrian merged commit 07749af into analysis-py Jul 22, 2024
@WardBrian WardBrian deleted the bmw/analysis-py branch July 23, 2024 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants