Python analysis window #120

magland · 2024-07-03T14:44:35Z

Merged code from #94

Created "Analysis (Py)" tab.

Next step: provide sampling draws as input to script

Add Pyodide dependency and update AnalysisPyWindow

magland · 2024-07-03T15:32:46Z

You can now run the following in analysis.py:

import pandas as pd

df = pd.DataFrame(sp_sampling.draws, columns=sp_sampling.parameter_names)

print(df)

magland · 2024-07-03T15:35:58Z

Here's that example as a gist project:

http://localhost:3000/?project=https://gist.github.com/magland/8d987a04c9db7bee8636960a69ae7c7f

magland · 2024-07-03T15:40:42Z

There are a number of things to think about. What should the global variable (now sp_sampling) be called? Right now sp_sampling.draws has all the chains concatenated, and you also get sp_sampling.num_chains. It reflects the internal structure of the draws in the data model. Should that be adjusted? What's most intuitive for the user creating analysis.py?

WardBrian · 2024-07-03T15:42:32Z

Right now the UI lets you click over to analysis.py/analysis.R when you have not yet run sampling. Moving those tabs down to the area we only show after sampling has completed seems like a good idea to prevent this

WardBrian · 2024-07-03T15:51:09Z

> Should that be adjusted? What's most intuitive for the user creating analysis.py?

The most intuitive will depend pretty strongly on the user and their use case, but I think there are 3 main 'types' of uses:

- The user cares about the Markov chains as Markov chains, so a num_samples x num_chains x num_parameters (or the transpose of this) numpy array will be the most useful for them.
- The user is using an existing tool which relies on pandas DataFrames or ArviZ InferenceData, so whatever we provide should be easy to convert to these.
- The user cares about one specific parameter in the model, in which case a way to extract that one item as a numpy array of the correct shape is incredibly useful. This is what cmdstanpy calls stan_variable and is implemented in stanio. Note that the reason this is tricky is that when I say "parameter", I am including things like a parameter declared to be a matrix, not just one column of output.
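As a rough illustration of the first and third cases, here is a minimal numpy sketch. It assumes the draws arrive as one 2-D array with the chains concatenated row-wise (all of chain 1, then all of chain 2, and so on) and that every chain has the same length. That layout, the `split_chains` helper, and the toy data are assumptions for illustration, not the stan-playground API.

```python
import numpy as np


def split_chains(draws, num_chains):
    """Reshape (num_chains * num_draws, num_params) to (num_chains, num_draws, num_params).

    Assumes rows are grouped chain by chain and all chains have equal length.
    """
    draws = np.asarray(draws)
    total, num_params = draws.shape
    if total % num_chains != 0:
        raise ValueError("row count is not a multiple of num_chains")
    return draws.reshape(num_chains, total // num_chains, num_params)


# Toy data: 2 chains, 3 draws each, 2 parameters
parameter_names = ["lp__", "beta"]
flat = np.arange(12, dtype=float).reshape(6, 2)

by_chain = split_chains(flat, num_chains=2)
print(by_chain.shape)

# Draws of a single parameter, one row per chain
beta = by_chain[:, :, parameter_names.index("beta")]
print(beta)
```

If the draws-first orientation is preferred, `by_chain.transpose(1, 0, 2)` gives num_draws x num_chains x num_parameters without copying the data.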

Refactor code structure for better readability

magland · 2024-07-03T16:22:53Z

> Right now the UI lets you click over to analysis.py/analysis.R when you have not yet run sampling. Moving those tabs down to the area we only show after sampling has completed seems like a good idea to prevent this

Okay I have reorganized the tabs. I do think it's important to be able to view and even edit the analysis scripts even prior to the completion of sampling. So those tabs are present. But if you try to execute the scripts before sampling is complete, you get a (helpful?) error message.

jsoules

A few questions and suggested areas to tighten.

gui/src/app/Project/ProjectQueryLoading.ts

gui/src/app/pages/HomePage/HomePage.tsx

gui/src/app/pyodide/AnalysisPyFileEditor.tsx

jsoules · 2024-07-03T15:46:53Z

gui/src/app/pyodide/AnalysisPyFileEditor.tsx

+    if (consoleOutputDiv) {
+      consoleOutputDiv.innerHTML = "";
+    }
+    if (imageOutputDiv) {
+      imageOutputDiv.innerHTML = "";
+    }


If these divs don't exist at this point, isn't that a blocking problem? Like we should prevent the run if there's nowhere for it to report to?

magland · 2024-07-03

I could imagine a case (if this component is being reused somewhere else) where we might only be interested in console output (if the script is not expected to generate images) or only image output (if we don't need to show the console).

jsoules · 2024-07-03T15:49:28Z

gui/src/app/pyodide/AnalysisPyFileEditor.tsx

+  const toolbarItems: ToolbarItem[] = useMemo(() => {
+    const ret: ToolbarItem[] = [];
+    const runnable = fileContent === editedFileContent && imageOutputDiv;
+    if (runnable) {
+      ret.push({
+        type: "button",
+        tooltip: "Run script",
+        label: "Run",
+        icon: <PlayArrow />,
+        onClick: handleRun,
+        color: "black",
+      });
+    }
+    if (!imageOutputDiv) {
+      ret.push({
+        type: "text",
+        label: "No output window",
+        color: "red",
+      });
+    }
+    let label: string;
+    let color: string;
+    if (status === "loading") {
+      label = "Loading pyodide...";
+      color = "blue";
+    } else if (status === "running") {
+      label = "Running...";
+      color = "blue";
+    } else if (status === "completed") {
+      label = "Completed";
+      color = "green";
+    } else if (status === "failed") {
+      label = "Failed";
+      color = "red";
+    } else {
+      label = "";
+      color = "black";
+    }
+
+    if (label) {
+      ret.push({
+        type: "text",
+        label,
+        color,
+      });
+    }
+    return ret;
+  }, [fileContent, editedFileContent, handleRun, status, imageOutputDiv]);


I believe this is also a good candidate for refactoring out and sharing among the different [data | analysis].[py | r] editors.

magland · 2024-07-03

I agree, but I think we should hold off until things solidify more, since we don't yet know exactly what will be in common (or different) between the 4 cases.

jsoules · 2024-07-03T16:02:49Z

gui/src/app/pyodide/pyodideWorker/pyodideWorkerTypes.ts

+export type PydodideWorkerStatus =
+  | "idle"
+  | "loading"
+  | "running"
+  | "completed"
+  | "failed";
+
+export const isPydodideWorkerStatus = (x: any): x is PydodideWorkerStatus => {
+  return ["idle", "loading", "running", "completed", "failed"].includes(x);
+};


It has its own drawbacks, but defining the status as an enum allows you to pull the keys/values of the enum as a list, which might make the includes check here more intuitive. (That said, we probably aren't expanding the number of statuses any time soon, so maybe it's not worth worrying about?)

magland · 2024-07-03

I might be wrong about this, but the simplicity of string types makes serialization much easier... and in this case we're passing JSON messages.

gui/src/app/pages/HomePage/AnalysisPyWindow/AnalysisPyWindow.tsx

magland · 2024-07-04T01:57:31Z

Thinking about the interface for analysis.py to get the draws data.

We want something that can be made backward compatible so that as we make improvements to the interface, existing SP Projects will continue to function, at least to a reasonable degree.

The only way I can think of for doing that is to provide python imports that implement specific interfaces, and the user can select which they want to use.

So here's an example analysis.py:

import matplotlib.pyplot as plt

# Import a specific interface
from sp_util import load_draws_v1

# Internally loads the draws from a file
# The actual implementation can change over time, but the API should be steady
draws = load_draws_v1()

# Get a list of dataframes, one for each chain
a = draws.get_dataframes()
print(a[0])

# Plot a histogram of `lp__`
plt.figure()
plt.hist(a[0]['lp__'])
plt.show()

Here, by virtue of this specific import the user gets a particular interface, with the expectation that it will be maintained in a backward-compatible manner.

I prefer this import method compared with injecting a mysterious variable into the mix, because python isn't supposed to work that way. I think analysis.py should be a well-defined python script that does not assume any magic variables, but assumes certain modules (e.g., sp_util) exist.

In the implementation from my latest commit, the sp_util module does not come from PyPI. Instead it is hard-coded in pyodide, so it's internal to the stan-playground web app. The actual sampling data (draws, parameter names, etc.) are communicated behind the scenes by writing to a file that is available to the sp_util module. The load_draws_v1 function first loads the data from that file and then prepares an object that has the get_dataframes() method. It also has a get_dataframe_longform() method.
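To make the file handoff concrete, here is a minimal sketch of how such a module could look. Everything in it is an assumption made for illustration: the JSON file format, the `DrawsV1` class, and the `path` argument (the `load_draws_v1()` described above takes no arguments). It is not the code from the commit.

```python
import json
import os
import tempfile

import pandas as pd


class DrawsV1:
    """Hypothetical stand-in for the object returned by load_draws_v1()."""

    def __init__(self, draws, parameter_names, num_chains):
        self._draws = draws  # list of rows, chains concatenated
        self._parameter_names = parameter_names
        self._num_chains = num_chains

    def get_dataframes(self):
        """Return one DataFrame per chain (assumes equal-length chains)."""
        per_chain = len(self._draws) // self._num_chains
        return [
            pd.DataFrame(
                self._draws[i * per_chain:(i + 1) * per_chain],
                columns=self._parameter_names,
            )
            for i in range(self._num_chains)
        ]


def load_draws_v1(path):
    """Read the file written by the host and wrap it (file format is assumed)."""
    with open(path) as f:
        data = json.load(f)
    return DrawsV1(data["draws"], data["parameter_names"], data["num_chains"])


# Simulate the host side writing the sampling output before the script runs
payload = {
    "draws": [[-1.0, 0.1], [-1.2, 0.2], [-0.9, 0.3], [-1.1, 0.4]],
    "parameter_names": ["lp__", "beta"],
    "num_chains": 2,
}
path = os.path.join(tempfile.mkdtemp(), "sp_draws.json")
with open(path, "w") as f:
    json.dump(payload, f)

dfs = load_draws_v1(path).get_dataframes()
print(len(dfs))
print(dfs[1])
```

The point of the sketch is that the user script only ever sees the returned object, so the file format behind it can change without touching analysis.py.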

WardBrian · 2024-07-08T14:51:34Z

> from sp_util import load_draws_v1

I think doing semantic-versioning-by-symbol-name is pretty un-Pythonic as well. The code we provide to load the draws is the one thing we completely control, so it's the thing I'm least worried about backwards compatibility for, to be honest -- numpy or pandas updating and breaking something seems much more likely than us doing it. Especially if what we provide is an object, we can always just add more methods on it if we want newer functionality, and leave the old ones.

If we ever really needed to do something backwards-incompatible, I think the better way to handle that is something like a version parameter in the meta storage/query parameters.

> I prefer this import method compared with injecting a mysterious variable into the mix, because python isn't supposed to work that way.

I think Python has enough magic in the language that this would be fine. I think the bigger argument for providing a global variable is that it makes it easier to implement a nice 'download this for use locally with cmdstanpy' feature. We wouldn't need to provide a module to be imported or anything, just some simple-ish glue code which leaves behind the variable name we choose.
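One way to picture the global-variable approach is a host that runs the user's script in a namespace where `draws` is already defined. The sketch below uses plain `exec` for this. The `FakeDraws` class and the `run_analysis` helper are hypothetical and are not the code stan-playground uses.

```python
import contextlib
import io


class FakeDraws:
    """Hypothetical stand-in for the injected draws object."""

    parameter_names = ["lp__", "beta"]

    def get(self, name):
        values = {"lp__": [-1.0, -1.2], "beta": [0.1, 0.2]}
        return values[name]


def run_analysis(script, draws):
    """Execute `script` with `draws` predefined as a global; return what it printed."""
    namespace = {"__name__": "__main__", "draws": draws}
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(script, namespace)
    return buffer.getvalue()


# The user script never imports or constructs `draws`; it just uses it
user_script = """
for name in draws.parameter_names:
    print(name, draws.get(name))
"""

output = run_analysis(user_script, FakeDraws())
print(output)
```

Under this design, a downloaded project would only need a short preamble that builds an equivalent `draws` object locally, after which the same script body could run unchanged.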

magland · 2024-07-08T16:51:19Z

@WardBrian Are you suggesting the script would instead be the following, where "draws" is the special variable?

import matplotlib.pyplot as plt

# Get a list of dataframes, one for each chain
a = draws.get_dataframes()
print(a[0])

# Plot a histogram of `lp__`
plt.figure()
plt.hist(a[0]['lp__'])
plt.show()

WardBrian · 2024-07-08T17:58:23Z

Something like this, yes

magland · 2024-07-10T16:06:22Z

@WardBrian @jsoules

Implemented a draws object for analysis.py based on what we had discussed. Here's a gist to try:

http://localhost:3000?project=https://gist.github.com/magland/d811a59035037dd0e19e73900048a8fb

Here is the analysis.py for this example, which illustrates the functionality:

import numpy as np
import matplotlib.pyplot as plt

# Histograms
for pname in ['gamma', 'beta', 'phi_inv', 'lp__']:
    plt.figure()
    plt.hist(draws.get(pname))
    plt.title(pname)
    plt.show()

print('PARAMETERS')
print('==========')
for p in draws.parameter_names:
    print(p)

print('')
print('PARAMETER VALUES')
print('================')
print('gamma:', draws.get('gamma'))
print('beta:', draws.get('beta'))
print('phi_inv:', draws.get('phi_inv'))

print('')
print('OTHER')
print('=====')
print('Mean of lp__:', np.mean(draws.get('lp__')))
print('Shape of y:', draws.get('y').shape)

print('')
print('DATAFRAME')
print('=========')

df = draws.as_dataframe()
print(df)

Note that y is a matrix parameter, but it gets flattened to a vector. We will want to use stanio instead, as noted in the comments in the source code. @WardBrian do you want to work on this? Here's the relevant script that gets loaded into the worker and then into the pyodide environment:

https://github.com/flatironinstitute/stan-playground/pull/120/files#diff-356c7427b0952fcfd57ddd11e46493a6624870c92e611c335ee52f142272654d
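To illustrate the flattening issue, the sketch below regroups the scalar columns of a matrix parameter into one array per draw. It assumes 1-based, dot-separated element names such as `y.2.1`, and it places each column by its parsed index rather than relying on column order. The `unflatten` helper is hypothetical and is not stanio's API; stanio remains the intended tool for this.

```python
import numpy as np


def unflatten(name, column_names, draws):
    """Collect the columns "name.i.j..." into an array of shape (num_draws, *dims).

    Assumes 1-based, dot-separated element names; scalar parameters are not handled.
    """
    prefix = name + "."
    cells = {}
    for col_idx, col in enumerate(column_names):
        if col.startswith(prefix):
            index = tuple(int(part) - 1 for part in col[len(prefix):].split("."))
            cells[index] = col_idx
    if not cells:
        raise KeyError(f"no columns found for {name!r}")
    # Infer each dimension from the largest index seen in that position
    ndim = len(next(iter(cells)))
    dims = tuple(max(index[d] for index in cells) + 1 for d in range(ndim))
    out = np.empty((draws.shape[0],) + dims)
    for index, col_idx in cells.items():
        out[(slice(None),) + index] = draws[:, col_idx]
    return out


# Toy data: lp__ plus a 2x2 matrix y, two draws
column_names = ["lp__", "y.1.1", "y.2.1", "y.1.2", "y.2.2"]
draws = np.array([
    [-1.0, 11.0, 21.0, 12.0, 22.0],
    [-2.0, 110.0, 210.0, 120.0, 220.0],
])

y = unflatten("y", column_names, draws)
print(y.shape)
print(y[0])
```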

magland · 2024-07-10T16:21:56Z

Assuming the structure of this looks good... In terms of merging order, it would help a lot to merge a basic version of this to minimize conflicts with other features we are working on (e.g., data generation). And then reopen a new PR with the needed tweaks to this one. What do you think?

WardBrian

A few things stood out to me in the current version -

gui/src/app/SamplerOutputView/SamplerOutputView.tsx

gui/src/app/pages/HomePage/AnalysisPyWindow/AnalysisPyWindow.tsx

gui/public/sp_load_draws.py

gui/src/app/pyodide/pyodideWorker/pyodideWorker.ts

Add data generation feature with Python and R support

WardBrian · 2024-07-23T14:58:06Z

@jsoules @magland this PR is now updated with respect to main. I've done a basic once-over to make sure everything still works as expected, but it can now be given a more thorough review!

magland · 2024-07-23T15:13:51Z

@WardBrian Nice! I have tested it and everything seems to be working.

As I mentioned, I think we should do a separate PR for analysis.r, and we can borrow code from #94. LMK if you want me to work on that -- I can draft an initial version, but I will need help figuring out how to make the R draws object analogous to the Python one.

WardBrian · 2024-07-23T15:20:54Z

I agree that should be a separate PR -- maybe after this is merged?

I'm currently trying to debug an issue with (I think) the TextEditor component - if I flip back and forth between the data.r and data.py tabs, the other text editors on the screen (e.g. main.stan) flicker/re-render -- maybe due to the monaco singleton getting touched? Any ideas @magland?

It's not introduced by this PR I'm pretty sure, just this makes it noticeable

Edit: This has been fixed

jsoules

I think this is overall in good shape. I have a few minor notes but would defer anything more substantial to a later refactor.

We should be able to get this merged in short order I think.

gui/src/app/pages/HomePage/AnalysisPyWindow/AnalysisPyWindow.tsx

gui/src/app/SamplerOutputView/TracePlotsView.tsx

gui/src/app/FileEditor/TextEditor.tsx

gui/src/app/pages/HomePage/DataGenerationWindow/DataGenerationWindow.tsx

gui/src/app/pyodide/pyodideWorker/pyodideWorkerTypes.ts

gui/src/app/pages/HomePage/DataGenerationWindow/DataRFileEditor.tsx

gui/src/app/pages/HomePage/DataGenerationWindow/getDataGenerationToolbarItems.tsx

gui/src/app/pages/HomePage/DataGenerationWindow/DataPyFileEditor.tsx

jsoules · 2024-07-23T18:34:59Z

Oh, one thing I didn't mention because it isn't strictly germane to this PR: it would be good for us to document somewhere the expectations around the global data/memory sharing model that's used to pass data among the workers, the non-TS, non-Stan interpreters, and our own app. Like, what does the data structure look like, and what's the strategy?

magland · 2024-07-23T19:09:30Z

@WardBrian it looks like we had some simultaneous commits, doing some of the same things. Hopefully everything shakes out properly.

WardBrian · 2024-07-23T19:12:15Z

Yep, I resolved any conflicts

jsoules · 2024-07-23T19:17:47Z

Ok, looks to me like we're good to merge this--everybody good with that?

magland added 2 commits July 3, 2024 10:41

Add pyodide worker and update UI to include analysis Python window

3342360

analysis.py sp_sampling variable (and formatting)

afb87e3

Add Pyodide dependency and update AnalysisPyWindow

magland added 3 commits July 3, 2024 12:02

Merge branch 'main' into analysis-py

9d2b45d

reorganize tabs for analysis scripts

e1883a7

linter fixes

84d512a

Refactor code structure for better readability

jsoules reviewed Jul 3, 2024

magland added 4 commits July 3, 2024 13:17

Refactor console output styling logic

fdbf2b2

Update query parameter key to analysis_py

de99180

For analysis.py, implement sp_util and load_draws_v1()

097b68d

Formatting

0f6b5ba

magland closed this Jul 8, 2024

magland reopened this Jul 8, 2024

magland added 2 commits July 10, 2024 11:08

Merge branch 'main' into analysis-py

48cc94a

draws object for analysis.py

f63b814

Add method as_numpy() to DrawsObject class

0cacf96

Add data generation feature with Python and R support

7a27719

magland mentioned this pull request Jul 10, 2024

Add data generation feature with Python and R support #127

Merged

WardBrian requested changes Jul 10, 2024

WardBrian added 4 commits July 22, 2024 12:40

Merge pull request #127 from flatironinstitute/data.py.2

a56fd25

Add data generation feature with Python and R support

Merge branch 'main' into analysis-py

e0d7062

Remove remaining absolute positioning

dd8763c

UseRef cleanup

8226e1f

CSS cleanup

7ba6993

TextEditor cleanup, fix flicker from unrelated editors

339c9fc

WardBrian requested a review from jsoules July 23, 2024 16:11

jsoules approved these changes Jul 23, 2024

magland and others added 7 commits July 23, 2024 14:54

Update import paths to use aliases

afe9dea

Update imports to use aliases

a45aeaa

Update to path aliases

90d6205

Replace custom accordion in traceplots with MUI

f07a5f4

Move monaco language registration

5419280

promote baseObjectCheck

3839cdb

Merge branch 'analysis-py' of https://github.com/flatironinstitute/st…

8a22ebc

…an-playground into analysis-py

Moving code and other nits

1426e7e

WardBrian approved these changes Jul 23, 2024

This was linked to issues Jul 23, 2024

Integrate data.py and data.r into the UI #104

Closed

Integrate analysis.py/analysis.R into UI #119

Closed

jsoules merged commit 0f7c84d into main Jul 23, 2024
2 checks passed

jsoules deleted the analysis-py branch July 23, 2024 19:21

This was referenced Jul 23, 2024

Integrate webr and Pyodide for providing data and initial values #57

Closed

Integrate webr and Pyodide to allow analysis of the samples #58

Closed
