Python analysis window #120
Add Pyodide dependency and update AnalysisPyWindow
You can now run the following in analysis.py:

import pandas as pd
df = pd.DataFrame(sp_sampling.draws, columns=sp_sampling.parameter_names)
print(df)
Here's that example as a gist project: http://localhost:3000/?project=https://gist.github.com/magland/8d987a04c9db7bee8636960a69ae7c7f
There are a number of things to think about. What should the global variable (now sp_sampling) be called? Right now sp_sampling.draws has all the chains concatenated, and you also get sp_sampling.num_chains. This mirrors the internal structure of the draws in the data model. Should that be adjusted? What's most intuitive for the user creating analysis.py?
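To make the question about concatenated chains concrete, here is a minimal sketch of how a per-chain view could be recovered from the concatenated array plus `num_chains`. The helper name `split_chains` and the stand-in data are hypothetical, and the sketch assumes (this thread does not confirm it) that chains are stacked one after another and all have the same length.

```python
import numpy as np


def split_chains(draws, num_chains):
    """Reshape row-concatenated draws to (chain, draw, parameter).

    Assumption (not confirmed in this thread): chains are stacked one
    after another and every chain has the same number of draws.
    """
    draws = np.asarray(draws)
    total, num_params = draws.shape
    if total % num_chains != 0:
        raise ValueError("total draws not divisible by num_chains")
    return draws.reshape(num_chains, total // num_chains, num_params)


# Stand-in for sp_sampling.draws: 2 chains x 3 draws, 2 parameters
fake_draws = np.arange(12, dtype=float).reshape(6, 2)
by_chain = split_chains(fake_draws, num_chains=2)
print(by_chain.shape)
print(by_chain[1][:, 0])  # first parameter, second chain
```

If the user-facing object exposed something like this directly, analysis.py would not need to know how the chains are laid out internally.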
Right now the UI lets you click over to analysis.py/analysis.R before you have run sampling. Moving those tabs down to the area we only show after sampling has completed seems like a good way to prevent this.
The most intuitive will depend pretty strongly on the user and their use case, but I think there are 3 main 'types' of uses:
Refactor code structure for better readability
Okay, I have reorganized the tabs. I do think it's important to be able to view and even edit the analysis scripts before sampling completes, so those tabs are still present. But if you try to execute the scripts before sampling is complete, you get a (helpful?) error message.
A few questions and suggested areas to tighten.
if (consoleOutputDiv) {
  consoleOutputDiv.innerHTML = "";
}
if (imageOutputDiv) {
  imageOutputDiv.innerHTML = "";
}
If these divs don't exist at this point, isn't that a blocking problem? Like we should prevent the run if there's nowhere for it to report to?
I could imagine a case (if this component is being reused somewhere else) where we might only be interested in console output (if the script is not expected to generate images) or only image output (if we don't need to show the console).
const toolbarItems: ToolbarItem[] = useMemo(() => {
  const ret: ToolbarItem[] = [];
  const runnable = fileContent === editedFileContent && imageOutputDiv;
  if (runnable) {
    ret.push({
      type: "button",
      tooltip: "Run script",
      label: "Run",
      icon: <PlayArrow />,
      onClick: handleRun,
      color: "black",
    });
  }
  if (!imageOutputDiv) {
    ret.push({
      type: "text",
      label: "No output window",
      color: "red",
    });
  }
  let label: string;
  let color: string;
  if (status === "loading") {
    label = "Loading pyodide...";
    color = "blue";
  } else if (status === "running") {
    label = "Running...";
    color = "blue";
  } else if (status === "completed") {
    label = "Completed";
    color = "green";
  } else if (status === "failed") {
    label = "Failed";
    color = "red";
  } else {
    label = "";
    color = "black";
  }

  if (label) {
    ret.push({
      type: "text",
      label,
      color,
    });
  }
  return ret;
}, [fileContent, editedFileContent, handleRun, status, imageOutputDiv]);
I believe this is also a good candidate for refactoring out and sharing among the different [data | analysis].[py | r] editors.
I agree, but I think we should hold off until things solidify more, since we don't yet know exactly what will be in common (or different) between the 4 cases.
export type PydodideWorkerStatus =
  | "idle"
  | "loading"
  | "running"
  | "completed"
  | "failed";

export const isPydodideWorkerStatus = (x: any): x is PydodideWorkerStatus => {
  return ["idle", "loading", "running", "completed", "failed"].includes(x);
};
It has its own drawbacks, but defining the status as an enum allows you to pull the keys/values of the enum as a list, which might make the includes check here more intuitive. (That said, we probably aren't expanding the number of statuses any time soon, so maybe not worth worrying about?)
I might be wrong about this, but the simplicity of string types makes serialization much easier... and in this case we're passing JSON messages.
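For what it's worth, the two positions are not mutually exclusive. The sketch below is a Python analogue, not the TypeScript in this PR: it keeps the statuses as plain strings (so they survive a JSON round trip unchanged) while deriving the list of allowed values from the type itself rather than repeating it in the check. The names `WorkerStatus`, `is_worker_status` and the message shape are hypothetical.

```python
import json
from typing import Literal, get_args

# Plain strings, so values pass through JSON unchanged.
WorkerStatus = Literal["idle", "loading", "running", "completed", "failed"]

# The allowed values are derived from the type, not repeated by hand.
WORKER_STATUSES = get_args(WorkerStatus)


def is_worker_status(x: object) -> bool:
    """Runtime check mirroring the type guard in the diff above."""
    return isinstance(x, str) and x in WORKER_STATUSES


# Hypothetical message shape, round-tripped through JSON.
message = json.loads(json.dumps({"type": "setStatus", "status": "running"}))
print(is_worker_status(message["status"]))
print(is_worker_status("paused"))
```

The same idea should carry over to TypeScript by declaring the values once as a constant array and deriving the union type from it, but that is left as a suggestion rather than a tested change.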
Thinking about the interface for analysis.py to get the draws data. We want something that can be made backward compatible, so that as we make improvements to the interface, existing SP Projects will continue to function, at least to a reasonable degree. The only way I can think of for doing that is to provide Python imports that implement specific interfaces, and the user can select which they want to use. So here's an example analysis.py:

import matplotlib.pyplot as plt

# Import a specific interface
from sp_util import load_draws_v1

# Internally loads the draws from a file
# The actual implementation can change over time, but the API should be steady
draws = load_draws_v1()

# Get a list of dataframes, one for each chain
a = draws.get_dataframes()
print(a[0])

# Plot a histogram of `lp__`
plt.figure()
plt.hist(a[0]['lp__'])
plt.show()

Here, by virtue of this specific import, the user gets a particular interface, with the expectation that it will be maintained in a backward-compatible manner. I prefer this import method to injecting a mysterious variable into the mix, because Python isn't supposed to work that way. I think analysis.py should be a well-defined Python script that does not assume any magic variables, but assumes certain modules (e.g., sp_util) exist.

In the implementation from my latest commit, the sp_util module does not come from PyPI... but rather it is hard-coded in pyodide... so it's internal to the stan-playground web app. The actual sampling data (draws, parameter names, etc.) are communicated behind the scenes by writing to a file that is available to the sp_util module. The load_draws_v1 function first loads the data from that file and then prepares an object that has the get_dataframes() method. It also has a
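A rough sketch of what such a module could look like, to make the proposal easier to discuss. Only `load_draws_v1` and `get_dataframes()` come from the comment above; the file path, the JSON payload layout, and the `DrawsV1` class name are assumptions made for illustration, not the implementation in the commit.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical hand-off file; the thread does not specify its path or format.
DRAWS_PATH = os.path.join(tempfile.gettempdir(), "sp_draws_example.json")


class DrawsV1:
    """Sketch of the object returned by load_draws_v1()."""

    def __init__(self, draws, parameter_names, num_chains):
        self._draws = draws  # rows of all chains, concatenated
        self._names = parameter_names
        self._num_chains = num_chains

    def get_dataframes(self):
        """Return one DataFrame per chain (assumes equal-length chains)."""
        per_chain = len(self._draws) // self._num_chains
        return [
            pd.DataFrame(
                self._draws[i * per_chain:(i + 1) * per_chain],
                columns=self._names,
            )
            for i in range(self._num_chains)
        ]


def load_draws_v1():
    """Read the hand-off file and wrap it in the v1 interface."""
    with open(DRAWS_PATH) as f:
        payload = json.load(f)
    return DrawsV1(
        payload["draws"], payload["parameter_names"], payload["num_chains"]
    )


# The host app would write this file before running analysis.py.
with open(DRAWS_PATH, "w") as f:
    json.dump(
        {
            "draws": [[-1.0, 0.1], [-1.2, 0.2], [-0.9, 0.3], [-1.1, 0.4]],
            "parameter_names": ["lp__", "beta"],
            "num_chains": 2,
        },
        f,
    )

a = load_draws_v1().get_dataframes()
print(a[0])
```

The point of the versioned name is that a later `load_draws_v2` could change the payload or the returned object without touching scripts that import `load_draws_v1`.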
I think doing semantic-versioning-by-symbol-name is pretty un-Pythonic as well. The code we provide to load the draws is the one thing we completely control, so it's the thing I'm least worried about backwards compatibility for, to be honest -- numpy or pandas updating and breaking something seems much more likely than us doing it. Especially if what we provide is an object, we can always just add more methods on it if we want newer functionality, and leave the old ones. If we ever really needed to do something backwards-incompatible, I think the better way to handle that is something like a
I think Python has enough magic in the language that this would be fine. I think the bigger argument for providing a global variable is that it makes it easy to implement a nice 'download this for use locally with cmdstanpy' feature -- we wouldn't need to provide a module to be imported or anything, just some simple-ish glue code which leaves behind the variable name we choose.
@WardBrian Are you suggesting the script would instead be the following, where "draws" is the special variable?

import matplotlib.pyplot as plt

# Get a list of dataframes, one for each chain
a = draws.get_dataframes()
print(a[0])

# Plot a histogram of `lp__`
plt.figure()
plt.hist(a[0]['lp__'])
plt.show()
Something like this, yes.
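For reference, the "special variable" approach needs very little machinery. The sketch below shows one way the glue code could work: the script is executed in a namespace where `draws` is already defined. The `Draws` stand-in class and `run_analysis` helper are hypothetical; the `get` accessor mirrors the one used later in this thread, and the sample values are made up.

```python
class Draws:
    """Minimal stand-in for the injected object (hypothetical)."""

    def __init__(self, columns):
        self._columns = columns  # dict: parameter name -> list of values
        self.parameter_names = list(columns)

    def get(self, name):
        return self._columns[name]


def run_analysis(script_source, draws):
    """Execute an analysis script with `draws` predefined as a global."""
    namespace = {"__name__": "__main__", "draws": draws}
    exec(compile(script_source, "analysis.py", "exec"), namespace)
    return namespace


# A script that uses `draws` without importing or defining it.
ANALYSIS_PY = """
lp = draws.get('lp__')
mean_lp = sum(lp) / len(lp)
print('Mean of lp__:', mean_lp)
"""

# Locally, these values would come from the user's own sampler output.
result = run_analysis(ANALYSIS_PY, Draws({"lp__": [-1.0, -2.0, -3.0]}))
```

For a local download, the same effect could be had by prepending a few lines to analysis.py that build `draws` from the user's own fit, which is the glue code mentioned above.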
Implemented the draws object for analysis.py based on what we had discussed. Here's a gist to try: http://localhost:3000?project=https://gist.github.com/magland/d811a59035037dd0e19e73900048a8fb

The analysis.py for this example, which illustrates the functionality:

import numpy as np
import matplotlib.pyplot as plt

# Histograms
for pname in ['gamma', 'beta', 'phi_inv', 'lp__']:
    plt.figure()
    plt.hist(draws.get(pname))
    plt.title(pname)
    plt.show()

print('PARAMETERS')
print('==========')
for p in draws.parameter_names:
    print(p)
print('')

print('PARAMETER VALUES')
print('================')
print('gamma:', draws.get('gamma'))
print('beta:', draws.get('beta'))
print('phi_inv:', draws.get('phi_inv'))
print('')

print('OTHER')
print('=====')
print('Mean of lp__:', np.mean(draws.get('lp__')))
print('Shape of y:', draws.get('y').shape)
print('')

print('DATAFRAME')
print('=========')
df = draws.as_dataframe()
print(df)

Note that y is a matrix parameter, but it gets flattened to a vector. We will want to use stanio instead, as noted in the comments in the source code. @WardBrian do you want to work on this? Here's the relevant script that gets loaded into the worker and then into the pyodide environment
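On the flattening of y: until stanio is wired in, the shape could in principle be recovered from the column names. The following is only a sketch under stated assumptions, not what stanio or this PR does: it assumes the columns follow the Stan CSV naming convention (`y.1.1`, `y.2.1`, ...) in column-major order, and the helper name `get_parameter` and the sample values are hypothetical.

```python
import numpy as np


def get_parameter(draws, column_names, name):
    """Collect a parameter's columns and restore its shape.

    Assumptions: columns use Stan CSV-style names (`y.1.1`, `y.2.1`, ...)
    in column-major order, and `draws` has shape (num_draws, num_columns).
    """
    cols = [
        i for i, c in enumerate(column_names)
        if c == name or c.startswith(name + ".")
    ]
    if not cols:
        raise KeyError(name)
    flat = np.asarray(draws)[:, cols]
    last = column_names[cols[-1]]
    if last == name:  # scalar parameter: one column, no indices
        return flat[:, 0]
    # The last column carries the largest index in every dimension.
    dims = [int(d) for d in last[len(name) + 1:].split(".")]
    # order="F" keeps the draw axis first and unpacks column-major.
    return flat.reshape((flat.shape[0], *dims), order="F")


column_names = ["lp__", "y.1.1", "y.2.1", "y.1.2", "y.2.2"]
fake_draws = [
    [-1.0, 11.0, 21.0, 12.0, 22.0],
    [-2.0, 110.0, 210.0, 120.0, 220.0],
]
y = get_parameter(fake_draws, column_names, "y")
print(y.shape)  # (num_draws, rows, cols)
print(y[0])
```

With something like this, `draws.get('y').shape` would report the matrix dimensions per draw instead of a flat vector length.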
Assuming the structure of this looks good... In terms of merging order, it would help a lot to merge a basic version of this now, to minimize conflicts with other features we are working on (e.g., data generation), and then open a new PR with the needed tweaks to this one. What do you think?
A few things stood out to me in the current version -
Add data generation feature with Python and R support
@WardBrian Nice! I have tested it and everything seems to be working. As I mentioned, I think we should do a separate PR for analysis.r, and we can borrow code from #94. LMK if you want me to work on that -- I can draft an initial version, but I will need help figuring out how to make the R draws object analogous to the Python one.
I agree that should be a separate PR -- maybe after this is merged? I'm currently trying to debug an issue with (I think) the TextEditor component - if I flip back and forth between the data.r and data.py tabs, the other text editors on the screen (e.g. It's not introduced by this PR I'm pretty sure, just this makes it noticeable.
Edit: This has been fixed
I think this is overall in good shape. I have a few minor notes but would defer anything more substantial to a later refactor.
We should be able to get this merged in short order I think.
Oh, one thing that would be nice, which I didn't mention because it isn't strictly germane to this PR: it would be good for us to document somewhere the expectations around the global data/memory sharing model that's used to pass data around among the workers/non-TS, non-Stan interpreters and our own app. Like, what does the data structure look like, and what's the strategy?
@WardBrian it looks like we had some simultaneous commits, doing some of the same things. Hopefully everything shakes out properly.
Yep, I resolved any conflicts.
Ok, looks to me like we're good to merge this -- everybody good with that?
Merged code from #94
Created "Analysis (Py)" tab.
Next step: provide sampling draws as input to script