
docs: example writing multiple datasets to orso file #92

Merged (3 commits into main, Oct 22, 2024)

Conversation

jokasimr
Contributor

Fixes #87

This solution is not well integrated into the sciline pipeline.
I'd appreciate suggestions on how to do this better.

@jokasimr jokasimr marked this pull request as ready for review October 10, 2024 12:47
Comment on lines 280 to 285
for fname, quantities in files.items():
wf = workflow.copy()
wf[Filename[SampleRun]] = fname
for name, value in quantities.items():
wf[name] = value
reflectivity_curves.append(wf.compute(NormalizedIofQ))
Member
Can be done using something like:

reflectivity_curves = sciline.compute_mapped(wf.map(files), NormalizedIofQ)

provided that files is set up in a way compatible with map. I suggest using a pandas.DataFrame with column names as keys for Sciline.
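The suggested table-mapping pattern can be sketched with a plain-Python stand-in (all names here — compute_target, the parameter keys, the filenames — are invented for illustration; the real code would use sciline's wf.map(...) and sciline.compute_mapped):

```python
# Plain-Python stand-in for mapping a workflow over a parameter table.
# Each row of the table sets the parameters for one run; the base
# parameters play the role of the unmapped workflow.

def compute_target(params):
    # Stand-in for computing NormalizedIofQ from a parameterized workflow.
    return f"I(Q) from {params['filename']}"

base_params = {'sample_rotation': 0.5}  # hypothetical default parameters

# Parameter table: column names are the keys Sciline would map over.
table = [
    {'filename': 'run1.nxs', 'sample_rotation': 0.5},
    {'filename': 'run2.nxs', 'sample_rotation': 1.0},
]

reflectivity_curves = [compute_target({**base_params, **row}) for row in table]
```

The point of the rectangular table is that every row sets the same columns, which is what lets the mapping machinery treat the runs uniformly.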

Contributor Author (@jokasimr, Oct 11, 2024)

Sure, I can use that here. But it seems slightly different from the current implementation: there, the runs that are mapped over don't all have to set the same parameters. One run can set SampleRotation and another can set some other parameter. But if the parameters are defined in a table, then all "runs" (rows of that table) need to set the same parameters.

Member

Do they have different parameters?

Contributor Author

Yes, that's plausible. For example, in the reduction of data from the experiments we did at PSI a month ago, there were some special settings in the low-angle measurement; the high-angle measurement just used the default setting.

We could of course specify a value for that setting for both runs, spelling out the default value for the high-angle run, but that is more cumbersome and error-prone.
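The per-run override pattern described here can be sketched in plain Python (the parameter names are hypothetical): each run supplies only what differs, and dict merging fills in the defaults.

```python
# Hypothetical default parameters; each run overrides only what it needs.
defaults = {'sample_rotation': 0.5, 'chopper_phase': 0.0}

low_angle = {'chopper_phase': 7.5}   # special setting for the low-angle run
high_angle = {}                      # relies entirely on the defaults

# Merge: overrides win over defaults, missing keys fall back to defaults.
runs = [{**defaults, **overrides} for overrides in (low_angle, high_angle)]
```

This avoids repeating the default value for the high-angle run, which is exactly what a rectangular parameter table would force every row to do.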

Comment on lines 295 to 349
datasets = []
for (fname, quantities), curve, scale_factor in zip(
files.items(), reflectivity_curves, scale_factors, strict=True
):
wf = workflow.copy()
wf[Filename[SampleRun]] = fname
for name, value in quantities.items():
wf[name] = value
wf[NormalizedIofQ] = scale_factor * curve
dataset = wf.compute(orso.OrsoIofQDataset)
Member

datasets = sciline.compute_mapped(wf.map(dataframe), orso.OrsoIofQDataset)

If you set up dataframe with a NormalizedIofQ column (I am not 100% sure we can map non-source nodes. Edit: no, we cannot; I think we should add __delitem__, as also discussed in scipp/cyclebane#14).
Not sure about find_corrections, but I presume the corrections are the same for all files, i.e., you can just run that once on the base workflow?

Contributor Author

but I presume the corrections are the same for all files, i.e., you can just run that once on the base workflow?

They are probably the same, but I don't see the benefit of making that assumption here.

Member

You are literally using the same workflow, how could it be different?

Contributor Author (@jokasimr, Oct 11, 2024)

Which corrections are applied could depend on the parameters of the workflow. Right now it doesn't, but I don't think that is an implausible situation. Besides, there's no cost to rerunning it; it takes 2 ms.

Member

Seems quite implausible; I thought this was for handling measurements from the same "run" at multiple angles?

But if you are convinced that this is what you need, then I don't have any suggestions for improvement.

Contributor Author

Seems quite implausible; I thought this was for handling measurements from the same "run" at multiple angles?

That's the main use case, yes. It's wrong to say that I'm convinced this is what we need; I don't know, and therefore opted to keep it flexible, since the cost of doing so is negligible.

Member

@SimonHeybrock left a comment

Missing tests.

Comment on lines +155 to +245
[normalized_ioq, orso_dataset], params={Filename[SampleRun]: 'default'}
)
datasets = orso_datasets_from_measurements(
workflow,
[{}, {Filename[SampleRun]: 'special'}],
Member

You are not really testing what you claimed above was needed: different sets of parameters for different runs. Yes, you pass {} for the first, but since it relies on a default value, that is not saying much. For what you are testing here, you could have gone with the simpler version I suggested, simply by passing the filename for each. As it is now, what orso_datasets_from_measurements will do for the first run is a mystery to the reader.

Contributor Author

There seems to be a misunderstanding here. Maybe we should talk about this in person? If you have a simpler and better solution in mind, I'll be happy to use it.

You are not really testing what you claimed above was needed: Different sets of parameter for different runs.

We want to override different parameters in different runs. For example, in one run we might want to override the SampleRotation parameter, but in another run we want to use the default (meaning it is read from the file).
The intent of this test is to assert that:

  1. if no value was provided for a parameter (in this case Filename), the default value is used;
  2. if a value was provided, that value is used.
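These two assertions can be sketched with a stdlib stand-in for orso_datasets_from_measurements (the real function takes a sciline workflow; this stand-in only models the parameter-override behavior being tested, and all names are hypothetical):

```python
# Stand-in: merge each run's overrides into the default parameters.
def orso_datasets_from_measurements(default_params, runs):
    return [{**default_params, **run} for run in runs]

datasets = orso_datasets_from_measurements(
    {'filename': 'default'},
    [{}, {'filename': 'special'}],  # first run: default used; second: override
)
```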

scale_to_overlap=False,
)
assert len(datasets) == 2
assert tuple(d.info.name for d in datasets) == ('default.orso', 'special.orso')
Member

Above you said orso_datasets_from_measurements should be able to perform/record possibly distinct corrections for each run. Here you are testing the same correction, but it retrieves a different value (the filename). Is that the same thing?

Contributor Author

This doesn't test any aspect of the "correction finder" mechanism, so I'm not sure what you mean.
It only tests whether the provided parameter value was used, or the default parameter value.

@jokasimr jokasimr force-pushed the orso-multi-dataset branch 2 times, most recently from a8cb24d to 1860988 on October 22, 2024 08:07
@jokasimr jokasimr merged commit a6f14eb into main Oct 22, 2024
4 checks passed
@jokasimr jokasimr deleted the orso-multi-dataset branch October 22, 2024 09:44
Successfully merging this pull request may close these issues.

Functionality to write ORSO dataset from combination of several runs