Document somewhere that order of parameters matters for reproducibility #61

Closed
Saethox opened this issue Sep 18, 2023 · 3 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@Saethox
Contributor

Saethox commented Sep 18, 2023

I lost several hours trying to find out why irace wasn't being deterministic despite setting a seed, and it turns out that specifying the parameter space in a different order results in different values. Not a good combination with dictionaries that have a non-stable hash function... :/

Correct me if I'm wrong, but I don't see this mentioned anywhere in the documentation.

This is most certainly a niche problem, but given that I find this behavior not obvious, documenting it somewhere seems like a good idea.

@MLopez-Ibanez MLopez-Ibanez added enhancement New feature or request help wanted Extra attention is needed labels Sep 19, 2023
@MLopez-Ibanez
Owner

Yes, the order of the parameters affects the order in which they are sent to the target-runner and it also affects the order in which the parameters are sampled. This is important for some applications where the target-runner expects the parameters to have a particular order and the user can specify this in the parameters table. Also, it seems unavoidable that the order of the parameters at the same level of the dependency hierarchy has an effect on the results. The question is what order should be used and the order provided by the user is as good as (perhaps better than) any other order that I can think of.
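The effect of parameter order on sampling can be illustrated with a toy example. This is a minimal sketch, not irace's actual sampling code: `sample_configuration` and the parameter names `alpha` and `beta` are hypothetical, and it only assumes that values are drawn one parameter at a time from a single seeded random stream.

```python
import random


def sample_configuration(param_ranges, seed):
    """Draw one value per parameter, in the iteration order of param_ranges.

    Hypothetical helper for illustration only; it is not part of irace.
    """
    rng = random.Random(seed)
    # Each parameter consumes the next draw from the same random stream,
    # so the position of a parameter decides which draw it receives.
    return {name: rng.uniform(low, high) for name, (low, high) in param_ranges.items()}


# Same parameter space, declared in two different orders.
order_a = {"alpha": (0.0, 1.0), "beta": (0.0, 1.0)}
order_b = {"beta": (0.0, 1.0), "alpha": (0.0, 1.0)}

config_a = sample_configuration(order_a, seed=42)
config_b = sample_configuration(order_b, seed=42)
same_order_again = sample_configuration(order_a, seed=42)

print(config_a == same_order_again)  # same seed and same order: identical values
print(config_a == config_b)          # same seed, different order: values differ
```

With the same seed, the first declared parameter always receives the first draw, so swapping the declaration order swaps which parameter gets which value.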

In R, named lists have a stable order. I believe iracepy uses an OrderedDict for parameters, so the order should also be stable, no?

Nevertheless, more than happy to document this behaviour. Where do you think this should be documented? We have the user-guide (vignette) and the documentation within R. I am happy to merge a pull request.

@MLopez-Ibanez
Owner

Just to note that I do not believe this is surprising behaviour in an optimization procedure. The order (either positional or according to name) of decision variables will affect a single run of almost any optimization procedure, even some deterministic ones.

On the other hand, one may hope that it doesn't have an effect on expectation over many runs. Otherwise, it may be worth figuring out why this is the case and fixing it (or randomizing the order for each run, which will not fix the problem with single runs as the random order will depend on the initial order, but it will fix it over many runs). This is the case for mathematical programming solvers: https://pubsonline.informs.org/doi/10.1287/opre.2013.1231

@Saethox
Contributor Author

Saethox commented Sep 19, 2023

> In R, named lists have a stable order. I believe iracepy uses an OrderedDict for parameters, so the order should also be stable, no?

That's correct. Unfortunately, the default RandomState of Rust's HashMap is initialized with random keys, which results in different hash values in each execution, and a different iteration order. Easy to fix, but not so easy to identify as the issue.
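The issue described here arose in Rust, but the same pitfall has a rough Python analogue: a `set` of strings iterates in hash order, and string hashes are randomized per process unless `PYTHONHASHSEED` is fixed. The sketch below shows one possible remedy, assuming the parameters start out in an unordered collection; `canonical_parameter_order` and the parameter names are hypothetical, and sorting by name is only one choice of fixed order.

```python
def canonical_parameter_order(param_names):
    """Return parameter names in a fixed order, independent of iteration order.

    Hypothetical helper for illustration only. Sorting by name is an
    assumption; any order works as long as it is the same on every run.
    """
    return sorted(param_names)


# A set of strings has no guaranteed iteration order across processes.
names = {"gamma", "alpha", "beta"}
ordered = canonical_parameter_order(names)

print(ordered)
```

If the target-runner expects parameters in a particular order, an insertion-ordered container filled in that order is the better fit, since sorting by name would discard it.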

> Where do you think this should be documented?

Maybe a sentence on reproducibility in the documentation of the irace function, and in the user guide FAQ. Something along the lines of this:

> Are experiments with irace reproducible?
>
> An irace run with the same seed, scenario and parameters will yield identical results. Note that the order of parameters and instances matters.
