Releases · mad-lab-fau/tpcp
v0.17.0 - Parallel Fixes
[0.17.0] - 2023-03-24
Added
- We now have a workaround for passing global configuration to the worker processes used during multiprocessing.
This works around a joblib issue and is quite hacky.
If you want to use this feature with your own configs, you can use `tpcp.parallel.register_global_parallel_callback`.
If you need to write your own parallel loop using joblib, you need to use `tpcp.parallel.delayed` instead of `joblib.delayed`. (#65)
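A minimal sketch of the last point, writing a custom parallel loop with `tpcp.parallel.delayed` (the `score_datapoint` function is just a placeholder, not part of tpcp):

```python
from joblib import Parallel

from tpcp.parallel import delayed  # drop-in replacement for joblib.delayed


def score_datapoint(value):
    # Placeholder for your actual per-datapoint work.
    return value * 2


# Using tpcp.parallel.delayed (instead of joblib.delayed) makes sure that any
# configuration registered via tpcp.parallel.register_global_parallel_callback
# is re-applied in each worker process before the function runs.
results = Parallel(n_jobs=2)(delayed(score_datapoint)(v) for v in range(10))
```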
v0.16.0 - OptunaSearch `eval_str_paras` feature
[0.16.0] - 2023-03-21
Changed
- We now raise an explicit `ValidationError` if any of the parameters of a class has a trailing underscore, as this syntax is reserved for result objects. (#63)
Added
- The Optuna search methods have a new parameter called `eval_str_paras` that allows you to automatically turn categorical string parameters into Python objects.
This can be useful if you want to select between complex objects, and not just strings, in your parameter search.
To use this in your subclasses, you need to wrap the use of `trial.params` with `self.sanitize_params(trial.params)`. (#64)
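A rough sketch of the subclass pattern described above. Only `self.sanitize_params(trial.params)` and `eval_str_paras` are taken from the release note; the class and method names are hypothetical placeholders, so check the documentation of the Optuna search classes for the real hooks:

```python
import optuna


class MyOptunaSearch:  # placeholder for your actual Optuna search subclass
    def _params_from_trial(self, trial: optuna.Trial) -> dict:  # hypothetical hook
        # Instead of using trial.params directly, wrap it in sanitize_params.
        # For the parameters listed in eval_str_paras, this turns the suggested
        # strings (e.g. "SVC()") into actual Python objects.
        return self.sanitize_params(trial.params)
```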
v0.15.0
[0.15.0] - 2023-02-07
Added
- `GridSearch` and `GridSearchCV` now have the option to pick the parameters with the lowest score, if desired.
This is useful if your metric represents an error and you want to pick the parameters that minimize it.
To do that, set the `return_optimized` parameter of these classes to the name of the metric prefixed with a `-` (e.g. `return_optimized="-rmse"`).
See the sketch after this list. (#61)
- A new optimization algorithm called `OptunaSearch`. This is a (nearly) drop-in replacement for `GridSearch` that uses Optuna under the hood.
It can be used to quickly implement parameter searches with different samplers for non-optimizable algorithms. (#57)
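A small sketch of the `return_optimized` option. The pipeline, scorer, and dataset are placeholders, and it is assumed that the scorer returns a score named `rmse`:

```python
from sklearn.model_selection import ParameterGrid

from tpcp.optimize import GridSearch

# MyPipeline, my_scorer, and my_dataset are placeholders for your own code.
gs = GridSearch(
    MyPipeline(),
    ParameterGrid({"threshold": [0.1, 0.2, 0.3]}),
    scoring=my_scorer,
    # The leading "-" selects the parameter combination with the *lowest* rmse.
    return_optimized="-rmse",
)
gs.optimize(my_dataset)
```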
Changed
- In this release, we added multiple safeguards against edge cases related to non-deterministic dataset indices.
Most of these changes are internal and should not require any changes to your code.
Still, they don't solve all edge cases. Make sure your index is deterministic ;)
(#62)
- The index of dataset objects is now cached.
The first time `create_index` is called, the index is stored in `subset_index` and used for subsequent calls.
This should avoid the overhead of creating the index every time (in particular if the index creation requires IO).
It should also help to avoid edge cases where `create_index` is called multiple times and returns different results.
- When `create_index` of a dataset is called, we now actually call it twice to check whether the index is deterministic.
Having a non-deterministic index can lead to hard-to-debug issues, so we want to make sure that this is not the case.
The index could still change when using a different machine/OS (which is not ideal for reproducibility), but this should prevent most cases leading to strange issues.
- Internally, the `_optimize_and_score` method now directly gets the subset of the dataset, instead of the indices of the train and test set.
This should again help to avoid issues where the index of the dataset changes between calculating the splits and actually retrieving the data.
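If you implement your own dataset, making `create_index` deterministic is often just a matter of sorting whatever you collect from disk. A minimal sketch (the data path and the index column are placeholders):

```python
from pathlib import Path

import pandas as pd

from tpcp import Dataset


class MyDataset(Dataset):
    def create_index(self) -> pd.DataFrame:
        # sorted() keeps the index deterministic, even though the glob order is
        # not guaranteed to be stable across runs or operating systems.
        files = sorted(Path("/my_data").glob("*.csv"))
        return pd.DataFrame({"participant": [f.stem for f in files]})
```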
v0.14.0
[0.14.0] - 2023-02-01
Added
- Custom aggregators can now use the `RETURN_RAW_SCORES` class variable to specify whether their raw input scores should be returned. (#58)
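A rough sketch of how the class variable might be used in a custom aggregator. The import path and the `aggregate` signature below are assumptions and not taken from the release note, so check the tpcp documentation for the exact interface:

```python
import numpy as np

from tpcp.validate import Aggregator  # import path is an assumption


class MedianAggregator(Aggregator):
    # Do not include the raw per-datapoint scores in the final results.
    RETURN_RAW_SCORES = False

    @classmethod
    def aggregate(cls, values, **kwargs):  # signature is an assumption
        return float(np.median(values))
```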
Fixed
- `GridSearch` and `GridSearchCV` now correctly handle custom aggregators that return scores with new names. (#58)
- When using the `create_group_labels` method on a dataset with multiple groupby columns, the method returned a list of tuples.
This caused issues with `GroupKFold`, which internally flattens the list of tuples.
To avoid this, the method now returns a list of strings.
The respective string is simply the string representation of the tuple that was returned before.
See the sketch after this list. (#59)
- The fix provided in 0.12.1 for hashing of objects defined in the `__main__` module was only partially working.
When the object in question was nested in another object, the hashing would still fail.
This is hopefully now fixed for good. (#60)
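A short sketch of how the changed `create_group_labels` output interacts with `GroupKFold` (`my_dataset` and the column names are placeholders):

```python
from sklearn.model_selection import GroupKFold

# my_dataset is a placeholder for your own tpcp Dataset instance; "participant"
# and "recording" are placeholder groupby columns of its index.
group_labels = my_dataset.create_group_labels(["participant", "recording"])

# Each label is now a single string (the string representation of the former
# tuple), so GroupKFold no longer flattens it into separate values.
cv = GroupKFold(n_splits=5)
splits = list(cv.split(my_dataset, groups=group_labels))
```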
v0.13.0 - JOSS Paper
[0.13.0] - 2023-01-11
Changed
- Some improvements to the documentation
Added
- Added an option to the Optuna search to use multiprocessing, based on the suggestions made in optuna/optuna#2862.
This has not been extensively tested in real projects.
Therefore, use it with care and please report any issues you encounter.
Deprecated
- Fully deprecated the `_skip_validation` parameter for base classes, which was briefly used in some old versions.
v0.12.2
v0.12.1
Changed
- The `safe_run` method unintentionally double-wrapped the run method if it already had a `make_action_safe` decorator. This is now fixed.
Fixed
- Under certain conditions, hashing of an object defined in the `__main__` module failed.
This release implements a workaround for this issue that should hopefully resolve most cases.
v0.12.0 - Some minor quality of life improvements
Added
- Added the concept of the `self_optimize_with_info` method, which can be implemented instead of, or in addition to, the `self_optimize` method.
This method should be used when an optimization method needs to return/output additional information besides the main result.
It is supported by the `Optimize` wrapper. See the sketch after this list. (#49)
- Added a new method called `__clone_param__` that gives a class control over how params are cloned.
This can be helpful if, for some reason, objects don't behave well with deepcopy.
- Added a new method called `__repr_parameters__` that gives a class control over how params are represented.
This can be used to customize the representation of individual parameters in the `__repr__` method.
- Added a proper repr for `CloneFactory`.
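A rough sketch of the `self_optimize_with_info` idea. The pipeline body is a placeholder, and the attribute under which `Optimize` exposes the additional information is not shown here, so check the documentation for the details:

```python
from tpcp import Dataset, OptimizablePipeline
from tpcp.optimize import Optimize


class MyPipeline(OptimizablePipeline):
    def self_optimize_with_info(self, dataset: Dataset, **kwargs):
        # ... perform the actual optimization on `dataset` here (placeholder) ...
        info = {"n_datapoints": len(dataset)}  # any additional output you want to expose
        # Return the optimized instance together with the additional information.
        return self, info


# Optimize supports pipelines that implement self_optimize_with_info instead of
# (or in addition to) self_optimize.
optimizer = Optimize(MyPipeline())
```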
v0.11.0
[0.11.0] - 2022-10-17
Added
- Support for Optuna >3.0
- Example on how to use `attrs` and `dataclass` with tpcp.
- Added versions of `Dataset` and `CustomOptunaOptimize` that work with dataclasses and attrs.
- Added first-class support for composite objects (e.g. objects that need a list of other objects as parameters).
This is basically sklearn pipelines with fewer restrictions (#48).
Changed
- `CustomOptunaOptimize` now expects a callable to define the study, instead of taking a study object itself.
This ensures that the study objects can be independent when the class is called as part of `cross_validate`.
- Parameters are only validated when `get_params` is called.
This reduces the reliance on `__init_subclass__` and on correctly wrapping the init, and makes it easier to support `attrs` and `dataclass`.
v0.10.0
[0.10.0] - 2022-09-09
Changed
- Reworked once again when and how annotations for tpcp classes are processed.
Processing is now delayed until you actually use the annotations (i.e. as part of the "safe wrappers").
The only user-facing change is that the chance of running into edge cases is lower and that `__field_annotations__` is now only available on class instances and not on the class itself anymore.