Releases: mad-lab-fau/tpcp

v0.17.0 - Parallel Fixes

[0.17.0] - 2023-03-24

Added

  • We now have a workaround for global configuration that should be passed to worker processes when using
    multiprocessing.
    This works around a joblib issue and is quite hacky.
    If you want to use this feature with your own configs, you can use tpcp.parallel.register_global_parallel_callback.
    If you need to write your own parallel loop using joblib, you need to use tpcp.parallel.delayed instead of
    joblib.delayed (see the sketch below).
    (#65)
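
A rough sketch of both pieces (the callback contract of returning a value/setter pair is an assumption based on the
description above; check the tpcp.parallel documentation for the authoritative API):

```python
from joblib import Parallel
from tpcp.parallel import delayed, register_global_parallel_callback

# Hypothetical global config that the worker processes need to see.
_CONFIG = {"debug": False}

def transfer_config():
    # Assumed contract: return the value to transfer and a setter that
    # restores it inside each worker process.
    def setter(value):
        _CONFIG.update(value)

    return _CONFIG, setter

register_global_parallel_callback(transfer_config)

def double(x):
    return x * 2

# Use tpcp.parallel.delayed instead of joblib.delayed, so the registered
# callbacks are transferred to and applied in the workers.
results = Parallel(n_jobs=2)(delayed(double)(i) for i in range(4))
```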

v0.16.0 - OptunaSearch `eval_str_paras` feature

[0.16.0] - 2023-03-21

Changed

  • We now raise an explicit ValidationError if any of the parameters of a class have a trailing underscore, as
    this syntax is reserved for result objects (see the example below).
    (#63)
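
For illustration, a minimal sketch (note that, per the notes for 0.11.0 below, parameters are only validated when
get_params is called):

```python
from tpcp import Algorithm

class BrokenAlgo(Algorithm):
    def __init__(self, cutoff_=0.5):  # trailing underscore is reserved for results
        self.cutoff_ = cutoff_

BrokenAlgo().get_params()  # raises a ValidationError
```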

Added

  • The Optuna search methods have a new parameter called eval_str_paras that allows you to automatically turn
    categorical string parameters into Python objects.
    This can be useful if you want to select between complex objects, and not just strings, in your parameter search.
    To use this in your subclasses, you need to wrap the use of trial.params with self.sanitize_params(trial.params)
    (see the sketch below).
    (#64)
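
A fragment-level sketch of the subclass side (the surrounding objective structure is hypothetical; only the
sanitize_params wrapping is taken from the note above):

```python
# Inside a hypothetical CustomOptunaOptimize subclass:
def create_objective(self):
    def objective(trial, pipeline, dataset):
        trial.suggest_categorical("algo__filter", ["None", "MyFilter()"])
        # Wrapping trial.params turns string values like "None" or
        # "MyFilter()" into actual Python objects when eval_str_paras=True.
        params = self.sanitize_params(trial.params)
        pipeline = pipeline.clone().set_params(**params)
        ...  # score the pipeline on the dataset and return the score

    return objective
```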

v0.15.0

[0.15.0] - 2023-02-07

Added

  • GridSearch and GridSearchCV now have the option to pick the parameters with the lowest score if desired.
    This is useful if your metric represents an error and you want to pick the parameters that minimize it.
    To do that, set the return_optimized parameter of these classes to the name of the metric prefixed with a -
    (e.g. return_optimized="-rmse"); see the sketch after this list.
    (#61)
  • A new optimization algorithm called OptunaSearch. This is a (nearly) drop-in replacement for GridSearch that uses
    Optuna under the hood.
    It can be used to quickly implement parameter searches with different samplers for non-optimizable algorithms.
    (#57)
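
A minimal sketch of the minimization behavior (the pipeline, dataset, and rmse helper are placeholders; only the
return_optimized="-rmse" usage is taken from the note above):

```python
from sklearn.model_selection import ParameterGrid
from tpcp.optimize import GridSearch

def scoring(pipeline, datapoint):
    # Hypothetical error metric: lower is better.
    return {"rmse": compute_rmse(pipeline, datapoint)}  # placeholder helper

gs = GridSearch(
    MyPipeline(),  # placeholder pipeline
    ParameterGrid({"algo__cutoff": [0.1, 0.5, 1.0]}),
    scoring=scoring,
    return_optimized="-rmse",  # leading "-" -> pick the lowest rmse
)
gs.optimize(my_dataset)  # placeholder dataset
```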

Changed

  • In this release, we added multiple safeguards against edge cases related to non-deterministic dataset indices.
    Most of these changes are internal and should not require any changes to your code.
    Still, they don't cover all edge cases. Make sure your index is deterministic ;)
    (#62)
    • The index of dataset objects is now cached.
      The first time create_index is called, the index is stored in subset_index and used for all subsequent calls.
      This avoids the overhead of creating the index on every call (in particular if the index creation requires IO).
      It should also help to avoid edge cases where create_index is called multiple times and returns different results.
    • When create_index of a dataset is called, we now actually call it twice to check that the index is deterministic.
      Having a non-deterministic index can lead to hard-to-debug issues, so we want to make sure this is not the case.
      The index could still change when using a different machine/OS (which is not ideal for reproducibility),
      but this should prevent most cases leading to strange issues.
    • Internally, the _optimize_and_score method now directly gets the subset of the dataset instead of the indices of
      the train and test set.
      This should again help to avoid issues where the index of the dataset changes between calculating the splits and
      actually retrieving the data.

v0.14.0

[0.14.0] - 2023-02-01

Added

  • Custom Aggregators can now use the RETURN_RAW_SCORES class variable to specify whether their raw input scores
    should be returned.
    (#58)

Fixed

  • GridSearch and GridSearchCV now correctly handle custom aggregators that return scores with new names.
    (#58)
  • When using the create_group_labels method on a dataset with multiple groupby columns, the method returned a list of
    tuples.
    This caused issues with GroupKFold, as it internally flattens the list of tuples.
    To avoid this, the method now returns a list of strings.
    The respective string is simply the string representation of the tuple that was returned before
    (see the sketch after this list).
    (#59)
  • The fix provided in 0.12.1 for hashing objects defined in the __main__ module was only partially working.
    When the object in question was nested in another object, the hashing would still fail.
    This is hopefully now fixed for good.
    (#60)
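
For illustration (the dataset and its index columns are placeholders), the string labels can be passed directly to
GroupKFold:

```python
from sklearn.model_selection import GroupKFold

# Hypothetical dataset with "participant" and "recording" index columns.
labels = my_dataset.create_group_labels(["participant", "recording"])
# Before this fix: [("p1", "r1"), ...] -- now: ["('p1', 'r1')", ...]
cv = GroupKFold(n_splits=5)
for train, test in cv.split(my_dataset, groups=labels):
    ...
```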

v0.13.0 - JOSS Paper

[0.13.0] - 2023-01-11

Changed

  • Some improvements to the documentation

Added

  • Added an option to the Optuna search to use multiprocessing, based on the suggestions made in
    optuna/optuna#2862.
    This has not been extensively tested in real projects.
    Therefore, use with care and please report any issues you encounter.

Deprecated

  • Fully deprecated the _skip_validation parameter for base classes, which was briefly used in some older versions.

v0.12.2

[0.12.2] - 2022-12-14

Fixed

  • The previous fix for hashing objects defined in the __main__ module was not working.
    This should now be fixed.

v0.12.1

[0.12.1] - 2022-12-14

Changed

  • The safe_run method unintentionally double-wrapped the run method if it already had a make_action_safe
    decorator. This is now fixed.

Fixed

  • Under certain conditions, hashing of an object defined in the __main__ module failed.
    This release implements a workaround for this issue that should hopefully resolve most cases.

v0.12.0 - Some minor quality of life improvements

[0.12.0] - 2022-11-15

Added

  • Added the concept of the self_optimize_with_info method, which can be implemented instead of or in addition to the
    self_optimize method.
    This method should be used when an optimize method needs to return/output additional information besides the main
    result. It is supported by the Optimize wrapper (see the sketch after this list).
    (#49)
  • Added a new method called __clone_param__ that gives a class control over how params are cloned.
    This can be helpful if, for some reason, objects don't behave well with deepcopy.
  • Added a new method called __repr_parameters__ that gives a class control over how params are represented.
    This can be used to customize the representation of individual parameters in the __repr__ method.
  • Added a proper repr for CloneFactory
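
A minimal sketch of the self_optimize_with_info pattern (the exact return contract is an assumption based on the
description above: the optimized instance plus an arbitrary info object):

```python
from tpcp import OptimizablePipeline

class MyPipeline(OptimizablePipeline):
    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def self_optimize_with_info(self, dataset):
        # ... tune self.threshold on the dataset here ...
        info = {"n_datapoints": len(dataset)}  # hypothetical side output
        # Assumed contract: return the optimized instance plus the extra info.
        return self, info
```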

v0.11.0

[0.11.0] - 2022-10-17

Added

  • Support for Optuna >3.0
  • Added an example on how to use attrs and dataclasses with tpcp
  • Added versions of Dataset and CustomOptunaOptimize that work with dataclasses and attrs.
  • Added first-class support for composite objects (e.g. objects that need a list of other objects as parameters).
    This is basically sklearn pipelines with fewer restrictions (#48).

Changed

  • CustomOptunaOptimize now expects a callable to define the study, instead of taking a study object itself.
    This ensures that the study objects can be independent when the class is used as part of cross_validate
    (see the sketch below).
  • Parameters are only validated when get_params is called. This reduces the reliance on __init_subclass__ and on
    correctly wrapping the init, which makes it easier to support attrs and dataclass.
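
A minimal sketch of the new pattern: instead of handing over a finished study object, you now hand over a factory
(how exactly the callable is passed to CustomOptunaOptimize is not shown in this note):

```python
import optuna

def create_study() -> optuna.Study:
    # Each call creates a fresh, independent study, so parallel
    # cross_validate folds no longer share a single study object.
    return optuna.create_study(direction="minimize")
```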

v0.10.0

[0.10.0] - 2022-09-09

Changed

  • Reworked once again when and how annotations for tpcp classes are processed.
    Processing is now delayed until you actually use the annotations (i.e. as part of the "safe wrappers").
    The only user-facing change is that the chance of running into edge cases is lower and that __field_annotations__ is
    now only available on class instances and not on the class itself anymore.