Problem Description
Currently, the ScalarRange, ScalarInequality, Positive, and Negative constraints all scale the data using log-based transformers. Since the SDV already enforces min/max values by simpler means, these constraints can be deprecated. However, it would be helpful to move the scaling logic into the RDT library, so that it may be used independently of the constraints in the future.
Expected behavior
Create a new RDT transformer called LogScaler. This transformer applies a log to the data. Its functionality is equivalent to the ScalarInequality constraint (as well as Positive and Negative).
Parameters
missing_value_replacement (object): Same as FloatFormatter
missing_value_generation (str or None): Same as FloatFormatter
constant (float): The constant to set as the 0-value for the log-based transform. Defaults to 0 (do not modify the 0-value of the data).
invert (bool): Whether to invert the data with respect to the constant value. If False, do not invert the data (all values will be greater than the constant value). If True, invert the data (all the values will be less than the constant value). Defaults to False.
learn_rounding_scheme (bool): Same as FloatFormatter
Implementation Notes
Apply logic similar to that used in ScalarInequality.
During fit:
Learn needed information about missing values and rounding scheme.
Validate that we can actually take the log of the data.
During transform:
Fill in missing values
Transform the data
If invert is False (default): transformed_data = log(data - constant)
If invert is True: transformed_data = log(constant - data)
If there are any issues with taking the log, raise a descriptive error explaining what went wrong:
Example: Error: Unable to apply a log transform to column 'capital-gains' due to a non-positive value (-1).
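The transform steps above can be sketched in plain Python. This is only an illustration of the proposed behavior, not the RDT implementation: the function name `log_scale`, its signature, and the choice to fill missing values with the mean of the observed values are all assumptions made for the example.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def log_scale(values, column_name, constant=0.0, invert=False,
              missing_value_replacement=None):
    """Hypothetical helper mirroring the proposed transform steps.

    If missing_value_replacement is None, missing entries are filled with
    the mean of the observed values (an assumption for this sketch).
    """
    observed = [v for v in values if not _is_missing(v)]
    if missing_value_replacement is None:
        if not observed:
            raise ValueError(f"Column '{column_name}' has no observed values.")
        fill = sum(observed) / len(observed)
    else:
        fill = missing_value_replacement

    transformed = []
    for value in values:
        # Step 1: fill in missing values.
        if _is_missing(value):
            value = fill
        # Step 2: shift relative to the constant, flipping when invert=True.
        shifted = (constant - value) if invert else (value - constant)
        # Step 3: validate that the log is defined before taking it.
        if shifted <= 0:
            raise ValueError(
                f"Unable to apply a log transform to column '{column_name}' "
                f"due to a non-positive value ({value})."
            )
        transformed.append(math.log(shifted))
    return transformed
```

With `invert=False` every value must be strictly greater than `constant`; with `invert=True` every value must be strictly less than it. Reporting the original value in the message, rather than the shifted one, is one possible reading of the example error above.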
Intended usage: A user would call update_transformers to replace the FloatFormatter with the LogScaler for any columns that would benefit from this.