Add LogitScaler transformer #929

Closed
frances-h opened this issue Jan 13, 2025 · 0 comments · Fixed by #933
frances-h commented Jan 13, 2025

Problem Description

Currently, the ScalarRange, ScalarInequality, Positive, and Negative constraints all scale the data using log-based transformers. Since the SDV already enforces min/max values by simpler means, these constraints can be deprecated. However, it would be helpful to move the scaling logic into the RDT library, so that it may be used independently of the constraints in the future.

Expected behavior

Create a new RDT transformer called LogitScaler. It transforms the data by applying a logit function, and its functionality is equivalent to that of the ScalarRange constraint.

Parameters

  • missing_value_replacement (object): Same as FloatFormatter
  • missing_value_generation (str or None): Same as FloatFormatter
  • min_value (float): The minimum value of the input range for the logit function. Defaults to 0.0.
  • max_value (float): The maximum value of the input range for the logit function. Defaults to 1.0.
  • learn_rounding_scheme (bool): Same as FloatFormatter
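
To make the parameter list concrete, here is a rough sketch of it as a plain Python dataclass. `LogitScalerParams` is a hypothetical name used only for illustration and is not part of RDT. The defaults for the three FloatFormatter-style parameters are assumptions, since the issue only says "same as FloatFormatter". The `min_value < max_value` check is also an assumption about what sensible validation might look like.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class LogitScalerParams:
    """Hypothetical container mirroring the proposed parameters (not RDT code)."""

    # Assumed defaults: the issue only states "same as FloatFormatter".
    missing_value_replacement: Union[str, float, None] = "mean"
    missing_value_generation: Optional[str] = "random"
    # Defaults taken from the parameter list in the issue.
    min_value: float = 0.0
    max_value: float = 1.0
    # Assumed default, see note above.
    learn_rounding_scheme: bool = False

    def __post_init__(self):
        # Assumed validation: the logit range must be non-empty.
        if self.min_value >= self.max_value:
            raise ValueError(
                f"min_value ({self.min_value}) must be smaller than "
                f"max_value ({self.max_value})."
            )


params = LogitScalerParams(min_value=0.0, max_value=100.0)
```

Constructing the container with `min_value=100.0, max_value=0.0` would raise a `ValueError` in this sketch.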

Implementation Notes

  • Apply logic similar to that used in ScalarRange
  • During fit:
    • Learn needed information about missing values and rounding scheme.
    • Validate that we can actually apply the logit function to the data.
  • During transform:
    • Fill in missing values
    • Transform the data
      • Use the logit function to transform the data from range [min_value, max_value] into range (-inf, +inf)
    • If there are any issues with taking the logit, raise a descriptive error explaining what went wrong:
      • Example: Error: Unable to apply the logit function to column 'output-value' due to an out-of-range value (101.0).
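
The fit and transform steps above could be sketched roughly as follows, in plain Python so the example stays self-contained. This is not the RDT implementation: `LogitScalerSketch`, its method signatures, the `margin` parameter, and the choice to fill missing values (`None`) with the mean are all assumptions made for illustration. The margin shrinks the scaled data slightly so that values equal to `min_value` or `max_value` map to finite numbers rather than to -inf or +inf. A reverse step is included only to show that the mapping can be inverted. A real transformer would presumably operate on pandas or NumPy data.

```python
import math


class LogitScalerSketch:
    """Illustrative stand-in for the proposed transformer (not RDT's code)."""

    def __init__(self, min_value=0.0, max_value=1.0, margin=0.025):
        if not min_value < max_value:
            raise ValueError("min_value must be smaller than max_value.")
        self.min_value = min_value
        self.max_value = max_value
        # Hypothetical parameter: maps [0, 1] into [margin, 1 - margin]
        # so the endpoints do not produce infinite logits.
        self.margin = margin
        self._fill_value = None

    def _validate(self, column_name, values):
        # Raise a descriptive error for the first out-of-range value.
        for value in values:
            if not self.min_value <= value <= self.max_value:
                raise ValueError(
                    "Unable to apply the logit function to column "
                    f"'{column_name}' due to an out-of-range value ({value})."
                )

    def fit(self, column_name, values):
        present = [v for v in values if v is not None]
        if not present:
            raise ValueError(f"Column '{column_name}' has no non-missing values.")
        self._validate(column_name, present)
        # Assumption: missing values are later replaced with the mean.
        self._fill_value = sum(present) / len(present)
        return self

    def transform(self, column_name, values):
        # Step 1: fill in missing values.
        filled = [self._fill_value if v is None else v for v in values]
        self._validate(column_name, filled)
        # Step 2: scale to [margin, 1 - margin], then apply the logit.
        out = []
        for v in filled:
            unit = (v - self.min_value) / (self.max_value - self.min_value)
            p = unit * (1 - 2 * self.margin) + self.margin
            out.append(math.log(p / (1 - p)))
        return out

    def reverse_transform(self, values):
        # Invert the logit with a sigmoid, then undo the scaling.
        out = []
        for z in values:
            p = 1 / (1 + math.exp(-z))
            unit = (p - self.margin) / (1 - 2 * self.margin)
            out.append(unit * (self.max_value - self.min_value) + self.min_value)
        return out


column = [0.0, 50.0, None, 100.0]
scaler = LogitScalerSketch(min_value=0.0, max_value=100.0).fit("output-value", column)
transformed = scaler.transform("output-value", column)
recovered = scaler.reverse_transform(transformed)
```

In this sketch, calling `scaler.transform("output-value", [101.0])` raises a `ValueError` whose message follows the example error format given above.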

Intended usage: A user would call update_transformers to replace the FloatFormatter with the LogitScaler for any columns that would benefit from this.

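from rdt.transformers import LogitScaler  # assumed import path once the transformer exists

# `synthesizer` is an existing SDV synthesizer; `data` is the training DataFrame.
synthesizer.auto_assign_transformers(data)
synthesizer.update_transformers({
    'output-value': LogitScaler(min_value=0.0, max_value=100.0)
})
synthesizer.fit(data)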
@frances-h frances-h added the feature request Request for a new feature label Jan 13, 2025
@npatki npatki changed the title Add LogitScalar transformer Add LogitScaler transformer Jan 13, 2025
@frances-h frances-h changed the title Add LogitScaler transformer Add LogitScalar transformer Jan 13, 2025
@frances-h frances-h changed the title Add LogitScalar transformer Add LogitScaler transformer Jan 13, 2025
@frances-h frances-h added this to the 1.13.3 milestone Jan 22, 2025
@frances-h frances-h self-assigned this Jan 22, 2025