Releases: sdv-dev/SDMetrics
v0.11.0 - 2023-08-10
This release adds a function that allows users to plot the cardinality of foreign and primary keys in synthetic data. More specifically, it graphs how often each number of children per parent row occurs in the parent table.
Additionally, architectural changes are made to improve the efficiency and error handling of the QualityReport! The progress bar is also enhanced to be more informative when the report is generating.
This release also adds support for Python 3.11 and drops support for Python 3.7.
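To make "the frequency that each number of children per parent row occurs" concrete, here is a minimal sketch in plain Python. It is not the SDMetrics implementation; the function name `cardinality_distribution` and the toy key lists are hypothetical and only illustrate the quantity that such a plot would display.

```python
from collections import Counter


def cardinality_distribution(parent_keys, child_foreign_keys):
    """Map each 'number of children' to how many parent rows have that many."""
    # Count how many child rows reference each parent key.
    children_per_parent = Counter(child_foreign_keys)
    # Parents without any children still count, with a cardinality of 0.
    counts = [children_per_parent.get(key, 0) for key in parent_keys]
    # Frequency of each cardinality value, sorted for stable display.
    return dict(sorted(Counter(counts).items()))


# Hypothetical toy data: four parent rows and six child rows.
parent_ids = [1, 2, 3, 4]
child_parent_ids = [1, 1, 2, 2, 4, 4]

distribution = cardinality_distribution(parent_ids, child_parent_ids)
print(distribution)
```

Computing this distribution once for the real data and once for the synthetic data gives the two series that a cardinality plot would compare.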
New Features
- Visualize cardinality of foreign key columns - Issue #283 by @R-Palazzo
- Create single table BaseProperty class - Issue #354 by @amontanez24
- Create single table column shapes property - Issue #355 by @R-Palazzo
- Create single table column pair trends property - Issue #356 by @R-Palazzo
- Create multi table BaseProperty class - Issue #357 by @pvk-developer
- Create multi table column shapes and column pair trends properties - Issue #358 by @R-Palazzo
- Create Parent Child Relationships property class - Issue #359 by @pvk-developer
- In Multi Table Quality Report: Rename "Table Relationships" property to "Cardinality" - Issue #360 by @frances-h
- More accurate progress bar for single table Quality Report - Issue #361 by @R-Palazzo
- More accurate progress bar for multi table Quality Report - Issue #362 by @fealho
- Raise error in CorrelationSimilarity if either column is constant - Issue #407 by @fealho
Bug Fixes
- Issue in building the denormalized table inside the Parent-Child Detection metrics - Issue #328 by @fealho
- Don't modify the rounding in the quality report - Issue #401 by @R-Palazzo
- The Cardinality property is missing some relationships - Issue #404 by @pvk-developer
- The Cardinality property is not returning a DataFrame - Issue #405 by @fealho
- Overall property score should be the average across all breakdowns - Issue #415 by @amontanez24
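As an illustration of the last fix above, a property score computed as the average across all breakdowns can be sketched as follows. This is an assumption-laden sketch, not SDMetrics code: the function name `property_score` and the mapping of column names to scores are hypothetical.

```python
from statistics import mean


def property_score(breakdown_scores):
    """Return the plain average of every per-breakdown score."""
    # Each breakdown (here: one entry per column) contributes equally.
    return mean(breakdown_scores.values())


# Hypothetical per-column scores for a single property.
column_scores = {"age": 0.5, "income": 0.75, "city": 1.0}
overall = property_score(column_scores)
print(overall)
```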
Internal
- Use property classes in single table QualityReport - Issue #370 by @R-Palazzo
- Use property classes in multi table QualityReport - Issue #371 by @fealho
- Add add-on detection for premium metrics - Issue #388 by @amontanez24
Maintenance
- Add support for Python 3.11 - Issue #353 by @amontanez24
- Drop support for Python 3.7 - Issue #380 by @amontanez24
v0.10.1 - 2023-06-06
This release fixes a bug that was causing the DiagnosticReport to crash on the NewRowSynthesis metric. It also adds support for PyTorch 2.0!
Bug Fixes
- ValueError: multi-line expressions (NewRowSynthesis metric in DiagnosticReport) - Issue #327 by @R-Palazzo
Maintenance
v0.10.0 - 2023-05-03
This release makes the DiagnosticReport more fault tolerant by preventing it from crashing if a metric it uses fails. It also adds support for Pandas 2.0!
Additionally, support for the old SDV metadata format (pre SDV 1.0) has been dropped.
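The fault tolerance described above follows a common pattern: run each metric in isolation and record a failure instead of letting it abort the whole report. The sketch below shows that general pattern only. It is not the DiagnosticReport code, and `run_metrics`, `mean_gap`, and `always_fails` are hypothetical names introduced for illustration.

```python
def run_metrics(metrics, real_data, synthetic_data):
    """Run every metric and keep going when one of them raises."""
    results = {}
    for name, compute in metrics.items():
        try:
            score = compute(real_data, synthetic_data)
            results[name] = {"score": score, "error": None}
        except Exception as error:  # broad on purpose: one bad metric must not stop the rest
            message = f"{type(error).__name__}: {error}"
            results[name] = {"score": None, "error": message}
    return results


def mean_gap(real, synthetic):
    """Hypothetical metric: absolute difference between the two means."""
    return abs(sum(real) / len(real) - sum(synthetic) / len(synthetic))


def always_fails(real, synthetic):
    """Hypothetical metric that simulates a crash."""
    raise ValueError("unsupported input")


results = run_metrics(
    {"mean_gap": mean_gap, "always_fails": always_fails},
    [1, 2, 3],
    [2, 3, 4],
)
print(results)
```

Storing the error message next to a missing score lets a report show which metric failed and why, while still returning the scores that were computed.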
New Features
- Cleanup SDMetrics to only accept SDV 1.0 metadata format - Issue #331 by @amontanez24
- Make the diagnostic report more fault-tolerant - Issue #332 by @frances-h
Maintenance
- Remove upper bound for pandas - Issue #338 by @pvk-developer
v0.9.3 - 2023-04-12
This release improves the clarity of warning/error messages. We also add a version add-on, update the workflow to optimize the runtime and fix a bug in the NewRowSynthesis metric when computing the synthetic_sample_size for multi-table.
New Features
- Add functionality to find version add-on - Issue #321 by @frances-h
- More detailed warning in QualityReport when there is a constant input - Issue #316 by @pvk-developer
- Make error more informative in QualityReport when tables cannot be merged - Issue #317 by @frances-h
- More detailed warning in QualityReport for unexpected category values - Issue #315 by @frances-h
Bug Fixes
- Multi table DiagnosticReport sets synthetic_sample_size too low for NewRowSynthesis - Issue #320 by @pvk-developer
v0.9.2 - 2023-03-08
This release fixes bugs in the NewRowSynthesis metric when too many columns were present. It also fixes bugs around datetime columns that are formatted as strings in both get_column_pair_plot and get_column_plot.
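For context on the datetime fixes, string-formatted datetime columns generally need to be parsed into datetime objects before they can be placed on a time axis. The sketch below shows that idea with the standard library only. It is not how get_column_plot or get_column_pair_plot are implemented, and the helper name `parse_datetime_column` and the sample values are hypothetical.

```python
from datetime import datetime


def parse_datetime_column(values, datetime_format):
    """Convert datetime strings to datetime objects, keeping missing values as None."""
    return [
        datetime.strptime(value, datetime_format) if value is not None else None
        for value in values
    ]


# Hypothetical column stored as strings, with one missing value.
raw_column = ["2023-01-15", None, "2023-02-01"]
parsed = parse_datetime_column(raw_column, "%Y-%m-%d")
print(parsed)
```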
Bug Fixes
- Method get_column_pair_plot: Does not plot synthetic data if datetime column is formatted as a string - Issue #310 by @frances-h
- Method get_column_plot: ValueError if a datetime column is formatted as a string - Issue #309 by @frances-h
- Fix ValueError in the NewRowSynthesis metric (also impacts DiagnosticReport) - Issue #307 by @frances-h
v0.9.1 - 2023-02-17
This release fixes bugs in the existing metrics and reports.
Bug Fixes
- Fix issue-296 for discrete and continuous columns - Issue #296 by @R-Palazzo
- Support new metadata for datetime_format - Issue #303 by @frances-h
v0.9.0 - 2023-01-18
v0.8.1 - 2022-12-09
This release fixes bugs in the existing metrics and reports. We also make the reports compatible with future SDV versions.
New Features
- Filter out additional sdtypes that will be available in future versions of SDV - Issue #265 by @katxiao
- NewRowSynthesis should ignore PrimaryKey column - Issue #260 by @katxiao
Bug Fixes
- Visualization crashes if there are metric errors - Issue #272 by @katxiao
- Score for TVComplement if synthetic data only has missing values - Issue #271 by @katxiao
- Fix 'timestamp' column metadata in the multi table demo - Issue #267 by @katxiao
- Fix 'duration' column in the single table demo - Issue #266 by @katxiao
- README.md example has a bug - Issue #262 by @katxiao
- Update README.md to fix a bug - Issue #263 by @katxiao
- Visualization get_column_pair_plot: update parameter name to column_names - Issue #258 by @katxiao
- "Column Shapes" and "Column Pair Trends" Calculation Inconsistency - Issue #254 by @katxiao
- Diagnostic Report missing RangeCoverage for numerical columns - Issue #255 by @katxiao
v0.8.0 - 2022-11-02
This release introduces the DiagnosticReport, which helps a user verify – at a quick glance – that their data is valid. We also fix an existing bug with detection metrics.
New Features
- Fixes for new metadata - Issue #253 by @katxiao
- Add default synthetic sample size to DiagnosticReport - Issue #248 by @katxiao
- Exclude pii columns from single table metrics - Issue #245 by @katxiao
- Accept both old and new metadata - Issue #244 by @katxiao
- Address Diagnostic Report and metric edge cases - Issue #243 by @katxiao
- Update visualization average per table - Issue #242 by @katxiao
- Add save and load functionality to multi-table DiagnosticReport - Issue #218 by @katxiao
- Visualization methods for the multi-table DiagnosticReport - Issue #217 by @katxiao
- Add getter methods to multi-table DiagnosticReport - Issue #216 by @katxiao
- Create multi-table DiagnosticReport - Issue #215 by @katxiao
- Visualization methods for the single-table DiagnosticReport - Issue #211 by @katxiao
- Add getter methods to single-table DiagnosticReport - Issue #210 by @katxiao
- Create single-table DiagnosticReport - Issue #209 by @katxiao
- Add save and load functionality to single-table DiagnosticReport - Issue #212 by @katxiao
- Add single table diagnostic report - Issue #237 by @katxiao
Bug Fixes
- Detection test doesn't look at metadata when determining which columns to use - Issue #119 by @R-Palazzo
Internal Improvements
v0.7.0 - 2022-09-27
This release introduces the QualityReport, which evaluates how well synthetic data captures mathematical properties from the real data. The QualityReport incorporates the new metrics introduced in the previous release, and allows users to get detailed results, visualize the scores, and save the report for future viewing. We also add utility methods for visualizing columns and pairs of columns.
New Features
- Catch typeerror in new row synthesis query - Issue #234 by @katxiao
- Add NewRowSynthesis Metric - Issue #207 by @katxiao
- Update plot utilities API - Issue #228 by @katxiao
- Fix column pairs visualization bug - Issue #230 by @katxiao
- Save version - Issue #229 by @katxiao
- Update efficacy metrics API - Issue #227 by @katxiao
- Add RangeCoverage Metric - Issue #208 by @katxiao
- Add get_column_pairs_plot utility method - Issue #223 by @katxiao
- Parse date as datetime - Issue #222 by @katxiao
- Update error handling for reports - Issue #221 by @katxiao
- Visualization API update - Issue #220 by @katxiao
- Bug fixes for QualityReport - Issue #219 by @katxiao
- Update column pair metric calculation - Issue #214 by @katxiao
- Add get score methods for multi table QualityReport - Issue #190 by @katxiao
- Add multi table QualityReport visualization methods - Issue #192 by @katxiao
- Add plot_column visualization utility method - Issue #193 by @katxiao
- Add save and load behavior to multi table QualityReport - Issue #188 by @katxiao
- Create multi-table QualityReport - Issue #186 by @katxiao
- Add single table QualityReport visualization methods - Issue #191 by @katxiao
- Add save and load behavior to single table QualityReport - Issue #187 by @katxiao
- Add get score methods for single table Quality Report - Issue #189 by @katxiao
- Create single-table QualityReport - Issue #185 by @katxiao