Distribution of pydecimal is very far from optimal #2090

Open
sshishov opened this issue Aug 25, 2024 · 1 comment

sshishov commented Aug 25, 2024

  • Faker version: 24.14.0 (same happens on the latest version)
  • OS: macOS (the problem is not OS-specific)

The distribution of values returned by pydecimal is far from uniform, which makes it difficult to use in tests.
For instance, if the initial value is max_value and the updated value is also max_value, the test "breaks" because the value appears not to have been updated.
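
To put a rough number on how often such a collision would happen, here is a small arithmetic sketch based on the counts reported under "Actual behavior" in this issue. It assumes the two draws are independent and only accounts for the two boundary values, so it is a lower bound, not an exact figure. The comparison value assumes a hypothetical uniform generator over the 9001 four-digit values in [0.1, 1].

```python
# Counts taken from the "Actual behavior" output in this issue.
total_draws = 1_000_000
count_max = 500_105  # draws equal to max_value (1)
count_min = 50_284   # draws equal to min_value (0.1)

p_max = count_max / total_draws
p_min = count_min / total_draws

# Probability that two independent draws are both max_value or both min_value.
# Other repeated values are ignored, so this is a lower bound.
p_collision_observed = p_max ** 2 + p_min ** 2

# Assumption: a uniform generator over 0.1000 .. 1.0000 in steps of 0.0001
# has 9001 equally likely outcomes.
uniform_outcomes = 9001
p_collision_uniform = 1 / uniform_outcomes

print(f"observed (lower bound): {p_collision_observed:.4f}")
print(f"uniform (assumed):      {p_collision_uniform:.6f}")
```

Under these assumptions, roughly one in four "initial value vs. updated value" pairs would coincide, compared with about one in nine thousand for a uniform generator.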

I can suggest the following approaches (IMHO):

  • re-generate the value if it equals min_value or max_value (perhaps behind an extra keyword argument)
  • make the generation logic more "random"; currently it is obvious that the value is set to max_value on overflow and to min_value on underflow
  • use min_value and max_value inside the calculation itself, so that the generated value is guaranteed to fall within the bounds
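
The third approach could look something like the sketch below. This is not Faker's implementation; `uniform_decimal` and its parameters are hypothetical names used only for illustration, and it relies solely on the standard library. The idea is to scale the bounds to integers, draw one integer uniformly between them, and scale back, so no clamping is ever needed.

```python
import decimal as dec
import random


def uniform_decimal(min_value, max_value, right_digits, rng=None):
    """Draw a decimal uniformly from [min_value, max_value] with `right_digits` decimals.

    Hypothetical helper, not part of Faker.
    """
    rng = rng or random
    scale = 10 ** right_digits
    min_value = dec.Decimal(min_value)
    max_value = dec.Decimal(max_value)

    # Convert the bounds to integer "ticks", rounding inwards so that the
    # result can never fall outside the requested range.
    low = int((min_value * scale).to_integral_value(rounding=dec.ROUND_CEILING))
    high = int((max_value * scale).to_integral_value(rounding=dec.ROUND_FLOOR))
    if low > high:
        raise ValueError("no representable value between min_value and max_value")

    # One uniform integer draw, then shift the decimal point back.
    return dec.Decimal(rng.randint(low, high)).scaleb(-right_digits)


if __name__ == "__main__":
    print(uniform_decimal(dec.Decimal("0.1"), 1, right_digits=4))
```

With this construction the boundary values are exactly as likely as any other value, which is the behaviour shown under "Expected behavior".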

Steps to reproduce

import faker
import collections
import decimal as dec

fake = faker.Faker()

counter = collections.Counter(
    fake.pydecimal(left_digits=0, right_digits=4, min_value=dec.Decimal('0.1'), max_value=1)
    for _ in range(1_000_000)
)
for value, count in counter.most_common(10):
    print(value, ':', count)

Expected behavior

0.1437 : 76
0.3199 : 76
0.2477 : 75
0.7345 : 75
0.1284 : 74
0.6271 : 74
0.1597 : 74
0.4462 : 74
0.6293 : 74
0.4967 : 74

Actual behavior

1 : 500105
0.1 : 50284
0.1437 : 76
0.3199 : 76
0.2477 : 75
0.7345 : 75
0.1284 : 74
0.6271 : 74
0.1597 : 74
0.4462 : 74

sshishov (Contributor, Author) commented

This is how we are handling it for our tests:

def get_value() -> dec.Decimal:
    """Generate a fake decimal, rejecting `min_value` and `max_value`, which are returned in case of underflow/overflow."""
    return next(
        item
        for item in iter(
            lambda: fake['en'].pydecimal(
                left_digits=0,
                right_digits=4,
                min_value=dec.Decimal('0.0001'),
                max_value=dec.Decimal(1),
            ),
            None,
        )
        if item not in {dec.Decimal('0.0001'), dec.Decimal(1)}
    )
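
The same rejection idea can be written as a small reusable helper with a retry limit, so that a generator which only ever returns excluded values fails loudly instead of looping forever. This is a sketch only: `draw_excluding` and `max_attempts` are hypothetical names, not part of Faker, and the generator is passed in as a plain callable.

```python
import decimal as dec
from typing import Callable, Collection


def draw_excluding(
    generate: Callable[[], dec.Decimal],
    excluded: Collection[dec.Decimal],
    max_attempts: int = 100,
) -> dec.Decimal:
    """Call `generate` until it returns a value not in `excluded`.

    Hypothetical helper; raises RuntimeError after `max_attempts` rejected draws.
    """
    for _ in range(max_attempts):
        value = generate()
        if value not in excluded:
            return value
    raise RuntimeError(f"no acceptable value after {max_attempts} attempts")


# Possible usage with Faker (assumes `fake = faker.Faker()` as in the issue):
# value = draw_excluding(
#     lambda: fake.pydecimal(
#         left_digits=0,
#         right_digits=4,
#         min_value=dec.Decimal('0.0001'),
#         max_value=dec.Decimal(1),
#     ),
#     excluded={dec.Decimal('0.0001'), dec.Decimal(1)},
# )
```

Note that rejecting the boundaries also discards the legitimate draws that happen to land on them, which is usually acceptable in tests.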
