Distribution of `pydecimal` is very far from optimal #2090

sshishov · 2024-08-25T09:11:22Z

Faker version: 24.14.0 (same happens on the latest version)
OS: MacOS (does not matter)

The distribution of `pydecimal` is heavily skewed toward the boundary values, which makes it difficult to use in tests.
For instance, if the initial value is max_value and the updated value is also max_value, the test "breaks" because the value appears not to have been updated.

I can recommend the following approaches (imho):

- Re-evaluate the value if it is the min or max value (maybe provide special extra kwargs to support it).
- Make the generation logic more "random": currently it is obvious that on overflow the value is set to the max value, and on underflow to the min value.
- Use the min and max values inside the calculation so that the generated value always falls within the boundaries.
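A minimal sketch of the third approach, assuming the bounds are first mapped to integer steps on a grid with `right_digits` decimal places and a step is then drawn uniformly. The name `uniform_decimal` and its parameters are hypothetical and not part of Faker's API; this only illustrates the idea and is not how Faker implements `pydecimal`.

```python
import decimal as dec
import random


def uniform_decimal(min_value, max_value, right_digits, rng=random):
    """Hypothetical helper: draw a Decimal uniformly from [min_value, max_value]
    on a grid with `right_digits` decimal places."""
    scale = 10 ** right_digits
    # Convert the bounds to integer grid steps, rounding inward so that
    # every candidate stays inside the requested range.
    low = int((dec.Decimal(str(min_value)) * scale).to_integral_value(rounding=dec.ROUND_CEILING))
    high = int((dec.Decimal(str(max_value)) * scale).to_integral_value(rounding=dec.ROUND_FLOOR))
    if low > high:
        raise ValueError("no representable value between min_value and max_value")
    # randint is inclusive on both ends, so each grid point is equally likely
    # and no clamping to the bounds is needed afterwards.
    return dec.Decimal(rng.randint(low, high)) / scale
```

Because the bounds take part in the draw itself, the boundary values come up no more often than any other grid point.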

Steps to reproduce

import faker
import collections
import decimal as dec

fake = faker.Faker()

counter = collections.Counter(
    fake.pydecimal(left_digits=0, right_digits=4, min_value=dec.Decimal('0.1'), max_value=1)
    for _ in range(1_000_000)
)
for value, count in counter.most_common(10):
    print(value, ':', count)

Expected behavior

0.1437 : 76
0.3199 : 76
0.2477 : 75
0.7345 : 75
0.1284 : 74
0.6271 : 74
0.1597 : 74
0.4462 : 74
0.6293 : 74
0.4967 : 74

Actual behavior

1 : 500105
0.1 : 50284
0.1437 : 76
0.3199 : 76
0.2477 : 75
0.7345 : 75
0.1284 : 74
0.6271 : 74
0.1597 : 74
0.4462 : 74


sshishov · 2024-08-25T09:20:12Z

This is how we are handling it for our tests:

def get_value() -> dec.Decimal:
    """Generate a fake decimal, rejecting `min_value` and `max_value`, which are returned on underflow/overflow."""
    return next(
        item
        for item in iter(
            lambda: fake['en'].pydecimal(
                left_digits=0,
                right_digits=4,
                min_value=dec.Decimal('0.0001'),
                max_value=dec.Decimal(1),
            ),
            None,
        )
        if item not in {dec.Decimal('0.0001'), dec.Decimal(1)}
    )
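The same workaround can be written as a reusable helper with an attempt limit, so a misconfigured range fails loudly instead of looping forever. This is only a sketch: `draw_excluding`, `generate`, `excluded` and `max_attempts` are hypothetical names, not anything Faker provides.

```python
import decimal as dec
from typing import Callable, Iterable


def draw_excluding(
    generate: Callable[[], dec.Decimal],
    excluded: Iterable[dec.Decimal],
    max_attempts: int = 100,
) -> dec.Decimal:
    """Hypothetical helper: call `generate` until it returns a value
    that is not in `excluded`, giving up after `max_attempts` tries."""
    excluded_set = set(excluded)
    for _ in range(max_attempts):
        value = generate()
        # Decimals that compare equal also hash equal, so Decimal('1.0000')
        # is rejected when Decimal(1) is excluded.
        if value not in excluded_set:
            return value
    raise RuntimeError(f"no acceptable value after {max_attempts} attempts")
```

In our tests it would be called roughly as `draw_excluding(lambda: fake.pydecimal(...), {min_value, max_value})`, passing the same bounds that were given to `pydecimal`.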
