Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zeros trimmed off when parsing strings, leading to inability to use string to store a decimal value #107

Closed
imirkin opened this issue Aug 3, 2018 · 16 comments

Comments

@imirkin
Copy link

imirkin commented Aug 3, 2018

Looks like merge #46 (Remove insignificant digits during string parsing) made it impossible to initialize decimals from string with a fixed precision. What was the motivation for that change, and would a change be accepted to undo it, or move it off into a NewFromStringTrimmed() function? [Or, less preferably, introduce a NewFromStringPrecise variant.]

The whole point of decimals is to be precise, and there is a semantic difference between 1.0 and 1.00.

@imirkin
Copy link
Author

imirkin commented Aug 3, 2018

Oh wow! It looks like .String() was also changed to cut off zeroes. That goes counter to how any other bigdecimal library works.

Do let me know if this is something that you'd be interested in having fixed, or if I need to clone this off.

@wizardist
Copy link

@imirkin could you tell how exactly does this make it impossible? Examples and/or repro code would be highly appreciated.

@imirkin
Copy link
Author

imirkin commented Aug 14, 2018

So let's say I want to store the data as fixed-precision strings (like, let's say, prices), out to N digits.

I perform some calculation, round it variously, I end up with "1.000000". This is a string that I have in my database already, potentially computed by another decimal library. I have validation in place that ensures that all data is rounded properly, and padded out accordingly if zeroes are missing, to indicate among other things that it has the correct precision. [If I were to perform the calculation differently, rounded it differently, I might have ended up with "1.000", which gives me a different bit of information.]

Now I load it in, save it again, and I get "1". The data is different. The meaning of the data is different -- one was rounded to 1 decimal, the other to 6. Is it because I changed my mind at some point and started rounding to a different number of digits, and so my code needs to adapt the old data, potentially recompute it? I have no way of knowing without having the proper precision.

Separately from all that, it's just not how the Java BigDecimal works, which I have experience with, and likely other BigDecimal-like libraries, which leads to enormous confusion. Note that there have been other people complaining about the same problems earlier (I found their issues after I filed mine).

In Python:

>>> from decimal import Decimal
>>> Decimal("1.0")
Decimal('1.0')
>>> Decimal("1.000")
Decimal('1.000')

In Java (jython, but same thing):

>>> import java.math.BigDecimal as BigDecimal
>>> BigDecimal("1.0")
1.0
>>> BigDecimal("1.000")
1.000
>>> BigDecimal("1.000").toString()
u'1.000'

But all around, this is a philosophical question -- is there a difference between "1.0" and "1.00". Every other decimal library I'm aware of says "yes". It appears that as currently written, this library says "no". If you don't see the difference, I'm unlikely to be able to convince you in this bug that there is one, and will cut my losses. The patches to make it happen are relatively small (already done).

@wizardist
Copy link

So far looks like a presentation issue, but I think I understand your concern. If you're not storing decimals as strings in DB, then you're safe.

For presentation purposes you have methods like StringFixed() and a few others which may or may not help you.

HTH

@imirkin
Copy link
Author

imirkin commented Aug 14, 2018

Nope, it's not a presentation issue. 1.0 and 1.000 are semantically different values and I need to be able to tell which is which. Feel free to close this issue, I'll just clone this off.

@kempeng
Copy link

kempeng commented Aug 14, 2018

Do I understand your issue correctly in that you want the first value (1.0) to be stored as (1, 0) (bigInt value, precision) and the second one as (10000, -4) ? I think that is a valid ask to be supported by the package?

I stumbled on this in a related application of the decimal package in my code: I created new Decimals with values in multiples of a thousand (1000, 2000, 5000, 10000) by creating a new decimal's with the value as big int value and precision as zero). However later on in the code, I divided the values by a thousand, and only realised later that the precision was not zero anymore (as I naively expected), but had become -16, the default precision of DivRound() ....

@imirkin
Copy link
Author

imirkin commented Aug 14, 2018

A decimal is the composition of a big integer and an exponent, (a, b). It would be nice to be able to save those off to a string, and then load them again from the string, such that one gets the same values back of (a, b). Otherwise the values can't be serialized.

The globally agreed string representation of all this is the logical one... write out the big integer, and then insert a . in the right place [or potentially eX if one tries to represent e.g. (1, 2)].

This library manipulates the values of (a, b) when moving both to and from string, therefore I have made the (minor) changes necessary to make it happen. I'll publish a link to the fork when I get a chance.

@kempeng
Copy link

kempeng commented Aug 14, 2018

Agree that this would be a nice/useful feature that would enrich the package. I'm not sure, but it feels that several improvement suggestions are not being considered. I submitted several useful helpers functions but see no reaction to them

@wizardist
Copy link

@imirkin I may be wrong but it seems tgat you're trying to repurpose the serialized data for what it is not meant to.

"1.0" and "1.000" are semantically equivalent values as long as they are treated as serialized values. After you serialize something, you're obliged to unserialize it into something meaningful. Otherwise you're bending rules and try to do something to a stream of bytes that you're not supposed to.

If for some reason you have to do anything with serialized decimals other than just storing them somewhere, I see at least a few methods which would suit you to convert the decimals to a string rendered with arbitrary precision.

In my opinion, value and exp are internal entities and are absolutely irrelevant. You've got two decimals, do they compare alright with the provided methods? Great! Trying to compare serialized values? Well, tough luck - you're shooting in your own foot.

@imirkin
Copy link
Author

imirkin commented Aug 14, 2018

This goes counter to how both the python "decimal" library works and the Java built-in java.math.BigDecimal work. My expectations were that this library would roughly mirror them. In those libraries, 1.0 != 1.000, but cmp(1.0, 1.000) == 0.

As for 1.0 and 1.000 being different, that's usually covered as part of the "significant figures" discussion in high-school chemistry class, and one of the many reasons to use a "decimal" library even though one is not dealing with "big" numbers.

Given that they are different, it's important to be able to store and retrieve those in their original form.

@wizardist
Copy link

I see. So what you're aiming for is to get as close to the implementation of decimals in other languages as possible.

Indeed, .NET shows the same behavior for decimals, i.e. no normalization takes place.

Unfortunately, same as you, I'm just a concerned citizen passing by and cannot help other than to feel your pain :)

@jayd3e
Copy link

jayd3e commented Sep 7, 2018

I completely agree with the issue you are bringing up. In fact, I also submitted an issue for it a while back: #31. This is how I fixed it in a forked version of the code: rainhq@4c20285.

@mwoss
Copy link
Member

mwoss commented Dec 8, 2019

@imirkin @jayd3e I somehow understand your concerns about the logic behind the NewFromString method.
Despite the fact that 1.0 and 1.0* are semantically equivalent values, I'm aware we should aim for implementation that is as close as possible to other languages, so library API won't be confusing for newcomers.
I've already reached out to @njason. I'll try to find out the motivation for PR #46

@imirkin
Copy link
Author

imirkin commented Dec 8, 2019

@mwoss the important point is that 1.0 and 1.00 are not semantically equivalent. While they are mathematically equal numbers on the real number line, they convey different precision. "Significant Figures" is an important concept for a decimal library to be able to handle, and every language's decimal library recognizes this.

@mwoss
Copy link
Member

mwoss commented Dec 8, 2019

@imirkin Oh, my bad. You are right.
There is no reason why we should stick to the current implementation. This "issue" will be fixed in the next release.

@njason
Copy link
Member

njason commented Feb 27, 2020

Reverted #46 in #159

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants