Improving speed and reducing code size when fast_float is used as internal parser code. #307

Open
wants to merge 22 commits into
base: main

IRainman

@IRainman IRainman commented Mar 5, 2025

…_SIGN and FASTFLOAT_DISALLOW_NAN. Both allow you to significantly reduce code size and speed up your code when you use fast_float as part of your parser.

We have benchmarks, please consider running them: see our README for details.

Our CI tests check formatting automatically. If such a test fails, please consider running the bash script:

bash script/run-clangcldocker.sh

Make sure that you have docker installed and running on your system. Most Linux distributions support docker, though some (like RedHat) have an equivalent (Podman). Users of Apple systems may want to consider OrbStack. You do not need to be familiar with docker; you just need to make sure that you have it running.

If you are unable to format the code, we may format it for you.

@lemire
Member

lemire commented Mar 5, 2025

@IRainman

> Both allow you to significantly reduce code size and speed up your code when you use fast_float as part of your parser.

I am skeptical. Your claims are not documented or quantified.

@IRainman
Author

IRainman commented Mar 5, 2025

> @IRainman

> > Both allow you to significantly reduce code size and speed up your code when you use fast_float as part of your parser.

> I am skeptical. Your claims are not documented or quantified.

OK, I'll add some additional tests for parsing only positive numbers and positive infinity. This is a really common case in any mathematical parser. Maybe I should take a more aggressive approach and disable infinity too. In theory, these defines should not be needed with constexpr if...
Actually, one big question: do you plan to allow compile-time config options? Because the current implementation processes the options entirely at run time, and this is bad.

@dalle
Collaborator

dalle commented Mar 5, 2025

I recently changed some compile time config options into runtime config options. I believe that both worlds would be good. I would prefer that the compile-time options not be ifdef macros, but be controlled through template arguments.
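To make the two styles under discussion concrete, here is a minimal sketch using a toy integer parser. It is not fast_float's API: `parse_digits`, `parse_int_ct`, `parse_int_rt` and the `AllowSign` flag are hypothetical names invented for illustration, and C++17 is assumed for `if constexpr`.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical toy helper (not part of fast_float): parses a run of decimal
// digits. No overflow checking, to keep the sketch short.
inline std::optional<std::int64_t> parse_digits(std::string_view s) {
  if (s.empty()) return std::nullopt;
  std::int64_t value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return std::nullopt;
    value = value * 10 + (c - '0');
  }
  return value;
}

// Compile-time option: when AllowSign is false, the sign-handling branch is
// discarded during instantiation, much like an #ifdef would remove it.
template <bool AllowSign>
std::optional<std::int64_t> parse_int_ct(std::string_view s) {
  if constexpr (AllowSign) {
    if (!s.empty() && s.front() == '-') {
      const auto v = parse_digits(s.substr(1));
      if (v) return -*v;
      return std::nullopt;
    }
  }
  return parse_digits(s);
}

// Run-time option: the same logic, but the flag is tested on every call.
inline std::optional<std::int64_t> parse_int_rt(std::string_view s,
                                                bool allow_sign) {
  if (allow_sign && !s.empty() && s.front() == '-') {
    const auto v = parse_digits(s.substr(1));
    if (v) return -*v;
    return std::nullopt;
  }
  return parse_digits(s);
}
```

Whether the compile-time variant is measurably faster or smaller than the run-time one is exactly what would need to be benchmarked; the sketch only shows the mechanics.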

@lemire
Member

lemire commented Mar 5, 2025

In this instance, the argument is that, to reduce compiled size and improve runtime performance, we avoid checking for negatives ('-') and for 'nan'/'inf'. In my view, it is an extraordinary claim that checking for the negative ('-') and for 'nan'/'inf' makes a large difference.

It is possibly true, but I'd like to see the numbers. Please note that we do have benchmarks included in the library.

> Because the current implementation processes the options entirely at run time, and this is bad.

How did you benchmark to arrive at the conclusion that it is bad?

> I recently changed some compile time config options into runtime config options.

I would not assume that it makes a difference to performance unless we have hard data. Some parameters can be passed as template arguments, certainly...

But we need to take into account that parsing a single number takes hundreds of instructions. Predictable branches are not a big concern.

@IRainman
Author

IRainman commented Mar 6, 2025

> I recently changed some compile time config options into runtime config options. I believe that both worlds would be good. I would prefer that the compile-time options not be ifdef macros, but be controlled through template arguments.

Very well! I'm currently reworking my patch to use template arguments, and I want to add more constexpr / consteval to the code.

@IRainman
Author

IRainman commented Mar 6, 2025

> In this instance, the argument is that, to reduce compiled size and improve runtime performance, we avoid checking for negatives ('-') and for 'nan'/'inf'. In my view, it is an extraordinary claim that checking for the negative ('-') and for 'nan'/'inf' makes a large difference.

> It is possibly true, but I'd like to see the numbers. Please note that we do have benchmarks included in the library.

I'm measuring performance on a mathematical parser that already processes minus signs, and yesterday I rewrote it to also process inf / infinity for mathematical questions. In a mathematical parser, nan isn't possible at all, because if a value is nan the parser stops processing the equation immediately.

> > Because the current implementation processes the options entirely at run time, and this is bad.

> How did you benchmark to arrive at the conclusion that it is bad?

The options are not fully resolved by the template at compile time, and they generate many additional lines of code. Also, completely removing the handling of the minus sign for the mantissa and of inf/nan from fast_float significantly reduces the generated assembly.

To check the performance of such a case, tests should contain only positive numbers in general notation.
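A minimal sketch of the situation described in this comment, where the surrounding expression parser consumes the sign itself and only ever hands an unsigned literal to the number parser. Everything here is hypothetical: `ParsedNumber`, `parse_unsigned_number` and `parse_signed_operand` are made-up names, and `std::strtod` is only a stand-in for the number parser (it is locale-dependent and also accepts hex floats such as `0x1p3`, which a real parser would probably want to reject).

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical result type: the parsed value and how many characters it used.
struct ParsedNumber {
  double value;
  std::size_t length;
};

// Parses an unsigned literal starting at text[pos]. A leading sign, leading
// whitespace, "inf" and "nan" are rejected up front by requiring a digit, or
// a '.' followed by a digit, before delegating to std::strtod.
std::optional<ParsedNumber> parse_unsigned_number(const std::string& text,
                                                  std::size_t pos = 0) {
  if (pos >= text.size()) return std::nullopt;
  const auto is_digit = [](char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
  };
  const bool digit_first = is_digit(text[pos]);
  const bool dot_digit = text[pos] == '.' && pos + 1 < text.size() &&
                         is_digit(text[pos + 1]);
  if (!digit_first && !dot_digit) return std::nullopt;
  const char* begin = text.c_str() + pos;
  char* end = nullptr;
  const double value = std::strtod(begin, &end);
  return ParsedNumber{value, static_cast<std::size_t>(end - begin)};
}

// Toy outer parser: it handles unary minus itself, so the number parser
// never needs to look for a sign.
std::optional<double> parse_signed_operand(const std::string& text) {
  std::size_t pos = 0;
  bool negative = false;
  while (pos < text.size() && text[pos] == '-') {
    negative = !negative;
    ++pos;
  }
  const auto number = parse_unsigned_number(text, pos);
  if (!number || pos + number->length != text.size()) return std::nullopt;
  return negative ? -number->value : number->value;
}
```

In this arrangement the sign check inside the number parser is redundant work, which is the premise of the patch; how much that redundancy costs in practice is the open question in this thread.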

@dalle
Collaborator

dalle commented Mar 6, 2025

> > I recently changed some compile time config options into runtime config options. I believe that both worlds would be good. I would prefer that the compile-time options not be ifdef macros, but be controlled through template arguments.

> Very well! I'm currently reworking my patch to use template arguments, and I want to add more constexpr / consteval to the code.

Please don't do this until we have the performance comparisons on the gains.

@IRainman
Author

IRainman commented Mar 6, 2025

> > > I recently changed some compile time config options into runtime config options. I believe that both worlds would be good. I would prefer that the compile-time options not be ifdef macros, but be controlled through template arguments.

> > Very well! I'm currently reworking my patch to use template arguments, and I want to add more constexpr / consteval to the code.

> Please don't do this until we have the performance comparisons on the gains.

Of course. I just cleaned the defines out of my code and replaced them with an option. Unfortunately, the code isn't eliminated at compile time as completely as it is with a preprocessor macro.

@IRainman
Author

IRainman commented Mar 6, 2025

P.S. My current version of the test code is:
[[assume(_view._Unchecked_end() - _view._Unchecked_begin() >= 1)]];
double val;
constexpr auto options{fast_float::chars_format::general |
                       fast_float::chars_format::no_infnan |
                       fast_float::chars_format::disallow_leading_sign};
const auto res = fast_float::from_chars(_view._Unchecked_begin(),
                                        _view._Unchecked_end(), val, options);
[[assume(res.ptr - _view._Unchecked_begin() >= 1)]];
const size_t n = res.ptr - _view._Unchecked_begin();

IRainman added 4 commits March 6, 2025 19:43
… faster and more compact code for parsing numbers when the input contains only positive C/C++-style numbers without nan or inf. This case is very useful in mathematical applications, game development, CSS parsing, embedded code, etc.

Additional improvement in constant initialization.
IRainman added 2 commits March 7, 2025 14:51
…R_WO_INF_NAN is enabled we assume that we are in parser code with an external loop that checks bounds.

Function cpp20_and_in_constexpr() is now really evaluated at compile time. TODO fix warnings.
IRainman added 3 commits March 7, 2025 20:39
…nd FASTFLOAT_SKIP_WHITE_SPACE, please use options.

Compilation fix when FASTFLOAT_ONLY_POSITIVE_C_NUMBER_WO_INF_NAN isn't defined.
Added usage examples for the FASTFLOAT_ONLY_POSITIVE_C_NUMBER_WO_INF_NAN macro and documented the allow_leading_plus and skip_white_space options.
@dalle
Collaborator

dalle commented Mar 9, 2025

I'm sorry to interrupt you, but I think you're going the wrong way here and focusing on the wrong stuff. I don't want to seem pompous or ignorant, and I don't want to offend you, I just don't want you to (possibly) waste more time on small changes.

I will say it clearly: without proof of the gains, there is very little chance that this PR will be merged.

So please focus on the tests, and make the performance benchmarks compile and run. Then you should run the performance/benchmarks workflows. If there are quantified improvements then the likelihood of merging this PR will increase substantially.

@lemire
Member

lemire commented Mar 9, 2025

I agree with @dalle.

It is fine to fork the library for your own purposes… it is open source. You can adapt it to your own needs. Please do so.

But we won’t take this PR unless we are convinced of its benefits. And you have not provided the evidence.

We are concerned about maintaining the code, keeping it correct and well tested.

Changes must be motivated.

IRainman added 2 commits March 9, 2025 04:07
…POSITIVE_C_NUMBER_WO_INF_NAN.

Now the benchmark only measures parameters for fast_float::from_chars and nothing else.
Copy-paste fix.
@IRainman
Author

IRainman commented Mar 9, 2025

I added an improvement test with parser emulation.

P.S. In the real code of a mathematical processing engine, my patches also give significant improvements (measured with MSVC; not tested with other compilers):

std::from_chars

Tests: exactly: 115, almost: 9,
failed: 4,
time is: 32064ms.

fast_float::from_chars

Tests:
time is: 26684ms.

( commits from this PR )

Tests (100% of CPU cores are used by BOINC):
time is: 43135ms.
Tests (50% of CPU cores are used by BOINC):
time is: 28898ms.
Tests (BOINC is stopped):
time is: 25459ms.

When the FASTFLOAT_ONLY_POSITIVE_C_NUMBER_WO_INF_NAN switch is applied:

Tests: exactly: 115, almost: 9,
failed: 4,
time is: 25947ms.

( commits from this PR )

Tests (100% of CPU cores are used by BOINC):
time is: 39588ms.
Tests (50% of CPU cores are used by BOINC):
time is: 27301ms.
Tests (BOINC is stopped):
time is: 24597ms.

@IRainman
Author

IRainman commented Mar 9, 2025

Additionally, you can test the compilation flag FASTFLOAT_ONLY_POSITIVE_C_NUMBER_WO_INF_NAN with a file from https://population.un.org/wpp/downloads?folder=Standard%20Projections&group=CSV%20format (these are CSV files containing only positive numbers), for example https://population.un.org/wpp/assets/Excel%20Files/1_Indicator%20(Standard)/CSV_FILES/WPP2024_TotalPopulationBySex.csv.gz

@lemire
Member

lemire commented Mar 9, 2025

@IRainman We already have a dataset made entirely of positive values (mesh), see #308

If you prepare a file that contains just one number per line, you can then pass it to our benchmark like so...

cmake -B build -D FASTFLOAT_BENCHMARKS=ON
cmake --build build
./build/benchmarks/realbenchmark myfile.txt

(This assumes Linux/macOS... The instructions are similar under Visual Studio except that you need to specify that you work in release mode.)
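For anyone who needs such an input file, a minimal sketch of a generator follows. It assumes uniformly distributed values in [0, 1e6) are representative enough, which may not hold for a real workload; the function name `write_positive_numbers` and its parameters are made up for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <random>
#include <string>

// Writes `count` non-negative doubles to `path`, one number per line.
// Returns true if the file was opened and written successfully.
bool write_positive_numbers(const std::string& path, std::size_t count,
                            std::uint64_t seed = 42) {
  std::ofstream out(path);
  if (!out) return false;
  std::mt19937_64 rng(seed);  // fixed seed so runs are reproducible
  std::uniform_real_distribution<double> dist(0.0, 1.0e6);
  out.precision(17);  // enough significant digits to round-trip a double
  for (std::size_t i = 0; i < count; ++i) {
    out << dist(rng) << '\n';  // default formatting never emits a '+' sign
  }
  return static_cast<bool>(out);
}
```

Calling `write_positive_numbers("myfile.txt", 100000)` would produce a file that can be passed to `realbenchmark` as shown in the commands above.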

IRainman added 5 commits March 9, 2025 22:55
Improve FASTFLOAT_CONSTEVAL20 usage for older standards.
When FASTFLOAT_ONLY_POSITIVE_C_NUMBER_WO_INF_NAN is disabled:

####
# reading C:/Projects/fast_float/build/benchmarks/data/canada.txt
####
# read 111126 lines
ASCII volume = 1.93374 MB
fastfloat (64)                          :   228.57 MB/s (+/- 4.1 %)    13.14 Mfloat/s      76.13 ns/f
fastfloat (32)                          :   226.19 MB/s (+/- 3.1 %)    13.00 Mfloat/s      76.93 ns/f
UTF-16 volume = 3.86749 MB
fastfloat (64)                          :   445.30 MB/s (+/- 2.8 %)    12.79 Mfloat/s      78.16 ns/f
fastfloat (32)                          :   439.31 MB/s (+/- 2.1 %)    12.62 Mfloat/s      79.22 ns/f
####
# reading C:/Projects/fast_float/build/benchmarks/data/mesh.txt
####
# read 73019 lines
ASCII volume = 0.536009 MB
fastfloat (64)                          :   123.40 MB/s (+/- 0.8 %)    16.81 Mfloat/s      59.49 ns/f
fastfloat (32)                          :   117.22 MB/s (+/- 1.4 %)    15.97 Mfloat/s      62.62 ns/f
UTF-16 volume = 1.07202 MB
fastfloat (64)                          :   243.93 MB/s (+/- 2.6 %)    16.61 Mfloat/s      60.19 ns/f
fastfloat (32)                          :   232.48 MB/s (+/- 2.5 %)    15.83 Mfloat/s      63.15 ns/f

c:\Projects\fast_float\build\benchmarks\Release>

When FASTFLOAT_ONLY_POSITIVE_C_NUMBER_WO_INF_NAN is enabled:

####
# reading C:/Projects/fast_float/build/benchmarks/data/canada.txt
####
# read 111126 lines
ASCII volume = 1.82777 MB
fastfloat (64)                          :   233.01 MB/s (+/- 2.0 %)    14.17 Mfloat/s      70.59 ns/f
fastfloat (32)                          :   221.31 MB/s (+/- 1.5 %)    13.46 Mfloat/s      74.32 ns/f
UTF-16 volume = 3.65553 MB
fastfloat (64)                          :   460.78 MB/s (+/- 1.4 %)    14.01 Mfloat/s      71.39 ns/f
fastfloat (32)                          :   439.76 MB/s (+/- 2.1 %)    13.37 Mfloat/s      74.80 ns/f
####
# reading C:/Projects/fast_float/build/benchmarks/data/mesh.txt
####
# read 73019 lines
ASCII volume = 0.536009 MB
fastfloat (64)                          :   131.38 MB/s (+/- 0.4 %)    17.90 Mfloat/s      55.87 ns/f
fastfloat (32)                          :   123.03 MB/s (+/- 0.4 %)    16.76 Mfloat/s      59.67 ns/f
UTF-16 volume = 1.07202 MB
fastfloat (64)                          :   259.29 MB/s (+/- 1.5 %)    17.66 Mfloat/s      56.62 ns/f
fastfloat (32)                          :   243.71 MB/s (+/- 1.8 %)    16.60 Mfloat/s      60.24 ns/f

c:\Projects\fast_float\build\benchmarks\Release>

P.S. Tested on the latest Windows 10 update with the latest MSVC 2022 updates, on an older but still powerful machine with an Intel Xeon E5-2680 v2.
# Conflicts:
#	benchmarks/benchmark.cpp