spherepoint_hash32: float8 needs wrapping into a Datum #107

Merged
merged 1 commit into from
Nov 22, 2023

Contributor

@df7cb df7cb commented Nov 15, 2023

Sorry I forgot to test on 32-bit... The new spherepoint_hash32 function crashes on platforms where float8 is passed by pointer.
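
For readers unfamiliar with this failure mode, the following is a minimal standalone sketch (plain C, no PostgreSQL headers) of by-reference float8 passing. Every name in it (`SimDatum`, `sim_float8_get_datum`, `sim_datum_get_float8`, `sim_unwrapped`) is a hypothetical stand-in rather than PostgreSQL's API, and `malloc` stands in for `palloc`. It assumes the unpatched call let the compiler implicitly convert the `double` to an integer-typed Datum, which the callee then treated as a pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for Datum: a pointer-sized unsigned integer. */
typedef uintptr_t SimDatum;

/* By-reference wrapping: allocate storage and hand out its address.
 * malloc stands in for palloc in this sketch. */
static SimDatum sim_float8_get_datum(double x)
{
    double *p = malloc(sizeof(double));

    assert(p != NULL);
    *p = x;
    return (SimDatum) p;
}

/* By-reference unwrapping: the callee dereferences the Datum. */
static double sim_datum_get_float8(SimDatum d)
{
    return *(double *) d;
}

/* What an implicit double -> integer conversion yields: the value
 * truncated toward zero. For 0 <= x < 1 that is 0, which the callee
 * would then dereference as a null pointer. */
static SimDatum sim_unwrapped(double x)
{
    return (SimDatum) x;
}
```

With these helpers, `sim_datum_get_float8(sim_float8_get_datum(0.5))` round-trips the value, whereas `sim_unwrapped(0.5)` is 0 and must never be passed to `sim_datum_get_float8`.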

Contributor

vitcpp commented Nov 15, 2023

Well, zero patch number seems to be unlucky :) Having a 32 bit platform in the test pipeline might help to catch such problems. But I haven't found such platforms on GitHub Actions unfortunately.

Contributor Author

df7cb commented Nov 15, 2023

The embarrassing part is that I do have such a test pipeline on apt.postgresql.org, but it wasn't running for pgsphere yet:

https://pgdgbuild.dus.dg-i.net/view/Snapshot/job/pgsphere-binaries-snapshot/

@@ -315,8 +315,8 @@ Datum
 spherepoint_hash32(PG_FUNCTION_ARGS)
 {
 	SPoint *p1 = (SPoint *) PG_GETARG_POINTER(0);
-	Datum h1 = DirectFunctionCall1(hashfloat8, p1->lat);
-	Datum h2 = DirectFunctionCall1(hashfloat8, p1->lng);
+	Datum h1 = DirectFunctionCall1(hashfloat8, Float8GetDatum(p1->lat));
Contributor

I think the code will work, but I think it will significantly decrease performance on 32-bit platforms. It is OK to fix the failures on 32-bit, but the function should be improved. We should create a new issue for this problem.

Contributor Author

That's how 64-bit floats are supposed to be handled on 32-bit platforms, what would you want to change there?

Contributor

@vitcpp vitcpp Nov 15, 2023

Maybe something like this? I haven't compiled it yet.

#include "common/hashfn.h"	/* hash_bytes(); header available in PostgreSQL 13+ */
#include "utils/float.h"	/* get_float8_nan() */

uint32 pgs_hashfloat8(double key)
{
	/*
	 * On IEEE-float machines, minus zero and zero have different bit patterns
	 * but should compare as equal.  We must ensure that they have the same
	 * hash value, which is most reliably done this way:
	 */
	if (key == (float8) 0)
		return 0;	/* plain uint32 result, so no PG_RETURN_UINT32 here */

	/*
	 * Similarly, NaNs can have different bit patterns but they should all
	 * compare as equal.  For backwards-compatibility reasons we force them to
	 * have the hash value of a standard NaN.
	 */
	if (isnan(key))
		key = get_float8_nan();

	return hash_bytes((unsigned char *) &key, sizeof(key));
}

Datum spherepoint_hash32(PG_FUNCTION_ARGS)
{
	SPoint	   *p1 = (SPoint *) PG_GETARG_POINTER(0);
	const uint32 h1 = pgs_hashfloat8(p1->lng);
	const uint32 h2 = pgs_hashfloat8(p1->lat);
	...
}
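
The normalization step in the snippet above can be tried outside PostgreSQL with a small standalone sketch. It is illustrative only: `sketch_hash_bytes` is a 32-bit FNV-1a stand-in chosen for brevity, not PostgreSQL's `hash_bytes()`, so the hash values will differ from the real ones, and both function names are hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* float8 is assumed to be an 8-byte double on every platform. */
static_assert(sizeof(double) == 8, "float8 is assumed to be 8 bytes");

/* Stand-in byte hash (32-bit FNV-1a); NOT PostgreSQL's hash_bytes(). */
static uint32_t sketch_hash_bytes(const unsigned char *k, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= k[i];
        h *= 16777619u;
    }
    return h;
}

/* Hash a double so that values comparing equal hash equal. */
static uint32_t sketch_hash_double(double key)
{
    unsigned char buf[sizeof(double)];

    /* +0.0 and -0.0 compare equal but differ in the sign bit. */
    if (key == 0.0)
        return 0;

    /* Collapse every NaN bit pattern to one canonical NaN. */
    if (isnan(key))
        key = (double) NAN;

    memcpy(buf, &key, sizeof(key));
    return sketch_hash_bytes(buf, sizeof(buf));
}
```

Because the function takes a plain `double` and returns a 32-bit value, nothing in it depends on the width of a pointer or of Datum.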

Contributor

@vitcpp vitcpp Nov 15, 2023

It seems to work on 64-bit platforms. I may create a PR.
(force-pushed) 6f9a86e

P.S. Compilation fails on PostgreSQL versions 10-12.

Contributor

Well, it seems my proposed solution doesn't work on PG 10-12 because compilation fails. It seems the hash functions are not declared in the headers there, or the headers are different. It is a pity that the hash functions are implemented as "pg-functions" (they accept and return Datum) rather than as plain functions. The hash operation may be called frequently, so calling palloc to wrap a float8 for every hash calculation is not a good approach on 32-bit platforms, I believe. Anyway, I propose to accept the PR and think about the hash functions later in a separate issue.

Contributor

Would this proposed fix be slower on 64-bit machines at all? Not sure what the difference is with Float8GetDatum() on 64-bit.

I'm just wondering if we should do something like

#if IS64BIT /* theoretical - it's more complicated than this, just illustrating */
    ... current code ...
#else
    ... new version of code with Float8GetDatum()
#endif

Contributor

The proposed fix will not be slower on 64-bit platforms. I guess it may be slightly faster, but insignificantly; in general, there is no difference on 64-bit. My patch helps to fix the issue on 32-bit platforms. Float8GetDatum uses palloc to pack a float8 into a Datum. I'm not sure the memory is freed before the end of the transaction, which may lead to huge memory consumption when a huge number of hashes is calculated.

I'm not sure we should use ifdef and create different hash calculation logic. I would like to have the same calculation logic on all platforms. Furthermore, the hash calculation function takes a float8 and returns a uint32, and the sizes of these types are the same on both 32- and 64-bit platforms.

My proposed solution does not compile on version 12 or earlier. I think the original solution proposed by @df7cb, with some modifications (NaN and ±0 processing), would be the better alternative. It doesn't require any external headers.

Contributor

@esabol esabol Nov 17, 2023

@vitcpp wrote:

> Float8GetDatum uses palloc to pack float8 into Datum.

Just to clarify, does Float8GetDatum() call palloc() on 64-bit or only on 32-bit?

Contributor

@vitcpp vitcpp Nov 17, 2023

Float8GetDatum calls palloc on 32-bit platforms because sizeof(Datum) = 4 is not enough to store the 8 bytes of a float8. The macro USE_FLOAT8_BYVAL determines which version is used. It is defined (to 1) only where float8 is passed by value, so on 32-bit builds it is not defined.

If USE_FLOAT8_BYVAL is not defined then the following version is used:

Datum
Float8GetDatum(float8 X)
{
	float8	   *retval = (float8 *) palloc(sizeof(float8));

	*retval = X;
	return PointerGetDatum(retval);
}
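
To make the allocation concern concrete, here is a rough standalone sketch of one allocation per wrapped value that is only released by a bulk reset. `SketchContext` and the `sketch_*` functions are hypothetical stand-ins and do not model PostgreSQL's memory contexts. In particular, how long a real palloc'd float8 lives depends on which memory context is current at the time of the call, which this sketch does not attempt to capture.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SLOTS 512

/* Hypothetical stand-in for a memory context: slots are handed out
 * one at a time and only released all together by a reset. */
typedef struct SketchContext
{
    double slots[SKETCH_SLOTS];
    size_t used;
} SketchContext;

/* By-reference wrapping: every call consumes one fresh slot. */
static const double *sketch_wrap(SketchContext *cxt, double x)
{
    assert(cxt->used < SKETCH_SLOTS);
    cxt->slots[cxt->used] = x;
    return &cxt->slots[cxt->used++];
}

/* Release everything at once, as a context reset would. */
static void sketch_reset(SketchContext *cxt)
{
    cxt->used = 0;
}

/* Wrap ncalls values and report how many bytes are still held. */
static size_t sketch_bytes_held_after(SketchContext *cxt, size_t ncalls)
{
    for (size_t i = 0; i < ncalls; i++)
        (void) sketch_wrap(cxt, (double) i);
    return cxt->used * sizeof(double);
}
```

In this model, memory grows linearly with the number of wrapped values until `sketch_reset` runs, which is the behaviour the comment above worries about if nothing resets the context between hash calls.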

Contributor

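For 32 bits we could use an #ifdef to emulate the palloc as a temporary solution. But I do not like temporary solutions.

uint32 pgs_hashfloat8(double key)
{
#ifndef USE_FLOAT8_BYVAL
	/* float8 is passed by reference: point the Datum at the local copy, no palloc */
	Datum datum = PointerGetDatum(&key);
#else
	Datum datum = Float8GetDatum(key);
#endif

	return DatumGetUInt32(DirectFunctionCall1(hashfloat8, datum));
}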

Contributor Author

df7cb commented Nov 15, 2023

It would avoid the palloc call, true.

TBH, the current implementation is fast on 64-bit platforms, and anyone running database servers on 32-bit today should already be aware that there are limitations. Not sure we have to optimize for that.

@vitcpp vitcpp closed this Nov 20, 2023
Contributor

vitcpp commented Nov 20, 2023

Dear all, I'm going to increment the patch number and create a new release artifact. Please let me know if you have any objections.

@vitcpp vitcpp reopened this Nov 22, 2023
Contributor

vitcpp commented Nov 22, 2023

Sorry, it seems I didn't merge the change before closing. Fixing it now.

@vitcpp vitcpp merged commit f229b2e into postgrespro:master Nov 22, 2023
28 checks passed