Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Core] Text measure cache assumes that strings are immutable (and can return stale values) #225

Open
gw3583 opened this issue Jan 25, 2025 · 10 comments

Comments

@gw3583
Copy link

gw3583 commented Jan 25, 2025

Thanks for the impressive work on this library.

Perhaps this issue is intended behavior, though I didn't notice it in the documentation? It seems like a foot gun that should be documented if intended behavior, but I think it should probably be made to handle the case below correctly.

If you pass a dynamically allocated string to CLAY_TEXT where the content subsequently changes next layout (but the lifetime of the string buffer remains valid), the code that caches text measurements can return stale values, because it only checks the value of the pointer when hashing for the id, rather than the content of the string.

The test case below demonstrates the issue. It asserts by default as it returns a stale measurement for the text buffer the second time layout occurs. If you comment out the cache hit here:

clay/clay.h

Line 1655 in c3fcf6c

return hashEntry;

Then the test case passes.

One potential fix would be to copy the string content when updating / checking a cache item, making use of the existing dynamicStringData array. Then, when checking if the hash entry is valid, the string itself could be compared rather than the pointer (after the main hash check), though managing the lifetimes of those strings may be slightly complicated if they are retained in that cache.

#include <iostream>
#include <cassert>
#include <cstring>

#define CLAY_IMPLEMENTATION
#include "../../clay.h"

static void HandleClayErrors(Clay_ErrorData errorData) {
    printf("%s", errorData.errorText.chars);
}

static Clay_Dimensions MeasureText(Clay_StringSlice text, Clay_TextElementConfig *config, uintptr_t userData) {
    float width = 0.0;

    switch (text.chars[0]) {
        case 'a':
            width = 100.0;
            break;
        case 'b':
            width = 200.0;
            break;
        default:
            width = 10.0;
            break;
    }

    return Clay_Dimensions { width, 16.0 }; 
}

static float DoLayout(const char *text) {
    Clay_String s = { .length = (int) strlen(text), .chars = text };

    Clay_BeginLayout();

    CLAY(CLAY_ID("root")) {
        CLAY_TEXT(s, CLAY_TEXT_CONFIG({}));
    }

    Clay_RenderCommandArray cmdArray = Clay_EndLayout();
    return cmdArray.internalArray[0].boundingBox.width;
}

int main(void) {
    uint64_t totalMemorySize = Clay_MinMemorySize();
    Clay_Arena clayMemory = Clay_CreateArenaWithCapacityAndMemory(totalMemorySize, (char *)malloc(totalMemorySize));
    Clay_Initialize(clayMemory, Clay_Dimensions {1024,768}, Clay_ErrorHandler { HandleClayErrors });
    Clay_SetMeasureTextFunction(MeasureText, 0);

    char textBuffer[32];

    strcpy(textBuffer, "a");
    float w1 = DoLayout(textBuffer);

    strcpy(textBuffer, "b");
    float w2 = DoLayout(textBuffer);

    assert(w1 == 100.0);
    assert(w2 == 200.0);

    return 0;
}
@nicbarker
Copy link
Owner

Hello, thanks for raising this issue. Apologies that it isn't documented better.
I have a couple of thoughts on solutions for this. How would you feel about the addition of a uint32_t cacheKey field added to the text element config, that would be included in the hash? You could simply have an integer that you bump when strings are regenerated, etc.

@nicbarker nicbarker changed the title Text measure cache assumes that strings are immutable (and can return stale values) [Core] Text measure cache assumes that strings are immutable (and can return stale values) Jan 26, 2025
@gw3583
Copy link
Author

gw3583 commented Jan 26, 2025

I don't think that would be particularly helpful in my use case, even though it would be reasonable enough for the simplified test case above.

In my case, I'm calling the library from rust, passing through pointers to heap allocated strings. In the case I ran in to, it wasn't that the content of a string was being regenerated between layouts. What happened was that I passed a string during one layout, then that string was freed, and on the next layout frame a new string was allocated with the same underlying heap pointer but completely different content.

So there's no obvious way to know that the pointer is reused and bump an associated cacheKey field in this scenario.

It would be possible to work around this by building infrastructure on the rust side to avoid this occurring, but it seems fairly unexpected for the case above to not work, I think.

@Pleune
Copy link

Pleune commented Jan 26, 2025

Hello, thanks for raising this issue. Apologies that it isn't documented better. I have a couple of thoughts on solutions for this. How would you feel about the addition of a uint32_t cacheKey field added to the text element config, that would be included in the hash? You could simply have an integer that you bump when strings are regenerated, etc.

I just ran into this same issue. A cacheKey would be a solution, but maybe it would be better to provide an invalidateTextCache(text, config) or something to be called whenever changing or freeing the text.

@Pleune
Copy link

Pleune commented Jan 26, 2025

For anyone who runs into this before there is a good solution, here is my hack in Clay__HashTextWithConfig to just hash the whole string

uint32_t Clay__HashTextWithConfig(Clay_String *text, Clay_TextElementConfig *config) {

   ... snip ...

    // NOTE: temp for for https://github.com/nicbarker/clay/issues/225
    for(int i=0; i<text->length; i++)
        hash *= text->chars[i];

    return hash + 1; // Reserve the hash result of zero as "null id"
}

This performs fine for my application with only very short strings... Of course this could be a decent slowdown if you have walls of text.

@nicbarker
Copy link
Owner

@gw3583 I see, in that case, how about a bool in the text config that hashes the contents of the string instead of the pointer? It's a performance concern for apps that render documents, but should be totally fine if it's opt in 🙂

@nicbarker
Copy link
Owner

@Pleune you read my mind!

@gw3583
Copy link
Author

gw3583 commented Jan 26, 2025

@gw3583 I see, in that case, how about a bool in the text config that hashes the contents of the string instead of the pointer? It's a performance concern for apps that render documents, but should be totally fine if it's opt in 🙂

Yup, that sounds great to me, thanks!

@gw3583
Copy link
Author

gw3583 commented Jan 26, 2025

For anyone who runs into this before there is a good solution, here is my hack in Clay__HashTextWithConfig to just hash the whole string

uint32_t Clay__HashTextWithConfig(Clay_String *text, Clay_TextElementConfig *config) {

   ... snip ...

    // NOTE: temp for for https://github.com/nicbarker/clay/issues/225
    for(int i=0; i<text->length; i++)
        hash *= text->chars[i];

    return hash + 1; // Reserve the hash result of zero as "null id"
}

This performs fine for my application with only very short strings... Of course this could be a decent slowdown if you have walls of text.

Just to clarify - that will work 99.9% of the time, but will fail when you have two strings that result in a hash collision, right? To be resistant to hash collisions, it would need to also store and compare the string after the hash id check.

@nicbarker
Copy link
Owner

@gw3583 yes you're right, I'll think of a solution 👍

@nicbarker
Copy link
Owner

I've created #238 as a halfway solution to this problem, with two caveats:

  • The string contents that are hashed are limited to 256 characters, as some local profiling confirms that it's very slow to hash large bodies of text
  • There isn't currently an implementation that deals with hash collisions, we'll open another PR at some point to handle that

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants