feat(python, rust!): Add disable_string_cache
#11020
Why two functions rather than one |
Some discussion on that in #8631 and in the linked issue. I'm open to alternatives. But |
Alright, I don't have strong opinions about it. |
I'd rather call it something that reflects that action. |
Basically, a few things can happen:
The refcount is an implementation detail that we shouldn't show to users. So we have three functions:
This is what I implemented. I don't see a more user-friendly way to do this. |
I think that we do need to document and make clear the implementation details as you can get race conditions. If a contextmanager (thread A) incremented the cache and another thread calls |
@ritchie46 That's a good point. I think what we should do to solve that is the following (in pseudocode):

    # Initial condition: string cache enabled
    global_string_cache_enabled = True
    global_refcount = 1

    def enable_string_cache():
        if not global_string_cache_enabled:
            global_string_cache_enabled = True
            _start_using_cache()

    def disable_string_cache():
        if global_string_cache_enabled:
            _stop_using_cache()
            global_string_cache_enabled = False

    def _start_using_cache():
        global_refcount += 1

    def _stop_using_cache():
        global_refcount -= 1
        if global_refcount == 0:
            _clear_cache()

Not shown here is that the context managers also use |
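For readers who want to run the idea, here is a minimal Python sketch of the pseudocode in this comment. It is a toy model, not the Polars implementation: the class name `StringCacheState`, the `cache` dict, and the use of a `threading.Lock` in place of atomics are all assumptions made for illustration, and the sketch starts with the cache disabled rather than enabled.

```python
import threading


class StringCacheState:
    """Toy model of the proposed design; not the actual Polars implementation."""

    def __init__(self):
        self._lock = threading.Lock()  # assumption: a lock stands in for atomics
        self.enabled = False           # user-facing on/off flag
        self.refcount = 0              # number of current users of the cache
        self.cache = {}                # stand-in for the string-to-category map

    def enable_string_cache(self):
        with self._lock:
            if not self.enabled:       # repeated calls do not bump the refcount
                self.enabled = True
                self._start_using_cache()

    def disable_string_cache(self):
        with self._lock:
            if self.enabled:           # repeated calls do not drop it twice
                self._stop_using_cache()
                self.enabled = False

    def _start_using_cache(self):
        self.refcount += 1

    def _stop_using_cache(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.cache.clear()         # last user gone: clear the cache


state = StringCacheState()
state.enable_string_cache()
state.enable_string_cache()            # second call is a no-op
state.cache["a"] = 0
refcount_after_enabling_twice = state.refcount
state.disable_string_cache()           # one call is enough to release
```

The boolean flag is what makes enabling and disabling idempotent: the global switch contributes at most one reference, however many times it is flipped.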
The way I see it, the global string cache isn't thread-safe anyway. If one thread enables the global string cache, it will disrupt any other threads that are trying to construct local categoricals. So I don't really get the discussion about thread safety, because it's not designed to be thread-safe (it's global...). But maybe I am misunderstanding things here. We can add a warning somewhere in the docs to document this behavior, of course. |
Sure, but enabling the string cache doesn't break a program, disabling does. We use the atomic reference counting to ensure a thread that holds the string cache will have the string cache for that lifetime. I think we must guarantee that, unless you call a very low level function like We do atomic reference counting to ensure it is thread safe with regard to holding the string cache.
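The guarantee described in this comment (a thread that holds the string cache keeps it for that lifetime) can be illustrated with a small Python sketch. Everything here is hypothetical and only models the behaviour: `RefCountedCache`, `hold`, and the `entries` dict are invented names, and a lock replaces the atomic counter used on the Rust side.

```python
import threading
from contextlib import contextmanager


class RefCountedCache:
    """Hypothetical model: the cache is cleared only when nobody holds it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._refcount = 0
        self.entries = {}

    def acquire(self):
        with self._lock:
            self._refcount += 1

    def release(self):
        with self._lock:
            self._refcount -= 1
            if self._refcount == 0:
                self.entries.clear()

    @contextmanager
    def hold(self):
        # Plays the role of the context manager: hold for the block's lifetime.
        self.acquire()
        try:
            yield self
        finally:
            self.release()


cache = RefCountedCache()
cache.acquire()  # stands in for a global "enable" call


def worker(started, may_finish, seen):
    with cache.hold():
        cache.entries["a"] = 0
        started.set()
        may_finish.wait()                 # main thread "disables" in the meantime
        seen.append(dict(cache.entries))  # entries are still there


started, may_finish, seen = threading.Event(), threading.Event(), []
thread = threading.Thread(target=worker, args=(started, may_finish, seen))
thread.start()
started.wait()
cache.release()  # stands in for a global "disable" while the worker holds
may_finish.set()
thread.join()
```

In this model the "disable" from the main thread only drops one reference, so the worker still sees its entries; the cache is cleared when the worker leaves its `with` block.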
It does matter, as then the string cache will never get cleared. With the context manager and on the rust side the |
@orlp Not sure how that extra boolean state would help matters here? Maybe we can discuss on Thursday. Discussed the design with Ritchie at the office and I updated the PR with a warning that |
This PR is now done. @orlp mind reviewing to check if I implemented the pseudocode correctly? |
crates/polars-core/src/chunked_array/logical/categorical/string_cache.rs
Closes #10425

Changes

- `enable_string_cache`: the function now always enables the string cache.
- Add `disable_string_cache`. This disables the string cache.
- Rename `IUseStringCache` to `StringCacheHolder`. `IUse...` seemed like a weird name to me, but maybe I am missing some convention here.

API is now consistent across the Rust and Python side, with the exception that Python offers a context manager instead of an RAII object.