MARISA_SIZE_ERROR: buf.size() > MARISA_UINT32_MAX #1
Comments
It seems that your data has reached the limit of marisa-trie. If you set |
Thanks for the quick response, I'll give that a shot. Do you think it is possible for this library to scale out to support bigger data sets? My naive thought is that I could try moving things from a 32-bit limit to a 64-bit limit. Do you think that would work? Thanks again. |
The library is hard-coded to use |
This limitation should be removed in future... |
@s-yata Any chance this ancient issue will be addressed? I'm running into the same problem |
At the risk of being an echo, I would add that as datasets grow larger, more and more people will run into this issue. Marisa Trie is really great for my work, but on my latest project I've encountered this issue. |
I have encountered this issue as well (for example, when trying to build a trie of around 100 million elements of 100 bytes each). I have noticed that the library is capable of creating files that are larger than 4 GB (2^32 bytes). Data has become bigger since 2016. This data structure is a real gem. Does anyone have ideas or suggestions on how to fix this UInt32 limitation? Is that a few hours or days of work, or more? What really needs to be done? I have not done anything in C++ for a very long time (I am using the Python bindings), but I would be happy to try to help with this issue. My end goal is to be able to create tries of 10 to 100 GB using Python. Thanks in advance for any help or pointers, and congratulations to the author for an amazing library. |
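For a sense of scale, the arithmetic behind the example above can be sketched in a few lines of Python. This is only a rough estimate under stated assumptions: the helper names `estimate_raw_key_bytes` and `exceeds_uint32` are hypothetical and not part of marisa-trie, and the library's actual check (`buf.size() > MARISA_UINT32_MAX` in `tail.cc`) applies to an internal buffer whose size need not equal the raw sum of key lengths.

```python
# Rough size estimate only. The real check in marisa-trie applies to an
# internal buffer, not to the raw sum of key lengths, so treat this as a
# back-of-the-envelope signal rather than a prediction of the error.

UINT32_MAX = 2**32 - 1  # 4294967295, the largest unsigned 32-bit value


def estimate_raw_key_bytes(num_keys: int, avg_key_bytes: int) -> int:
    """Return the total raw key payload in bytes (hypothetical helper)."""
    return num_keys * avg_key_bytes


def exceeds_uint32(total_bytes: int) -> bool:
    """Return True if total_bytes does not fit in an unsigned 32-bit integer."""
    return total_bytes > UINT32_MAX


# The example from the comment above: about 100 million keys of 100 bytes each.
total = estimate_raw_key_bytes(100_000_000, 100)
print(total, exceeds_uint32(total))
```

With these numbers the raw payload is about 10^10 bytes, more than twice what a 32-bit size can represent, which is consistent with this workload running into the limit.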
Datasets are larger in 2024, and I've been running into this issue myself. |
Hello Susumu,
We are using the Python marisa-trie wrapper (https://github.com/kmike/marisa-trie), which wraps your library. The amount of data we have been placing in the trie has been increasing over time, and the most recent trie generation caused the following overflow:
File "marisa_trie.pyx", line 422, in marisa_trie.BytesTrie.__init__ (src/marisa_trie.cpp:7670)
File "marisa_trie.pyx", line 127, in marisa_trie.Trie._build (src/marisa_trie.cpp:2768)
RuntimeError: lib/marisa/grimoire/trie/tail.cc:192: MARISA_SIZE_ERROR: buf.size() > MARISA_UINT32_MAX
If there's any more info you need please let me know!