Configurable cache path for List token (local tokenizer file path) #4543

Open
vi3k6i5 opened this issue Oct 14, 2024 · 0 comments
Labels
api: vertex-ai Issues related to the googleapis/python-aiplatform API.

Comments


vi3k6i5 commented Oct 14, 2024

Is your feature request related to a problem? Please describe.
Currently, the Google aiplatform library loads the tokenizer either by downloading it from GitHub or by reading it from a tmpdir cache path that is not configurable at the library level. https://github.com/googleapis/python-aiplatform/blob/main/vertexai/tokenization/_tokenizer_loading.py#L136-L147

Can we make it configurable, like tiktoken or NLTK do? https://github.com/openai/tiktoken/blob/main/tiktoken/load.py#L34-L42 For example, with an environment variable like VERTEX_TOKENIZER_CACHE_DIR?
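For reference, tiktoken's approach (paraphrased from the linked load.py, not copied verbatim) is roughly to consult an environment variable first and only then fall back to a folder under the system temp directory:

```python
import os
import tempfile

# Paraphrase of the tiktoken pattern referenced above: an env-var override
# takes priority, otherwise a fixed folder under the system temp directory.
def resolve_cache_dir() -> str:
    if "TIKTOKEN_CACHE_DIR" in os.environ:
        return os.environ["TIKTOKEN_CACHE_DIR"]
    return os.path.join(tempfile.gettempdir(), "data-gym-cache")
```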

Describe the solution you'd like

Our org does not allow network downloads of files on our deployment servers, so we need to upload the file to a fixed, read-only directory on the server. Being able to configure the cache path for that server would be useful.
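With such an override in place, the workflow on a locked-down server could look something like the sketch below. VERTEX_TOKENIZER_CACHE_DIR is the proposed (not yet existing) variable, the directory and model name are only examples, and the exact shape of the count_tokens result may differ:

```python
import os

# Point the (proposed) override at a read-only directory that the tokenizer
# file was uploaded to ahead of time, so no network download is attempted.
os.environ["VERTEX_TOKENIZER_CACHE_DIR"] = "/opt/vertex-tokenizers"

from vertexai.preview import tokenization

tokenizer = tokenization.get_tokenizer_for_model("gemini-1.5-flash-001")
print(tokenizer.count_tokens("hello world").total_tokens)
```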

Describe alternatives you've considered

I tried setting the TMPDIR env variable, but that applies at the Python global level for all libraries, so it is not really a library-specific configuration.
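For context on why TMPDIR is too blunt: Python's tempfile module reads it process-wide, so the override redirects temporary files for every library, not just the tokenizer cache. A quick illustration:

```python
import os
import tempfile

# TMPDIR must be set before tempfile.gettempdir() is first called, because
# the module caches its result; after that, every library that uses tempfile
# (not only the Vertex tokenizer loader) writes under this directory.
os.environ["TMPDIR"] = "/opt/vertex-tokenizers"  # must exist and be writable

print(tempfile.gettempdir())
```

Note also that CPython's tempfile skips candidate directories it cannot write to, so pointing TMPDIR at a read-only location would silently fall back to another temp directory anyway.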

product-auto-label bot added the api: vertex-ai label Oct 14, 2024