Configurable cache path for List token (local tokenizer file path) #4543

Open
vi3k6i5 opened this issue Oct 14, 2024 · 0 comments
Labels
api: vertex-ai Issues related to the googleapis/python-aiplatform API.

Comments


vi3k6i5 commented Oct 14, 2024

Is your feature request related to a problem? Please describe.
Currently, the Google aiplatform library loads the tokenizer either by downloading it from GitHub or by reading it from a tmpdir cache path that is not configurable at the library level. https://github.com/googleapis/python-aiplatform/blob/main/vertexai/tokenization/_tokenizer_loading.py#L136-L147

Can we make it configurable, like tiktoken or NLTK do? https://github.com/openai/tiktoken/blob/main/tiktoken/load.py#L34-L42 For example, with an environment variable like VERTEX_TOKENIZER_CACHE_DIR?
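For reference, tiktoken's approach (paraphrased from the linked load.py, not copied verbatim) is roughly to consult an environment variable first and only then fall back to a folder under the system temp directory:

```python
import os
import tempfile

# Paraphrase of the tiktoken pattern referenced above: an env-var override
# takes priority, otherwise a fixed folder under the system temp directory.
def resolve_cache_dir() -> str:
    if "TIKTOKEN_CACHE_DIR" in os.environ:
        return os.environ["TIKTOKEN_CACHE_DIR"]
    return os.path.join(tempfile.gettempdir(), "data-gym-cache")
```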

Describe the solution you'd like

Our org does not allow network downloads of files on our deployment servers, so we need to upload the file to a fixed, read-only directory on the server. Being able to configure the cache path for that server would be useful.
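With such an override in place, the workflow on a locked-down server could look something like the sketch below. VERTEX_TOKENIZER_CACHE_DIR is the proposed (not yet existing) variable, the directory and model name are only examples, and the exact shape of the count_tokens result may differ:

```python
import os

# Point the (proposed) override at a read-only directory that the tokenizer
# file was uploaded to ahead of time, so no network download is attempted.
os.environ["VERTEX_TOKENIZER_CACHE_DIR"] = "/opt/vertex-tokenizers"

from vertexai.preview import tokenization

tokenizer = tokenization.get_tokenizer_for_model("gemini-1.5-flash-001")
print(tokenizer.count_tokens("hello world").total_tokens)
```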

Describe alternatives you've considered

I tried setting the TMPDIR env variable, but that applies at the Python global level for all libraries, so it is not really a library-specific configuration.
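For context on why TMPDIR is too blunt: Python's tempfile module reads it process-wide, so the override redirects temporary files for every library, not just the tokenizer cache. A quick illustration:

```python
import os
import tempfile

# TMPDIR must be set before tempfile.gettempdir() is first called, because
# the module caches its result; after that, every library that uses tempfile
# (not only the Vertex tokenizer loader) writes under this directory.
os.environ["TMPDIR"] = "/opt/vertex-tokenizers"  # must exist and be writable

print(tempfile.gettempdir())
```

Note also that CPython's tempfile skips candidate directories it cannot write to, so pointing TMPDIR at a read-only location would silently fall back to another temp directory anyway.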

product-auto-label bot added the api: vertex-ai label Oct 14, 2024