Enable SHORTFIN_ENABLE_TOKENIZERS in Linux package builds #679

Open
2 of 4 tasks
ScottTodd opened this issue Dec 11, 2024 · 1 comment · May be fixed by #688

ScottTodd commented Dec 11, 2024

This will allow us to replace the current loosely coupled tokenizers Python package (https://pypi.org/project/tokenizers/) with a source dependency on https://github.com/mlc-ai/tokenizers-cpp, which provides bindings to the underlying Rust library from https://github.com/huggingface/tokenizers and the C++ library from https://github.com/google/sentencepiece.

See also:

Main tasks for this issue:

@ScottTodd ScottTodd self-assigned this Dec 11, 2024
ScottTodd commented

We also can't use a GitHub Action to set up Rust on the host machine, as the shortfin CI workflow does:

- name: Setup Rust
  # For now, `SHORTFIN_ENABLE_TOKENIZERS` is only enabled for 'Ubuntu (Clang)(full)'.
  # TODO(#620): Enable on Windows.
  if: ${{ matrix.name == 'Ubuntu (Clang)(full)'}}
  uses: dtolnay/rust-toolchain@315e265cd78dad1e1dcf3a5074f6d6c47029d5aa # master branch (Nov 18, 2024)
  with:
    toolchain: stable

since the Linux package builds are run in a manylinux docker container:

# Trampoline to the docker container if running on the host.
if [ -z "${__MANYLINUX_BUILD_WHEELS_IN_DOCKER-}" ]; then
  run_on_host "$@"
else
  run_in_docker "$@"
fi
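
Because the build script re-invokes itself inside the manylinux container, a Rust toolchain installed on the host by a GitHub Action is not visible to the wheel build. As a minimal sketch (not the project's actual script), the hypothetical helper below shows how code running inside the container could check for `cargo` and `rustc` before turning the option on. The function name `tokenizers_cmake_flag` is made up for illustration, and the sketch assumes `SHORTFIN_ENABLE_TOKENIZERS` is passed to CMake as a `-D` option.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of build_linux_package.sh): print the CMake
# argument for SHORTFIN_ENABLE_TOKENIZERS depending on whether a Rust
# toolchain is on PATH in the *current* environment (host or container).
tokenizers_cmake_flag() {
  if command -v cargo >/dev/null 2>&1 && command -v rustc >/dev/null 2>&1; then
    echo "-DSHORTFIN_ENABLE_TOKENIZERS=ON"
  else
    echo "-DSHORTFIN_ENABLE_TOKENIZERS=OFF"
  fi
}

# Report what this environment would use. The result only matters when this
# runs inside the container, since that is where the wheels are built.
echo "Would configure with: $(tokenizers_cmake_flag)"
```

Run on the host, this reflects the host's toolchain. Run inside the container, it reflects the image's contents, which is why the toolchain has to come from the docker image rather than from a workflow step.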

ScottTodd added a commit to nod-ai/base-docker-images that referenced this issue Dec 12, 2024
Progress on nod-ai/shark-ai#679

Tested:
```bash
cd base-docker-images
sudo docker buildx build --file dockerfiles/manylinux_x86_64.Dockerfile . --tag manylinux:latest

cd ../shark-ai
CACHE_DIR=~/.shark-ai-cache \
  OUTPUT_DIR=/tmp/wheelhouse \
  MANYLINUX_DOCKER_IMAGE=manylinux:latest \
  sudo -E ./shortfin/build_tools/build_linux_package.sh

# ******************** BUILD COMPLETE ********************
# + echo 'Generated binaries:'
# Generated binaries:
# + ls -l /tmp/wheelhouse
# total 40084
# -rw-r--r-- 1 root root 13691622 Dec 11 15:48 shortfin-3.0.0rc20241118-cp311-cp311-manylinux_2_28_x86_64.whl
# -rw-r--r-- 1 root root 13682296 Dec 11 15:50 shortfin-3.0.0rc20241118-cp312-cp312-manylinux_2_28_x86_64.whl
# -rw-r--r-- 1 root root 13666409 Dec 11 15:52 shortfin-3.0.0rc20241118-cp313-cp313-manylinux_2_28_x86_64.whl

```

Note that this image is in use without pinning in some repositories, but
from what I can tell, all affected workflows are already failing and the
code is unmaintained. For example:
https://github.com/nod-ai/SRT/blob/373685f1cfff5dd6d934bf5858b6d58fc7a5bcdf/build_tools/pkgci/build_linux_packages.sh#L67.
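
The pinning concern can be illustrated with a small check. This is a sketch only: `is_pinned_image` is a hypothetical helper and the second image reference is a placeholder. It treats a reference as pinned when it names a `sha256` digest (the `name@sha256:<hex>` form), which is a purely syntactic test and does not verify that the digest exists.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeed if the image reference is pinned by digest
# (contains "@sha256:"), fail if it relies on a tag that can be moved.
is_pinned_image() {
  case "$1" in
    *@sha256:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Placeholder references for illustration; the digest is not a real image.
for ref in \
  "manylinux:latest" \
  "registry.example/manylinux@sha256:0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
do
  if is_pinned_image "$ref"; then
    echo "$ref: pinned by digest"
  else
    echo "$ref: floating tag, picks up image changes on next pull"
  fi
done
```

Workflows that reference the image by a floating tag pick up changes such as a newly added toolchain on their next pull, while digest-pinned workflows are unaffected until the pin is updated.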