OCR CI build artifact (#153)

* init for [IDWA-OCR-72] Install OCR into an executable
* edit readme with the build command
* add refs to form_filled to use this without args
* lint
* lint
* lint
* build/upload artifact
* pip install pyinstaller
* pip install pyinstaller
* rm dev
* put build in dependencies
* add requirements.txt and use pyinstaller cli
* rm working directory
* point to main and not dir main
* try dist/ and --onefile
* try -windowed
* dist/main
* upgrade upload action
* macos-latest
* try using assets from tests
* try using assets from tests
* revert back to dup assets
* CLI for better handling of arguments
* lint
* lint
* rm args with pyinstaller because of new cli
* use ^ woth version
* install docopt with gh action job
* rm unused assets in the ocr dir
* docs
* docs
* upload bin for each os
* matrix exp
* matrix exp
* matrix exp
* zip
* zip
* check path
* check path
* check path
* whoops
* try gzexe
* &&
* wip
* try building release
* fix the needs:
* try building release
* try building release
* try building release
* try building release
* add checkout
* change from action
* add another checkout
* add paths
* try using workflow_call
* try using workflow_call
* wip
* using download action
* using download action
* try that
* token
* try for loop
* dont use matrix
* token ref
* try with workspace
* try with workspace
* try with workspace
* whoops
* working dir
* working dir
* add --repo
* think i got it
* try encoding with jq
* try encoding with jq
* github.repository
* full url
* matrix again
* see what dir we're in
* path to artifactas
* put everything in first job with create
* put everything in first job with create
* upload all in dir
* write all
* try dif action
* try dif action
* try with content
* upgrade action
* new output
* upgrade upload and download versions
* just path for download
* ls
* ls
* add to uplaod
* try full workflow
* forgot to switch needs job
* again
* fix file names
* fix file names
* change release title name
* missed diffs
* clean-up
* try changing ref
* try changing ref
* try changing ref
* try changing ref
* try changing ref
* try changing ref
* try changing ref
* create and see if upload asset chooses it
* create and see if upload asset chooses it
* create and see if upload asset chooses it
* create and see if upload asset chooses it
* create and see if upload asset chooses it
* create and see if upload asset chooses it
* create and see if upload asset chooses it
* use ref at checkout
* ls
* use ref at checkout
* try new upload action
* fix script
* create tag again
* full workflow
* full workflow
* full workflow
* cleaned up and made as workflow_dispatch

---------

Co-authored-by: Derek Dombek <derek.a.dombek.com>
CDCgov · Aug 7, 2024 · 5aef439
1 parent 8de679f

Showing 7 changed files with 106 additions and 27 deletions.
diff --git a/.github/workflows/build-ocr.yml b/.github/workflows/build-ocr.yml
@@ -0,0 +1,53 @@
+name: Build & Upload OCR Binaries
+on:
+  workflow_call:
+    outputs:
+      output-file:
+        description: "The first output string"
+        value: ${{ jobs.build.outputs.output_artifacts }}
+  workflow_dispatch:
+
+jobs:
+  build:
+    strategy:
+      matrix:
+        include:
+          - os: macos-latest
+            name: macos
+            cmd: >
+              pyinstaller -F -w -n main-macos ./OCR/ocr/main.py &&
+              cd dist/ &&
+              zip -r9 main-macos main-macos
+            out_file: main-macos.zip
+          - os: windows-latest
+            name: windows
+            cmd: pyinstaller -F -w -n main-windows ./OCR/ocr/main.py
+            out_file: main-windows.exe
+          - os: ubuntu-latest
+            name: ubuntu
+            cmd: >
+              pyinstaller -F -w -n main-ubuntu ./OCR/ocr/main.py &&
+              cd dist/ &&
+              zip -r9 main-ubuntu main-ubuntu
+            out_file: main-ubuntu.zip
+    runs-on: ${{ matrix.os }}
+    outputs:
+      output_artifacts: ${{ steps.artifacts.outputs.matrix.out_file }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r requirements.txt pyinstaller
+          pip install docopt
+      - name: Build binaries for all OS's
+        run: ${{ matrix.cmd }}
+      - name: Upload Artifacts To Workflow
+        uses: actions/upload-artifact@v4
+        id: artifacts
+        with:
+          name: main-${{ matrix.name }}
+          path: ./dist/${{ matrix.out_file}}
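The macOS and Ubuntu matrix entries chain `cd dist/ && zip -r9 <name> <name>` after PyInstaller. As a rough sketch (not part of this commit), the same packaging step could be done portably with the standard-library `zipfile` module. It assumes the PyInstaller output already sits in `dist/`, and the helper name `zip_binary` is hypothetical.

```python
import zipfile
from pathlib import Path


def zip_binary(dist_dir: Path, name: str) -> Path:
    """Compress dist/<name> into dist/<name>.zip, roughly like `zip -r9 <name> <name>`.

    Hypothetical helper: the workflow itself calls the `zip` CLI.
    """
    source = dist_dir / name
    archive = dist_dir / f"{name}.zip"
    # compresslevel=9 mirrors zip's -9 flag; DEFLATE is the method zip uses by default.
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9) as zf:
        if source.is_dir():
            # -r: recurse into a directory (e.g. a macOS .app bundle), storing paths relative to dist/.
            for path in sorted(source.rglob("*")):
                zf.write(path, path.relative_to(dist_dir))
        else:
            # On Unix-like runners ZipFile.write stores the permission bits,
            # so `unzip` can restore the executable bit.
            zf.write(source, source.name)
    return archive
```

For example, `zip_binary(Path("dist"), "main-ubuntu")` would produce `dist/main-ubuntu.zip`, matching the `out_file` that the upload step expects.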
diff --git a/.github/workflows/release-ocr.yml b/.github/workflows/release-ocr.yml
@@ -1,47 +1,38 @@
 name: Release MDE-OCR artifacts
+run-name: Release MDE-OCR artifacts - by @${{ github.actor }}
 on:
-    # workflow_dispatch:
-    #     inputs:
-    #         tag:
-    #             description: 'target environment'
-    #             required: true
-    push:
-        branches:
-          - idwa-ocr-ci-for-executable
-        paths:
-          - .github/workflows/release-ocr.yml
-          - .github/workflows/build-ocr.yml
-          - OCR/**
-        # tags:
-        #     - 'v*'
+    workflow_dispatch:
+        inputs:
+            tag:
+                description: 'Version tag for new release'
+                required: true
 jobs:
   create-release:
     name: Create Release
-
     runs-on: [ubuntu-latest]
     permissions:
         contents: write
     steps:
     - uses: actions/checkout@v4
     - name: Create tag
-      uses: actions/github-script@v5
+      uses: actions/github-script@v7
       with:
         script: |
             github.rest.git.createRef({
                 owner: context.repo.owner,
                 repo: context.repo.repo,
-                ref: 'refs/tags/1.0.0',
+                ref: 'refs/tags/${{ github.event.inputs.tag }}',
                 sha: context.sha
             })
     - name: Create release
       id: create_release
       env:
         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        tag: ${{ github.ref_name }}
+        tag: ${{ github.event.inputs.tag }}
       run: |
         gh release create "$tag" \
             --repo="$GITHUB_REPOSITORY" \
-            --title="MDE-OCR ${tag#v}" \
+            --title="MDE-OCR ${tag}" \
             --generate-notes
     - name: Output Release URL File
       run: echo "${{ steps.create_release.outputs.upload_url }}" > release_url.txt
@@ -62,9 +53,8 @@ jobs:
           with:
             path: artifacts
             merge-multiple: true
-        - name: Upload release binaries
-          uses: alexellis/[email protected]
-          env:
-            GITHUB_TOKEN: ${{ github.token }}
+        - name: Release Upload Assets
+          uses: jaywcjlove/github-action-upload-assets@main
           with:
-            asset_paths: '["./artifacts/*"]'
+            tag: ${{ github.event.inputs.tag }}
+            asset-path: '["./artifacts/*"]'
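The `Create tag` step calls Octokit's `github.rest.git.createRef`, which wraps the REST endpoint `POST /repos/{owner}/{repo}/git/refs`. Below is a minimal standard-library sketch of the same request, shown only to make the payload explicit. It builds the request without sending it, and the function name `build_create_tag_request` is hypothetical.

```python
import json
import urllib.request


def build_create_tag_request(owner: str, repo: str, tag: str, sha: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the REST call behind github.rest.git.createRef."""
    body = {
        # Same shape as the workflow script: a fully qualified tag ref plus the commit SHA.
        "ref": f"refs/tags/{tag}",
        "sha": sha,
    }
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/git/refs",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it. The token needs `contents: write`, which is the permission the `create-release` job requests.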
diff --git a/OCR/README.md b/OCR/README.md
@@ -29,6 +29,10 @@ Run main, hoping to convert this to a cli at some point
 poetry run main
 ```
 
+To build the OCR service into an executable artifact
+```shell
+poetry run build
+```
 
 Adding new dependencies
 ```shell

diff --git a/OCR/ocr/pyinstaller.py b/OCR/ocr/pyinstaller.py
@@ -0,0 +1,23 @@
+import PyInstaller.__main__
+from pathlib import Path
+
+HERE = Path(__file__).parent.absolute()
+path_to_main = str(HERE / "main.py")
+
+
+# This function installs/packages the main OCR function as an executable.
+# You could also use the command line: running `pyinstaller ./OCR/ocr/main.py -F -w` works the same as the function below.
+# If you need to add asset paths, follow the example below.
+def install():
+    PyInstaller.__main__.run(
+        [
+            path_to_main,
+            "--onefile",
+            "--windowed",
+            # SOURCE:DESTINATION
+            # "--add-data=ocr/assets/form_filled.png:assets/",
+            # "--add-data=ocr/assets/form_segmention_template.png:assets/",
+            # "--add-data=ocr/assets/labels.json:assets/",
+            # other pyinstaller options...
+        ]
+    )
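The commented-out `--add-data` entries copy assets into an `assets/` folder inside the bundle. When a PyInstaller build runs, the bootloader sets `sys.frozen` and exposes the unpacked bundle directory as `sys._MEIPASS`. Outside a build, neither attribute exists. Below is a minimal sketch of a lookup that works both ways. It assumes assets live in an `assets/` directory next to the calling module during development, and the helper `asset_path` is hypothetical, not part of the OCR code.

```python
import sys
from pathlib import Path
from typing import Optional


def asset_path(relative: str, dev_root: Optional[Path] = None) -> Path:
    """Locate a data file bundled with --add-data=SRC:assets/ (hypothetical helper)."""
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        # Inside a PyInstaller build: --onefile unpacks the bundle to a temp dir at startup.
        base = Path(sys._MEIPASS)
    else:
        # Plain `python` / `poetry run`: assume assets/ sits in dev_root (default: this file's directory).
        base = dev_root if dev_root is not None else Path(__file__).resolve().parent
    return base / "assets" / relative
```

With this helper, code such as `asset_path("form_filled.png")` would resolve the same file whether the OCR service is run from source or from the built executable.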
diff --git a/OCR/poetry.lock b/OCR/poetry.lock
diff --git a/OCR/pyproject.toml b/OCR/pyproject.toml
@@ -34,6 +34,7 @@ build-backend = "poetry.core.masonry.api"
 
 [tool.poetry.scripts]
 main = "ocr.main:main"
+build = "ocr.pyinstaller:install"
 
 [tool.ruff]
 line-length = 118
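The new `build = "ocr.pyinstaller:install"` entry is what makes the README's `poetry run build` work. The script target is written as `module:function`, and the runner imports the module and calls the named attribute. Below is a rough sketch of that resolution using only `importlib`. The name `resolve_script` is hypothetical, and Poetry's own implementation may differ in detail.

```python
from importlib import import_module
from typing import Callable


def resolve_script(spec: str) -> Callable[..., object]:
    """Turn a "package.module:function" spec, as in [tool.poetry.scripts], into a callable."""
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:function', got {spec!r}")
    target: object = import_module(module_name)
    # Allow dotted attributes after the colon, e.g. "pkg.mod:Class.method".
    for attr in attr_path.split("."):
        target = getattr(target, attr)
    if not callable(target):
        raise TypeError(f"{spec!r} does not resolve to a callable")
    return target
```

Assuming the `ocr` package is importable (for example after `poetry install`), `resolve_script("ocr.pyinstaller:install")()` would start the same PyInstaller build as `poetry run build`.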

diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,7 @@
+numpy==1.26.4
+opencv-python==4.9.0.80
+python-dotenv==1.0.1
+Pillow>=10.3.0
+torch==1.13.1
+docopt==0.6.2
+git+https://github.com/huggingface/transformers.git