Updating (#183)

* Update README.md
* Update subgen.py
* Fix detect language flow for non-bazarr
* Bump version
* Fix pyav error catch
* Fix for LRC putting newlines inappropriately
* Hopefully fix blank exceptions
  next() may have been exhausted in previous version too early. Trying a list instead and other catches.
* version bump because I'm an idiot
* Take 2 on has_subtitle_language_in_file
* Added plex ability to queue future episodes.
  PLEX_QUEUE_NEXT_EPISODE and PLEX_QUEUE_SERIES
* Update README.md
* Removed debugging statements
* Fix subtitle naming for translate
  Always default the subtitle name to eng, unless namesublang set to something else.
* Fix semicolon because i'm editing in a browser on my phone...
* Fix subtitle naming logic
* Version bump and fix default actions if translating to english
* Update launcher.py
* Fixed deprecated call to transcribe_stable -> transcribe
* Clarify Bazarr setup
* Update README.md
* Potential fix for garbage collector
* Fix LRC generation not being skipped properly when it already exists.
* Move LRC check elsewhere
* somehow deleted the name of a function...
* Add Afar
* Fix Afar typo...
* Fix for monitor files
* Properly de-duplicate the queue and processing
* Define the queue properly...
* Fix where task_queue is defined.
* Don't purge model with active transcriptions.
* Add mka audio extension.
* Clean up some logging
  More to come...
* Double log line removed
* renamed function and cleaned up readability
* fix for lrc files
* Attempt to make a ctranslate2 image with compute 5 capability
  Should support older GPUs
* Create build_GPU_Compute5.yml
* Update build_GPU_Compute5.yml
* Update Dockerfile.compute5
* Update Dockerfile.compute5
* Update build_GPU_Compute5.yml
* Update build_GPU_Compute5.yml
* Update build_GPU_Compute5.yml
* Create Dockerfile.compute5.0
* Update build_GPU_Compute5.yml
* Update Dockerfile.compute5.0
* Delete Dockerfile.compute5.0
* Delete Dockerfile.compute5
* Delete .github/workflows/build_GPU_Compute5.yml
* attempt to make the image smaller
* Update Dockerfile.cpu
  alpine doesn't have torch
* Update Dockerfile.cpu
* Update Dockerfile.cpu
* Update Dockerfile.cpu
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile.cpu
* Update Dockerfile
* Update Dockerfile.cpu
* Update Dockerfile.cpu
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile.cpu
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile.cpu
* Update Dockerfile
* Print out which file we're actively working on and updated Queue functions
* Remove unused function
* Update calver.yml
McCloudS · Feb 6, 2025 · 1de6a10
1 parent 2c7f526
commit 1de6a10

Showing 7 changed files with 426 additions and 159 deletions.
diff --git a/.github/workflows/calver.yml b/.github/workflows/calver.yml
@@ -16,23 +16,17 @@ jobs:
       - name: Checkout code
         uses: actions/checkout@v3
         with:
-          # Fetch only the latest commit initially
-          fetch-depth: 1
+          # Fetch the full history, it's important here
+          fetch-depth: 0
           ref: main
 
-      - name: Fetch commits for this month
-        run: |
-          # Fetch commits starting from the first day of the current month
-          YEAR=$(date +%Y)
-          MONTH=$(date +%m)
-          git fetch --shallow-since="$YEAR-$MONTH-01"
-
       - name: Calculate version
         id: version
         run: |
           # Calculate the commit count for this month
           YEAR=$(date +%Y)
           MONTH=$(date +%m)
+          # count commits since start of the month, limiting scope
           COMMIT_COUNT=$(git rev-list --count HEAD --since="$YEAR-$MONTH-01")
           echo "COMMIT_COUNT=$COMMIT_COUNT"
           echo "VERSION=${YEAR}.${MONTH}.${COMMIT_COUNT}" >> $GITHUB_ENV
@@ -56,5 +50,10 @@ jobs:
           # Amend the most recent commit, reusing the previous commit message
           git commit --amend --reuse-message=HEAD --author="${GIT_AUTHOR_NAME} <${GIT_AUTHOR_EMAIL}>"
 
-          # Push the amended commit
-          git push --force
+          # Attempt a regular push first.  If it fails because of remote changes, use --force-with-lease cautiously.
+          git push origin HEAD:main
+
+          # Alternative:  Use --force-with-lease if a regular push fails
+          # This is much safer than --force, but still requires care
+          # If this fails as well (e.g., very recent conflict), you'll need manual intervention.
+          # git push --force-with-lease origin HEAD:main
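
The workflow above builds the version as `YEAR.MONTH.COMMIT_COUNT`, where the count covers commits since the first day of the month. That count is only meaningful if the month's commits are present in the checkout, which is what `fetch-depth: 0` ensures. A minimal Python sketch of the same arithmetic follows; `calver` is a hypothetical helper for illustration only (it is not part of the workflow), and the commit count is passed in rather than read from git.

```python
from datetime import date


def calver(today: date, commit_count: int) -> str:
    """Build a YEAR.MONTH.COMMIT_COUNT string like the workflow's VERSION."""
    # Mirrors `date +%Y` and `date +%m`: four-digit year, zero-padded month.
    return f"{today.year:04d}.{today.month:02d}.{commit_count}"


if __name__ == "__main__":
    # Third commit of February 2025.
    print(calver(date(2025, 2, 6), 3))
```

Note that the month stays zero-padded while the commit count does not, matching the shell expression `${YEAR}.${MONTH}.${COMMIT_COUNT}`.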
diff --git a/Dockerfile b/Dockerfile
@@ -1,23 +1,41 @@
+# Stage 1: Builder
+FROM nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04 AS builder
+
+WORKDIR /subgen
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    python3 \
+    python3-pip \
+    ffmpeg \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+
+# Copy requirements and install Python dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+# Copy application code
+COPY . .
+
+# Stage 2: Runtime
 FROM nvidia/cuda:12.3.2-cudnn9-runtime-ubuntu22.04
 
 WORKDIR /subgen
 
-ADD https://raw.githubusercontent.com/McCloudS/subgen/main/requirements.txt /subgen/requirements.txt
+# Copy necessary files from the builder stage
+COPY --from=builder /subgen/launcher.py .
+COPY --from=builder /subgen/subgen.py .
+COPY --from=builder /subgen/language_code.py .
+COPY --from=builder /usr/local/lib/python3.10/dist-packages /usr/local/lib/python3.10/dist-packages
 
-RUN apt-get update \
-    && apt-get install -y \
-        python3 \
-        python3-pip \
-        ffmpeg \
-        git \
-    && apt-get clean \
-    && rm -rf /var/lib/apt/lists/* \
-    && pip3 install -r requirements.txt
+# Install runtime dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ffmpeg \
+    python3 \
+    && rm -rf /var/lib/apt/lists/*
 
 ENV PYTHONUNBUFFERED=1
 
-ADD https://raw.githubusercontent.com/McCloudS/subgen/main/launcher.py /subgen/launcher.py
-ADD https://raw.githubusercontent.com/McCloudS/subgen/main/subgen.py /subgen/subgen.py
-ADD https://raw.githubusercontent.com/McCloudS/subgen/main/language_code.py /subgen/language_code.py
-
-CMD [ "bash", "-c", "python3 -u launcher.py" ]
+# Set command to run the application
+CMD ["python3", "launcher.py"]
diff --git a/Dockerfile.cpu b/Dockerfile.cpu
@@ -1,23 +1,32 @@
-FROM python:3.11-slim-bullseye
+# === Stage 1: Build dependencies and install packages ===
+FROM python:3.11-slim-bullseye AS builder
 
 WORKDIR /subgen
 
-ADD https://raw.githubusercontent.com/McCloudS/subgen/main/requirements.txt /subgen/requirements.txt
+# Install required build dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ffmpeg \
+    git \
+    && rm -rf /var/lib/apt/lists/*
 
-RUN apt-get update \
-    && apt-get install -y \
-        python3 \
-        python3-pip \
-        ffmpeg \
-        git \
-    && apt-get clean \
-    && rm -rf /var/lib/apt/lists/* \
-    && pip install -r requirements.txt
+# Copy and install dependencies
+COPY requirements.txt .
+RUN pip install --no-cache-dir --prefix=/install torch torchaudio --extra-index-url https://download.pytorch.org/whl/cpu && pip install --no-cache-dir --prefix=/install -r requirements.txt --extra-index-url https://download.pytorch.org/whl/cpu
 
-ENV PYTHONUNBUFFERED=1
+# === Stage 2: Create a minimal runtime image ===
+FROM python:3.11-slim-bullseye AS runtime
 
-ADD https://raw.githubusercontent.com/McCloudS/subgen/main/launcher.py /subgen/launcher.py
-ADD https://raw.githubusercontent.com/McCloudS/subgen/main/subgen.py /subgen/subgen.py
-ADD https://raw.githubusercontent.com/McCloudS/subgen/main/language_code.py /subgen/language_code.py
+WORKDIR /subgen
+
+# Install only required runtime dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ffmpeg \
+    && rm -rf /var/lib/apt/lists/*
+
+# Copy only necessary files from builder stage
+COPY --from=builder /install /usr/local
+
+# Copy source code
+COPY launcher.py subgen.py language_code.py /subgen/
 
-CMD [ "bash", "-c", "python3 -u launcher.py" ]
+CMD ["python3", "launcher.py"]
diff --git a/README.md b/README.md
@@ -4,6 +4,10 @@
 <details>
 <summary>Updates:</summary>
 
+23 Dec: Added PLEX_QUEUE_NEXT_EPISODE and PLEX_QUEUE_SERIES.  Will automatically start generating subtitles for the next episode in your series, or queue the whole series.  
+
+4 Dec: Added more ENV settings: DETECT_LANGUAGE_OFFSET, PREFERRED_AUDIO_LANGUAGES, SKIP_IF_AUDIO_TRACK_IS, ONLY_SKIP_IF_SUBGEN_SUBTITLE, SKIP_UNKNOWN_LANGUAGE, SKIP_IF_LANGUAGE_IS_NOT_SET_BUT_SUBTITLES_EXIST, SHOULD_WHISPER_DETECT_AUDIO_LANGUAGE
+
 30 Nov 2024: Significant refactoring and handling by Muisje.  Added language code class for more robustness and flexibility and ability to separate audio tracks to make sure you get the one you want.  New ENV Variables: SUBTITLE_LANGUAGE_NAMING_TYPE, SKIP_IF_AUDIO_TRACK_IS, PREFERRED_AUDIO_LANGUAGE, SKIP_IF_TO_TRANSCRIBE_SUB_ALREADY_EXIST
 
     There will be some minor hiccups, so please identify them as we work through this major overhaul.
@@ -117,7 +121,15 @@ If you want to use a GPU, you need to map it accordingly.
 
 #### Unraid
 
-While Unraid doesn't have an app or template for quick install, with minor manual work, you can install it.  See [https://github.com/McCloudS/subgen/issues/37](https://github.com/McCloudS/subgen/discussions/137) for pictures and steps.
+While Unraid doesn't have an app or template for quick install, with minor manual work, you can install it.  See [https://github.com/McCloudS/subgen/discussions/137](https://github.com/McCloudS/subgen/discussions/137) for pictures and steps.
+
+## Bazarr
+
+You only need to configure the Whisper Provider as shown below: <br>
+![bazarr_configuration](https://wiki.bazarr.media/Additional-Configuration/images/whisper_config.png) <br>
+The Docker Endpoint is the IP address and port of your subgen container (e.g. http://192.168.1.111:9000). See https://wiki.bazarr.media/Additional-Configuration/Whisper-Provider/ for more info.  **127.0.0.1 WILL NOT WORK IF YOU ARE RUNNING BAZARR IN A DOCKER CONTAINER!** I recommend not using the Bazarr provider alongside other webhooks in Subgen, or you will likely generate duplicate subtitles. If you are using Bazarr, path mapping isn't necessary, as Bazarr sends the file over HTTP.
+
+**The defaults of Subgen will allow it to run in Bazarr with zero configuration.  However, you will probably want to change, at a minimum, `TRANSCRIBE_DEVICE` and `WHISPER_MODEL`.**
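
The loopback warning above can be turned into a quick sanity check on the endpoint value before saving it in Bazarr. The sketch below is an illustration under assumptions: `is_loopback_endpoint` is a hypothetical helper, not something Subgen or Bazarr provides, and it only inspects the URL text without contacting the server.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_endpoint(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a container or DNS name), so not loopback by this check.
        return False


if __name__ == "__main__":
    for endpoint in ("http://127.0.0.1:9000", "http://192.168.1.111:9000"):
        print(endpoint, is_loopback_endpoint(endpoint))
```

An endpoint that this check flags as loopback would resolve to the Bazarr container itself when Bazarr runs in Docker, which is why the host's LAN address is used in the example above.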
 
 ## Plex
 
@@ -131,12 +143,6 @@ Emby was really nice and provides good information in their responses, so we don
 
 Remember, Emby and Subgen need to be able to see the exact same files at the exact same paths, otherwise you need `USE_PATH_MAPPING`.
 
-## Bazarr
-
-You only need to configure the Whisper Provider as shown below: <br>
-![bazarr_configuration](https://wiki.bazarr.media/Additional-Configuration/images/whisper_config.png) <br>
-The Docker Endpoint is the ip address and port of your subgen container (IE http://192.168.1.111:9000) See https://wiki.bazarr.media/Additional-Configuration/Whisper-Provider/ for more info.  I recommend not enabling this with other webhooks, or you will likely be generating duplicate subtitles. If you are using Bazarr, path mapping isn't necessary, as Bazarr sends the file over http.
-
 ## Tautulli
 
 Create the webhooks in Tautulli with the following settings:
@@ -221,6 +227,15 @@ The following environment variables are available in Docker.  They will default
 | SKIP_IF_AUDIO_TRACK_IS | '' | Takes a pipe separated `\|` list of 3 letter language codes to skip if the file has audio in that language.  This could be used to skip generating subtitles for a language you don't want, like, I speak English, don't generate English subtitles (for example: 'eng\|deu')|
 | PREFERRED_AUDIO_LANGUAGE | 'eng' | If there are multiple audio tracks in a file, it will prefer this setting |
 | SKIP_IF_TO_TRANSCRIBE_SUB_ALREADY_EXIST | True | Skips generation of subtitle if a file matches our desired language already. |
+| DETECT_LANGUAGE_OFFSET | 0 | Allows you to shift when to run detect_language, geared towards avoiding introductions or songs. |
+| PREFERRED_AUDIO_LANGUAGES | 'eng' | Pipe separated list of preferred audio languages. If there are multiple audio tracks in a file, it will prefer a track in one of these languages. |
+| SKIP_IF_AUDIO_TRACK_IS | '' | Takes a pipe separated list of ISO 639-2 languages. Skips generation of subtitles if the file has an audio track in one of the listed languages. |
+| ONLY_SKIP_IF_SUBGEN_SUBTITLE | False | Skips generation of subtitles if the file has "subgen" somewhere in the name |
+| SKIP_UNKNOWN_LANGUAGE | False | Skips generation if the file has an unknown language |
+| SKIP_IF_LANGUAGE_IS_NOT_SET_BUT_SUBTITLES_EXIST | False | Skips generation if file doesn't have an audio stream marked with a language |
+| SHOULD_WHISPER_DETECT_AUDIO_LANGUAGE | False | Should Whisper try to detect the language if there is no audio language specified via force language |
+| PLEX_QUEUE_NEXT_EPISODE | False | Will queue the next Plex series episode for subtitle generation if subgen is triggered. |
+| PLEX_QUEUE_SERIES | False | Will queue the whole Plex series for subtitle generation if subgen is triggered. |
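
Several settings in the table above are pipe separated lists (the `\|` in the table is only Markdown escaping; the actual value looks like `eng|deu`), and several are booleans. How Subgen parses them is not shown in this diff, so the following is only a sketch under assumptions: `env_pipe_list` and `env_bool` are hypothetical helpers, and the set of accepted truthy strings is a guess.

```python
import os


def env_pipe_list(name: str, default: str = "") -> list[str]:
    """Split a pipe-separated environment value into a list, dropping empty items."""
    raw = os.getenv(name, default)
    return [item.strip() for item in raw.split("|") if item.strip()]


def env_bool(name: str, default: bool = False) -> bool:
    """Interpret common truthy strings ("true", "1", "yes", "on") as True."""
    raw = os.getenv(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"true", "1", "yes", "on"}


if __name__ == "__main__":
    os.environ["SKIP_IF_AUDIO_TRACK_IS"] = "eng|deu"
    os.environ["PLEX_QUEUE_NEXT_EPISODE"] = "True"
    print(env_pipe_list("SKIP_IF_AUDIO_TRACK_IS"))
    print(env_bool("PLEX_QUEUE_NEXT_EPISODE"))
```

An unset or empty list variable yields an empty list, which lines up with the `''` defaults shown in the table.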
 
 ### Images:
 `mccloud/subgen:latest` is GPU or CPU <br>

diff --git a/language_code.py b/language_code.py
@@ -2,6 +2,7 @@
 
 class LanguageCode(Enum):
     # ISO 639-1, ISO 639-2/T, ISO 639-2/B, English Name, Native Name
+    AFAR = ("aa", "aar", "aar", "Afar", "Afar") 
     AFRIKAANS = ("af", "afr", "afr", "Afrikaans", "Afrikaans")
     AMHARIC = ("am", "amh", "amh", "Amharic", "አማርኛ")
     ARABIC = ("ar", "ara", "ara", "Arabic", "العربية")
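
Each member's tuple follows the column order in the comment (ISO 639-1, ISO 639-2/T, ISO 639-2/B, English name, native name), so adding `AFAR` makes the codes `aa` and `aar` resolvable. The sketch below shows one way such a lookup could work. It reuses three members from the hunk above, but `from_code` is a hypothetical method written for illustration; the real class may expose a different API.

```python
from enum import Enum
from typing import Optional


class LanguageCode(Enum):
    # ISO 639-1, ISO 639-2/T, ISO 639-2/B, English Name, Native Name
    AFAR = ("aa", "aar", "aar", "Afar", "Afar")
    AFRIKAANS = ("af", "afr", "afr", "Afrikaans", "Afrikaans")
    ARABIC = ("ar", "ara", "ara", "Arabic", "العربية")

    @classmethod
    def from_code(cls, code: str) -> Optional["LanguageCode"]:
        """Hypothetical lookup: match a 2- or 3-letter ISO code, case-insensitively."""
        wanted = code.strip().lower()
        for member in cls:
            # The first three tuple fields are the ISO 639-1, 639-2/T and 639-2/B codes.
            if wanted in member.value[:3]:
                return member
        return None


if __name__ == "__main__":
    print(LanguageCode.from_code("aa"))
    print(LanguageCode.from_code("zz"))
```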

diff --git a/launcher.py b/launcher.py
@@ -100,7 +100,7 @@ def main():
 
     # Construct the argument parser
     parser = argparse.ArgumentParser(prog="python launcher.py", formatter_class=argparse.ArgumentDefaultsHelpFormatter)
-    parser.add_argument('-d', '--debug', default=False, action='store_true', help="Enable console debugging")
+    parser.add_argument('-d', '--debug', default=True, action='store_true', help="Enable console debugging")
     parser.add_argument('-i', '--install', default=False, action='store_true', help="Install/update all necessary packages")
     parser.add_argument('-a', '--append', default=False, action='store_true', help="Append 'Transcribed by whisper' to generated subtitle")
     parser.add_argument('-u', '--update', default=False, action='store_true', help="Update Subgen")
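
One consequence of the `--debug` change is worth seeing in isolation. With `action='store_true'`, argparse sets the value to `True` when the flag is passed and falls back to `default` otherwise, so with `default=True` the value is `True` in both cases. The sketch below reproduces only that one argument from the hunk; the variable names are illustrative.

```python
import argparse

parser = argparse.ArgumentParser(prog="python launcher.py")
parser.add_argument('-d', '--debug', default=True, action='store_true', help="Enable console debugging")

# Without the flag, the default applies; with the flag, store_true sets True.
without_flag = parser.parse_args([])
with_flag = parser.parse_args(['-d'])
print(without_flag.debug, with_flag.debug)
```

If a command-line off switch were wanted, `argparse.BooleanOptionalAction` (Python 3.9+) would add a paired `--no-debug` flag; whether that is intended here is not stated in the commit.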