Merge branch 'main' into jeromehardaway/update-code

Vets-Who-Code · Oct 24, 2024 · 262fb31
2 parents 4cc2e6f + e4aaafd

Showing 21 changed files with 134 additions and 49 deletions.
diff --git a/.DS_Store b/.DS_Store
diff --git a/.coveragerc b/.coveragerc
@@ -0,0 +1,4 @@
+[run]
+omit =
+    tests/*
+    */tests/*
diff --git a/.github/workflows/unit-test.yml b/.github/workflows/unit-test.yml
@@ -0,0 +1,23 @@
+name: Run Unit Tests with Pytest
+
+on: [ push, pull_request ]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r requirements.txt
+
+      - name: Run tests with coverage report
+        run: |
+          pytest --cov --cov-report=term-missing
diff --git a/.gitignore b/.gitignore
@@ -8,6 +8,8 @@ __pycache__/
 # C extensions
 *.so
 
+.DS_Store
+
 # Distribution / packaging
 .Python
 build/

diff --git a/README.md b/README.md
@@ -7,12 +7,11 @@ VetsAI is an AI-powered virtual assistant designed to help veterans navigate emp
 - **Chat Assistant**: Ask questions and receive advice on job searching and career transitions.
 - **Military Job Code Translation**: Provide a military job code (e.g., MOS, AFSC) to get suggestions for related civilian careers.
 - **Document Upload**: Upload employment-related documents (PDF or DOCX), and VetsAI will process the content to assist with career suggestions.
-- **OpenAI Integration**: Uses OpenAI’s GPT-4 to generate responses based on the conversation context.
+- **OpenAI Integration**: Uses OpenAI's GPT-4 to generate responses based on the conversation context.
 
 ## Prerequisites
 
 To run this application, ensure you have the following installed:
-
 - Python 3.8 or later
 - A virtual environment (recommended)
 
@@ -22,62 +21,61 @@ To run this application, ensure you have the following installed:
    ```bash
    git clone <repository-url>
    cd <repository-directory>
+   ```
 
-	2.	Set up a virtual environment:
-
-python -m venv venv
-source venv/bin/activate  # For macOS/Linux
-.\venv\Scripts\activate  # For Windows
-
-
-	3.	Install dependencies:
-
-pip install -r requirements.txt
-
-
-	4.	Set up environment variables:
-	•	Create a .env file in the root of your project.
-	•	Add your OpenAI API key to the .env file:
-
-OPENAI_API_KEY=your-openai-api-key
-
-
+2. **Set up a virtual environment**:
+   ```bash
+   python -m venv venv
+   source venv/bin/activate  # For macOS/Linux
+   .\venv\Scripts\activate  # For Windows
+   ```
 
-Running the App
+3. **Install dependencies**:
+   ```bash
+   pip install -r requirements.txt
+   ```
 
-	1.	Run the Streamlit app:
+4. **Set up environment variables**:
+   - Create a .env file in the root of your project.
+   - Add your OpenAI API key to the .env file:
+     ```
+     OPENAI_API_KEY=your-openai-api-key
+     ```
 
-streamlit run app.py
+## Running the App
 
+1. **Run the Streamlit app**:
+   ```bash
+   streamlit run app.py
+   ```
 
-	2.	Access the app:
-Open your web browser and navigate to http://localhost:8501.
+2. **Access the app**:
+   Open your web browser and navigate to http://localhost:8501.
 
-Usage
+## Usage
 
-	•	Chat: Ask questions about job searching, resume building, and military job code translations.
-	•	Upload Resume: Upload a resume (PDF or DOCX), and VetsAI will process the text for further assistance.
-	•	Military Job Codes: Enter your military job code (e.g., MOS, AFSC) to get suggestions for civilian careers.
+- **Chat**: Ask questions about job searching, resume building, and military job code translations.
+- **Upload Resume**: Upload a resume (PDF or DOCX), and VetsAI will process the text for further assistance.
+- **Military Job Codes**: Enter your military job code (e.g., MOS, AFSC) to get suggestions for civilian careers.
 
-File Structure
+## File Structure
 
-	•	app.py: Main application script.
-	•	data/employment_transitions/job_codes/: Directory containing military job code files.
-	•	requirements.txt: Python package dependencies.
+- `app.py`: Main application script.
+- `data/employment_transitions/job_codes/`: Directory containing military job code files.
+- `requirements.txt`: Python package dependencies.
 
-Dependencies
+## Dependencies
 
 The following Python libraries are required to run this app:
-
-	•	streamlit: For the web interface.
-	•	httpx: To make HTTP requests to OpenAI’s API.
-	•	nest-asyncio: To allow nested event loops for async operations.
-	•	better-profanity: To filter profane language.
-	•	PyPDF2: For extracting text from PDF files.
-	•	python-docx: For reading DOCX files.
-	•	python-dotenv: To load environment variables from a .env file.
-	•	openai: To interact with OpenAI’s API.
-
-License
-
-This project is licensed under the MIT License.
+- `streamlit`: For the web interface.
+- `httpx`: To make HTTP requests to OpenAI's API.
+- `nest-asyncio`: To allow nested event loops for async operations.
+- `better-profanity`: To filter profane language.
+- `PyPDF2`: For extracting text from PDF files.
+- `python-docx`: For reading DOCX files.
+- `python-dotenv`: To load environment variables from a .env file.
+- `openai`: To interact with OpenAI's API.
+
+## License
+
+This project is licensed under the MIT License.
diff --git a/data/.DS_Store b/data/.DS_Store
diff --git a/data/employment_transitions/.DS_Store b/data/employment_transitions/.DS_Store
diff --git a/data/employment_transitions/job_codes/.DS_Store b/data/employment_transitions/job_codes/.DS_Store
diff --git a/requirements.txt b/requirements.txt
@@ -15,4 +15,4 @@ tiktoken>=0.5.1
 cachetools>=5.3.2
 dataclasses-json>=0.6.1
 asyncio>=3.4.3
-aiohttp>=3.9.1
+aiohttp>=3.9.1
diff --git a/streamlit_app.py b/streamlit_app.py
@@ -428,5 +428,6 @@ def main():
             save_feedback(feedback)
             st.success("Thank you for your feedback!")
 
+
 if __name__ == "__main__":
     main()
diff --git a/tests/__init__.py b/tests/__init__.py
diff --git a/tests/conftest.py b/tests/conftest.py
@@ -0,0 +1,25 @@
+import io
+import pytest
+import os
+
+TEST_RESOURCE_DIR = f"{os.path.dirname(__file__)}/resources"
+
+
+def load_resource_file(file_name):
+    with open(file_name, "rb") as file:
+        data = io.BytesIO(file.read())
+    return data
+
+
+@pytest.fixture(scope="module")
+def file_resources():
+    library = {}
+    for filename in os.listdir(TEST_RESOURCE_DIR):
+        library[filename.split(".")[0]] = load_resource_file(f"{TEST_RESOURCE_DIR}/{filename}")
+    yield library
+
+
+def pytest_configure(config):
+    config.addinivalue_line(
+        "markers", "slow: marks tests as slow (deselect with '-m \"not slow\"')"
+    )
diff --git a/tests/resources/docx_blank.docx b/tests/resources/docx_blank.docx
diff --git a/tests/resources/docx_text_and_media.docx b/tests/resources/docx_text_and_media.docx
diff --git a/tests/resources/docx_text_only.docx b/tests/resources/docx_text_only.docx
diff --git a/tests/resources/docx_unicode_sample.docx b/tests/resources/docx_unicode_sample.docx
diff --git a/tests/resources/pdf_blank.pdf b/tests/resources/pdf_blank.pdf
diff --git a/tests/resources/pdf_text_and_media.pdf b/tests/resources/pdf_text_and_media.pdf
diff --git a/tests/resources/pdf_text_only.pdf b/tests/resources/pdf_text_only.pdf
diff --git a/tests/resources/pdf_unicode_sample.pdf b/tests/resources/pdf_unicode_sample.pdf
diff --git a/tests/test_streamlit_app.py b/tests/test_streamlit_app.py
@@ -0,0 +1,32 @@
+from streamlit_app import extract_text_from_pdf, extract_text_from_word
+import pytest
+
+
+class TestDOCXExtraction:
+    def test_extract_text_from_word_with_only_text(self, file_resources):
+        assert extract_text_from_word(file_resources["docx_text_only"]) == "This document has text!"
+
+    def test_extract_text_from_word_with_empty_file(self, file_resources):
+        assert extract_text_from_word(file_resources["docx_blank"]) == ""
+
+    def test_extract_text_from_word_with_non_text_contents(self, file_resources):
+        assert extract_text_from_word(file_resources["docx_text_and_media"]) == "This document has text!"
+
+    def test_extract_text_from_word_with_special_characters(self, file_resources):
+        assert extract_text_from_word(file_resources["docx_unicode_sample"])
+
+
+class TestPDFExtraction:
+    def test_extract_text_from_pdf_with_only_text(self, file_resources):
+        assert extract_text_from_pdf(file_resources["pdf_text_only"]) == "This document has text!"
+
+    def test_extract_text_from_pdf_with_empty_file(self, file_resources):
+        assert extract_text_from_pdf(file_resources["pdf_blank"]) == ""
+
+    def test_extract_text_from_pdf_with_non_text_contents(self, file_resources):
+        # PyPDF2 will pull the text from charts also, so we cannot use == to compare
+        assert "This document has text!" in extract_text_from_pdf(file_resources["pdf_text_and_media"])
+
+    @pytest.mark.slow
+    def test_extract_text_from_pdf_with_special_characters(self, file_resources):
+        assert extract_text_from_pdf(file_resources["pdf_unicode_sample"])
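
Note on the `file_resources` fixture added in `tests/conftest.py`: it keys each resource by the part of the filename before the first dot, so `docx_blank.docx` is looked up as `file_resources["docx_blank"]`. The sketch below reproduces that mapping outside pytest as a minimal illustration. The helper name `build_resource_library`, the directory-skipping check, and the temporary stand-in files are assumptions made for this example and are not part of the commit.

```python
import io
import os
import tempfile


def build_resource_library(resource_dir):
    """Map each file's stem (text before the first dot) to an in-memory copy of its bytes."""
    library = {}
    for filename in sorted(os.listdir(resource_dir)):
        path = os.path.join(resource_dir, filename)
        if not os.path.isfile(path):
            continue  # assumption: skip subdirectories, which open() cannot read
        with open(path, "rb") as handle:
            library[filename.split(".")[0]] = io.BytesIO(handle.read())
    return library


if __name__ == "__main__":
    # Hypothetical stand-in files; the real resources live in tests/resources.
    with tempfile.TemporaryDirectory() as tmp:
        for name, payload in [("docx_blank.docx", b"a"), ("pdf_blank.pdf", b"bc")]:
            with open(os.path.join(tmp, name), "wb") as handle:
                handle.write(payload)
        library = build_resource_library(tmp)
        print(sorted(library))  # ['docx_blank', 'pdf_blank']
```

Because the fixture is module-scoped, every test in a module receives the same `io.BytesIO` objects. A stream that has been read stays at its end position until `seek(0)` is called, which is worth keeping in mind if a later test reuses a key. In the tests above each key is used by a single test.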