Docker Support for Issue Docker Support #85 #112

Open · wants to merge 1 commit into base: main
21 changes: 13 additions & 8 deletions DEVELOPMENT.md
@@ -8,13 +8,13 @@ How to set up your local machine.

## Backend (Python)

- **Create a Virtual Environment**
```bash
python -m venv venv
.\venv\Scripts\activate
```
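The activation command above is Windows-specific; on macOS/Linux the equivalent steps are:

```bash
# Create and activate a virtual environment on macOS/Linux
python3 -m venv venv
source venv/bin/activate
```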

- **Install Dependencies**
```bash
pip install -r requirements.txt
```
@@ -23,6 +23,11 @@ How to set up your local machine.
- required fields for different providers are different, please refer to the [LiteLLM setup](https://docs.litellm.ai/docs#litellm-python-sdk) guide for more details.
- currently only endpoint, model, api_key, api_base, api_version are supported.
- this helps data formulator to automatically load the API keys when you run the app, so you don't need to set the API keys in the app UI.
- **Run the app (using Docker - Recommended for development)**
```bash
docker-compose up --build
```
This will build the Docker image and start the application in a container. Any changes you make to the code will be automatically reflected in the running application thanks to the volume mount in `docker-compose.yml`.

- **Run the app (directly, without Docker)**
- **Windows**
@@ -37,8 +42,8 @@ How to set up your local machine.

## Frontend (TypeScript)

- **Install NPM packages**

```bash
yarn
```
@@ -60,7 +65,7 @@ How to set up your local machine.
```bash
yarn build
```
This builds the app for production to the `py-src/data_formulator/dist` folder.

Then, build python package:

@@ -74,15 +79,15 @@ How to set up your local machine.

You can then install the build result wheel (testing in a virtual environment is recommended):
```bash
# replace <version> with the actual build version.
pip install dist/data_formulator-<version>-py3-none-any.whl
```

Once installed, you can run Data Formulator with:
```bash
data_formulator
```
or
```bash
python -m data_formulator
```
124 changes: 124 additions & 0 deletions DOCKER.md
@@ -0,0 +1,124 @@
# Docker Setup and Deployment

This document provides detailed instructions on setting up and deploying Data Formulator using Docker.

## Prerequisites

* Docker installed and running on your system. You can download Docker from [https://www.docker.com/products/docker-desktop](https://www.docker.com/products/docker-desktop).
* Docker Compose installed. Docker Desktop includes Docker Compose. If you're not using Docker Desktop, follow the instructions at [https://docs.docker.com/compose/install/](https://docs.docker.com/compose/install/).

## Building the Docker Image

The `Dockerfile` contains instructions for building a Docker image that includes both the frontend and backend of Data Formulator. It uses a multi-stage build process to minimize the final image size.

To build the image, run the following command from the root directory of the project:

```bash
docker build -t data-formulator .
```
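After the build completes, you can confirm the image is available locally (a quick sanity check, not part of the original instructions):

```bash
# List the freshly built image; the REPOSITORY column should show data-formulator
docker image ls data-formulator
```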

## Running the Docker Container

To start a container from the image you just built, run the following command:

```bash
docker run -p 5000:5000 data-formulator
```

This starts Data Formulator in a container; once it is up, the app is available at `http://localhost:5000`.

## Running Data Formulator with Docker

### Using `docker run`

You can run Data Formulator directly using the `docker run` command. This is useful for quick testing or simple deployments.

```bash
docker run -p 5000:5000 -e OPENAI_API_KEY=your-openai-key -e AZURE_API_KEY=your-azure-key ... data-formulator
```

* `-p 5000:5000`: This maps port 5000 on your host machine to port 5000 inside the container. Data Formulator runs on port 5000 by default.
* `-e VAR=value`: This sets environment variables inside the container. You **must** provide your API keys and other configuration settings using environment variables. See the [Configuration](#configuration) section in `README.md` for a complete list of supported environment variables. Replace placeholders like `your-openai-key` with your actual API keys.
* `data-formulator`: This is the name of the Docker image you built earlier.
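Passing many `-e` flags gets unwieldy; Docker's `--env-file` option can load the same variables from a file instead (a sketch, assuming a `.env` file of `KEY=value` lines in the current directory):

```bash
# Load all API keys from .env instead of individual -e flags
docker run -p 5000:5000 --env-file .env data-formulator
```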

### Using Docker Compose (Recommended)

Docker Compose simplifies the process of running multi-container applications. Data Formulator, while technically a single service, benefits from Docker Compose for managing environment variables and simplifying the startup process.

The `docker-compose.yml` file defines the Data Formulator service. Here's a breakdown:

```yaml
version: '3.8'

services:
  data-formulator:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "5000:5000"
    environment:
      - FLASK_APP=py-src/data_formulator/app.py
      - FLASK_RUN_PORT=5000
      - FLASK_RUN_HOST=0.0.0.0
      # Add your API keys here as environment variables, e.g.:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - AZURE_API_KEY=${AZURE_API_KEY}
      - AZURE_API_BASE=${AZURE_API_BASE}
      - AZURE_API_VERSION=${AZURE_API_VERSION}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OLLAMA_API_BASE=${OLLAMA_API_BASE}
    volumes:
      - .:/app # Mount the current directory for development
```

* `version: '3.8'`: Specifies the Docker Compose file version.
* `services: data-formulator:`: Defines a service named `data-formulator`.
* `build:`: Specifies how to build the image.
* `context: .`: Uses the current directory as the build context.
* `dockerfile: Dockerfile`: Uses the `Dockerfile` in the current directory.
* `ports: - "5000:5000"`: Maps port 5000 on the host to port 5000 in the container.
* `environment:`: Sets environment variables inside the container. This is where you should put your API keys. You can either hardcode them here (not recommended for production) or use variable substitution from your shell environment (e.g., `- OPENAI_API_KEY=${OPENAI_API_KEY}`).
* `volumes: - .:/app`: This line mounts the project root directory to `/app` inside the container. This is very useful during development, as any changes you make to your code will be immediately reflected inside the running container without needing to rebuild the image. **For production deployments, you should remove or comment out this line.**

To run Data Formulator using Docker Compose:

1. **Set your API keys as environment variables in your shell:**

```bash
export OPENAI_API_KEY=your-openai-key
export AZURE_API_KEY=your-azure-key
# ... set other API keys as needed
```

Or, create a `.env` file in the project root directory and add your API keys there:

```
OPENAI_API_KEY=your-openai-key
AZURE_API_KEY=your-azure-key
# ... other API keys
```
Docker Compose will automatically read environment variables from a `.env` file in the same directory as the `docker-compose.yml` file. **Do not commit your `.env` file to version control.** It's included in the `.gitignore` file.
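To double-check before committing that Git will in fact ignore your `.env` file (assuming the repository's `.gitignore` lists it, as stated above):

```bash
# Prints the matching .gitignore rule and exits 0 if .env is ignored
git check-ignore -v .env
```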

2. **Run Docker Compose:**

```bash
docker-compose up --build
```

* `up`: Starts the services defined in `docker-compose.yml`.
* `--build`: Forces a rebuild of the image, even if one already exists. Use this if you've made changes to the `Dockerfile` or your application code.

The first time you run this, Docker will download the necessary base images and build the Data Formulator image. Subsequent runs will be faster, especially if you use the volume mount for development.

3. **Access Data Formulator:**

Open your web browser and go to `http://localhost:5000`.

## Stopping Data Formulator

To stop the Data Formulator container(s) when running with Docker Compose, press `Ctrl+C` in the terminal where `docker-compose up` is running. You can also run `docker-compose down` from the project directory to stop and remove the containers.
46 changes: 46 additions & 0 deletions Dockerfile
@@ -0,0 +1,46 @@
# Use a multi-stage build to reduce final image size

# Stage 1: Build the frontend
FROM node:18 AS frontend-builder

WORKDIR /app

# Copy package.json and yarn.lock first to leverage Docker cache
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile

# Copy the rest of the frontend code
COPY . .

# Build the frontend
RUN yarn build

# Stage 2: Build the backend and create the final image
FROM python:3.12-slim

WORKDIR /app

# Copy built frontend from the previous stage
COPY --from=frontend-builder /app/py-src/data_formulator/dist /app/py-src/data_formulator/dist

# Copy backend code
COPY py-src /app/py-src
COPY requirements.txt /app/
COPY pyproject.toml /app/
COPY README.md /app/
COPY LICENSE /app/

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the entrypoint script
COPY docker-entrypoint.sh /app/

# Make the entrypoint script executable
RUN chmod +x /app/docker-entrypoint.sh

# Expose the port the app runs on
EXPOSE 5000

# Set the entrypoint
ENTRYPOINT ["/app/docker-entrypoint.sh"]
62 changes: 46 additions & 16 deletions README.md
@@ -3,7 +3,7 @@
</h1>

<div>

[![arxiv](https://img.shields.io/badge/Paper-arXiv:2408.16119-b31b1b.svg)](https://arxiv.org/abs/2408.16119)&ensp;
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)&ensp;
[![YouTube](https://img.shields.io/badge/YouTube-white?logo=youtube&logoColor=%23FF0000)](https://youtu.be/3ndlwt0Wi3c)&ensp;
@@ -22,7 +22,7 @@ Transform data and create rich visualizations iteratively with AI 🪄. Try Data

## News 🔥🔥🔥

- [02-20-2025] Data Formulator 0.1.6 released!
- Now supports working with multiple datasets at once! Tell Data Formulator which data tables you would like to use in the encoding shelf, and it will figure out how to join the tables to create a visualization to answer your question. 🪄
  - Check out the demo at [[release 0.1.6]](https://github.com/microsoft/data-formulator/releases/tag/0.1.6).
- Update your Data Formulator to the latest version to play with the new features.
@@ -37,11 +37,11 @@ Transform data and create rich visualizations iteratively with AI 🪄. Try Data
- We added a few visualization challenges with the sample datasets. Can you complete them all? [[try them out!]](https://github.com/microsoft/data-formulator/issues/53#issue-2641841252)
- Comment in the issue when you did, or share your results/questions with others! [[comment here]](https://github.com/microsoft/data-formulator/issues/53)

- [10-11-2024] Data Formulator python package released!
- You can now install Data Formulator using Python and run it locally, easily. [[check it out]](#get-started).
- Our Codespaces configuration is also updated for fast start up ⚡️. [[try it now!]](https://codespaces.new/microsoft/data-formulator?quickstart=1)
- New experimental feature: load an image or a messy text, and ask AI to parse and clean it for you(!). [[demo]](https://github.com/microsoft/data-formulator/pull/31#issuecomment-2403652717)

- [10-01-2024] Initial release of Data Formulator, check out our [[blog]](https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/) and [[video]](https://youtu.be/3ndlwt0Wi3c)!


@@ -50,23 +50,23 @@ Transform data and create rich visualizations iteratively with AI 🪄. Try Data

**Data Formulator** is an application from Microsoft Research that uses large language models to transform data, expediting the practice of data visualization.

Data Formulator is an AI-powered tool for analysts to iteratively create rich visualizations. Unlike most chat-based AI tools where users need to describe everything in natural language, Data Formulator combines *user interface interactions (UI)* and *natural language (NL) inputs* for easier interaction. This blended approach makes it easier for users to describe their chart designs while delegating data transformation to AI.

## Get Started

Play with Data Formulator with one of the following options:

- **Option 1: Install via Python PIP**

Use Python PIP for an easy local setup (recommended: install it in a virtual environment).

```bash
# install data_formulator
pip install data_formulator

# start data_formulator
data_formulator

# alternatively, you can run data formulator with this command
python -m data_formulator
```
@@ -76,15 +76,45 @@ Play with Data Formulator with one of the following options:
*Update: you can specify the port number (e.g., 8080) by `python -m data_formulator --port 8080` if the default port is occupied.*

- **Option 2: Codespaces (5 minutes)**

You can also run Data Formulator in Codespaces; we have everything pre-configured. For more details, see [CODESPACES.md](CODESPACES.md).

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/data-formulator?quickstart=1)

- **Option 3: Working in the developer mode**

You can build Data Formulator locally if you prefer full control over your development environment and the ability to customize the setup to your specific needs. For detailed instructions, refer to [DEVELOPMENT.md](DEVELOPMENT.md).

## Deployment with Docker

You can easily deploy Data Formulator using Docker. This is the recommended way for production deployments.

1. **Build the Docker image:**

```bash
docker build -t data-formulator .
```

2. **Run the Docker container:**

```bash
docker run -p 5000:5000 -e OPENAI_API_KEY=your-openai-key -e AZURE_API_KEY=your-azure-key ... data-formulator
```

Replace `your-openai-key`, `your-azure-key`, etc., with your actual API keys. See the [Configuration](#configuration) section for details on setting environment variables.

Alternatively, use Docker Compose:

```bash
docker-compose up --build
```

3. **Access Data Formulator:**

Open your browser and go to `http://localhost:5000`.
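   To verify from a terminal that the container is serving before opening the browser (assuming the default port mapping):

   ```bash
   # Expect "200" once the app is up
   curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5000
   ```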

For more detailed instructions and configuration options, see [DOCKER.md](DOCKER.md).


## Using Data Formulator

@@ -112,7 +142,7 @@ https://github.com/user-attachments/assets/160c69d2-f42d-435c-9ff3-b1229b5bddba

https://github.com/user-attachments/assets/c93b3e84-8ca8-49ae-80ea-f91ceef34acb

Repeat this process as needed to explore and understand your data. Your explorations are trackable in the **Data Threads** panel.

## Developers' Guide

@@ -123,7 +153,7 @@ Follow the [developers' instructions](DEVELOPMENT.md) to build your new data ana

```
@article{wang2024dataformulator2iteratively,
title={Data Formulator 2: Iteratively Creating Rich Visualizations with AI},
author={Chenglong Wang and Bongshin Lee and Steven Drucker and Dan Marshall and Jianfeng Gao},
year={2024},
booktitle={ArXiv preprint arXiv:2408.16119},
@@ -160,8 +190,8 @@ or contact [[email protected]](mailto:[email protected]) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos are subject to those third-party's policies.