Docker Support for Issue Docker Support #85 #112

Open · wants to merge 1 commit into base: main
21 changes: 13 additions & 8 deletions DEVELOPMENT.md
@@ -8,13 +8,13 @@ How to set up your local machine.

## Backend (Python)

- **Create a Virtual Environment**
```bash
python -m venv venv
.\venv\Scripts\activate
```
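The activation command above is Windows-specific; on macOS/Linux the equivalent steps are:

```bash
# Create and activate a virtual environment on macOS/Linux
python3 -m venv venv
source venv/bin/activate
```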

- **Install Dependencies**
```bash
pip install -r requirements.txt
```
@@ -23,6 +23,11 @@ How to set up your local machine.
- required fields for different providers are different, please refer to the [LiteLLM setup](https://docs.litellm.ai/docs#litellm-python-sdk) guide for more details.
- currently only endpoint, model, api_key, api_base, api_version are supported.
- this helps data formulator to automatically load the API keys when you run the app, so you don't need to set the API keys in the app UI.
- **Run the app (using Docker - Recommended for development)**
```bash
docker-compose up --build
```
This will build the Docker image and start the application in a container. Any changes you make to the code will be automatically reflected in the running application thanks to the volume mount in `docker-compose.yml`.

- **Run the app (directly, without Docker)**
- **Windows**
@@ -37,8 +42,8 @@ How to set up your local machine.

## Frontend (TypeScript)

- **Install NPM packages**

```bash
yarn
```
@@ -60,7 +65,7 @@ How to set up your local machine.
```bash
yarn build
```
This builds the app for production to the `py-src/data_formulator/dist` folder.

Then, build python package:

@@ -74,15 +79,15 @@ How to set up your local machine.

You can then install the build result wheel (testing in a virtual environment is recommended):
```bash
# replace <version> with the actual build version.
pip install dist/data_formulator-<version>-py3-none-any.whl
```

Once installed, you can run Data Formulator with:
```bash
data_formulator
```
or
```bash
python -m data_formulator
```
124 changes: 124 additions & 0 deletions DOCKER.md
@@ -0,0 +1,124 @@
# Docker Setup and Deployment

This document provides detailed instructions on setting up and deploying Data Formulator using Docker.

## Prerequisites

* Docker installed and running on your system. You can download Docker from [https://www.docker.com/products/docker-desktop](https://www.docker.com/products/docker-desktop).
* Docker Compose installed. Docker Desktop includes Docker Compose. If you're not using Docker Desktop, follow the instructions at [https://docs.docker.com/compose/install/](https://docs.docker.com/compose/install/).

## Building the Docker Image

The `Dockerfile` contains instructions for building a Docker image that includes both the frontend and backend of Data Formulator. It uses a multi-stage build process to minimize the final image size.

To build the image, run the following command from the root directory of the project:

```bash
docker build -t data-formulator .
```
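After the build completes, you can confirm the image is available locally (a quick sanity check, not part of the original instructions):

```bash
# List the freshly built image; the REPOSITORY column should show data-formulator
docker image ls data-formulator
```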

## Running the Docker Container

To start a container from the image you just built, run the following command:

```bash
docker run -p 5000:5000 data-formulator
```

This starts Data Formulator in a container; once it is up, the app is available at `http://localhost:5000`.

## Running Data Formulator with Docker

### Using `docker run`

You can run Data Formulator directly using the `docker run` command. This is useful for quick testing or simple deployments.

```bash
docker run -p 5000:5000 -e OPENAI_API_KEY=your-openai-key -e AZURE_API_KEY=your-azure-key ... data-formulator
```

* `-p 5000:5000`: This maps port 5000 on your host machine to port 5000 inside the container. Data Formulator runs on port 5000 by default.
* `-e VAR=value`: This sets environment variables inside the container. You **must** provide your API keys and other configuration settings using environment variables. See the [Configuration](#configuration) section in `README.md` for a complete list of supported environment variables. Replace placeholders like `your-openai-key` with your actual API keys.
* `data-formulator`: This is the name of the Docker image you built earlier.
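Passing many `-e` flags gets unwieldy; Docker's `--env-file` option can load the same variables from a file instead (a sketch, assuming a `.env` file of `KEY=value` lines in the current directory):

```bash
# Load all API keys from .env instead of individual -e flags
docker run -p 5000:5000 --env-file .env data-formulator
```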

### Using Docker Compose (Recommended)

Docker Compose simplifies the process of running multi-container applications. Data Formulator, while technically a single service, benefits from Docker Compose for managing environment variables and simplifying the startup process.

The `docker-compose.yml` file defines the Data Formulator service. Here's a breakdown:

```yaml
version: '3.8'

services:
  data-formulator:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "5000:5000"
    environment:
      - FLASK_APP=py-src/data_formulator/app.py
      - FLASK_RUN_PORT=5000
      - FLASK_RUN_HOST=0.0.0.0
      # Add your API keys here as environment variables, e.g.:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - AZURE_API_KEY=${AZURE_API_KEY}
      - AZURE_API_BASE=${AZURE_API_BASE}
      - AZURE_API_VERSION=${AZURE_API_VERSION}
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - OLLAMA_API_BASE=${OLLAMA_API_BASE}
    volumes:
      - .:/app # Mount the current directory for development
```

* `version: '3.8'`: Specifies the Docker Compose file version.
* `services: data-formulator:`: Defines a service named `data-formulator`.
* `build:`: Specifies how to build the image.
* `context: .`: Uses the current directory as the build context.
* `dockerfile: Dockerfile`: Uses the `Dockerfile` in the current directory.
* `ports: - "5000:5000"`: Maps port 5000 on the host to port 5000 in the container.
* `environment:`: Sets environment variables inside the container. This is where you should put your API keys. You can either hardcode them here (not recommended for production) or use variable substitution from your shell environment (e.g., `- OPENAI_API_KEY=${OPENAI_API_KEY}`).
* `volumes: - .:/app`: This line mounts the project root directory to `/app` inside the container. This is very useful during development, as any changes you make to your code will be immediately reflected inside the running container without needing to rebuild the image. **For production deployments, you should remove or comment out this line.**

To run Data Formulator using Docker Compose:

1. **Set your API keys as environment variables in your shell:**

```bash
export OPENAI_API_KEY=your-openai-key
export AZURE_API_KEY=your-azure-key
# ... set other API keys as needed
```

Or, create a `.env` file in the project root directory and add your API keys there:

```
OPENAI_API_KEY=your-openai-key
AZURE_API_KEY=your-azure-key
# ... other API keys
```
Docker Compose will automatically read environment variables from a `.env` file in the same directory as the `docker-compose.yml` file. **Do not commit your `.env` file to version control.** It's included in the `.gitignore` file.
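To double-check before committing that Git will in fact ignore your `.env` file (assuming the repository's `.gitignore` lists it, as stated above):

```bash
# Prints the matching .gitignore rule and exits 0 if .env is ignored
git check-ignore -v .env
```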

2. **Run Docker Compose:**

```bash
docker-compose up --build
```

* `up`: Starts the services defined in `docker-compose.yml`.
* `--build`: Forces a rebuild of the image, even if one already exists. Use this if you've made changes to the `Dockerfile` or your application code.

The first time you run this, Docker will download the necessary base images and build the Data Formulator image. Subsequent runs will be faster, especially if you use the volume mount for development.

3. **Access Data Formulator:**

Open your web browser and go to `http://localhost:5000`.

## Stopping Data Formulator

To stop the Data Formulator container(s) when running with Docker Compose, press `Ctrl+C` in the terminal where `docker-compose up` is running. You can also run `docker-compose down` from the project directory to stop and remove the containers.
46 changes: 46 additions & 0 deletions Dockerfile
@@ -0,0 +1,46 @@
# Use a multi-stage build to reduce final image size

# Stage 1: Build the frontend
FROM node:18 AS frontend-builder

WORKDIR /app

# Copy package.json and yarn.lock first to leverage Docker cache
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile

# Copy the rest of the frontend code
COPY . .

# Build the frontend
RUN yarn build

# Stage 2: Build the backend and create the final image
FROM python:3.12-slim

WORKDIR /app

# Copy built frontend from the previous stage
COPY --from=frontend-builder /app/py-src/data_formulator/dist /app/py-src/data_formulator/dist

# Copy backend code
COPY py-src /app/py-src
COPY requirements.txt /app/
COPY pyproject.toml /app/
COPY README.md /app/
COPY LICENSE /app/

# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the entrypoint script
COPY docker-entrypoint.sh /app/

# Make the entrypoint script executable
RUN chmod +x /app/docker-entrypoint.sh

# Expose the port the app runs on
EXPOSE 5000

# Set the entrypoint
ENTRYPOINT ["/app/docker-entrypoint.sh"]
62 changes: 46 additions & 16 deletions README.md
@@ -3,7 +3,7 @@
</h1>

<div>

[![arxiv](https://img.shields.io/badge/Paper-arXiv:2408.16119-b31b1b.svg)](https://arxiv.org/abs/2408.16119)&ensp;
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)&ensp;
[![YouTube](https://img.shields.io/badge/YouTube-white?logo=youtube&logoColor=%23FF0000)](https://youtu.be/3ndlwt0Wi3c)&ensp;
@@ -22,7 +22,7 @@ Transform data and create rich visualizations iteratively with AI 🪄. Try Data

## News 🔥🔥🔥

- [02-20-2025] Data Formulator 0.1.6 released!
- Now supports working with multiple datasets at once! Tell Data Formulator which data tables you would like to use in the encoding shelf, and it will figure out how to join the tables to create a visualization to answer your question. 🪄
  - Check out the demo at [[release 0.1.6]](https://github.com/microsoft/data-formulator/releases/tag/0.1.6).
- Update your Data Formulator to the latest version to play with the new features.
@@ -37,11 +37,11 @@ Transform data and create rich visualizations iteratively with AI 🪄. Try Data
- We added a few visualization challenges with the sample datasets. Can you complete them all? [[try them out!]](https://github.com/microsoft/data-formulator/issues/53#issue-2641841252)
- Comment in the issue when you did, or share your results/questions with others! [[comment here]](https://github.com/microsoft/data-formulator/issues/53)

- [10-11-2024] Data Formulator python package released!
- You can now install Data Formulator using Python and run it locally, easily. [[check it out]](#get-started).
- Our Codespaces configuration is also updated for fast start up ⚡️. [[try it now!]](https://codespaces.new/microsoft/data-formulator?quickstart=1)
- New experimental feature: load an image or a messy text, and ask AI to parse and clean it for you(!). [[demo]](https://github.com/microsoft/data-formulator/pull/31#issuecomment-2403652717)

- [10-01-2024] Initial release of Data Formulator, check out our [[blog]](https://www.microsoft.com/en-us/research/blog/data-formulator-exploring-how-ai-can-help-analysts-create-rich-data-visualizations/) and [[video]](https://youtu.be/3ndlwt0Wi3c)!


@@ -50,23 +50,23 @@ Transform data and create rich visualizations iteratively with AI 🪄. Try Data

**Data Formulator** is an application from Microsoft Research that uses large language models to transform data, expediting the practice of data visualization.

Data Formulator is an AI-powered tool for analysts to iteratively create rich visualizations. Unlike most chat-based AI tools where users need to describe everything in natural language, Data Formulator combines *user interface interactions (UI)* and *natural language (NL) inputs* for easier interaction. This blended approach makes it easier for users to describe their chart designs while delegating data transformation to AI.

## Get Started

Play with Data Formulator with one of the following options:

- **Option 1: Install via Python PIP**

Use Python PIP for an easy local setup (recommended: install it in a virtual environment).

```bash
# install data_formulator
pip install data_formulator

# start data_formulator
data_formulator

# alternatively, you can run data formulator with this command
python -m data_formulator
```
@@ -76,15 +76,45 @@ Play with Data Formulator with one of the following options:
*Update: you can specify the port number (e.g., 8080) by `python -m data_formulator --port 8080` if the default port is occupied.*

- **Option 2: Codespaces (5 minutes)**

You can also run Data Formulator in Codespaces; we have everything pre-configured. For more details, see [CODESPACES.md](CODESPACES.md).

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/data-formulator?quickstart=1)

- **Option 3: Working in the developer mode**

You can build Data Formulator locally if you prefer full control over your development environment and the ability to customize the setup to your specific needs. For detailed instructions, refer to [DEVELOPMENT.md](DEVELOPMENT.md).

## Deployment with Docker

You can easily deploy Data Formulator using Docker. This is the recommended way for production deployments.

1. **Build the Docker image:**

```bash
docker build -t data-formulator .
```

2. **Run the Docker container:**

```bash
docker run -p 5000:5000 -e OPENAI_API_KEY=your-openai-key -e AZURE_API_KEY=your-azure-key ... data-formulator
```

Replace `your-openai-key`, `your-azure-key`, etc., with your actual API keys. See the [Configuration](#configuration) section for details on setting environment variables.

Alternatively, use Docker Compose:

```bash
docker-compose up --build
```

3. **Access Data Formulator:**

Open your browser and go to `http://localhost:5000`.
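   To verify from a terminal that the container is serving before opening the browser (assuming the default port mapping):

   ```bash
   # Expect "200" once the app is up
   curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5000
   ```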

For more detailed instructions and configuration options, see [DOCKER.md](DOCKER.md).


## Using Data Formulator

@@ -112,7 +142,7 @@ https://github.com/user-attachments/assets/160c69d2-f42d-435c-9ff3-b1229b5bddba

https://github.com/user-attachments/assets/c93b3e84-8ca8-49ae-80ea-f91ceef34acb

Repeat this process as needed to explore and understand your data. Your explorations are trackable in the **Data Threads** panel.

## Developers' Guide

@@ -123,7 +153,7 @@ Follow the [developers' instructions](DEVELOPMENT.md) to build your new data ana

```
@article{wang2024dataformulator2iteratively,
title={Data Formulator 2: Iteratively Creating Rich Visualizations with AI},
author={Chenglong Wang and Bongshin Lee and Steven Drucker and Dan Marshall and Jianfeng Gao},
year={2024},
booktitle={ArXiv preprint arXiv:2408.16119},
@@ -160,8 +190,8 @@ or contact [[email protected]](mailto:[email protected]) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos are subject to those third-party's policies.