[NAACL'25] DiaSynth: Synthetic Dialogue Generation Framework for Low Resource Applications

ntuspeechlab/DiaSynth

DiaSynth: Synthetic Dialogue Generation Framework [NAACL'25]

DiaSynth is a synthetic dialogue generation framework designed for low-resource dialogue applications. By leveraging Large Language Models (LLMs) and Chain-of-Thought (CoT) reasoning, DiaSynth creates high-quality, persona-driven dialogues for various domains.

🚀 Key Features:
• Scalable Synthetic Dialogue Generation
• Persona-Conditioned & Topic-Specific Conversations
• Multi-Domain Dialogue Generation (SAMSum, DialogSum, etc.)
• Pre-trained LLM Support (Phi-3, LLaMA, GPT-4o, etc.)

📥 Installation

1️⃣ Clone the Repository

git clone https://github.com/ntuspeechlab/DiaSynth.git
cd DiaSynth

2️⃣ Set Up Virtual Environment

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

3️⃣ Install llama-cpp-python (for GGUF models)

CMAKE_ARGS="-DGGML_CUDA=on -DCUDAToolkit_ROOT=/usr/local/cuda-11.8" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir

4️⃣ Download & Configure Model Files

Download the Phi-3 GGUF Model
• Phi-3-mini-128k-instruct (4GB)
• Place it inside the folder:

mkdir -p models/phi3
mv model.gguf models/phi3/

Download Tokenizer
• Download from microsoft/Phi-3-mini-128k-instruct
• Save it in:

mkdir -p models/phi3/tokenizer
mv tokenizer.json models/phi3/tokenizer/
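Before starting a long generation run, it can help to confirm that both files ended up where the commands above put them. The following is a minimal sketch, not part of DiaSynth: the helper name `missing_model_files` is hypothetical, and the two paths are simply copied from the `mkdir`/`mv` commands above. Whether DiaSynth reads exactly these locations is an assumption.

```python
from pathlib import Path

# Paths copied from the mkdir/mv commands above. That DiaSynth loads the model
# and tokenizer from exactly these locations is an assumption.
EXPECTED_FILES = (
    Path("models/phi3/model.gguf"),
    Path("models/phi3/tokenizer/tokenizer.json"),
)


def missing_model_files(root="."):
    """Return the expected files that are absent under `root` (POSIX-style strings)."""
    base = Path(root)
    return [p.as_posix() for p in EXPECTED_FILES if not (base / p).is_file()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All model files are in place.")
```

Run it from the repository root; an empty result means both files were found.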

🚀 Usage

🔹 Generate Synthetic Dialogues

Run the following command to generate synthetic dialogues using Phi-3, replacing CSV_PATH with your CSV file path:

python3.10 -m diasynth.main --topics_file_path "topics.txt" --n_sub_topics 6 --stage="scratch" \
    --csv_path CSV_PATH --dialogue_base "dialoguesum"

🔹 Modify Topics

• Update topics.txt to include your custom topics for dialogue generation.
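If the topic list is built programmatically, a small helper can write it out. This is a sketch under a stated assumption: the README does not specify the layout of topics.txt, so one topic per line is assumed here. The function name `write_topics` is hypothetical and not part of DiaSynth.

```python
from pathlib import Path


def write_topics(path, topics):
    """Write topics to `path`, one per line (assumed format).

    Blank entries and case-insensitive duplicates are dropped.
    Returns the cleaned list that was written.
    """
    seen = set()
    cleaned = []
    for topic in topics:
        name = topic.strip()
        key = name.lower()
        if name and key not in seen:
            seen.add(key)
            cleaned.append(name)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned
```

For example, `write_topics("topics.txt", ["Healthcare", "Travel planning", "healthcare "])` writes two lines, since the third entry duplicates the first. The topic names are illustrative only.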

🎯 Features

✅ Flexible Topic Expansion – Generate subtopics for broader coverage.
✅ Persona-Based Conversations – Customize dialogues with realistic characters.
✅ Multi-Turn Dialogue Generation – Supports structured and free-flowing interactions.
✅ Scalable Data Synthesis – Generate thousands of dialogues efficiently.
✅ Pretrained Model Fine-tuning – Fine-tune pretrained models on the generated data to improve downstream performance.

📊 Example Outputs

Example: Generated Dialogue (Healthcare Domain)

Doctor: Good morning! How can I assist you today?
Patient: I've been feeling dizzy and fatigued for the past few days.
Doctor: I see. Have you experienced any headaches or blurred vision?
Patient: Yes, occasionally. It gets worse in the afternoon.
Doctor: Based on your symptoms, we might need to check your blood pressure.
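To use a dialogue like the one above for downstream training, it usually has to be split into speaker turns. The sketch below assumes each turn is a single line of the form `Speaker: utterance`, as in the example; the actual layout of DiaSynth's CSV output is not documented here, so treat `parse_dialogue` as a hypothetical helper rather than part of the framework.

```python
def parse_dialogue(text):
    """Split 'Speaker: utterance' lines into (speaker, utterance) tuples.

    Lines without a colon, or with an empty speaker or utterance, are skipped.
    Only the first colon separates speaker from utterance.
    """
    turns = []
    for line in text.splitlines():
        speaker, sep, utterance = line.partition(":")
        if sep and speaker.strip() and utterance.strip():
            turns.append((speaker.strip(), utterance.strip()))
    return turns


sample = """Doctor: Good morning! How can I assist you today?
Patient: I've been feeling dizzy and fatigued for the past few days."""

turns = parse_dialogue(sample)
print(len(turns))   # 2
print(turns[0][0])  # Doctor
```

Splitting on the first colon only keeps utterances that themselves contain colons intact.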
