This repository contains code for training and evaluating the models in the paper Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning.
- Autoregressive language models, despite their impressive capabilities, struggle with complex reasoning and long-term planning tasks. Can we go beyond autoregression for these challenges?
- First, what is planning, essentially? We design a straightforward task that minimally illustrates planning, and in which we can control the amount of planning required through a term we call Planning Distance. We find that autoregressive (AR) models struggle even on this simple task.
- Then, we delve into the comparison between the objectives of autoregression and discrete diffusion, and demonstrate how discrete diffusion models effectively learn difficult subgoals that elude autoregressive models.
- Based on the above, we further introduce Multi-granularity Diffusion Modeling (MDM), which prioritizes subgoals by difficulty during learning. We find MDM significantly outperforms AR on a range of more complex reasoning and planning challenges. (A toy sketch of the objective contrast follows this list.)
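For intuition only, here is a minimal, self-contained sketch contrasting the per-token autoregressive loss with a masked (absorbing-state) discrete diffusion loss, where only masked positions contribute and harder positions can be upweighted in the spirit of MDM. This is not the repository's implementation; all function and variable names below are illustrative.

```python
# Illustrative only: contrasts the AR and masked (absorbing-state) discrete
# diffusion training losses on a toy batch. Names are hypothetical and do not
# mirror the code in this repository.
import torch
import torch.nn.functional as F

def ar_loss(logits, targets):
    # Autoregression: every position is trained with equal weight,
    # conditioned only on its left context.
    # logits: (batch, seq, vocab); targets: (batch, seq)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

def masked_diffusion_loss(logits, targets, mask, weights=None):
    # Absorbing-state discrete diffusion: only positions replaced by [MASK]
    # at the sampled noise level contribute; the model conditions on the
    # remaining (bidirectional) context.
    # mask: (batch, seq) bool, True where the token was masked out.
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).reshape(targets.shape)
    if weights is None:
        weights = torch.ones_like(per_token)
    # MDM-style idea (paraphrased): give harder positions, e.g. those the model
    # currently predicts poorly, larger weights instead of averaging them away.
    return (per_token * mask.float() * weights).sum() / mask.float().sum().clamp(min=1.0)
```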
All required packages can be found in requirements.txt. You can install them in a new environment with
conda create -n diffusion python=3.9
conda activate diffusion
git clone [email protected]:HKUNLP/diffusion-vs-ar.git
cd diffusion-vs-ar
pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
Training and evaluation commands are provided under the scripts directory. Download the data from here first. The synthetic planning dataset can also be generated with the data/synthetic_graph.py script.
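For a rough sense of what a planning instance looks like, the toy sketch below builds a random graph containing a start-to-goal path of a chosen length, where the path length plays the role of the Planning Distance. This is only for intuition; the prompt format and function names are made up and do not reproduce the actual generator in data/synthetic_graph.py.

```python
# Toy sketch of a synthetic path-planning instance. The real generator lives in
# data/synthetic_graph.py; this version is only illustrative. Note that the
# distractor edges here may accidentally create shorter start-to-goal paths,
# which a proper generator would avoid.
import random

def make_instance(num_nodes=20, planning_distance=4, num_distractors=30, seed=0):
    rng = random.Random(seed)
    # Plant a ground-truth path of the requested length (= planning distance).
    path = rng.sample(range(num_nodes), planning_distance + 1)
    edges = {(path[i], path[i + 1]) for i in range(planning_distance)}
    # Add random distractor edges so the model must search, not just copy.
    while len(edges) < planning_distance + num_distractors:
        u, v = rng.sample(range(num_nodes), 2)
        edges.add((u, v))
    prompt = f"edges: {sorted(edges)} | start: {path[0]} | goal: {path[-1]}"
    target = " ".join(str(n) for n in path)
    return prompt, target

if __name__ == "__main__":
    prompt, target = make_instance()
    print(prompt)
    print(target)
```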
# run AR (training from scratch)
bash scripts/sudoku/train-sft.sh
# run AR (finetuning from LLaMA)
bash scripts/sudoku/train-sft-llama-7b.sh
# run Diffusion (training from scratch)
bash scripts/sudoku/train-mdm.sh
(📌 Check out our work on scaling diffusion language models by adapting from LLaMA at https://github.com/HKUNLP/DiffuLLaMA.)
For experiments with different model sizes, change --model_name_or_path to model_config_tiny (~6M), model_config (~85M), or model_config_medium (~303M). A slightly larger learning rate (i.e., 1e-3) is used for the tiny model.
For experiments on different datasets, change --dataset (the dataset name in data/dataset_info.json) and adjust --cutoff_len (make sure it is equal to or larger than the largest token length on that dataset). For AR, make sure that --max_new_tokens during generation is also larger than the longest output seen at training time. The cutoff_len and max_new_tokens values used in the paper are listed below; a small sketch for estimating the required cutoff_len follows the table.
|  | Minimal Planning | Countdown 3 | Countdown 4 | Countdown 5 | Sudoku | 3-SAT 5v | 3-SAT 7v | 3-SAT 9v |
|---|---|---|---|---|---|---|---|---|
| cutoff_len | 75 | 37 | 64 | 74 | 164 | 258 | 285 | 325 |
| max_new_tokens (sft) | 24 | 24 | 32 | 54 | 82 | 10 | 14 | 18 |
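To pick a safe --cutoff_len for a new dataset, you need the length of the longest tokenized example in that dataset. A hedged sketch of how one might check this is below; the file path, field names, and tokenizer are assumptions (an alpaca-style JSON file is assumed), so adapt them to the actual files referenced in data/dataset_info.json.

```python
# Hedged sketch: estimate the longest tokenized example so --cutoff_len can be
# set at or above it. The file name and the "instruction"/"input"/"output"
# fields are assumptions about the data format, not guarantees.
import json
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # use the tokenizer matching your model

with open("data/sudoku.json") as f:  # hypothetical file name
    examples = json.load(f)

max_len = max(
    len(tokenizer(ex.get("instruction", "") + ex.get("input", "") + ex.get("output", ""))["input_ids"])
    for ex in examples
)
print(f"longest example: {max_len} tokens -> set --cutoff_len >= {max_len}")
```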
Please refer to Appendix C.2 of the paper for further implementation details.
If you find our code or data helpful, please cite us as follows:
@article{ye2024beyond,
title={Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning},
author={Ye, Jiacheng and Gao, Jiahui and Gong, Shansan and Zheng, Lin and Jiang, Xin and Li, Zhenguo and Kong, Lingpeng},
journal={arXiv preprint arXiv:2410.14157},
year={2024}
}
The code framework is adapted from LLaMAFactory; thanks for their great work.