Merge branch 'main' of https://github.com/stanford-oval/suql into main
george1459 committed Aug 10, 2024
2 parents 13678bf + 9863193 commit 3e5bb37
Showing 5 changed files with 1,963 additions and 8 deletions.
27 changes: 20 additions & 7 deletions README.md
@@ -79,12 +79,25 @@ To replicate our results on HybridQA and restaurants in our paper, see [paper_re
# Citation

If you find this work useful to you, please consider citing us.

```
@inproceedings{liu2024suql,
title={SUQL: Conversational Search over Structured and Unstructured Data with Large Language Models},
author={Shicheng Liu and Jialiang Xu and Wesley Tjangnaka and Sina J. Semnani and Chen Jie Yu and Monica S. Lam},
booktitle = {Findings of the Association for Computational Linguistics: NAACL 2024},
year={2024}
@inproceedings{liu-etal-2024-suql,
title = "{SUQL}: Conversational Search over Structured and Unstructured Data with Large Language Models",
author = "Liu, Shicheng and
Xu, Jialiang and
Tjangnaka, Wesley and
Semnani, Sina and
Yu, Chen and
Lam, Monica",
editor = "Duh, Kevin and
Gomez, Helena and
Bethard, Steven",
booktitle = "Findings of the Association for Computational Linguistics: NAACL 2024",
month = jun,
year = "2024",
address = "Mexico City, Mexico",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-naacl.283",
pages = "4535--4555",
abstract = "While most conversational agents are grounded on either free-text or structured knowledge, many knowledge corpora consist of hybrid sources. This paper presents the first conversational agent that supports the full generality of hybrid data access for large knowledge corpora, through a language we developed called SUQL ($\textbf{S}$tructured and $\textbf{U}$nstructured $\textbf{Q}$uery $\textbf{L}$anguage). Specifically, SUQL extends SQL with free-text primitives (${\small \text{SUMMARY}}$ and ${\small \text{ANSWER}}$), so information retrieval can be composed with structured data accesses arbitrarily in a formal, succinct, precise, and interpretable notation. With SUQL, we propose the first semantic parser, an LLM with in-context learning, that can handle hybrid data sources. Our in-context learning-based approach, when applied to the HybridQA dataset, comes within 8.9{\%} Exact Match and 7.1{\%} F1 of the SOTA, which was trained on 62K data samples. More significantly, unlike previous approaches, our technique is applicable to large databases and free-text corpora. We introduce a dataset consisting of crowdsourced questions and conversations on Yelp, a large, real restaurant knowledge base with structured and unstructured data. We show that our few-shot conversational agent based on SUQL finds an entity satisfying all user requirements 90.3{\%} of the time, compared to 63.4{\%} for a baseline based on linearization.",
}
```
6 changes: 5 additions & 1 deletion docs/paper_results.md
@@ -1 +1,5 @@
Coming next as we finish wrapping up this repo as a package!
# Restaurants dataset

We based our experiments on a total of 1828 restaurants from Yelp.com across 4 cities. The list of restaurant IDs used in our study is in `paper_data/restaurants_list.txt`. For instance, the ID `NX8VYnWFQ2ZY-H0HwvfPTw` corresponds to the restaurant at https://www.yelp.com/biz/NX8VYnWFQ2ZY-H0HwvfPTw. For each restaurant, we used the top 20 reviews and the top 20 popular dishes in our experiments.
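
As a minimal sketch of the ID-to-URL mapping described above, the snippet below turns restaurant IDs into Yelp business URLs. It assumes the list file holds one ID per line, which this document does not state; the helper name `yelp_urls` is hypothetical and not part of the SUQL package.

```python
from typing import Iterable, List

# URL prefix taken from the example above.
YELP_BIZ_PREFIX = "https://www.yelp.com/biz/"


def yelp_urls(id_lines: Iterable[str]) -> List[str]:
    """Turn restaurant IDs into Yelp business URLs, skipping blank lines.

    Assumption: each input line holds exactly one restaurant ID.
    """
    urls = []
    for line in id_lines:
        restaurant_id = line.strip()
        if restaurant_id:
            urls.append(YELP_BIZ_PREFIX + restaurant_id)
    return urls


# Hypothetical usage with the file named above:
# with open("paper_data/restaurants_list.txt") as f:
#     urls = yelp_urls(f)
print(yelp_urls(["NX8VYnWFQ2ZY-H0HwvfPTw\n"]))
```

Because the function accepts any iterable of strings, an open file handle can be passed directly.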

We collected 100 single-turn questions via Prolific. The results of the SUQL system can be found at `paper_data/restaurants_single_turn_SUQL.csv`. The columns are: **Unique ID** (a unique row identifier), **Utterance** (the collected user question), **Predicted SUQL** (the SUQL predicted by gpt-3.5-turbo-0613), **Agent response** (the SUQL system's response), **1st entity**, **2nd entity**, **3rd entity** (the up-to-three returned entities), **Entity Annotation**, **Whether Wrong Parse** (whether this is a wrong SUQL parse), **Prolific ID** (the Prolific ID associated with the submission), and **Structural Unstructural annotation** (whether the question requires only structured data or a combination of structured and unstructured data; used for Table 4 in the paper). The false positives from the SUQL system (based on annotation) can be found in `paper_data/restaurants_single_turn_SUQL_fp.txt`.
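
Before analyzing the results file, it can help to confirm that it has the columns listed above. The sketch below does this with the standard `csv` module. It assumes the CSV header row uses exactly the column names given in this document, which has not been verified against the file; `missing_columns` is a hypothetical helper.

```python
import csv
import io
from typing import List, TextIO

# Column names as listed in the text above (assumed to match the CSV header).
EXPECTED_COLUMNS = [
    "Unique ID", "Utterance", "Predicted SUQL", "Agent response",
    "1st entity", "2nd entity", "3rd entity", "Entity Annotation",
    "Whether Wrong Parse", "Prolific ID",
    "Structural Unstructural annotation",
]


def missing_columns(csv_file: TextIO) -> List[str]:
    """Return the expected column names that are absent from the CSV header."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []  # None for an empty file
    present = {name.strip() for name in header}
    return [column for column in EXPECTED_COLUMNS if column not in present]


# Small in-memory demonstration with a deliberately incomplete header.
sample = io.StringIO("Unique ID,Utterance\n1,example question\n")
print(missing_columns(sample))

# Hypothetical usage with the file named above:
# with open("paper_data/restaurants_single_turn_SUQL.csv", newline="") as f:
#     print(missing_columns(f))
```

An empty list means every documented column was found in the header.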