
ChatGPT as a model to reformulate queries. #37

Open
yogeswarl opened this issue Aug 13, 2023 · 15 comments
@yogeswarl
Member

This idea involves asking ChatGPT to generate relevant queries based on the documents we feed it.

We will sample about 10,000 documents from the refined queries that fall into the following categories:

  • Highly relevant (MAP score 0.75 - 1)
  • Relevant (0.5 - 0.74)
  • Somewhat relevant (0.25 - 0.49)
  • Irrelevant (0 - 0.24)

Based on this data, we will compare how well ChatGPT performs at suggesting documents for these queries.
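For reference, a minimal sketch of how this sampling by MAP bins could look, assuming a pandas DataFrame of per-query MAP scores (the file and column names are hypothetical):

```python
import pandas as pd

# Hypothetical input: one row per refined query with its MAP score.
df = pd.read_csv('refined_queries.map.csv')  # columns: qid, query, map

# Bin queries into the four relevance categories by MAP score
# (pd.cut uses half-open bins, so the edges only approximate the ranges above).
bins = [0.0, 0.25, 0.5, 0.75, 1.0]
labels = ['irrelevant', 'somewhat_relevant', 'relevant', 'highly_relevant']
df['category'] = pd.cut(df['map'], bins=bins, labels=labels, include_lowest=True)

# Sample roughly 10,000 queries, stratified across the four categories.
sample = (df.groupby('category', group_keys=False, observed=True)
            .apply(lambda g: g.sample(min(len(g), 2500), random_state=42)))
print(sample['category'].value_counts())
```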

@yogeswarl yogeswarl self-assigned this Aug 13, 2023
@yogeswarl yogeswarl added the enhancement New feature or request label Aug 13, 2023
@hosseinfani
Member

@yogeswarl
Thanks for creating this issue.

Please note that we need ChatGPT to generate the query reformulations. So the last sentence, "ChatGPT can perform suggesting documents for these queries", is not correct, to my understanding. Right?

Basically, we ask ChatGPT in these ways:

1- here is the query q, please give us 10 reformulations/paraphrases of it?
2- here is the query q and all its relevant documents, give us 10 reformulations/paraphrases of the query?

It's like using T5 after it is trained and we ask it for predictions.

@yogeswarl
Member Author

Point 2 is correct; this is what we will be doing!
Our T5, once trained, will be fed relevant documents and will generate queries. That is what we will infer from ChatGPT as well.

@hosseinfani
Member

For the comparison, please do these variations:

1- [like an expander] here is the query q, please give us 10 reformulations/paraphrases of it?
2- [like pretrained T5] here are all the relevant documents of the query q, give us 10 reformulations/paraphrases of the query?
3- [like fine-tuned T5] here is the query q and all the relevant documents of the query q, give us 10 reformulations/paraphrases of the query?

Thank you.
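As a rough sketch, the three variations could be turned into prompt templates like the ones below (the wording is illustrative only, not the prompts actually used):

```python
def build_prompt(variation, query=None, docs=None, n=10):
    """Illustrative prompt templates for the three comparison variations."""
    if variation == 'expander':        # 1- query only
        return f'Here is the query: "{query}". Please give us {n} reformulations/paraphrases of it.'
    if variation == 'pretrained_t5':   # 2- relevant documents only
        return (f'Here are all the relevant documents of a query:\n{docs}\n'
                f'Give us {n} reformulations/paraphrases of the query.')
    if variation == 'finetuned_t5':    # 3- query + relevant documents
        return (f'Here is the query: "{query}" and all its relevant documents:\n{docs}\n'
                f'Give us {n} reformulations/paraphrases of the query.')
    raise ValueError(f'unknown variation: {variation}')
```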

@yogeswarl
Member Author

Understood.

@yogeswarl
Member Author

yogeswarl commented Aug 25, 2023

[Screenshot 2023-08-24 at 9:01:59 PM]
Hello Dr. @hosseinfani, I have written a function to test out ChatGPT's capabilities. I have a paid subscription and am still getting server errors. This needs to be handled gracefully, but it occurs roughly every 15 minutes due to server overload.
Do you have any suggestions on how to handle this issue?

@yogeswarl
Member Author

Update: the above issue has been solved with the use of the retrying package.
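For reference, a minimal sketch of wrapping the API call with the retrying package (assuming the pre-1.0 `openai` client; the model name and backoff settings are illustrative):

```python
import openai
from retrying import retry

openai.api_key = 'YOUR_API_KEY'  # placeholder

# Retry up to 5 times with exponential backoff (1s, 2s, 4s, ..., capped at 60s)
# so transient server/overload errors do not kill the whole prediction run.
@retry(stop_max_attempt_number=5,
       wait_exponential_multiplier=1000,
       wait_exponential_max=60000)
def reformulate(prompt):
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': prompt}])
    return response['choices'][0]['message']['content']
```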

@yogeswarl
Member Author

yogeswarl commented Aug 30, 2023

Hello @hosseinfani,
Here are the stats and a graphical representation of GPT run on msmarco.passage.
I can run one more because the predictions take too long to complete.
Some stats:

| query_category | query_length | mean_map |
|---|---|---|
| paraphrase_poor_gpt_query_mean_length | 40.421 | 0.1294494 |
| paraphrase_poor_refined_query_mean_length | 43.193 | 0.4513846 |
| paraphrase_poor_original_query_mean_length | 37.004 | 0.0359629 |
| paraphrase_somewhat_gpt_query_mean_length | 40.398 | 0.4748991 |
| paraphrase_somewhat_refined_query_mean_length | 42.762 | 0.7004481 |
| paraphrase_somewhat_original_query_mean_length | 34.964 | 0.2995379 |
| paraphrase_relevant_gpt_query_mean_length | 41.969 | 0.7646921 |
| paraphrase_relevant_refined_query_mean_length | 42.971 | 0.8539836 |
| paraphrase_relevant_original_query_mean_length | 39.078 | 0.8028525 |
| finetune_poor_gpt_query_mean_length | 60.017 | 0.5468902 |
| finetune_poor_refined_query_mean_length | 43.193 | 0.4513846 |
| finetune_poor_original_query_mean_length | 37.004 | 0.0359629 |
| finetune_somewhat_gpt_query_mean_length | 58.303 | 0.8195252 |
| finetune_somewhat_refined_query_mean_length | 42.762 | 0.7004481 |
| finetune_somewhat_original_query_mean_length | 34.964 | 0.2995379 |
| finetune_relevant_gpt_query_mean_length | 56.367 | 0.8946335 |
| finetune_relevant_refined_query_mean_length | 42.971 | 0.8539836 |
| finetune_relevant_original_query_mean_length | 39.078 | 0.8028525 |
| infer_poor_gpt_query_mean_length | 55.03 | 0.6592458 |
| infer_poor_refined_query_mean_length | 43.193 | 0.4513846 |
| infer_poor_original_query_mean_length | 37.004 | 0.0359629 |
| infer_somewhat_gpt_query_mean_length | 53.818 | 0.8387605 |
| infer_somewhat_refined_query_mean_length | 42.762 | 0.7004481 |
| infer_somewhat_original_query_mean_length | 34.964 | 0.2995379 |
| infer_relevant_gpt_query_mean_length | 51.396 | 0.9012704 |
| infer_relevant_refined_query_mean_length | 42.971 | 0.8539836 |
| infer_relevant_original_query_mean_length | 39.078 | 0.8028525 |

Some graphical representations:
[Plots: finetune_plot, infer_plot, paraphrase_plot]

@yogeswarl
Member Author

I am running another set of poor, somewhat, and relevant categories for user reformulation on aol title url.

@hosseinfani
Member

@yogeswarl
can you please explain what the categories are and put a paragraph of analysis here?

@yogeswarl
Member Author

We had 3 thresholds: "poor", where the original queries scored 0 - 0.24; "somewhat" = 0.25 - 0.49; "relevant" = 0.5 - 1.0.
I went with the 3 categories you asked for:
ChatGPT as an inference model -> pass only the documents
ChatGPT as a paraphrase model -> pass only the queries
ChatGPT as a fine-tuning model -> pass the documents and the query, tab-separated.

The inference and fine-tuned modes performed better than T5 and the original queries.
Two issues arise with ChatGPT: one is the time to run the model; it runs through an inference API, so it is painstakingly slow.
One prediction takes approximately 2-3 s according to the tqdm library, but T5 does one prediction in under a second.

The stats are also posted in the above comment with the mean query length and mean MAP.
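A minimal sketch of how the per-category means in the table above could be computed, assuming a DataFrame with one row per prediction (the column and file names are hypothetical):

```python
import pandas as pd

# Hypothetical per-prediction results: mode (paraphrase/infer/finetune),
# bin (poor/somewhat/relevant), source (gpt/refined/original), query text, MAP.
df = pd.read_csv('gpt_predictions.csv')

df['query_length'] = df['query'].str.len()
stats = (df.groupby(['mode', 'bin', 'source'])
           .agg(mean_query_length=('query_length', 'mean'),
                mean_map=('map', 'mean'))
           .reset_index())
print(stats)
```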

@yogeswarl
Member Author

There should be only one bar plot for this. I am going to make it more optimized.

@hosseinfani
Member

@yogeswarl
Thank you. Can you find the research paper for ChatGPT? We need to see why it's better than T5. Is it due to the architecture, the training dataset, or ...

@yogeswarl
Member Author

@hosseinfani
https://arxiv.org/pdf/2005.14165.pdf
Here is the paper. I will give a quick summary of it by tomorrow evening.

@yogeswarl
Member Author

yogeswarl commented Sep 1, 2023

From the paper, here is what I was able to gather as to why ChatGPT is better, plus a few more things I did that were not considered for ChatGPT:

  1. Unlike T5, for which we set a max input length of 512, I fed the whole document collection to ChatGPT.
  2. Unlike T5, ChatGPT is a conversational large language model with 175B parameters, compared to T5-base, which has only 220 million.
  3. ChatGPT is specifically designed for generating human-like text in a conversational context. It is fine-tuned on dialogue data to provide contextually relevant responses. T5, on the other hand, is a more general-purpose model that approaches various NLP tasks as text-to-text tasks.

One option I can think of here is to limit the number of words ChatGPT can see (i.e., 512 words) and compare both with respect to T5.
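A minimal sketch of that cap, using a simple whitespace split (T5 actually counts subword tokens, so 512 words only approximates its 512-token limit; `all_relevant_docs_text` is a hypothetical variable):

```python
def truncate_words(text, max_words=512):
    """Keep only the first max_words whitespace-separated words of the input."""
    return ' '.join(text.split()[:max_words])

# Feed ChatGPT roughly the same truncated view of the documents that T5 sees.
docs_for_gpt = truncate_words(all_relevant_docs_text)
```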

One problem with ChatGPT is that it cannot limit the number of characters as precisely as T5; the maximum output length is always much greater than the mean length of both the T5 and the original queries.
@hosseinfani, should I redo this for fine-tune and inference?

@yogeswarl
Member Author

Made the plots much smaller and cleaner.
