- 🔥🔥 (Oct'24) TimeSeriesExam was accepted to the NeurIPS'24 Workshop on Foundation Models for Time Series: Exploring New Frontiers, and to the ICAIF'24 Workshop on Time Series in the Age of Large Models as a spotlight paper!
Large Language Models (LLMs) have recently demonstrated a remarkable ability to model time series data. These capabilities can be partly explained if LLMs understand basic time series concepts. However, our knowledge of what these models understand about time series data remains relatively limited. To address this gap, we introduce TimeSeriesExam, a configurable and scalable multiple-choice question exam designed to assess LLMs across five core time series understanding categories: pattern recognition, noise understanding, similarity analysis, anomaly detection, and causality analysis.
Figure 1: Accuracy of latest LLMs on the TimeSeriesExam.
Closed-source LLMs outperform open-source ones in simple understanding tasks, but most models struggle with complex reasoning tasks.
Time series in the dataset are created by combining diverse baseline time series objects. These baseline objects cover linear/non-linear signals and cyclic patterns.
Figure 2: The pipeline enables diversity by combining different components to create numerous synthetic time series with varying properties.
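As a toy illustration of this idea (not the repository's actual generation code), a synthetic series can be built by summing a linear trend, a cyclic pattern, and noise; the function name and component weights below are purely illustrative:

```python
import numpy as np

def make_synthetic_series(n: int = 200, seed: int = 0) -> np.ndarray:
    """Toy sketch: combine a linear trend, a cyclic pattern, and noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(n, dtype=float)
    trend = 0.05 * t                       # linear component
    cycle = np.sin(2 * np.pi * t / 50.0)   # cyclic component
    noise = 0.1 * rng.standard_normal(n)   # stochastic component
    return trend + cycle + noise
```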
This step ensures you have the necessary tools and libraries to run the evaluation scripts. The commands below create a new conda environment named `ts_exam` with Python 3.12.0, activate it, and install the required libraries listed in `requirements.txt` using pip:
> conda create -n "ts_exam" python=3.12.0
> conda activate ts_exam
> pip install -r requirements.txt
If you're using a closed-source model like GPT-4, you'll need an API key to interact with its service. Here are some security best practices to follow when managing your API key:
- Store Securely: Don't embed your API key directly in the code or scripts. Consider using environment variables or secure credential management tools.
- Minimize Exposure: Limit who has access to your API key and avoid sharing it publicly.
- Monitor Usage: Keep track of API key usage to identify any suspicious activity.
We recommend that you refer to the best practices outlined in OpenAI's documentation.
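One way to apply these practices is to read the key from an environment variable instead of hard-coding it. A minimal sketch, assuming the key is exported under the conventional variable name `OPENAI_API_KEY`:

```python
import os

def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Read an API key from an environment variable rather than source code."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it in your shell before running, "
            f"e.g. `export {var_name}=...`"
        )
    return key
```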
This repository uses two bash scripts located in the `evaluate` directory for evaluating datasets. To run a specific evaluation script, navigate to the project directory in your terminal and execute the following command, replacing `evaluate_file_name.sh` with the actual script name:
> sh evaluate/evaluate_file_name.sh
Below we describe the hyperparameters that can be configured for evaluation. You can set them in the bash scripts described above.
- `data_file_path` (string): Path to the JSON file containing the QA dataset. We provide a dataset created after each round of improvement; in the paper we primarily evaluated the last-round dataset (round 3). These datasets are located under `output/round_idx_folder/qa_dataset.json`.
- `model_name` (string): The model to evaluate.

  Note: We currently support 4 closed-source and 3 open-weight models:
  - OpenAI's GPT-4o mini ("gpt-4o-mini") and GPT-4o ("gpt-4o"),
  - Anthropic's Claude 3.5 Sonnet ("claude-3-5-sonnet-20240620"),
  - Google's Gemini-1.5 Pro ("gemini-1.5-pro"),
  - OpenBMB's MiniCPM-V 2.6 ("openbmb/MiniCPM-V-2_6"), and
  - Microsoft's Phi-3.5-vision ("microsoft/Phi-3.5-vision-instruct") and Phi-3.5-mini ("microsoft/Phi-3.5-mini-instruct")
- `seed` (integer): Random seed to control randomness during generation.
- `max_tokens` (integer): Maximum number of new tokens the model can generate for the answer.
- `temperature` (float): Controls the randomness of the generated text. Higher values lead to more surprising outputs.
- `output_file_path` (string): Path to the JSON file where the results will be saved.
- `image_cache_dir` (string, optional): Path to a directory where intermediate images generated during inference will be saved.
- `ts_tokenizer_name` (string, optional): Choose between 'image' or 'plain_text' depending on the input data format. Defaults to 'plain_text'.
- `add_question_hint` (boolean, optional): If True, a question hint will be provided to the model as additional context.
- `add_concepts` (boolean, optional): If True, a list of relevant concepts will be provided to the model as additional context.
- `add_examples` (boolean, optional): If True and `add_concepts` is also True, example time series illustrating the concepts will be provided to the model.
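For reference, one possible configuration for a single evaluation run can be sketched in Python. Every value below is illustrative only, not a default shipped with the repository; the actual bash scripts in `evaluate/` set these as shell variables:

```python
# Illustrative configuration for one evaluation run; all values are examples.
config = {
    "data_file_path": "output/round_idx_folder/qa_dataset.json",
    "model_name": "gpt-4o-mini",        # one of the supported model names
    "seed": 42,
    "max_tokens": 256,
    "temperature": 0.0,                 # lower temperature -> less randomness
    "output_file_path": "results/example_results.json",
    "image_cache_dir": "cache/images",  # optional
    "ts_tokenizer_name": "image",       # 'image' or 'plain_text'
    "add_question_hint": False,
    "add_concepts": True,
    "add_examples": True,               # only used when add_concepts is True
}

# Basic sanity checks mirroring the constraints described above.
assert config["ts_tokenizer_name"] in ("image", "plain_text")
assert not (config["add_examples"] and not config["add_concepts"])
```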
To integrate a new model, follow these steps:
- Open `evaluate/evaluation_utils.py`.
- Define custom `query` and `format` functions for your model, following the structure of the existing functions in this file. These functions determine how queries are sent to the model and how responses are formatted for evaluation.
- In `evaluate/llm_api.py`, import your `query` and `format` functions from `evaluate/evaluation_utils.py`.
- Add the model's details to the specified global variable in `evaluate/llm_api.py`. This step registers your model so it can be accessed and used within the system.
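A hypothetical sketch of what such a pair of functions might look like. The names, signatures, and formatting logic below are assumptions for illustration; follow the structure of the actual functions in `evaluate/evaluation_utils.py`:

```python
# Hypothetical example of the two functions a new model needs; the real
# signatures are defined by the existing functions in evaluation_utils.py.
def query_my_model(prompt: str, max_tokens: int = 256,
                   temperature: float = 0.0) -> str:
    """Send a prompt to the new model and return its raw text response."""
    # Replace this stub with the model's actual API or local inference call.
    raise NotImplementedError("wire this up to your model's API")

def format_my_model(raw_response: str) -> str:
    """Normalize the raw response so the evaluator can extract the answer."""
    # Example heuristic: strip whitespace and keep only the first line,
    # which is assumed to contain the chosen option.
    stripped = raw_response.strip()
    return stripped.splitlines()[0] if stripped else ""
```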
If you find this work helpful, please consider citing our paper:
@inproceedings{caitimeseriesexam,
title={TimeSeriesExam: A Time Series Understanding Exam},
author={Cai, Yifu and Choudhry, Arjun and Goswami, Mononito and Dubrawski, Artur},
booktitle={NeurIPS Workshop on Time Series in the Age of Large Models}
}
MIT License
Copyright (c) 2024 Auton Lab, Carnegie Mellon University
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.