Skip to content
/ AutoTS Public
forked from winedarksea/AutoTS

Automated Time Series Forecasting

License

Notifications You must be signed in to change notification settings

lewuyou/AutoTS

 
 

Repository files navigation

AutoTS

AutoTS 是一个 Python 时间序列包,旨在大规模快速部署高精度预测。

2023 年,AutoTS 在 M6 预测竞赛中获胜,在 12 个月的股市预测中提供了最高绩效的投资决策。

有数十种预测模型可用于sklearn风格的.fit().predict()。 其中包括朴素、统计、机器学习和深度学习模型。 此外,在sklearn风格的.fit().transform().inverse_transform()中,还有超过 30 种特定于时间序列的变换。 所有这些功能都直接在 Pandas Dataframes 上运行,无需转换为专有对象。

所有模型都支持预测多元(多个时间序列)输出,并且还支持概率(上限/下限)预测。 大多数模型可以轻松扩展到数万甚至数十万个输入系列。 许多模型还支持传入用户定义的外生回归量。

这些模型均设计用于集成到 AutoML 特征搜索中,该搜索可通过遗传算法自动查找给定数据集的最佳模型、预处理和集成。

水平(Horizontal)和马赛克风格(mosaic style)的组合是旗舰组合类型,允许每个系列接收最准确的模型,同时仍然保持可扩展性。

指标和交叉验证选项的组合、应用子集和加权的能力、回归器生成工具、模拟预测模式、事件风险预测、实时数据集、模板导入和导出、绘图以及数据整形参数的集合使 可用的功能集。

目录

安装

pip install autots

这包括基本模型的依赖项,但某些模型和方法需要附加包

请注意,还有其他几个项目选择了类似的名称,因此请确保您使用的是正确的 AutoTS 代码、论文和文档。.

基本使用

AutoTS 的输入数据预计采用 格式: (long or wide

  • wide 格式是一个带有pandas.DatetimeIndexpandas.DataFrame,每列都是一个不同的series。
  • long 格式包含三列:
    • Date(最好已经是 pandas 识别的 日期时间 格式)。
    • Series ID. 对于单个时间序列,series_id 可以 = None
    • Value
  • 对于 long 数据,每个数据的列名称都会作为date_colid_colvalue_col传递给``.fit()`。 wide 数据不需要参数。

Lower-level 的函数仅针对 wide类型数据而设计。

# 其他载入选项: _hourly, _monthly, _weekly, _yearly, or _live_daily
from autots import AutoTS, load_daily

# 示例数据集可用于*长*导入形状或*宽*导入形状
long = False
df = load_daily(long=long)

model = AutoTS(
    forecast_length=21,
    frequency="infer",
    prediction_interval=0.9,
    ensemble=None,
    model_list="superfast",  # "fast", "default", "fast_parallel"
    transformer_list="fast",  # "superfast",
    drop_most_recent=1,
    max_generations=4,
    num_validations=2,
    validation_method="backwards"
)
model = model.fit(
    df,
    date_col='datetime' if long else None,
    value_col='value' if long else None,
    id_col='series_id' if long else None,
)

prediction = model.predict()
# 绘制一个样本
prediction.plot(model.df_wide_numeric,
                series=model.df_wide_numeric.columns[0],
                start_date="2019-01-01")
# 打印最佳模型的详细信息
print(model)

# 点预测 dataframe
forecasts_df = prediction.forecast
# 预测上限和下限
forecasts_up, forecasts_low = prediction.upper_forecast, prediction.lower_forecast

# 所有尝试的模型结果的准确性
model_results = model.results()
# 并从交叉验证中汇总
validation_results = model.results("validation")

The lower-level API, in particular the large section of time series transformers in the scikit-learn style, can also be utilized independently from the AutoML framework.

Check out extended_tutorial.md for a more detailed guide to features.

Also take a look at the production_example.py

Tips for Speed and Large Data:

  • Use appropriate model lists, especially the predefined lists:
    • superfast (simple naive models) and fast (more complex but still faster models, optimized for many series)
    • fast_parallel (a combination of fast and parallel) or parallel, given many CPU cores are available
      • n_jobs usually gets pretty close with ='auto' but adjust as necessary for the environment
    • 'scalable' is the best list to avoid crashing when many series are present. There is also a transformer_list = 'scalable'
    • see a dict of predefined lists (some defined for internal use) with from autots.models.model_list import model_lists
  • Use the subset parameter when there are many similar series, subset=100 will often generalize well for tens of thousands of similar series.
    • if using subset, passing weights for series will weight subset selection towards higher priority series.
    • if limited by RAM, it can be distributed by running multiple instances of AutoTS on different batches of data, having first imported a template pretrained as a starting point for all.
  • Set model_interrupt=True which passes over the current model when a KeyboardInterrupt ie crtl+c is pressed (although if the interrupt falls between generations it will stop the entire training).
  • Use the result_file method of .fit() which will save progress after each generation - helpful to save progress if a long training is being done. Use import_results to recover.
  • While Transformations are pretty fast, setting transformer_max_depth to a lower number (say, 2) will increase speed. Also utilize transformer_list == 'fast' or 'superfast'.
  • Check out this example of using AutoTS with pandas UDF.
  • Ensembles are obviously slower to predict because they run many models, 'distance' models 2x slower, and 'simple' models 3x-5x slower.
    • ensemble='horizontal-max' with model_list='no_shared_fast' can scale relatively well given many cpu cores because each model is only run on the series it is needed for.
  • Reducing num_validations and models_to_validate will decrease runtime but may lead to poorer model selections.
  • For datasets with many records, upsampling (for example, from daily to monthly frequency forecasts) can reduce training time if appropriate.
    • this can be done by adjusting frequency and aggfunc but is probably best done before passing data into AutoTS.
  • It will be faster if NaN's are already filled. If a search for optimal NaN fill method is not required, then fill any NaN with a satisfactory method before passing to class.
  • Set runtime_weighting in metric_weighting to a higher value. This will guide the search towards faster models, although it may come at the expense of accuracy.
  • Memory shortage is the most common cause of random process/kernel crashes. Try testing a data subset and using a different model list if issues occur. Please also report crashes if found to be linked to a specific set of model parameters (not AutoTS parameters but the underlying forecasting model params). Also crashes vary significantly by setup such as underlying linpack/blas so seeing crash differences between environments can be expected.

How to Contribute:

  • Give feedback on where you find the documentation confusing
  • Use AutoTS and...
    • Report errors and request features by adding Issues on GitHub
    • Posting the top model templates for your data (to help improve the starting templates)
    • Feel free to recommend different search grid parameters for your favorite models
  • And, of course, contributing to the codebase directly on GitHub.

AutoTS Process

flowchart TD
    A[Initiate AutoTS Model] --> B[Import Template]
    B --> C[Load Data]
    C --> D[Split Data Into Initial Train/Test Holdout]
    D --> E[Run Initial Template Models]
    E --> F[Evaluate Accuracy Metrics on Results]
    F --> G[Generate Score from Accuracy Metrics]
    G --> H{Max Generations Reached or Timeout?}

    H -->|No| I[Evaluate All Previous Templates]
    I --> J[Genetic Algorithm Combines Best Results and New Random Parameters into New Template]
    J --> K[Run New Template Models and Evaluate]
    K --> G

    H -->|Yes| L[Select Best Models by Score for Validation Template]
    L --> M[Run Validation Template on Additional Holdouts]
    M --> N[Evaluate and Score Validation Results]
    N --> O{Create Ensembles?}
    
    O -->|Yes| P[Generate Ensembles from Validation Results]
    P --> Q[Run Ensembles Through Validation]
    Q --> N

    O -->|No| R[Export Best Models Template]
    R --> S[Select Single Best Model]
    S --> T[Generate Future Time Forecast]
    T --> U[Visualize Results]

    R --> B[Import Best Models Template]
Loading

Also known as Project CATS (Catlin's Automated Time Series) hence the logo.

About

Automated Time Series Forecasting

Resources

License

Code of conduct

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 98.1%
  • Jupyter Notebook 1.9%