Adding regressors #188
Comments
@drew6050 There are a couple of ways to do this, and it is definitely confusing.
If it is confusing, then that highlights the need for more documentation! Digging through reported issues on GitHub to find a link to an example script that comes without a detailed explanation is not a nice experience.
That is valid; my documentation is not the best. Another useful example:
Hi @winedarksea, could you please upload C:/Users/Colin/Downloads/as_export_small.csv?
@nagydavid
from autots import load_daily, long_to_wide

# load the bundled sample data in long format
df_long = load_daily(long=True)
# pivot to wide format: one column per series, indexed by date
df = long_to_wide(
    df_long,
    date_col='datetime',  # name of datetime column
    value_col='value',  # name of target values column
    id_col='series_id',  # name of ID column, should be composite of levels, if multiple
)
Hello @winedarksea, and thank you for the clarifications thus far. To aid our understanding, I'd like to restate the information provided, incorporating some example data tables for both wide and long formats, including regressor series. For wide format data, each series (including regressors) is represented as a separate column, e.g.
In long format data, the regressor series are appended as rows, e.g.
By integrating regressors as described, we enable (some) models to correlate the original series with the regressor series, thus potentially enhancing forecast accuracy.

Question 1: To confirm, the primary utility of adding these regressor series is to allow models to exploit correlations between the original series and the regressor series - is this correct?

Question 2: Regarding the application of regressors for specific series-to-extra information relationships (e.g., linking "product-a" directly to "brand-b"), could you elaborate on whether this is possible within AutoTS? This would involve adding regressors that are not just additional time series but carry categorical or entity-specific information relevant to the primary time series.

Understanding that documentation and detailed explanations are time-consuming to produce and may not always be fully appreciated, it's discussions like these that often provide invaluable insights and learning opportunities for practitioners in the field. Thanks again for your efforts in developing AutoTS and supporting its community.
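A minimal sketch of the two layouts described above, using plain pandas only. The series names (`product_a_sales`, `holiday_children`) and all values are hypothetical and are not taken from the thread; the column names `datetime`, `series_id`, and `value` follow the `long_to_wide` example earlier in the discussion.

```python
import pandas as pd

# Hypothetical data: one target series plus one regressor series.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
df_wide = pd.DataFrame(
    {
        "product_a_sales": [10.0, 12.0, 11.0],   # target series (made up)
        "holiday_children": [0.0, 500.0, 0.0],   # regressor series (made up)
    },
    index=dates,
)
df_wide.index.name = "datetime"

# Wide -> long: every series, regressors included, becomes a block of rows.
df_long = df_wide.reset_index().melt(
    id_vars="datetime", var_name="series_id", value_name="value"
)

# Long -> wide again with plain pandas, to show the two views are equivalent.
df_back = df_long.pivot(index="datetime", columns="series_id", values="value")

print(df_wide)
print(df_long)
```

In the wide frame the regressor is simply another column; in the long frame it is another group of rows sharing one `series_id`.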
Yes, your view of the data is correct for long vs. wide.

For Question 1: yes, the general idea behind adding regressors is to add external information that can help explain the behavior of the series. Sometimes the model is able to extract insights from general market data or other high-level indicators, but generally regressors are only significantly valuable when they provide clear, direct insight into business drivers. An example I know of is a business driven by kids and families visiting, where knowing the number of school children who will be out of school on holiday, by distance from the business, adds a lot of predictive power.

For Question 2: regressor_per_series and static_categorical (which becomes static_regressor and categorical_groups) are only available for Cassandra (regressor per series, categorical groups), MultivariateRegression (regressor per series and static regressor), WindowRegression (static regressor) and NeuralForecast (static regressor and regressor per series) in the lower-level API approach. See regressor_search.py posted above. But for the high-level AutoTS model search, you can combine all your regressors into a single df, pass it as future_regressor, and let the model see which regressor helps which series.

Generally, regressors don't help as much as people hope. Focus on adding a few quality features that you know impact the business rather than trying to feed in as much data as possible. In most time series there isn't enough history, and there is far too much noise, to find the deep hidden patterns people sometimes hope exist in massive regressor sets.
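The high-level approach mentioned above (one combined regressor df passed as `future_regressor`) might look roughly like the sketch below. The data, the column name `kids_on_holiday`, the horizon, and the `run_search` helper are all hypothetical; `max_generations=2` is just an arbitrary small value. The `AutoTS` call is wrapped in a function so the data preparation can be read and run on its own.

```python
import pandas as pd

forecast_length = 7  # hypothetical forecast horizon, in days

# Hypothetical history: 60 days of one target series in wide format.
hist_index = pd.date_range("2024-01-01", periods=60, freq="D")
df = pd.DataFrame(
    {"product_a_sales": [float(i) for i in range(60)]}, index=hist_index
)

# Hypothetical regressor whose future values are known in advance.
# It must cover the history AND the forecast horizon.
full_index = pd.date_range("2024-01-01", periods=60 + forecast_length, freq="D")
regressor = pd.DataFrame(
    {"kids_on_holiday": (full_index.dayofweek >= 5).astype(float)},
    index=full_index,
)

# Split into the part aligned with history and the part covering the horizon.
regr_train = regressor.loc[df.index]
regr_fcst = regressor.iloc[-forecast_length:]


def run_search(df, regr_train, regr_fcst, forecast_length):
    """Hypothetical helper: run a small AutoTS search with a future regressor."""
    from autots import AutoTS  # imported here so the prep above runs without it

    model = AutoTS(
        forecast_length=forecast_length,
        frequency="D",
        max_generations=2,  # arbitrary small value to keep the sketch quick
    )
    model = model.fit(df, future_regressor=regr_train)
    prediction = model.predict(future_regressor=regr_fcst)
    return prediction.forecast


# forecast = run_search(df, regr_train, regr_fcst, forecast_length)
```

This only applies when the regressor's future values are known; it does not cover the per-series or static regressor options listed above.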
Hello. I love this library.
I’m trying to add additional regressors that I don’t know future values for. The documentation example is for wide data when the future is known. But, the last paragraph says:
“Additional regressors can be passed through as additional time series to forecast as part of df_long. Some models here can utilize the additional information they provide to help improve forecast quality. To prevent forecast accuracy for considering these additional series too heavily, input series weights that lower or remove their forecast accuracy from consideration.”
Does this mean that just by having additional columns in the df_long, like below, they will be considered as additional regressors on the value_col?
I see a couple of others have asked similar questions, but I still don’t see an answer. Also, a simple example in the documentation of how to implement this paragraph would help so much!
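One possible reading of the quoted paragraph, as a sketch only: the regressor is stacked into the long data as rows with its own series ID (not added as extra columns), and a weights dict lowers its influence on model selection. The series names and values below are hypothetical, and using a weight of exactly 0 to mean "remove from consideration" is an assumption based on the wording of that paragraph.

```python
import pandas as pd

# Hypothetical long-format data: target rows followed by regressor rows.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
df_long = pd.DataFrame(
    {
        "datetime": list(dates) * 2,
        "series_id": ["product_a_sales"] * 3 + ["holiday_children"] * 3,
        "value": [10.0, 12.0, 11.0, 0.0, 500.0, 0.0],
    }
)

# Assumption: series listed here are regressors whose own forecast
# accuracy should not drive model selection.
regressor_ids = {"holiday_children"}

# Weight 1 for real targets, 0 for regressor series (0 is an assumption).
weights = {
    str(sid): (0 if sid in regressor_ids else 1)
    for sid in df_long["series_id"].unique()
}
print(weights)

# With AutoTS installed, the dict would be passed to fit, for example:
# model.fit(df_long, date_col="datetime", value_col="value",
#           id_col="series_id", weights=weights)
```

Because the regressor is forecast alongside the target, this route does not require knowing its future values.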