remove underperforming variant
lucidrains committed Jan 6, 2024
1 parent 0c41146 commit 544ec67
Showing 3 changed files with 3 additions and 291 deletions.
31 changes: 2 additions & 29 deletions README.md
@@ -14,6 +14,8 @@ The official implementation has been released <a href="https://github.com/thuml/

- <a href="https://stability.ai/">StabilityAI</a> and <a href="https://huggingface.co/">🤗 Huggingface</a> for the generous sponsorship, as well as my other sponsors, for affording me the independence to open source current artificial intelligence techniques.

- <a href="https://github.com/gdevos010">Greg DeVos</a> for sharing <a href="https://github.com/lucidrains/iTransformer/issues/20">experiments</a> he ran on `iTransformer` and some of the improvised variants

## Install

```bash
@@ -112,35 +114,6 @@ preds = model(time_series)
# -> (12: (2, 12, 137), 24: (2, 24, 137), 36: (2, 36, 137), 48: (2, 48, 137))
```

### iTransformer with Normalization Statistics Conditioning

Reversible instance normalization, but all statistics across variates are concatenated and projected into a conditioning vector for FiLM conditioning after each layernorm in the transformer.

```python
import torch
from iTransformer import iTransformerNormConditioned

# using solar energy settings

model = iTransformerNormConditioned(
    num_variates = 137,
    lookback_len = 96,                  # or the lookback length in the paper
    dim = 256,                          # model dimensions
    depth = 6,                          # depth
    heads = 8,                          # attention heads
    dim_head = 64,                      # head dimension
    pred_length = (12, 24, 36, 48),     # can be one prediction, or many
    num_tokens_per_variate = 1,         # experimental setting that projects each variate to more than one token. the idea is that the network can learn to divide up into time tokens for more granular attention across time. thanks to flash attention, you should be able to accommodate long sequence lengths just fine
)

time_series = torch.randn(2, 96, 137) # (batch, lookback len, variates)

preds = model(time_series)

# preds -> Dict[int, Tensor[batch, pred_length, variate]]
# -> (12: (2, 12, 137), 24: (2, 24, 137), 36: (2, 36, 137), 48: (2, 48, 137))
```
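For context on what this commit removes, here is a minimal, hypothetical sketch of the conditioning path described above: per-variate normalization statistics (mean and standard deviation over the lookback window) are concatenated, projected to a conditioning vector, and turned into a FiLM scale and shift applied after a layernorm. The class and parameter names below are illustrative only and are not the deleted module's actual API; the real implementation lived in `iTransformer/iTransformerNormConditioned.py`.

```python
import torch
from torch import nn

class NormStatsFiLM(nn.Module):
    # hypothetical sketch of FiLM conditioning on normalization statistics,
    # not the deleted module's actual code
    def __init__(self, num_variates, dim):
        super().__init__()
        # concatenated per-variate means and stds -> conditioning vector
        self.to_cond = nn.Linear(num_variates * 2, dim)
        # conditioning vector -> per-channel scale (gamma) and shift (beta)
        self.to_film = nn.Linear(dim, dim * 2)
        self.norm = nn.LayerNorm(dim)

    def forward(self, time_series, tokens):
        # time_series: (batch, lookback_len, num_variates)
        # tokens:      (batch, num_tokens, dim) - variate tokens inside the transformer
        mean = time_series.mean(dim = 1)                        # (batch, num_variates)
        std = time_series.std(dim = 1)                          # (batch, num_variates)
        cond = self.to_cond(torch.cat((mean, std), dim = -1))   # (batch, dim)
        gamma, beta = self.to_film(cond).chunk(2, dim = -1)     # each (batch, dim)

        # FiLM conditioning applied after the layernorm
        normed = self.norm(tokens)
        return normed * (gamma.unsqueeze(1) + 1) + beta.unsqueeze(1)

stats_film = NormStatsFiLM(num_variates = 137, dim = 256)

ts = torch.randn(2, 96, 137)      # (batch, lookback len, variates)
toks = torch.randn(2, 137, 256)   # variate tokens

out = stats_film(ts, toks)        # (2, 137, 256)
```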

## Todo

- [x] beef up the transformer with latest findings
261 changes: 0 additions & 261 deletions iTransformer/iTransformerNormConditioned.py

This file was deleted.

2 changes: 1 addition & 1 deletion setup.py
@@ -3,7 +3,7 @@
setup(
  name = 'iTransformer',
  packages = find_packages(exclude=[]),
  version = '0.5.2',
  version = '0.5.3',
  license='MIT',
  description = 'iTransformer - Inverted Transformer Are Effective for Time Series Forecasting',
  author = 'Phil Wang',
