-
Notifications
You must be signed in to change notification settings - Fork 44
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge branch 'main' into release-0.1.3
- Loading branch information
Showing
35 changed files
with
4,972 additions
and
1,867 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
#!/bin/bash | ||
# | ||
files_to_check="docs/src/recipes/*.ipynb" | ||
files_to_check+=" docs/src/user_guide.ipynb" | ||
files_to_check+=" docs/src/tutorials/getting_started.ipynb" | ||
|
||
|
||
for path in `git diff --name-only --staged $files_to_check` | ||
do | ||
echo "Pre-commit: Clearing outputs for $path" | ||
jupyter nbconvert --clear-output "$path" --to notebook --inplace | ||
git add $path | ||
done |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
# Changelog | ||
|
||
## HEAD (might become 0.1.3) | ||
|
||
This is the first operational version of Temporian for users. The list whole and | ||
detailed list of features is too long to be listed. The top features are: | ||
|
||
### Feature | ||
|
||
- Pypi release. | ||
- 72 operators. | ||
- Execution in eager, compiled mode, and graph mode. | ||
- IO Support for Pandas, CSV, Numpy and TensorFlow datasets. | ||
- Static and interactive plotting. | ||
- Documentation (3 minutes intro, user guide and API references). | ||
- 5 tutorials. | ||
|
||
### Fix | ||
|
||
- Proper error message when using distributed training on more than 2^31 | ||
(i.e., ~2B) examples while compiling YDF with 32-bits example index. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -29,6 +29,14 @@ To create a new release, follow these steps: | |
|
||
### Environment Setup | ||
|
||
After cloning the repository, please manually install the git hooks: | ||
|
||
```shell | ||
git clone [email protected]:google/temporian.git | ||
|
||
cp .git-hooks/* .git/hooks | ||
``` | ||
|
||
Install [Poetry](https://python-poetry.org/), which we use to manage Python dependencies and virtual environments. | ||
|
||
Temporian requires Python `3.9.0` or greater. We recommend using [PyEnv](https://github.com/pyenv/pyenv#installation) to install and manage multiple Python versions. Once PyEnv is available, install a supported Python version (e.g. 3.9.6) by running: | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
../../CHANGELOG.md |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,159 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c74f7111-1b6c-4454-9770-3f67eeadaca6", | ||
"metadata": {}, | ||
"source": [ | ||
"# Unify events with identical timestamps\n", | ||
"\n", | ||
"This recipe shows how to avoid having duplicated timestamps in an `EventSet`. Events with identical timestamps are aggregated with a moving window operation (e.g: sum, average, max, min), preserving the original timestamp values (which may be non-uniform).\n", | ||
"\n", | ||
"\n", | ||
"For example, assume we've asynchronous sensor measurements, potentially from different sources. If there are two measurements at the same exact timestamp, we want to unify them and take their average value." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c63a41a5-bd95-4588-bad3-83691bd0acd0", | ||
"metadata": {}, | ||
"source": [ | ||
"## Example data\n", | ||
"\n", | ||
"Let's define some events with non-uniform timestamps to illustrate the use case. Some of the timestamps are repeated, those are the ones that we'll unify.\n", | ||
"\n", | ||
"But, we've to be careful because there are events very close in time, but not actually duplicated. We don't want to interfere with those." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "a5e57e74-ca80-4292-834a-e7cfd9185b2f", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import temporian as tp\n", | ||
"\n", | ||
"sensor_evset = tp.event_set(timestamps=[1.1, 2.01, 2.02, 2.02, 3.5, 3.51, 3.51, 4.5, 5.0],\n", | ||
" features={\"y\": [1., 2., 3., 4., 5., 6., 7., 8., 9.],\n", | ||
" \"z\": [10., 20., 30., 40., 50., 60., 70., 80., 90.]\n", | ||
" }\n", | ||
" )\n", | ||
"sensor_evset.plot()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "4b875a15-cd14-49bd-a83e-301f9c7aef17", | ||
"metadata": {}, | ||
"source": [ | ||
"## Solution\n", | ||
"\n", | ||
"In order to unify only the events with the exact same timestamp, we need to:\n", | ||
"1. Get the list of unique timestamps.\n", | ||
"2. Aggregate events at the exact same timestamp, making sure the moving window doesn't overlap with nearby measurements.\n", | ||
"\n", | ||
"### 1. Get unique timestamps\n", | ||
"\n", | ||
"The first step is to create a new sampling removing the duplicated timestamps at `2.02` and `3.51`:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "037b1aa0-c808-4905-81bf-e182b221468e", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Remove duplicated timestamps\n", | ||
"unique_t = sensor_evset.unique_timestamps()\n", | ||
"unique_t" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "bea5a3e8-2975-4cda-a479-ba9efa219339", | ||
"metadata": {}, | ||
"source": [ | ||
"### 2. Moving window with shortest length\n", | ||
"\n", | ||
"To create a moving window that doesn't overlap with two different timestamps at any point, it must be smaller than the smallest possible step. But we want a solution that works for any resolution, from daily sales to nano-second sensor measurements.\n", | ||
"\n", | ||
"In `tp.duration.shortest`, we've defined the shortest possible interval that can be represented with a `float64` timestamp at maximum resolution:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "c64fbeba-5efc-437e-baac-9b38a536fd9d", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"shortest_length = tp.duration.shortest\n", | ||
"shortest_length" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "81d58e94-bd23-4d9e-bb53-88c299af65f5", | ||
"metadata": {}, | ||
"source": [ | ||
"Pretty small, right? Since null durations are not allowed, this is as close to zero as we can get. It's guaranteed that you'll never overlap two different timestamps using this.\n", | ||
"\n", | ||
"Now we just need to run the aggregation function that we need, providing this small number as `window_length` and the unique timestamps as `sampling`:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "813f1562-8c95-4b22-a40f-90b7ee751e61", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"unified_evset = sensor_evset.simple_moving_average(window_length=shortest_length, sampling=unique_t)\n", | ||
"unified_evset" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "72e0d5a5-e2e7-406c-84c6-dd8ac717128d", | ||
"metadata": {}, | ||
"source": [ | ||
"Of course, instead of the average value, other moving window operations like `moving_min` or `moving_max` could make more sense depending on the use case. If multiple measurements are expected at each timestamp, you could also want the moving standard deviation to get a confidence interval.\n", | ||
"\n", | ||
"Also, keep in mind that this exact procedure would work well in an `EventSet` with multiple indexes, removing the duplicated timestamps in each index separately.\n", | ||
"\n", | ||
"But let's keep the example simple for now 🙂" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f3a9c6e6-5cba-4950-900f-1878e87a98be", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.4" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
Oops, something went wrong.