Merge branch 'main' into release-0.1.3
ianspektor committed Aug 23, 2023
2 parents ab36285 + 89f8265 commit 26c1542
Showing 35 changed files with 4,972 additions and 1,867 deletions.
13 changes: 13 additions & 0 deletions .git-hooks/pre-commit
@@ -0,0 +1,13 @@
#!/bin/bash
# Clear the outputs of staged notebooks before they are committed.
files_to_check="docs/src/recipes/*.ipynb"
files_to_check+=" docs/src/user_guide.ipynb"
files_to_check+=" docs/src/tutorials/getting_started.ipynb"


# $files_to_check is left unquoted on purpose so the shell expands the globs.
for path in $(git diff --name-only --staged $files_to_check)
do
  echo "Pre-commit: Clearing outputs for $path"
  jupyter nbconvert --clear-output "$path" --to notebook --inplace
  git add "$path"
done
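For reference, clearing a notebook's outputs amounts to editing its JSON: each code cell gets an empty `outputs` list and a `null` `execution_count`. The sketch below does this with the Python standard library only, as a hypothetical alternative to the `jupyter nbconvert --clear-output` call in the hook. The function names are illustrative, and nbconvert's own clearing may also strip some cell metadata that this sketch leaves untouched.

```python
import json


def clear_outputs(notebook: dict) -> dict:
    """Empty the outputs and execution counts of every code cell (in place)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clear_outputs_in_file(path: str) -> None:
    """Rewrite a .ipynb file in place with its outputs cleared."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    clear_outputs(notebook)
    with open(path, "w", encoding="utf-8") as f:
        # 1-space indent, matching how Jupyter typically saves notebooks.
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")


# Usage on an in-memory notebook:
nb = {
    "cells": [
        {"cell_type": "code", "execution_count": 3,
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["hi\n"]}],
         "source": ["print('hi')"]},
        {"cell_type": "markdown", "source": ["# Title"]},
    ]
}
clear_outputs(nb)
print(nb["cells"][0]["outputs"], nb["cells"][0]["execution_count"])  # [] None
```

Markdown cells are skipped because they never carry outputs.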
10 changes: 8 additions & 2 deletions .github/workflows/test_notebooks.yaml
@@ -48,5 +48,11 @@ jobs:
      - name: Install notebook dependencies
        run: poetry run pip install -r ./docs/src/tutorials/requirements.txt

      - name: Execute all notebooks
        run: poetry run ./tools/run_notebooks.sh `ls docs/src/*.ipynb` `ls docs/src/tutorials/*.ipynb`
      - name: Execute User Guide
        run: poetry run ./tools/run_notebooks.sh `ls docs/src/*.ipynb`

      - name: Execute all recipes
        run: poetry run ./tools/run_notebooks.sh `ls docs/src/recipes/*.ipynb`

      - name: Execute all tutorials
        run: poetry run ./tools/run_notebooks.sh `ls docs/src/tutorials/*.ipynb`
21 changes: 21 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,21 @@
# Changelog

## HEAD (might become 0.1.3)

This is the first operational version of Temporian for users. The full list of
features is too long to include here. The top features are:

### Feature

- PyPI release.
- 72 operators.
- Execution in eager, compiled, and graph modes.
- IO support for Pandas, CSV, NumPy, and TensorFlow datasets.
- Static and interactive plotting.
- Documentation (3-minute intro, user guide, and API reference).
- 5 tutorials.

### Fix

- Proper error message when using distributed training on more than 2^31
  (i.e., ~2B) examples while compiling YDF with a 32-bit example index.
8 changes: 8 additions & 0 deletions CONTRIBUTING.md
@@ -29,6 +29,14 @@ To create a new release, follow these steps:

### Environment Setup

After cloning the repository, please manually install the git hooks:

```shell
git clone git@github.com:google/temporian.git
cd temporian

cp .git-hooks/* .git/hooks
```

Install [Poetry](https://python-poetry.org/), which we use to manage Python dependencies and virtual environments.

Temporian requires Python `3.9.0` or greater. We recommend using [PyEnv](https://github.com/pyenv/pyenv#installation) to install and manage multiple Python versions. Once PyEnv is available, install a supported Python version (e.g. 3.9.6) by running:
3 changes: 3 additions & 0 deletions docs/.readthedocs.yaml
@@ -28,7 +28,10 @@ build:

    pre_build:
      - tools/run_notebooks.sh docs/src/user_guide.ipynb
      - tools/run_notebooks.sh $(ls docs/src/recipes/*.ipynb)
      - tools/run_notebooks.sh docs/src/tutorials/getting_started.ipynb
      # These are too slow
      # - tools/run_notebooks.sh docs/src/tutorials/*.ipynb

mkdocs:
  configuration: docs/mkdocs.yml
2 changes: 2 additions & 0 deletions docs/mkdocs.yml
@@ -55,8 +55,10 @@ nav:
  - Home: index.md
  - 3 minutes to Temporian: 3_minutes.md
  - User Guide: user_guide.ipynb
  - Recipes: recipes/
  - Tutorials: tutorials/
  - API Reference: reference/ # generated by gen-files + literate-nav
  - Changelog: CHANGELOG.md

# Plugins
plugins:
1 change: 1 addition & 0 deletions docs/src/CHANGELOG.md
159 changes: 159 additions & 0 deletions docs/src/recipes/aggregate_duplicated.ipynb
@@ -0,0 +1,159 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "c74f7111-1b6c-4454-9770-3f67eeadaca6",
"metadata": {},
"source": [
"# Unify events with identical timestamps\n",
"\n",
"This recipe shows how to avoid duplicated timestamps in an `EventSet`. Events with identical timestamps are aggregated with a moving window operation (e.g., sum, average, max, min), preserving the original timestamp values (which may be non-uniform).\n",
"\n",
"\n",
"For example, assume we have asynchronous sensor measurements, potentially from different sources. If two measurements share the exact same timestamp, we want to unify them by taking their average value."
]
},
{
"cell_type": "markdown",
"id": "c63a41a5-bd95-4588-bad3-83691bd0acd0",
"metadata": {},
"source": [
"## Example data\n",
"\n",
"Let's define some events with non-uniform timestamps to illustrate the use case. Some of the timestamps are repeated; those are the ones we'll unify.\n",
"\n",
"However, we have to be careful: some events are very close in time but not actually duplicated, and we don't want to merge those."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a5e57e74-ca80-4292-834a-e7cfd9185b2f",
"metadata": {},
"outputs": [],
"source": [
"import temporian as tp\n",
"\n",
"sensor_evset = tp.event_set(timestamps=[1.1, 2.01, 2.02, 2.02, 3.5, 3.51, 3.51, 4.5, 5.0],\n",
" features={\"y\": [1., 2., 3., 4., 5., 6., 7., 8., 9.],\n",
" \"z\": [10., 20., 30., 40., 50., 60., 70., 80., 90.]\n",
" }\n",
" )\n",
"sensor_evset.plot()"
]
},
{
"cell_type": "markdown",
"id": "4b875a15-cd14-49bd-a83e-301f9c7aef17",
"metadata": {},
"source": [
"## Solution\n",
"\n",
"To unify only the events with the exact same timestamp, we need to:\n",
"1. Get the list of unique timestamps.\n",
"2. Aggregate events at the exact same timestamp, making sure the moving window doesn't overlap with nearby measurements.\n",
"\n",
"### 1. Get unique timestamps\n",
"\n",
"The first step is to create a new sampling that removes the duplicated timestamps at `2.02` and `3.51`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "037b1aa0-c808-4905-81bf-e182b221468e",
"metadata": {},
"outputs": [],
"source": [
"# Remove duplicated timestamps\n",
"unique_t = sensor_evset.unique_timestamps()\n",
"unique_t"
]
},
{
"cell_type": "markdown",
"id": "bea5a3e8-2975-4cda-a479-ba9efa219339",
"metadata": {},
"source": [
"### 2. Moving window with shortest length\n",
"\n",
"For the moving window to never span two different timestamps, its length must be smaller than the smallest step between any two timestamps. But we want a solution that works at any resolution, from daily sales to nanosecond sensor measurements.\n",
"\n",
"In `tp.duration.shortest`, we've defined the shortest possible interval that can be represented with a `float64` timestamp at maximum resolution:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c64fbeba-5efc-437e-baac-9b38a536fd9d",
"metadata": {},
"outputs": [],
"source": [
"shortest_length = tp.duration.shortest\n",
"shortest_length"
]
},
{
"cell_type": "markdown",
"id": "81d58e94-bd23-4d9e-bb53-88c299af65f5",
"metadata": {},
"source": [
"Pretty small, right? Since zero-length durations are not allowed, this is as close to zero as we can get. It guarantees that the window never spans two different timestamps.\n",
"\n",
"Now we just need to run the desired aggregation function, passing this small value as `window_length` and the unique timestamps as `sampling`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "813f1562-8c95-4b22-a40f-90b7ee751e61",
"metadata": {},
"outputs": [],
"source": [
"unified_evset = sensor_evset.simple_moving_average(window_length=shortest_length, sampling=unique_t)\n",
"unified_evset"
]
},
{
"cell_type": "markdown",
"id": "72e0d5a5-e2e7-406c-84c6-dd8ac717128d",
"metadata": {},
"source": [
"Of course, depending on the use case, other moving window operations such as `moving_min` or `moving_max` may make more sense than the average. If multiple measurements are expected at each timestamp, you may also want the moving standard deviation, e.g., to build a confidence interval.\n",
"\n",
"Also, keep in mind that this exact procedure works just as well on an `EventSet` with multiple indexes, removing the duplicated timestamps in each index separately.\n",
"\n",
"But let's keep the example simple for now 🙂"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3a9c6e6-5cba-4950-900f-1878e87a98be",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
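The recipe's logic can be reproduced without Temporian to check what its result should look like. The sketch below is a plain-Python stand-in, not Temporian's implementation: it takes the unique timestamps as the new sampling and, for each one, averages the events inside the window `(t - w, t]`. It assumes `sys.float_info.min` as a stand-in for `tp.duration.shortest` (the actual value may differ). Window membership is tested with the difference `t - s` rather than by comparing `s` with `t - w`, because with such a tiny `w`, `t - w` rounds back to `t` in `float64`, which would leave every window empty.

```python
import sys

# Assumption: stand-in for tp.duration.shortest (smallest positive normal float64).
SHORTEST = sys.float_info.min


def unify_duplicates(timestamps, features, window_length=SHORTEST):
    """Average the features of events that share the exact same timestamp.

    Returns the sorted unique timestamps (the new sampling) and, for each
    feature, one averaged value per unique timestamp.
    """
    unique_t = sorted(set(timestamps))
    unified = {name: [] for name in features}
    for t in unique_t:
        # Events in the window (t - window_length, t]. Testing t - s directly
        # avoids computing t - window_length, which rounds to t for tiny lengths.
        in_window = [i for i, s in enumerate(timestamps)
                     if 0.0 <= t - s < window_length]
        for name, values in features.items():
            unified[name].append(sum(values[i] for i in in_window) / len(in_window))
    return unique_t, unified


timestamps = [1.1, 2.01, 2.02, 2.02, 3.5, 3.51, 3.51, 4.5, 5.0]
features = {
    "y": [1., 2., 3., 4., 5., 6., 7., 8., 9.],
    "z": [10., 20., 30., 40., 50., 60., 70., 80., 90.],
}
unique_t, unified = unify_duplicates(timestamps, features)
print(unique_t)      # [1.1, 2.01, 2.02, 3.5, 3.51, 4.5, 5.0]
print(unified["y"])  # [1.0, 2.0, 3.5, 5.0, 6.5, 8.0, 9.0]
print(unified["z"])  # [10.0, 20.0, 35.0, 50.0, 65.0, 80.0, 90.0]
print(2.02 - SHORTEST == 2.02)  # True: why the difference-based test is needed
```

Only the duplicated timestamps `2.02` and `3.51` change, while the nearby events at `2.01` and `3.5` are left alone. Replacing the average with `min`, `max`, or a standard deviation gives the other aggregations the recipe mentions.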