Commit
Merge pull request #277 from google/recipes-variable-window-length
Recipe with tick_calendar and since_last(steps=2)
Showing 9 changed files with 331 additions and 132 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,182 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c74f7111-1b6c-4454-9770-3f67eeadaca6", | ||
"metadata": {}, | ||
"source": [ | ||
"# Aggregate events by calendar features (month/year)\n", | ||
"\n", | ||
"[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/google/temporian/blob/last-release/docs/src/recipes/aggregate_calendar.ipynb)\n", | ||
"\n", | ||
"In this recipe we'll learn how to aggregate events based on calendar features (e.g., monthly or yearly).\n", | ||
"\n", | ||
"For example, suppose we want to calculate total monthly sales, having one event per month that accumulates all sales of the past month.\n", | ||
"\n", | ||
"Here we'll tackle a more general case: for **every month, show the sales of the past 2 months**. This also covers the previous case by changing a single parameter's value (`steps=1`)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "de274e0e-ab5a-46a0-b4b9-f1026b43076c", | ||
"metadata": {}, | ||
"source": [ | ||
"## Example data\n", | ||
"\n", | ||
"Let's create some sale events with datetime timestamps." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "10a56d43-d011-4e72-aed4-8d460d58c337", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import pandas as pd\n", | ||
"import temporian as tp\n", | ||
"\n", | ||
"sales_data = pd.DataFrame(\n", | ||
" data=[\n", | ||
" # sale timestamp, price, cost\n", | ||
" [\"2020-01-01 13:04\", 3.0, 1.0], # January\n", | ||
" [\"2020-01-15 15:24\", 7.0, 3.0],\n", | ||
" [\"2020-02-01 13:45\", 3.0, 1.0], # February\n", | ||
" [\"2020-02-20 16:10\", 7.0, 3.0],\n", | ||
" [\"2020-03-10 10:00\", 10.0, 5.0], # March\n", | ||
" [\"2020-03-28 10:10\", 4.0, 2.0],\n", | ||
" [\"2020-04-15 19:35\", 3.0, 1.0], # April\n", | ||
" [\"2020-05-25 18:30\", 18.0, 2.0], # May\n", | ||
" ],\n", | ||
" columns=[\n", | ||
" \"timestamp\",\n", | ||
" \"unit_price\",\n", | ||
" \"unit_cost\",\n", | ||
" ],\n", | ||
")\n", | ||
"\n", | ||
"sales_evset = tp.from_pandas(sales_data)\n", | ||
"sales_evset.plot()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "c8f6189a-ca80-4869-b8a6-35356ce42c02", | ||
"metadata": {}, | ||
"source": [ | ||
"## Solution\n", | ||
"We want to compute, at the start of every month, the accumulated sales of the previous 2 months. We can do this in two steps:\n", | ||
"1. Create a tick on the first day of every month.\n", | ||
"1. Use a `moving_sum` with a variable window length that, at each tick, spans the time elapsed since the tick 2 months earlier.\n", | ||
"\n", | ||
"### 1. Create a tick every month" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "9e930b36-10f6-487f-8466-278a1a08956b", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Period to cover. Includes the first day of the month after the last event.\n", | ||
"time_span = tp.event_set(timestamps=[\"2020-01-01 00:00:00\", \"2020-06-01 00:00:00\"])\n", | ||
"\n", | ||
"# Tick first day of every month (equivalent: set mday=1)\n", | ||
"monthly_ticks = time_span.tick_calendar(month='*')\n", | ||
"\n", | ||
"monthly_ticks.plot()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "aad79b2d-3fea-4e11-8d68-1e5d78f3b655", | ||
"metadata": {}, | ||
"source": [ | ||
"### 2. Moving sum with variable window length\n", | ||
"\n", | ||
"The `window_length` argument can be an `EventSet` with a single feature, which specifies the duration (in seconds) of the window at each timestamp.\n", | ||
"\n", | ||
"Using the `since_last()` operator, we get exactly that: an `EventSet` with the duration (in seconds) since the previous event, or since the event that many positions back when the `steps` parameter is set. For example, with monthly ticks, `steps=1` (the default) accumulates events by month, and `steps=6` gives a rolling sum over the previous 6 months.\n", | ||
"\n", | ||
"We want the last 2 months calculated every month, so tick every month and use `since_last(steps=2)`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "7d92dff9-b4b6-42d1-8848-b37ff768ef7e", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Same sampling as monthly_ticks, create single feature with the duration of the last 2 months (in seconds)\n", | ||
"monthly_window_lengths = monthly_ticks.since_last(steps=2)\n", | ||
"\n", | ||
"# Remove the Jan 1 and Feb 1 ticks (not enough previous ticks, so since_last is NaN)\n", | ||
"monthly_window_lengths = monthly_window_lengths.filter(monthly_window_lengths.notnan())\n", | ||
"\n", | ||
"# Use since_last() feature as variable window length\n", | ||
"moving_sum = sales_evset.moving_sum(window_length=monthly_window_lengths)\n", | ||
"\n", | ||
"moving_sum" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "706f50a9-72ab-4673-aa38-8c3dacdcf49d", | ||
"metadata": {}, | ||
"source": [ | ||
"## (Optional) Rename and plot\n", | ||
"\n", | ||
"Finally, we can rename features to match their actual meaning after aggregation.\n", | ||
"\n", | ||
"In this case we also calculate and plot the profit at each monthly tick." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "664550bb-12ac-43c2-8216-4308843178b5", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Rename aggregated features\n", | ||
"monthly_sales = moving_sum.rename({\"unit_price\": \"monthly_revenue\", \"unit_cost\": \"monthly_cost\"})\n", | ||
"\n", | ||
"# Profit = revenue - cost\n", | ||
"monthly_profit = (monthly_sales[\"monthly_revenue\"] - monthly_sales[\"monthly_cost\"]).rename(\"monthly_profit\")\n", | ||
"\n", | ||
"monthly_profit.plot(style='line', interactive=True, width_px=600)\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "f3a9c6e6-5cba-4950-900f-1878e87a98be", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.4" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
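As a sanity check on the recipe above, the following standard-library sketch recomputes the 2-month totals by hand from the notebook's example data. It does not call Temporian: `since_last` here is a hypothetical local helper that only mimics the idea of the operator, and the window is assumed to be left-exclusive and right-inclusive, `(tick - length, tick]`. None of the example sales falls on a window boundary, so that assumption does not affect the numbers.

```python
from datetime import datetime, timedelta

# Example sales from the notebook: (timestamp, unit_price, unit_cost)
raw_sales = [
    ("2020-01-01 13:04", 3.0, 1.0),
    ("2020-01-15 15:24", 7.0, 3.0),
    ("2020-02-01 13:45", 3.0, 1.0),
    ("2020-02-20 16:10", 7.0, 3.0),
    ("2020-03-10 10:00", 10.0, 5.0),
    ("2020-03-28 10:10", 4.0, 2.0),
    ("2020-04-15 19:35", 3.0, 1.0),
    ("2020-05-25 18:30", 18.0, 2.0),
]
sales = [
    (datetime.strptime(ts, "%Y-%m-%d %H:%M"), price, cost)
    for ts, price, cost in raw_sales
]

# Monthly ticks: first day of each month, January through June 2020
ticks = [datetime(2020, month, 1) for month in range(1, 7)]


def since_last(timestamps, steps=1):
    """Hypothetical helper: seconds elapsed since the timestamp `steps`
    positions earlier, or None when there are not enough earlier timestamps."""
    return [
        (timestamps[i] - timestamps[i - steps]).total_seconds()
        if i >= steps
        else None
        for i in range(len(timestamps))
    ]


# Variable window length at each tick: the duration of the last 2 months
window_lengths = since_last(ticks, steps=2)

# Sum price and cost over (tick - length, tick] for each tick with a window
totals = {}
for tick, length in zip(ticks, window_lengths):
    if length is None:
        continue  # Jan 1 and Feb 1: no tick two steps back
    start = tick - timedelta(seconds=length)
    in_window = [(p, c) for ts, p, c in sales if start < ts <= tick]
    totals[tick] = (
        sum(p for p, _ in in_window),
        sum(c for _, c in in_window),
    )

for tick, (revenue, cost) in totals.items():
    print(tick.date(), revenue, cost, revenue - cost)
```

Under these assumptions the ticks on March 1, April 1, May 1 and June 1 carry revenue/cost totals of 20/8, 24/11, 17/8 and 21/3, which gives a reference to compare against the notebook's `moving_sum` output.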