Commit

Merge pull request #154 from google/improve-reference
Improve reference
ianspektor authored Jun 14, 2023
2 parents fdb045e + 9c7f3dd commit c4ff7ba
Showing 114 changed files with 2,911 additions and 2,605 deletions.
3 changes: 2 additions & 1 deletion .gitignore
@@ -6,6 +6,7 @@ tmp
build_package
.ipynb_checkpoints
tmp_*
.cache/
.env

# benchmark outputs
@@ -18,7 +19,7 @@ profile.*
site/

# Build outputs
build
build/
dist
setup.py
temporian.egg-info
9 changes: 7 additions & 2 deletions .vscode/settings.json
@@ -17,12 +17,17 @@
"[python]": {
"editor.defaultFormatter": "ms-python.black-formatter"
},
"black-formatter.path": [
"${workspaceFolder}/.venv/bin/black"
],
"editor.tabSize": 4,
"editor.formatOnSave": true,
"python.linting.flake8Enabled": false,
"python.linting.pylintEnabled": true,
"python.linting.enabled": true,
"editor.rulers": [80],
"editor.rulers": [
80
],
"editor.codeActionsOnSave": {
"source.organizeImports": false
},
@@ -82,4 +87,4 @@
"span": "cpp",
"algorithm": "cpp"
}
}
}
8 changes: 4 additions & 4 deletions README.md
@@ -43,22 +43,22 @@ date,feature_1,feature_2
2023-02-01,30.0,5.0
```

Check the [Getting Started tutorial](https://temporian.readthedocs.io/en/latest/tutorials/getting_started/) to try it out!
Check the [Getting Started tutorial](https://temporian.readthedocs.io/en/stable/tutorials/getting_started/) to try it out!

## Key features

These are what set Temporian apart.

- **Simple and powerful API**: Temporian exports high level operations making processing complex programs short and ready to read.
- **Flexible data model**: Temporian models temporal data as a sequence of events, supporting non-uniform sampling timestamps seamlessly.
- **Prevents modeling errors**: Temporian programs are guaranteed not to have future leakage unless the user calls the `leak` function, ensuring that models are not trained on future data.
- **Prevents modeling errors**: Temporian programs are guaranteed not to have future leakage unless explicitly specified, ensuring that models are not trained on future data.
- **Iterative development**: Temporian can be used to develop preprocessing pipelines in Colab or local notebooks, allowing users to visualize results each step of the way to identify and correct errors early on.
- **Efficient and well-tested implementations**: Temporian contains efficient and well-tested implementations of a variety of temporal data processing functions. For instance, our implementation of window operators is **x2000** faster than the same function implemented with NumPy.
- **Wide range of preprocessing functions**: Temporian contains a wide range of preprocessing functions, including moving window operations, lagging, calendar features, arithmetic operations, index manipulation and propagation, resampling, and more. For a full list of the available operators, see the [operators documentation](https://temporian.readthedocs.io/en/latest/reference/temporian/core/operators/all_operators/).
- **Wide range of preprocessing functions**: Temporian contains a wide range of preprocessing functions, including moving window operations, lagging, calendar features, arithmetic operations, index manipulation and propagation, resampling, and more. For a full list of the available operators, see the [operators documentation](https://temporian.readthedocs.io/en/stable/reference/).

## Documentation

The official documentation is available at [temporian.readthedocs.io](https://temporian.readthedocs.io/en/latest/).
The official documentation is available at [temporian.readthedocs.io](https://temporian.readthedocs.io/en/stable/).

## Contributing

120 changes: 71 additions & 49 deletions docs/gen_ref_pages.py
@@ -5,85 +5,107 @@
"""

from pathlib import Path
from typing import Set, Tuple

import mkdocs_gen_files

nav = mkdocs_gen_files.Nav()

SRC_PATH = Path("temporian")

paths = set()
# Stores symbol and path of each public API member
members: Set[Tuple[str, Path]] = set()

non_parsable_imports = []

with open("temporian/__init__.py", "r", encoding="utf8") as f:
lines = f.read().splitlines()
# We need to be able to parse other files to allow wildcard imports
# Storing pair of (prefix, path) to parse in a stack
files_to_parse = [(None, SRC_PATH / "__init__.py")]

while files_to_parse:
prefix, file = files_to_parse.pop()

with open(file, "r", encoding="utf8") as f:
lines = f.read().splitlines()

for line in lines:
words = line.split(" ")

# It is an import statement
if words[0] == "from":
# Remove trailing "as <name>" if it exists and save symbol's name
symbol = None
if words[-2] == "as":
# If symbol was renamed to a private name, skip it
if words[-1].startswith("_"):
continue

for line in lines:
words = line.split(" ")
symbol = words[-1]
words = words[:-2]

# It is an import statement
if words[0] == "from":
# Remove trailing "as <name>" if it exists
if words[-2] == "as":
# If symbol was renamed to a private name, skip it
if words[-1].startswith("_"):
continue
# `words` is now in the form "from module.submodule import symbol"
if words[-2] == "import":
name = words[-1]

words = words[:-2]
# We only allow wildcard imports from modules explicitly named
# api_symbols to prevent unwanted names in the public API
if name == "*":
module_path = Path(words[1].replace(".", "/")).with_suffix(
".py"
)
if module_path.stem == "api_symbols":
new_prefix = (
(prefix + ".") if prefix else ""
) + module_path.parent.name
files_to_parse.append((new_prefix, module_path))
continue

# It is a single-symbol import like "from <module> import <symbol>"
if words[-2] == "import":
module_path = Path(words[1].replace(".", "/"))
non_parsable_imports.append(line)
continue

# Check if the import is a dir module
module_path_with_suffix = module_path / words[-1]
if module_path_with_suffix.exists():
module_path = module_path_with_suffix
# If symbol wasn't renamed, use its imported name
if symbol is None:
symbol = name

# Check if the import is a file module
module_path_with_suffix = module_path / (words[-1] + ".py")
if module_path_with_suffix.exists():
module_path = module_path_with_suffix.with_suffix("")
path = Path(words[1].replace(".", "/")) / name

# If it's not a module import it is a normal symbol import
# (function, class, etc.) so we add its whole module to the docs
if prefix:
symbol = prefix + "." + symbol

paths.add(module_path)
members.add((symbol, path))

else:
non_parsable_imports.append(line)
# It is a multi-symbol import statement, error will be raised below
else:
non_parsable_imports.append(line)

if non_parsable_imports:
raise RuntimeError(
"`gen_ref_pages` failed to parse the following import statements in"
f" the top-level __init__.py file: {non_parsable_imports}. Import"
" statements in the top-level module must import a single symbol each,"
" in the form `from <module> import <symbol>` or `from <module> import"
" <symbol> as <name>`."
" in the form `from <module> import <symbol>`, `from <module> import"
" <symbol> as <name>`, or `from <module> import *`."
)

for path in sorted(paths):
if path.parent.name not in ["test", "tests"]:
module_path = path.relative_to(SRC_PATH.parent).with_suffix("")
doc_path = path.relative_to(SRC_PATH.parent).with_suffix(".md")
full_doc_path = Path("reference", doc_path)
nav["temporian"] = "index.md"

parts = list(module_path.parts)
for symbol, path in sorted(members):
symbol_path = Path(symbol.replace(".", "/"))
symbol_name = symbol_path.name
src_path = SRC_PATH / symbol_name

if parts[-1] == "__init__":
parts = parts[:-1]
doc_path = doc_path.with_name("index.md")
full_doc_path = full_doc_path.with_name("index.md")
elif parts[-1] == "__main__":
continue
doc_path = SRC_PATH / symbol_path
parts = list(doc_path.parts)
doc_path = doc_path.with_suffix(".md")
full_doc_path = Path("reference", doc_path)

nav[parts] = doc_path.as_posix()
nav[parts] = doc_path.as_posix()

with mkdocs_gen_files.open(full_doc_path, "w") as fd:
identifier = ".".join(parts)
print("::: " + identifier, file=fd)
with mkdocs_gen_files.open(full_doc_path, "w") as fd:
identifier = ".".join(list(src_path.parts))
print("::: " + identifier, file=fd)

mkdocs_gen_files.set_edit_path(full_doc_path, path)
mkdocs_gen_files.set_edit_path(full_doc_path, path)

with mkdocs_gen_files.open("reference/index.md", "w") as nav_file:
with mkdocs_gen_files.open("reference/SUMMARY.md", "w") as nav_file:
nav_file.writelines(nav.build_literate_nav())
33 changes: 18 additions & 15 deletions docs/mkdocs.yml
@@ -1,5 +1,9 @@
site_name: Temporian
site_url: https://temporian.readthedocs.io/en/stable/
site_description: A Python package for feature engineering of temporal data.

repo_url: https://github.com/google/temporian
edit_uri_template: tree/main/temporian/{path_noext}

# Theme
theme:
@@ -48,14 +52,8 @@ nav:
plugins:
- search
- exclude-search:
# Avoid this since it excludes all ipynb
exclude_unreferenced: false
# Exclude reference's Index page
exclude:
- reference
# Include all pages inside reference
ignore:
- reference/*
- reference/SUMMARY
- autorefs
- gen-files:
scripts:
@@ -65,6 +63,7 @@ plugins:
- gen_ref_pages.py
- literate-nav:
nav_file: SUMMARY.md
- social
- mkdocs-jupyter:
# Execute notebooks when building docs (set to true when temporian runs in a notebook w/o the start_notebook script).
execute: false
@@ -76,24 +75,28 @@ plugins:
default_handler: python
handlers:
python:
paths: [temporian]
paths: [..]
import:
- https://docs.python.org/3/objects.inv
options:
# https://mkdocstrings.github.io/python/usage/#globallocal-options
docstring_style: google
heading_level: 1
members_order: source
show_source: true
show_root_heading: true
show_category_heading: true
show_submodules: true
show_source: false
show_submodules: false
merge_init_into_class: false
show_signature: true
separate_signature: true
show_signature_annotations: false
show_if_no_docstring: false # TODO: TBD if we want this enabled
group_by_category: false
show_signature_annotations: true
show_if_no_docstring: false
group_by_category: true
show_category_heading: false
show_root_heading: true
# show_root_toc_entry: false
# show_symbol_type_heading: false
# preload_modules: [temporian.core.operators]
# allow_inspection: true

# Customization for Markdown
markdown_extensions:
4 changes: 3 additions & 1 deletion docs/requirements.txt
@@ -8,4 +8,6 @@ mkdocs-section-index==0.3.5
mkdocs-jupyter===0.24.1
mkdocs-exclude-search===0.6.5
black==22.12.0
griffe==0.26.0
griffe==0.26.0
cairosvg==2.7.0
pillow==9.5.0
24 changes: 12 additions & 12 deletions docs/src/3_minutes.md
@@ -2,15 +2,15 @@

This is a _very_ quick introduction to how Temporian works. For a complete tour of its capabilities, please refer to the [User Guide](../user_guide).

## Events and `EventSets`
## Events and [`EventSets`][temporian.EventSet]

The most basic unit of data in Temporian is an **event**. An event consists of a timestamp and a set of feature values.

Events are not handled individually. Instead, events are grouped together into an **[EventSet](../reference/temporian/implementation/numpy/data/event_set)**.
Events are not handled individually. Instead, events are grouped together into an **[EventSet][temporian.EventSet]**.

`EventSet`s are the main data structure in Temporian, and represent **[multivariate time sequences](../user_guide/#what-is-temporal-data)**. Note that "multivariate" indicates that each event in the time sequence holds several feature values, and "sequence" indicates that the events are not necessarily sampled at a uniform rate (in which case we would call it a time "series").
[`EventSets`][temporian.EventSet] are the main data structure in Temporian, and represent **[multivariate time sequences](../user_guide/#what-is-temporal-data)**. Note that "multivariate" indicates that each event in the time sequence holds several feature values, and "sequence" indicates that the events are not necessarily sampled at a uniform rate (in which case we would call it a time "series").

You can create an `EventSet` from a pandas DataFrame, NumPy arrays, CSV files, and more. Here is an example of an `EventSet` containing four events and three features:
You can create an [`EventSet`][temporian.EventSet] from a pandas DataFrame, NumPy arrays, CSV files, and more. Here is an example of an [`EventSet`][temporian.EventSet] containing four events and three features:

```python
>>> evset = tp.event_set(
@@ -25,21 +25,21 @@ You can create an `EventSet` from a pandas DataFrame, NumPy arrays, CSV files, a

```

An `EventSet` can hold one or several time sequences, depending on what its **[index](../user_guide/#index-horizontal-and-vertical-operators)** is.
An [`EventSet`][temporian.EventSet] can hold one or several time sequences, depending on what its **[index](../user_guide/#index-horizontal-and-vertical-operators)** is.

If the `EventSet` has no index, it will hold a single time sequence, which means that all events will be considered part of the same group and will interact with each other when operators are applied to the `EventSet`.
If the [`EventSet`][temporian.EventSet] has no index, it will hold a single time sequence, which means that all events will be considered part of the same group and will interact with each other when operators are applied to the [`EventSet`][temporian.EventSet].

If the `EventSet` has one (or many) indexes, it will hold one time sequence for each unique value (or unique combination of values) of the indexes, the events will be grouped by their index value, and operators applied to the `EventSet` will be applied to each time sequence independently.
If the [`EventSet`][temporian.EventSet] has one (or many) indexes, it will hold one time sequence for each unique value (or unique combination of values) of the indexes, the events will be grouped by their index value, and operators applied to the [`EventSet`][temporian.EventSet] will be applied to each time sequence independently.
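
The grouping behavior described above can be illustrated without Temporian. The following is a minimal plain-Python sketch, not Temporian's implementation: the `Event` tuple layout, the store names, and the `group_by_index` and `running_sum` helpers are all hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event layout for illustration:
# (timestamp, index value, feature value).
Event = Tuple[float, str, float]


def group_by_index(events: List[Event]) -> Dict[str, List[Tuple[float, float]]]:
    """Splits events into one time sequence per unique index value."""
    sequences: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for timestamp, index_value, feature in sorted(events):
        sequences[index_value].append((timestamp, feature))
    return dict(sequences)


def running_sum(sequence: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Cumulative sum of the feature, computed within a single sequence."""
    total = 0.0
    result = []
    for timestamp, feature in sequence:
        total += feature
        result.append((timestamp, total))
    return result


events = [
    (1.0, "store_a", 10.0),
    (2.0, "store_b", 5.0),
    (3.0, "store_a", 20.0),
    (4.0, "store_b", 7.0),
]

# With an index: each store is processed independently.
per_store = {key: running_sum(seq) for key, seq in group_by_index(events).items()}

# Without an index: all events form a single sequence and interact.
single = running_sum([(t, f) for t, _, f in sorted(events)])
```

With the index, the running sum restarts for each store. Without it, every event contributes to the same total.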

## Graph, `Nodes` and Operators
## Graph, [`Nodes`][temporian.Node] and Operators

There are two big phases in any Temporian script: graph **definition** and **evaluation**. This is a common pattern in computing libraries, and it allows us to perform optimizations before the graph is evaluated, share Temporian programs across different platforms, and more.

A graph is created by using **operators**. For example, the [`tp.simple_moving_average()`](../reference/temporian/core/operators/window/simple_moving_average) operator computes the [simple moving average](https://en.wikipedia.org/wiki/Moving_average) of each feature in an `EventSet`. You can find documentation for all available operators [here](../reference/temporian/core/operators/all_operators).
A graph is created by using **operators**. For example, the [`tp.simple_moving_average()`][temporian.simple_moving_average] operator computes the [simple moving average](https://en.wikipedia.org/wiki/Moving_average) of each feature in an [`EventSet`][temporian.EventSet]. You can find documentation for all available operators [here](../reference/).
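
To make the moving-average idea concrete for non-uniformly sampled events, here is a minimal plain-Python sketch. It is not Temporian's implementation: the `moving_average` function is hypothetical, and the window convention `(t - window_length, t]` is an assumption made for this example rather than something stated in this document.

```python
from typing import List


def moving_average(
    timestamps: List[float], values: List[float], window_length: float
) -> List[float]:
    """For each event at time t, averages the values whose timestamps fall in
    (t - window_length, t]. Assumes timestamps are sorted in increasing order
    and window_length is positive."""
    result = []
    start = 0  # Index of the oldest event still inside the window.
    window_sum = 0.0
    for end, (t, value) in enumerate(zip(timestamps, values)):
        window_sum += value
        # Drop events that have fallen out of the window.
        while timestamps[start] <= t - window_length:
            window_sum -= values[start]
            start += 1
        result.append(window_sum / (end - start + 1))
    return result


# Non-uniformly sampled events: note the gap between t=2 and t=10.
averages = moving_average(
    [1.0, 2.0, 10.0, 11.0], [4.0, 6.0, 8.0, 2.0], window_length=3.0
)
```

Because the window is defined in time units rather than in number of events, the event at `t=10` is averaged on its own: the two earlier events are more than three time units in the past.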

Note that when calling operators you are only defining the graph - i.e., you are telling Temporian what operations you want to perform on your data, but those operations are not yet being performed.

Operators are not applied directly to `EventSet`s, but to **[Nodes](../reference/temporian/core/data/node)**. You can think of a `Node` as the placeholder for an `EventSet` in the graph. When applying operators to `Node`s, you get back new `Node`s that are placeholders for the results of those operations. You can create arbitrarily complex graphs by combining operators and nodes.
Operators are not applied directly to [`EventSets`][temporian.EventSet], but to **[Nodes][temporian.Node]**. You can think of a [`Node`][temporian.Node] as the placeholder for an [`EventSet`][temporian.EventSet] in the graph. When applying operators to [`Nodes`][temporian.Node], you get back new [`Nodes`][temporian.Node] that are placeholders for the results of those operations. You can create arbitrarily complex graphs by combining operators and nodes.

```python
>>> # Obtain the Node corresponding to the EventSet we created above
@@ -53,14 +53,14 @@ Operators are not applied directly to `EventSet`s, but to **[Nodes](../reference

<!-- TODO: add image of the generated graph -->

Your graph can now be run by calling [`evaluate()`](../reference/temporian/core/data/node/#temporian.core.data.node.Node.evaluate) on any `Node` in the graph, which will perform all necessary operations and return the resulting `EventSet`.
Your graph can now be run by calling [`.evaluate()`][temporian.Node.evaluate] on any [`Node`][temporian.Node] in the graph, which will perform all necessary operations and return the resulting [`EventSet`][temporian.EventSet].

```python
>>> result = addition_lagged.evaluate(evset)

```

Note that you need to pass the `EventSet`s that correspond to the source `Node`s in the graph to `evaluate()` (since those are not part of the graph definition). Also, several `Node`s can be evaluated at the same time by calling [`tp.evaluate()`](../reference/temporian/core/evaluation/#temporian.core.evaluation.evaluate) directly.
Note that you need to pass the [`EventSets`][temporian.EventSet] that correspond to the source [`Nodes`][temporian.Node] in the graph to [`.evaluate()`][temporian.Node.evaluate] (since those are not part of the graph definition). Also, several [`Nodes`][temporian.Node] can be evaluated at the same time by calling [`tp.evaluate()`][temporian.evaluate] directly.
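
The split between graph definition and evaluation can be illustrated with a toy deferred-computation graph. This is a plain-Python sketch of the general pattern only: `ToyNode`, `add`, and the variable names are hypothetical and do not reflect Temporian's internals or its `Node` API.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ToyNode:
    """Placeholder for a value that is computed only on evaluation."""

    def __init__(
        self,
        op: Optional[Callable[..., List[float]]] = None,
        inputs: Tuple["ToyNode", ...] = (),
    ):
        self.op = op  # None marks a source node, which holds no data itself.
        self.inputs = inputs

    def evaluate(self, feed: Dict["ToyNode", List[float]]) -> List[float]:
        # Source nodes are not part of the graph definition, so their data
        # must be supplied at evaluation time.
        if self.op is None:
            return feed[self]
        return self.op(*(node.evaluate(feed) for node in self.inputs))


def add(a: ToyNode, b: ToyNode) -> ToyNode:
    """Defines an element-wise addition; nothing is computed here."""
    return ToyNode(lambda x, y: [xi + yi for xi, yi in zip(x, y)], (a, b))


# Graph definition: no data is involved yet.
source = ToyNode()
doubled = add(source, source)
quadrupled = add(doubled, doubled)

# Evaluation: data for the source node is passed in, and the graph runs.
result = quadrupled.evaluate({source: [1.0, 2.0, 3.0]})
```

The same graph can be evaluated again with different input data, which is one reason to keep definition and evaluation separate.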

🥳 Congratulations! You're all set to write your first pieces of Temporian code.

8 changes: 8 additions & 0 deletions docs/src/css/custom.css
@@ -1,4 +1,12 @@
[data-md-color-scheme="temporian"] {
--md-primary-fg-color--light: #24201D;
--md-primary-fg-color--dark: #24201D;
background-color: red;
--md-default-bg-color: red;
}

/* Change background color when dark mode */
[data-md-color-scheme="slate"] {
--md-default-bg-color: #121212;
--md-code-bg-color: #212121;
}
5 changes: 5 additions & 0 deletions docs/src/reference/index.md
@@ -0,0 +1,5 @@
# API Reference

Temporian's API Reference.

<!-- TODO: write reference home page. Include table of operators. -->