Temporian is an open source project - we warmly welcome and appreciate any kind of contribution! π€πΌ
This guide describes how to contribute, and will help you set up your environment and create your first submission.
Check out the open GitHub issues to see what we're working on, and what we need help with.
Look especially for the good first issue
label, which indicates issues that are suitable for new contributors.
If you'd like help or additional guidance to contribute, please join our Discord.
All submissions, including submissions by project members, require review. We use GitHub Pull Requests for this purpose. Consult GitHub Help for more information on using pull requests.
All new contributions must pass all the tests and checks performed by GitHub actions, and any changes to docstrings must respect the docstring guidelines.
You can't push directly to our repository so you'll have to create a Fork, visit this guide for more information.
After forking the repo you should clone your repository:
git clone https://github.com/YOUR-USERNAME/temporian
And manually install the git hooks:
cp .git-hooks/* .git/hooks
Install Poetry, which we use to manage Python dependencies and virtual environments.
Temporian requires Python 3.9.0
or greater. We recommend using PyEnv to install and manage multiple Python versions. Once PyEnv is available, install a supported Python version (e.g. 3.9.6) by running:
pyenv install 3.9.6
After both Poetry and an adequate Python version have been installed, you can proceed to install the virtual environment and the required dependencies.
Configure poetry to create the virtual environment in the project's root directory (some vscode settings depend on this) by executing:
poetry config virtualenvs.in-project true
Before installing the package you need to install bazel (in Mac we recommend installing bazelisk with brew):
brew install bazelisk
Navigate to the project's root and create a Poetry environment with the correct PyEnv version and all needed dependencies with:
pyenv which python | xargs poetry env use
poetry install
Finally, activate the virtual environment by executing:
poetry shell
All tests must pass for your contribution to be accepted.
Run all tests with bazel:
bazel test --config=linux //...:all --test_output=errors
You can use the Bazel test flag --test_output=streamed
to see the test logs in realtime.
If developing and testing C++ code, the --compilation_mode=dbg
flag enables additional assertions that are otherwise disabled.
Note that these tests also include docstring examples, using the builtin doctest
module.
See the Adding code examples section for more information.
Live preview your local changes to the documentation with
mkdocs serve -f docs/mkdocs.yml
Benchmarking and profiling of pre-configured scripts is available as follow:
bazel run -c opt --config=linux //benchmark:profile_time -- [name]
bazel run -c opt --config=linux //benchmark:profile_memory -- [name] [-p]
where [name]
is the name of one of the python scripts in
benchmark/scripts, e.g. bazel run -c opt --config=linux benchmark:profile_time -- basic
.
-p
flag displays memory over time plot instead of line-by-line memory
consumption.
bazel run -c opt --config=linux //benchmark:benchmark_time
We provide a utility script that generates placeholder files, modifies existing ones, and prints needed modifications to develop and make available a new operator. From the project's root, run:
tools/create_operator.py --operator <name>
so for example, to create the EventSet.map()
operator, you'd run tools/create_operator.py --operator map
.
Here are some key files you'll need to modify (and write the operator's logic in):
- temporian/core/event_set_ops.py or temporian/init.py, depending on if the operator is available in the
EventSet
class (likeEventSet.since_last()
) or in the globaltp
module (liketp.glue()
). - Write the operator's core logic in
temporian/core/operators/<name>.py
.- The core logic is that related to the operator's definition in the graph, checks, and normalization during initialization. It doesn't interact with the actual data contained within the
EventSet
. - Example: temporian/core/operators/since_last.py.
- The core logic is that related to the operator's definition in the graph, checks, and normalization during initialization. It doesn't interact with the actual data contained within the
- Write the operator's implementation in
temporian/implementation/numpy/operators/<name>.py
.- The implementation is what actually executes the operator's logic on an
EventSet
's data. - Example of a Python-only operator: temporian/implementation/numpy/operators/since_last.py.
- Example of a C++ operator: temporian/implementation/numpy/operators/resample.py and temporian/implementation/numpy_cc/operators/resample.cc.
- The implementation is what actually executes the operator's logic on an
- Write unit tests for the operator in
temporian/core/operators/test/test_<name>.py
. - Add the operator to the docs in
docs/src/reference/temporian/operators/<name>.md
.- The docs are generated automatically by mkdocstrings from the operator's docstring.
- Example: docs/src/reference/temporian/operators/since_last.md.
Groups of operator with a similar implementation as grouped together. For instance, temporian/core/operators/window
contains moving window operators (e.g., EventSet.simple_moving_average()
) and temporian/core/operators/binary
contains operators taking two features as input (e.g. EventSet.subtract()
).
Read the script's output to see in detail all other files that need to be modified to finish setting up the operator!
Any doctest code examples in temporian/*.py
or docs/*.md
, will be executed and tested using the python's built-in doctest module.
For example, the following piece of code would be executed, and the outputs must match the expected result indicated:
>>> evset = tp.event_set(
... timestamps=["2020-01-01", "2020-02-02"],
... )
>>> print(evset)
indexes: []
features: []
events:
(2 events):
timestamps: [...]
...
Note from this example:
- If the
>>>
indicator is not present, the code will not be run or tested. - Multi-line statements need a preceding
...
instead of>>>
. - All the lines immediately following
>>>
or...
and before a blank line, are the expected outputs. - You should always leave a blank line before closing the code block, to indicate the end of the test.
- The
...
inside the expected result is used to match anything. Here, the exact timestamps and the latest line (which includes memory usage information) don't need exact match.
You cannot use ...
in the first matching line to ignore the whole output (it's ambiguous with multi-lines).
In that case, you may use the SKIP
flag as follows:
>>> print("hello") # doctest:+SKIP
This result doesn't need to match
Exceptions can also be expected, but it's better to avoid being too specific with the expected result:
>>> node["f1"] + node["f2"]
Traceback (most recent call last):
...
ValueError: ... corresponding features should have the same dtype. ...
Finally, note that globals like tp
, pd
and np
are always included in the execution context, no need to import them.
To check if your examples are correct, you may run:
# Test anything in temporian/*.py and docs/*.md
bazel test --config=linux //temporian/test:doc_test --test_output=streamed
In case of unexpected outputs, the result is printed and compared to the expected values, so that they can be fixed.
Contributions to this project must be accompanied by a Contributor License Agreement.
You (or your employer) retain the copyright to your contribution, this simply gives us permission to use and redistribute your contributions as part of the project. Head over to https://cla.developers.google.com/ to see your current agreements on file or sign a new one.
You generally only need to submit a CLA once, so if you've already submitted one (even if it was for a different project), you probably won't need to do it again.