This repository contains code examples for running machine learning models using Resonate Embeddings as features. Embeddings are a type of dimensionality reduction performed by large neural networks: they capture the key features of a raw dataset in a form that is more useful to other machine learning models, including other neural networks, recommender systems, gradient boosting, and segmentation. Embeddings are often easier to work with than raw data, particularly in the realm of consumer behavior, because they are lower-dimensional and dense.
This repository includes examples for training models with Optuna and scikit-learn. We find that appropriate hyperparameter tuning is necessary to work with these data successfully, so we recommend adopting these patterns within your own stack. This repository does not include any data.
Resonate embeddings are a synthesis of up to 90 days of online behavior, made available at the individual level. These embeddings may be used in isolation or in tandem with first-party datasets, and may improve both the predictive quality of your models and their scale.
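For instance, a minimal sketch of using embeddings in tandem with first-party features in scikit-learn might look like the following (the array names, shapes, and model choice are illustrative assumptions, not the repository's code):

```python
# Illustrative only: random arrays stand in for real first-party features,
# Resonate embeddings, and binary labels aligned to the same individuals.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
first_party = rng.normal(size=(500, 20))    # your own tabular features
embeddings = rng.normal(size=(500, 512))    # Resonate embedding vectors
y = rng.integers(0, 2, size=500)            # binary outcome labels

# Use the embeddings alongside first-party data by concatenating columns.
X = np.hstack([first_party, embeddings])
model = GradientBoostingClassifier().fit(X, y)
```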
Ensure you have the following Python packages installed:
- scikit-learn
- optuna (for hyperparameter optimization)
- Any other dependencies required by your scripts
Clone this repository:
git clone https://github.com/resonate/resonate-embeddings-cookbook.git
cd embeddings-cookbook
Install the necessary Python packages in a virtual environment (this helps keep the required package versions isolated):
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
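After installing, a quick sanity check (optional and purely illustrative) confirms that the core dependencies import correctly:

```python
# Verify that the packages used by the cookbook examples are importable.
import sklearn
import optuna

print("scikit-learn:", sklearn.__version__)
print("optuna:", optuna.__version__)
```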
To run the scikit-learn model:
python -m sklearn_cookbook.train_sklearn \
--input-path {input_data} \
--output-path {output_path} \
--evkey {evkey} \
--embeddings-path {embeddings_path} \
--feature-selection False
- input_path: Path to the input data (local or S3), containing labels and IDs.
- embeddings_path: Path to the embeddings (local or S3), containing IDs and N-dimensional embeddings.
- output_path: Path to write the output data (local or S3).
- evkey: An example key, e.g., E205932615. This is a model identifier and is needed for record keeping.
- evaluations: Number of evaluations for Optuna to explore the hyperparameter space (e.g., 150, but can be higher or lower).
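As a rough sketch of how the evaluations budget is used (this is not the repository's training code; the model, search space, and data handling are assumptions), an Optuna study over a scikit-learn classifier looks like this:

```python
# Minimal Optuna + scikit-learn tuning loop; random data stands in for the
# embedding features and binary labels described in this README.
import numpy as np
import optuna
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))   # stand-in for 512-dimension embeddings
y = rng.integers(0, 2, size=1000)  # stand-in for binary labels

def objective(trial):
    # The regularization range is an assumed search space for illustration.
    C = trial.suggest_float("C", 1e-3, 1e2, log=True)
    model = LogisticRegression(C=C, max_iter=1000)
    return cross_val_score(model, X, y, cv=3, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
# n_trials corresponds to the evaluations argument described above.
study.optimize(objective, n_trials=25)
print(study.best_params)
```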
The input data (input_path) is a parquet file that contains the label matrix for a binary classification model. The schema for this file is:
- rid: ID for each data point.
- evkey: A model identifier to facilitate good governance of experiments and model use cases.
- label: Binary label indicating the outcome for the observation (e.g., this record churned or did not churn).
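For reference, here is a small sketch of producing a file with this schema using pandas (the file name and example values are made up):

```python
# Write a tiny labels file matching the schema above: rid, evkey, label.
# Requires a parquet engine such as pyarrow.
import pandas as pd

labels = pd.DataFrame(
    {
        "rid": ["r001", "r002", "r003"],
        "evkey": ["E205932615"] * 3,   # model identifier, as in the example above
        "label": [1, 0, 1],            # binary outcome, e.g. churned / did not churn
    }
)
labels.to_parquet("input_data.parquet", index=False)
```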
The embeddings file (embeddings_path) is a parquet file that contains the embeddings for a set of rids. The schema for this file is:
- rid: ID for each data point.
- bottleneck: 512-dimension embedding vector stored as a numpy array.
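To show how the two files fit together, here is a sketch (file names are assumptions) that loads both, joins them on rid, and unpacks the bottleneck column into a dense feature matrix:

```python
# Join labels and embeddings on rid, then stack the 512-dimension vectors
# into a feature matrix ready for scikit-learn.
import numpy as np
import pandas as pd

labels = pd.read_parquet("input_data.parquet")      # columns: rid, evkey, label
embeddings = pd.read_parquet("embeddings.parquet")  # columns: rid, bottleneck

merged = labels.merge(embeddings, on="rid", how="inner")
X = np.vstack(merged["bottleneck"].to_numpy())  # shape: (n_rows, 512)
y = merged["label"].to_numpy()
```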
We welcome contributions to this project. Please submit a pull request or open an issue to discuss any changes.
This project is licensed under the BSD-3-Clause License - see the LICENSE file for details.