resonate/resonate-embeddings-cookbook

Embeddings Cookbook

This repository contains code examples for running machine learning models using Resonate Embeddings as features. Embeddings are a type of dimensionality reduction performed by large neural networks: they capture the key features of a raw dataset in a form that is more useful to other machine learning models, including other neural networks, recommender systems, gradient boosting, and segmentation. Embeddings are often easier to work with than raw data, particularly for consumer behavior, because they are lower-dimensional and dense.

This repository includes examples for training models with Optuna and scikit-learn. We find that appropriate hyperparameter tuning is necessary to work with these data successfully, so we recommend adopting these patterns within your own stack. This repository does not include any data.

Overview

Resonate embeddings are a synthesis of up to 90 days of online behavior, made available at the individual level. These embeddings may be used in isolation or in tandem with first-party datasets, and may improve the predictive quality of your models, as well as their scale.

Getting Started

Prerequisites

Ensure you have the following Python packages installed:

  • scikit-learn (imported as sklearn)
  • optuna (for hyperparameter optimization)
  • Any other dependencies required by your scripts

Installation

Clone this repository:

git clone https://github.com/resonate/resonate-embeddings-cookbook.git
cd resonate-embeddings-cookbook

Creating a virtual environment (optional)

Create a virtual environment and install the packages from requirements.txt into it, which helps keep the required package versions in place:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Usage

Scikit-learn Example

To run the scikit-learn model:

python -m sklearn_cookbook.train_sklearn \
  --input-path {input_path} \
  --output-path {output_path} \
  --evkey {evkey} \
  --embeddings-path {embeddings_path} \
  --feature-selection False

Example Values

  • input_path: Path to the input data (local or S3), containing labels and IDs.
  • embeddings_path: Path to the embeddings (local or S3), containing IDs and N-dimensional embeddings.
  • output_path: Path to write the output data (local or S3).
  • evkey: A model identifier, needed for record keeping (e.g., E205932615).
  • evaluations: Number of evaluations for Optuna to explore the hyperparameter space (e.g., 150, but can be higher or lower).

Input Requirements

Input Path

This is a Parquet file that contains the label matrix for a binary classification model. The schema for this file is:

  • rid: ID for each data point.
  • evkey: A model identifier to facilitate good governance of experiments and model use cases.
  • label: Binary label indicating the outcome for the observation (e.g., this record churned or did not churn).
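
A quick check of the label file against this schema can catch problems before training. The sketch below is one hypothetical way to do it with pandas; the function name validate_labels, the 0/1 encoding of label, and the requirement that rid be unique within the file are assumptions, not rules stated by this repository.

```python
import pandas as pd

REQUIRED_COLUMNS = ("rid", "evkey", "label")


def validate_labels(labels: pd.DataFrame) -> pd.DataFrame:
    """Check a label frame against the schema described above."""
    missing = [col for col in REQUIRED_COLUMNS if col not in labels.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    # Assumption: each rid appears at most once in a label file.
    if labels["rid"].duplicated().any():
        raise ValueError("rid values must be unique")
    # Assumption: the binary label is encoded as 0/1 (missing values fail too).
    if not labels["label"].isin([0, 1]).all():
        raise ValueError("label must be binary (0 or 1)")
    return labels


# In practice the frame would come from pd.read_parquet(input_path).
example = pd.DataFrame(
    {"rid": ["a", "b", "c"], "evkey": ["E205932615"] * 3, "label": [1, 0, 1]}
)
validate_labels(example)
```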

Embeddings Path

This is a Parquet file that contains the embeddings for a set of rids. The schema for this file is:

  • rid: ID for each data point.
  • bottleneck: 512-dimension embedding vector as a NumPy array.
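
To turn the two files into model inputs, the labels and embeddings need to be joined on rid and the per-row vectors stacked into a matrix. The sketch below shows one way this might look with pandas and NumPy; build_feature_matrix is a hypothetical helper, the inner join is an assumption about how unmatched rids should be handled, and the tiny 4-dimensional vectors stand in for the real 512-dimension embeddings.

```python
import numpy as np
import pandas as pd


def build_feature_matrix(labels: pd.DataFrame, embeddings: pd.DataFrame):
    """Join labels to embeddings on rid and return (X, y) as NumPy arrays."""
    # Inner join keeps only rids that have both a label and an embedding.
    merged = labels.merge(embeddings, on="rid", how="inner")
    # Each cell of `bottleneck` holds one vector; stack them into a 2-D matrix.
    X = np.stack(merged["bottleneck"].tolist())
    y = merged["label"].to_numpy()
    return X, y


# Tiny in-memory stand-ins; real inputs would come from pd.read_parquet(...).
rng = np.random.default_rng(0)
labels = pd.DataFrame(
    {"rid": ["a", "b", "c"], "evkey": ["E205932615"] * 3, "label": [1, 0, 1]}
)
embeddings = pd.DataFrame(
    {"rid": ["a", "b", "d"], "bottleneck": [rng.normal(size=4) for _ in range(3)]}
)

X, y = build_feature_matrix(labels, embeddings)
print(X.shape, y.shape)  # (2, 4) (2,) because only rids "a" and "b" match
```

The resulting X and y can be passed directly to a scikit-learn estimator's fit method.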

Contributing

We welcome contributions to this project. Please submit a pull request or open an issue to discuss any changes.

License

This project is licensed under the BSD-3-Clause License - see the LICENSE file for details.
