add structure overview
joshuajerin committed Nov 10, 2024
1 parent 5dbf270 commit 06853aa
Showing 2 changed files with 56 additions and 11 deletions.
51 changes: 45 additions & 6 deletions CONTRIBUTING.md
@@ -18,6 +18,43 @@ Before you begin, ensure you have the following tools installed on your system:

**Note**: This guide assumes you are using PostgreSQL version 17. If you are using a different version, adjust the commands accordingly.

## Project Structure Overview

Understanding the layout of the `pg_vectorize` project will help you navigate the codebase and contribute effectively. Here's a brief overview:

### `extension` Directory

Contains the PostgreSQL extension code, focusing on database operations and vectorized data processing within PostgreSQL.

**Key Components**:

- **Job Management**: Utilities for creating and executing jobs, including background workers.
- **Configuration Settings**: GUC settings to customize the extension's behavior (e.g., API keys, batch sizes).
- **Triggers and Scheduling**: Defines triggers and cron jobs for automated data processing.
- **Embedding and Search Operations**: Integrates with external AI services to generate embeddings and enables vector search capabilities.

**When to Contribute Here**:

- Modifying database-specific logic or background job management.
- Working on PostgreSQL interactions with external embedding services.
- Enhancing in-database vector search functionalities.
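
The GUC settings mentioned under Key Components can be inspected and changed with standard PostgreSQL commands. The following is a minimal sketch, not a definitive procedure: it uses `vectorize.embedding_service_url`, the setting shown later in this guide, and assumes you are connected as a superuser and that the setting takes effect on a configuration reload.

```sql
-- Show the current value of one of the extension's GUC settings.
SHOW vectorize.embedding_service_url;

-- Persist a new value cluster-wide (written to postgresql.auto.conf).
-- The URL is the local value shown in this guide's expected output;
-- substitute your own endpoint.
ALTER SYSTEM SET vectorize.embedding_service_url TO 'http://localhost:3000/v1/embeddings';

-- Ask PostgreSQL to re-read its configuration files.
SELECT pg_reload_conf();
```

Running `SHOW` again in a new session is a quick way to confirm the change. Other settings, such as API keys or batch sizes, would presumably follow the same pattern, but check their exact names in the extension source before relying on them.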

### `core` Directory

Contains core logic and abstractions that support vectorization and embedding functionalities.

**Key Components**:

- **Providers**: API clients and wrappers for external AI/embedding services.
- **Data Structures**: Definitions of embeddings and related types.
- **Embedding Processing**: Logic for handling embedding requests and responses.

**When to Contribute Here**:

- Adding support for new AI/embedding providers.
- Modifying embedding generation logic.
- Implementing enhancements that are platform-independent.

## Setting Up Your Development Environment

### 1. Initialize PGRX
@@ -105,7 +142,7 @@ Expected output:

```text
List of installed extensions
Name | Version | Schema | Description
------------+---------+------------+------------------------------------------
pg_cron | 1.6 | pg_catalog | Job scheduler for PostgreSQL
pgmq | 1.1.1 | pgmq | A lightweight message queue.
@@ -126,7 +163,7 @@ SHOW vectorize.embedding_service_url;
Expected output:

```text
vectorize.embedding_service_url
---------------------------------
http://localhost:3000/v1/embeddings
(1 row)
@@ -157,7 +194,7 @@ SELECT * FROM products LIMIT 2;
Expected output:

```text
product_id | product_name | description | last_updated_at
------------+--------------+--------------------------------------------------------+------------------------------
1 | Pencil | Utensil used for writing and often works best on paper | 2023-07-26 17:20:43.639351
2 | Laptop Stand | Elevated platform for laptops, enhancing ergonomics | 2023-07-26 17:20:43.639351
@@ -181,7 +218,7 @@ SELECT vectorize.table(
Expected output:

```text
table
-------------------------------
Successfully created job: product_search_hf
(1 row)
@@ -201,7 +238,7 @@ SELECT * FROM vectorize.search(
Expected output:

```text
search_results
---------------------------------------------
{"product_id":13,"product_name":"Phone Charger","similarity_score":0.8147812194590133}
{"product_id":6,"product_name":"Backpack","similarity_score":0.774306211384604}
@@ -211,7 +248,9 @@ Expected output:

## Accessing the Tembo Embedding Service

You can explore the Tembo Embedding Service API documentation at [http://localhost:3000/docs](http://localhost:3000/docs). This service allows you to experiment with different [Hugging Face models](https://huggingface.co/models?search=sentence-transformers) for your vector searches.
When running the Tembo Embedding Service locally, you can view its endpoints in the Swagger UI at [http://localhost:3000/docs](http://localhost:3000/docs). This UI allows you to test and interact with available endpoints directly from your browser.

For the full **API documentation** for `pg_vectorize`, refer to the hosted version at [https://tembo.io/pg_vectorize/](https://tembo.io/pg_vectorize/), which includes function references and more usage examples.

## Troubleshooting and Tips

16 changes: 11 additions & 5 deletions README.md
@@ -27,9 +27,7 @@ This project relies heavily on the work by [pgvector](https://github.com/pgvecto

`pg_vectorize` powers the [VectorDB Stack](https://tembo.io/docs/product/stacks/ai/vectordb) on [Tembo Cloud](https://cloud.tembo.io/) and is available in all hobby tier instances.

**API Documentation**: [https://tembo.io/pg\_vectorize/](https://tembo.io/pg_vectorize/)

**Source**: [https://github.com/tembo-io/pg_vectorize](https://github.com/tembo-io/pg_vectorize)

## Features

@@ -61,7 +59,7 @@ To get started with `pg_vectorize`, you have two main options depending on wheth

- Rust and [pgrx toolchain](https://github.com/pgcentralfoundation/pgrx)
- Postgres extensions:
- [pg_cron](https://github.com/citusdata/pg_cron) ^1.5
- [pgmq](https://github.com/tembo-io/pgmq) ^1
- [pgvector](https://github.com/pgvector/pgvector) ^0.5.0

@@ -294,4 +292,12 @@ This will produce the following output (an array of numerical values representin

```text
{0.0028769304, -0.005826319, -0.0035932811, ...}
```

## Contributing

We welcome contributions from the community! If you're interested in contributing to `pg_vectorize`, please check out our [Contributing Guide](CONTRIBUTING.md). Your contributions help make this project better for everyone.

## Community Support

If you encounter any issues or have any questions, feel free to join our [Tembo Community Slack](https://join.slack.com/t/tembocommunity/shared_invite/zt-2u3ctm86u-XzcyL76T7o~7Mpnt6KUx1g). We're here to help!
