chore(eng-docs): add docs for knowledge gaps (#3976)
* chore(bakcend): add command to apply migrations

* chore(eng-docs): wip for enmbeddings

* chore(end): added in new CHANGELOG.md to create:package

* chore(eng-docs): update embeddings doc

* chore(eng-docs): md highlights

* chore(eng-docs): document NPM publish process

* chore(eng-docs): typo fixes

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
krisantrobus and kodiakhq[bot] authored Jul 5, 2024
1 parent 5f11acd commit 0d311e0
Showing 6 changed files with 63 additions and 2 deletions.
3 changes: 2 additions & 1 deletion apps/backend/package.json
@@ -5,7 +5,8 @@
"scripts": {
"start": "yarn supabase start",
"stop": "yarn supabase stop",
"generate:db-types": "yarn supabase gen types typescript --local > supabase/schema.gen.ts"
"generate:db-types": "yarn supabase gen types typescript --local > supabase/schema.gen.ts",
"db:reset": "yarn supabase db reset"
},
"devDependencies": {
"supabase": "^1.136.3"
2 changes: 1 addition & 1 deletion internal-docs/engineering/doc-site/docsearch.md
@@ -16,7 +16,7 @@ The same happens with Github discussions. The discussion is a page and is split

We can then perform a similarity search using the user input as the query.

Embeddings are generated using the the [embedding script](../../../packages/paste-website/scripts/search/).
Embeddings are generated using the [embedding script](../../../packages/paste-website/scripts/search/). For a more detailed explanation of generating embeddings and running locally, see [generating-embeddings](./generating-embeddings.md).

## Production vs Previews (staging)

42 changes: 42 additions & 0 deletions internal-docs/engineering/doc-site/generating-embeddings.md
@@ -0,0 +1,42 @@
# Generating Embeddings

Embeddings power our [Doc Search](./docsearch.md) functionality. OpenAI embeddings use a machine-learning model to convert unstructured data, such as text, into numeric vectors that can be compared with one another.

In our use case, the plain text being converted is search criteria, MDX headers and GitHub discussion titles. We use the `text-embedding-ada-002` model, which outputs a structure similar to: `[-0.005330325,0.018767769,0.00020701668,-0.0011101937, ...]`
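
To make the idea of comparing vectors concrete, here is a minimal TypeScript sketch of cosine similarity, one common way to score how close two embeddings are. It is an illustration only: the real search runs inside Supabase, and this document does not state which distance measure is used there. The `Candidate` type and the function names are hypothetical.

```typescript
// Illustrative only: a hypothetical shape for a piece of text and its embedding.
interface Candidate {
  content: string;
  embedding: number[];
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns a new array with the candidates most similar to the query first.
function rankBySimilarity(query: number[], candidates: Candidate[]): Candidate[] {
  return [...candidates].sort(
    (x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding),
  );
}
```

In practice the query vector would be the embedding of the user's search input, and the candidates would be the stored section embeddings.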

## Local Development

To develop locally you will need to start up a local instance of Supabase. The code for this is found in `/apps/backend`. Follow [this document](../../../apps/backend/README.md) to get set up.

Once it is set up you should be able to access Supabase at http://127.0.0.1:54323. If you have no tables, the migrations have not been applied. To apply them, run ```yarn workspace @twilio-paste/backend db:reset``` from the root of the project.

**Note**: if you see an error for vector packages, go into [20230928013336_initial_schema](../../../apps/backend/supabase/migrations/20230928013336_initial_schema.sql) and make the following change **without committing** it:

```sql
create extension if not exists "vector" with schema "public" version '0.5.0';
/* to */
create extension if not exists "vector" with schema "public";
```

### Environment Variables

To do any GitHub Action or assistant development on the site, you will need to set the following environment variables in ```packages/paste-website/.env```.

```
OPENAI_API_KEY="" # USE YOUR PERSONAL TOKEN FOR LOCAL DEV
SUPABASE_URL="http://127.0.0.1:54321" # PRINTED TO CONSOLE AFTER STARTING CONTAINER
SUPABASE_KEY="" # PRINTED TO CONSOLE AFTER STARTING CONTAINER
GH_SERVICE_ACC_DISCUSSIONS_TOKEN="" # IN 1Password UNDER github.com ENTRY
```
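
If a script fails in a confusing way, it can help to check up front that all four variables are set. The following TypeScript sketch shows one way to do that. It is not part of the existing scripts, and `findMissingEnvVars` is a hypothetical helper name.

```typescript
// The four variables listed above. The list itself comes from this document;
// the helper below is a hypothetical illustration.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "SUPABASE_URL",
  "SUPABASE_KEY",
  "GH_SERVICE_ACC_DISCUSSIONS_TOKEN",
] as const;

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node script this could be called as `findMissingEnvVars(process.env)` once the `.env` file has been loaded, exiting early with the returned names if the list is not empty.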

### Generating Data

The best way to generate data is to run the nightly embed script `generate:embeddings`. This will update the tables: `page` and `page_section`.

## Table Structure

While there are other tables, the only ones that concern embeddings creation are:
- **page**: stores the metadata of the entry. Key columns are checksum (used to determine whether to update the record), path (either the URL of the page or the GitHub discussion) and type (github-discussion or markdown).
- **page_section**: contains the search embeddings. Key columns are content (plain text headings/titles), embedding (the vector created by OpenAI) and slug (toString of content, or the discussion/answer in GitHub).

The two tables are related, with `page` being the parent. They are joined on `page.id = page_section.page_id`.
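
The relationship can be sketched in TypeScript as follows. The column names come from the description above, but the column types, the `needsUpdate` and `sectionsForPage` helpers, and the in-memory join are assumptions for illustration only. The generated types in `supabase/schema.gen.ts` are the source of truth.

```typescript
type PageType = "github-discussion" | "markdown";

// Parent table. Column types are assumed, not taken from the schema.
interface Page {
  id: number;
  path: string; // URL of the page or the GitHub discussion
  checksum: string; // used to decide whether the record needs updating
  type: PageType;
}

// Child table, linked to its parent through page_id.
interface PageSection {
  id: number;
  page_id: number;
  content: string; // plain text heading or title
  slug: string;
  embedding: number[]; // vector returned by the embedding model
}

// Hypothetical helper: a page is (re)processed when it is new or its checksum changed.
function needsUpdate(existing: Page | undefined, newChecksum: string): boolean {
  return existing === undefined || existing.checksum !== newChecksum;
}

// Hypothetical helper: in-memory equivalent of joining on page.id = page_section.page_id.
function sectionsForPage(page: Page, sections: PageSection[]): PageSection[] {
  return sections.filter((section) => section.page_id === page.id);
}
```

The checksum comparison is what lets a nightly run skip entries whose content has not changed.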
13 changes: 13 additions & 0 deletions internal-docs/engineering/publishing-npm-package.md
@@ -0,0 +1,13 @@
# Publishing NPM Package

Paste core uses [changesets](https://github.com/changesets/changesets) to manage versions and changelogs. It has great support for monorepos and multi-package repositories, which makes it ideal for `@twilio-paste/core`.

Changesets has a great [GitHub action](https://github.com/changesets/action) that manages the release by creating a PR and periodically pulling changes from main into it. No code is published to NPM until this PR is merged, which is controlled by the team.

The PR is always called `Version Packages` and lists all the changes that have been made since the last release. Its description is also updated with the changeset entries from merged PRs, making it easy to see what will be released.

There is a step in the GitHub Action [on_merge_to_main](../../.github/workflows/on_merge_to_main.yml) with the name `Create Pull Request or Publish to npm`. This step defines which command from [package.json](../../package.json) to run for each operation:

- version: removes all of the temporary changeset files that are generated during development and aggregates them into a changelog entry.
- publish: responsible for publishing the package to NPM.
- commit: the commit message used on squash and merge, "chore(release): version packages".
5 changes: 5 additions & 0 deletions plopfile.js
@@ -81,6 +81,11 @@ module.exports = function (plop) {
path: "packages/paste-core/{{component-type}}/{{kebabCase component-name}}/tsconfig.json",
templateFile: "tools/plop-templates/tsconfig.hbs",
},
{
type: "add",
path: "packages/paste-core/{{component-type}}/{{kebabCase component-name}}/CHANGELOG.md",
templateFile: "tools/plop-templates/CHANGELOG.hbs",
},
],
});
};
Empty file.
