AI Vector Search - Using Vector Embedding Models with Nodejs (#130)
* created lab

* Updates

* Updated Spelling

* Deleted bad folder speling

* Moved to workshops

* added livelabs folder

* Fixed formatting

* Fixed Indentations

* Spelling Changes

* Fixed Image format problem

* Fixed Broken Images

* Spelling

* changes to lab

* Fixed Image scaling

* Deleted Spelling issue folder
ZackaryRice authored May 14, 2024
1 parent 1fdff65 commit f5b9fc7
Showing 101 changed files with 31 additions and 2,289 deletions.
14 changes: 4 additions & 10 deletions ai-vector-embedding-nodejs/introduction/introduction.md
@@ -44,12 +44,6 @@ There are many different types of vector embedding models.
![Introduction Part 1 Image 4](images/intro104.png =60%x*)
Figure 4.

**OpenAI** makes embedding models and Large Language Models (LLMs) including ChatGPT.
* These embedding models are accessed via REST API calls or SDKs [eg Node.js and Python].
* You need to pay to use OpenAI embedding models.
* An API key: **OPENAI\_API\_KEY** is required to use the OpenAI embedding models.

[Find more information on OpenAI embedding models including how to setup and manage the OPENAI\_API\_KEY](https://openai.com/blog/new-embedding-models-and-api-updates)

**Cohere** makes embedding models and Large Language Models (LLMs) including embed-v3.
* These embedding models are accessed via REST API calls or SDKs [eg Node.js and Python].
@@ -60,14 +54,15 @@ There are many different types of vector embedding models.

**Hugging Face** is a repository for thousands of open source machine learning models.
* Hugging Face has many open source vector embedding models.
* These embedding models can be accessed via REST APIs, or local SDKs [Python].
* The **Transformers** and **Sentence Transformers** are very popular embedding models which use local Python libraries. Xenova is a different type of embedding model on Hugging Face.
* These embedding models can be accessed via REST APIs, or local SDKs [Nodejs].
* The **Transformers** and **Sentence Transformers** are very popular embedding models which use local Nodejs libraries. Xenova is a different type of embedding model on Hugging Face.

**Xenova** uses the ONNX Runtime for executing embedding models from Hugging Face. Xenova has converted popular Python **Transformer** and **Sentence Transformer** embedding models into [ONNX formatted](https://onnx.ai/) files.
* The ONNX formatted files can be executed in the [ONNX Runtime](https://onnxruntime.ai/) via APIs.
* Open Neural Network Exchange (ONNX) is an open format for representing machine learning models.
* The Xenova [Transformers.js](https://huggingface.co/docs/transformers.js/en/index) library is a JavaScript wrapper around the JavaScript API of the ONNX Runtime, designed to look similar to the Python Transformers library.


![Introduction Part 1 Image 5](images/intro105.png " ")
Figure 5.

@@ -158,7 +153,6 @@ For the labs to work, we will be using the following pattern:


There are many different embedding models. At the time of this lab creation:
* OpenAI has *3* recent models used in the sample code
* Cohere has *4* recent models used in the sample code
* Hugging Face has many. The sample code uses *25* models

@@ -187,5 +181,5 @@ In this workshop you will have an opportunity to use the following vector embedd

## Acknowledgements
* **Author** - Doug Hood, Product Manager
* **Contributors** - Sean Stacey, Outbound Product Manager
* **Contributors** - Sean Stacey, Outbound Product Manager, Zackary Rice, Software Developer
* **Last Updated By/Date** - Sean Stacey, April 2024
41 changes: 27 additions & 14 deletions ai-vector-embedding-nodejs/labs/labs.md
@@ -5,7 +5,8 @@
# Using Cohere Vector Embedding Models
## Introduction

In this lab we will learn how to use the OCI generative ai Cohere embedding models with Oracle Vectors.

In this lab we will learn how to use the Oracle OCI generative AI Cohere embedding models with Oracle Vectors. To connect to the generative AI services we will install the oci-common and oci-sdk libraries. For the purpose of this lab these libraries have already been installed on your virtual instance.

------------
Estimated Time: 25 minutes
@@ -20,9 +21,10 @@ In this lab, you will see the following Vector operations using nodejs:

## Task 1: Vectorizing a table with Cohere embedding

1. The first step is to vectorize the contents of our table using an embedding model by Cohere. To do this, you will need to create a nodejs program to vectorize our phrases using the Cohere embedding model libraries that we just installed.
1. The first step is to vectorize the contents of our table using an embedding model by Cohere. To do this, you will need to create a nodejs program to vectorize our phrases using the Oracle OCI generative AI Cohere services.

The file *vectorizeTableCohere.js* is already on your virtual instance. Below are the contents of the file.

The file *vectorizetableCohere.js* is already on your machine. Below are the contents of the file.

```
<copy>
@@ -95,9 +97,8 @@ In this lab, you will see the following Vector operations using nodejs:
console.log('Using embedding model ' + embeddingModel);
// creates authentication for oci
try {
const client = new sdk.generativeaiinference.GenerativeAiInferenceClient({
authenticationDetailsProvider: provider
});
@@ -169,7 +170,9 @@ In this lab, you will see the following Vector operations using nodejs:
</copy>
```
2. Now you are ready to run the *vectorizetableCohere.js* nodejs program. This can be done by performing the following:
2. Now you are ready to run the *vectorizeTableCohere.js* nodejs program. This can be done by performing the following:
```
<copy>
@@ -183,7 +186,8 @@ In this lab, you will see the following Vector operations using nodejs:
To summarize what we've just done, the *vectorizeTableCohere.js* program connects to the Oracle database, retrieves the text from the INFO column of the MY\_DATA table, and vectorizes the "factoid" for each of the 150 rows. We are then storing the vectorized data as a vector in the column called: V. You will also notice that we used the *embed-english-light-v3.0* embedding model for this operation. In other words an English speaking embedding model, and it's version 3.0 of the light model.
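The flow summarized above (read the rows, embed the text, store the vectors) typically needs two small data-handling steps around the embedding call. The following is a minimal sketch of those steps and is not taken from *vectorizeTableCohere.js*: the helper names `chunk` and `toFloat32Vector` are hypothetical, and the batch size of 96 is an assumption about the per-request input limit that should be checked against the service documentation.

```javascript
// Hypothetical helpers for illustration; not part of the lab's program.

// Split an array of rows into batches so each embedding request stays small.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Convert a plain array of numbers (the usual shape of an embedding in a
// JSON response) into a Float32Array, a compact typed-array representation.
function toFloat32Vector(embedding) {
  const valid =
    Array.isArray(embedding) &&
    embedding.every((x) => typeof x === 'number' && !Number.isNaN(x));
  if (!valid) {
    throw new TypeError('embedding must be an array of numbers');
  }
  return Float32Array.from(embedding);
}

// Example: 150 rows, as in MY_DATA, with an assumed batch size of 96.
const rows = Array.from({ length: 150 }, (_, i) => ({
  id: i + 1,
  info: `factoid ${i + 1}`,
}));
const batches = chunk(rows, 96);
console.log(batches.map((b) => b.length)); // [ 96, 54 ]
```

Batching keeps the number of round trips to the embedding service low, and a typed array is, as far as the node-oracledb documentation describes, the form used when binding values to a VECTOR column.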
3. Before we move onto performing Similarity Searches using OCI generative ai Cohere embedding models, we should take a look in the Oracle database to see the updates made to the *MY\_DATA* table.
3. Before we move onto performing Similarity Searches using Oracle OCI generative AI Cohere embedding models, we should take a look in the Oracle database to see the updates made to the *MY\_DATA* table.
3.a. Connect to your Oracle database as the user: **vector** with password: **vector**
@@ -227,9 +231,11 @@ In this lab, you will see the following Vector operations using nodejs:
## Task 2: Perform Similarity Search using Cohere
1. In this lab we will see how to perform a similarity search with the OCI Cohere embedding models in nodejs.
So far we have vectorized the data in the *MY\_DATA* table using the OCI generative ai Cohere embedding model, we can now start performing Similarity Searches using the Vectors in our table. Even though the data in our table has been vectorized we will still need to connect to OCI generative ai to vectorize our search phrase with the same embedding model. The search phrase is entered on the fly, vectorized and then used to search against the vectors in the database. We will create a nodejs program to do this.
1. In this lab we will see how to perform a similarity search with the Oracle OCI generative AI Cohere embedding models in nodejs.
So far we have vectorized the data in the *MY\_DATA* table using the Oracle OCI generative AI Cohere embedding models, we can now start performing Similarity Searches using the Vectors in our table. Even though the data in our table has been vectorized we will still need to connect to Oracle OCI generative AI Cohere embedding models to vectorize our search phrase with the same embedding model. The search phrase is entered on the fly, vectorized and then used to search against the vectors in the database. We will create a nodejs program to do this.
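To make the idea concrete, here is a minimal sketch of what "search against the vectors" means: compute a distance between the query vector and every stored vector, then keep the closest rows. In the lab this comparison is done inside the database, so the code below is only an illustration in plain JavaScript. The function names `cosineDistance` and `topK` are hypothetical, and the three-dimensional vectors are made-up toy values, not real Cohere embeddings.

```javascript
// Illustrative only: toy vectors, not real embeddings.

// Cosine distance = 1 - cosine similarity; smaller means "more similar".
function cosineDistance(a, b) {
  if (a.length !== b.length) {
    throw new RangeError('vectors must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new RangeError('cosine distance is undefined for a zero vector');
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k rows whose vectors are closest to the query vector.
function topK(queryVector, rows, k) {
  return rows
    .map((row) => ({
      info: row.info,
      distance: cosineDistance(queryVector, row.v),
    }))
    .sort((x, y) => x.distance - y.distance)
    .slice(0, k);
}

// Stand-in for the INFO and V columns of a vectorized table.
const table = [
  { info: 'Mumbai is a city in India', v: [0.9, 0.1, 0.0] },
  { info: 'Cats are mammals', v: [0.0, 0.2, 0.9] },
  { info: 'Delhi is a city in India', v: [0.8, 0.3, 0.1] },
];

// Pretend this is the embedding of the search phrase.
const query = [1.0, 0.1, 0.0];

console.log(topK(query, table, 2).map((r) => r.info));
```

The comparison is only meaningful when the query vector and the stored vectors come from the same embedding model, which is why the search phrase has to be vectorized with the model that was used for the table.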
The file *similaritysearchCohere.js* is already on your machine. Below are the contents of the file.
@@ -519,7 +525,9 @@ In this lab, you will see the following Vector operations using nodejs:
You should see something similar to:
![Lab 1 Task 3 Step 7](images/nodejscohere10.pngs=60%x*)
![Lab 1 Task 3 Step 7](images/nodejscohere10.png =60%x*)
The word "Bombay" does not appear in our data set, but the results related to Mumbai are correct because "Bombay" is the former name for "Mumbai", and as such there is a strong correlation between the two names for the same geographic location.
@@ -563,7 +571,8 @@ In this lab, you will see the following Vector operations using nodejs:
This is where we can choose the embedding model. As mentioned earlier, we have been using the *embed-english-light-v3.0* - both to vectorize our data when we populated the MY\_DATA table, as well as when we performed our similarity searches.
We can switch to the "non-light" version by commenting out the line where we assign *"embed-english-light-v3.0"* and uncommenting the line for "embed-english-v3.0".
**We can switch to the "non-light" version by commenting out the line where we assign *"embed-english-light-v3.0"* and uncommenting the line for "embed-english-v3.0".**
Your modified program should look like this:
@@ -586,7 +595,9 @@ In this lab, you will see the following Vector operations using nodejs:
This is because, as we mentioned earlier, you cannot perform similarity search operations using different embedding models. In other words, in order for us to use the *embed-english-v3.0* model, we will need to go back and re-vectorize the data in the MY\_DATA table so that it too uses the same embedding model.
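One way to catch this mistake early is to check the model before searching. The sketch below is a hypothetical guard, not part of the lab programs: the function name `assertSameModel` is invented for illustration, and the dimension values (384 for the light model, 1024 for the full model) are assumptions that should be verified against the model documentation.

```javascript
// Assumed output dimensions; verify against the model documentation.
const MODEL_DIMENSIONS = {
  'embed-english-light-v3.0': 384,
  'embed-english-v3.0': 1024,
};

// Throw if the query was embedded with a different model than the table,
// or if the query vector has an unexpected number of dimensions.
function assertSameModel(tableModel, queryModel, queryVector) {
  if (tableModel !== queryModel) {
    throw new Error(
      `Table was vectorized with ${tableModel} but the query uses ` +
        `${queryModel}; re-vectorize the table first.`
    );
  }
  const expected = MODEL_DIMENSIONS[queryModel];
  if (expected !== undefined && queryVector.length !== expected) {
    throw new Error(
      `Expected ${expected} dimensions for ${queryModel}, ` +
        `got ${queryVector.length}.`
    );
  }
}

// Example: the table still holds "light" vectors, the query uses the full model.
try {
  assertSameModel(
    'embed-english-light-v3.0',
    'embed-english-v3.0',
    new Array(1024).fill(0)
  );
} catch (err) {
  console.log(err.message);
}
```

Recording which model populated the table (for example in a constant shared by both programs) makes this check possible; the lab itself relies on editing the same line in both files.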
In order to make this change we will need to revisit the *vectorizetableCohere.js* program and make the same code change to comment out the line for assigning the *"embed-english-light-v3.0"* and uncommenting the line for *"embed-english-v3.0"*.
In order to make this change we will need to revisit the *vectorizeTableCohere.js* program and make the same code change to comment out the line for assigning the *"embed-english-light-v3.0"* and uncommenting the line for *"embed-english-v3.0"*.
The program should look like this:
@@ -597,7 +608,7 @@ In this lab, you will see the following Vector operations using nodejs:
```
<copy>
node vectorizetableCohere.js
node vectorizeTableCohere.js
</copy>
```
@@ -645,7 +656,9 @@ In this lab, you will see the following Vector operations using nodejs:
## Summary
In this lab you have seen how easy it is to use Cohere with Nodejs and Oracle Vectors and Similarity Search. You are ready to move onto the next lab.
In this lab you have seen how easy it is to use Cohere with Nodejs and Oracle Vectors and Similarity Search.
</if>
Binary file not shown.