Commit

* docs(Getting Started.ipynb): update installation instructions and quick start guide

* docs(Getting Started.ipynb): update section headings and explanations
* docs(Getting Started.ipynb): update section heading and explanation
NirantK committed Aug 22, 2023
1 parent 0e039e5 commit 22a8260
Showing 1 changed file with 12 additions and 14 deletions.
26 changes: 12 additions & 14 deletions docs/Getting Started.ipynb
@@ -9,35 +9,33 @@
"\n",
"Here you will learn how to use the fastembed package to embed your data into a vector space. The package is designed to be easy to use and fast. It is built on top of the [ONNX](https://onnx.ai/) standard, which allows for fast inference on a variety of hardware (called Runtimes in ONNX). \n",
"\n",
-"## Installation\n",
+"## Quick Start\n",
"\n",
-"To get started, install the fastembed package with pip."
+"The fastembed package is designed to be easy to use. The main class is the `Embedding` class. It takes a list of strings as input and returns a list of vectors as output. The `Embedding` class is initialized with a model file."
]
},
{
"cell_type": "code",
"execution_count": null,
-"id": "a3b5612a",
+"id": "ada95c6a",
"metadata": {},
"outputs": [],
"source": [
-"!pip install fastembed"
+"!pip install fastembed --upgrade # Install fastembed"
]
},
{
"cell_type": "markdown",
-"id": "6b5b503b",
+"id": "ed81d725",
"metadata": {},
"source": [
-"# Quick Start\n",
-"\n",
-"The fastembed package is designed to be easy to use. The main class is the `Embedding` class. It takes a list of strings as input and returns a list of vectors as output. The `Embedding` class is initialized with a model file."
+"Make the necessary imports, initialize the `Embedding` class, and embed your data into vectors:"
]
},
{
"cell_type": "code",
"execution_count": null,
-"id": "8349e72d",
+"id": "b61c6552",
"metadata": {},
"outputs": [],
"source": [
@@ -63,7 +61,7 @@
"id": "8c49ae50",
"metadata": {},
"source": [
-"## Explanation of what is happening"
+"## What is happening under the hood?"
]
},
{
@@ -93,10 +91,10 @@
"source": [
"Notice that we are using FlagEmbedding, which is state of the art and beats OpenAI's embeddings by a large margin. \n",
"\n",
-"## Prepare your Documents\n",
+"### Prepare your Documents\n",
"You can define a list of documents that you'd like to encode. These can be sentences, paragraphs, or even entire documents. \n",
"\n",
-"### Format of the List:\n",
+"#### Format of the List:\n",
"1. List of Strings: Your documents must be in a list, and each document must be a string.\n",
"2. For Retrieval Tasks: If you're working with queries and passages, you can add special labels to them:\n",
"- **Queries**: Add \"query:\" at the beginning of each query string.\n",
@@ -125,7 +123,7 @@
"id": "1cb3cc87",
"metadata": {},
"source": [
-"## Load the Embedding Model Weights\n",
+"### Load the Embedding Model Weights\n",
"Next, initialize the Embedding class with the desired parameters. Here, \"BAAI/bge-small-en\" is the pre-trained model name, and max_length=512 is the maximum token length for each document."
]
},
@@ -145,7 +143,7 @@
"id": "5549d501",
"metadata": {},
"source": [
-"## Embed your Documents\n",
+"### Embed your Documents\n",
"\n",
"Use the encode method of the embedding model to transform the documents into a list of np.ndarray. The method returns a generator, so we cast it to a list to get the embeddings."
]
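
The "Format of the List" hunk describes documents as a list of strings, with a "query:" label prepended to queries for retrieval tasks. A minimal sketch of that preparation step might look like the following. The helper name `label_documents` is hypothetical and not part of fastembed, and the "passage" label is an assumption, since the line describing passages falls outside the visible hunk.

```python
from typing import List


def label_documents(texts: List[str], label: str) -> List[str]:
    """Prepend a retrieval label such as "query:" to each string.

    Hypothetical helper for illustration; not part of fastembed.
    """
    prefix = f"{label}: "
    labeled = []
    for text in texts:
        # Leave strings that already carry the label untouched.
        labeled.append(text if text.startswith(prefix) else prefix + text)
    return labeled


queries = label_documents(["What is fastembed?"], "query")
# "passage" is an assumed label; the visible diff only confirms "query:".
passages = label_documents(["fastembed embeds text into vectors."], "passage")

# Documents must be a single list in which every item is a string.
documents: List[str] = queries + passages
print(documents)
```

Keeping the labeling in one small function makes it easy to apply the same convention to every query before the list is handed to the embedding model.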
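
The "Embed your Documents" hunk notes that the embedding method returns a generator, which is then cast to a list. The sketch below illustrates only that lazy-evaluation pattern. `toy_embed` is a hypothetical stand-in, not the fastembed API, and its output values are meaningless; it exists so the example runs without downloading a model.

```python
from typing import Iterable, Iterator, List


def toy_embed(documents: Iterable[str], dim: int = 4) -> Iterator[List[float]]:
    """Hypothetical stand-in for a generator-based embedding method.

    Yields one fixed-size vector per document, lazily.
    """
    for doc in documents:
        # Derive deterministic toy values from the character codes.
        total = sum(ord(ch) for ch in doc)
        yield [((total + i) % 97) / 97.0 for i in range(dim)]


documents = ["Hello, World!", "This is an example document."]

lazy = toy_embed(documents)   # no vectors have been computed yet
embeddings = list(lazy)       # consuming the generator does the work

print(len(embeddings), len(embeddings[0]))
```

A generator can be consumed only once, so if the vectors are needed more than once, storing the result of `list(...)` as above is the simplest option.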
